PITX1 Motif
This dataset consisted of 10,273 DNA sequences randomly sampled from the embryonic hindlimbs of a mouse at an estimated 11.5 days of development. The expected motif, “GGATTA”, was ranked only 18th. The top-ranked motif, “CAGCTG”, appeared in several close variants among the top 20 ranked motifs. Further research identified it as the E-box motif, which is bound by bHLH (basic helix-loop-helix) transcription factors. This connection is supported by previous work showing a protein-protein interaction between PITX1 and bHLH transcription factors (Gino Poulin et al., 2000).
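The ranking step can be illustrated with a minimal Python sketch. This section does not describe how the algorithm scores motifs, so the sketch assumes a simple score: the number of sequences that contain each 6-mer at least once. The function names (`count_kmers`, `ranked_kmers`, `rank_of`, `variants_in_top`) are hypothetical, and the toy sequences are invented for illustration. `variants_in_top` lists top-ranked 6-mers within one mismatch of a given motif, which is one way to surface the variations of “CAGCTG” described above.

```python
from collections import Counter

VALID = set("ACGT")

def count_kmers(sequences, k=6):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set, so a k-mer repeated within one sequence is counted once
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmer for kmer in seen if set(kmer) <= VALID)
    return counts

def ranked_kmers(counts):
    """K-mers sorted by descending count; ties broken alphabetically."""
    return [kmer for kmer, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def rank_of(motif, counts):
    """1-based rank of `motif`, or None if it never occurs."""
    ranked = ranked_kmers(counts)
    return ranked.index(motif) + 1 if motif in ranked else None

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def variants_in_top(motif, counts, top=20, max_mismatches=1):
    """Top-ranked k-mers (other than `motif`) within `max_mismatches` of it."""
    return [kmer for kmer in ranked_kmers(counts)[:top]
            if kmer != motif and hamming(kmer, motif) <= max_mismatches]

if __name__ == "__main__":
    # Hypothetical toy sequences, not the PITX1 dataset
    seqs = ["TTCAGCTGAA", "GCAGCTGTT", "ACAGCTCAGG", "GGATTAC"]
    counts = count_kmers(seqs)
    print(rank_of("CAGCTG", counts))          # 1
    print(variants_in_top("CAGCTG", counts))  # ['CAGCTC']
```

Raw counts favour k-mers that are common in sequence generally, so a fuller implementation would likely score each k-mer against background sequences; that comparison is left out of this sketch.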
Upon discovering the connection between the E-box and PITX1 motifs, an analysis was carried out to test whether the two motifs co-occur within the same sequences, potentially at a specific offset. However, a hypergeometric test showed that their co-occurrence was not statistically significant. Therefore, an alternative model of the protein-protein interaction was proposed.
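The co-occurrence analysis can be sketched in Python as follows. This is a minimal sketch under stated assumptions: motifs are matched exactly on the given strand only, each sequence counts once per motif however many copies it holds, and the test is one-sided for enrichment (the report does not state which tail was used). The helper names are hypothetical. The p-value is the probability of observing at least `both` sequences containing both motifs if the sequences containing motif B were drawn at random from all N sequences. `offset_histogram` tallies the distance between every pair of occurrences, so a preferred offset would appear as a peak.

```python
from collections import Counter
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def cooccurrence_counts(sequences, motif_a, motif_b):
    """Return (N, #seqs with A, #seqs with B, #seqs with both)."""
    has_a = [motif_a in s for s in sequences]
    has_b = [motif_b in s for s in sequences]
    both = sum(a and b for a, b in zip(has_a, has_b))
    return len(sequences), sum(has_a), sum(has_b), both

def motif_positions(seq, motif):
    """Start indices of every (possibly overlapping) exact match of `motif`."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def offset_histogram(sequences, motif_a, motif_b):
    """Count of (start of B) - (start of A) over all occurrence pairs."""
    hist = Counter()
    for seq in sequences:
        for a in motif_positions(seq, motif_a):
            for b in motif_positions(seq, motif_b):
                hist[b - a] += 1
    return hist

if __name__ == "__main__":
    # Hypothetical toy sequences, not the PITX1 dataset
    seqs = ["CAGCTGAAGGATTA", "CAGCTGTT", "GGATTACC", "TTTT"]
    N, K, n, both = cooccurrence_counts(seqs, "CAGCTG", "GGATTA")
    print(N, K, n, both)                   # 4 2 2 1
    print(hypergeom_sf(both, N, K, n))
    print(offset_histogram(seqs, "CAGCTG", "GGATTA"))
```

`math.comb` keeps the calculation exact with Python integers; for datasets of this size, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` gives the same upper-tail probability and may be faster.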
In this model, the DNA strand bends so that PITX1 and the bHLH factor, each bound at its own motif, can interact. Interestingly, this model has also been proposed in a previous paper (Gino Poulin et al., 2000).
c-Jun Motif
The algorithm identified “GAGTCA”, which is a prominent element of the c-Jun motif. The sample sequences were obtained from human embryonic stem cells.
RUNX3 Motif
The expected motif, “ACCACA”, was ranked highest by the algorithm, supporting its accuracy. The sequences analyzed were taken from human lymphocytes infected with the Epstein-Barr virus to induce the expression of RUNX3.
STAT1 Motif
The STAT1 transcription factor is essential to the cell cycle. The algorithm did not identify the expected STAT1 motif with confidence; instead, it ranked “GGGCGG” as the most significant motif. This may be because the middle position of the STAT1 motif is not fixed (it can be any of the four nitrogenous bases), so its occurrences are split across several different 6-mers, none of which ranks highly on its own. Searching for 4-mers would therefore likely work better for this motif.
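The dilution effect of a variable middle position can be shown with a small Python sketch. It uses a gapped (wildcard) 6-mer, a swapped-in technique rather than the 4-mer approach suggested above: 6-mers that differ only at one chosen position are pooled under a single pattern with `N` at that position. The function name `masked_counts` and the toy counts are hypothetical and are not the STAT1 data.

```python
from collections import Counter

def masked_counts(kmer_counts, gap):
    """Pool counts of k-mers that differ only at position `gap` (0-based)
    into a single pattern with an 'N' wildcard at that position."""
    pooled = Counter()
    for kmer, count in kmer_counts.items():
        pattern = kmer[:gap] + "N" + kmer[gap + 1:]
        pooled[pattern] += count
    return pooled

# Hypothetical counts: four variants of one motif, each outranked by GGGCGG alone
kmer_counts = Counter({"ACGATG": 3, "ACGCTG": 3, "ACGGTG": 2, "ACGTTG": 2, "GGGCGG": 6})

print(kmer_counts.most_common(1))                    # [('GGGCGG', 6)]
print(masked_counts(kmer_counts, 3).most_common(1))  # [('ACGNTG', 10)]
```

Because pooling adds counts, a sequence containing two different variants contributes twice if `kmer_counts` holds per-sequence presence counts; masking each sequence's k-mers before deduplicating would avoid that.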
NANOG Motif
NANOG is a transcription factor involved in embryonic stem cell proliferation and self-renewal, making it essential to the pluripotency of the cell. The sample sequences were taken from human embryonic cells. The highest-ranked motif in the algorithm's analysis was “CAGCAG”, which is a significant component of the NANOG motif, supporting the algorithm's accuracy.