The DNA motif discovery algorithm was developed using the following procedure:
Obtain large data sets containing DNA sequences. Many of the sequences used for analysis were obtained from the ENCODE (Encyclopedia of DNA Elements) project. These data sets were in the FASTA file format.
Parse the data contained within the FASTA files using Python (a computer programming language). This was achieved through the use of Biopython, a library of tools for biological computation.
Try different approaches to DNA motif discovery, such as comparing a biological sequence with a random sequence to find motifs unique to the biological sequence.
Refine the algorithm to produce more accurate results, e.g. by using the binomial test to calculate a p-value and assess statistical significance.
Test the algorithm on a wide range of experimentally obtained sequences for its speed and accuracy.
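
The comparison against a random sequence and the binomial test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual algorithm: the function names (count_kmers, binomial_tail, enriched_kmers), the fixed motif length k, the use of a shuffled copy of the input as the random sequence, the pseudocount, and the 0.01 significance threshold are all hypothetical choices. The sketch uses only the Python standard library and takes sequences as plain strings, so FASTA parsing with Biopython is left out.

```python
import math
import random
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def binomial_tail(successes, trials, prob):
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, prob).

    Terms are computed in log space so that long sequences do not overflow.
    Requires 0 < prob < 1.
    """
    total = 0.0
    for i in range(successes, trials + 1):
        log_term = (
            math.lgamma(trials + 1)
            - math.lgamma(i + 1)
            - math.lgamma(trials - i + 1)
            + i * math.log(prob)
            + (trials - i) * math.log(1 - prob)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def enriched_kmers(sequence, k=4, alpha=0.01, seed=0):
    """Return (k-mer, count, p-value) tuples for over-represented k-mers.

    Assumption: the "random sequence" is a shuffled copy of the input,
    which keeps the base composition but destroys the base order.
    """
    shuffled = list(sequence)
    random.Random(seed).shuffle(shuffled)
    background = count_kmers("".join(shuffled), k)
    observed = count_kmers(sequence, k)
    trials = len(sequence) - k + 1  # number of k-mer positions

    results = []
    for kmer, count in observed.items():
        # Assumption: a pseudocount of 1 over the 4**k possible DNA k-mers
        # keeps the background probability strictly between 0 and 1.
        prob = (background[kmer] + 1) / (trials + 4 ** k)
        p_value = binomial_tail(count, trials, prob)
        if p_value < alpha:
            results.append((kmer, count, p_value))
    # Most significant k-mers first
    return sorted(results, key=lambda item: item[2])


if __name__ == "__main__":
    example = "A" * 50 + "C" * 50 + "G" * 50 + "T" * 50
    for kmer, count, p_value in enriched_kmers(example, k=2):
        print(kmer, count, p_value)
```

Because many k-mers are tested at once, a real analysis would normally also correct the p-values for multiple testing; that step is omitted here for brevity.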