Title: Discovering simple DNA sequences by the algorithmic significance method.
Publication Type: Journal Article
Year of Publication: 1993
Authors: Milosavljević A, Jurka J
Journal: Comput Appl Biosci
Volume: 9
Issue: 4
Pagination: 407-11
Date Published: 1993 Aug
ISSN: 0266-7061
Keywords: Algorithms; Base Sequence; DNA, Satellite; Humans; Molecular Sequence Data; Pattern Recognition, Automated; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA; Tissue Plasminogen Activator
Abstract

A new method, 'algorithmic significance', is proposed as a tool for discovering patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of Occam's Razor. In this paper, the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain 'words' and can therefore be encoded in a small number of bits. This definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identifying simple sequences with the proposed method has been installed at the Internet address pythia@anl.gov.
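
The encoding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost model (log2(5) bits per symbol, plus 2*log2(n) bits for a pointer's position and length), the function names min_encoding_bits and savings_bits, and the min_word parameter are all assumptions made for illustration, and the dynamic program below runs in quadratic time rather than the linear time reported in the paper. The only result relied on is the standard algorithmic significance bound: if an encoding saves d bits relative to the null model, the probability of that happening under the null model is at most 2^-d.

```python
import math


def min_encoding_bits(seq: str, min_word: int = 2) -> float:
    """Minimal encoding length of seq under a toy literal-or-pointer model.

    Assumed cost model (illustrative only):
      - a literal base costs log2(5) bits (4 bases + 1 'pointer' marker),
      - a pointer costs the marker plus log2(n) bits each for the start
        position and the length of an earlier occurrence of the word.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    literal_cost = math.log2(5)
    pointer_cost = literal_cost + 2 * math.log2(n)

    # longest[i] = length of the longest word starting at i that also
    # starts at some earlier position j < i (overlap is allowed, so a
    # tandem repeat can be copied from its own beginning).
    longest = [0] * n
    prev = [0] * (n + 1)  # prev[j] = common prefix of suffixes i+1 and j
    for i in range(n - 1, -1, -1):
        cur = [0] * (n + 1)
        for j in range(i - 1, -1, -1):
            if seq[i] == seq[j]:
                cur[j] = prev[j + 1] + 1
                longest[i] = max(longest[i], cur[j])
        prev = cur

    # best[i] = cheapest encoding of seq[i:], filled from right to left.
    best = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = literal_cost + best[i + 1]
        for length in range(min_word, longest[i] + 1):
            best[i] = min(best[i], pointer_cost + best[i + length])
    return best[0]


def savings_bits(seq: str) -> float:
    """Bits saved relative to a uniform null model (2 bits per base)."""
    return 2.0 * len(seq) - min_encoding_bits(seq)


if __name__ == "__main__":
    for s in ("CA" * 20, "ACGT"):
        d = savings_bits(s)
        # For d > 0 the null-model probability of such savings is <= 2**-d.
        print(s[:12], round(d, 2))
```

Under these assumptions, a microsatellite-like repeat such as "CA" repeated 20 times yields positive savings, because two literals and one long pointer describe the whole sequence, while a sequence with no repeated words yields negative savings and would not be reported as simple.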

DOI: 10.1093/bioinformatics/9.4.407
Alternate Journal: Comput. Appl. Biosci.
PubMed ID: 8402207