a open book malign: Fast multiple alignment of ungapped DNA sequences using information theory and a relaxation method

@article{Schneider.Mastronarde1996,
author = "T. D. Schneider
 and D. N. Mastronarde",
title = "{Fast} multiple alignment of ungapped {DNA}
sequences using information theory and a relaxation method",
journal = "Discrete Applied Mathematics",
volume = "71",
pages = "259--268",
note = "
\htmladdnormallink
{https://alum.mit.edu/www/toms/papers/malign}
{https://alum.mit.edu/www/toms/papers/malign},
\htmladdnormallink
{http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2785095/}
{http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2785095/},
\htmladdnormallink
{https://doi.org/10.1016/S0166-218X(96)00068-6}
{https://doi.org/10.1016/S0166-218X(96)00068-6}",
pmid = "19953199",
pmcid = "PMC2785095",
year = "1996"}

The malign program, a part of the Delila system of programs.

The basic idea of the algorithm implemented by malign is quite simple: maximimize the information in the alignment of DNA sequnences to align them. Each sequence is extracted and then slid relative to the other sequences. The maximum information positions are noted and one is chosen randomly. An alignment can be considered to be a vector of numbers. The process is repeated and the alignment vectors are collected as a population. One may then inspect the highest information containing sequences using the malin program.

This algorithm works quite well for binding sites, since they can be considered rigid, but we never found a good way to extend it to sequences that have gaps. Introducing gaps makes the number of combinations explode and we don't know a good algorithm which isn't biased. Such an algorithm may exist but one would have to get rid of all the biased ones first.

2003 Aug 28: Erratum. Thanks to Chengpeng (Charlie) Bi for alerting me to an error, which is corrected in the current version.

color bar Small icon for Theory of Molecular Machines: physics,
chemistry, biology, molecular biology, evolutionary theory,
genetic engineering, sequence logos, information theory,
electrical engineering, thermodynamics, statistical
mechanics, hypersphere packing, gumball machines, Maxwell's
Daemon, limits of computers


Schneider Lab

origin: 1998 April 1
updated: 2021 Jul 14
color bar