Delila Program: palinf

palinf program

Documentation for the palinf program is below, with links to related programs in the "see also" section.

{   version = 2.37; (* of palinf.p 2013 Jul 25}

(* begin module describe.palinf *)
(*
name
      palinf: find palindromes, based on information theory

synopsis
      palinf(book: in, palinfp: in,
             fout: out, palinfeatures: out, output: out)

files

      book: a book from the Delila system

      palinfp: parameters to control palinf, one per line

         1. The minimum rsequence of the palindrome to detect.
            Alternatively, if the number is negative, it is the
            desired significance of the detected peaks, given in
            standard deviations.

         2. (Optional) size (integer).  The largest palindrome size allowed,
            in base pairs across both halves of the site.  If omitted, the
            entire sequence is used (which may be very expensive).
            If this number is even, the next higher odd number is used.

         3. (Optional) If the first character of this line is an 'm' then
            palinf will plot palindrome size (m) versus information content
            (rsequence).  A sharply rising curve indicates a good palindrome.
            An 'x' means plot position (x) versus information content
            (rsequence).  Any other character, such as 'n', means list
            the detected palindromes.
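      The three parameter lines can be read roughly as follows.  This is a
      minimal Python sketch, not palinf's Pascal code.  The helper
      read_palinfp and the record PalinfParams are hypothetical.  The sketch
      takes only the leading token of lines 1 and 2, so trailing comments
      like those in the example parameter file are ignored.  The manual does
      not say what happens when line 3 is omitted, so the sketch returns None
      for the mode in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PalinfParams:
    threshold: float         # > 0: minimum rsequence (bits); < 0: significance (standard deviations)
    max_size: Optional[int]  # None when line 2 is absent (whole sequence searched)
    mode: Optional[str]      # 'm', 'x', or any other character (list); None when line 3 is absent


def read_palinfp(text: str) -> PalinfParams:
    """Parse the first three lines of a palinfp file (hypothetical helper).

    Only the leading token of lines 1 and 2 and the first character of
    line 3 are used, so trailing comments on each line are ignored.
    """
    lines = text.splitlines()
    if not lines or not lines[0].split():
        raise ValueError("palinfp: line 1 (threshold) is required")
    threshold = float(lines[0].split()[0])

    max_size = None
    if len(lines) > 1 and lines[1].split():
        max_size = int(lines[1].split()[0])
        if max_size % 2 == 0:  # an even size is raised to the next odd number
            max_size += 1

    mode = None
    if len(lines) > 2 and lines[2]:
        mode = lines[2][0]  # only the first character selects the mode
    return PalinfParams(threshold, max_size, mode)


example = """21 positive: bits minimum to find; negative: st.dev out to find
71 largest size palindrome to find (measured from center to edge in bases)
m  n=indicate detected palindromes; x=show sequence; m=show palindromes
   palinfp: parameters for palinf
"""
print(read_palinfp(example))  # PalinfParams(threshold=21.0, max_size=71, mode='m')
```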

      fout: Locations of palindromes.

         In the m mode, the coordinate location of each significant
         palindrome (i.e., one that passed the criterion) is given, followed
         by a graph that shows the structure and significance of the
         palindrome from the center outward:

 at position   725
                                   1                   2                   3
   m even  odd<0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0
   1 -0.5 -0.5=. 1 2 3 4 5         .         .         .         .         .
   2  1.0  1.0 . = 2 3 4 5         .         .         .         .         .
   3  2.5  0.5 .o 1 e2  3. 4  5    .         .         .         .         .
   4  4.0  0.0 o  1  2 e3. 4  5    .         .         .         .         .
   5  5.5 -0.5o.   1   2 .e3   4   5         .         .         .         .
   6  7.0 -1.0o.   1   2 . 3 e 4   5         .         .         .         .
   7  8.5  0.5 .o   1    2    3 e  4    5    .         .         .         .
   8  8.0  2.0 .   o1    2    3e   4    5    .         .         .         .
   9  7.5  3.5 .    1 o  2    e    4    5    .         .         .         .
  10  7.0  3.0 .    1o   2   e3    4    5    .         .         .         .
  11  6.5  4.5 .     1  o. 2e    3 .   4     5         .         .         .
  12  6.0  6.0 .     1   . =     3 .   4     5         .         .         .
  13  7.5  7.5 .     1   . 2  =  3 .   4     5         .         .         .
  14  7.0  9.0 .     1   . 2 e   o .   4     5         .         .         .
  15  8.5 10.5 .      1  .   2  e  .o      4 .    5    .         .         .
  16  8.0 12.0 .      1  .   2 e   .3  o   4 .    5    .         .         .
  17  9.5 13.5 .      1  .   2    e.3     o4 .    5    .         .         .
  18  9.0 15.0 .      1  .   2   e .3      4 o    5    .         .         .
  19  8.5 16.5 .       1 .     2e  .   3     . 4o      5         .         .
  20  8.0 18.0 .       1 .     e   .   3     . 4   o   5         .         .
  21  7.5 19.5 .       1 .    e2   .   3     . 4      o5         .         .
  22  7.0 21.0 .       1 .   e 2   .   3     . 4       5 o       .         .
  23  6.5 22.5 .       1 .  e  2   .   3     . 4       5    o    .         .
  24  8.0 22.0 .       1 .     e   .   3     . 4       5   o     .         .
  25  7.5 21.5 .        1.    e  2 .      3  .     4   .  o 5    .         .
  26  7.0 21.0 .        1.   e   2 .      3  .     4   . o  5    .         .
  27  8.5 20.5 .        1.      e2 .      3  .     4   .o   5    .         .
  28  8.0 20.0 .        1.     e 2 .      3  .     4   o    5    .         .
  29  7.5 19.5 .        1.    e  2 .      3  .     4  o.    5    .         .
  30  7.0 21.0 .        1.   e   2 .      3  .     4   . o  5    .         .
  31  6.5 20.5 .         1  e      2         3         4o        5         .
  32  6.0 22.0 .         1 e       2         3         4   o     5         .
  33  7.5 23.5 .         1    e    2         3         4      o  5         .
  34  7.0 25.0 .         1   e     2         3         4         o         .
  35  6.5 24.5 .         1  e      2         3         4        o5         .
 at   725           25.0 bits

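      The horizontal axis is in bits, and the vertical axis (m) is in bases
      counted from the center.  The letters e and o mark the even and odd
      palindrome values ('=' where the two coincide), and the numbers 1 to 5
      mark the standard deviations.  With this chart one can determine the
      significance of each palindrome.  Clearly there is a strong odd
      palindrome at coordinate 725: its curve (o) reaches beyond 5 standard
      deviations.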

      In the x mode, the sequence is given:

                                     1                   2
    x bp even  odd<0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8
     2 a -0.5  1.5e. 1o2 3 4 5         .         .         .         .
     3 c -0.5 -0.5=.  1  2  3. 4  5    .         .         .         .
     4 a -0.5  3.0e.  1  o  3. 4  5    .         .         .         .
     5 g -0.5  1.5e.  o1   2 . 3   4   5         .         .         .
     6 t -0.5  1.5e.  o1   2 . 3   4   5         .         .         .
     7 a  1.5  1.5 .  = 1    2    3    4    5    .         .         .
     8 a -0.5  3.5e.    1 o  2    3    4    5    .         .         .
     9 g  0.5 -0.5o.e   1    2    3    4    5    .         .         .
    10 a -0.5  1.5e.  o 1    2    3    4    5    .         .         .
    11 c -0.5  1.5e.  o  1   . 2     3 .   4     5         .         .
    12 g  3.0  1.5 .  o  e   . 2     3 .   4     5         .         .
    13 g  6.5 -0.5o.     1   . 2e    3 .   4     5         .         .

      Here the horizontal axis is again in bits, but the vertical
      axis is the location on the sequence (which is why the bp column
      shows the bases).

      In the n mode, only a summary of the palindrome locations is provided:

           even      odd  palindromes
 at   537           21.0 bits
 at   547 24.5 bits
 at   707           22.5 bits
 at   725           25.0 bits
 at  1101 21.0 bits
 at  1180 24.0 bits
 at  1279           21.0 bits
 at  1322 24.5 bits

      palinfeatures: The locations of palindromes in the features format
      used by the lister program.  Pass these to lister and the palindromes
      will be drawn on your sequence listing.

      The features are listed in this format:

define "odd60.K00042" "-" "(((|)))" "(((|)))" -3 -2 -1 0 1 2 3
@ K00042 60 +1 "odd60.K00042" " 4.5 bits"

define "even547.K00042" "-" "((()))" "((()))" -3 -2 -1 0 1 2
@ K00042 547 +1 "even547.K00042" " 4.5 bits"
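      The two examples follow a regular pattern.  The hypothetical Python
      helper palindrome_feature below is one way to produce it.  The "-"
      field and the +1 are copied verbatim from the examples, and their
      meaning is not documented here.  The width of the bits field (%4.1f)
      is an assumption inferred from " 4.5 bits".

```python
def palindrome_feature(piece: str, coordinate: int, pairs: int,
                       odd: bool, bits: float) -> str:
    """Return the define line and the @ line for one palindrome (sketch)."""
    kind = "odd" if odd else "even"
    name = "%s%d.%s" % (kind, coordinate, piece)          # e.g. odd60.K00042
    # One '(' and one ')' per base pair; an odd palindrome has a '|' center.
    symbol = "(" * pairs + ("|" if odd else "") + ")" * pairs
    # Offsets run from -pairs to +pairs (odd) or to pairs - 1 (even).
    last = pairs if odd else pairs - 1
    offsets = " ".join(str(i) for i in range(-pairs, last + 1))
    define = 'define "%s" "-" "%s" "%s" %s' % (name, symbol, symbol, offsets)
    at = '@ %s %d +1 "%s" "%4.1f bits"' % (piece, coordinate, name, bits)
    return define + "\n" + at


print(palindrome_feature("K00042", 60, 3, True, 4.5))
print(palindrome_feature("K00042", 547, 3, False, 4.5))
```

      Run as shown, the two calls reproduce the odd60 and even547 entries
      above.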

      output: messages to the user.

description

      Each piece of the book is searched for imperfect palindromes whose
      significance is determined by the first parameter in palinfp.  There
      are two kinds of palindrome, even and odd, referring to the size of the
      palindrome in bases.  An odd palindrome has a central base, while an
      even one does not.  Method of use: search without the 'm' option to
      pick out sites of interest.  Then use 'm' under 'stringent conditions'
      or on a smaller fragment to see the structure of the palindrome.  The
      final r value is the maximum of the r values for the palindrome and
      all smaller palindromes at the same center.  Note: equiprobable base
      compositions are assumed for e(Hnb), the expected uncertainty for a
      sample of n bases.
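      The search can be sketched in Python under an assumed scoring model.
      This model is an inference and is not palinf's actual code.  It uses
      the 0.75-bit match value from the Theory section:

         - A base pair is complementary when one base matches the
           complement of its partner.  Each of the pair's two positions then
           scores 0.75 bits, so the pair adds 1.5 bits.
         - Any other pair scores 0.75 - 1 = -0.25 bits per position, so it
           adds -0.5 bits.
         - The central base of an odd palindrome is not scored.

      This model reproduces the 4.5-bit feature examples and the +1.5 and
      -0.5 steps of the m chart at 725.  The running values play the role of
      the e and o curves, and the reported r is their maximum.  The sketch
      models only a positive bits threshold, not the standard-deviation
      option.  Centers are 0-based indexes.  The names best_r and scan are
      hypothetical.

```python
from typing import List, Tuple

PAIRS = ("at", "ta", "cg", "gc")  # Watson-Crick complementary pairs
MATCH = 2 * 0.75                  # complementary pair: two positions, 0.75 bits each
MISMATCH = 2 * (0.75 - 1.0)       # other pair: two positions, -0.25 bits each


def best_r(seq: str, center: int, odd: bool,
           max_pairs: int) -> Tuple[float, List[float]]:
    """Grow a palindrome outward from center; return (max r, running r values).

    Even: the center lies between seq[center - 1] and seq[center].
    Odd:  seq[center] is the central base and is not scored (assumption).
    """
    left = center - 1
    right = center + 1 if odd else center
    r, curve = 0.0, []
    while len(curve) < max_pairs and left >= 0 and right < len(seq):
        r += MATCH if seq[left] + seq[right] in PAIRS else MISMATCH
        curve.append(r)
        left, right = left - 1, right + 1
    return (max(curve) if curve else float("-inf")), curve


def scan(seq: str, min_bits: float,
         max_pairs: int) -> List[Tuple[int, str, float]]:
    """Return (0-based center, kind, bits) for centers whose best r reaches min_bits."""
    seq = seq.lower()
    hits = []
    for center in range(len(seq)):
        for odd in (False, True):
            r, _ = best_r(seq, center, odd, max_pairs)
            if r >= min_bits:
                hits.append((center, "odd" if odd else "even", r))
    return hits


print(scan("GAATTC", 4.5, 3))  # [(3, 'even', 4.5)]
```

      For GAATTC, the even center between the A and T bases grows 1.5, 3.0,
      4.5 bits.  This agrees with the 4.5 bits given for EcoRI in the Theory
      section.  In the even547 feature example, offset 0 is the first base
      of the right half.  That suggests the reported coordinate corresponds
      to this 0-based center, shifted to the sequence's own numbering.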

      Theory:
      When there are a large number of sequences, the information needed to
      choose one of the 4 bases is log2(4) = 2 bits.  In contrast, for only
      two sequences (n = 2), the information measure is severely biased.
      This reflects the statistical likelihood of finding matches.  One
      quarter of the time two randomly chosen bases will match (uncertainty
      H = 0 bits), and the other three quarters of the time they differ
      (H = 1 bit), so the expected uncertainty is e(H2) = 3/4 x 1 = 0.75
      bits.  In information theory terms, this means that a match counts only
      as 0.75 bits (see reference Schneider1986, appendix figure A2), while a
      mismatch counts as 0.75 - 1 = -0.25 bits.  So, for example, the
      restriction site for EcoRI, GAATTC, is 6 x 2 = 12 bits when taken from
      many examples of the site (as when EcoRI binds).  However, as a single
      sequence, it only counts as 6 x 0.75 = 4.5 bits.  This effect keeps
      spurious palindromes from being identified, but it is, unfortunately,
      not intuitive.
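      The 0.75-bit figure can be checked numerically.  The Python sketch
      below does this in two steps.  First, it averages the uncertainty H
      over all 16 equally likely ordered pairs of bases to get e(H2).
      Second, it scores GAATTC against its own reverse complement, treated
      as two aligned sequences, summing e(H2) - H over the columns.  Framing
      the single sequence as this two-sequence alignment is an assumption
      made for illustration.  The function names are hypothetical.

```python
from itertools import product
from math import log2

BASES = "acgt"


def uncertainty(column: str) -> float:
    """Shannon uncertainty H, in bits, of the bases in one aligned column."""
    n = len(column)
    counts = [column.count(b) for b in BASES]
    return -sum(c / n * log2(c / n) for c in counts if c)


# e(H2): expected uncertainty of a column holding n = 2 equiprobable bases,
# averaged over all 16 equally likely ordered pairs (12 differ, 4 match).
e_h2 = sum(uncertainty(a + b) for a, b in product(BASES, repeat=2)) / 16


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and swap a<->t, c<->g."""
    return seq.lower()[::-1].translate(str.maketrans("acgt", "tgca"))


def rsequence_two(seq1: str, seq2: str) -> float:
    """Sum of e(H2) - H over the columns of two aligned sequences."""
    return sum(e_h2 - uncertainty(a + b) for a, b in zip(seq1, seq2))


print(e_h2)                                                   # 0.75
print(rsequence_two("gaattc", reverse_complement("GAATTC")))  # 4.5
```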

      To avoid duplicate definitions as much as possible, the names now
      include the piece name in which the palindrome is found.

examples

      The parameters

21 positive: bits minimum to find; negative: st.dev out to find
71 largest size palindrome to find (measured from center to edge in bases)
m  n=indicate detected palindromes; x=show sequence; m=show palindromes
   palinfp: parameters for palinf

      will locate the E. coli lac operator uniquely in the 401 bases
      surrounding the start of the lacZ transcript.

      The inverted repeats of pSC101 in GenBank K00042 are located with the
      parameters [13/35/m] at coordinates 707 and 725.  (Other sites are
      found as well.  They have been ignored in the literature because they
      do not match the inverted repeats.)

      The parameters [4.5/6/n] will locate 6 base palindromes.

documentation

      Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
      The information content of binding sites on nucleotide sequences.
      J. Mol. Biol. 188: 415-431.

see also

   Example parameter file: palinfp

   Program to display the palindrome features: lister.p

author

      Thomas D. Schneider and Karen A. Lewis

bugs

      If parameter 2 is very large, spurious sites will be found.

technical notes

      Limiting the size of the palindrome will increase the search speed.

*)
(* end module describe.palinf *)
{This manual page was created by makman 1.45}

