Delila Program: diffribl

Documentation for the diffribl program is below, with links to related programs in the "see also" section.

{   version = 1.28; (* of diffribl.p 2005 Jun 3}

(* begin module describe.diffribl *)
(*
name
   diffribl: calculate the difference between two ribls

synopsis
   diffribl(ribla: in, riblb: in, diffriblp: in,
            posdiff: out, drxyin: out, output: out)

files

   ribla:  The ribl output of the Ri program for the first of 2
   compared ribls

   riblb:  The ribl output of the Ri program for the second of 2
   compared ribls

   posdiff: An output file which can be used with the xyplo program. This
            file lists all the difference or distance values for each
            position. The columns are as follows:

            1. This coordinate is the relative position of ribl A.

            2. This value is the sum of differences or distances at
            that position

            This file is only useful for the non-scrolling calculations.

   diffriblp:  parameters to control the program.  The file must contain the
      following parameters, one per line:

      parameterversion: The version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      range of calculation (char OR char with integers):
         This sets the range of matrix positions used in the calculation.
         Use 'r' to take the range from the ribl, or 'u' followed by the
         desired from and to coordinates to specify a range.  The user
         range must be smaller than the range of the ribl.

      scrolling function (char OR char with integers):
         Use 'v' followed by the range of the scroll to move riblb across
         ribla over that range.  To not use the scrolling function, use 'n'.

      calctype: type of calculation (char):
         The user can use one of several types of calculations.
         In many cases, the units reported are in bits.

        (e) The first, specified by "e", is a measurement of the Euclidean
            distance between two positions in two matrices.  This is done
            with the following equation:

            Positional distance = square root( (A1 - A2)^2 + (C1 - C2)^2 +
                                               (G1 - G2)^2 + (T1 - T2)^2 )

            This positional distance is then summed for all positions giving
            the total sum of positional distances.

            When "e" is used with the scrolling function, it calculates only
            for the overlapping part of the matrices.  This feature can be
            used with both symmetric and asymmetric models.
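
            As a rough illustration of the "e" calculation without
            scrolling, the Python sketch below computes the positional
            distances and their sum.  The list-of-tuples layout (one
            tuple of A, C, G, T weights per position), the function names
            and the sample values are assumptions made for this example;
            they are not taken from diffribl.p.

```python
import math

def positional_distance(row1, row2):
    # row1, row2: hypothetical (A, C, G, T) weights in bits at one position.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(row1, row2)))

def total_distance(ribl1, ribl2):
    # Sum the positional distances over two matrices of equal length.
    if len(ribl1) != len(ribl2):
        raise ValueError("matrices must have the same number of positions")
    return sum(positional_distance(r1, r2) for r1, r2 in zip(ribl1, ribl2))

# Made-up two-position matrices, for illustration only.
ribl_a = [(2.0, 0.0, 0.0, 0.0), (0.0, 1.0, 1.0, 0.0)]
ribl_b = [(0.0, 2.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
print(total_distance(ribl_a, ribl_b))  # about 3.83
```

            In this sketch the value returned by positional_distance
            plays the role of the second column of posdiff, and
            total_distance plays the role of the total sum.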


        (o) The second, specified by "o", is a measurement of the Euclidean
            distance between two matrices.  As opposed to the calculation
            done in "e", this treats each matrix as a single point in a
            4*l dimensional space, where l is the number of positions.

            Since there is only one point, in this case there are no positional
            differences and so the values in posdiff are reported the
            same as for the "e" option.
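
            The contrast with "e" can be sketched in Python: here the
            squared differences of every weight are added first and a
            single square root is taken at the end.  The data layout,
            the function name and the sample values are assumptions for
            this example, not the code of diffribl.p.

```python
import math

def matrix_distance(ribl1, ribl2):
    # Treat all weights of a matrix as the coordinates of one point.
    if len(ribl1) != len(ribl2):
        raise ValueError("matrices must have the same number of positions")
    total = 0.0
    for row1, row2 in zip(ribl1, ribl2):
        for x, y in zip(row1, row2):
            total += (x - y) ** 2
    return math.sqrt(total)

# Made-up two-position matrices, (A, C, G, T) weights per position.
ribl_a = [(2.0, 0.0, 0.0, 0.0), (0.0, 1.0, 1.0, 0.0)]
ribl_b = [(0.0, 2.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
print(matrix_distance(ribl_a, ribl_b))  # 3.0
```

            For these sample values the result is sqrt(8 + 1) = 3, while
            summing the two positional distances separately would give
            sqrt(8) + 1.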

        (d) The third, specified by "d", is a measurement of difference
            between the two matrices.  This is done with the following
            equation:

            Positional difference = (A1 - A2) + (C1 - C2) +
                                    (G1 - G2) + (T1 - T2)

            This positional difference is then summed for all positions
            giving the total sum of positional differences.

            When "d" is used with the scrolling function, it calculates only
            for the overlapping part of the matrices.  This feature cannot be
            used with an asymmetric model.
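
            A minimal Python sketch of the "d" calculation follows.  The
            differences are signed, so contributions of opposite sign
            cancel.  The data layout, names and sample values are
            assumptions for this example.

```python
def positional_difference(row1, row2):
    # Signed sum of the four base differences at one position.
    return sum(x - y for x, y in zip(row1, row2))

def total_difference(ribl1, ribl2):
    # Sum of the positional differences over matrices of equal length.
    if len(ribl1) != len(ribl2):
        raise ValueError("matrices must have the same number of positions")
    return sum(positional_difference(r1, r2) for r1, r2 in zip(ribl1, ribl2))

# Made-up two-position matrices, (A, C, G, T) weights per position.
ribl_a = [(2.0, 0.0, 0.0, 0.0), (0.0, 1.0, 1.0, 0.0)]
ribl_b = [(0.0, 2.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
per_position = [positional_difference(r1, r2) for r1, r2 in zip(ribl_a, ribl_b)]
print(per_position, total_difference(ribl_a, ribl_b))
```

            At the first sample position the A and C differences cancel,
            so that position contributes zero even though the two rows
            differ.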

        (s) The fourth, specified by "s", is a measurement of the
            average response that ribla should make when passing across
            sites in riblb.  This is computed as:
               $\sum_l \sum_b f_b(b,l-offset)*Ri_a(b,l)$

            (This is LaTeX typesetting notation, \sum is sum;
            "_" means subscript.)

            NOTE: the frequency f is computed from the number of bases
            at the given position.
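
            The double sum can be sketched in Python as below.  This is
            only an illustration under stated assumptions: positions are
            plain list indices rather than ribl coordinates, base counts
            are given as (A, C, G, T) tuples, and positions where
            l - offset falls outside riblb are skipped.  None of these
            choices is taken from diffribl.p.

```python
def frequencies(counts):
    # counts: number of A, C, G, T observed at one position of riblb.
    n = sum(counts)
    return tuple(c / n for c in counts)

def average_response(ri_a, counts_b, offset=0):
    # Sum over positions and bases of f_b(b, pos - offset) * Ri_a(b, pos).
    total = 0.0
    for pos, weights in enumerate(ri_a):
        k = pos - offset
        if 0 <= k < len(counts_b):  # skip non-overlapping positions (assumption)
            f = frequencies(counts_b[k])
            total += sum(fb * w for fb, w in zip(f, weights))
    return total

# Made-up weights (bits) for ribla and base counts for riblb.
ri_a = [(2.0, -1.0, -1.0, -1.0), (-1.0, 2.0, -1.0, -1.0)]
counts_b = [(8, 0, 0, 2), (0, 10, 0, 0)]
print(average_response(ri_a, counts_b))            # about 3.4
print(average_response(ri_a, counts_b, offset=1))  # about -1.0
```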

        (z) The fifth, specified by "z", is a measurement of the three
            dimensional distance between the two matrices,
            following Zhang.Zhang1991a and Zhang.Zhang1991b.

            Base frequencies are computed from the ribl data file.
            Then each set of frequencies, a, c, g, and t, for which,
            by definition,

            a + c + g + t = 1                                      (1)

            can be represented in three dimensions as:

            x = (a+g) - (t+c) = 2(a+g) - 1                         (2)
            y = (a+c) - (t+g) = 2(a+c) - 1
            z = (a+t) - (g+c) = 2(a+t) - 1

            These are three independent variables defined by Zhang.
            They map into a tetrahedron in three dimensions.

            Zhang considers the above to be a 'reduced'
            coordinate system.  The non-reduced system is:

            X = [sqrt(3)/4] x                                      (3)
            Y = [sqrt(3)/4] y
            Z = [sqrt(3)/4] z

            The positional distance is calculated as:

            positional distance = sqrt (   (X2-X1)^2               (4)
                                         + (Y2-Y1)^2
                                         + (Z2-Z1)^2 )

            where sqrt is the square root.

            From Zhang.Zhang1991a (page 46 and 47), this simplifies
            to:

            positional distance = [sqrt(3)/2]                      (5)

                                  * sqrt( (a2-a1)^2 + (g2-g1)^2 +
                                          (c2-c1)^2 + (t2-t1)^2  )

            where (a1, g1, c1, t1) and (a2, g2, c2, t2) are the base
            probabilities of the two matrices at one position.

            This positional distance is then summed for all positions giving
            the total sum of positional distances.

            Because frequencies sum to 1, there really are only three
            independent degrees of freedom and therefore only three
            dimensions.  So equations for three and four dimensions
            give the same results.
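
            The agreement between equations (4) and (5) can be checked
            numerically.  The Python sketch below applies equations (2)
            and (3) to two frequency sets and compares the three
            dimensional distance with the four dimensional form.  The
            function names, the (a, c, g, t) tuple order and the sample
            frequencies are assumptions for this example.

```python
import math

def zhang_xyz(freqs):
    # Equations (2) and (3): frequencies (a, c, g, t) to the point (X, Y, Z).
    a, c, g, t = freqs
    scale = math.sqrt(3) / 4
    x = (a + g) - (t + c)
    y = (a + c) - (t + g)
    z = (a + t) - (g + c)
    return (scale * x, scale * y, scale * z)

def distance_3d(f1, f2):
    # Equation (4): Euclidean distance between the two points.
    p1, p2 = zhang_xyz(f1), zhang_xyz(f2)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))

def distance_4d(f1, f2):
    # Equation (5): the same distance written with the four frequencies.
    return math.sqrt(3) / 2 * math.sqrt(sum((u - v) ** 2 for u, v in zip(f1, f2)))

# Made-up frequency sets; each sums to 1 as required by equation (1).
f1 = (0.7, 0.1, 0.1, 0.1)
f2 = (0.25, 0.25, 0.25, 0.25)
print(distance_3d(f1, f2), distance_4d(f1, f2))
```

            The two printed values agree to within floating point
            rounding, provided each frequency set sums to 1.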

        (y) The sixth, specified by "y", is computed as "z" and then
            it is normalized by the maximum possible distance between
            points in the tetrahedron.

            With frequencies, the largest distance in the Zhang
            tetrahedron is sqrt(3/2), along the edge.  For L
            positions, the largest possible distance is therefore
            sqrt(3/2)L.  All values are divided by this maximum.

            The following shows that the maximum distance in the
            non-reduced coordinate system is sqrt(3/2).  Using
            equations (2) and (3), for the case of all G, the point is
            at

                (X, Y, Z) = (sqrt(3)/4, -sqrt(3)/4, -sqrt(3)/4)

            while for all C, the point is

                (X, Y, Z) = (-sqrt(3)/4, sqrt(3)/4, -sqrt(3)/4)

            The distance between these points is sqrt(3/2).
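
            A short Python sketch of the normalization follows.  It
            divides the summed "z" distance by sqrt(3/2) times the number
            of positions, so two matrices that are all G and all C at
            every position score 1.  Only the total is normalized here;
            the names and the data layout are assumptions for this
            example.

```python
import math

MAX_POSITIONAL = math.sqrt(3 / 2)  # longest distance in the tetrahedron

def zhang_distance(f1, f2):
    # Equation (5) applied to two (a, c, g, t) frequency tuples.
    return math.sqrt(3) / 2 * math.sqrt(sum((u - v) ** 2 for u, v in zip(f1, f2)))

def normalized_total(freqs1, freqs2):
    # Summed positional distance divided by the largest possible sum.
    if len(freqs1) != len(freqs2):
        raise ValueError("matrices must have the same number of positions")
    total = sum(zhang_distance(f1, f2) for f1, f2 in zip(freqs1, freqs2))
    return total / (MAX_POSITIONAL * len(freqs1))

# Made-up three-position matrices: all G against all C.
all_g = [(0.0, 0.0, 1.0, 0.0)] * 3
all_c = [(0.0, 1.0, 0.0, 0.0)] * 3
print(normalized_total(all_g, all_c))  # about 1.0
```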

   drxyin:  This gives the total sum of distance or difference, depending on
      which calculation function is being used.  When the scrolling
      function is being used, it will report the total sum value along
      with the position of the scroll.  The position of the scroll is
      the distance between the zero coordinates of the two matrices.

   output: messages to the user
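
   The scrolling output described for drxyin can be pictured with the
   Python sketch below.  For each scroll position it sums the "e"
   positional distances over the overlapping part of the two matrices
   and pairs the total with the scroll position.  The direction in which
   riblb is shifted, the coordinate handling and all names are
   assumptions for this example and may differ from diffribl.p.

```python
import math

def scroll_totals(ribl_a, ribl_b, first_a, first_b, low, high):
    # ribl_a, ribl_b: lists of (A, C, G, T) rows.
    # first_a, first_b: coordinate of the first row of each matrix.
    # low, high: inclusive range of scroll positions.
    results = []
    for shift in range(low, high + 1):
        total = 0.0
        for i, row_a in enumerate(ribl_a):
            coord = first_a + i          # coordinate of this row in ribl A
            j = coord - shift - first_b  # matching row of the shifted ribl B
            if 0 <= j < len(ribl_b):     # only the overlapping part counts
                row_b = ribl_b[j]
                total += math.sqrt(sum((x - y) ** 2 for x, y in zip(row_a, row_b)))
        results.append((shift, total))
    return results

# Made-up matrix compared with itself over scroll positions -1 to +1.
ribl = [(2.0, 0.0, 0.0, 0.0), (0.0, 2.0, 0.0, 0.0)]
for shift, total in scroll_totals(ribl, ribl, -1, -1, -1, 1):
    print(shift, round(total, 3))
```

   When a matrix is scrolled against itself, as in this sample, the
   total is zero at scroll position 0.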

description

   This program compares two individual information weight matrices
   (ribls) by finding the difference in information at each position,
   for each base, and then summing the differences.  The differences at
   all positions are then summed to give a single diffribl value.

   The program also provides a number of other ways of comparing the
   ribls, selected by the calctype parameter.

examples

examples of diffriblp

1.24        version of diffribl that this parameter file is designed for.
r u -10 +10 r use the from/to coords from ribl, u means use user specified
n v -21 +21 v and coords=move riblB across riblA for the range, n=no
eodszy      e:Euclid, o:Euclid4d, d:difference, s:scan, z:Zhang, y:znorm

documentation

@article{Shultzaberger.Schneider2001,
author = "R. K. Shultzaberger
 and R. E. Bucheimer
 and K. E. Rudd
 and T. D. Schneider",
title = "{Anatomy of \emph{Escherichia coli}
Ribosome Binding Sites}",
journal = "J. Mol. Biol.",
volume = "313",
pages = "215-228",
comment = "Shultzaberger.Schneider.flexrbs",
note = "\htmladdnormallink
{https://alum.mit.edu/www/toms/paper/flexrbs/}
{https://alum.mit.edu/www/toms/paper/flexrbs/}",
year = "2001"}

see also

   example parameter file: diffriblp

   Description of use is in Shultzaberger.Schneider2001:
   https://alum.mit.edu/www/toms/paper/flexrbs/

   program that generates ribls:  ri.p
   program that uses ribls to find sites: scan.p
   graphics program for xyin: xyplo.p
   source of program modules: lister.p

author

   Ryan Shultzaberger
   Thomas D. Schneider
   Zehua Chen

bugs

   There is a problem with comparing different sized ribls.  I (Ryan?)
   need to fix this.  For now, only use this program with same sized
   ribls.  The result will be wrong if done otherwise.

   Comparisons in 4 dimensional space are not appropriate because the
   4 probabilities are not independent.  To avoid this, one can
   replace the 4 dimensional space with a 3 dimensional one according
   to Zhang's methods:

@article{Zhang.Zhang1991a,
author = "C.-T. Zhang
 and R. Zhang",
title = "Diagrammatic representation of the distribution of {DNA}
bases and its applications",
journal = "Int. J. Biol. Macromol.",
volume = "13",
pages = "45-49",
note = "tetrahedron method",
year = "1991"}

@article{Zhang.Zhang1991b,
author = "C.-T. Zhang
 and R. Zhang",
title = "Analysis of distribution of bases in the coding sequences
by a diagrammatic technique",
journal = "Nucleic Acids Res.",
volume = "19",
pages = "6313-6317",
note = "tetrahedron method",
year = "1991"}

@article{Zhang1997,
author = "C.-T. Zhang",
title = "A Symmetrical Theory of {DNA} Sequences and Its Applications",
journal = "J. Theor. Biol.",
volume = "187",
pages = "297-306",
year = "1997"}

   To do this, the Ri can be converted to probabilities according to
   Ri = 2 + log2(Pi).  Then the probabilities are converted to the
   Zhang XYZ space.  Distances are then measured in that XYZ space.
   However it is better to use the Pi directly from the ribl file.
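
   The conversion mentioned above follows from solving
   Ri = 2 + log2(Pi) for Pi, which gives Pi = 2^(Ri - 2).  A minimal
   Python sketch, with hypothetical function names:

```python
import math

def ri_to_probability(ri):
    # Invert Ri = 2 + log2(Pi): Pi = 2^(Ri - 2).
    return 2.0 ** (ri - 2.0)

def probability_to_ri(p):
    # Forward relation; p must be greater than zero.
    return 2.0 + math.log2(p)

print(ri_to_probability(2.0))   # 1.0
print(ri_to_probability(0.0))   # 0.25
print(probability_to_ri(0.25))  # 0.0
```

   A weight of 2 bits corresponds to a probability of 1, and a weight of
   0 bits to the equiprobable value 0.25.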

technical notes

*)
(* end module describe.diffribl *)
{This manual page was created by makman 1.45}

