Delila Program: scan

scan program

Documentation for the scan program is below, with links to related programs in the "see also" section.

{   version = 3.67; (* of scan.p 2017 Aug 07}

(* begin module describe.scan *)
(*
name
   scan: scan a book with an Ribl weight matrix and generate a vector

synopsis
   scan(book: in, ribl: in, scanp: in,
        data: out, scanfeatures: out, scaninst: out, output: out)

files

   book: a book from the delila system

   ribl: a weight matrix from sites or ri programs.
      Lines that start with * are notes.  The next line contains the matrix
      FROM-TO coordinates; this is followed by the matrix in the order A, C, G,
      T from FROM to TO.

   scanp: parameters to control the program.

      parameterversion: the version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      seqs: One integer on the first line is the number of sequences to scan
         to produce the vector.  0 = none, positive = that number; negative =
         all.

      Ri range : Two real numbers on the second line give the range of
         information content to report in the data file.

      Z score range: Two real numbers on the third line give the range of the
         Z score to report in the data file.  A negative sign will be
         converted to a positive sign so that this parameter limits the range
         of acceptable sites to an interval on the real line.  Note: normally
         one would want the lower number to be zero.

      Probability range: Two real numbers on the fourth line give the range
         of probability to report in the data file.  The probability of a
         site is determined from the mean and standard deviation of the Ri
         distribution.  Note: normally one would want the lower number to be
         zero.
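
         The three ranges act together as a filter on what is reported.  The
         following is a minimal sketch, not the code in scan.p.  It assumes
         that the bounds are inclusive and that the sign conversion applies
         both to the Z bounds and to the Z score of the site, so the test is
         made on the magnitude of Z.  The function name passes_ranges is
         hypothetical.

```python
def passes_ranges(ri, z, p, ri_range, z_range, p_range):
    """Return True when a site falls inside all three reporting ranges."""
    # Assumption: negative signs are dropped from the Z bounds and from Z
    # itself, so the comparison is made on the magnitude of Z.
    z_low, z_high = sorted(abs(bound) for bound in z_range)
    ri_low, ri_high = ri_range
    p_low, p_high = p_range
    return (ri_low <= ri <= ri_high
            and z_low <= abs(z) <= z_high
            and p_low <= p <= p_high)


# Ri, Z and probability taken from the example feature line in this page.
site_is_reported = passes_ranges(12.200338, -0.473212, 0.318031,
                                 (0, 100), (0, 100), (0, 1))
```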

      fromwanted towanted range: two integers that define the FROM-TO range
         of the ribl matrix to use for computations.  This is independent of
         the range displayed in the walker.

      ways:  One integer.  2 means scan both the sequence and its
         complement.  1 means simply scan the sequence.  0 means to let the
         program figure it out.  The Ri program determines the symmetry of
         the matrix.  If it is symmetrical, it will only scan one way.  If it
         is asymmetrical, both scans are done.
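
         The choice made by the ways parameter can be sketched as below.
         This is an illustration only: the function names are hypothetical,
         and it assumes that scanning the complement means scanning the
         reverse complement of the sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_orientations(ways, matrix_is_symmetric):
    """Map the 'ways' parameter to the orientations to scan (+1, -1)."""
    if ways == 0:
        # Let the program decide: a symmetric matrix needs only one pass.
        ways = 1 if matrix_is_symmetric else 2
    if ways == 1:
        return [+1]
    if ways == 2:
        return [+1, -1]
    raise ValueError("ways must be 0, 1 or 2")
```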

      sitedefinition:  If the first non-blank character on the line is 'd',
         then the rest of the line contains a definition of how to write out
         the sites.  If no site is defined, the scanfeatures file will not be
         written to.  See program lister.p for details.  The basic format for
         an ASCII definition looks like this:

         define "Fis" "-" "[0]" "[0]" -7  0 +7

         For a walker it looks like:

         define "Fis" " w" "  " "  " -7 +7

         NOTE: the range for walker display (given in this site definition)
         is independent of the range of the weight matrix used for
         computation (given in the fromwanted and towanted parameters).

      print definitions:  Any number of lines that define how to print the
         "other" feature string in each feature definition.  The data that may
         be printed are the same as those in the data file.  They are:

         #           width
         length      width
         name        width
         coordinate  width
         orientation width
         Ri          width decimal
         Z           width decimal
         probability width decimal
         string      "quote string"
         .           end of print definitions

         If the first character on a line is '#', the line defines the
         width for the number of the DNA piece from the book.

         If the first character on a line is 'l' or 'L', the line defines the
         width for the length of the DNA piece in the book.

         If the first character on a line is 'n' or 'N', the line defines the
         width for the name of the DNA piece in the book.

         If the first character on a line is 'f' or 'F', the line defines the
         width for the fullname of the DNA piece in the book.

         If the first character on a line is 'c' or 'C', the line defines the
         width for the coordinate of the zero base of the site.

         If the first character on a line is 'o' or 'O', the line defines the
         width for the orientation of the site.  If the width is 1, the
         orientation is given as + or -; if the width is larger, the
         orientation is given as -1 or +1.

         If the first character on a line is 'r' or 'R', the line defines the
         width and decimal fields for the individual information in bits.  The
         word "bits" is attached to the end of the string.

         If the first character on a line is 'p' or 'P', the line defines the
         width and decimal fields for the probability of the site.

         If the first character on a line is 'z' or 'Z', the line defines the
         width and decimal fields for the Z score of the site.

         If the first character is 's' or 'S' then the line defines a string to
         insert.

         The end of the file or a period "." ends the print definitions.

         The lines may be put in any order, and this order is the order in
         which the items are printed to the "other" string.  If the first
         character is not recognized (for example, because a blank precedes
         it), the corresponding data will not be printed.  This gives the user
         full control of the "other" string contents.

         The only kind of definition that may be repeated is the "string".
         This allows the user to put whatever they desire between the data
         items.
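
         The first-character dispatch and the ordered assembly of the "other"
         string can be sketched as follows.  This is a rough illustration
         under stated assumptions, not a copy of scan.p: the function names
         and the dictionary keys are hypothetical, values are simply
         right-justified in their width, and the sketch does not reproduce
         the appended word "bits", the blanks mentioned under bugs, or the
         rule that only "string" may be repeated.

```python
import shlex


def parse_print_definitions(lines):
    """Turn print definition lines into an ordered list of (kind, args)."""
    kinds = {"#": "number", "l": "length", "n": "name", "f": "fullname",
             "c": "coordinate", "o": "orientation", "r": "Ri",
             "z": "Z", "p": "probability", "s": "string"}
    definitions = []
    for line in lines:
        if line.startswith("."):
            break  # a period ends the print definitions
        kind = kinds.get(line[:1].lower())
        if kind is None:
            continue  # e.g. a leading blank: this item is not printed
        tokens = shlex.split(line)
        if kind == "string":
            definitions.append((kind, tokens[1]))
        elif kind in ("Ri", "Z", "probability"):
            definitions.append((kind, (int(tokens[1]), int(tokens[2]))))
        else:
            definitions.append((kind, int(tokens[1])))
    return definitions


def format_other(definitions, site):
    """Build the "other" string for one site, in definition order."""
    parts = []
    for kind, arg in definitions:
        if kind == "string":
            parts.append(arg)
        elif kind in ("Ri", "Z", "probability"):
            width, decimals = arg
            parts.append(f"{site[kind]:{width}.{decimals}f}")
        elif kind == "orientation":
            sign = "+" if site[kind] > 0 else "-"
            text = sign if arg == 1 else sign + "1"
            parts.append(text.rjust(arg))
        else:
            parts.append(str(site[kind]).rjust(arg))
    return "".join(parts)
```

         Here site is a hypothetical dictionary such as
         {"coordinate": 229, "Ri": 12.200338}, keyed by the kind names used
         above.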

      file output definitions:  The first three characters on the line define
         which files will be output.  Capital characters turn on the output.
         Small characters turn it off.  The files are data, (scan)features,
         and (scan)inst so the characters are d, f and i, respectively.  Thus
         DfI turns on the data and scaninst files and leaves the scanfeatures
         off.  (Unidentified characters default to upper case.)
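
         A minimal sketch of how the three characters could be decoded is
         given below.  It assumes the characters are positional (d, then f,
         then i) and that anything other than the expected small letter
         counts as "on".  The function name is hypothetical.

```python
def parse_output_flags(line):
    """Decode the first three characters (d, f, i) into on/off switches."""
    files = (("d", "data"), ("f", "scanfeatures"), ("i", "scaninst"))
    flags = {}
    for (letter, name), char in zip(files, line[:3].ljust(3)):
        # Only the expected small letter turns a file off; any other
        # character is treated as upper case, that is, on.
        flags[name] = char != letter
    return flags


flags = parse_output_flags("DfI     dfi: data, features, inst: files output")
```

         With the line above, the data and scaninst files are on and the
         scanfeatures file is off.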

      normalizeRi:  The first character defines how to normalize
         the reported Ri values.  The Ri value at coordinate zero
         is called Ri0.

         n: normal: scan and report Ri

         s: subtract: compute Ri(l) - Ri(0) at each position l

         d: divide: compute Ri(l) / Ri(0) at each position l

         The s and d modes are usually to be used in conjunction with
         renumbering by Delila (the 'default coordinate zero' command).
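
         The three modes amount to the arithmetic below.  This is a small
         sketch with a hypothetical function name; it assumes the Ri values
         are held in a mapping from coordinate l to Ri(l), and it makes no
         attempt to handle Ri(0) = 0 in the divide mode.

```python
def normalize_ri(ri_by_position, mode):
    """Apply the normalizeRi mode to a mapping of coordinate -> Ri."""
    ri0 = ri_by_position[0]  # Ri at coordinate zero
    if mode == "n":
        return dict(ri_by_position)
    if mode == "s":
        return {l: ri - ri0 for l, ri in ri_by_position.items()}
    if mode == "d":
        return {l: ri / ri0 for l, ri in ri_by_position.items()}
    raise ValueError("mode must be 'n', 's' or 'd'")
```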

      instfrom, instto: range of Delila instructions produced in scaninst
         if that file is created.

   data: The results.  Comments are lines that begin with '*'.  The columns are
      defined in comments in the file.  The matrix is searched over both the
      sequence and its complement.  Ri is reported, as are the Z score and
      the probability based on the mean and standard deviation of the Ri
      distribution.

   scanfeatures: The results in the "features" format for input to the
      lister program.  This consists of comment lines (beginning with "*"),
      definition lines (as shown above), and features of the form:

      @ K01789 229.0 -1 "dnaA" "+12.2 bits " 12.200338    -0.473212     0.318031

      See program lister.p for details.
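
      For downstream processing, a feature line of this form can be split
      into fields as sketched below.  The field names are one reading of the
      example line above and are not taken from lister.p; the function name
      is hypothetical.

```python
import shlex


def parse_feature(line):
    """Split one '@' feature line into named fields (names are assumed)."""
    tokens = shlex.split(line)  # keeps quoted strings, blanks included
    if not tokens or tokens[0] != "@":
        raise ValueError("not a feature line")
    return {
        "piece": tokens[1],
        "coordinate": float(tokens[2]),
        "orientation": int(tokens[3]),
        "name": tokens[4],
        "other": tokens[5],
        "Ri": float(tokens[6]),
        "Z": float(tokens[7]),
        "probability": float(tokens[8]),
    }


feature = parse_feature(
    '@ K01789 229.0 -1 "dnaA" "+12.2 bits " 12.200338    -0.473212     0.318031'
)
```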

   scaninst:  The results are given in the form of delila instructions:
      name "dnaA"; piece K01789; get from 229 -100 to 229 +100 direction -;
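
      An instruction of this form can be assembled from a site and the
      instfrom and instto parameters as sketched below.  The sketch follows
      the layout of the example line only; the function name is hypothetical,
      and whether scan orders the from and to offsets differently for sites
      on the complement is not covered here.

```python
def delila_instruction(name, piece, coordinate, orientation, instfrom, instto):
    """Format one Delila instruction in the layout of the example above."""
    direction = "+" if orientation > 0 else "-"
    return (f'name "{name}"; piece {piece}; '
            f"get from {coordinate} {instfrom:+d} to {coordinate} {instto:+d} "
            f"direction {direction};")


line = delila_instruction("dnaA", "K01789", 229, -1, -100, +100)
```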

   output: messages to the user

description
   The Ri(b,l) weight matrix is scanned across the sequences in the book to
   produce a vector.
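
   The scan itself can be pictured as the sketch below: at each position the
   Ri(b,l) weights of the bases under the matrix are summed to give the Ri of
   that position.  This is an illustration and not the code in scan.p.  It
   assumes a plain string sequence, 0-based coordinates for the zero base,
   and a hypothetical weights mapping from base to a list of values for
   l = FROM..TO; bases other than those in the mapping are not handled.

```python
def scan_sequence(sequence, weights, matrix_from, matrix_to):
    """Return {coordinate: Ri} for every position where the matrix fits.

    weights maps a base to the list of Ri(b,l) values for
    l = matrix_from .. matrix_to.
    """
    vector = {}
    for zero in range(-matrix_from, len(sequence) - matrix_to):
        total = 0.0
        for l in range(matrix_from, matrix_to + 1):
            base = sequence[zero + l]
            total += weights[base][l - matrix_from]
        vector[zero] = total
    return vector


# Toy weights for a matrix covering l = -1 .. +1 (values are made up).
toy_weights = {
    "A": [1.0, 0.0, 0.0],
    "C": [0.0, 2.0, 0.0],
    "G": [0.0, 0.0, 0.5],
    "T": [0.0, 0.0, 0.0],
}
toy_vector = scan_sequence("ACGT", toy_weights, -1, 1)
```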

examples

   Example scanp files:

3.00    version of scan that this parameter file is designed for.
-1      number of seqs to scan 0 = none, positive = that number; negative = all
0       information content at or above which to report in the data file.
100     Z score below which to report in the data file
0       probability at or above which to report in the data file.
-10 +10 desired region of the ribl weight matrix to use
0       0: program figures it out; 1: one way scan; 2: two way scan.
define "Fis" "-" "[0]" "[0]" -7  0 +7
string "data at:" string: A string listed at the feature
coordinate 5
string " Ri = "   string: A string listed at the feature
Ri 5 1  Riwidth Ridecimal: character places for reporting bits to scanfeatures
string " Z = "    string: A string listed at the feature
Z 4 1   z score
string " p = "    string: A string listed at the feature
probability 5 2
.       end of print definitions
DFI     dfi: data, features, inst: files output
n       normalizeRi: n: normal, s: Ri(l)-Ri(0), d: Ri(l)/Ri(0)
-50 +50 instfrom, instto: range to make the scaninst file (if made)

3.00    version of scan that this parameter file is designed for.
-1      number of seqs to scan 0 = none, positive = that number; negative = all
0   100 range of information content to report in the data file.
0   100 range of Z score to report in the data file
0   1   range of probability to report in the data file.
-10 +10 desired region of the ribl weight matrix to use
1       0: program figures it out; 1: one way scan; 2: two way scan.
define "Fis" " w" "  " "  " -10 10
 string "@"
 coordinate 4
 string "|Ri="
Ri 4 1  Riwidth Ridecimal: character places for reporting bits to scanfeatures
string " bits"
 string "|Z="
 Z 7 4   z score
 string "|p="
 probability 5 3
.  end of print definitions
dFi     dfi: data, features, inst: files output
n       normalizeRi: n: normal, s: Ri(l)-Ri(0), d: Ri(l)/Ri(0)
-50 +50 range to make the scaninst file (if made)

documentation

@article{Schneider.Ri,
author = "T. D. Schneider",
title = "Information Content of Individual Genetic Sequences",
journal = "J. Theor. Biol.",
volume = "189",
number = "4",
pages = "427-441",
note = "http://www-lecb.ncifcrf.gov/$\sim$toms/paper/ri/",
comment = "indiv.tex",
comment = "Submitted, April 1997",
year = "1997"}

@article{Schneider.walker,
author = "T. D. Schneider",
title = "Sequence Walkers:
a graphical method to display how binding proteins
interact with {DNA} or {RNA} sequences",
journal = "Nucl. Acids Res.",
volume = "25",
comment = "walker.tex, November 1, issue 21",
note = "http://www-lecb.ncifcrf.gov/$\sim$toms/paper/walker/,
erratum: NAR 26(4): 1135, 1998",
pages = "4408-4415",
year = "1997"}

see also
   sites.p, ri.p, genhis.p, lister.p, dnaplot.p

author
   Thomas Dana Schneider

bugs
   * The quote strings in the parameter file are not recorded and so are not
   reproduced in the data file comments.
   * Blank characters are placed around the quote strings.

technical notes
   The mean and standard deviation of the Ri distribution are stored just
   after the Ri(b,l) table in the ribl file.  They are produced automatically
   by the ri program.
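
   How Z and the probability follow from the stored mean and standard
   deviation can be sketched as below.  The sketch assumes that the Ri
   distribution is treated as Gaussian and that the probability is the
   one-tailed (lower) cumulative probability.  That assumption is not stated
   in this page, but it closely matches the example feature line, where
   Z = -0.473212 is paired with a probability of 0.318031.  The function
   names are hypothetical.

```python
import math


def z_score(ri, mean, stdev):
    """Number of standard deviations between Ri and the mean Ri."""
    return (ri - mean) / stdev


def lower_tail_probability(z):
    """Cumulative standard normal probability of a value at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_example = lower_tail_probability(-0.473212)
```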

   To provide upward compatibility, scanp files of version 2.90 or less are
   interpreted using the old definitions for the bounds of Ri, Z and p:

      Ri cutoff : One real on the second line is the information content at
      or above which to report in the data file.

      Z score cutoff: One real on the third line is the Z score at or below
      which to report in the data file.  A negative sign will be converted to
      a positive sign so that this parameter limits the range of acceptable
      sites to an interval on the real line.

      Probability cutoff: One real on the fourth line is the lowest
      probability to report in the data file.  The probability of a
      site is determined from the mean and standard deviation of the Ri
      distribution.

   It is not advisable to rely on this feature, as it will go away at some
   point.
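
   One way to picture the two layouts is to convert both into ranges, as in
   the sketch below.  The mapping of the old cutoffs onto ranges (Ri from the
   cutoff upward, Z from zero to the cutoff, probability from the cutoff to
   1) is an interpretation of the definitions above and not the code in
   scan.p.  The function name is hypothetical.

```python
import math


def reporting_ranges(version, ri_values, z_values, p_values):
    """Return (ri_range, z_range, p_range) for old and new scanp layouts."""
    if version <= 2.90:
        # Old layout: one cutoff on each line.
        ri_range = (ri_values[0], math.inf)   # at or above the cutoff
        z_range = (0.0, abs(z_values[0]))     # at or below the cutoff
        p_range = (p_values[0], 1.0)          # lowest probability reported
    else:
        # Current layout: two numbers on each line.
        ri_range = tuple(ri_values[:2])
        z_range = tuple(sorted(abs(v) for v in z_values[:2]))
        p_range = tuple(p_values[:2])
    return ri_range, z_range, p_range
```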

*)
(* end module describe.scan *)
{This manual page was created by makman 1.45}


{created by htmlink 1.62}