Delila Program: orf

orf program

Documentation for the orf program is below, with links to related programs in the "see also" section.

{   version = 1.36; (* of orf.p 2022 Dec 16}

(* begin module describe.orf *)
(*
name
   orf: find ORFs for ribosome binding sites

synopsis

   orf(genomebook:in, scanfeatures: in, orfp: in, orffeatures: out,
       output: out)

files

   genomebook:  a book from the Delila system containing a genome.
     If a genome is not available, a regular Delila book also works.

   scanfeatures:  output of the multiscan program containing
     locations of Initiation Region (ir) features.

   orfp:  parameters to control the program.  The file must contain the
      following parameters, one per line:

      parameterversion: The version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      shortestorf: the shortest orf to report

      longestorf:  the  longest orf to report

   orffeatures:  ORF features reported as features for the lister program.
      This file REPLACES the scanfeatures in the features file.

   output: messages to the user
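
   As an illustration of the orfp layout described above, here is a
   minimal Python sketch of a reader.  It is not the program's own
   parser.  It assumes that each parameter line begins with its value
   and that any text after the first whitespace is a comment; the
   function name parse_orfp is hypothetical.

```python
def parse_orfp(text):
    """Read the three orfp parameters from the text of a parameter file.

    Assumption (not confirmed by the manual): the value is the first
    whitespace-delimited token on each line, and blank lines are skipped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("orfp needs parameterversion, shortestorf, longestorf")
    values = [line.split()[0] for line in lines[:3]]
    return dict(
        parameterversion=values[0],   # kept as text, e.g. "1.36"
        shortestorf=int(values[1]),   # shortest orf to report, in codons
        longestorf=int(values[2]),    # longest orf to report, in codons
    )


example = "1.36  parameterversion\n10    shortestorf\n50    longestorf\n"
params = parse_orfp(example)
print(params["shortestorf"], params["longestorf"])
```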

description

   The orf program reads a book containing a complete genome and,
   given the start points of translation (initiation regions, ir, the
   AUG, GUG or UUG starts) it finds the corresponding stop codons.
   These are called 'orf' features.  They reside at the end of the
   open reading frame and report the length of the frame in codons.
   The length includes the initiation and termination codons.
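
   The search for the stop codon can be sketched in Python as follows.
   This is an illustration, not the orf.p code.  It assumes an
   upper-case DNA sequence, 0-based coordinates, a linear (not
   circular) sequence, the standard stop codons TAA, TAG and TGA, and
   that the last base of the orf is the last base of the stop codon.
   The name orf_from_start is hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")  # assumed: standard stop codons, as DNA


def orf_from_start(sequence, start):
    """Scan in frame from the initiation codon at `start` to the first stop.

    Returns (index of last base of the stop codon, length in codons),
    where the length counts both the initiation and termination codons.
    Returns None if the sequence ends before an in-frame stop is found.
    """
    codons = 0
    for pos in range(start, len(sequence) - 2, 3):
        codons += 1
        if sequence[pos:pos + 3] in STOP_CODONS:
            return pos + 2, codons
    return None


# ATG AAA CCC TAG, starting at index 2; the out-of-frame TGA is ignored.
print(orf_from_start("CCATGAAACCCTAGGG", 2))
```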

   Be sure to turn on 'predict peptides' in listerp!

   The procedure used by the program is:

   * Read in a whole genome book (or Delila book).

   * Read in the scanfeatures file.

   * Look through the scanfeatures for ir features.  Determine the orf
   for each ir feature in the given orientation at that position in
   the genome.
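
   The steps above can be sketched in Python as a loop over features.
   This is a simplified illustration under stated assumptions: the
   real scanfeatures format is not shown here, so each feature is
   represented by a hypothetical (kind, position, orientation) tuple,
   with a 0-based position on the forward strand and an orientation
   of +1 or -1.  A complementary-strand ir is handled by scanning the
   reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = ("TAA", "TAG", "TGA")  # assumed: standard stop codons, as DNA


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def codons_to_stop(seq, start):
    """Codons from `start` through the first in-frame stop, or None."""
    for count, pos in enumerate(range(start, len(seq) - 2, 3), 1):
        if seq[pos:pos + 3] in STOPS:
            return count
    return None


def orfs_for_features(genome, features):
    """Return (position, orientation, codons) for every ir feature."""
    rc = reverse_complement(genome)
    results = []
    for kind, position, orientation in features:
        if kind != "ir":
            continue  # only initiation regions open a reading frame
        if orientation == 1:
            codons = codons_to_stop(genome, position)
        else:
            # map the forward-strand coordinate onto the reverse complement
            codons = codons_to_stop(rc, len(genome) - 1 - position)
        results.append((position, orientation, codons))
    return results


print(orfs_for_features("CCATGAAACCCTAGGG", [("ir", 2, 1)]))
```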

   Note that the orffeatures REPLACE the scanfeatures in a features
   file and are given to the lister.p program for display.  This is
   because the scanfeatures are read in and then modified: the total
   lines corresponding to the ir get orf data added to them and an orf
   definition and features are inserted into the list.

   The orffeatures have special parameters for total features:

      The Aparam of the total feature is the information (in bits).

      The Bparam of the total feature is the orf length (in codons,
      including the stop).

      The Cparam of the total feature is the last base of the orf.

      The Dparam: see below.

   The parameters shortestorf and longestorf only affect one thing,
   the Dparam of the total feature of the ribosome binding site.  If
   the number of codons (including initiation codon and stop codon) is
   greater than or equal to shortestorf and less than or equal to
   longestorf then the total Dparam is set to '1', otherwise it is 0
   (zero).  This allows selection of the total features to generate
   tables.
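
   The Dparam rule is a closed-interval test on the codon count.  A
   minimal Python sketch (the function name total_dparam is
   hypothetical):

```python
def total_dparam(codons, shortestorf, longestorf):
    """Return 1 if shortestorf <= codons <= longestorf, otherwise 0.

    `codons` counts the initiation codon and the stop codon.
    """
    return 1 if shortestorf <= codons <= longestorf else 0


# With shortestorf = 10 and longestorf = 50, both limits are included.
for n in (9, 10, 50, 51):
    print(n, total_dparam(n, 10, 50))
```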

examples

   Example parameters to use in listerp:
39    basesperline: number of bases per line in the listing
1     aastate: 0=no aa; 1=predict peptides; 2=translate all frames
7     frameallowed: binary; highest bit is highest frame on, etc.
1     codelength: 1 or 3 letters per amino acid

basesperline: must be a multiple of 3 for peptides
aastate:      1=predict peptides
frameallowed: 7 for all frames
codelength:   either 1 or 3 letters per amino acid works
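
   The two numeric constraints above can be checked with a short
   Python sketch.  The bit assignment is an assumption read from
   "highest bit is highest frame": value 1 is taken to mean frame 1,
   value 2 frame 2 and value 4 frame 3, so 7 turns on all three.  The
   function names are hypothetical.

```python
def frames_enabled(frameallowed):
    """Decode the frameallowed bit mask into a list of frame numbers.

    Assumed mapping: bit 0 is frame 1, bit 1 is frame 2, bit 2 is frame 3.
    """
    return [frame for frame in (1, 2, 3) if frameallowed & (1 << (frame - 1))]


def basesperline_ok(basesperline):
    """Peptide listings need a whole number of codons per line."""
    return basesperline % 3 == 0


print(frames_enabled(7), basesperline_ok(39))
```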

documentation

  We used the orf program to identify small proteins:

@article{Hemm.Rudd2008,
author = "M. R. Hemm
 and B. J. Paul
 and T. D. Schneider
 and G. Storz
 and K. E. Rudd",
title = "{Small membrane proteins found by comparative genomics and
ribosome binding site models}",
journal = "Mol. Microbiol.",
volume = "70",
pages = "1487--1501",
pmid = "19121005",
pmcid = "PMC2614699",
note = "\htmladdnormallink
{https://doi.org/10.1111/j.1365-2958.2008.06495.x}
{https://doi.org/10.1111/j.1365-2958.2008.06495.x}, \htmladdnormallink
{https://alum.mit.edu/www/toms/papers/smallproteins/}
{https://alum.mit.edu/www/toms/papers/smallproteins/}",
year = "2008"}

see also

   program for displaying the results: lister.p

author

   Thomas Dana Schneider

bugs

   Orf is designed for a single bacterial genome at the moment, since
   it only handles one piece at a time.

   Can we use the ir color for the stop codon color bar?  No!  The
   color is not available yet because mkpetal happens LATER - there is
   no way to get that color now!  This function can be done only in
   lister when there are colors available after mkpetals has been run.
   Method for lister: search the features for ir.  If the ir is
   followed by an orf, assign the color of the ir to the orf.
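
   The proposed lister method could look like the following Python
   sketch.  It is an illustration of the idea only, not lister code.
   Features are represented by hypothetical (name, color) pairs in
   file order, and "followed by" is taken to mean immediately
   followed.

```python
def inherit_ir_colors(features):
    """Give each orf the color of the ir feature directly before it."""
    result = []
    previous = None
    for name, color in features:
        if name == "orf" and previous is not None and previous[0] == "ir":
            color = previous[1]  # copy the color from the preceding ir
        previous = (name, color)
        result.append(previous)
    return result


print(inherit_ir_colors([("ir", "red"), ("orf", None)]))
```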

   When two ribosome binding sites are in frame, two orf stops are
   generated that are in the same place.  They differ only in the
   'other' string, so the current lister considers them to be
   duplicates.  Two solutions are (1) make lister check the 'other'
   string for identity and (2) do an 'other' string check only when
   the namestring is 'orf'.
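
   Solution (2) can be sketched in Python as a duplicate filter whose
   key includes the 'other' string only for orf features.  This is an
   illustration, not lister code; features are represented by
   hypothetical (namestring, position, other) tuples.

```python
def remove_duplicates(features):
    """Drop repeated features, keeping the first occurrence of each.

    For namestring 'orf' the 'other' string is part of the identity,
    so two orf stops at one position from different sites both survive.
    """
    seen = set()
    kept = []
    for name, position, other in features:
        key = (name, position, other) if name == "orf" else (name, position)
        if key not in seen:
            seen.add(key)
            kept.append((name, position, other))
    return kept


stops = [("orf", 300, "site A"), ("orf", 300, "site B"), ("orf", 300, "site A")]
print(remove_duplicates(stops))
```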

technical notes

*)
(* end module describe.orf *)
{This manual page was created by makman 1.45}


{created by htmlink 1.62}