Delila Program: makelogo

makelogo program

Documentation for the makelogo program is below, with links to related programs in the "see also" section.

{   version = 9.67; (* of makelogo.p 2023 Jan 02}

(* begin module describe.makelogo *)
(*
name
   makelogo: make a graphical `sequence logo' for aligned sequences

synopsis
   makelogo(symvec: in, makelogop: in, colors: in, marks: in,
            wave: in, logo: out, output: out)

files
   symvec:  A "symbol vector" file, usually created by the alpro or dalvec
     program.  Makelogo ignores any number of header lines that begin
     with "*".  The next line contains one number (k), the number of
     letters in the alphabet.  The rest of the file defines the composition
     of letters at each position in the set of aligned sequences.

     Each composition begins with 4 numbers on one line:

     1. position (integer);
     2. number of sequences at that position (integer);
     3. information content of the position (real);
     4. variance of the information content (real).

     This is followed by k lines.  The first character on each line is the
     symbol, followed by the number of times that symbol occurs at that
     position.

     Example:

* position, number of sequences, information Rs, variance of Rs
4 number of symbols in DNA or RNA
-100       86 -0.00820  6.3319e-04
a   27
c   18
g   20
t   21
 -99       86 -0.00436  6.3319e-04
a   26
c   19
g   17
t   24

     * If the symvec file is empty, the alphabet is printed as a test.

     * If the error bar values are negative, the error bars are not
     displayed.  This allows the sites program to suppress them when they
     would not be appropriate.

     * If the number of a symbol is negative in symvec, then the symbol will
     be rotated 180 degrees before being printed.  The absolute value is used
     by makelogo to determine the height.  This allows statistical tests
     that find a symbol to be significantly rare to show this by printing
     the symbol upside down.  Notice that A, C, G and T are all easy to
     distinguish from their upside-down versions, but unfortunately this is
     not always true for protein sequences.  Program dalvec contains a switch
     for turning the letters over in the ChiLogo.

   makelogop: parameters to control the program.

     line 1: the lowest and highest positions of the binding site to
             include in the logo (the FROM to TO range).

     line 2: bar: sequence coordinate before which to print a vertical bar
             NOTE: the vertical bar takes up a small amount of horizontal
             space.  However, to make sure that marks are placed correctly,
             the logo is not offset.  The bar will overwrite the previous
             stack and the next stack will overwrite the bar.
             To remove the bar, just set its location outside the range
             of the display.

     line 3: xcorner and ycorner.  This is the coordinate of the lower left
             hand corner of the logo (in cm).  These should be real numbers.

             z xcorner ycorner zerobase
             If the first letter of this line is "z", then the program
             expects three numbers:  xcorner, ycorner and zerobase.  Zerobase
             is a real number giving the position on the sequence at which
             the zero of the coordinate system is placed.  For example,
             setting zerobase to 0 (zero) will place the center of the 0 at
             xcorner, ycorner.  This special feature allows the logo to be
             precisely placed relative to other logos so that they can be
             aligned one above another in a figure.

     line 4: rotation: angle in degrees to rotate the logo.  Warning:
             rotations other than by factors of 90 degrees may produce
             incorrect logos because character scaling depends on the
             orientation of the characters.  (Essentially, it's a design
             fault of PostScript.)

     line 5: charwidth: (real, > 0) the width of the logo characters, in cm;
             squashspray: (real, > 1.0, optional).

             2018 Jul 26: If there is a second number after the
             charwidth, this becomes squashspray.  Make this value
             larger than '1.0' if you are getting a line to the right
             of your logo; these are squashed letters that are too
             small.  After being squashed they are sprayed to the
             right of the logo for unknown reasons.  This appears to
             be a bug in PostScript interpreters and printers.  If a
             character height is smaller than squashspray, it is drawn
             as a solid rectangle.  Most of the time a user will not
             notice this.  You can see them by making squashspray a
             large value (e.g. 80).  See the technical notes for how
             and why to use this.

     line 6: barheight barwidth: (real, > 0) height of the vertical bar, in
             cm, and its width, in cm.

             WARNING: if the barwidth is too big, it can cover the
             smaller tic marks.

     line 7: barbits: (real) The height of the vertical bar, in bits, is
             given by the absolute value of barbits.  If barbits is positive,
             an "I-beam" will appear at the top of the symbol stack.  The
             I-beam indicates one standard deviation of the stack height,
             based entirely on how small the sample of sequences is.  If the
             value of barbits is negative, the I-beam is not displayed.  Not
             knowing how big the sampling effects are can fool one, so one
             should usually have the I-beam, even if it is ugly.

                WARNING: it is not known how to calculate the error for data
             derived from a dirty DNA synthesis experiment (see
             Schneider1989, reference given below).  In that case the error
             could be calculated (in program sites) from the number of
             sequences, but the error bar would then be an underestimate of
             the variation.  Unfortunately, when I tried this, people took
             the error bar at face value, as the true size of the error, so
             this does not work well visually.  Therefore when data come from
             the sites program, the I-beam is suppressed.

                The combination of barheight and barbits determines the size
             of the logo in bits per centimeter.  Both must be specified even
             if no vertical bar is desired.
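             As a worked example of this scale, the Python sketch below
             assumes that stack heights are drawn in direct proportion to
             Rs, with the bar of barheight cm standing for abs(barbits)
             bits.  The function names are hypothetical.

```python
def cm_per_bit(barheight, barbits):
    """Vertical scale: the bar is barheight cm tall and represents
    abs(barbits) bits (the sign of barbits only controls the I-beam)."""
    return barheight / abs(barbits)


def stack_height_cm(rs, barheight, barbits):
    """Height in cm of a stack whose information content is rs bits."""
    return rs * cm_per_bit(barheight, barbits)


# A 5 cm bar labelled as 2 bits gives 2.5 cm per bit, so a position
# with Rs = 1.5 bits is drawn 3.75 cm tall.
print(cm_per_bit(5, 2), stack_height_cm(1.5, 5, 2))
```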

     line 8: Ibeamfraction: real. The fraction of the vertical part of the
             Ibeam to draw.  When it is 1, the Ibeam is normal.  When it is
             zero, no vertical line is drawn.  At 0.1, only 10 percent of the
             top half and 10 percent of the bottom half of the Ibeam are
             drawn, for a total of 10 percent of the entire Ibeam.  More
             precisely, this number is the fraction of a standard deviation
             to draw.  Negative values will reverse the direction of the part
             drawn, making a 'thumbtack'.  (Note:  if this parameter is
             missing, as in old makelogop files, the program will ignore
             it.)  I thank Shmuel Pietrokovski (Structural Biology
             department, Weizmann Institute of Science, 76100 Rehovot -
             ISRAEL, bppietro@dapsas1.weizmann.ac.il) for suggesting this
             method, and for the code to do it.  See further description
             below.

             Note:  This parameter can be skipped.  The code looks for
             a number at this position in the parameter file.  If
             there is a number, the Ibeamfraction is read.  Otherwise,
             the Ibeamfraction defaults to 1.0.

     line 9: barends: if the first character on the line is a 'b', then bars
             are put before and after each line, in addition to the other
             bar.  The first bar on each line is labeled with tic marks and
             the number of bits.  If you don't want this, you can remove the
             call to maketic in the logo.  This is easily done in Unix with

               grep -v maketic.startline logo > logo.without.tics

             That is, the PostScript code that generates the tic marks is on
             one line, which carries a comment containing "maketic.startline".
             The grep removes that entire line from the logo file.  Likewise,
             the bars at the start and end of the lines can be removed with:

               grep -v makebar.startline logo > logo.without.start.bar
               grep -v makebar.end logo       > logo.without.end.bar

             If barends is:

                b - put a bar on both left and right sides of the logo
                l - left only
                r - right only
                n - no bar on either side

             One can control tic marks that are not numbered.  These
             are called 'subtics' and they are controlled by the
             second character on the line.

             If the second character on the line (ticcommand) is:

                t - it is followed by two numbers: subticsBig and
                    subticsSmall.

                    Both numbers define the number of intervals of sub-tic
                    marks to show for each vertical bit of
                    the bar.

                    subticsBig is the number of intervals for big subtics.
                    These are the same size as the numbered tics.

                    subticsSmall is the number of intervals for small subtics.
                    These are half the size of the numbered tics.

                    Examples:

                    't 2 10' will put a big tic at 0.5 and 1.5 bits
                    and small tic marks every 0.1 bit.  This is the
                    default.

                    't 2 2' will put a big tic at 0.5 and 1.5 bits.
                    There will also be small tic marks, but since they
                    are in the same locations as the big ones, you
                    would not see them.

                    't 1 1' will make the tic marks fall under the
                    numbered tic marks so none are visible.

                    WARNING: if the barwidth is 0.1 (the previous
                    standard) then the tic marks will get covered.  A
                    barwidth of 0.05 works.

                Any other character for ticcommand will be the same as
                't 2 10', so this is the default.
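             The placement rule can be checked with a small Python sketch
             (the name subtic_positions is hypothetical).  It divides each
             bit into the given number of intervals and drops positions on
             whole bits, on the assumption, taken from the 't 1 1' example,
             that those are hidden under the numbered tics.  Exact fractions
             avoid rounding problems.

```python
from fractions import Fraction


def subtic_positions(subtics_big, subtics_small, maxbits):
    """Return (big, small) sub-tic positions, in bits, from 0 to maxbits."""
    def ticks(intervals):
        steps = int(maxbits * intervals)
        marks = [Fraction(j, intervals) for j in range(steps + 1)]
        # whole-bit positions coincide with the numbered tics
        return [m for m in marks if m.denominator != 1]
    return ticks(subtics_big), ticks(subtics_small)


big, small = subtic_positions(2, 10, 2)
print([float(b) for b in big])   # big tics at 0.5 and 1.5 bits
```

             With 't 2 2' both lists are identical, which is why the small
             tics cannot be seen in that case.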

    line 10: showingbox: if the first character on the line is an 's', then
             show a box around each character.  This is useful to check that
             the heights of the letters are correct and to distinguish the
             letters from each other when amino acids are represented.

             If the character is an 'f', then the box is filled and no
             character is shown.  This is useful for showing 'logos' of
             extremely large size where the individual character is not
             readable, but the color is.

    line 11: outline:  If the first character is 'o' then the characters show
             up in outline form.  Otherwise, they are solid.

             The outline of an entire stack can be turned on or off using the
             marks file.  The command is toggleoutline and it is treated as a
             user defined command.  The first parameter is the position, the
             remaining three must be given but are ignored.  The state of the
             outlining will apply to the stack following the given position.
             For example,

U 0 0 0 0 toggleoutline
U 1 0 0 0 toggleoutline

             will set position 1 to be the reverse of the rest of the logo.
             (New as of 1999 April 12)

    line 12: caps: if the first letter is 'c' then alphabetic characters are
             converted to capital form.

    line 13: stacksperline: number of character stacks per line output.  A
             "stack" is a vertical set of characters.  A "line" is a series
             of stacks.  One may have several lines per page (next
             parameter).  Special note: This value is used to do the
             centering of strings.  For a range of -23 to +19, you have to
             set it to (19)-(-23)+1 = 43 to get your title centered
             correctly.  You can get the program to tell you the number '43'
             by setting stacksperline very large, in which case it realizes
             there is something wrong and does the calculation.

    line 14: linesperpage: number of lines per page output

    line 15: linemove: line separation relative to the barheight.  Note:  This
             affects the BoundingBox discussed below.

    line 16: numbering: if the first letter is 'n' then each stack is
             numbered.  Otherwise, the number is suppressed in a PostScript
             if statement.  This allows you to modify the logo file by hand
             to reinstate numbering for only the positions you want by
             removing or changing the if statement calls to makenumber.  For
             example,

                numbering {(6) makenumber} if

             Is the PostScript for making the number "6" under the global
             numbering control.  To make "6" always be there, change it to:

                true {(6) makenumber} if

    line 17: shrinking: (real)  Factor by which to shrink the characters.  If
             shrinking <= 0 or shrinking >= 1 then the characters exactly fit
             into the box.  If shrinking > 0 and shrinking < 1, the
             characters are shrunk inside the box.  To use this feature, the
             showingbox parameter must be on, so that the user does not create
             a logo whose height is misleading.

    line 18: strings: the number of user defined strings to follow.  Each
             string definition takes up two lines.  The first is the (x,y)
             coordinate of the string, the second is the string itself.  The
             coordinates are in centimeters relative to the coordinate
             transforms performed above.  (This way, the title position stays
             the same relative to the logo.)

    line 18+1: (x,y,s) coordinates of the first user defined string (if
             strings >= 1) followed by the factor by which to scale the
             string.  A factor of 1 means no scaling.  In addition, if the x
             coordinate is less than or equal to -1000, then the string is
             centered by using the string width, the stacksperline and
             charwidth.  Note!  To allow more parameters to follow, it is no
             longer possible to turn off the strings by setting the number of
             strings to 0 while leaving the string lines in the file.  If
             strings is zero, the string lines must be removed.

    line 18+2: the first user defined string (if strings >= 1)

    line 18+3: (x,y,s) coordinates of the second user defined string (if
             strings >= 2)

    line 18+4: the second user defined string (if strings >= 2), and so on.

             Special string controls:
             \i italics toggle
                To make italics, use \i twice, around the text.
             \n 5 give number of sequences at coordinate 5.
                More than one \n can be used for different coordinates.
                If out of range, give maximum in symvec.
             \\ produce backslash
             \160 produce the Greek letter pi from the PostScript Symbol font.
                These fonts are listed on pages 270 to 273 of the "Red" book
                (see references, below).
             \r produce Rsequence
             \s produce standard deviation
             \d decimal places: must be followed immediately by the number of
                decimal places to use for the next \r or \s

             Example:
             \n 0 \i E. coli\i LexA binding sites
             will give the number of lexA sites at coordinate 0
             and make "E. coli" in italics.

             \d2 Rs = \r +/- \s bits
             will look like this:
             Rs = 5.72 +/- 3.46 bits

             For advanced users:
             HOW TO MAKE ITALICS IN YOUR STRINGS using PostScript
             To allow for italics, use a string like this:

38\) \( E. coli \) IT \(LexA binding sites

             This will make the words "E. coli" in
             Helvetica-BoldItalic font, but leave "38" and "LexA
             binding sites" in Helvetica-Bold.  See the technical
             notes for how this works.  The toggle form "\i" uses the
             same method, but simplifies it for the user.  This method
             allows one to create any PostScript commands.
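             To illustrate how the simple controls combine, here is a rough
             Python sketch of an expander for \i, \n, \\, \r, \s and \d.
             It is not makelogo's code: expand_string is a hypothetical
             name, it returns (text, italic) runs instead of PostScript,
             it does not handle octal Symbol-font escapes such as \160, and
             its default of 2 decimal places is an assumption.

```python
def expand_string(s, rs, sd, nseqs, decimals=2):
    r"""Expand string controls into a list of (text, italic) runs.

    nseqs maps coordinate -> number of sequences; a coordinate that is
    out of range gives the maximum, as described above.
    """
    runs, text, italic, i = [], "", False, 0
    while i < len(s):
        if s[i] != "\\" or i + 1 == len(s):
            text += s[i]              # ordinary character (or lone final \)
            i += 1
            continue
        code, i = s[i + 1], i + 2
        if code == "i":               # close the current run, flip italics
            runs.append((text, italic))
            text, italic = "", not italic
        elif code == "\\":
            text += "\\"
        elif code in "rs":            # Rsequence or its standard deviation
            text += "%.*f" % (decimals, rs if code == "r" else sd)
        elif code == "d":             # \d<k>: decimal places for \r and \s
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            if j > i:
                decimals = int(s[i:j])
            i = j
        elif code == "n":             # \n <coordinate>: sequences there
            j = i
            while j < len(s) and s[j] == " ":
                j += 1
            k = j + 1 if j < len(s) and s[j] in "+-" else j
            while k < len(s) and s[k].isdigit():
                k += 1
            if s[j:k].lstrip("+-"):
                coord = int(s[j:k])
                text += str(nseqs.get(coord, max(nseqs.values())))
                i = k
        else:
            text += "\\" + code       # unknown code: keep it verbatim
    runs.append((text, italic))
    return [run for run in runs if run[0]]


print(expand_string(r"\d2 Rs = \r +/- \s bits", 5.7234, 3.4567, dict([(0, 86)])))
```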

    line 18+2*strings+1: edgecontrol edgeleft edgeright edgelow edgehigh:
             edgecontrol is a single character that controls how the bounding
             box of the figure is handled.  If it is 'p' then the bounding
             box will be the page parameters defined in constants inside the
             program (llx, lly, urx, ury).  Otherwise, there are four real
             numbers that define the edges around the logo in cm.  To allow a
             sequence logo to be embedded into another figure, its size must
             be defined in PostScript (with %%BoundingBox).  The basic logo
             fits within a rectangle, but the numbers along the bottom, the
             symbols and the labels may extend anywhere outside it.  By
             setting these four numbers, the edges are defined.

    line 18+2*strings+2: ShowEnds: a single character
             d: show for DNA 5' and 3' on the logo
             p: show for protein N and C on the logo
             otherwise: nothing is shown.

    line 18+2*strings+3: formcontrol: a single character that determines
                the overall form control of the output.
                See discussion below and the examples.
             n: normallogo.  The standard sequence logo (also used for any
                other character).
             v: varlogo.  See discussion below for what this is.
             e: equallogo.  All stack heights are at the maximum.
                Of course this loses the useful data about the exact
                sequence conservation (measured in bits) at each
                position.
             r: rarelogo.  Plot (1-Pi) for each symbol instead of Pi.
                See discussion below.
             R: rareequallogo.  As with r, but equal stack heights.

             WARNING: To avoid missing important biological
             discoveries, BEFORE using the equallogo and rarelogo
             parameters read this page:

             https://alum.mit.edu/www/toms/logorecommendations.html

             To avoid a user thinking that a symbol is used when it is
             not, for r and R a '.' is plotted instead of the letter
             when Pi = 0.  This shows up as a black rectangle.


             This parameter was implemented on 2011 Mar 09.

    The remainder of the file is ignored and may contain comments.

   colors: Defines the color of each character printed.  Any number of lines
     that begin with an asterisk [*] can be used as comments to identify the
     file or portions of the file.  Put into the file one line for each
     character that is to have a color other than black.  The line must
     contain:

                    character red green blue

     The last three parameters are real values between 0 and 1 (inclusive).
     The exact rendering depends on the PostScript interpreter, but 0 means
     none of that component and 1 means the brightest value, so 0 0 0 is
     black.

     To assign the asterisk a color, precede it with a backslash [as \*].

     To assign the backslash a color, precede it with a backslash [as \\].

     If the file is empty, the logo is made in black and white and the lower
     half of the I-beam error bar is made white so that it is visible when it
     is inside the letters.

     To make any letter invisible, assign it any color less than zero, for
     example -1 -1 -1.  This is different from black, which is 0 0 0, and
     from white, which is 1 1 1.  The error bar will still be displayed.

     Each of the symbols A, C, G and T can represent either DNA or
     amino acids.  To distinguish between them, the lister program
     uses lower case in the colors file for DNA/RNA and upper case for
     amino acids.  This is now fully implemented in makelogo.  Note
     that the usual sequence logo for DNA has upper case letters.
     This is done using the caps parameter.  New as of 2007 Mar 31.
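     A minimal Python sketch of reading such a colors file follows.  The
     names read_colors and is_invisible are hypothetical; skipping blank
     lines and treating any negative component as "invisible" are
     assumptions of the sketch.  Case is preserved because lower case
     means DNA/RNA and upper case means amino acids.

```python
def read_colors(text):
    """Map each character to an (r, g, b) triple; unlisted ones stay black."""
    colors = dict()
    for line in text.splitlines():
        if not line.strip() or line.startswith("*"):
            continue                          # blank line or comment
        if line.startswith("\\"):             # escaped '*' or backslash
            char, rest = line[1], line[2:]
        else:
            char, rest = line[0], line[1:]
        r, g, b = (float(x) for x in rest.split()[:3])
        colors[char] = (r, g, b)
    return colors


def is_invisible(rgb):
    """Assumption: any component below zero makes the letter invisible."""
    return any(c < 0 for c in rgb)
```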

   marks: an empty file means no marks are made.  Otherwise, a series of
     lines contain data that define marks to be placed on the output:

        symbol and kind: the first two characters on the line define
        the symbol and then how to draw the symbol.  The symbols are:

           c circle

           b box

           l line

           t triangle

           s square

           u Begin a user defined symbol.  Define a symbol yourself in
             PostScript.  The PostScript code may extend over several lines.
             The end of the code is given by the character "!" at the start
             of a line.  (The rest of the "!" containing line is ignored.)
             This allows one simply to insert some pre-tested PostScript
             between "u" and "!" lines of the marks file.  The code will be
             passed 4 coordinates and any other parameters given in the U
             line (defined below).

           U Call the user defined symbol.  The U must be followed by 4
             coordinates numbers: x1 y1 x2 y2.  The x1 and x2 are in bases,
             while y1 and y2 are in bits.  The remainder of the line is
             copied to the logo file, so you can have more parameters there.
             End the line with the name of one of your defined symbols.

           * a comment line

           % a comment line

        The drawing types are:
           s stroke
           f fill
           d dash

        If marksymbol is c, t or s, three more parameters are required:

           base coordinate: a real number that determines the center of the
              mark

           bits coordinate: a real number that determines the position of the
              mark in bits.

            scale: a positive real number in units of bases that is the
               diameter of the circle or the diameter of the circle in which
               the equilateral triangle is inscribed.  For the square, it
               is the side.  By using units of bases, these marks
               automatically will fit between bases on the logo, as the
               charwidth is changed or other scaling is done.

        If marksymbol is b, l or U, 4 more parameters are required:
           base coordinate: a real number that determines end 1
           bits coordinate: a real number that determines end 1
           base coordinate: a real number that determines end 2
           bits coordinate: a real number that determines end 2
         A line is drawn from end 1 to end 2, while for a box these ends
         define the diagonal.  Note that the center of a base is defined as an
         integer, so to put a box around a base the base coordinates must be
         offset by 0.5 (for example, from 4.5 to 5.5 to surround base 5).  You
         may make the user symbol use these coordinates however you want.

        ********************************************************************
        * The symbols MUST be in increasing order of position in the site! *
        ********************************************************************

         The symbols must be given in the order of their use in bases.  If
         symbols do not appear, check the order.
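         Because out-of-order marks silently disappear, it can help to check
         a marks file before running makelogo.  The Python sketch below
         (check_mark_order is a hypothetical name) reads the first base
         coordinate of each mark line, skips comments and the PostScript
         between "u" and "!" lines, and reports lines whose position goes
         backwards.  Allowing equal positions is an assumption.

```python
def check_mark_order(text):
    """Return mark lines whose first base coordinate decreases."""
    out_of_order, last, in_user_def = [], None, False
    for line in text.splitlines():
        if in_user_def:                     # inside a 'u' ... '!' block
            in_user_def = not line.startswith("!")
            continue
        if not line.strip() or line[0] in "*%":
            continue                        # blank or comment line
        if line[0] == "u":
            in_user_def = True              # user PostScript follows
            continue
        base = float(line[2:].split()[0])   # first base coordinate
        if last is not None and base < last:
            out_of_order.append(line)
        last = base
    return out_of_order
```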

        Since symbols are drawn concurrently with the logo letters, drawing a
        box or line symbol that has an end 2 to the left of the current
        position (which is end 1) will draw over the letters (because the
        letter was already drawn), while drawing to the right will draw under
        the letters (because the base is drawn over later).

        There is a special predefined user mark that allows one to toggle
        stacks between regular and outlined characters; see the outline
        parameter of makelogop.

   wave: Define a cosine wave over the graph.  Empty file means no cosine
       wave, otherwise the parameters of the wave are given one per line:

       extreme: char;  h or l, the extreme high or low point on the curve
         defined by the wavelocation and wavebit

       wavelocation: real; the location in bases of the extreme

       wavebit: real; the location in bits of the extreme

       waveamplitude: real; the amplitude of the wave in bits

       wavelength: real (positive); the wave length of the wave in bases

       dash: real; the size of dashes in cm.  Zero or negative means no
          dashes.

          If the first character on the line is 'd' then a
          new method of dash control is applied.  In this case there
          are three parameters:

          dashon: real; the length of the ON segment of the dashes, in cm.
             Zero or negative means no dashes.

          dashoff: real; the length of the OFF segment of the dashes, in cm.

          dashoffset: real; the offset for dashing.

          These parameters follow the PostScript Language Reference Manual,
          Second Edition, page 500.  Dashes start with the ON segment,
          followed by the OFF segment.  They are shifted by the offset,
          which is the amount into the dash cycle to start.

          NOTE:  The distances are defined along the length of the cosine,
          which is a function of the waveamplitude, the bits per cm (barbits),
          the wavelength and bitsperbase.  For now it is simplest to first
          determine empirically the dashon and dashoff values that give
          repeats every wavelength, then set the dashoffset.

       thickness: real; thickness of the wave in cm.  Zero or negative means
          the value defaults to PostScript line thickness.
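       The geometry of the wave can be sketched in Python as below.  This is
       an illustration, not makelogo's drawing code: wave_bits is a
       hypothetical name, and it assumes that waveamplitude is the
       peak-to-centre amplitude, so the curve spans 2*waveamplitude bits
       and passes through (wavelocation, wavebit) at its extreme.

```python
import math


def wave_bits(x, extreme, wavelocation, wavebit, waveamplitude, wavelength):
    """Height in bits of the cosine wave at base position x."""
    phase = math.cos(2 * math.pi * (x - wavelocation) / wavelength)
    if extreme == "h":      # maximum of the curve is at (wavelocation, wavebit)
        return wavebit - waveamplitude + waveamplitude * phase
    # 'l': minimum of the curve is at (wavelocation, wavebit)
    return wavebit + waveamplitude - waveamplitude * phase
```

       Half a wavelength away from a high extreme the curve is
       2*waveamplitude bits lower.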

   logo: the output file, a PostScript program to display the logo.
       The last line of the file gives:
       Rsequence = area under the logo (bits)
       small sampling error (bits)
       range from, (bases)
       range to, (bases)
       information density = Rsequence /(two times bases in range)
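       A small Python sketch of two of these summary values follows.
       logo_summary is a hypothetical name; it sums Rs over the range and
       divides by twice the number of bases, presumably because 2 bits is
       the most a DNA base can carry.  The small sampling error is not
       computed here.

```python
def logo_summary(records, from_pos, to_pos):
    """Return (Rsequence, information density) for from_pos..to_pos.

    records is a list of (position, rs) pairs.
    """
    rsequence = sum(rs for pos, rs in records if from_pos <= pos <= to_pos)
    bases = to_pos - from_pos + 1
    return rsequence, rsequence / (2 * bases)


print(logo_summary([(-1, 1.0), (0, 0.5), (1, 0.25)], -1, 0))
```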

   output: messages to the user

description

   The makelogo program generates a `sequence logo' for a set of aligned
   sequences.  A full description is in the documentation paper.  The input
   is a `symvec', or symbol vector, that contains the information at each
   position and the numbers of each symbol.  The output is in the graphics
   language PostScript.

   The program now indicates the small sample error in the logo by a small
   'I-beam' overlaid on the top of each stack.  Although the user may turn
   this off to make pretty logos, I strongly recommend using it to avoid
   being fooled by small amounts of data.

********************************************************************************

   Making A Logo As Part of Another Figure
   ---------------------------------------

   The normal logo file is designed to stand by itself.  However, it is often
   desirable to incorporate the logo as part of another figure.  The
   difficulty is that the stand-alone logo PostScript program will erase the
   page (which wipes out any previous figure drawing) and show the page
   (which prints the page right after the logo).  To prevent these actions,
   the lines of PostScript code which do this have comments that contain the
   word REMOVE.  All you have to do is remove these lines and your logo will
   be able to fit into your figure.  In Unix this can be easily done by:

   grep -v REMOVE logo > logo.ps

   If you do this, then it is advisable to do the erasepage and the showpage
   yourself.  A convenient way to do this is to have several files that
   contain PostScript commands, and to use a shell script to concatenate
   them:

   cat start.ps logo.ps end.ps > myfigure.ps

   If you have a large number of logos together in one figure, you can reduce
   the size of the final figure by another trick.  Logo files begin with a
   header which is the same from one figure to the next assuming you don't
   change colors/letter combinations.  So the first logo in the figure must
   contain this header, but later ones don't really need it.  You can remove
   the header material by using the censor program:

   censor < logo.ps > logo.no.header.ps

   EXAMPLE:
   Suppose that you have two logo files, 1 and 2.  Then to join them, you can
   use the unix commands:

      grep -v REMOVE 1 > 3
      censor < 2 >> 3
      echo "showpage" >> 3

   The grep removes the REMOVE lines from file 1 and puts the rest into the
   start of file 3.  The censor removes the duplicate PostScript definitions
   from file 2 and appends the remainder to the end of 3.  Finally, the echo
   puts a 'showpage' command on the end of the file so that the printer will
   print the page (otherwise you won't get any printout).

********************************************************************************

   Playing with Ibeams
   -------------------
   Shmuel Pietrokovski (bppietro@dapsas1.weizmann.ac.il) suggested that the
   middle of the Ibeams be removable so that it doesn't get in the way of
   logos.  That is, a normal Ibeam looks like:

   -----
     |
     |
     |
     |
     |
     |
   -----

   This is sitting on the top of the sequence logo stack of letters.  This is
   obtained by setting the Ibeamfraction to 1.0.  Shmuel suggested that there
   be a parameter to remove the vertical part or to have it partway:

   -----
     |
     |


     |
     |
   -----

   This is obtained by setting the Ibeamfraction to 0.6.  Setting
   Ibeamfraction to -1.0 puts the vertical parts OUTSIDE the bars.  This way
   one can read one standard deviation of the stack and also have a mark at
   (for example) 2 standard deviations out at the tips of the thumbtacks:

     |
     |
   -----


   -----
     |
     |
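   The three pictures follow from one rule, sketched here in Python.  The
   name ibeam_segments is hypothetical, and the sketch assumes the
   crossbars sit one standard deviation, the square root of the symvec
   variance, above and below the stack top.  Each vertical part starts at
   its crossbar and runs Ibeamfraction standard deviations toward the stack
   top; a negative fraction sends it outward, giving the thumbtacks.

```python
import math


def ibeam_segments(height, variance, fraction=1.0):
    """Return ((top_start, top_end), (bottom_start, bottom_end)) in bits."""
    sd = math.sqrt(variance)
    top, bottom = height + sd, height - sd     # crossbar positions
    return (top, top - fraction * sd), (bottom, bottom + fraction * sd)


print(ibeam_segments(1.0, 0.25, 1.0))    # normal Ibeam: both parts meet at 1.0
print(ibeam_segments(1.0, 0.25, -1.0))   # thumbtacks reaching 2 sd out
```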

********************************************************************************

   How do I disable the error bar?
   -------------------------------

   Set barbits negative.  If I were to do it again I'd separate the
   variables.  For example, -2 gives a bar height of 2 bits but no error
   bars.

********************************************************************************

   How do I label the residues every 5, for example 0, 5, 10, 15 ...
   -----------------------------------------------------------------

   There isn't a way to do this directly since I like having all positions
   labeled because it is less work for the reader to figure out where things
   are.  However, you can remove all numbering (set the numbering parameter
   to anything but 'n').  Then you can use the marks file to put numbers
   where you want.  See:  marks.lettering for a mechanism that I put together
   for this.  (There is a link from the 'See Also' section below.) You could
   even rotate the numbers if you know how to program PostScript.  If you get
   a nice working example, I can add it to my set.  If not, you *might*
   convince me to generate the marks file if you describe what you want and
   marks.lettering doesn't do it ;-).

********************************************************************************

   How do I set the default paper size (A4 or letter)?
   ---------------------------------------------------

   The simplest thing is to place the logo wherever you want on the page.
   You can set the box boundaries with the edgecontrol variables.

   You can also set the PostScript page size by changing the four constants:
   llx, lly, urx and ury.  This would require a recompile.  These numbers are
   in 'points', one point is 1/72 inch (I know, silly!) but you can convert
   precisely to cm by multiplying by 2.54/72.
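
   The conversion is easy to script.  Below is a minimal Python sketch, not
   part of makelogo, that converts between points and centimeters and works
   out the standard Letter and A4 page sizes in points.  The page dimensions
   are the usual published ones; they are not read from makelogo's llx, lly,
   urx and ury constants.

```python
# Sketch only: point/centimeter conversion for choosing PostScript page sizes.
POINTS_PER_INCH = 72.0  # a PostScript point is exactly 1/72 inch
CM_PER_INCH = 2.54      # exact by definition


def points_to_cm(points: float) -> float:
    """Convert PostScript points to centimeters (multiply by 2.54/72)."""
    return points * CM_PER_INCH / POINTS_PER_INCH


def cm_to_points(cm: float) -> float:
    """Convert centimeters to PostScript points (multiply by 72/2.54)."""
    return cm * POINTS_PER_INCH / CM_PER_INCH


# US Letter is 8.5 x 11 inches; A4 is 21.0 x 29.7 cm.
LETTER_POINTS = (8.5 * POINTS_PER_INCH, 11.0 * POINTS_PER_INCH)
A4_POINTS = (cm_to_points(21.0), cm_to_points(29.7))

if __name__ == "__main__":
    print(f"Letter: {LETTER_POINTS[0]:.1f} x {LETTER_POINTS[1]:.1f} points")
    print(f"A4:     {A4_POINTS[0]:.1f} x {A4_POINTS[1]:.1f} points")
```

   Run directly, it prints 612.0 x 792.0 points for Letter and
   595.3 x 841.9 points for A4.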

********************************************************************************

   How do I make a logo that has several lines?
   --------------------------------------------

   If you are working with a protein or a very long DNA sequence, you might
   consider setting linesperpage to more than 1 and adjusting stacksperline
   and linemove accordingly.
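
   To pick linesperpage, a rough count of the lines needed helps.  This
   Python sketch assumes that the logo covers the FROM to TO range
   inclusively and that each line holds at most stacksperline stacks; both
   are assumptions about how makelogo lays out lines, and lines_needed is a
   hypothetical helper, not part of makelogo.

```python
import math


def lines_needed(from_pos: int, to_pos: int, stacksperline: int) -> int:
    """Number of logo lines for positions from_pos..to_pos (inclusive),
    assuming each line holds at most stacksperline stacks."""
    if stacksperline < 1:
        raise ValueError("stacksperline must be at least 1")
    if to_pos < from_pos:
        raise ValueError("TO must not be less than FROM")
    stacks = to_pos - from_pos + 1
    return math.ceil(stacks / stacksperline)


if __name__ == "__main__":
    # A 250-residue protein alignment shown 60 stacks per line:
    print(lines_needed(1, 250, 60))
```

   For that protein example the sketch asks for 5 lines, since 250 stacks
   do not fit in 4 lines of 60.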

********************************************************************************

   rarelogo:

   Sometimes one would like to examine the rare symbols.  This is one
   technique for doing so.  A parameter called 'formcontrol' is set to
   'r' to use this.

   In a conventional logo, the heights of the bases A, C, G and T at a
   position add up to the conservation at that position.  Call the
   conservation "1", so that A+C+G+T = 1.

   A "rare logo" graphs:

   (1-A)
   (1-C)
   (1-G)
   (1-T)

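   The sum of these is 4 - (A+C+G+T) = 3.  That's a bit strange, but ok!
   Dividing each term by 3 restores a total of 1, so the stack keeps its
   normal height.  In general, each symbol is plotted with a height:

   conservation*(1-Pi)/(M-1)

   where Pi is the frequency of symbol i and M is the number of symbols in
   the alphabet.  Because the (1-Pi) sum to M-1, the heights sum to the
   conservation.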
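
   The rare-logo formula can be checked with a short Python sketch.
   rare_heights is a hypothetical name, not a makelogo procedure, and it
   assumes the frequencies Pi at the position sum to 1.

```python
def rare_heights(freqs: dict[str, float], conservation: float) -> dict[str, float]:
    """Rare-logo height of each symbol: conservation * (1 - Pi) / (M - 1).

    freqs maps each symbol to its frequency Pi at one position; the
    frequencies are assumed to sum to 1.
    """
    m = len(freqs)
    if m < 2:
        raise ValueError("the alphabet needs at least two symbols")
    return {sym: conservation * (1.0 - p) / (m - 1) for sym, p in freqs.items()}


if __name__ == "__main__":
    freqs = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
    heights = rare_heights(freqs, conservation=1.2)
    for sym, h in heights.items():
        print(sym, round(h, 3))
    print("stack height:", round(sum(heights.values()), 3))
```

   In this example the common base A gets the shortest letter, while the
   stack as a whole keeps the conservation of 1.2 bits.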

   varlogo:  If the first letter of formcontrol is 'v' then the makelogo
   program will produce a 'varlogo'.  This method was invented by Peter
   Shenkin (Shenkin.Mastrandrea1991).  In a regular sequence logo the
   vertical scale is the information content.  However, in some systems,
   such as the immunoglobulin variable regions, one is not interested in
   the conservation, but rather in the degree of variability.  This is best
   expressed as the uncertainty Hafter(l) rather than the information
   R(l) = Hbefore - Hafter(l) (where 'l' is the position in the sequence
   alignment).  Basically, it "turns over" the curve.  This is also
   implemented in alpro.
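
   The relationship between the two scales can be sketched in Python.  The
   names h_after and r_sequence are hypothetical, and the sketch uses the
   plain estimate Hbefore = log2(M) with no small-sample correction, so its
   values may differ from the Rs values in a symvec file when there are few
   sequences.

```python
import math


def h_after(counts: dict[str, int]) -> float:
    """Uncertainty Hafter(l) = -sum Pi log2 Pi, in bits, from symbol counts."""
    n = sum(counts.values())
    if n <= 0:
        raise ValueError("need at least one sequence")
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / n
            h -= p * math.log2(p)
    return h


def r_sequence(counts: dict[str, int], alphabet_size: int) -> float:
    """Information R(l) = Hbefore - Hafter(l), with Hbefore = log2(M)."""
    return math.log2(alphabet_size) - h_after(counts)
```

   For a fully conserved DNA position h_after gives 0 bits and r_sequence
   gives 2 bits; with equal counts of all four bases they give 2 bits and
   0 bits.  A varlogo plots the first number, a normal logo the second.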

********************************************************************************

@article{Shenkin.Mastrandrea1991,
author = "P. S. Shenkin
 and B. Erman
 and L. D. Mastrandrea",
title = "{Information-theoretical entropy as a measure of sequence
variability}",
journal = "Proteins",
volume = "11",
pages = "297--313",
pmid = "1758884",
year = "1991"}

see also

   Example sequence logos:
   image for lexa-logo
   image for donor.pure

   A Gallery of Sequence Logos:
   https://alum.mit.edu/www/toms/sequencelogo.html

   Glossary definition of Sequence Logo:
   https://alum.mit.edu/www/toms/glossary.html#sequence_logo

   -----------------------

   FORM CONTROL FOR SEQUENCE LOGOS
   controlled by parameter formcontrol

   WARNING: To avoid missing important biological discoveries, BEFORE
   using the equallogo and rarelogo parameters read this:

   https://alum.mit.edu/www/toms/logorecommendations.html

   Normal logo (normallogo):
   Note: the sine wavelength is 3.6 amino acids, corresponding to an alpha helix.
   image for normallogo

   Variable logo (varlogo):
   Plot Hafter(l) instead of R(l).
   image for varlogo

   Equal logo (equallogo):
   Equal stack heights. Note that sequence conservation data is lost.
   image for equallogo
   SEE WARNING ABOVE!

   Rare logo (rarelogo):
   Plot 1-Pi instead of Pi, normal stack heights.
   image for rarelogo

   Rare-Equal logo (rareequallogo):
   Plot 1-Pi instead of Pi, equal stack heights.
   image for rareequallogo
   SEE WARNING ABOVE!

   -----------------------

   FULL WORKING EXAMPLE
   This is a full test of makelogo.

   1. Obtain these files:

      lambdacicro.colors      lambdacicro.makelogop   lambdacicro.symvec
      lambdacicro-logo.ps     lambdacicro.marks       lambdacicro.wave

   2. Except for lambdacicro-logo.ps, copy each file to a name without
      the 'lambdacicro.' prefix.
   3. Run makelogo.
   4. Except for the version number, the logo file that makelogo creates
      should be identical to lambdacicro-logo.ps.

   Unix commands for doing the test are:

      cp lambdacicro.colors    colors
      cp lambdacicro.makelogop makelogop
      cp lambdacicro.symvec    symvec
      cp lambdacicro.marks     marks
      cp lambdacicro.wave      wave
      makelogo
      diff lambdacicro-logo.ps logo



   -----------------------

   Related programs:

   There are several ways to get the symvec file; they are described in:
   https://alum.mit.edu/www/toms/logoprograms.html

   1. The Alpro route to making logos: alpro.p

   2. The Delila route to making logos:
   dbbk.p, catal.p, delila.p, alist.p, encode.p, rseq.p, dalvec.p

   3. A program that creates a symvec from a list of words is:
   alword.p

   -----------------------

   To PRINT LOGOS see:
   https://alum.mit.edu/www/toms/postscript.html

   -----------------------

   Other related programs:
   rsgra.p, sites.p, censor.p, rav.p

   Example input files:
   symvec, makelogop, colors, wave, marks

   Some demonstration input files:
   symvec.demo, colors.demo, makelogop.demo, wave.demo, marks.demo
   Resulting output file:
   logo.demo

   Example output file, in PostScript:
   logo

   Other examples and useful control files:
   colors.protein
   marks.arrow
   marks.ellipse
   marks.lettering
   marks.plusminus
   marks.symbols
   marks.userdefined

author

  Thomas D. Schneider, Ph.D.
  toms@alum.mit.edu (permanent)
  https://alum.mit.edu/www/toms (permanent)

examples
   makelogop parameters:

-15 2      FROM to TO range to make the logo over
1          sequence coordinate before which to put a bar on the logo
15 2       (xcorner, ycorner) lower left hand corner of the logo (in cm)
90         rotation: angle to rotate the graph
1.0        charwidth: (real, > 0) the width of the logo characters, in cm
10 0.1     barheight, barwidth: (real, > 0) height of vertical bar, in cm
2          barbits: (real) height of the vertical bar, in bits; < 0: no I-beam
no bars    barends: if 'b' put bars before and after each line
show       showingbox: if 's' show a dashed box around each character; f = fill
no outline outline: if 'o' make each character as an outline
100        stacksperline: number of character stacks per line output
1          linesperpage: number of lines per page output
1.1        linemove: line separation relative to the barheight
numbers    numbering: if the first letter is 'n' then each stack is numbered
1          shrinking: factor by which to shrink characters inside dashed box
2          strings: the number of user defined strings to follow
2 14 1     coordinates of the first string (in cm)
First TITLE
3 13 1     coordinates of the second string (in cm)
SECOND TITLE
n 2 1 2 1  edgecontrol (p=page), edgeleft, edgeright, edgelow, edgehigh in cm
d          d: 5' 3'; p: N C; else: nothing shown on ends

makelogop.dna: parameters for the makelogo program, version 8.31 or higher

   colors:
* Color scheme for logos of DNA (for the makelogo program).
* color order is red-green-blue
*
* green:
A 0 1 0
a 0 1 0
*
* blue:
C 0 0 1
c 0 0 1
*
* red:
T 1 0 0
t 1 0 0
*
* orange:
G 1 0.7 0
g 1 0.7 0
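
   The colors format above is simple enough to read with a few lines of
   Python.  This is a sketch, not makelogo's own reader: parse_colors is a
   hypothetical name, and it assumes every non-comment line holds a symbol
   followed by red, green and blue values, in that order.  Upper- and
   lower-case symbols are listed separately in the file, so they are kept
   as separate keys.

```python
def parse_colors(text: str) -> dict[str, tuple[float, float, float]]:
    """Map each symbol in a colors file to its (red, green, blue) triple.

    Lines starting with '*' are comments.  Every other non-blank line is
    assumed to be: symbol red green blue.
    """
    colors: dict[str, tuple[float, float, float]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) < 4:
            raise ValueError(f"expected 'symbol red green blue': {line!r}")
        red, green, blue = (float(v) for v in fields[1:4])
        colors[fields[0]] = (red, green, blue)
    return colors


if __name__ == "__main__":
    sample = "* orange:\nG 1 0.7 0\ng 1 0.7 0\n"
    print(parse_colors(sample))
```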

   wave:
l        extreme: char; h or l, the high or low extreme to be defined
2        wavelocation: real; the location in bases of the extreme
1.0      wavebit: real; the location in bits of the extreme
0.5      waveamplitude: real; the amplitude of the wave in bits
10.4     wavelength: real; the wave length of the wave in bases
0        dash: real; the size of dashes in cm.  dash <= 0 means no dashes
0.1      thickness: real; thickness of the wave in cm. <=0: default.
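
   The wave parameters pin one extreme of a periodic curve to a position
   and a height.  The Python sketch below shows one plausible reading of
   them; it is an assumption, not makelogo's drawing code.  It assumes a
   cosine and treats waveamplitude as half the peak-to-peak height, so an
   'l' wave runs from wavebit up to wavebit + 2*waveamplitude.  wave_height
   is a hypothetical helper, and dash and thickness, which only affect the
   drawing style, are ignored.

```python
import math


def wave_height(x: float, extreme: str, wavelocation: float, wavebit: float,
                waveamplitude: float, wavelength: float) -> float:
    """Height of the wave, in bits, at position x (in bases).

    Assumption: a cosine whose chosen extreme ('h' = high, 'l' = low) sits
    at (wavelocation, wavebit) and whose amplitude is waveamplitude.
    """
    if wavelength <= 0:
        raise ValueError("wavelength must be positive")
    phase = 2.0 * math.pi * (x - wavelocation) / wavelength
    swing = waveamplitude * (1.0 - math.cos(phase))  # 0 at the extreme
    if extreme == "h":
        return wavebit - swing  # maximum is wavebit
    if extreme == "l":
        return wavebit + swing  # minimum is wavebit
    raise ValueError("extreme must be 'h' or 'l'")


if __name__ == "__main__":
    # Parameters from the example wave file above.
    for x in (2.0, 4.6, 7.2, 12.4):
        print(x, round(wave_height(x, "l", 2.0, 1.0, 0.5, 10.4), 3))
```

   With the example file, the sketch puts the low point of 1.0 bits at base
   2 and the high point of 2.0 bits half a wavelength (5.2 bases) later.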

   marks:

* example marks file for makelogo 8.06 and higher
*
* square stroked, filled and dotted:
ss -2 -0.40 0.5
sf -1 -0.30 0.5
sd  0 -0.20 0.5
*
* circle stroked, filled and dotted:
cs  1 -0.40 0.5
cf  2 -0.30 0.5
cd  3 -0.20 0.5
*
* triangle stroked, filled and dotted:
ts  4 -0.40 0.5
tf  5 -0.30 0.5
td  6 -0.20 0.5
*
* box stroked, filled and dotted base to base:
bs  7 -0.40  8 0
bf  8 -0.30  9 0
bd  9 -0.20 10 0
*
* line stroked, filled and dotted base to base:
ls 10 -0.40 11 0
lf 11 -0.30 12 0
ld 12 -0.20 13 0
*
* box stroked, filled and dotted, around bases:
bs 13.5 -0.40 14.5 0
bf 14.5 -0.30 15.5 0
bd 15.5 -0.20 16.5 0
*
* line stroked, filled and dotted, around bases:
ls 16.5 -0.40 17.5 0
lf 17.5 -0.30 18.5 0
ld 18.5 -0.20 19.5 0
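
   The comments in this example show how the two-letter codes are built:
   the first letter is the shape (s square, c circle, t triangle, b box,
   l line) and the second is the style (s stroked, f filled, d dotted).
   The Python sketch below decodes them.  decode_mark and parse_marks are
   hypothetical names; they recognize only the codes used in this example
   (not, for instance, the marks in marks.arrow or marks.lettering) and
   return the numeric fields uninterpreted.

```python
SHAPES = {"s": "square", "c": "circle", "t": "triangle", "b": "box", "l": "line"}
STYLES = {"s": "stroked", "f": "filled", "d": "dotted"}


def decode_mark(line: str) -> tuple[str, str, list[float]]:
    """Split a marks line such as 'cf 2 -0.30 0.5' into (shape, style, numbers).

    Only the two-letter codes used in the example file are recognized;
    the numeric fields are returned uninterpreted.
    """
    fields = line.split()
    code = fields[0]
    if len(code) != 2 or code[0] not in SHAPES or code[1] not in STYLES:
        raise ValueError(f"unrecognized mark code {code!r}")
    return SHAPES[code[0]], STYLES[code[1]], [float(f) for f in fields[1:]]


def parse_marks(text: str) -> list[tuple[str, str, list[float]]]:
    """Decode every non-blank, non-comment ('*') line of a marks file."""
    return [decode_mark(line) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("*")]
```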

  A test symvec is provided with the program, file 'symvec.demo', to be run
  with 'colors.demo' and 'makelogop.demo'.

documentation

   Description of Logos:
@article{Schneider.Stephens1990,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucleic Acids Res.",
volume = "18",
pages = "6097--6100",
pmid = "2172928",
pmcid = "PMC332411",
note = "\htmladdnormallink
{https://alum.mit.edu/www/toms/papers/logopaper/}
{https://alum.mit.edu/www/toms/papers/logopaper/}",
year = "1990"}

   Use of wave:
@article{Papp.Schneider1993,
author = "P. P. Papp
 and D. K. Chattoraj
 and T. D. Schneider",
title = "{Information analysis of sequences that bind the replication
initiator RepA}",
journal = "J. Mol. Biol.",
volume = "233",
pages = "219--230",
pmid = "8377199",
comment = "Cover of 233, number 2!",
year = "1993"}

   Dirty DNA synthesis experiments:
@article{Schneider.Stormo1989,
author = "T. D. Schneider
 and G. D. Stormo",
title = "{Excess information at bacteriophage T7 genomic promoters
detected by a random cloning technique}",
journal = "Nucleic Acids Res.",
volume = "17",
pages = "659--674",
pmid = "2915926",
pmcid = "PMC331610",
year = "1989"}

   The Blue Book:
@book{PostScriptTutorial1985,
author = "{Adobe Systems Incorporated}",
title = "PostScript Language Tutorial and Cookbook",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
callnumber = "QA76.73.P67P68",
isbn = "0-201-10179-3",
year = "1985"}

   The Red Book:
@book{PostScriptManual1985,
author = "{Adobe Systems Incorporated}",
title = "PostScript Language Reference Manual",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
callnumber = "QA76.73.P67P67",
isbn = "0-201-10174-2",
year = "1985"}

bugs
   Some chi-logos (logos with upside-down characters) do not display on
   OpenWindows, but do print correctly on the Apple LaserWriter IIntx.  The
   reason is completely obscure.

   A bug in NeWS 1.1 is that characters that are scaled too small are forced
   to be big.  This messes up the logo and can be confusing.  Another bug in
   NeWS 1.1 prevents one from using the outline, but the dashed boxes will
   show up.  Sometimes displaying a logo in NeWS 1.1 on a Sun 4 will cause an
   'illegal instruction', after which one is thrown completely off the
   computer.  The source of this is not known, since it is not repeatable.
   The first two bugs are resolved under OpenWindows 2; the third has not
   been observed.  These NeWS bugs do not apply to the Apple LaserWriter
   IIntx, which prints everything correctly.

   * MISSING LOGO LETTER PROBLEM
   The OpenWindows PostScript on a Sun workstation will mess up displaying a
   stack of letters if the vertical movement is too small.  The result is
   that the letters above that point are missing.  This occurs if there is a
   highly conserved base and very few other bases.  The result is a huge gap
   where the highly conserved base should be.  Other printers do fine, so
   this is a problem with the Sun implementation of PostScript (will they
   ever get it right???).  If you don't have this window system, set the
   constant gooddisplay to true.  If you do want the logos to show up
   properly on the screen, use false.  Unfortunately, this will mean that the
   vertical translation for the small letters won't be done, so the display
   will be very slightly wrong.

   * The freeware program Ghostview will sometimes refuse to print some
   bases, but they come out just fine on many printers.

   *******************************************************************

   * Eric Miller (esm@unity.ncsu.edu, http://www.mbio.ncsu.edu/esm) pointed
   out (2000 Dec 15):

   > Aesthetically, the error bars at the bottom of the logo (little to no
   > information regions) obscure the base coordinate line.

   Yes that's bothered me at times also.

   > For a given logo, the error bars are / appear the same length, probably
   > as a function of the number of sequences present in the alignment, since
   > each position is represented in each sequence.

   That's correct.

   > It would be preferable to have the logo error bar in a single location
   > (since they are the same),

   No, they aren't all the same.  The Delila system handles blanks, where no
   sequence is known or reported, so error bars tend to be bigger away from
   the center of the logo where there are fewer sequences.  Some examples are
   in the Gallery, especially the 8 E. coli sequence logos.

   > maybe off any letter of the sequence (above a specified coordinate
   > position, at a specified bit height), or just on the high part of the
   > logo.  I need to check the makelogop to see if the error bars can be
   > removed or modified.

   One can remove the bars, though of course one goes blind at that point.
   Moving them is an interesting idea.  The problem, of course, is in the
   cases where there is low information content, where moving the bars
   wouldn't work.  If one had a lower bound, then explaining it to people
   would be complex - one's eye would see it more than the background!
   Also, one could not judge the background against the bars.  One solution
   might be to block the bar below zero, but then I'm worried that partial
   bars may be misinterpreted.  So you raise a good issue, but I don't know
   a good solution.  Fortunately it is for the most part aesthetic, as you
   say - one can figure out the numbering.

   *******************************************************************

technical notes

   * HERE'S HOW ITALIC STRINGS WORK.  User defined strings have to be
   rendered into PostScript.  To indicate that a region of the string is to
   be done in italics, one must gain access to the PostScript machinery.  For
   example:

38\) \( E. coli \) IT \(LexA binding sites (extra parenthesis)

   The first "\)" after the "38" switches to the PostScript interpreter. The
   backslash "\" is used as an "escape" character, telling makelogo that the
   following character is to be interpreted as PostScript.  (Otherwise
   makelogo would protect the character and you would just get a
   parenthesis.) Likewise, the string

\( E. coli \)

   is interpreted as a PostScript string.  At that point there will be
   two strings on the stack, the (38) and the "( E. coli )".  There is
   a special function defined in makelogo called IT.  IT takes these
   two strings and shows the first in Helvetica-Bold and the second in
   Helvetica-BoldItalic.  After that we must return to normal typing,
   and this is done with "\(" just before "LexA".  The general form
   for using PostScript commands is therefore

\) postscriptstuff \(

   That is, the parentheses always match backwards.  The code (procedure
   postscriptstring) is curious and interesting because it starts with a
   string like this:

38\) \( E. coli \) IT \(LexA binding sites (extra parenthesis)

   and converts it to the following valid PostScript:

(38) ( E. coli ) IT (LexA binding sites \(extra parenthesis\))

   The user's escape characters are removed from the escaped parentheses,
   while unescaped parentheses get escape characters added!

   Why not let the user type raw PostScript?  Because they would have to
   remember to type a \ in front of various characters, and forgetting
   would often lead to programs that would bomb.
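
   The conversion just described can be sketched in Python.
   postscript_string is a hypothetical stand-in for makelogo's
   postscriptstring procedure, written only from the description above.
   How the real procedure treats a backslash before other characters, or a
   lone trailing backslash, is an assumption here.

```python
def postscript_string(user: str) -> str:
    """Turn a makelogo-style user string into a PostScript string expression.

    A backslash marks the next character as raw PostScript, so an escaped
    parenthesis closes or reopens a PostScript string.  Bare parentheses are
    protected so that they print literally.
    """
    out = ["("]
    i = 0
    while i < len(user):
        ch = user[i]
        if ch == "\\" and i + 1 < len(user):
            out.append(user[i + 1])  # escaped: pass through unprotected
            i += 2
            continue
        if ch in "()":
            out.append("\\" + ch)  # bare parenthesis: protect it
        elif ch == "\\":
            out.append("\\\\")  # lone trailing backslash: print it literally
        else:
            out.append(ch)
        i += 1
    out.append(")")
    return "".join(out)


if __name__ == "__main__":
    print(postscript_string(r"38\) \( E. coli \) IT \(LexA binding sites (extra parenthesis)"))
```

   Run on the example string, the sketch prints the same valid PostScript
   line shown above.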

   Note that one can define ANY function one would like by this means!

   * Unfortunately, PostScript characters are not all exactly the same
   height, even within one font.  Thus if A and T are the standard, then C
   and G hang above and below the line.  This has been solved in this
   version of makelogo.  As a consequence, the user never needs to determine
   any character sizes empirically, and the logos should work on any
   PostScript printer.

   Special thanks go to the following people for their help in solving this
   problem:

   Kevin Andresen [kevina@apple.com]
   "The problem facing you is that, while the PostScript language is more or
   less standard, the font shapes depend on the designer, type vendor, or
   language implementation.  The fonts used in NeWS are not exactly the same
   as those from Adobe, which are not the same as those from Bitstream, which
   are not the same as the original lead type, etc.  (This is an
   industry-wide issue.)  One way to compensate for this in PostScript is to
   use the charpath and pathbbox operators and scale appropriately."

   He provided a program, which I then rewrote and generalized.  That version
   almost worked, but not quite.  This was solved by:

   finlay@Eng.Sun.COM (John Finlay) who said:
   "It would appear that the calculation of the pathbbox for characters
   varies with the scale of the characters (I don't know why exactly but
   would speculate that there's probably some weirdness with the font hints
   and scaling).  I modified your postscript to iterate once on the size and
   recalculate the pathbbox at the scaled size.  Seems to printout OK (inside
   the boxes) on a LWI, LWII and in NeWS2.0 (though NeWS still seems to get
   the wide slightly wrong)."

   shiva@well.sf.ca.us (Kenneth Porter) was also involved and actively
   interested.  My apologies if I have forgotten someone else who
   contributed.
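
   The approach described above, measuring a character's actual bounding
   box, rescaling, and then measuring again at the new size, can be
   sketched in Python.  This is an illustration, not makelogo's PostScript:
   measure is a hypothetical callable standing in for charpath followed by
   pathbbox, and the toy measurement below, with a fixed offset mimicking
   size-dependent font hints, is invented for the example.

```python
from typing import Callable, Tuple

# measure(sx, sy) -> (width, height) of a character's bounding box when the
# character is drawn with scale factors sx and sy (hypothetical stand-in for
# charpath followed by pathbbox, where width = urx - llx, height = ury - lly).
Measure = Callable[[float, float], Tuple[float, float]]


def fit_character(measure: Measure, target_w: float, target_h: float,
                  passes: int = 2) -> Tuple[float, float]:
    """Scale factors that make the measured box fill target_w x target_h.

    passes=1 scales once from a measurement at unit size; passes=2 also
    re-measures at the scaled size and corrects, as in Finlay's fix.
    """
    sx = sy = 1.0
    for _ in range(passes):
        w, h = measure(sx, sy)
        sx *= target_w / w
        sy *= target_h / h
    return sx, sy


if __name__ == "__main__":
    # Toy measurement: the box grows with scale plus a fixed offset, so a
    # single pass lands slightly off the target.
    def toy(sx: float, sy: float) -> Tuple[float, float]:
        return 7.2 * sx + 0.3, 7.0 * sy + 0.2

    for n in (1, 2):
        sx, sy = fit_character(toy, 10.0, 10.0, passes=n)
        print(n, toy(sx, sy))
```

   With the toy measurement, the second pass brings the box much closer to
   10 x 10 than the first, which is the effect Finlay describes.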

   The letter I and the vertical bar (|) are treated specially since
   in the Helvetica-Bold font they are rectangles and would completely
   fill the character space.  In addition, the letter I is centered by
   makelogo.

   * Thanks go to Joe Mack for suggesting numbering and titles (strings) and
   to Pete Lemkin and Wojciech Kasprzak for pointing out that the shrink
   option would be helpful.  Thanks to Jeff Haemer for pointing out that the
   PostScript program should begin with '%!', and for suggesting that the
   string fonts should be different from the logos themselves.

   * As of version 8.12, makelogo produces encapsulated PostScript.  This
   allows the logo to be more safely embedded in other figures.  The
   BoundingBox, which defines the region a figure resides on a page, is
   computed from the basic size of the logo.  The width is computed from the
   charwidth and stacksperline.  The height is computed from barheight,
   linesperpage and linemove.  The linemove parameter is used only if
   linesperpage is more than 1.  The edge parameters are then added around
   all edges.  This allows the numbering and labels to be inside the
   BoundingBox.  The figure can be rotated by -90 or +90 degrees.  Other
   rotations result in a BoundingBox that is page-sized.  Note that rotations
   can place much of the logo outside of the page.  The bounding box will not
   show parts outside of the page, so this can be confusing.  To see roughly
   where the logo will appear on the page, use -89 or +89 angles.
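
   The BoundingBox size arithmetic above can be sketched in Python.  The
   exact formula in makelogo is not given here, so this sketch assumes that
   successive lines are spaced linemove x barheight apart (linemove is
   described as the line separation relative to the barheight);
   bounding_box_size is a hypothetical helper, and rotation is not handled.

```python
POINTS_PER_CM = 72.0 / 2.54


def bounding_box_size(charwidth: float, stacksperline: int, barheight: float,
                      linesperpage: int, linemove: float,
                      edgeleft: float, edgeright: float,
                      edgelow: float, edgehigh: float) -> tuple[float, float]:
    """Return (width, height) of the BoundingBox in points; inputs in cm."""
    width = charwidth * stacksperline
    height = barheight
    if linesperpage > 1:
        # assumed spacing between successive lines: linemove * barheight
        height += (linesperpage - 1) * linemove * barheight
    total_w = edgeleft + width + edgeright
    total_h = edgelow + height + edgehigh
    return total_w * POINTS_PER_CM, total_h * POINTS_PER_CM
```

   For example, with charwidth 1.0 cm, 18 stacks, barheight 10 cm, one line
   and edges of 2, 1, 2 and 1 cm, the sketch gives a box of 21 x 13 cm,
   reported in points.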

   * Constant centertrigger determines the value of the base position of a
   string at which the string will be centered.

   * 2006 Oct 25.  Very small values of Rs(l) (the variable rsl), below
   0.00005, cause Ghostview to crash.  Changing rsl would alter the sum, so
   that is not a solution.  The solution is to restrict the minimum stack
   size drawn to the constant minimumStackSize.

   * 2018 Jul 25.  Very small character sizes cause a 'squash-spray'
   effect.  The effect is a thin colored line extending to the right
   of the sequence logo (a spray), usually at zero bits corresponding
   to a letter whose height is very small (squash).  That is, the
   letter is squashed and sprayed to the right.  This did not happen
   with MacOS some time ago, but more recently it has occurred with both
   Skim and Adobe viewers.  Experimenting with calls to numchar
   suggested that it occurs when the charheight is less than the
   constant squashspray in points.  So the code now simply does not
   show the character if the height is below that value.  It appears
   that squashspray is best set to 1 point.  When the effect
   occurs, the squashed near-zero height letters are displayed anyway
   as "junk" along the bottom of the logo, since they are the smallest and
   are sorted to the bottom of the stack.  When they are removed, the larger
   letters do not move vertically, but the "junk" disappears.  This was
   confirmed using the flicker technique:
   https://alum.mit.edu/www/toms/flicker.html
   The example used to solve this bug is shown by this flicker:
   https://alum.mit.edu/www/toms/images/squashspray.gif

   * 2018 Jul 26.  squashspray becomes a hidden variable next to
   charwidth.  The default (squashspraydefault) is 1 point but this
   doesn't always work for unknown reasons.

   * 2018 Aug 08.  When a character is smaller than squashspray it
   will be drawn as a filled rectangle instead of being invisible.
   Users won't notice this much since the rectangle will generally be
   small.

*)
(* end module describe.makelogo *)
{This manual page was created by makman 1.45}


{created by htmlink 1.62}