human donor splice site sequence logo

How Can I Make Sequence Logos on My Own Computer?

There are two ways to make sequence logos, the Alpro Route and the Delila Route. If you are interested in proteins or your sequence has gaps then the you must use the Alpro Route. If you are working with binding sites, the Delila Route has great advantages.

FOR BOTH ROUTES

To use the programs, you must first set them up on your computer. The programs are written in Pascal and either you can get a binary (if you work on a Sun machine) or you need to compile them. The currently available methods are described on the Delila Software page.

Documentation, source code and Sun binaries of each program are given on the pages linked to below.

First try the test.p program. If you can get this program to run, many other programs will work.

Then get delmod.p and run it. This tests the date/time feature, which is system dependent. If it works ok, all programs should be ok.

For more tips see Setting Up and Using Delila Programs.

To learn how the Delila system is designed, see LIBDEF, the definition of the Delila system.

Recommendations for Making Sequence Logos

ALPRO ROUTE: proteins or nucleic acid binding sites

If you have alignments with gaps or protein sequences, then you only need (and must use) alpro and makelogo.

The steps are:

  1. Prepare the protseq file and then run alpro.

  2. Run makelogo.

DELILA ROUTE: nucleic acid binding sites

If you are working with binding sites in GenBank, you will need many programs. They are given below in the order you use them, with a brief explanation of each. Be sure to read the manual page of each one. In particular, note how the output of each program generally becomes the input to the next. Also, it helps to read various definitions in the glossary.

  1. dbbk: Convert GenBank flat file format to Delila format, which is called a `book' (db = database, bk = book). This produces an 'l1' file which contains the book. (l1 is a lower case L [standing for 'Library'] followed by the symbol 'one').

    a bookshelf with several books on top

  2. catal: Catalogue the contents of a book, usually l1. If you are not using them (which is generally the case) make empty l2, l3 and catalp files, then run catal. This will produce six files: lib1, lib2, lib3 and cat1, cat2, cat3. These make up the library used by Delila in the next step.

    open scissors pointing to the right a open book

  3. delila: Extract fragments of sequences from a library of sequences and create a subset, a book. This is the core of the Delila system. You give Delila instructions (inst file) and those are used to create the subset desired. For a logo, generally one makes instructions that look like this:
     get from 5600 -200 to same +200 direction +;
    
    This means to get 200 bases before position 5600 to 200 bases afterwards, for a total of 401 bases. I recommend that you use a wide range like this to be able to see the background noise around the binding site. By the way, the number 5600 becomes the zero coordinate of the binding site. I try to pick a position that is strongly conserved (high information content). See the Introduction to Delila Instructions for more details.

    A cartoon of a pizza slice with steam coming off it.

  4. alist: Make an aligned listing of sequences using the Delila instructions and the book created by Delila. The pair of the book and inst file is a set called an 'aligned book'. I create a listing of the sequences to be sure that they are aligned correctly. If this works, making the logo is fast and a piece of cake (or slice pizza ;-) Even if you have a large range defined in the delila instructions, you can use a smaller one for the aligned listing. Note that running this program does not affect the inst or book files, so can't affect later steps. The program is controled by a file `alistp', which stands for alist-parameters. See the glossary definition of parameter file.

  5. encode: Convert the book/inst into 0 and 1's. This is historically the way we did it, but it is fast so we still do it this way. The output is encseq, which is a bit large. To save space, you can delete the encseq file after running the next program. Note that the range can be reduced in the parameter file, so I normally set this to -200 to 200.

  6. rseq: Compute Rsequence for each position in the aligned book, to make an rsdata file.

  7. dalvec: Convert the rsdata file into a symvec file, which the next program can use to make the logo.


    human donor splice site sequence logo
  8. makelogo: Finally! Make the sequence logo! There are lots of parameters you can adjust.

  9. rf: Once you have computed the information content of your binding site, you may want to compute Rfrequency. You will need to know how big the genome is (or the total number of binding positions availble to the recognizer) and the number of binding sites in that region.


Displaying the Logo

  1. PostScript: Give the logo file to a postscript printer or display device and you will have the logo. A good postscript display is ghostview.
  2. Convert to PDF. However, now-a-days I convert to PDF using my showpdf script (which depends on ps2pdf available by fink and my ish script). I then look at the PDF with Acrobat or Skim. The advantage of Skim is that it can be set to watch the logo file and update when that changes using atchange. This way one can edit the parameter file and see the logo change without moving one's fingers from the keyboard.


After the Logo: Individual Information

Once you have a sequence logo, it is only one step to create the individual information model and to create sequence walkers. All of these programs are part of the Individual Information theory software package, which (for better or worse) is protected by a patent. However it is not hard to obtain access.

  1. ri: this program computes the individual information for a set of binding sites.

  2. makewalker: this program displays sequence walkers using ghostscript. I usually use scan and lister (described below) now.

  3. scan: search a Delila book with a simple information-theory based weight matrix to find features.

  4. multiscan: search a Delila book several information-theory based weight matricies connected by variable gaps to find features. Examples are ribosome binding sites and sigma70 promoters.

  5. lister: this program displays sequence walkers on pages. image for fispromoterArt

color bar Small icon for Theory of Molecular Machines: physics,
chemistry, biology, molecular biology, evolutionary theory,
genetic engineering, sequence logos, information theory,
electrical engineering, thermodynamics, statistical
mechanics, hypersphere packing, gumball machines, Maxwell's
Daemon, limits of computers


Schneider Lab

origin: 1998 February 3
updated: 2020 Dec 30
color bar