A Tutorial on Delila Instructions

with examples

by Tom Schneider

Introduction to Delila Instructions

Terminology used is described in a glossary.

The concept of the Delila system is to extract fragments of sequence from a library (database) of sequences before beginning any analysis of the sequences. This has a number of advantages, including automating the analysis process, avoiding editing sequences (which will lead to mistakes!), the ability to permanently record the sequences used in a compact form (instructions) and therefore the ability to repeat an analysis. The extraction is done by a librarian program named Delila. One gives Delila instructions for what fragments to obtain and how to mutate them. The returned result given by the librarian is -- of course! -- a book.

An important feature of Delila is that the coordinate system of each sequence in the book corresponds to that in the parent library. This way you won't go crazy trying to figure out the locations of bases - all output has the same coordinate system. (The exception is if you make mutations, in which case coordinates get renumbered on the 'downstream' side.)


Making a Delila Library

If you already have a Delila Library (i.e. the 6 files lib1, lib2, lib3, cat1, cat2, cat3) then you can skip this section. If not, you need to create one.

The first step is to create a Delila book containing the genomic or artificial sequence you want to manipulate. There are a number of programs you can use to do this:

Next, you need to create the Delila library. In a Unix system:

cp book l1 # copy your book to the file l1
touch l2 l3 catalp # make empty files
catal # run the catal program

Delila will now run using the 6 library files and an instruction file.


Title

Since Delila produces a book, it is natural that the first instruction in a set of Delila instructions is the title to be given to the book:

title "An example book";

Note that delila will accept both single (') and double (") quotes.

You can have any title you like. I would, however, recommend this format:

title 'Fis sites version = 1.81 of fis.inst 2002 Apr 24';
This includes four important components:
  1. Fis sites: the name of the sites
  2. 1.81: the version number which can be used by the ver program. All Delila programs pass the title to the next program (though they may be in comments). So by changing the version every time you change anything in the file, you will always know exactly what is happening.
  3. fis.inst: the file name (note that the type is 'inst'),
  4. 2002 Apr 24: the date.
If you use this format, then you can save backup copies in the form fis.inst.1.81 by using the save script.
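
As a rough illustration of why a fixed title layout pays off, here is a small Python sketch that splits such a title into its four components and derives the backup file name. The functions parse_title and backup_name are hypothetical helpers written for this tutorial; they are not the ver program or the save script, whose internals are not described here.

```python
import re

# Hypothetical helper, not part of Delila: recognize a title written as
# "NAME version = N.NN of FILE DATE".
TITLE_PATTERN = re.compile(
    r"^(?P<name>.+?) version = (?P<version>\d+\.\d+) of (?P<file>\S+) (?P<date>.+)$"
)


def parse_title(title):
    """Split a title in the recommended layout into its four components."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        raise ValueError("title does not follow the recommended layout")
    return match.groupdict()


def backup_name(title):
    """Build a backup file name of the form FILE.VERSION from the title."""
    parts = parse_title(title)
    return parts["file"] + "." + parts["version"]


title = "Fis sites version = 1.81 of fis.inst 2002 Apr 24"
print(parse_title(title))
print(backup_name(title))  # fis.inst.1.81
```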


Specification

Next, the desired source sequence must be specified. Delila was built before GenBank existed and it assumes that the database is organized by organism and chromosome (as opposed to the current mess of entries). So one defines these:

organism H.sapiens;
chromosome H.sapiens;

Next one needs to choose the particular sequence of DNA, called a piece:

piece LINEAR;
where LINEAR would usually be the GenBank ACCESSION number.


Requests

Having specified the sequence we want, we now can make a series of requests to get particular parts of the sequence. Suppose that the wild-type sequence named LINEAR begins with the EcoRI site 5' gaattc 3', with bases numbered 1 to 180. Then to obtain the entire sequence we can say:

get all piece;

[Figure: the DNA sequence 5' gaattc 3' with the second t marked at position 5.]

To get the first 6 bases (containing just the EcoRI site) we say:

get from 1 to 6;
The lister program puts an asterisk ('*') every 5th base, and numbers every 10th base. (This way you won't go crazy counting bases - you never need to count more than 3 positions to identify a base.)

[Figure: the DNA sequence 5' aattc 3' with the second t marked at position 5.]

To get the second to sixth bases one can say:

get from 2 to 6;
which gives 5' aattc 3'.

[Figure: the DNA sequence 5' gaatt 3' with the first a marked at position 5.]

One can also get the complement:

get from 6 to 2 direction -;
which gives 5' gaatt 3'. Note that the asterisk in the figure is still over base 5. Delila retains the original coordinate system, which means that you can compare output from different extractions and the coordinates of the bases remain the same.

Here's a puzzler:

[Figure: the DNA sequence 5' aatt 3' with the second t marked at position 5.]

get from 2 to 5 direction +;

[Figure: the DNA sequence 5' aatt 3' with the first a marked.]

get from 5 to 2 direction -;
Why are these the same?

[Figure: the DNA sequence 5' aaagtcaactaactgaattc 3' with every 5th base marked with an asterisk, starting at 20 and decreasing in numbering. Positions 20 and 10 have numbers.]

A longer example is:

get from 20 to 1 direction -;
giving 5' aaagtcaactaactgaattc 3', which shows how the coordinate system decreases. (Note the EcoRI site at the 3' end.)

Having obtained the sequence(s) we want, Delila's job is over. Other programs are used to display and analyze the sequence. For these examples I used the Lister program for the figures. Lister gives the sequence, carefully labeled with 5' and 3' on the ends. Every 5th base is marked by an asterisk, and every 10th base is numbered. This way you will never need to count more than 3 bases to determine the coordinate of any base.
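
To make the coordinate rule concrete, here is a minimal Python sketch of what the requests above return. It is only a model of the behavior described in this section, not Delila's own code. The function names get and bases are made up for the illustration, and PIECE holds just the first 20 bases of LINEAR, read back from the examples above.

```python
# Assumed example data: the first 20 bases of the piece LINEAR,
# reconstructed from the outputs shown in the examples above.
PIECE = "gaattcagttagttgacttt"

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def get(piece, start, end, direction="+"):
    """Return [(coordinate, base), ...] in 5' to 3' order.

    Every base keeps the coordinate it had in the parent piece,
    whichever strand is requested.
    """
    if direction == "+":
        return [(i, piece[i - 1]) for i in range(start, end + 1)]
    # Minus strand: walk the coordinates downward and complement each base.
    return [(i, COMPLEMENT[piece[i - 1]]) for i in range(start, end - 1, -1)]


def bases(fragment):
    """Join the bases of a fragment into a plain string."""
    return "".join(base for _, base in fragment)


print(bases(get(PIECE, 1, 6)))        # gaattc
print(bases(get(PIECE, 2, 6)))        # aattc
print(get(PIECE, 6, 2, "-"))          # coordinates run 6, 5, 4, 3, 2
print(bases(get(PIECE, 20, 1, "-")))  # aaagtcaactaactgaattc
```

Because each base carries its parent coordinate along, the base at coordinate 5 can be found in any of these fragments without counting from the end.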


Relative Coordinate Requests

[Figure: the DNA sequence 5' gaatt 3' with the second t marked at position 5.]

A powerful way to get sequences is relative to a particular point:

get from 3 -2 to 3 +2;
which gets 2 bases before coordinate 3 to 2 bases after coordinate 3, that is from base 1 to base 5: 5' gaatt 3'. Generally one does not want to type the same coordinate twice, so one can use the command:

[Figure: the DNA sequence 5' gaatt 3' with the second t marked at position 5.]

get from 3 -2 to same +2;
where 'same' refers to the coordinate given after the word 'from'. This is the most convenient form for specifying binding site locations. For more examples, see: Making Delila Instructions for Symmetric Sites.
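
The arithmetic behind a relative request can be sketched in a few lines of Python. This is an illustration under a simplifying assumption, namely that coordinates are plain consecutive integers; relative_range is a made-up name, and 'same' is modeled by leaving the second anchor out.

```python
def relative_range(from_anchor, from_offset, to_offset, to_anchor=None):
    """Resolve 'from A +x to B +y' into absolute coordinates.

    Passing to_anchor=None models the word 'same': the coordinate
    given after 'from' is reused for the 'to' end.
    Assumes coordinates are plain consecutive integers.
    """
    if to_anchor is None:
        to_anchor = from_anchor
    return from_anchor + from_offset, to_anchor + to_offset


print(relative_range(3, -2, +2, to_anchor=3))  # get from 3 -2 to 3 +2    -> (1, 5)
print(relative_range(3, -2, +2))               # get from 3 -2 to same +2 -> (1, 5)
```

Both forms resolve to the same range, which is why 'same' saves typing without changing the result.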


Making Mutations

There are three ways to make changes.

[Figure: the DNA sequence 5' gaattc 3' changed to 5' taattc 3'.]

1. A CHANGE requires the previous base, the coordinate to change and then the new base:

get from 1 to 6 with g1t;

gives taattc. The base that changes from a G at 1 to a T is marked by the tail and head of an arrow. The figure is produced by first running Delila to extract the sequence(s) and to produce the marking information. This information is then used by Lister to create the PostScript.

How do I write my instructions if I want the complementary sequence?
Glad you asked. Coordinates of changes are always given on the original wild-type coordinate system. The rule is:

The coordinates given in the mutation and the sequences given refer always to the sequence written 5' to 3' in the *positive* coordinate direction.
The reason for doing things this way is that you would go absolutely crazy if you had to change the definition of the mutation merely if you wanted the complementary sequence!

[Figure: the DNA sequence 5' gaattc 3' changed to 5' gaatta 3'.]

For example, starting again from 5' gaattc 3':

get from 6 to 1 with g1t;
Delila makes the mutation and then complements the sequence to give 5' gaatta 3'. Note that the first sequence in the illustration is already complemented. You can see this because the asterisk ('*') marks the 5th base.
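
The order of operations can be checked by hand with a short Python sketch: apply the change on the positive strand first, then complement. This only models the rule stated above; change and reverse_complement are illustrative names, not Delila routines.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def change(seq, old, coord, new):
    """Apply a point change such as g1t to a sequence numbered from 1."""
    if seq[coord - 1] != old:
        raise ValueError("base at coordinate %d is not %s" % (coord, old))
    return seq[:coord - 1] + new + seq[coord:]


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


wild_type = "gaattc"
mutant = change(wild_type, "g", 1, "t")  # g1t on the positive strand
print(mutant)                            # taattc
print(reverse_complement(mutant))        # gaatta
```

The mutation is written once, against the positive strand, and works unchanged whichever strand is requested.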


[Figure: the DNA sequence 5' gaattc 3' changed to 5' gaccattc 3'.]

2. An INSERTION uses two coordinates and a sequence. The sequence BETWEEN the coordinates is removed and the given sequence is inserted.

get from 1 to 6 with i2,3cc;
gives gaccattc.


[Figure: the DNA sequence 5' gaattc 3' changed to 5' gccttc 3'.]

Changing that to:

get from 1 to 6 with i1,4cc;
does a replacement to give gccttc.


[Figure: the DNA sequence 5' gaattc 3' changed to 5' gttc 3'.]

Finally,

get from 1 to 6 with i1,4;
deletes to give gttc.


Note that any change can be made with an insertion; the other two forms are available for convenience.

[Figure: the DNA sequence 5' gaattc 3' changed to 5' gc 3'.]

3. A DELETION takes two coordinates. The sequence INCLUDING the coordinates is removed.

get from 1 to 6 with d2,5;
gives gc. Coordinates outside the end of the piece are allowed.


[Figure: the DNA sequence 5' gaattc 3' changed to 5' tccttc 3' in two steps.]

Combined changes are possible. Separate the changes with periods:

get from 1 to 6 with g1t.i1,4cc;
gives tccttc.
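
The three kinds of change can be modeled in Python to check the results above. This is a sketch under stated assumptions, not Delila's implementation: each base carries its wild-type coordinate, inserted bases carry None, and the renumbering that Delila does downstream of a mutation is not modeled. All function names are made up for the illustration.

```python
def number(seq, first=1):
    """Attach wild-type coordinates to a sequence: [(coordinate, base), ...]."""
    return [(first + i, base) for i, base in enumerate(seq)]


def change(frag, old, coord, new):
    """CHANGE, e.g. g1t: replace the base at coord, checking the old base."""
    out = []
    for c, base in frag:
        if c == coord:
            if base != old:
                raise ValueError("base at %d is %s, not %s" % (coord, base, old))
            base = new
        out.append((c, base))
    return out


def insertion(frag, left, right, new=""):
    """INSERTION, e.g. i1,4cc: drop bases strictly BETWEEN left and right,
    then place the new bases just after left. New bases get no coordinate."""
    out = []
    for c, base in frag:
        if c is not None and left < c < right:
            continue
        out.append((c, base))
        if c == left:
            out.extend((None, n) for n in new)
    return out


def deletion(frag, first, last):
    """DELETION, e.g. d2,5: drop the bases from first to last INCLUSIVE."""
    return [(c, b) for c, b in frag if c is None or not (first <= c <= last)]


def bases(frag):
    return "".join(base for _, base in frag)


wt = number("gaattc")
print(bases(change(wt, "g", 1, "t")))      # taattc
print(bases(insertion(wt, 2, 3, "cc")))    # gaccattc
print(bases(insertion(wt, 1, 4, "cc")))    # gccttc
print(bases(insertion(wt, 1, 4)))          # gttc
print(bases(deletion(wt, 2, 5)))           # gc
print(bases(insertion(change(wt, "g", 1, "t"), 1, 4, "cc")))  # tccttc
```

Since both steps of the combined change look bases up by their wild-type coordinate, the second step still finds coordinates 1 and 4 after the first step has been applied.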


Mutation Analysis: example

title "ABCR mutation";
organism H.sapiens;
chromosome H.sapiens;
set doubling on;
piece Y15651;
name "mutation at exon 17 acceptor";
get from 63 -25 to same +7 with g64a;

Two new commands are introduced here:

set doubling on;
which tells Delila to give both the original sequence and the sequence with the mutation and
name "mutation at exon 17 acceptor";
which tells Delila to name the new sequence. The result, when displayed by the lister program, is:
[Figure: sequence from GenBank Locus Y15651, the ABCR gene, and a mutated sequence. The top one has the tail of an arrow pointing at position 64, a g which is the middle base of the codon gga, coding for G (glycine). The bottom sequence shows this base changed to an a, gaa now coding for E (glutamate). Below the top sequence are two sequence walkers for human splice acceptor sites of 11.6 bits (exactly at the end of the exon) and 3.9 bits (3 bases to the left end of the exon). After the mutation the first walker becomes 10.7 bits and the second one becomes 5.6 bits.]
Note how the mutation affects both walkers simultaneously. (See ABCR Mutation G863A for more information about this curious mutation.)


Controlling Lister

In all of the examples above, the book was given to the lister program, which generated PostScript output. Lister has a special mode for displaying sequences along with their mutations: the 'pagetrigger' parameter is set to 'd'. To use this feature, create a mutation instruction using 'with' and be sure to 'set doubling on' before that point in the instructions. In the book Delila will put the original sequence along with the mutation sequence. Delila will also create a 'marksdelila' file which contains information about how to mark the mutation. Append the 'marksdelila' file to the end of the arrow definition file (marks.arrow) and run lister:

delila
cat marks.arrow marksdelila > marks
lister
The resulting 'map' file is in PostScript and can be sent to a printer, displayed on your screen or converted to PDF.


Comments

The Delila language provides two ways to create comments in the instruction files. Both are 'Pascal-like' since the same form is used in the computer language Pascal:

(* Two character comments *)
and
{ One character comments }
Material inside comments is ignored by Delila. Comments of one type can be nested inside the other type. I commonly make my comments using (* and *) and then use the braces { and } to block off instructions I don't want temporarily.
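
Here is a small Python sketch of how such comments could be stripped. It rests on an assumption I should state plainly: a comment ends at the first closing delimiter of its own kind, and delimiters of the other kind inside it are ignored. That matches the nesting described above, but it is a model and not Delila's parser. It also does not treat quoted titles specially, and strip_comments is a made-up name.

```python
def strip_comments(text):
    """Remove (* ... *) and { ... } comments from instruction text.

    Assumption: a comment ends at the first closing delimiter of its own
    kind, so one kind of comment can enclose the other.
    """
    out = []
    closer = None  # closing delimiter of the comment we are inside, if any
    i = 0
    while i < len(text):
        if closer is None:
            if text.startswith("(*", i):
                closer = "*)"
                i += 2
            elif text[i] == "{":
                closer = "}"
                i += 1
            else:
                out.append(text[i])
                i += 1
        elif text.startswith(closer, i):
            i += len(closer)
            closer = None
        else:
            i += 1  # skip everything inside the comment
    return "".join(out)


instructions = (
    "piece LINEAR; (* the example piece *)\n"
    "{ get from 1 to 6; (* blocked off for now *) }\n"
    "get from 2 to 6;"
)
print(strip_comments(instructions))
```

Only the piece line and the last request survive; the blocked-off request disappears together with the comment inside it.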

I strongly recommend putting the date and the file name in the title, and at least a short description of what the instruction set is about in a comment. It is also useful to add citations for evidence that the sequence is a binding site, and to mention the kind of data that supports this (e.g. footprinting, gel shift assay, mutations).


Making Delila Instructions for Symmetric Sites

Binding sites can have three kinds of symmetry, as discussed in the glossary entry on binding site symmetry. The corresponding Delila instructions are of increasing difficulty:

Note: the ranges given above are only examples. We generally take a very large range such as -200 to +200 for our initial analysis to get a feeling for the background noise of the information curve.


Setting Parameters

Delila has a number of parameters that have preset values which you can change. You can use the word 'default' or 'set' to change them.


Full Definition of Delila

If you would like to know more about the Delila language, then you can look at the LIBrary DEFinition, LIBDEF.


Automatic Generation of Delila Instructions

The delila system has a number of ways to automatically generate delila instructions:


Original References:



Schneider Lab

origin: 1999 May 2
updated: version = 2.05 of delilainstructions.html 2009 Jan 27