By downloading this code you agree to the Source Code Use License.
{ version = 1.36; (* of embed.p 2012 Jun 23}

(* begin module describe.embed *)
(*
name
      embed: embed an aligned set of DNA sequences into random sequences

synopsis
      embed(inst: in, book: in, mkvseqs: in, ranbook: in, embedp: in,
            embedbk: out, output: out)

files
      inst:  delila instructions of the form 'get from 56 -5 to 56 +10;'

      book:  the book generated by delila using inst

      mkvseqs:  random sequence output from the markov program

      ranbook:  book made from random sequences using the makebk program.
         Either mkvseqs or ranbook must contain sequence.  If both contain
         sequence, then mkvseqs will be used as the source for random
         sequences.

      embedp:  parameters to control the program.  The file must contain
         the following parameters, one per line:

         parameterversion: The version number of the program.  This
            allows the user to be warned if an old parameter file is used.

         alignmenttype: The type of alignment to use.
            f: first base, i: inst, b: book
            The 'b' alignment is to be used when 'default coordinate zero;'
            is used in the inst file, resulting in a book whose coordinates
            do not match the inst coordinates.  'i' is to be used when the
            book contains a normal coordinate system corresponding to the
            inst file.  'f' simply aligns by the first base in the book.
            See alist.p for more details on alignmenttype.

         InFrom, InTo: the from-to range of the input sequences to be used.

         OutFrom, OutTo: the from-to range of the sequences to output.
            This includes the InFrom to InTo range AND the random
            sequences.

      embedbk:  book created by the program.  Contains the sequences
         embedded within random sequences to the specified range.

      output:  messages to the user

description
      Embed embeds a given set of aligned sequences into random sequences
      having a specified range.  If there is an incomplete sequence in the
      region to be embedded, it is filled in with random sequence as well.
      This allows one to destroy a pattern in the aligned sequences, so
      that the sequences can be realigned to find other patterns nearby.
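      The embedding rules on this page can be sketched in Python as
      follows.  This is a minimal illustration under stated assumptions,
      not the Pascal code of embed.p.  Each aligned sequence is modeled as
      a string whose first base sits at a known coordinate; alignmenttype
      and the book format are not modeled.  The random source is a list of
      strings standing in for mkvseqs or ranbook.  Positions missing from
      the original inside InFrom..InTo are assumed to be filled from the
      random stream.  The names RandomBaseStream, choose_source, embed_one
      and OutOfRandomSequence are hypothetical.

```python
# Illustrative sketch only (hypothetical names); not the Pascal source of embed.p.
from typing import Iterable, Iterator, List, Tuple


class OutOfRandomSequence(Exception):
    """Raised when the random source runs dry (embed halts in that case)."""


class RandomBaseStream:
    """Hands out random bases one at a time.

    A new string is read only when the current one is used up, so no
    random base is ever thrown away or reused.
    """

    def __init__(self, strings: Iterable[str]) -> None:
        self._strings: Iterator[str] = iter(strings)
        self._current = ""
        self._pos = 0

    def next_base(self) -> str:
        # Move on to the next string once the current one is exhausted.
        while self._pos >= len(self._current):
            try:
                self._current = next(self._strings)
            except StopIteration:
                raise OutOfRandomSequence("out of random sequence") from None
            self._pos = 0
        base = self._current[self._pos]
        self._pos += 1
        return base


def choose_source(mkvseqs: List[str], ranbook: List[str]) -> List[str]:
    """Use mkvseqs if it has sequence, otherwise fall back to ranbook."""
    if mkvseqs:
        return mkvseqs
    if ranbook:
        return ranbook
    raise ValueError("either mkvseqs or ranbook must contain sequence")


def embed_one(seq: str, first: int, in_from: int, in_to: int,
              out_from: int, out_to: int,
              rand: RandomBaseStream) -> Tuple[int, str]:
    """Embed one aligned sequence whose base seq[0] sits at coordinate first.

    Returns the first coordinate of the result and the embedded bases.
    """
    if not (out_from <= in_from <= in_to <= out_to):
        raise ValueError("need OutFrom <= InFrom <= InTo <= OutTo")
    last = first + len(seq) - 1
    # Assumption: the sequence must overlap or touch OutFrom..OutTo.
    if last < out_from - 1 or first > out_to + 1:
        raise ValueError("sequence does not reach OutFrom..OutTo")
    lo, hi = min(first, out_from), max(last, out_to)
    out = []
    for coord in range(lo, hi + 1):
        have = first <= coord <= last
        if in_from <= coord <= in_to:
            # Kept region: copy the original base; fill gaps at random.
            out.append(seq[coord - first] if have else rand.next_base())
        elif out_from <= coord <= out_to:
            # Flanks OutFrom..InFrom-1 and InTo+1..OutTo: always random.
            out.append(rand.next_base())
        else:
            # Outside OutFrom..OutTo: copy the original sequence as is.
            out.append(seq[coord - first])
    return lo, "".join(out)


# Usage: the 'zap' parameters (InFrom = InTo = OutFrom = -4, OutTo = 4).
rand = RandomBaseStream(choose_source(["123", "45678"], []))
print(embed_one("abcdefghijk", -5, -4, -4, -4, 4, rand))
# prints (-5, 'ab12345678k')
```

      RandomBaseStream keeps its position between calls.  Embedding
      several sequences in a row therefore consumes the random bases in
      order and never reuses them.  When the stream runs out, the sketch
      raises an exception.  This corresponds to embed halting with an
      out-of-random-sequence message.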
      The parameters OutFrom, InFrom, InTo and OutTo in embedp set the
      range for the embedding.  For the program to function correctly, the
      following must be true:

         OutFrom <= InFrom <= InTo <= OutTo

      The sequence from InFrom to InTo is not changed, and random sequence
      is filled in around it, from OutFrom on the left to OutTo on the
      right.  See the example below.  If the original sequence is longer
      than the range OutFrom to OutTo, then the book will contain the
      embedded sequence with original sequence on either side of the
      random sequence.

      The program stores the random sequence as a string and uses it base
      by base until the string is used up.  Then it reads another string of
      random sequence.  In this way, none of the random sequence is "thrown
      away".  If the program reaches the end of mkvseqs or ranbook before
      it has embedded all the sequences, it reports that it is out of
      random sequence and halts.

      Why doesn't the program reuse the random sequence?  Reuse is not a
      good idea because the embedded sequences are designed to be fed into
      malign, and malign would pick up on the reused sequence and find
      unnatural sequence conservation.

      Aligned sequences can be viewed with the alist program.

      The random sequences are generated by the markov program.  They can
      be read from either mkvseqs or ranbook.  mkvseqs is generated
      directly by markov to a given composition and length; ranbook can be
      made with the makebk program.  If both files are present, mkvseqs is
      used.

      The output of this program is designed to be fed into the malign
      program for multiple alignment.

examples
      With the following parameters in embedp, the sequence would be
      embedded as shown below.

         -10 10    InFrom, InTo: range of input sequences to be used
         -30 30    OutFrom, OutTo: range of the sequences to output

      original:
      -----|-------------------<---------0--------->-------------------|-----
          -30                 -10                 +10                 +30
        OutFrom             InFrom               InTo                OutTo

      embedded:
           ********************<---------0--------->********************
          -30     random      -10    original     +10     random      +30
                 sequence            sequence            sequence

      Note that if there is any sequence in the original alignment outside
      the range OutFrom to OutTo, it will be copied to embedbk.

      Randomizing a Single Patch

      Using embed, it is possible to cover only one small area with random
      sequence instead of two areas.  To do this, the embed parameters must
      be set in a particular way.  For example, to cover only the zero
      coordinate with random sequence, three of the parameters (OutFrom,
      InFrom and InTo) need to be the same:

         -1 -1     InFrom, InTo: range of input sequences to be used
         -1  0     OutFrom, OutTo: range of the sequences to output

      When these parameters are the same, InFrom and InTo override OutFrom
      and OutTo, so no random sequence is added on the left.  The example
      parameters above keep the sequences at the -1 coordinate the same
      but make the sequences at the 0 coordinate random.  In this case, all
      positions other than 0 are kept the same.

      Another example would be to 'zap' (randomize) from -3 to +4.  The
      parameters would be:

         -4 -4     InFrom, InTo: range of input sequences to be used
         -4  4     OutFrom, OutTo: range of the sequences to output

      These parameters leave the sequences up to and including the -4
      coordinate alone, but make the sequences from -3 to +4 random.  The
      sequences from +5 and higher are kept as well.

documentation

see also
      alist.p, markov.p, makebk.p, malign.p

author
      Elaine Bucheimer

bugs
      The program cannot handle sequences longer than dnamax.  This is a
      fixable bug.

      A possible future addition to the program would be to allow the user
      to specify whether the old sequence should be kept or whether the
      sequence should be chopped outside the OutFrom and OutTo coordinates.

      It appears that the 'i' option does not embed correctly: the
      resulting book does not have the advertised coordinates.  A temporary
      solution is to use the 'f' option with appropriate ranges.

technical notes
*)
(* end module describe.embed *)
{This manual page was created by makman 1.45}{created by htmlink 1.62}