By downloading this code you agree to the Source Code Use License (PDF).
{version = 1.96; (* of siva.p 1999 Dec 13}

(* begin module describe.siva *)
(*
name
      siva: site information variance

synopsis
      siva(sorted: in, sivap: in, incu: out, curves: out, list: out,
           output: out)

files
      sorted: the output of the sites program that contains a sorted list
         of sites for each experiment performed.

      sivap: parameters to control the program.
         first line: two integers, the from and to coordinates over which
            to do the calculations.
         second line: repeats, the number of times to take passes through
            the data removing subsets.  This improves the statistics.

      incu: the xyin input to xyplo, output of this program.  Two columns:
         first column is the number of sites used to find the information
         second column is the amount of information in bits
         The curves loop around along the axis, so they remain connected.

      curves: another xyin file, for graphing the wiggling information
         curves.
         first column is the position across the site
         second column is the information
         The curves loop around along the axis, so they remain connected.

      list: statistical picture of the result.  Three columns:
         first column is the number of sites used to find the information
         second column is the average amount of information (corresponds
            to the second column of incu, but is the average)
         third column is the variance of the information (corresponds to
            what your eye picks out as the thickness of the incu curves)

      output: messages to the user

description
      Siva calculates the variance of the information in a set of
      randomized sites by eliminating each site in turn and keeping track
      of the increase in the information content.  The information content
      tends to increase as sites are removed, since with fewer samples less
      variation is seen (this is the small sample bias effect).  The
      program allows one to graph the information content as sites are
      removed (incu).  When this is done repeatedly, with different orders
      of removing the sites, a thick band of curves is created.
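The removal procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Pascal code of siva.p. It assumes the following:

* Sites are equal-length strings over A, C, G and T.
* Each column contributes 2 - H(l) bits with no small-sample correction, where H(l) is the Shannon uncertainty of the column.
* first and last are hypothetical 0-based inclusive column indices standing in for the from and to coordinates in sivap.
* information and removal_curve are illustrative names.

The returned pairs play the role of the two columns of incu.

```python
import math
from collections import Counter

# Hypothetical sketch, not siva.p: sites are equal-length strings over ACGT.
BASES = "ACGT"


def information(sites, first=0, last=None):
    """Information (bits) of aligned sites, summed over columns first..last.

    first and last are hypothetical 0-based, inclusive column indices that
    stand in for sivap's from and to coordinates.  Each column contributes
    2 - H(l) bits with no small-sample correction (an assumption).
    """
    if last is None:
        last = len(sites[0]) - 1
    n = len(sites)
    total = 0.0
    for pos in range(first, last + 1):
        counts = Counter(site[pos] for site in sites)
        h = 0.0  # Shannon uncertainty H(l) of this column, in bits
        for base in BASES:
            f = counts[base] / n
            if f > 0:
                h -= f * math.log2(f)
        total += 2.0 - h
    return total


def removal_curve(sites, order, first=0, last=None):
    """Remove sites one at a time following `order` (a permutation of site
    indices) and return (number of sites, information) pairs, in the spirit
    of incu's two columns.  The curve starts with all sites and stops when
    one site remains.
    """
    removed = set()
    curve = [(len(sites), information(sites, first, last))]
    for index in order[:-1]:  # never remove the final site
        removed.add(index)
        remaining = [s for i, s in enumerate(sites) if i not in removed]
        curve.append((len(remaining), information(remaining, first, last)))
    return curve
```

A random removal order for one curve can come from `random.sample(range(len(sites)), len(sites))`.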
      The thickest part of this band shows the greatest possible amount of
      variation that could be in the total set of sequences.  To be
      even-handed, the program removes the first sequence, then randomly
      removes the others.  This creates the first curve.  Then the program
      removes the second sequence and randomly removes the others for the
      second curve.  If there are n sequences, then n removal curves are
      generated.  This is one complete repeat of the process.  To get
      better statistics, you can do this a number of times by using the
      repeats parameter in sivap.

      The largest variation in the information content is surely greater
      than the variation of the information content in all the sets of
      site removals.  When there are several experiments, their statistics
      are joined into one set.  With several experiments, the variation of
      the combined experiments would surely be less than the variations
      found for the individual experiments.  So if one experiment gives a
      greater variation, that increases the variation siva reports in
      list, and the highest value in list is therefore an upper limit on
      the variation.

documentation
      @article{Schneider1989,
      author = "T. D. Schneider and G. D. Stormo",
      title = "Excess Information at Bacteriophage {T7} Genomic Promoters
               Detected by a Random Cloning Technique",
      year = "1989",
      journal = "Nucl. Acids Res.",
      volume = "17",
      pages = "659-674"}

see also
      sites.p

author
      Thomas Dana Schneider

bugs
      none known
*)
(* end module describe.siva *)

{This manual page was created by makman 1.45}{created by htmlink 1.62}
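The even-handed removal scheme, the repeats, and the joining of several experiments into the list statistics can be sketched as follows. This is a hedged Python illustration, not the Pascal code of siva.p. It makes these assumptions:

* The function names are hypothetical.
* Information is computed as 2 - H(l) bits per column without small-sample correction.
* The whole width of each site is used rather than the from and to range.
* Values are pooled across experiments by number of sites.
* The variance divides by the number of values (population variance).

Any of these may differ from what siva actually does.

```python
import math
import random
from collections import Counter, defaultdict


def information(sites):
    """Information (bits) of aligned, equal-length sites, summed over all
    columns as 2 - H(l) with no small-sample correction (an assumption)."""
    n = len(sites)
    total = 0.0
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total


def siva_statistics(experiments, repeats=1, seed=None):
    """Hypothetical sketch of the removal scheme described for siva.

    For each experiment (a list of sites), each repeat, and each site i:
    remove site i first, then the other sites in random order, recording
    the information at every number of sites.  Values from all experiments
    are pooled by number of sites.  Returns rows of
    (number of sites, mean information, variance), like the list file.
    """
    rng = random.Random(seed)
    values = defaultdict(list)  # number of sites -> information values
    for sites in experiments:
        n = len(sites)
        for _ in range(repeats):
            for first in range(n):
                others = [i for i in range(n) if i != first]
                rng.shuffle(others)
                order = [first] + others
                # After k removals the remaining sites are order[k:].
                for k in range(n):
                    remaining = [sites[i] for i in order[k:]]
                    values[len(remaining)].append(information(remaining))
    rows = []
    for count in sorted(values, reverse=True):
        xs = values[count]
        mean = sum(xs) / len(xs)
        # Population variance (divide by the number of values): an assumption.
        variance = sum((x - mean) ** 2 for x in xs) / len(xs)
        rows.append((count, mean, variance))
    return rows
```

For example, `max(v for _, _, v in siva_statistics([sites], repeats=10))` picks out the largest variance in the rows. The description treats the highest value in list as an upper limit on the variation.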