By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 2.29; (* of calhnb.p 2005 Jul 16}

(* begin module describe.calhnb *)
(*
name
   calhnb: small-sample correction for information and uncertainty

synopsis
   calhnb(fin: in, fout: out, output: out)

files
   fin: the genomic composition (integers) on one line, followed by
      a set of integers, one per line, each giving a value of n

   fout: a table showing n, e(hnb), ae(hnb) and their difference.
      The variances var(hnb) and avar(hnb) are tabulated along with
      the difference between their square roots, that is, the
      difference between the standard deviations.
      e(n) is found from the genomic uncertainty minus e(hnb).
      Finally, sd(n) = sqrt(var(hnb)) is given.

   output: messages to the user.

describe
   Given a genomic composition and a series of integers (n) that
   represent the number of sample sites, calhnb calculates the sampling
   error as e(hnb) and the variance var(hnb).  It also finds the
   approximations ae(hnb) and avar(hnb).  These values are presented in
   a table along with the differences between the exact and approximate
   calculations.  This table allows a user to decide when to use the
   approximations.  Beware that the exact calculation becomes very
   expensive for large n.  For this reason, I use the approximate
   computation for n > 20 in rseq and alpro.

examples
   When used as fin, the calhnb.fin file should generate the contents
   of the calhnb.fout file in fout.  The data should be identical to
   those given in Figure A.2 on page 428 of the Appendix of
   Schneider et al. 1986.

documentation
   "Information content of binding sites on nucleotide sequences"
   T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht
   JMB 188:415-431 (1986) [see link below]

see also
   Example input file, fin: calhnb.fin
   Corresponding output file, fout: calhnb.fout
   fin file for values up to n = 50: calhnb.50.fin
   fout file for values up to n = 50: calhnb.50.fout

   Discussion about correcting for small sample size:
   https://alum.mit.edu/www/toms/small.sample.correction.html

   Schneider et al. (1986):
   https://alum.mit.edu/www/toms/paper/schneider1986

   related programs: rseq.p, alpro.p

author
   Thomas D. Schneider

bugs
   It would be nice to have a generalized algorithm for any number of
   symbols.
*)
(* end module describe.calhnb *)
{This manual page was created by makman 1.45}{created by htmlink 1.62}
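
The calculation described above can be illustrated with a short sketch. It is written in Python for brevity and is not taken from calhnb.p. It assumes the exact e(hnb) is the multinomial-weighted average of the uncertainty over every way n sites can be split among four bases, and that the approximate correction is e(n) = (s - 1)/(2 ln(2) n) with s = 4. Both formulas are recalled from Schneider et al. (1986), not stated on this page. The function names and the parse_fin reading of the fin layout are hypothetical.

```python
import math

def parse_fin(text):
    # Hypothetical reader for the layout described under "files":
    # composition integers on the first line, then one n per line.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    composition = [int(tok) for tok in lines[0].split()]
    ns = [int(ln.split()[0]) for ln in lines[1:]]
    return composition, ns

def genomic_uncertainty(composition):
    # Hg = -sum(p * log2(p)) over the base frequencies, in bits.
    total = sum(composition)
    return -sum((c / total) * math.log2(c / total)
                for c in composition if c > 0)

def exact_hnb_moments(composition, n):
    # Assumed exact method: enumerate all counts (na, nc, ng, nt) that
    # sum to n, weight each by its multinomial probability, and
    # accumulate the mean and mean square of the sample uncertainty.
    total = sum(composition)
    p = [c / total for c in composition]
    mean = 0.0
    mean_sq = 0.0
    for na in range(n + 1):
        for nc in range(n - na + 1):
            for ng in range(n - na - nc + 1):
                counts = (na, nc, ng, n - na - nc - ng)
                coef = math.factorial(n)
                for k in counts:
                    coef //= math.factorial(k)
                prob = float(coef)
                for p_i, k in zip(p, counts):
                    prob *= p_i ** k
                h = -sum((k / n) * math.log2(k / n)
                         for k in counts if k > 0)
                mean += prob * h
                mean_sq += prob * h * h
    return mean, mean_sq - mean * mean

def approx_e_n(n, symbols=4):
    # Assumed first-order approximation of the correction e(n).
    return (symbols - 1) / (2.0 * math.log(2.0) * n)

if __name__ == "__main__":
    # Illustrative input only: an equiprobable genome, n = 1..5.
    composition, ns = parse_fin("1 1 1 1\n1\n2\n3\n4\n5\n")
    hg = genomic_uncertainty(composition)
    for n in ns:
        e_hnb, var_hnb = exact_hnb_moments(composition, n)
        ae_hnb = hg - approx_e_n(n)
        print(n, e_hnb, ae_hnb, e_hnb - ae_hnb, hg - e_hnb,
              math.sqrt(max(var_hnb, 0.0)))
```

The triple loop visits on the order of n^3 count combinations, which is consistent with the warning that the exact calculation becomes expensive for large n. The sketch leaves out avar(hnb), because its formula is not given on this page.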