Conversation 2: Information, Uncertainty and Secret Codes

1999 Jul 17
If it is not too much trouble, and if you get the chance over the course of the day, could you possibly provide short answers to the following questions?

1. Does a random file contain information?

Without more information (in the general sense of the word) about the situation, one can't say.

This raises the interesting issue of secret codes. If I encode a message with PGP and send it through your computer, it may well look like a random stream. But to me and my receiver, it is interpretable.

2. Does file size have an effect on information content?

Yes, as an upper bound: the larger the file, the more information (in the Shannon sense) it can hold. But the information may not be stored efficiently. You can use a compression program to get an idea of how efficiently the information is stored. If the file compresses a lot, then it was not originally stored efficiently. Of course, if you compress the file, it looks "random", but (excluding lossy methods such as JPEG) it can be reconstituted exactly.

For example, I could have a 1 kb file containing text. It might compress to 0.5 kb. On the other hand, a 1 kb file containing a PGP message (perhaps compressed beforehand - I don't know if this is part of PGP) might not compress at all.
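The contrast above is easy to check yourself. Here is a minimal sketch using Python's standard zlib module, with random bytes standing in for a PGP-style ciphertext (the exact sizes are illustrative, not from the original exchange):

```python
import os
import zlib

# 1 kb of redundant English-like text.
text = (b"the quick brown fox jumps over the lazy dog. " * 23)[:1024]

# 1 kb of random bytes, standing in for an encrypted (PGP-like) stream.
noise = os.urandom(1024)

# Redundant text shrinks a great deal; the random stream does not
# (it may even grow slightly, from the compressor's own overhead).
print(len(zlib.compress(text)))   # far smaller than 1024
print(len(zlib.compress(noise)))  # roughly 1024 or a bit more
```

Either way, a 1024-byte file can never carry more than 8 x 1024 = 8192 bits, which is the upper bound mentioned below.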

So file size gives an upper bound on the information content, in Shannon's sense, that could be stored there.

3. Does a random file contain more information than a non-random file of the same size?

Again, you need to specify where the 'random' file came from and where it is going. If it is the output of a natural chaotic process, and you can't repeat that process, then it would contain a lot of 'data', but upon receipt as a message the receiver's uncertainty would not drop, so it would contain little information.

If the file contains a long string of 'aaaa...' then it is not random and contains little information. H_before and H_after are both close to zero, so the information (their difference) is near zero.
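The 'aaaa...' case can be made concrete with a short entropy calculation. This is a sketch of the standard formula H = -sum p_i log2 p_i applied per byte, not code from the original exchange:

```python
import collections
import math

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol: H = -sum p_i * log2(p_i)."""
    counts = collections.Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# One repeated symbol: no uncertainty at all.
print(shannon_entropy(b"a" * 1000))        # 0.0 bits per symbol

# All 256 byte values equally likely: maximum uncertainty.
print(shannon_entropy(bytes(range(256))))  # 8.0 bits per symbol
```

A file of 'aaaa...' sits at the zero end of this scale, so even a large such file carries almost no information.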

These are in relation to source. I guess the essence of the question is, if I transmit a random file, have I transmitted information? And, if so, does the size of that file play in to the calculation of that amount of information?

I'm not sure that it makes sense to restrict things to the source; you have to know what's going on at the receiver too. If the receiver knows how to decode the message, it could have high information content for the receiver. If the receiver can't, the information content would be low.
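A toy illustration of this receiver-dependence: the same byte stream is noise to one party and a message to another, depending on whether they hold the key. This is a deliberately insecure XOR sketch, not PGP; the key name is made up for the example:

```python
import itertools

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy 'cipher': XOR each byte with a repeating key. Illustration only."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

message = b"meet me at noon"
key = b"hypothetical-key"  # shared secret; name invented for this sketch

ciphertext = xor_stream(message, key)
# To a third party, ciphertext is just an uninterpretable byte stream.
# The intended receiver, holding the key, recovers the message exactly:
recovered = xor_stream(ciphertext, key)
print(recovered)  # b'meet me at noon'
```

The bytes on the wire are identical in both cases; only the receiver's knowledge changes, which is why the information content cannot be defined from the source alone.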

I am honestly trying to gain a deeper understanding of information theory. With no instructor to validate my interpretations of the paper (and my math skills being a little weak), I have only my intuitive understanding of the concepts to go by.

Practical examples help a lot. Take a look at sequence logos.
Thank you again for all your help!

You're welcome!

Jeff Abramsohn
jabramsohn@jhancock.com
(with permission to put on the web)

Tom

Dr. Thomas D. Schneider
permanent email: toms@alum.mit.edu
https://alum.mit.edu/www/toms/



Schneider Lab

origin: 1999 July 17
updated: 1999 July 17