Chapter 5: Problem 7
Entropy can be viewed as a measure of the lack of information you have about a system. Claude Shannon [112] realized, back in the 1940 's, that communication over telephone wires amounts to reducing the listener's uncertainty about the sender's message, and introduced a definition of an information entropy. Most natural languages (voice, written English) are highly redundant; the number of intelligible fifty-letter sentences is many fewer than \(26^{50}\), and the number of ten-second phone conversations is far smaller than the number of sound signals that could be generated with frequencies between up to \(20,000 \mathrm{~Hz} .^{44}\) Shannon, knowing statistical mechanics, defined the entropy of an ensemble of messages: if there are \(N\) possible messages that can be sent in one package, and message \(m\) is being transmitted with probability \(p_{m}\), then Shannon's entropy is $$ S_{I}=-k_{S} \sum_{1}^{N} p_{m} \log p_{m} $$ where instead of Boltzmann's constant, Shannon picked \(k_{S}=1 / \log 2\) This immediately suggests a theory for signal compression. If you can recode the alphabet so that common letters and common sequences of letters are abbreviated, while infrequent combinations are spelled out in lengthly fashion, you can dramatically reduce the channel capacity needed to send the data. (This is lossless compression, like zip and gz and gif). An obscure language A'bç! for long-distance communication has only three sounds: a hoot represented by A, a slap represented by \(\mathrm{B}\), and a click represented by C. In a typical message, hoots and slaps occur equally often \((p=1 / 4)\), but clicks are twice as common \((p=1 / 2)\). Assume the messages are otherwise random. (a) What is the Shannon entropy in this language? More specifically, what is the Shannon entropy rate (entropy per sound, or letter, transmitted)? (b) Show that a communication channel transmitting bits (ones and zeros) can transmit no more than one unit of Shannon entropy per bit. (Hint: this should follow by showing that, for \(N=2^{n}\) messages, equation \(5.38\) is maximized by \(p_{m}=1 / N\). You needn't prove it's a global maximum: check that it is a local extremum. You'll need either a Lagrange multiplier or will need to explicitly set \(\left.p_{N}=1-\sum_{m=1}^{N-1} p_{m .}\right)\) (c) In general, argue that the Shannon entropy gives the minimum number of bits needed to transmit the ensemble of messages. (Hint: compare the Shannon entropy of the \(N\) original messages with the Shannon entropy of the \(N\) (shorter) encoded messages.) Calculate the minimum number of bits per letter on average needed to transmit messages for the particular case of an A'bç! communication channel. (d) Find a compression scheme (a rule that converts \(a\) \(A\) 'bç! message to zeros and ones, that can be inverted to give back the original message) that is optimal, in the sense that it saturates the bound you derived in part (b). (Hint: Look for a scheme for encoding the message that compresses one letter at a time. Not all letters need to compress to the same number of bits.) Shannon also developed a measure of the channel capacity of a noisy wire, and discussed error correction codes...
Short Answer
Step by step solution
Key Concepts
These are the key concepts you need to understand to accurately answer the question.