Problem 7

Entropy can be viewed as a measure of the lack of information you have about a system. Claude Shannon [112] realized, back in the 1940s, that communication over telephone wires amounts to reducing the listener's uncertainty about the sender's message, and introduced a definition of information entropy. Most natural languages (voice, written English) are highly redundant; the number of intelligible fifty-letter sentences is far fewer than \(26^{50}\), and the number of ten-second phone conversations is far smaller than the number of sound signals that could be generated with frequencies up to \(20{,}000\,\mathrm{Hz}\).\({}^{44}\) Shannon, knowing statistical mechanics, defined the entropy of an ensemble of messages: if there are \(N\) possible messages that can be sent in one package, and message \(m\) is being transmitted with probability \(p_m\), then Shannon's entropy is $$ S_I = -k_S \sum_{m=1}^{N} p_m \log p_m \tag{5.38} $$ where, instead of Boltzmann's constant, Shannon picked \(k_S = 1/\log 2\).

This immediately suggests a theory for signal compression. If you can recode the alphabet so that common letters and common sequences of letters are abbreviated, while infrequent combinations are spelled out in lengthy fashion, you can dramatically reduce the channel capacity needed to send the data. (This is lossless compression, like zip and gz and gif.)

An obscure language A'bç! for long-distance communication has only three sounds: a hoot represented by A, a slap represented by B, and a click represented by C. In a typical message, hoots and slaps occur equally often \((p = 1/4)\), but clicks are twice as common \((p = 1/2)\). Assume the messages are otherwise random.

(a) What is the Shannon entropy in this language? More specifically, what is the Shannon entropy rate (entropy per sound, or letter, transmitted)?

(b) Show that a communication channel transmitting bits (ones and zeros) can transmit no more than one unit of Shannon entropy per bit. (Hint: this should follow by showing that, for \(N = 2^n\) messages, equation 5.38 is maximized by \(p_m = 1/N\). You needn't prove it's a global maximum: check that it is a local extremum. You'll need either a Lagrange multiplier or will need to explicitly set \(p_N = 1 - \sum_{m=1}^{N-1} p_m\).)

(c) In general, argue that the Shannon entropy gives the minimum number of bits needed to transmit the ensemble of messages. (Hint: compare the Shannon entropy of the \(N\) original messages with the Shannon entropy of the \(N\) (shorter) encoded messages.) Calculate the minimum number of bits per letter on average needed to transmit messages for the particular case of an A'bç! communication channel.

(d) Find a compression scheme (a rule that converts an A'bç! message to zeros and ones, that can be inverted to give back the original message) that is optimal, in the sense that it saturates the bound you derived in part (b). (Hint: look for a scheme for encoding the message that compresses one letter at a time. Not all letters need to compress to the same number of bits.)

Shannon also developed a measure of the channel capacity of a noisy wire, and discussed error correction codes...

Short Answer

(a) The Shannon entropy rate is 1.5 bits per sound. (b) A channel transmitting bits carries at most one unit of Shannon entropy per bit. (c) The minimum is therefore 1.5 bits per letter on average. (d) Encoding A as '10', B as '11', and C as '0' achieves this optimum.

Step by step solution

01

Determine Probabilities

For the language A'bç!, there are three sounds: A (hoot), B (slap), and C (click). Since A and B occur equally often while C is twice as common, the probabilities are \( p_A = p_B = 1/4 \) and \( p_C = 1/2 \).
02

Shannon Entropy Formula

The Shannon entropy \( S_I \) is \[ S_I = -k_S \sum_{m=1}^N p_m \log p_m = -\sum_{m=1}^N p_m \log_2 p_m, \] since the choice \( k_S = 1/\log 2 \) converts natural logarithms into base-2 logarithms. We substitute the probabilities \( p_A = 1/4 \), \( p_B = 1/4 \), and \( p_C = 1/2 \).
03

Calculate Shannon Entropy for A'bç!

Calculating each term: \[ S_I = -\left( \tfrac{1}{4} \log_2 \tfrac{1}{4} + \tfrac{1}{4} \log_2 \tfrac{1}{4} + \tfrac{1}{2} \log_2 \tfrac{1}{2} \right) = -\left( \tfrac{1}{4}(-2) + \tfrac{1}{4}(-2) + \tfrac{1}{2}(-1) \right) = 1.5 \text{ bits}. \] Thus the Shannon entropy rate for this language is 1.5 bits per sound (letter) transmitted.
04

Maximum Entropy for Binary Channel

A channel sending \( n \) bits can carry \( N = 2^n \) distinct messages. Extremize the entropy subject to the normalization constraint \( \sum_m p_m = 1 \), either with a Lagrange multiplier or by explicitly setting \( p_N = 1 - \sum_{m=1}^{N-1} p_m \); the extremum is at equal probabilities \( p_m = 1/N \), giving \( S_I = \log_2 N = n \) bits. The channel therefore transmits at most one unit of Shannon entropy per bit. For a single bit, \( p_0 = p_1 = 1/2 \) and \( S_I = -2 \left( \tfrac{1}{2} \log_2 \tfrac{1}{2} \right) = 1 \text{ bit} \).
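For completeness, a minimal sketch of the Lagrange-multiplier extremization (\(\lambda\) enforces the normalization \(\sum_m p_m = 1\)): \[ \mathcal{L} = -k_S \sum_{m=1}^{N} p_m \log p_m - \lambda \Big( \sum_{m=1}^{N} p_m - 1 \Big), \qquad \frac{\partial \mathcal{L}}{\partial p_m} = -k_S \left( \log p_m + 1 \right) - \lambda = 0. \] The stationarity condition is the same for every \(m\), so all the \(p_m\) are equal, and normalization gives \(p_m = 1/N\). Then \(S_I = k_S \log N = \log_2 N = n\) bits for \(N = 2^n\) messages carried by \(n\) bits: at most one unit of Shannon entropy per bit, as claimed.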
05

Compare Entropy of Encoded Messages

The Shannon entropy is the minimum average number of bits needed to transmit the ensemble: an invertible encoding preserves the probabilities \( p_m \) and hence the entropy of the ensemble, while part (b) shows each transmitted bit carries at most one unit of entropy, so the average encoded length cannot fall below \( S_I \). The average length of any letter-by-letter scheme is \[ \text{average bits/letter} = \sum_m p_m \times (\text{codeword length of letter } m). \] Given the entropy computed in step 3, no encoding of A'bç! can use fewer than 1.5 bits per letter on average.
06

Optimal Compression Scheme

Translate the sounds into binary so as to match the entropy rate: give the most frequent sound (C) the shortest codeword and the rarer sounds (A, B) longer ones. Encode A as '10', B as '11', and C as '0'. No codeword is a prefix of another, so the bit stream decodes unambiguously back to the original message, and the average length is \( \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{2} \cdot 1 = 1.5 \) bits per letter, saturating the bound from part (b).
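As a concrete check, here is a minimal sketch of this encoder and decoder in Python (the helper names and test message are our own, not from the text):

```python
# Sketch: the prefix code A -> '10', B -> '11', C -> '0' is invertible
# and averages 1.5 bits per letter, saturating the Shannon bound.
CODE = {"A": "10", "B": "11", "C": "0"}
PROB = {"A": 0.25, "B": 0.25, "C": 0.5}

def encode(message):
    """Concatenate the codeword for each letter."""
    return "".join(CODE[letter] for letter in message)

def decode(bits):
    """Greedy decoding works because no codeword is a prefix of another."""
    inverse = {codeword: letter for letter, codeword in CODE.items()}
    letters, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # a complete codeword has been read
            letters.append(inverse[buffer])
            buffer = ""
    assert buffer == "", "trailing bits did not form a codeword"
    return "".join(letters)

message = "CACBCC"
assert decode(encode(message)) == message            # the code is invertible
print(sum(PROB[s] * len(CODE[s]) for s in CODE))     # 1.5 bits per letter
```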


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Information Theory
Information theory is a field of study that focuses on quantifying, storing, and communicating information. It was developed by Claude Shannon in the 1940s to address problems related to communication over channels like telephone wires. Shannon introduced the concept of entropy as a way to measure the uncertainty or surprise associated with a set of messages. In information theory:
  • Entropy quantifies the amount of information or uncertainty in a message.
  • Higher entropy means more uncertainty or disorder.
  • It helps in reducing redundancy and optimizing data transmission.
For example, in a language with redundant symbols, understanding entropy allows us to streamline communication by reducing unnecessary repetition. This is the foundation of modern communication systems and data compression techniques.
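As a quick illustration (a minimal sketch; the function name is ours, not from the text), Shannon's formula can be evaluated directly:

```python
import math

def shannon_entropy(probs):
    """S_I = -sum_m p_m log2 p_m, in bits; p = 0 terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.25, 0.25, 0.5]))   # 1.5 bits: the A'bç! rate
print(shannon_entropy([0.5, 0.5]))          # 1.0 bit: a fair binary channel
```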
Signal Compression
Signal compression is a method used to reduce the amount of data needed to represent a signal without losing vital information. This is particularly important in contexts where data storage or bandwidth is limited. Signal compression is built on the principles of entropy from information theory:
  • Lossless compression means reducing file sizes without losing any original data, like in ZIP files.
  • Common sounds or letters are encoded using fewer bits, while rarer ones use more bits.
  • This technique maximizes the efficiency of data transmission, saving both time and resources.
For instance, in the exercise, the A'bç! language can be optimally compressed by assigning shorter binary codes to frequent sounds. This approach reduces the overall channel capacity required to relay information efficiently.
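The standard construction of such a code is Huffman's algorithm, which repeatedly merges the two least probable symbols. A minimal sketch (not from the textbook; the names are our own):

```python
import heapq

def huffman_code(probs):
    """Build a prefix code from {symbol: probability} by repeatedly
    merging the two least probable entries into one subtree."""
    # Heap entries: (probability, tiebreaker, {symbol: codeword so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Prepend '0' to one subtree's codewords and '1' to the other's.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_code({"A": 0.25, "B": 0.25, "C": 0.5}))
# {'C': '0', 'A': '10', 'B': '11'} -- optimal for the A'bç! alphabet
# (up to swapping 0s and 1s within each merge)
```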
Probability Distribution
A probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment. In the context of information theory and Shannon entropy, understanding the probability distribution of a system's possible states is key to calculating its entropy.
  • The probability of each sound in the A'bç! language reflects how often it occurs.
  • The probabilities for sounds A, B, and C are 1/4, 1/4, and 1/2, respectively.
  • The sum of all probabilities in a proper distribution is always equal to 1.
In signal processing, these probabilities are used to determine the best way to encode the data so that it requires the fewest bits on average. By calculating the entropy using these probabilities, we can establish the minimum number of bits required to send messages.
Binary Encoding
Binary encoding is the process of representing data using the binary number system, which has only two digits: 0 and 1. In digital communications, binary encoding is crucial, as it forms the foundation of how information is shared over digital communication channels.
  • Data is broken down into a sequence of bits (binary digits), which represent the original information.
  • An optimal encoding scheme ensures that frequently occurring symbols use fewer bits, such as using '0' for sound C in the A'bç! language.
  • This approach can be adapted and optimized based on the computed entropy and probability distribution of data.
Binary encoding doesn't just simplify storage and transmission; it also ensures that the maximum channel capacity is utilized efficiently. By translating sounds or signals into binary codes, systems can transmit messages more swiftly and accurately, adhering to the limits established by Shannon's entropy and channel capacity principles.


Most popular questions from this chapter

Our refrigerator is about \(2\,\mathrm{m} \times 1\,\mathrm{m} \times 1\,\mathrm{m}\), and has insulation about \(3\,\mathrm{cm}\) thick. The insulation is probably polyurethane, which has a thermal conductivity of about \(0.02\,\mathrm{W/(m\,K)}\). Assume that the refrigerator interior is at \(270\,\mathrm{K}\), and the room is at \(300\,\mathrm{K}\). (a) How many watts of energy leak from our refrigerator through this insulation? Our refrigerator runs at \(120\,\mathrm{V}\), and draws a maximum of \(4.75\) amps. The compressor motor turns on every once in a while for a few minutes. (b) Suppose (i) we don't open the refrigerator door, (ii) the thermal losses are dominated by the leakage through the foam and not through the seals around the doors, and (iii) the refrigerator runs as a perfectly efficient Carnot cycle. How much power on average will our refrigerator need to operate? What fraction of the time will the motor run?
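A back-of-the-envelope sketch of this arithmetic in code (assuming, per the problem, that the leak is pure conduction through the six foam faces and that the refrigerator achieves the Carnot coefficient of performance):

```python
# Numbers from the problem statement.
k, d = 0.02, 0.03              # conductivity W/(m K), foam thickness m
T_in, T_out = 270.0, 300.0     # K
area = 2 * (2*1 + 2*1 + 1*1)   # m^2: surface of a 2 m x 1 m x 1 m box

# (a) Conductive heat leak: Q_dot = k A (T_out - T_in) / d.
q_leak = k * area * (T_out - T_in) / d
print(q_leak)                  # 200 W

# (b) Carnot refrigerator: COP = T_c / (T_h - T_c), so W_dot = Q_dot / COP.
cop = T_in / (T_out - T_in)    # = 9
w_avg = q_leak / cop
print(w_avg)                   # ~22 W average electrical power
print(w_avg / (120 * 4.75))    # fraction of time the ~570 W motor runs (~4%)
```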

(Math, Complexity) (With Myers [75].) Let's consider a dynamical system, given by a mapping from the unit interval \((0,1)\) into itself:\({}^{47}\) $$ f(x) = 4\mu x(1-x) \tag{4.11} $$ where the time evolution is given by iterating the map: $$ x_0, x_1, x_2, \ldots = x_0, f(x_0), f(f(x_0)), \ldots $$ In particular, for \(\mu = 1\) it precisely folds the unit interval in half, and stretches it (non-uniformly) to cover the original domain. The mathematics community lumps together continuous dynamical evolution laws and discrete mappings as both being dynamical systems. You can motivate the relationship using Poincaré sections (figure 4.4), which connect a continuous recirculating dynamical system to the once-return map. The mapping (4.11) is not invertible, so it is not directly given by a Poincaré section of a smooth differential equation,\({}^{48}\) but the general stretching and folding exhibited by our map is often seen in driven physical systems without conservation laws. In this exercise, we will focus on values of \(\mu\) near one, where the motion is mostly chaotic. Chaos is sometimes defined as motion where the final position depends sensitively on the initial conditions. Two trajectories, starting a distance \(\epsilon\) apart, will typically drift apart in time as \(\epsilon e^{\lambda t}\), where \(\lambda\) is the Lyapunov exponent for the chaotic dynamics. Start with \(\mu = 0.9\) and two nearby points \(x_0\) and \(y_0 = x_0 + \epsilon\) somewhere between zero and one. Investigate the two trajectories \(x_0, f(x_0), f(f(x_0)), \ldots, f^{[n]}(x_0)\) and \(y_0, f(y_0), \ldots\) How fast do they separate? Why do they stop separating? Estimate the Lyapunov exponent. (Hint: \(\epsilon\) can be a few times the precision of the machine (around \(10^{-17}\) for double-precision arithmetic), so long as you are not near the maximum value of \(f\) at \(x_0 = 0.5\).) Many Hamiltonian systems are also chaotic. Two configurations of classical atoms or billiard balls, with initial positions and velocities that are almost identical, will rapidly diverge as the collisions magnify small initial deviations in angle and velocity into large ones. It is this chaos that stretches, folds, and kneads phase space (as in the Poincaré cat map of exercise 5.4) that is at root our explanation that entropy increases.\({}^{49}\)
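A minimal numerical sketch of the suggested experiment (the starting point \(x_0 = 0.3\) and separation \(\epsilon = 10^{-10}\) are illustrative choices):

```python
import math

mu = 0.9                          # 4*mu = 3.6, in the chaotic band

def f(x):
    return 4 * mu * x * (1 - x)

eps0 = 1e-10
x, y = 0.3, 0.3 + eps0            # two nearby starting points
for n in range(1, 151):
    x, y = f(x), f(y)
    if n % 25 == 0:
        sep = abs(y - x)
        # If sep ~ eps0 * exp(lambda * n), then lambda ~ ln(sep / eps0) / n.
        print(n, sep, math.log(sep / eps0) / n)
# The separation grows roughly exponentially, then saturates once it becomes
# comparable to the attractor size O(1): the trajectories stop separating
# because both remain confined to the unit interval.
```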

Entropy is a measure of your ignorance about a system: it is a measure of the lack of information. It has important implications in communication technologies: messages passed across the Ethernet communicate information, reducing the information entropy for the receiver. Shannon [112] worked out the use of entropy ideas in communications, focusing on problems where different messages have different probabilities. We'll focus on the simpler problem where all \(N\) messages are equally likely. Shannon defines the information entropy of an unread message as being \(\log_2 N = k_S \log N\), where \(k_S = 1/(\log_e 2)\) is analogous to Boltzmann's constant, and changes from log-base-\(e\) to log-base-2 (more convenient for computers, which think in base two). Your grandparent has sent you an e-mail message. From the header of the message, you know it contains 1000 characters. You know each character is made of 8 bits, which allows \(2^8 = 256\) different letters or symbols per character. Assuming all possible messages from your grandparent are equally likely (a typical message would then look like G*me!8V[beep]...), how many different messages \(N\) could there be? This (unrealistic) assumption gives an upper bound for the information entropy \(S_{\max}\). (a) What is \(S_{\max}\) for the unread message? Your grandparent writes rather dull messages: they all fall into the same pattern. They have a total of 16 equally likely messages.\({}^{43}\) After you read the message, you forget the details of the wording anyhow, and only remember these key points of information. (b) What is the actual information entropy change \(\Delta S_{\text{Shannon}}\) you undergo when reading the message? If your grandparent writes one message per month, what is the minimum number of 8-bit characters per year that it would take to send your grandparent's messages? (You may lump multiple messages into a single character.) (Hints: \(\Delta S_{\text{Shannon}}\) is the change in entropy from before you read the message to after you read which of the 16 messages it was. The length of 1000 is not important for this part.) Remark: This is an extreme form of data compression, like that used in gif images, zip files (Windows), and gz files (Unix). We are asking for the number of characters per year for an optimally compressed signal.
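The arithmetic, sketched in code (all numbers are from the problem statement):

```python
import math

# (a) Upper bound: 1000 characters, 256 equally likely symbols each.
s_max = 1000 * math.log2(256)
print(s_max)                   # 8000 bits

# (b) 16 equally likely messages: reading one resolves log2(16) bits.
ds = math.log2(16)             # 4 bits per message
bits_per_year = 12 * ds        # one message per month
print(bits_per_year / 8)       # minimum 8-bit characters per year: 6
```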

Freeman Dyson [28] discusses how living things might evolve to cope with the cooling and dimming we expect during the heat death of the universe. Normally one speaks of living things as beings that consume energy to survive and proliferate. This is of course not correct: energy is conserved, and cannot be consumed. Living beings intercept entropy flows: they use low-entropy sources of energy (e.g., high-temperature solar radiation for plants, candy bars for us) and emit high-entropy forms of the same energy (body heat). Dyson ignores the survival and proliferation issues; he's interested in getting a lot of thinking in before the universe ends. He presumes that an intelligent being generates a fixed entropy \(\Delta S\) per thought.\({}^{38}\) (This correspondence of information with entropy is a standard idea from computer science: see exercises 5.6 and 5.7.) Energy needed per thought. Assume that the being draws heat \(Q\) from a hot reservoir at \(T_1\) and radiates it away to a cold reservoir at \(T_2\). (a) What is the minimum energy \(Q\) needed per thought, in terms of \(\Delta S\) and \(T_2\)? You may take \(T_1\) very large. Related formulæ: \(\Delta S = Q_2/T_2 - Q_1/T_1\); First Law: \(Q_1 - Q_2 = W\) (energy is conserved). Time needed per thought to radiate energy. Dyson shows, using theory not important here, that the power radiated by our intelligent-being-as-entropy-producer is no larger than \(C T_2^3\), a constant times the cube of the cold temperature.\({}^{39}\) (b) Write an expression for the maximum rate of thoughts per unit time \(dH/dt\) (the inverse of the time \(\Delta t\) per thought), in terms of \(\Delta S\), \(C\), and \(T_2\). Number of thoughts for an ecologically efficient being. Our universe is expanding: the radius \(R\) grows roughly linearly in time \(t\). The microwave background radiation has a characteristic temperature \(\Theta(t) \sim R^{-1}\) which is getting lower as the universe expands: this redshift is due to the Doppler effect. An ecologically efficient being would naturally try to use as little heat as possible, and so wants to choose \(T_2\) as small as possible. It cannot radiate heat at a temperature below \(T_2 = \Theta(t) = A/t\). (c) How many thoughts \(H\) can an ecologically efficient being have between now and time infinity, in terms of \(\Delta S\), \(C\), \(A\), and the current time \(t_0\)? Time without end: Greedy beings. Dyson would like his beings to be able to think an infinite number of thoughts before the universe ends, but consume a finite amount of energy. He proposes that his beings need to be profligate in order to get their thoughts in before the world ends: he proposes that they radiate at a temperature \(T_2(t) \sim t^{-3/8}\) which falls with time, but not as fast as \(\Theta(t) \sim t^{-1}\). (d) Show that with Dyson's cooling schedule, the total number of thoughts \(H\) is infinite, but the total energy consumed \(U\) is finite. We should note that there are many refinements of Dyson's ideas. There are potential difficulties that may arise, such as quantum limits to cooling or proton decay: how will we make bodies out of electrons and neutrinos? And there are different challenges depending on the expected future of the universe: a big crunch, for example, where the universe collapses back on itself, demands that we adapt to heat and pressure (but an infinite number of thoughts appears to remain possible before the end).
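A sketch of the reasoning for parts (a)-(c), under the problem's stated assumptions (taking \(T_1\) very large, so the hot-reservoir entropy term drops out and the heat drawn equals the heat radiated): per thought, $$ \Delta S = \frac{Q_2}{T_2} - \frac{Q_1}{T_1} \approx \frac{Q}{T_2}, \qquad \text{so} \quad Q_{\min} = T_2\, \Delta S. $$ Since the radiated power is at most \(C T_2^3\) and each thought must radiate \(Q_{\min}\), $$ \frac{dH}{dt} \le \frac{C T_2^3}{T_2\, \Delta S} = \frac{C T_2^2}{\Delta S}, $$ and with \(T_2 = \Theta(t) = A/t\), $$ H \le \int_{t_0}^{\infty} \frac{C A^2}{\Delta S\, t^2}\, dt = \frac{C A^2}{\Delta S\, t_0}, $$ which is finite: an ecologically efficient being can only think a bounded number of thoughts.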

We saw that entropy technically doesn't increase for a closed system, for any Hamiltonian, either classical or quantum. However, we can show that entropy increases for most of the coarse-grained effective theories that we use in practice: when we integrate out degrees of freedom, we provide a means for the information about the initial condition to be destroyed. Here you'll show that entropy increases for the diffusion equation. Diffusion Equation Entropy. Let \(\rho(x, t)\) obey the one-dimensional diffusion equation \(\partial \rho / \partial t = D\, \partial^2 \rho / \partial x^2\). Assume that the density \(\rho\) and all its gradients die away rapidly at \(x = \pm\infty\).\({}^{42}\) Derive a formula for the time derivative of the entropy \(S = -k_B \int \rho(x) \log \rho(x)\, dx\) and show that it strictly increases in time. (Hint: integrate by parts. You should get an integral of a positive definite quantity.)
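A quick numerical check of this result (a minimal sketch: explicit finite differences on a grid wide enough that the profile keeps dying away at the edges; grid parameters are illustrative):

```python
import numpy as np

D, dx, dt = 1.0, 0.1, 0.004          # dt < dx^2 / (2 D) for stability
x = np.arange(-10, 10, dx)
rho = np.exp(-x**2)                   # initial profile, decaying at the edges
rho /= rho.sum() * dx                 # normalize: integral of rho dx = 1

def entropy(rho):
    """S = -integral rho log rho dx, with k_B = 1."""
    return -np.sum(rho * np.log(rho)) * dx

for step in range(1001):
    if step % 250 == 0:
        print(step, entropy(rho))     # S increases monotonically
    # Centered second difference; the periodic wrap from np.roll is harmless
    # while rho is negligible at the boundaries.
    lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx**2
    rho = rho + dt * D * lap
```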
