Mathematical Theories of Communication
BLOG: Heidelberg Laureate Forum
A wonderful potted history of the theory of communication was capably presented by 2002 Nevanlinna Prize winner Madhu Sudan, who walked us through the subject from the earliest mathematical thinking right up to the present day and his own team’s work. It was also almost a love letter to one of his mathematical heroes, the father of information theory, Claude Shannon.
Most human communication has historically used speech or writing, but the digital age has changed this into the communication of bits and bytes of digital data. Of course, speech and writing can both also be encoded in this way, but communication theory as a subject is more concerned with the pure bandwidth of data for any type of communication.
There are many problems when you attempt to communicate any kind of information from one place to another. Communication is expensive, depending on how much data you are transferring; it can be noisy, as communication lines aren’t always perfect at transmitting the data; it sometimes needs to be interactive, so systems need to be able to communicate both ways; and it’s often contextual, so data transfer needs to be robust to people speaking different languages, or using different hardware/software.
Sudan gave a simple example of an early method (still used today in simple cases!) for encoding a message before sending it through a noisy system, so as to improve the likelihood of the message arriving intact – simply repeat each character of the message three times. In his example, to prevent the message WE ARE NOT READY being catastrophically mis-received as WE ARE NOW READY (the opposite meaning!), you could send it in this form:
WWW EEE AAA RRR EEE NNN OOO TTT RRR EEE AAA DDD YYY
Now, even if some of your data is corrupted, as below:
WXW EEA ARA SSR EEE BNN OOO PTT RSR EEE AAS DFR IYY
You can usually recover each letter by taking the most common symbol in its group – but errors can still make it through. In the second-to-last group, two different errors have happened, so you can’t tell what the original letter was meant to be. In the fourth group there are two errors that both hit the same incorrect letter, so simply taking the most common letter in that group would lead you to the wrong conclusion.
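To make the scheme concrete, here’s a minimal Python sketch of the idea – my own illustration rather than anything from the talk, with an invented 10% corruption rate: a three-fold repetition encoder, a toy noisy channel, and a majority-vote decoder.

```python
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def encode(message, repeats=3):
    """Repetition code: transmit each character of the message `repeats` times."""
    return [ch * repeats for ch in message]

def noisy_channel(groups, flip_prob=0.1):
    """Toy channel: each transmitted character is independently replaced
    by a random character with probability flip_prob."""
    corrupt = lambda ch: random.choice(ALPHABET) if random.random() < flip_prob else ch
    return ["".join(corrupt(ch) for ch in group) for group in groups]

def decode(groups):
    """Majority-vote decoding: keep the most common character in each group.
    If a group is hit by two errors, the vote can go wrong (ties are broken arbitrarily)."""
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)

message = "WE ARE NOT READY"
received = noisy_channel(encode(message))
print(" ".join(received))  # the corrupted groups, e.g. WXW EEA ...
print(decode(received))    # usually WE ARE NOT READY, but not guaranteed
```

Running it a few times shows exactly the behaviour above: most corrupted groups vote their way back to the correct letter, but a group unlucky enough to take two hits can fool the vote, just like the fourth and second-to-last groups in Sudan’s example.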
So how do we improve this system? Of course, we could use groups of 5 or 10 characters – but as Sudan points out, while increasing the number of repeats decreases the probability that enough symbols in a group are corrupted to garble that letter, it also decreases the amount of information you’re actually able to communicate. For every 100 characters of data sent, instead of a 100-letter message you can now only send 20 or 10 letters. As the number of repeats increases to infinity, the usefulness of your system drops to zero – and equally, as your message gets longer, the number of repeats you can afford drops, so your likelihood of errors increases.
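To put rough numbers on this (a back-of-the-envelope calculation of my own, assuming for simplicity a channel in which each transmitted symbol is corrupted independently with probability p): with 3 repeats and majority voting, a group decodes wrongly only if at least two of its three copies are hit, which happens with probability 3p²(1−p) + p³ ≈ 3p² for small p. At p = 0.01 that’s roughly 0.0003 per letter instead of 0.01 – a big gain in reliability, but bought by sending three times as much data, i.e. at a rate of only 1/3.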
This fine balance between methods which are more likely to preserve your message with a higher certainty, and methods which will allow your system to send the largest amount of information, was long believed to be an unwinnable fight. That is, until the hero of Madhu Sudan’s talk came along – Claude Shannon. In the late 1940s Shannon worked out a theory of communication that changed the game completely.
He envisioned ways of encoding messages which would allow a good compromise, and worked out formulae to compare rates of communication for different encoders and decoders. Put simply, if your encoder and decoder are functions E and D which you can apply to a message M, you need:
D(E(M) + error) = M, with high probability
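(In the repetition example above, E is the ‘repeat each character three times’ map, D is the majority vote, and the error term is whatever damage the noisy channel does to E(M) in transit.)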
If you know the probability that any given bit of data will be damaged by the transmission or storage method you’re using, you can find the kind of function you’ll need.
If the probability of a bit being flipped is 0, you can send the messages as they already are with no errors. If it’s as large as ½, you may as well send anything because the message received at the other end will be essentially random, so the rate of data transfer is 0.
Even with a probability of 0.4999, there would still be a non-zero rate of data transfer, and it’s possible to design systems that work under these conditions. Shannon’s work was revolutionary, and considered a major leap of faith, since at the time computers as we know them barely existed, and nobody yet knew how to construct the encoding and decoding functions he was imagining.
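The talk summary above doesn’t need the formula, but for this simple set-up – each bit flipped independently with probability p – Shannon’s answer can be stated exactly: the best achievable rate is 1 − H(p), where H is the binary entropy function. Here’s a quick sketch (my own code, assuming that standard statement) reproducing the three cases just mentioned:

```python
import math

def binary_entropy(p):
    """Entropy H(p), in bits, of a coin that comes up heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Best achievable rate (useful bits per transmitted bit) over a channel
    that flips each bit independently with probability p."""
    return 1 - binary_entropy(p)

for p in (0.0, 0.1, 0.4999, 0.5):
    print(p, bsc_capacity(p))
# 0.0    -> 1.0          (perfect channel: every bit is useful)
# 0.1    -> about 0.531  (roughly half of each transmitted bit is useful information)
# 0.4999 -> about 3e-8   (tiny, but still non-zero)
# 0.5    -> 0.0          (the output is pure noise)
```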
He was the first to take a probabilistic approach to this kind of mathematics, and introduced many deep concepts, including mathematical notions of information and of its entropy, and even brought the word ‘bit’ – a binary digit – into use. His work was non-constructive – it showed that good encoding schemes must exist without saying how to build them, and much of the maths they have since been built from was yet to be invented – but all subsequent technology has been designed with his ideas in mind.
Sudan also explained some later contributions from Chomsky (on the structure of language and human communication), Yao (on communication complexity – designing and analysing protocols for two-way communication, work recognised in his Turing Award citation), and Schulman (on interactive coding schemes, useful for things like collaborative editing of shared documents).
He finished by mentioning some of his group’s research into communicating in situations where there is a large shared body of information, or ‘context’. For example, in giving his talk he’d assumed everyone in the room spoke English, and had given the talk in English on this basis.
However, the system still has to be robust when the shared context isn’t quite perfect – if there’s a word he uses that nobody else in the room knows, he can take a little extra time to define it. This will increase the length of the talk (and decrease the amount he can communicate in a given time), which must be taken into account. The communication schemes he’s working on rely on this same kind of shared context, but must remain robust when that context is imperfectly shared, and this is what his recent work explores.
Katie Steckles wrote (1 October 2017):
> [If] there’s a word [a speaker] uses nobody else in the room knows, he can take a little extra time to define it.
… in terms of words with which the audience is already familiar.
> This will increase the length of the talk (and decrease the amount he can communicate in a given time)
Not necessarily.
For one, providing the audience with the (i.e. the speaker’s) definition of a “new word” itself adds to the “amount being communicated”.
Even more importantly, the “new word”, which had to be defined (introduced, made familiar, “decompressed”) for the audience, may subsequently be used again (and again) in the course of the talk. The decrease in the length of the talk (resp. increase in the amount being communicated) thereby achieved may even outweigh the “extra time” needed to provide the definition (once).
The introduction of “new words” may therefore even be intended to actually shorten a talk.
Whether the audience is able to readily absorb and apply the definition provided, and to actually understand the parts of the talk that are being communicated (or at least attempted) using the “newly defined word”, given its definition (once), is a measure of (“the quality of”) the audience.
Follow-up questions and explanations may become necessary, and in turn lead to (unintended) extra length.