After the fall of the Roman Empire, a dark age followed in which much of the ancient world's knowledge was lost in Europe. Manuscripts were destroyed in wars and fires, or became illegible before they could be transcribed. Much of their content had to be rediscovered, and it took centuries to recover it.
We may be at a similar turning point, according to Vinton Cerf. The computer scientist and “father” of the Internet warned at the Heidelberg Laureate Forum that we face massive digital oblivion unless urgent action is taken. “The director of the library of Alexandria has scanned several manuscripts, but it may be that the digital version is less durable than the original one!” he said.
Will historians of the future be able to study Angela Merkel’s famous text messages, if they become unreadable after mobile software updates? Will Word be able to read old contracts or official texts, when even now it cannot read documents written in its oldest versions? Will future scientists be able to interpret climate data stored electronically, if the programs in use today are quickly superseded by better ones?
“You will become ghosts in history if you don’t find ways to preserve digital information”, said Cerf in his talk at the Forum. “I can no longer access some emails that have been very important for the history of the Internet, because the email software that I used does not exist anymore”, he said in a long press conference he gave later. “Even worse, I have electronic records from the ‘70s that I cannot even play: they are on tape!”, he said. That era, in which electronic media began to overtake paper, may be the beginning of a new digital Middle Age.
Cerf is “chief Internet evangelist” at Google, which was recently obliged to recognize the right to be forgotten. So one could ask whether his defence of digital preservation, as opposed to digital oblivion, carries a corporate bias. However, he acknowledges that the problem is double-edged. “The Internet is a strange thing: it remembers things we don’t want to remember and it forgets things we do want to remember”, he said.
The seriousness of the preservation problem already prompted a UNESCO conference in Vancouver in 2013, where about 400 librarians and other experts met to tackle it. Meanwhile, scientific projects set to generate enormous amounts of data, like the Large Synoptic Survey Telescope (LSST) or the Large Hadron Collider (LHC), have specific data preservation schemes.
But the solution is not trivial, according to Cerf. A lot of information is stored on fragile media, like tapes whose iron-oxide coating degrades until they become unreadable. But it is not just a matter of making backups on more modern media. “There are plenty of different encodings for transforming bits into pictures, video, and sound: we should keep a library of them, otherwise there would be no way to interpret the stored information”, Cerf pointed out.
Other data are even harder to preserve. “A spreadsheet or a videogame is not something static that can be printed out somewhere: it is strongly tied to the software that executes it, and this makes preservation much harder”, Cerf said. Successive versions of the software, and of the operating systems it runs on, would have to be stored in order to maintain access to those contents.
The case of scientific data shows a further complication. “We may record measurements as simple lists of numbers, but then who knows whether those numbers refer to temperature or pressure? Or in which units they are expressed? It’s not enough to record data: metadata is needed too”, said Cerf.
The challenge is so big that one might be tempted to give up, trusting that the really important information will be kept by somebody, somewhere, somehow. “That’s not a good idea: sometimes, the importance of a certain piece of information is not understood until centuries later”, Cerf replies. Past digital preservation efforts have not fared well either. “In the ‘90s, electronic data interchange became popular, but each sector of the economy made its own, and they could not talk to each other: it did not work very well”, Cerf pointed out.
Cerf’s proposal is much more radical. “Copyright needs to be amended to take into account the importance of archiving. We need to agree that some institution or authority is granted access to software, with a commitment to copy it in order to preserve information”, he said. Source code, programming languages, compilers, operating systems in their different versions, programs capable of emulating old machines… all of these should be archived. “We need to remember an increasing amount of information to remember the existing information”, said Cerf. Additionally, more research should be fostered into long-lasting storage hardware, and software standards should be designed with preservation in mind.
But will companies be open to such a proposal? “Already now, when a company outsources the development of a piece of software to another one, they agree by contract to keep a copy of the software somewhere, in case the second company goes bankrupt, for example”, he said. “We should be ready to give up a certain amount of freedom in exchange for preservation. A society owes to its future the preservation of its past”, Cerf concluded.