DNA data storage: 100 million hours of HD video in every cup
By Jonathan Keith
Shakespeare's sonnets, an excerpt from Martin Luther King's "I have a dream" speech and Watson and Crick's seminal paper have been encoded in DNA and decoded successfully.
Biological systems have been using DNA as an information storage molecule for billions of years. Vast amounts of data can thus be encoded within microscopic volumes, and we carry the proof of this concept in the cells of our own bodies.
Could this ultimate storage solution meet the ever-growing needs of archivists in this age of digital information?
Stored in DNA
A team of researchers headed by Nick Goldman and Ewan Birney at the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI) has dramatically demonstrated the potential of the technique to store and transport human-made data.
Their data included some well-chosen iconic elements: Shakespeare’s 154 sonnets, an audio excerpt from Martin Luther King’s “I have a dream” speech, Watson and Crick’s classic paper on the structure of DNA, and a colour photograph of the European Bioinformatics Institute.
These files, in common digital formats found on almost every desktop computer, were encoded byte-by-byte as DNA molecules, shipped from the USA to Germany without specialised packaging, and finally decoded back into their original electronic formats.
Although the study involved less than a megabyte of data in total, this is already orders of magnitude more than has previously been encoded as synthesised DNA.
The authors argue convincingly that the technique could eventually be scaled up to create a storage capacity far beyond all the digital information stored globally today (somewhere in the vicinity of 1 zettabyte, or 10¹⁵ megabytes).
Perfect for data storage
DNA molecules are natural vehicles for digital information. They consist of four chemical building blocks, the bases adenine (A), cytosine (C), guanine (G) and thymine (T), connected end-to-end like the characters of an alphabet to form long strings, much like a line of text. DNA molecules are even more similar to the sequences of zeroes and ones that digital computers use to represent information.
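To make the analogy concrete, here is a minimal Python sketch assuming the simplest possible scheme: each pair of bits maps to one base (00→A, 01→C, 10→G, 11→T), so every byte becomes four bases and can be read back exactly. The mapping table and the `encode`/`decode` names are hypothetical illustrations, not the encoding the EMBL-EBI team actually used.

```python
# Hypothetical 2-bits-per-base mapping for illustration only;
# this is NOT the published EMBL-EBI encoding scheme.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Invert encode(): read bases four at a time and rebuild each byte."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    message = "I have a dream".encode("ascii")
    strand = encode(message)
    print(strand[:16])  # prints CAGCAGAACGGACGAC ("I ha")
    assert decode(strand) == message  # round trip recovers the original bytes
```

A naive table like this can produce long runs of a single base (a zero byte becomes AAAA), which are harder to synthesise and sequence accurately, so practical codes add constraints, redundancy and error correction on top of this basic idea.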
DNA has substantial advantages over both printed text and electronic media. For one thing, it can remain stable for long periods of time with a minimum of care. Intact DNA has been extracted from bones (and other organic matter) tens of thousands of years old, and its sequence reconstructed with as much detail as if it had come directly from a living organism.
Another advantage of DNA over electronic media is that it requires no power supply to maintain its integrity, which makes it easy to transport and store, and potentially less vulnerable to technological failure.
Perhaps the greatest advantage of DNA as a storage medium is its minuteness. For example, EMBL-EBI’s official press release claims that more than 100 million hours of high-definition video could be stored in roughly a cup of DNA.
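A back-of-envelope calculation shows why DNA is so compact. The sketch below assumes an idealised 2 bits per nucleotide and an average nucleotide mass of about 330 g/mol (a common approximation), and ignores the redundancy, addressing and many physical copies of each strand that a real archive needs. It therefore gives a theoretical ceiling, not the calculation behind the press-release figure; the constant and function names are illustrative.

```python
AVOGADRO = 6.022e23                 # nucleotides per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0   # assumed average mass of one nucleotide
BITS_PER_NUCLEOTIDE = 2             # ideal case: 4 possible bases -> log2(4) = 2 bits
ZETTABYTE = 1e21                    # bytes, i.e. 10**15 megabytes


def bytes_per_gram() -> float:
    """Idealised density: no redundancy, no addressing, one copy per sequence."""
    nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_MASS_G_PER_MOL
    return nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8


def grams_needed(n_bytes: float) -> float:
    """Mass of DNA needed to hold n_bytes at the idealised density."""
    return n_bytes / bytes_per_gram()


if __name__ == "__main__":
    print(f"{bytes_per_gram():.2e} bytes per gram")
    print(f"{grams_needed(ZETTABYTE):.1f} g for one zettabyte")
```

Under these assumptions one gram holds roughly 4.6 × 10²⁰ bytes, so about a zettabyte would fit in a couple of grams; practical systems fall well short of this ceiling once error protection and copying are accounted for.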
We’re getting there
DNA storage devices won’t be available in the supermarket any time soon. The major drawback is the current cost of synthesising DNA in the quantities required, estimated at around US$12,400 per megabyte of data stored.
At current prices, DNA storage makes economic sense only for archives intended to last hundreds or even thousands of years, something few of us contemplate.
The main cost of maintaining electronic archives over such a long period of time is that the media have to be periodically replaced and the data copied, whereas DNA has merely to be stored somewhere cool, dry and dark.
But if the cost of synthesising DNA can be reduced by one or two orders of magnitude, which, judging by current trends, could occur within a decade, DNA archives intended to last less than 50 years would become feasible.
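The reasoning in the last three paragraphs can be sketched as a simple break-even model. It assumes DNA is synthesised once (at the article's estimate of US$12,400 per megabyte) and then kept at negligible cost, while electronic data are copied to new media at regular intervals. The migration cost and interval in the example are made-up numbers chosen only to show the scaling, and reading costs are ignored. What the sketch illustrates is that the break-even lifetime is proportional to the synthesis cost, so a 10- to 100-fold price drop shortens it 10- to 100-fold.

```python
DNA_SYNTHESIS_USD_PER_MB = 12_400.0  # estimate quoted in the article


def break_even_years(dna_usd_per_mb: float,
                     migration_usd_per_mb: float,
                     years_between_migrations: float) -> float:
    """Archive lifetime at which DNA's one-off cost equals cumulative migration cost.

    Simplifying assumptions: DNA is written once and then stored at negligible
    cost; electronic data are copied every `years_between_migrations` years at
    `migration_usd_per_mb` per copy, spread evenly over time; reads are ignored.
    """
    electronic_usd_per_mb_per_year = migration_usd_per_mb / years_between_migrations
    return dna_usd_per_mb / electronic_usd_per_mb_per_year


if __name__ == "__main__":
    # Made-up electronic figures, for illustration of the scaling only.
    MIGRATION_USD_PER_MB = 50.0
    YEARS_BETWEEN_MIGRATIONS = 5.0
    for reduction in (1, 10, 100):
        years = break_even_years(DNA_SYNTHESIS_USD_PER_MB / reduction,
                                 MIGRATION_USD_PER_MB, YEARS_BETWEEN_MIGRATIONS)
        print(f"DNA cost cut {reduction:>3}x -> break-even after ~{years:,.0f} years")
```

Because the model is linear, the absolute years depend entirely on the hypothetical electronic figures; only the relative effect of cheaper synthesis carries over to the real argument.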
Another issue is the cost of decoding the information stored in DNA, estimated at about US$220 per megabyte. At that price, DNA archives would be accessed only rarely. Yet this too could change in the near future, given the rapid pace of innovation in DNA-related technologies.
We shouldn’t let these practical issues distract from the significance of this exciting innovation.
As the inventors point out, the technique may already be economically viable and attractive for certain long-term, infrequently accessed archives, including some government and historical records, or science projects that generate massive amounts of data.
Examples of the latter include important large-scale experiments in particle physics, astronomy and medicine.
But perhaps the most exciting aspect of this proof-of-concept study is the impetus it provides to further innovation and the unexplored doors of possibility it opens.