How To Store Digital Data In DNA


Who doesn't love a little extra in everything? What if you could store all your digital data and childhood memories, access them whenever you need them, and pass them on to the next generation? It is possible with the help of DNA: the same kind of molecule found in our own genome, produced artificially in labs to store information.

Recent advances in biomolecular science and biotechnology allow us to synthesize DNA from scratch. These methods are becoming cheaper and more efficient over time, bringing with them a variety of practical applications, one of which is data storage on DNA.

Here we will look at how data can be stored in DNA and why this technology has the potential to out-compete current storage methods. Before we dive into the biological aspects, let me explain how current storage methods work, using the USB flash drive as an example.

Essentially, a flash drive is just a platform with millions of tiny transistors. These transistors are set up in circuits which interact to store your data. Flash devices may vary in how they relay the information, but their storage mechanisms are all the same.

How does a flash drive store your data?


In each transistor there is a floating gate, insulated by a thin oxide layer and sitting beneath a control gate. The floating gate can hold electrons without the need for power. Applying a voltage pushes electrons through the oxide layer, and the stored charge represents the bit information.

These transistors can be organized to store very complex information using this mechanism. There are, however, drawbacks to using flash storage. For instance, if you were to damage the oxide layer, say by corrosion or by dropping the drive, the resulting crack can let electrons leak away, corrupting all of your precious information.

Why choose DNA?

The durability and storage capacity of flash drives improve every year, but the danger of data corruption by physical damage will always be there.

Now what if I told you that DNA can bypass this problem? Deoxyribonucleic acid, or DNA, is a biological molecule composed of nucleotides which code for the genetic information in all forms of life. It exists in a double helical form built from four nucleotide bases.

First off, in DNA we have the purine guanine, which binds with its pyrimidine complement cytosine. Secondly, we have the purine adenine, which binds to its pyrimidine complement thymine.
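
The pairing rule above can be written down in a few lines. This is only an illustrative sketch in Python; the function names are made up for this example and do not come from any particular library.

```python
# Watson-Crick pairing: each purine (A, G) pairs with one pyrimidine (T, C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```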

So you may be wondering how this biological molecule can store our personal data. Sure, it can store genetic information, but how can we manipulate it to make it store pictures, books, movies and so on?
Let's now learn why scientists nominated DNA as a potential data storage molecule, and the methods they developed which allow us to store our personal information on DNA.

In their 2012 publication, George Church and his colleagues described the advantages of using DNA as a new platform for data storage.

Reasons for choosing DNA to store digital data:

  •  Firstly, DNA has natural data storage capabilities. As we know, it stores a complex array of genetic information, and in nature DNA is constantly being read and written by enzymes and other biomolecules.
  •  Secondly, DNA is resilient. It can withstand a large range of temperatures without degradation, and it has also been shown that the information on DNA can still be read after some degradation.
  •  Lastly, DNA provides grounds for nonplanar information storage. It can be condensed into tiny spaces, much smaller than the planar organization of a transistor circuit on a flash drive.

So with these different aspects in mind, let's have a look at how George Church and his team went about storing non-genetic data on DNA.

They used the book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves as a sample of text to store. The words within this book were converted into binary code, or bit code. This code was then further translated into a nucleotide sequence.

Following this, the desired nucleotide sequences were synthesized from scratch, encoding the data into oligonucleotide libraries.

 How can you access this information?


You can do this by sequencing the oligonucleotide libraries using next-generation sequencing. The sequenced DNA can then be decoded to reveal the original bit language, and further decoded to reveal the original words from the book. So let's have a closer look at this process. You begin with the words from your book, which can be rewritten in a bit language. Essentially, this language replaces all of the letters, spaces, numbers, upper and lower cases and any other literary aspects of a typical book with a unique code. This bit language is just a combination of ones and zeroes.
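
As a small illustration of the text-to-bits step, here is a sketch in Python. It assumes the text is stored as UTF-8 with 8 bits per byte; that encoding is an assumption made for this example, not a detail taken from the team's own pipeline.

```python
def text_to_bits(text: str) -> str:
    """Turn text into a string of ones and zeroes (UTF-8, 8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Reverse the conversion: group the bits into bytes and decode them."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


bits = text_to_bits("DNA")
print(bits)                # 010001000100111001000001
print(bits_to_text(bits))  # DNA
```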

You can further translate this bit language into a nucleotide sequence. Church and his team did this by writing a program which converted the whole book into bits and then into nucleotides. They also incorporated a 19-bit barcode into each segment of the book to allow for rapid identification of each nucleotide sequence within the whole library.
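
The barcode idea can be sketched as follows: cut the bit stream into fixed-size segments and put a 19-bit position number in front of each one. The 19-bit barcode length comes from the text above; the 96-bit payload size and the zero-padding of the last segment are assumptions chosen for this illustration.

```python
ADDRESS_BITS = 19  # barcode length mentioned in the text
PAYLOAD_BITS = 96  # assumed segment size, for illustration only


def make_segments(bits: str) -> list[str]:
    """Split a bit stream into segments, each prefixed with a 19-bit barcode."""
    segments = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("too much data for a 19-bit barcode")
        # Pad the final segment with zeroes so every segment has equal length.
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        barcode = format(index, f"0{ADDRESS_BITS}b")
        segments.append(barcode + payload)
    return segments


segments = make_segments("1" * 200)
print(len(segments))     # 3
print(len(segments[0]))  # 115
```

Because every segment carries its own position, the segments can later be sequenced in any order and still be put back together.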

Overall, they encoded one bit per base, turning the whole 5.27 megabit book into 54,898 oligonucleotides. These sequences also avoided runs of four or more repeated nucleotides and had a balanced GC content. After the desired sequences were determined, the DNA oligonucleotides were synthesized using phosphoramidite chemistry, a stepwise chemical process which essentially adds single nucleotides on top of each other one by one. Each cycle involves a deprotection stage, a base coupling stage and an oxidation stage.

Once the desired oligo sequence is achieved, it is capped by acetylation. These oligonucleotide strings are then compiled on a microarray chip, giving oligo libraries with sequences corresponding to the bit code of the original book.
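
To make the one-bit-per-base idea concrete, here is a sketch in Python. It assumes that a 0 may be written as either A or C and a 1 as either G or T, which leaves the encoder free to pick whichever base keeps the GC content balanced and avoids runs of four identical bases. The selection strategy below is a simple illustration, not the team's actual program.

```python
GC_BASE = {"0": "C", "1": "G"}  # GC option for each bit value
AT_BASE = {"0": "A", "1": "T"}  # AT option for each bit value
MAX_RUN = 3                     # never allow four identical bases in a row


def bits_to_dna(bits: str) -> str:
    """Encode one bit per base, balancing GC content and avoiding long runs."""
    seq = []
    gc_count = 0
    for bit in bits:
        # Prefer the base that pulls the GC fraction toward 50%.
        prefer_gc = gc_count * 2 <= len(seq)
        choice = GC_BASE[bit] if prefer_gc else AT_BASE[bit]
        # If that would make a run of four, switch to the other option.
        if len(seq) >= MAX_RUN and all(b == choice for b in seq[-MAX_RUN:]):
            choice = AT_BASE[bit] if choice == GC_BASE[bit] else GC_BASE[bit]
        seq.append(choice)
        gc_count += choice in "GC"
    return "".join(seq)


print(bits_to_dna("0000"))  # CACA
print(bits_to_dna("0101"))  # CTCT
```

Since both A and C decode to 0 and both G and T decode to 1, the reader does not need to know which choice the encoder made.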

Reading the information back:

Let's now look at how Church and his team were able to access their encoded information. They used next-generation Illumina sequencing, which is a very rapid way of sequencing DNA. First, DNA from the microarray library undergoes reduced-cycle PCR amplification. Through this amplification, additional motifs are introduced to the ends of the oligonucleotides.

The oligos are then isothermally amplified on a flow cell. These flow cells have oligos bound to their surface which are complementary to the motifs that were added to each oligo during PCR. The oligos are then cluster amplified by DNA polymerases, resulting in clonal amplification of all nucleotide fragments.

Following this, fluorescently tagged nucleotides are added to the single-stranded oligos and detected, allowing for sequencing by synthesis. The emission wavelength and intensity of each bound nucleotide are analysed by computer software.
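
A heavily simplified picture of what that software does: in every sequencing cycle there is one fluorescence reading per base, and the brightest signal decides which base is called. The sketch below is a toy model in Python with made-up intensity values; real base-calling software also corrects for signal overlap and assigns quality scores.

```python
CHANNELS = ("A", "C", "G", "T")  # one fluorescence channel per base


def call_bases(intensities: list[tuple[float, float, float, float]]) -> str:
    """Call one base per cycle by picking the brightest of the four channels."""
    read = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical readings for three cycles, ordered as (A, C, G, T).
cycles = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.0, 0.8, 0.2), (0.0, 0.1, 0.1, 0.7)]
print(call_bases(cycles))  # AGT
```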

The data from millions of different nucleotides is compiled and the sequences are determined with a high degree of accuracy. Following this, Church and his team aligned the sequence data using the incorporated 19-nucleotide barcode and used their bits-to-DNA code in reverse to convert the sequences back into bit language, and from there back into words.
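
The decoding step can be sketched in Python as follows. The sketch assumes one bit per base (A or C read as 0, G or T read as 1), a 19-base barcode at the start of each read, and UTF-8 text; the example reads use an 8-bit payload only to keep them short. All names here are illustrative.

```python
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}
ADDRESS_BITS = 19  # barcode length, one bit per base


def dna_to_bits(seq: str) -> str:
    """Translate a nucleotide sequence back into ones and zeroes."""
    return "".join(BASE_TO_BIT[base] for base in seq)


def reassemble(reads: list[str]) -> str:
    """Order the reads by barcode and join their payload bits."""
    blocks = {}
    for read in reads:
        bits = dna_to_bits(read)
        address = int(bits[:ADDRESS_BITS], 2)
        blocks[address] = bits[ADDRESS_BITS:]
    return "".join(blocks[address] for address in sorted(blocks))


def bits_to_text(bits: str) -> str:
    """Group bits into bytes and decode them, dropping zero padding."""
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace").rstrip("\x00")


# Two reads arriving out of order: barcode 1 first, then barcode 0.
reads = ["A" * 18 + "G" + "AGGAGAAG", "A" * 19 + "AGAAGAAA"]
print(bits_to_text(reassemble(reads)))  # Hi
```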

From this experiment they were able to encode a 5.27 megabit bit stream into 54,898 oligonucleotides, achieving a density of 5.5 petabits per cubic millimetre. This greatly exceeds current flash drive storage capacities. They also found an error rate of 10 bits per 5.27 million, showing that DNA can be a highly reliable mode of data storage.
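
These figures can be sanity-checked with a little arithmetic. The sketch below assumes that each of the 54,898 sequences carries 96 bits of book data; that payload size is an assumption introduced here, as the article itself does not state it.

```python
oligo_count = 54_898
payload_bits = 96  # assumed data bits per sequence, not stated in the article

total_bits = oligo_count * payload_bits
print(total_bits)                  # 5270208, i.e. about 5.27 megabits

bit_errors = 10
error_rate = bit_errors / 5.27e6   # errors per bit read back
print(f"{error_rate:.2e}")         # 1.90e-06
```

In other words, roughly two bits in every million came back wrong.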

Drawbacks of this process:

Even though DNA has a much higher storage capacity and greater durability than conventional transistor circuitry, it presents some significant drawbacks as a storage medium.

1. Oligo synthesis and sequencing are the limiting steps in this process; they are far slower than conventional methods of writing and reading data.

2. The cost of these technologies is very high, presenting yet another obstacle for DNA as data storage.


For now, it would be sensible to use DNA data storage for archive-type storage of very large files. It is also important to note that DNA synthesis and sequencing technologies are advancing at a considerable rate and becoming more and more accessible as the years go by.

In summary, DNA has the potential to provide solid ground for a very large and stable data storage platform. Advances in our ability to both synthesize and study DNA have shown us its potential for data storage. Unfortunately, the technologies which would allow us to perform such tasks efficiently and cost-effectively are still lacking. As previously mentioned, however, both DNA synthesis and sequencing methods are becoming more accessible and cost-effective, which could make DNA data storage a reality in the near future.
