Since binary became the language of computers, the pace of human development has accelerated dramatically, and the 0s and 1s running through countless circuits have changed the way people live. The arrival of the computer was the big bang of the information age: it let humans process large amounts of data quickly, and it has transformed everything from daily life to frontier scientific research.
Large volumes of data bring change, but they also bring challenges. According to statistics, the total amount of global data in 2021 was projected to reach 84.5 ZB (1 ZB = 10^21 bytes). Data on this scale places heavy demands not only on computing power but also on storage capacity. To hold it all, data has moved into “buildings” of its own: data centers. By 2024, the number of hyperscale data centers worldwide could reach 1,000. More and more data centers are being built, but land is limited and data center “skyscrapers” are expensive to construct, so increasing data storage density has become an alternative solution.
In search of more efficient storage media, researchers have turned their attention to DNA, the carrier of genetic information in nature. Although DNA is a term from genetics, it is no stranger to the public. In heredity, the DNA sequence stores genetic information, which is then expressed through transcription and translation to sustain an organism’s development and normal functioning. Some researchers have even speculated that aliens (or advanced civilizations) have stored information in the genomes of living things, waiting for humans to decode it. This may sound like science fiction, but it rests on an important fact: DNA has carried important information about human evolution for thousands of years, and it is one of the densest and most stable information media known.
How does DNA storage work, and what difference does it make?
01 Is DNA storage reliable?
On a technical level, DNA storage has been proven to work.
The idea of storing information in DNA dates back to the dawn of molecular biology. Biochemist Frederick Sanger invented the Sanger sequencing method, which made DNA sequences readable as strings of the four nucleotide letters A, T, C, and G. If 0s and 1s can serve as the language of computers, DNA sequences can likewise be used to convey specific information. At the time, however, synthesizing a 10-base DNA sequence cost $6,000: the material was promising, but the price was far too high.
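The parallel between binary digits and the four bases can be made concrete with a small sketch. The mapping below (00 = A, 01 = C, 10 = G, 11 = T) and the function names are assumptions chosen for illustration; the article does not specify a scheme, and practical systems add error correction and avoid problematic sequences.

```python
# Hypothetical two-bits-per-base mapping; not taken from any specific system.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases (two bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a sequence produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bytes_to_dna(b"Hi")    # "CAGACGGC"
decoded = dna_to_bytes(encoded)  # b"Hi"
```

Under this assumed mapping, every byte becomes exactly four bases, so the round trip is lossless as long as the sequence is read back without errors.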
With the development of new technologies for DNA synthesis and sequencing, DNA as a digital storage medium is no longer a fantasy. In 2001, a team wrote two quotations from Dickens into a DNA sequence, using three bases to represent each English letter, for example A = AAA and B = AAC. In 2009, a team of researchers successfully encoded the lyrics, sheet music, and a picture for the children’s song “Mary Had a Little Lamb” into a collection of DNA sequences.
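A three-base letter code of this kind can be sketched as follows. Only A = AAA and B = AAC come from the text; the assignment of the remaining letters (triplets taken in A, C, G, T order) and the function names are assumptions for illustration, and the 2001 team's actual table may have differed.

```python
from itertools import product
from string import ascii_uppercase

BASES = "ACGT"  # assumed ordering; it reproduces A = AAA and B = AAC

# Three bases give 4**3 = 64 triplets, more than enough for 26 letters.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]
LETTER_TO_TRIPLET = dict(zip(ascii_uppercase, TRIPLETS))
TRIPLET_TO_LETTER = {t: letter for letter, t in LETTER_TO_TRIPLET.items()}


def encode_text(text: str) -> str:
    """Encode letters as triplets; characters outside A-Z are skipped."""
    return "".join(
        LETTER_TO_TRIPLET[ch] for ch in text.upper() if ch in LETTER_TO_TRIPLET
    )


def decode_sequence(sequence: str) -> str:
    """Decode a sequence of triplets back into uppercase letters."""
    return "".join(
        TRIPLET_TO_LETTER[sequence[i:i + 3]] for i in range(0, len(sequence), 3)
    )


encoded = encode_text("DNA")        # "AATATCAAA" under the assumed table
decoded = decode_sequence(encoded)  # "DNA"
```

This sketch drops spaces and punctuation; a fuller table could assign some of the 38 unused triplets to them.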
DNA storage has two main advantages: simple storage conditions, and a data density far greater than that of electronic devices. First, the storage conditions are simple: as long as the DNA is kept at a low enough temperature, the data can be preserved for thousands of years, so the cost of ownership falls almost to zero. This makes DNA storage best suited to important “cold data” that does not need to be accessed frequently. In theory, cold data can be stored for more than 1,000 years with near-zero energy consumption. In the future, DNA is likely to become the main medium for large-scale cold data storage.
Second, DNA offers high storage density in a small footprint: stored as DNA, a film would fit in a space smaller than a sugar cube. According to calculations published in Nature Materials in 2016 by George Church of Harvard University and colleagues, the simple bacterium Escherichia coli has a storage density of about 10^19 bits per cubic centimeter. At that density, a DNA cube with sides about a meter long could comfortably meet the world’s current storage needs for a year. By weight, one gram of DNA can hold up to 215 petabytes (PB), or 225,443,840 gigabytes (GB), equivalent to the capacity of about 220,000 1 TB hard drives.
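The figures in this paragraph can be checked with a little arithmetic. The sketch below assumes binary prefixes for the per-gram conversion (1 PB = 1,024 TB = 1,024^2 GB), which is what reproduces the quoted gigabyte figure, and a decimal zettabyte (10^21 bytes) for the volume estimate. The variable names are illustrative.

```python
# Per-gram capacity, assuming binary prefixes.
PB_PER_GRAM = 215
GB_PER_PB = 1024 ** 2
TB_PER_PB = 1024

gigabytes_per_gram = PB_PER_GRAM * GB_PER_PB  # 225,443,840 GB
terabyte_drives = PB_PER_GRAM * TB_PER_PB     # 220,160 drives of 1 TB

# Capacity of a one-meter cube at about 10^19 bits per cubic centimeter.
BITS_PER_CM3 = 10 ** 19
CM3_PER_M3 = 100 ** 3
BYTES_PER_ZB = 10 ** 21  # decimal zettabyte (assumption)

bits_per_cubic_meter = BITS_PER_CM3 * CM3_PER_M3    # 10^25 bits
bytes_per_cubic_meter = bits_per_cubic_meter // 8   # 1.25 * 10^24 bytes
zettabytes_per_cubic_meter = bytes_per_cubic_meter / BYTES_PER_ZB  # 1250.0

print(f"{gigabytes_per_gram:,} GB per gram")
print(f"{terabyte_drives:,} one-terabyte drives per gram")
print(f"{zettabytes_per_cubic_meter:,.0f} ZB per cubic meter")
```

On these assumptions a one-meter cube would hold about 1,250 ZB, well above an annual data volume in the tens of zettabytes.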
02 Breakthroughs have been made in DNA storage
There have been some breakthroughs in DNA storage in recent years. DNA is already being used to manage data in other ways by researchers struggling to make sense of massive datasets. Recent advances in next-generation sequencing make it easy to read billions of DNA sequences simultaneously. With this capability, researchers can use DNA sequences as molecular identification “tags” to track experimental results.
A Harvard team used CRISPR DNA-editing technology to record images of a human hand into the genome of E. coli bacteria and read them back with more than 90 percent accuracy. Researchers in Switzerland have devised a “DNA-of-things” (DoT) storage architecture to produce materials with immutable memory. In the DoT framework, DNA molecules record the data and are then encapsulated in silica nanobeads, which are fused into a variety of materials for printing or casting objects of any shape.
Researchers at the University of Washington and Microsoft Research have developed a fully automated system for writing, storing, and reading data encoded by DNA.
In December 2021, Chinese DNA storage researchers announced the development of a sliding chip, a microfluidic device capable of holding DNA samples and various reagents. The chip can act as an electrode whose charge changes depending on whether a DNA sequence is present.
In 2022, the synthetic biology team at Tianjin University successfully stored 10 selected Dunhuang murals in DNA, and said that the information in these murals can be preserved for 1,000 years at room temperature and 20,000 years at 9.4 °C.
03 Industry giants endorse DNA storage technology
DNA storage technology may be epoch-making, but can it actually be applied? The giants of the storage industry are positive. Gurtej Sandhu, senior fellow and vice president at Micron Technology, was among the first to take part in a DNA storage project: in 2016, he worked with George M. Church’s research group at Harvard. Seagate has brought Catalog’s DNA storage technology to its “lab on a chip.” Seagate’s DNA storage and microfluidics research project has been running for two and a half years and currently has four known patent applications.
Catalog, Seagate’s partner, is a U.S. startup founded in 2016. It stores data by making DNA fragments of 20-30 base pairs, stitching them together with enzymes, and arranging them in different sequences. Catalog has used its DNA technology to store the novel “The Hitchhiker’s Guide to the Galaxy” and the poem “The Road Not Taken.”
Storage giants are bullish on DNA storage technology, but the field is dominated by biotechnology-focused startups. The core reason is that the key technologies underlying DNA data storage are biotechnologies: DNA sequencing, DNA synthesis, and the storage of DNA itself.
Alongside Catalog, which has partnered with Seagate, a leading company in DNA data storage is Iridia, an American startup. Iridia was founded in 2016 to develop the world’s first commercially attractive DNA-based data storage solution. By combining DNA polymer synthesis, electronic nano-switches, and semiconductor manufacturing techniques, the company is developing a highly parallel format in which arrays of nanomodules could store data at extremely high densities.
Companies involved in DNA synthesis technology include DNA Script, a French company, and Molecular Assemblies, an American company.
Founded in 2014, DNA Script focuses on manufacturing synthetic DNA using proprietary template-free technology. Rapid, economical, high-quality DNA synthesis greatly accelerates the development of new therapies, sustainable chemical production, improved crops, and new applications such as data storage. The company’s enzymatic technology and nucleotide chemistry platform can synthesize longer DNA sequences at higher purity, improving sequence accuracy 500-fold and cutting synthesis time 50-fold.
Founded in 2013, Molecular Assemblies develops enzymatic DNA synthesis technologies that power new products in industrial synthetic biology, personalized therapy, precision diagnostics, information storage, nanotechnology, and more. The company’s proprietary DNA synthesis methods are designed to provide cost-effective, reliable, and sustainable production of high-quality, sequence-specific DNA.
Founded in 2013, Twist Bioscience provides high-throughput DNA synthesis and sequencing services to customers in the medical, agricultural, industrial chemicals, and data storage sectors. The company’s semiconductor-based process for manufacturing synthetic DNA reduces the reaction volume by a factor of 1 million while increasing yield by a factor of 1,000, allowing 9,600 genes to be synthesized on a single silicon wafer. In 2016, Microsoft signed an agreement with Twist Bioscience to order about 10 million DNA products to test DNA data storage capabilities.
DNA sequencing companies include Oxford Nanopore Technologies, a British company. Oxford Nanopore Technologies was founded in 2005 to develop disruptive electronic single-molecule sensing systems based on nanopore science. It has developed a new generation of sensing technology that uses nanopores (nanoscale holes) embedded in high-tech electronic devices for comprehensive molecular analysis.
In China, Huawei announced the establishment of its Strategic Research Institute in 2019, saying that it would mainly research and develop cutting-edge technologies, including DNA storage. At the Huawei Global Analyst Conference in 2021, Xu Wenwei, a Huawei director and head of the Strategic Research Institute, said that the company would use DNA storage to achieve breakthroughs in models and coding technologies for ultra-large storage spaces and break through the capacity wall.
On May 26, 2021, China Carbon Yuan (Shenzhen) Biotechnology Co., LTD. (C-ATOM) was officially established. In September this year, building on earlier DNA storage work by the team of Dai Junbiao, a researcher at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, the company used its independently developed DNA online codec system (ATOM), for which it holds independent intellectual property rights, together with its own synthesizer and sequencer, to complete the full DNA storage workflow: encoding, synthesis, preservation, sequencing, and decoding.
04 The challenges and potential of DNA storage
At present, DNA storage technology still faces some technical problems. Fan Chunhai, an academician of the Chinese Academy of Sciences, said that the efficiency of writing data through synthesis and of reading it back is still low, and that the process remains slow and costly. Yuan Yingjin, an academician of the Chinese Academy of Sciences and vice president of Tianjin University, said that DNA information storage is an emerging research direction that deeply integrates multiple disciplines. Commercializing DNA storage technology will require research teams from many fields to work together.
If cost were the only problem, it could eventually be solved. There is no doubt that DNA storage is one of the most promising data storage methods.