Catalog Technologies, Inc., a leader in DNA-based digital data storage and computation, has made a historic breakthrough in DNA computation by demonstrating the ability to search data stored in DNA in a massively parallel and scalable manner with resource usage almost independent of the data size.
This demonstration is a result of CATALOG’s ongoing collaboration with potential customers and partners in understanding their use cases. With its revolutionary platform, and using text as an example, CATALOG was able to show how chemistry could be leveraged to compute over archives in parallel.
In September, using an innovative combinatorial writing scheme, CATALOG encoded approximately 17,000 words from Shakespeare’s Hamlet into DNA in a few minutes on Shannon, CATALOG’s flagship writer. On this DNA archive, CATALOG performed a parallel search computation and successfully retrieved all occurrences of a query word. The approach required no complex pre-processing or indexing. Instead, CATALOG’s approach leveraged the massively parallel nature of chemistry to retrieve all occurrences of the query word in a number of steps that is almost independent of the size of the dataset. Thus, the number of steps required would be approximately the same if the dataset had 170,000 or 170 million words.
To show this, in November, CATALOG encoded approximately 200,000 words of eight Shakespeare tragedies into DNA. To search and retrieve all occurrences of a query word in all eight plays would require approximately the same number of chemical computing steps, time, and resources as the initial Hamlet search. CATALOG is on track to demonstrate this search scalability on data sets containing over 100 million words by mid-2023. CATALOG’s innovative approach shows, for the first time, how to leverage the massive parallelism of DNA chemistry to search almost any amount of data stored in DNA without the expected proportional increase in resources.
DNA-based Compared to Traditional Computers for Search
Search is a foundational element of computing. Internet search results often come back quickly only because the underlying data has already gone through the time-consuming and costly process of indexing. However, over 90% of enterprise data is unstructured, making it expensive and, in some cases, impossible to search effectively. This is a critical barrier wherever a lack of timely search results leads to missed insights with costly long-term implications, in industries including oil and gas, finance, and government.
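The trade-off between indexed and unindexed search on a conventional computer can be sketched in a few lines of Python. This is a generic illustration only, not CATALOG's method: the function names and the sample sentence are hypothetical. It shows that an unindexed scan needs no preparation but touches every word, while an indexed lookup is fast only after a full pass to build the index.

```python
from collections import defaultdict

def linear_search(words, query):
    """Scan every word: no preparation, but work grows with len(words)."""
    return [i for i, w in enumerate(words) if w == query]

def build_index(words):
    """One-time pass over all the data, mapping each word to its positions."""
    index = defaultdict(list)
    for i, w in enumerate(words):
        index[w].append(i)
    return index

def indexed_search(index, query):
    """Single dictionary lookup, possible only once the index exists."""
    return index.get(query, [])

# Hypothetical sample text, chosen only for illustration.
text = "to be or not to be that is the question".split()
index = build_index(text)

print(linear_search(text, "be"))
print(indexed_search(index, "be"))
```

Both calls return the same positions; the difference is where the cost falls. The index has to be built and maintained before any query runs, which is the up-front expense that large volumes of unstructured data make hard to justify.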
Why DNA for Computation
In recent years, the IT industry has witnessed a proliferation of purpose-fit technologies, including accelerators like GPUs, quantum computers, and extreme parallel computers.
This performance and scale, however, come at the expense of higher energy consumption, larger memory and long-term storage demands, and higher management complexity. This has generated tremendous interest and momentum in chemistry-based DNA computing systems, which have a far smaller physical footprint, consume orders of magnitude less energy, and are resistant to traditional electronic security vulnerabilities.
Not all Data Stored in DNA is Created Equal
While many in research and academia are developing approaches to use DNA as a storage platform for archival purposes, CATALOG’s proprietary approach to encoding data in DNA is uniquely positioned for computing at scale to gain critical insights into data stored in DNA.
Many researchers and labs testing DNA-based storage focus on storing information densely inside the DNA molecule. CATALOG turns this idea on its head and stores information in specific collections of DNA molecules. Unlike other approaches, this gives CATALOG latitude to design DNA sequences that are optimal for computing and to make writing orders of magnitude more efficient.
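As a rough intuition for storing information in a collection rather than in a single sequence, the toy Python model below represents a bit string by which members of a fixed set of pre-designed items are present in a pool. It is an assumption made purely for illustration and does not describe CATALOG's proprietary encoding: the integer "molecule identifiers", the function names, and the sample message are all hypothetical.

```python
def encode_as_pool(bits):
    """Toy model: position i stands in for a distinct pre-designed molecule.
    The molecule is placed in the pool only if bit i is 1."""
    return {i for i, b in enumerate(bits) if b == "1"}

def decode_pool(pool, length):
    """Recover the bit string by checking which identifiers are present."""
    return "".join("1" if i in pool else "0" for i in range(length))

# Hypothetical message: the ASCII bits for the two letters "hi".
message = "0110100001101001"
pool = encode_as_pool(message)

print(sorted(pool))
print(decode_pool(pool, len(message)) == message)
```

In this toy model the data lives in the membership of the pool rather than in the order of bases within any one molecule, so the individual items can be designed ahead of time for other properties.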
In addition to proving DNA computing capability, with this achievement CATALOG has also demonstrated how powerful computing capabilities can increase the efficiency and cost-effectiveness of reading data back from DNA – currently a significant challenge for the field – by orders of magnitude.
“This historic and transformational achievement is based on years of work with partners and collaborators that helped make DNA-based computation a reality,” said Hyunjun Park, Ph.D., founder and CEO at CATALOG. “With the advantages of DNA-based data storage and computation demonstrated, we now turn our attention to addressing more sophisticated applications from signal processing to machine learning over massive datasets. In parallel, we are working closely with partners and collaborators to reduce the size and complexity of our platform and to identify specific workloads to target commercial offerings.”
CATALOG’s Vision for DNA Computing Technologies in the Enterprise
This historic achievement is a key element of CATALOG’s plans to develop DNA-based storage and computation solutions.
CATALOG is accelerating the vision of DNA computing by making advances in DNA computing algorithms and applications with potential widespread commercial use in areas including artificial intelligence, machine learning, data analytics, and secure computing. In addition, CATALOG is developing solutions for DNA-based information security, rack-sized and desk-sized DNA data storage and computation platforms, DNA data storage as a service, and a DNA data storage and computing API.