Genome of Babel

There are many concerns about scientists or hobbyists accidentally creating dangerous pathogens. While deliberately engineering pathogens is a serious concern, there is little reason to worry that a random sequence of DNA will turn out to be dangerous, or even useful.

The Library of Babel is a short story published in 1941 by Jorge Luis Borges. Borges imagines a universe composed entirely of a library. Every book in the library has the same format: 1,312,000 characters drawn from a 25-character set. The library contains a book for every possible combination of characters, 25^1,312,000 books in all. If the library were built, it would be much larger than the known universe. Because the library contains every combination of letters, it contains all knowledge. It contains everyone's biography, a record of future events, every great work of literature, and any other information. Of course, it contains all incorrect knowledge too. Many of the inhabitants of the Library go insane looking for books with real knowledge.

In this analogy the books are genomes and the Library is the complete state space of genomes. Take a genome size of ~4 Mbp, or 4 million base pairs, roughly the size of a typical bacterial genome. Each position can hold one of four bases (adenine, thymine, guanine, or cytosine), so the number of unique genomes in the state space is 4^4,000,000, or about 10^2,408,240.
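The conversion from 4^4,000,000 to a power of ten can be checked with logarithms. The following is a minimal sketch, not part of the original argument; the function name `log10_state_space` is made up for illustration, and the same helper is applied to Borges's 25^1,312,000 books for comparison.

```python
import math

GENOME_LENGTH_BP = 4_000_000   # ~4 Mbp, the genome size used in the text
BOOK_LENGTH_CHARS = 1_312_000  # characters per book in Borges's library


def log10_state_space(length: int, alphabet_size: int) -> float:
    """Return log10 of alphabet_size ** length without building the huge number."""
    return length * math.log10(alphabet_size)


log10_genomes = log10_state_space(GENOME_LENGTH_BP, 4)
log10_books = log10_state_space(BOOK_LENGTH_CHARS, 25)

print(f"Unique 4 Mbp genomes: about 10^{log10_genomes:,.0f}")
print(f"Books in the Library: about 10^{log10_books:,.0f}")
```

Working in log space avoids ever materializing a number with millions of digits. Under these assumptions the genome state space comes out even larger than Borges's library.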

The state space of genomes is so large that every living thing since the beginning of life on earth has collectively explored only a vanishingly small fraction of it. DNA polymerase typically has an error rate of about 1 in 10,000,000, which works out to less than one error per replication of a 4,000,000 bp genome, so assume generously that every new cell carries a single mutation and therefore a unique genome. There are ~1.7x10^30 new prokaryotic cells every year, and life has existed for ~4x10^9 years. So, at most about 6.8x10^39 unique genomes have been explored in the state space. This is a lot. However, 6.8x10^39 << 10^2,408,240. The explored portion is only about 1 part in 10^2,408,200 of the whole, so there are still essentially 10^2,408,240 unique genomes left to explore.
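The arithmetic in this estimate can be reproduced in a few lines. This is a rough sketch using only the figures quoted above; the variable names are illustrative, and the one-unique-genome-per-cell assumption is the same generous upper bound used in the text.

```python
import math

GENOME_LENGTH_BP = 4_000_000   # genome size used in the text
ERROR_RATE_PER_BP = 1e-7       # ~1 error in 10,000,000 bases copied
NEW_CELLS_PER_YEAR = 1.7e30    # new prokaryotic cells per year (from the text)
YEARS_OF_LIFE = 4e9            # years life has existed (from the text)

# Expected copying errors per replication of one genome.
expected_errors = GENOME_LENGTH_BP * ERROR_RATE_PER_BP

# Upper bound on genomes explored: assume every new cell is a unique genome.
explored = NEW_CELLS_PER_YEAR * YEARS_OF_LIFE

# Compare in log space, since 4 ** 4_000_000 is too large for a float.
log10_total = GENOME_LENGTH_BP * math.log10(4)
log10_fraction = math.log10(explored) - log10_total

print(f"Expected errors per replication: {expected_errors:.1f}")
print(f"Genomes explored (upper bound): {explored:.1e}")
print(f"Fraction of state space explored: about 10^{log10_fraction:,.0f}")
```

Note that the comparison is a ratio: the explored share of the state space is roughly 10^-2,408,200, while subtracting 6.8x10^39 from the total leaves the count of unexplored genomes essentially unchanged.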

While the argument for security through endless combinations works well in fields like cryptography, where every sequence is truly independent, it is less of a guarantee in biology because, as with horseshoes and hand grenades, close counts in biology. A good example is the influenza virus. Some strains are more deadly than others, like the 1918 Spanish flu (itself an H1N1 strain), but all are infectious and at least mildly harmful. So, exploring the unique genomes around the flu is more dangerous than exploring the unique genomes around S. cerevisiae (yeast).

(The print to the left is by Érik Desmazières)