When I began researching this topic towards the end
of 2013, I sensed a certain skepticism from the scientific community,
particularly toward people with different backgrounds experimenting
across disciplines, even though such work can reveal new vectors for IT
security attacks.
In late 2015, when I presented my Master’s thesis
(in IT security) on “Malware that infects genomes,” I experienced that
skepticism up close. During the revision process, one of the professors, who
was a specialist in molecular biology, branded it as “erudite
nonsense.” In his opinion, it was obvious that a DNA sequence could be
modified for malicious purposes and that it was the researcher’s duty to verify
that what was sequenced matched the originally published sequence. I do not
disagree with this point of view, but beyond the many scenarios that open up in
terms of security, it is difficult to explain how easy it would be for some of
the checks to fail, particularly if the problem lies in the software. The
simple fact that this could occur warranted further study, in my opinion.
Nevertheless, his perspective was not without
grounds. My biological scenarios were merely theoretical, given that I did not
have the resources to synthesize/sequence a modified genome and demonstrate a
real case. Without this, it was difficult to verify the feasibility of a genome
being compromised with malicious information in such a way that, if
synthesized, it could be passed into the biological realm, carrying an
arbitrary sequence, and then be sequenced and compromise the system.
Furthermore, it wasn’t something we could see ‘in-the-wild’, but technically
that didn’t mean it couldn’t happen one day.
And then, that day came.
Professor Tadayoshi Kohno and his team from the
University of Washington managed to demonstrate it in their article published
last week: “Computer Security, Privacy, and DNA Sequencing: Compromising Computers
with Synthesized DNA, Privacy Leaks, and More.”
Kohno and his team carried out in-depth research into the
subject and put into practice the theoretical scenario I had been
wondering about: “maliciously” modified DNA can be
synthesized and sequenced, giving rise to the execution of arbitrary code. In
this case, they introduced a vulnerability into an application called fqzcomp to
demonstrate the code’s execution.
However, there are many different possibilities. In
my work, for example, there was a simple script that parsed the FASTA
file (which contains the genome’s information and is written using the four
nucleotides: adenine, cytosine, thymine, and guanine) to decrypt and execute
the “payload.” It wasn’t an elegant solution, and it required the victim
to already be vulnerable in order to execute the script, so I wasn’t fully
satisfied, but it did the job. To encode the string into the sequence, the
procedure was modeled on the biological process, whereby these four bases (A,
C, T, and G) are grouped into triplets called codons (each of which specifies
an amino acid; chains of amino acids in turn form proteins).
This means you can take the groups of three as a
basis and then code a symbol for each triplet, forming a “hidden” alphabet. In
this case, ASCII was used, and the coding took the following form: ACA = “A”,
ACC = “B”, ACG = “C,” and so on successively (there are various ways to code
the message; this is just one example). As you can see, we have 4^3 = 64
combinations, enough to code the entire alphabet in uppercase and lowercase
plus the ten digits (62 symbols), with two triplets to spare; a fuller set of
symbols would require longer groups. This system offers a way to “write”
arbitrary code inside a
genome. Naturally, you could write quotes, as J. Craig Venter did when he
created a cell controlled by a synthesized genome, or inject malware or
arbitrary code.
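To make the triplet scheme concrete, here is a minimal Python sketch. It is not the table from my thesis: the 64-symbol alphabet (`SYMBOLS`), the choice to use the two spare triplets for a space and a period, and the function names are all assumptions made for illustration. The only thing taken from the text is that ACA = “A”, ACC = “B”, and ACG = “C”.

```python
from itertools import product
import string

# All 4^3 = 64 triplets in A, C, G, T order: AAA, AAC, AAG, AAT, ACA, ...
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]

# Hypothetical alphabet (an assumption, not the thesis table):
# 26 uppercase + 26 lowercase + 10 digits + space + period = 64 symbols.
SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + string.digits + " ."

# Shift the table so that ACA = "A", ACC = "B", ACG = "C", as in the text.
OFFSET = TRIPLETS.index("ACA")

ENCODE = {s: TRIPLETS[(i + OFFSET) % 64] for i, s in enumerate(SYMBOLS)}
DECODE = {t: s for s, t in ENCODE.items()}


def encode(text):
    """Turn a string into a nucleotide sequence, one triplet per symbol."""
    return "".join(ENCODE[ch] for ch in text)


def decode(sequence):
    """Turn a nucleotide sequence back into text, reading three bases at a time."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(DECODE[sequence[i:i + 3]] for i in range(0, len(sequence), 3))


print(encode("ABC"))  # ACAACCACG
```

Any symbol outside the 64 chosen ones raises a `KeyError` here, which is exactly the limit of three-base groups: a larger character set needs longer groups or more than one triplet per symbol.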
What kind of impact could this cause?
Below, I include a portion of my thesis that
analyzes the potential scenarios that could be discussed.
“The impact of this type of attack could be
classified as: digital, digital-biological, and biological.
1. Digital impact: The fact that a malicious payload can be injected into a
DNA sequence does not imply that this methodology aggravates the infection,
but rather it would aggravate the complexity of identifying it and
subsequently detecting it using traditional protection methodologies such as
hashes to ensure integrity and solutions to detect corrupted files. For this
reason, it has been demonstrated how this scenario would work in order to warn
of the possible use of genome sequences as alternative vectors.
2. Digital-biological impact: In the event that a genome sequence is
maliciously modified, and that genome is successfully synthesized, the
malicious code could remain in the cell without impacting it. It should be
clarified that this was not verified by the author as it falls outside the
objectives of this work. If this were to happen, this organism would carry
some malicious code, and its DNA could then be sequenced in a laboratory and
generate a sequence file that would contain, for example, a portion of
malicious code. An attacker would then just need to extract it and execute it
in order to activate a digital attack. (This point is similar to the one
demonstrated by the University of Washington.)
3. Biological impact: This would be the case where a maliciously inclined
person has the ability to cause a mutation in a sequence, which would have no
malicious impact on the system but could set in motion a functional problem at
the biological level, if it were synthesized without adequate checkpoints.
(This would be a hypothetical case whose feasibility is more difficult to
verify.)”
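The hash-based integrity check mentioned under the digital impact can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, SHA-256 is my choice of hash, and the sequence is normalized (whitespace removed, uppercased) before hashing so that line wrapping in a sequence file does not change the digest.

```python
import hashlib


def sequence_digest(sequence):
    """SHA-256 of a nucleotide sequence, ignoring whitespace and letter case."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def matches_reference(sequence, reference_digest):
    """True if the sequence hashes to the trusted, previously published digest."""
    return sequence_digest(sequence) == reference_digest


# Hypothetical published sequence and a copy with a single altered base.
published = "ACGTACGT"
reference = sequence_digest(published)

print(matches_reference("acgt\nACGT", reference))  # same bases, different layout
print(matches_reference("ACGTACGA", reference))    # one base changed
```

The sketch also shows the limitation: the check only works when a trusted reference digest exists. For a sequence that was never published, or one that was modified before it was first published, a hash says nothing about whether a payload is present.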
As we saw with Professor Kohno’s publication last
week, Scenario Two has already been addressed and demonstrated to be “feasible”
under certain circumstances. Undoubtedly, it remains far from being a real
threat, but it is no longer a merely theoretical problem as we imagined in the
past.
In the future, could a bacterium infected with malware replicate itself?
In the hypothetical case that a piece of modified
DNA has been successfully synthesized, then the malicious code could form a
part of a synthetic cell capable of replicating itself autonomously in the
biological realm. The malware could even be “propagated” biologically, given
that bacteria inherently have all the equipment needed for reproduction.
Furthermore, the malicious code would not affect the carrier cell accommodating
it, but would use it to stay “alive” until its genome was sequenced in a
laboratory and regained its digital form in order to then activate itself on a
computer or device. However, pinpointing the correct location for this code is
a complex matter if biological propagation is to succeed. Here are some of the
areas where a malicious string could be inserted:
1. Irrelevant area: the malicious code enters an area of little importance; it
is likely to have no significant impact.
2. Area of a gene: if it enters a gene sequence and produces a mutation, two
possibilities arise. Either the mutation is lethal, in which case it may
disappear from nature without propagating itself, or the mutation is
beneficial or neutral, in which case the added portion may continue its
propagation.
3. Regulatory area: in this case, it could alter a gene, as in the second
scenario, or it could do nothing, as in the first.
As such, in the event that it does not produce a
lethal mutation, the malware and the synthetic carrier cell could form a kind
of “cybernetic commensalism,” to make a simple comparison to the kind of
symbiosis by which one participant obtains a benefit while the other one is
neither harmed nor benefits.
In the University of Washington’s research, the
emphasis is on sequencing a piece of DNA that has no biological
objective; it is not clear [to me] whether the biological propagation
scenario was dismissed on grounds of feasibility or complexity. I believe
that this scenario, as much as it sounds like science fiction, could be
another point to consider in the future.
Detecting malicious strings
As the information is coded into the sequence,
detecting malicious strings could be a complicated procedure. This is because,
regardless of whether an application is capable of identifying them,
establishing whether or not they belong to the structure of the sequence may be
no trivial matter, if the DNA in question has a biological objective (and has
not been published) or is used to store information or for other purposes.
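As a rough illustration of why detection is hard, the following Python sketch decodes a sequence in all three reading frames and looks for a few suspicious substrings. Everything in it is an assumption made for the example: the triplet table (a 64-symbol alphabet in which ACA = “A”), the `SUSPICIOUS` watch list, and the function names. A real scanner would not know in advance which encoding the attacker chose.

```python
from itertools import product
import string

# Hypothetical triplet table (an assumption): 64 symbols, shifted so ACA = "A".
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + string.digits + " ."
OFFSET = TRIPLETS.index("ACA")
DECODE = {TRIPLETS[(i + OFFSET) % 64]: s for i, s in enumerate(SYMBOLS)}

# Hypothetical watch list of strings that might indicate embedded code.
SUSPICIOUS = ("eval", "exec", "http", "bash")


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def decode_frame(sequence, frame):
    """Decode triplets starting at offset `frame`; unknown triplets become '?'."""
    return "".join(
        DECODE.get(sequence[i:i + 3], "?")
        for i in range(frame, len(sequence) - 2, 3)
    )


def scan(sequence):
    """Return (frame, word, base_position) for each watch-list word found."""
    hits = []
    for frame in range(3):
        lowered = decode_frame(sequence, frame).lower()
        for word in SUSPICIOUS:
            pos = lowered.find(word)
            if pos != -1:
                hits.append((frame, word, frame + 3 * pos))
    return hits
```

Note that with a full 64-symbol table every triplet decodes to some printable character, so any genome “decodes” to text in every frame. A hit only suggests a closer look; it does not prove intent, and a different table or a simple permutation of it would evade this scan entirely.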
Conclusion
It is interesting to see that this topic is finally
gaining more attention in the media and, possibly, among researchers and
specialists thanks to the research done by Tadayoshi Kohno and his team.
Despite the debatable elegance of the implementation (creating a vulnerability
in an application), one of the most important points from a security
perspective is gaining ground: the need to subject this topic to
greater scrutiny and spark an interdisciplinary discussion in
which IT and bioinformatics specialists, security experts, equipment
manufacturers, governments, and specialists in molecular and synthetic biology
come together.
In my opinion, given the rapid speed with which
sequencing devices are developing, and the dramatic reduction in costs,
successfully achieving security in DNA sequences will require a lot more work
than can be done by one research group and a few enthusiasts. Unfortunately,
until there are real-life cases or economic losses, it is likely that we will
not see anything more in the media than sensationalist articles predicting the
“genome-alypse.”
It is true that the feasibility is still low and
there is no reason to be alarmed, but we should also remember that with IT
security, waiting for an attack to happen before finding a solution has never
been a good strategy.
Disclaimer: Nothing presented here claims to be
exhaustive, and it may contain errors, considering the
interdisciplinary nature of the research and my background as a technician and
not as a biologist. Therefore, comments, suggestions, and improvements are
welcome in order to keep deepening and expanding this fascinating topic.