[Update: If you’re interested in learning more you can also check out a new project that I’m involved with: DNASquirrel – helping you protect your genetic privacy when taking a 23andMe, Ancestry, or other direct-to-consumer genetic test.]
I’m thinking about sending my DNA in to 23andMe anonymously. I’ll explain why, and how I plan on doing it.
I’m curious about what 23andMe and similar direct-to-consumer (DTC) genetic testing services can tell me about my health, about where my ancestors called home, and about what makes me unique (or not) as a human. Having spent the last decade working as a scientist in the field of medical diagnostics, I’m perhaps even more curious to get my hands on the ~700,000 raw data points that they will extract from my DNA. Most of this data won’t actually be used to create my personalized health and ancestry reports, but nevertheless contains a treasure trove of information about me and my family.
For example, I’m curious about creating DIY genetic tests for predicting risk of lymphedema.
But this unprecedented insight into my genetic makeup comes at a cost, and it’s not just monetary. Ownership of my DNA is a big concern for me, and this places me at direct odds with what DTC genetic test providers are really after: not just my $99 USD, but permanent ownership of my personal genetic blueprint.
Why do I want to take 23andMe anonymously?
I’m wary about giving my personal genetic blueprint (my ‘genome’) to a corporation, especially a privately held corporation whose business model rests not on consumer sales, but on a potentially much larger secondary market. This secondary market currently consists of selling the so-called ‘anonymized’ genomes of its customers to big pharma for research purposes, but in the future could involve something more.
To be fair, 23andMe and similar companies explicitly ask customers to opt-in to ‘sharing’ their data with third-parties for research purposes, and it seems that about 80% of their customers agree to this. It’s also FANTASTIC that these genomic resources are being created and shared, and they will no doubt lead to life-changing therapies for debilitating conditions.
But would these 80% of people feel less inclined to opt-in if they understood that:
- You are contributing to an incredibly valuable corporate asset that will be sold to third-parties. 23andMe’s genetic database stands to generate significant revenue both now (such as a $300 million drug development deal with GlaxoSmithKline in 2018) and in perpetuity. You do not know who these third parties will be, you will not benefit financially, and once you have opted-in you can never completely opt-out.
- Your genome contains a LOT more information than we currently understand, and what we currently understand is also a lot more than what 23andMe reports back to their customers in the form of ancestry, disease predisposition and physical attribute information. What about 10 years from now? Is it possible that genetic analysis could one day offer far deeper and more personal insights into who we are? It is certain to offer greater insight into disease risk (heart disease, mental health, cancer, etc.) and life-expectancy, but possibly also sexual orientation, behavioral tendencies (such as risk tolerance), spirituality, IQ, and more. This makes it a uniquely powerful source of information about you – information that is currently (and legally) used for medical, insurance and law enforcement purposes in the United States and elsewhere. What future applications will arise?
- Your genome contains a massive amount of information about your family as well. Not just your immediate family, but distant cousins that you’ve never met and family yet to be born. Exposing your genetic information exposes theirs as well – and this can’t be undone. In the United States or other jurisdictions this might have significant unforeseen consequences such as affecting their insurability or making them easier targets of law enforcement. As individuals do we have the right to expose private information about our family members?
- Ensuring your (and your family’s) genetic privacy is not currently possible. I’ll explain below why 23andMe and similar companies are unable to make these promises, despite assurances that your genetic privacy is safeguarded.
- Losing your (and your family’s) genetic privacy is irreversible. You have one genetic blueprint, parts of which you share with your family members. Unlike your email password, this information cannot be changed, it is who you are. When you sign up for their service you agree to allow them to permanently retain your genetic information along with various other bits of information they have collected on you – even if you close your account with them. Your genetic privacy will forevermore rest in their hands.
I would expect that 99% of their current customers are unaware of the above, so are these customers really offering informed consent? Even if they understood the above, informed consent requires a reasonable understanding of what is going to be done with your personal information and its associated risks – which is not yet clear.
What about the 20% of customers who opt-out of ‘sharing’ their genetic information with third-parties? Certainly, these customers’ genetic privacy is safeguarded, right? And for those who do ‘consent’ to sharing, the company claims to do this anonymously, so why am I so paranoid?
Doesn’t 23andMe already protect my privacy?
Heck, they even have a page on their website titled “Can I be genotyped anonymously?”.
The short answer is yes, they most definitely will try to protect your privacy, to the level that they are currently legally required to, and according to the 23andMe privacy statement and terms of service – which incidentally they have the right to alter at any time as long as they inform you of the changes.
Feeling secure? Keep reading.
I have no doubt that the people working at this company and the many others like it have the best intentions to maintain your genetic privacy; after all, they want consumers to keep sending them their DNA. But are 23andMe’s best efforts and ‘best practices’ enough? Do companies know how to safeguard customer information? Relentless data breaches at countless major technology companies suggest otherwise. Even so-called ‘unhackable’ blockchain technology is now getting hacked. Closer to home, 23andMe does not require two-factor authentication (such as by texting a passcode to your phone) before allowing you to download your raw genetic data from their website – which is something my Gmail account now requires if I wish to adjust my settings.
“In the event of a data breach it is possible that your data could be associated with your identity, which could be used against your interests.”
For which I’m sure they will offer you a heartfelt apology for the irreversible and unprecedentedly thorough loss of your and your family’s privacy.
But I’m actually far more concerned about something more profound than routine data breaches: does 23andMe really understand how to protect genetic privacy? Does any business? Does anyone?
Do even scientists and policy makers know how to safeguard my genetic privacy?
Steady strides are being made in legislating better business practices for safeguarding consumer privacy, but genetic privacy is an entirely different beast. Scientists and policy makers appear to still be largely unaware of how easily identifiable your genetic information really is, of how best to protect it, and of how consequential, far reaching and irreversible any loss of genetic privacy would be.
For example, a study by Dr. Yaniv Erlich and colleagues published in 2013 (ref 1) demonstrated that ‘de-identified’ (i.e. ‘anonymous’) genetic data of people who donated to the 1000 Genomes Project could be easily re-identified (i.e. the donors’ names and addresses could be found) using a genealogical test called ‘Y chromosome surname inference’. This finding prompted the 1000 Genomes Project to remove donors’ ages from their public database – one unnecessary piece of information that researchers used to help expose the donors’ true identities.
In another example, in 2013 the European Molecular Biology Laboratory (EMBL) published the genetic profile of a particularly famous experimental human cell line known as HeLa cells. In doing so they exposed personal genetic information about the family of the donor. After being publicly ridiculed for their error EMBL responded by removing the genetic information from public access.
Policy makers also appear to have been caught flat-footed by the special privacy concerns created by genetic information. The Health Insurance Portability and Accountability Act (HIPAA) sets the rules that govern how patient information is protected in the medical field. But HIPAA does not consider certain patient demographic information, such as age or state of residence, to be ‘identifiable’ information. Nor are direct-to-consumer genetic testing companies like 23andMe even required to follow HIPAA patient privacy rules. Broadly speaking, it appears that genetic information is currently not widely considered to be information that can be used to identify an individual (ref 2,3) – a perception that is obviously outdated.
One helpful first step would be to make the practice of re-identifying individuals based on their genetic information illegal.
How easy would it really be for someone to identify me from my genetic profile?
If the genie wasn’t already out of the bottle, it certainly escaped in April 2018. Law enforcement revealed that they had determined the identity of the Golden State Killer, the subject of a 32-year-old cold case, in 4 months using crime scene DNA and a publicly available genealogy database. This database, called GEDmatch, contains the genetic profiles of around 1 million people. Detectives used the DNA to identify a third-degree cousin, information that helped expose the identity of the suspect. Law enforcement agencies are now eagerly employing this approach to solve other cases, both cold and warm (ref 2).
But is it reasonable to think that people or organizations outside of law enforcement would be able to pull off a similar feat using your genetic information?
Wouldn’t this be unreasonably difficult, or rarely successful? After all, the database used in the Golden State Killer case contained the genetic profiles of only about 1 million people. What are the chances that I have a cousin who donated their DNA to this database? Compared to a population of 325.7 million Americans, finding one person based on their DNA alone would be like finding a needle in a haystack.
A second study by Dr. Erlich and his team published in November 2018 asked just this question (ref 2), and the results are equal parts fascinating and unnerving.
Their study was broken down into two parts. First, they tried to determine how likely it is that an existing genetic database would contain a distant cousin of a target of interest (say, for example, you). Second, if they did find your distant cousin, they asked how hard it would be to use that information to unmask your identity.
Part 1: How likely is it that existing genetic databases contain at least a distant cousin of mine?
The first question they asked was how likely it is that a genetic database would contain a third-degree or closer relative of a person of interest (for example, you), as occurred in the Golden State Killer case.
Studying a database of 1.28 million genetic profiles of people of primarily Northern European descent (a database belonging to MyHeritage, a direct-to-consumer genetic testing service similar to but much smaller than 23andMe), Erlich and colleagues determined that about 60% of the time they could find a third-cousin or closer match for a person of European descent. Hardly a rare event if you happen to be a white American.
Even more impressive, they roughly estimated that a genetic database containing as little as 2% of the adults of a population would contain at least a third-cousin match for 99% of the people in that population. For Americans of European descent, this would require a database of only about 3 million genetic profiles, or just over twice the size of the database they had.
In other words it’s likely that 23andMe, with their database of over 5 million people, already has the genetic profile of at least one of your third-cousins or some closer relative.
In fact, there’s also a reasonable chance that the publicly available GEDmatch database does too. And if either of these databases doesn’t now, it will soon; growth of the direct-to-consumer genetic testing market is impressive.
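To get a feel for why a database covering only 2% of a population can be enough, here is a back-of-the-envelope sketch. It is a toy model of my own, not the model used in the study: it assumes each of your third-cousin-or-closer relatives ends up in the database independently, with probability equal to the database's coverage. The function names are made up for illustration; the 2% and 3 million figures are the ones quoted above.

```python
import math


def match_probability(coverage: float, n_relatives: int) -> float:
    """Chance that at least one of n_relatives is in the database.

    Toy assumption (not the study's model): each relative is in the
    database independently with probability equal to `coverage`.
    """
    return 1.0 - (1.0 - coverage) ** n_relatives


def relatives_needed(coverage: float, target: float) -> int:
    """Smallest number of relatives giving at least `target` odds of a match."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - coverage))


if __name__ == "__main__":
    coverage = 0.02            # database holds 2% of the adult population
    database_size = 3_000_000  # "about 3 million genetic profiles"

    # Adult population implied by "2% is about 3 million profiles".
    print(f"implied population: {database_size / coverage:,.0f}")

    # Relatives this toy model needs for a 99% chance of at least one match.
    print("relatives needed:", relatives_needed(coverage, 0.99))

    # How quickly the odds climb as the number of relatives grows.
    for n in (10, 50, 100, 200, 300):
        print(n, round(match_probability(coverage, n), 3))
```

Under these simplifying assumptions you would need a couple of hundred third-cousin-or-closer relatives for the 99% figure to hold. How many such relatives a typical person has is not something I take from the study, so treat this only as intuition for why the required database is so small.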
Part 2: How likely is it that a match to my distant cousin could expose my identity?
What does it mean to be able to match your genetic profile to that of your distant cousin? A lot. Using public records one could then identify members of your family tree with surprising accuracy. The authors of this study conservatively estimated that, on average, finding a distant relative of their target of interest would allow them to reduce their list of ‘suspects’ from 325.7 million Americans to just 855. This is not unlike what occurred in the Golden State Killer case, where genealogical analysis narrowed law enforcement’s search down to a family tree of 1000 people.
But that’s still a group of 855 people that you are hiding in.
The researchers then tried to determine if there were other pieces of relatively easily-accessible information that might be used to narrow this haystack down further, from 855 to 1. It turns out that a bit of basic demographic information is all that is needed: your gender, your age, and a very rough idea of where you live.
Your gender can be determined from your DNA, and your age is not typically a protected piece of information. This is the case with 23andMe: neither piece of information, nor your ethnicity, is stored separately from your genetic profile according to the research consent form found on the 23andMe website:
“Your genetic data and any other personal information you enter into the website, except for your Registration Information (name, contact information, and credit card information), may be analyzed in the research.”
The authors assumed that the person looking to unmask your identity could reduce the physical search space down to a 160 km (100 mile) radius. If you were a criminal who left his/her DNA at a crime scene this is a fairly easy leap to make, since most crimes are committed close to home. But how could your approximate location be connected to your genetic profile at 23andMe?
Your approximate geographical location may be revealed if:
- Your registration information with 23andMe is breached in an ever-so-common hack. 23andMe describes storing registration information separate from their customers’ genetic profiles, to help reduce the risk that customer name and address could be connected to their genetic profile.
- Your geographical information such as state, city or zip code is stored with your genetic profile (making it less secure in case of a hack), and/or is included with your genetic profile when it is shared with third parties (whose data protection practices are unknown). Since high-level geographical information is often considered to be ‘pseudo-anonymous’ it may not be appropriately safeguarded. HIPAA privacy rules require that city and zip code information be safeguarded, but not state information. 23andMe does not have to follow HIPAA (nor do they claim to), and even if they did, there are more than a handful of states with a diameter less than 100 miles.
- Your IP address was captured while visiting a website, or via an email exchange. Depending on where you live and your internet provider, this can offer VERY precise location information (see for yourself here: https://whatismyipaddress.com).
- Postal information was captured when your DNA sample kit was mailed in to the lab. Even if you didn’t include a return address, packages are postmarked with the date and location of the post office they were mailed through.
The authors found that incorporating information about their target’s age, gender, and approximate location (to within a 100-mile radius) reduced their list of suspects from 855 to 1-2.
To give you an idea of the sensitivity of this approach, if instead they could only estimate your age to within a 10-year range, they would end up on average with a group of 16-17 people that their target is hiding in, instead of 1-2.
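One way to see how much each piece of information is worth is to express the shrinking haystack in bits: every bit halves the pool of candidates. The sketch below only restates the numbers quoted above. The midpoints I use for the "1-2" and "16-17" ranges are my own choice, and the function names are made up for illustration.

```python
import math

# Figures quoted from the study as described above.
US_POPULATION = 325_700_000
AFTER_COUSIN_MATCH = 855
AFTER_DEMOGRAPHICS = 1.5   # midpoint of the "1-2" range (my assumption)
AFTER_COARSE_AGE = 16.5    # midpoint of the "16-17" range (my assumption)


def bits_revealed(before: float, after: float) -> float:
    """Bits of identifying information gained when a candidate pool
    shrinks from `before` people to `after` people."""
    return math.log2(before / after)


def bits_to_single_out(population: float) -> float:
    """Bits needed to single out one person from `population`."""
    return math.log2(population)


if __name__ == "__main__":
    steps = [
        ("distant-cousin match", US_POPULATION, AFTER_COUSIN_MATCH),
        ("age, gender, 100-mile radius", AFTER_COUSIN_MATCH, AFTER_DEMOGRAPHICS),
        ("same, but age known only to 10 years", AFTER_COUSIN_MATCH, AFTER_COARSE_AGE),
    ]
    print(f"bits needed to single out one American: "
          f"{bits_to_single_out(US_POPULATION):.1f}")
    for label, before, after in steps:
        print(f"{label}: pool {before / after:,.0f}x smaller, "
              f"{bits_revealed(before, after):.1f} bits")
```

Seen this way, the cousin match does most of the work, and a handful of seemingly harmless demographic details covers nearly all of the remaining distance.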
Interestingly, in contrast with the four-month long search that it took law enforcement officials to find the Golden State Killer using genealogical information, the authors estimated that their strategy for re-identifying a person based on their genetic profile would take a little more than one day of work.
Larger and/or combined genetic databases, the availability of social media data, advancements in genetic and online search methodology, and computer automation will only make this strategy more effective and faster to execute. Incidentally, it’s also relatively straightforward to discern eye and skin color from genetic profile information, and with more sophisticated whole genome analysis, perhaps eventually height, weight, facial structure, and more (ref 4).
What if I delete my account with 23andMe after I get my results? Will this protect my privacy?
If you wish to follow 23andMe’s terms of service to the letter (I suggest an alternative strategy below), your best chance at protecting your genetic privacy would be to:
- Opt-out of sharing your genetic profile to reduce the risk that unnamed third-parties intentionally or unintentionally do you harm.
- Opt-out of having your sample stored to reduce the risk that a more revealing genetic analysis might be done on your sample without your permission – such as full genome sequencing.
- Close your account with them after collecting your reports and raw data. This prevents you from benefiting from any additional genetic analysis they may offer you in the future if you retained an active account. You can read about closing your 23andMe account here. However, according to the 23andMe privacy statement they will nevertheless retain all data derived from any genetic tests they have already performed on your sample, along with “other information” which we know includes at least your gender, birth date and email address – information that could still be used to identify you.
Below I’ll describe a more robust strategy for protecting your genetic privacy while using this or any other direct-to-consumer genetic testing service.
Is it really possible to get anonymous 23andMe results?
Truly anonymous 23andMe results? No. But you can definitely obtain your results MORE anonymously – by stripping away as much accompanying information about you as possible, thereby making it less likely and/or too onerous for someone to reasonably identify you based on your DNA. This is our goal.
What key information must I keep safe to preserve my genetic privacy?
As we saw above, if someone had your genetic profile along with some relatively basic demographic information about you, and an internet connection, they could likely determine your name with surprising accuracy.
So what information should you keep hidden? As many pieces of information about you as possible. Each piece of demographic information you allow to be associated with your genetic profile will significantly shrink the haystack that you’re hiding in.
You won’t be able to hide your gender because this is already known by 23andMe based on the presence or absence of genetic markers (called SNPs) found on the Y chromosome carried only by men. But hiding everything else, including your name, age and location information is certainly necessary to preserve your privacy.
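To illustrate how little effort this takes, here is a minimal sketch that counts Y-chromosome genotype calls in a raw data file. I am assuming a tab-separated layout of rsid, chromosome, position and genotype, with '#' comment lines and '--' marking a missing call; check your own file before relying on that. The function names, the 50% threshold and the sample rows are all made up for illustration, and this is not a description of how 23andMe itself determines gender.

```python
import io
from collections import Counter


def count_y_calls(lines):
    """Count called vs. no-call genotypes on the Y chromosome.

    Assumes tab-separated rows of rsid, chromosome, position, genotype,
    with '#' comment lines and '--' meaning "no call".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip anything that doesn't fit the assumed layout
        _rsid, chromosome, _position, genotype = fields
        if chromosome == "Y":
            counts["no_call" if genotype == "--" else "called"] += 1
    return counts


def looks_like_y_present(counts, min_fraction=0.5):
    """Crude heuristic: at least `min_fraction` of Y markers have a call.

    The 0.5 threshold is arbitrary and chosen only for illustration.
    """
    total = counts["called"] + counts["no_call"]
    return total > 0 and counts["called"] / total >= min_fraction


# Made-up rows for illustration only (not real data).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\tY\t2000\tA\n"
    "rs0000003\tY\t3000\tC\n"
    "rs0000004\tY\t4000\t--\n"
)

if __name__ == "__main__":
    counts = count_y_calls(io.StringIO(SAMPLE))
    print(dict(counts), looks_like_y_present(counts))
```

To try it on a real file you would pass an open file handle instead of the sample string, for example `count_y_calls(open("my_raw_data.txt"))`, where the filename is hypothetical.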
A step-by-step approach to anonymous 23andMe testing
- Purchase a kit
- BASIC PRIVACY: Purchase your kit through Amazon (here if you’re Canadian, or here if you’re American) rather than have it mailed to you directly from 23andMe. This is a reasonably good way to ensure that your name and home address do not end up connected to your genetic profile in 23andMe’s database, but it is not a perfect solution. Even sellers of products that are fulfilled by Amazon (as in this case) receive the full name and address of each purchaser. But it is unlikely that 23andMe links this information to the specific kit purchased (and its associated kit registration number), or that they cross-verify purchaser information against the registration information used when the kit is registered on their website. After all, I could be purchasing the kit as a gift for a friend. One way to further increase your anonymity would be to have a friend purchase the kit for you through their account. Disclaimer: we will receive a small commission if you make a purchase through the Amazon links above. We use any commissions to help support our blog – so thank you :)
- STEALTH MODE: Ask a friend to purchase you a kit through Amazon, and have it mailed to their home address. If you are in the US you could also purchase a kit in person from Walmart, CVS Pharmacy, Walgreens or some Best Buy locations (learn more on 23andMe’s website here). This is just the collection kit, so it will appear to be less expensive. When you register the kit on the 23andMe website you will be asked to pay a secondary “lab fee”, so the price works out the same as if you bought it directly. Pay for the kit in cash to further protect your identity. If you’re feeling concerned that 23andMe might keep a record of which kits are sent to each store (and could therefore determine your approximate location), have a friend who lives in another city buy one and send it to you.
- Mail in your sample
- BASIC PRIVACY: Collect your saliva in the collection tube and put it in the return envelope. Don’t put your return address on the package. It’s unlikely that the laboratory receiving it would record this information (or the postmark, which shows which post office it was mailed from), but there is no need to include your return address.
- STEALTH MODE: Definitely do not include a return address. Mail your sample from a post-office that is not near where you live (more than 100 miles away would be nice). Perhaps mail it to your friend who lives elsewhere and have them mail it in for you instead.
- Register your kit with 23andMe
- BASIC PRIVACY: You need to register your kit on the 23andMe website before you can get your sample analyzed and collect your results. 23andMe does collect and store IP addresses for marketing purposes, and this includes information about your location. However, your IP address is not likely to be cross-referenced or otherwise stored with your account information. Enter a fake name (especially last name) and age, and an address nowhere near where you live (since they won’t be mailing you anything anyway). Enter an email address that does not include your name. Do not provide them with any optional information or fill out any surveys. If you wish to opt out of allowing your genetic profile and DNA sample to be used in research, do so. This protects your so-called ‘anonymized’ genetic profile and DNA sample from being shared with third parties and decreases your risk of inappropriate use or data breach. You can also opt out of having them search for any family relatives, but this doesn’t change much from a privacy perspective since it is very likely that 23andMe automatically includes every genetic profile in their genealogy database. If you decide to opt in to looking for your relatives, you will have control over what information is shared (if any) with anyone who finds you through this service and tries to reach out to you. If you bought your kit in person you will also need to pay the lab fee through their website. To do this you should use a “non-reloadable” prepaid debit card registered with the same fake information you used to register on the 23andMe website (this only works if the card is not a “reloadable” one).
- STEALTH MODE: Log into their website using a VPN server to protect your IP address. There are still some free VPN services that are trustworthy (read a review here), but usually you get what you pay for. If you are very serious about your security, use an email address that you created without using your real name, and that didn’t require you to verify your identity using your phone number (this is becoming increasingly hard to find, but one good example is Tutanota). When you register with 23andMe, use your fake identity, choose an age that is at least 10 years different from your true age, opt out of sharing your genetic profile and DNA sample for research purposes, and opt out of participating in their relative finder service.
- Get your anonymous 23andMe results
- BASIC PRIVACY: Log into their website and retrieve your results – download the reports they offer you as well as your anonymous 23andMe raw data so that you have it for your own purposes now and in the future. Close your account if you wish to not receive any additional analysis of your results that become available in the future. This will remove some of your fake information from their database, but not everything, and also means that your DNA sample will be pulled out of their freezer and destroyed. If you don’t close your account you can still ask to have your DNA sample destroyed. By one unconfirmed account this may be cumbersome to do, but worth it to avoid the possibility of further (and possibly more revealing) unauthorized analysis of it.
- STEALTH MODE: Log in to view your results through your VPN server. Download your reports AND your raw data so that you can use it later. Close your account with 23andMe.
If you’re really going all-in on protecting your privacy, by now you owe your friend a beer! You might also wish to take the opportunity to explain to them why you must now unfriend them on Facebook – just kidding (sort of).
Is this strategy bullet-proof?
No. The goal of this anonymization strategy is to:
- Make it highly improbable that your identity could be accidentally linked to your genetic information if 23andMe is ever hacked or inadvertently leaks identifying information to a third-party (use the BASIC PRIVACY approach to protect against this).
- Make it highly improbable that your identity could be purposely revealed by 23andMe or a third-party for harmful purposes (use STEALTH MODE to protect against this).
Keep in mind that if you are a person of considerable interest to a law enforcement agency you should avoid sharing your DNA with anyone who manages a genetic database. But this doesn’t mean that your DNA can’t still be used against you. If these agencies have recovered a sample of your DNA from a crime scene and can make a genetic match to a distant relative of yours they will have decent odds of tracking you down regardless – so consider turning yourself in.
Is anonymizing my 23andMe results in this way illegal? Unethical?
Let me preface this by stating that I’m not a lawyer, ethicist, clergyman, or your mom, so feel free to follow your own path.
It is not illegal to submit inaccurate information about yourself to a corporation in this way. With that said, their terms of service also give them the right to ban you from their service and deny you a refund.
Would it be unethical to mislead 23andMe about my identity? It’s not clear to me how obscuring my identity from them would cause the corporation measurable harm. Conversely, offering them my personal information could potentially expose innocent people (me, my children and my extended family) to significant and irreversible harm either now or in the future. Since lying on their forms isn’t likely to undermine the importance of truthfulness in our society, I think we’re all good.
Of course, I will still maintain an active email address with them so that they could always reach me in the future should a necessary (and legal) reason arise – like perhaps to inform me of a data breach.
Am I going to get my 23andMe results anonymously?
I’m still considering it.
The direct-to-consumer genetic testing environment is still in the ‘wild-west’ stage: no one really knows what our genomes contain about us nor how businesses and governments might wish to capitalize on this information; appropriate genetic privacy standards do not exist; major corporations appear unable to safeguard user information from hackers; and so-called ‘anonymous’ genetic profiles combined with other ‘non-identifiable’ demographic information are sufficient (or soon will be) to irreversibly link your identity to your genetic profile. You have only one genetic blueprint, and I see a future where some of us have retained our ownership, while others have irreversibly lost it.
In the meantime, I recently found out that a family member of mine has just taken the ancestry.com test (yet another horse has left the barn!) and so I’m going to spend some time exploring their data first. I’ll keep you posted.
What do you think? Please comment below, in particular if you have suggestions for improving my BASIC PRIVACY and STEALTH MODE strategies for anonymous 23andMe testing.
- Gymrek M., McGuire A.L., Golan D., et al. Identifying personal genomes by surname inference. Science. 2013 Jan 18;339(6117):321-324.
- Erlich Y., Shor T., Pe’er I., et al. Identity inference of genomic data using long-range familial searches. Science. 2018 Nov 9;362(6415):690-694.
- Schwab A.P., Luu H.S., Wang J., Park J.Y. Genomic Privacy. Clin Chem. 2018 Dec;64(12):1696-1703.
- Lippert C., Sabatini R., Maher M.C., et al. Identification of individuals by trait prediction using whole-genome sequencing data. Proc Natl Acad Sci USA. 2017;114:10166-10171.