
Christine Cox / NBC News file
Researchers say genetic genealogy databases can be leveraged to unlock more sensitive genetic information.
Researchers have shown that it's possible to link your identity to supposedly secret genetic information about your predisposition to diseases, merely by analyzing family-tree databases and other publicly available information.
"It was quite surprising," said Yaniv Erlich, a genetic researcher at the Whitehead Institute for Biomedical Research. "When we got the first family, I was surprised. ... It's as if you opened a box that for a long time was locked."
Erlich led the research team whose work is being published in this week's issue of the journal Science. The team's study already has led to a tightening of security measures for federally sponsored genetic databases.
The security-cracking trick relies on the availability of genetic information linked to surnames in a variety of public family-tree databases. DNA samples from males can be tested to look at dozens of genetic markers on the Y-chromosome that change only rarely from generation to generation. If the markers from two individuals with the same surname are a close match, that's a tip-off that the two are closely related, even if they don't know each other.
Tens of thousands of people (including yours truly) make that information public in hopes that someone else will match up with their results. The genealogical markers aren't linked to disease or other specific traits. But under the right circumstances, they could provide an opening for links with other, more sensitive genetic information.
How the secrets were revealed
Erlich and his colleagues conducted a three-step process to see how easy it'd be to use that opening. First, they analyzed anonymous Y-chromosome data from a public database for the 1000 Genomes Project, to come up with the DNA coding for markers that are used for genealogical purposes. Then they compared those markers against entries in the two largest family-tree databases, Ysearch and the Sorenson Molecular Genealogy Foundation.
The researchers said their analysis projected a success rate of 12 percent for recovering the surnames of U.S. Caucasian males. Another 5 percent would theoretically be linked up with the wrong surnames. They said upper- to middle-class Caucasian males were easier to identify, presumably because they're more likely to participate in the family-tree databases.
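The marker-matching step can be pictured with a small sketch. This is only an illustration of the general idea, not the researchers' actual method: the surnames, repeat counts, distance measure and threshold below are made up, and the function names are hypothetical. Only the marker labels (DYS393 and so on) are real Y-chromosome marker names.

```python
from typing import Optional

# A haplotype maps a Y-chromosome marker name to its repeat count.
Haplotype = dict[str, int]


def marker_distance(a: Haplotype, b: Haplotype) -> Optional[int]:
    """Sum of repeat-count differences over the markers both profiles share.

    Returns None when the two profiles have no markers in common.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[m] - b[m]) for m in shared)


def infer_surname(
    query: Haplotype,
    records: list[tuple[str, Haplotype]],
    max_distance: int = 1,
) -> Optional[str]:
    """Return the surname of the closest record within max_distance, if any.

    max_distance is an assumed cutoff chosen for illustration only.
    """
    best: Optional[tuple[str, int]] = None
    for surname, haplotype in records:
        distance = marker_distance(query, haplotype)
        if distance is None or distance > max_distance:
            continue
        if best is None or distance < best[1]:
            best = (surname, distance)
    return best[0] if best else None


# Made-up genealogy entries and a made-up "anonymous" profile.
records = [
    ("Example-A", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}),
    ("Example-B", {"DYS393": 13, "DYS390": 22, "DYS19": 15, "DYS391": 10}),
]
anonymous = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}

print(infer_surname(anonymous, records))
```

A looser cutoff would recover more surnames but also produce more wrong ones, which is the trade-off behind the 12 percent and 5 percent figures quoted above.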
Once the surnames were identified, the third step was to look at other publicly available sources to go from the surname to a specific individual: Some genetic databases, for example, include information about the age and the state of residence of an anonymous participant, and even the number of children and their birth order. Those clues were added to information gleaned from other sources, ranging from public-record search engines to obituaries.
The researchers linked five specific individuals in three separate families with supposedly anonymous genetic records. The process took three to seven hours for each family pedigree, the scientists said. Then they traced those three family-tree pedigrees to find other connections between relatives and sensitive genetic data. "In total, surname inference breached the privacy of nearly 50 individuals from these three pedigrees," the researchers wrote.
"We show that if, for example, your Uncle Dave submitted his DNA to a genetic genealogy database, you could be identified," Melissa Gymrek, a member of the Erlich Lab and the Science paper's principal author, explained in a news release. "In fact, even your fourth cousin Patrick, whom you've never met, could identify you if his DNA is in the database, as long as he's paternally related to you."
What is to be done with data?
Erlich and his colleagues made a point not to reveal the identities of those individuals, and said they were not advocating a clampdown on the availability of genetic information.
"Quite the opposite," Erlich said. "We found the gene for two devastating pediatric disorders by analyzing the data in public databases. Using these databases, we gave hope to these families and to other parents. We don't want to take away these databases. ... What we really want to do here is to have this really mature conversation about privacy — to tell people we cannot completely protect the privacy, but also to tell them about the benefits."
For years, experts have worried that sensitive genetic data could be used to discriminate against patients, potential employees or would-be insurance customers. Such discrimination is illegal when it comes to employment or health insurance, but the law doesn't cover life insurance, disability insurance or long-term care insurance. Theoretically, an insurer could search through genetic records and turn you down because you have a genetic predisposition to, say, Alzheimer's disease.
In a Science policy paper, representatives of the National Human Genome Research Institute and the National Institute of General Medical Sciences at the National Institutes of Health said it was time to "re-examine how to balance the protection of research participants ... with the societal benefits likely to be gained through the enhanced research that broad data sharing facilitates."
They said NIH "acted swiftly to mitigate future risks" by working with the NIGMS' genetic repository to shift the data about the age of study participants out of public view and into a controlled-access area of the database.
"That reduces the risk," Erlich said. "It creates another fence."
And what about the genealogical genetic data? Max Blankfeld, vice president for operations and marketing at Family Tree DNA, said his company has been dealing with privacy issues for more than a decade — and doesn't expect the latest research to lead to policy changes. Family Tree DNA has been running the Ysearch database as a free public resource for a decade, but does not force any of its more than 400,000 participants to use it.
"People voluntarily post their information in that database, and therefore it has nothing really to do with the vast majority of the people who take the test and choose to have it protected by Family Tree DNA," Blankfeld said. "This data, we don't share with anyone."
In addition to Erlich and Gymrek, the authors of "Identifying Personal Genomes by Surname Inference" include Amy McGuire, David Golan and Eran Halperin. The work was supported by the National Defense Science and Engineering Graduate Fellowship, the Edmond J. Safra Center for Bioinformatics at Tel Aviv University, and a gift from James and Cathleen Stone.
The authors of the Science policy paper, "The Complexities of Genomic Identifiability," include Laura Rodriguez, Lisa Brooks and Eric Green of NHGRI and Judith Greenberg of NIGMS.
Alan Boyle is NBCNews.com's science editor, and also the administrator of the Boyle Surname Project at Family Tree DNA.


Wait, what happened? lol
Deep subject. There are genetic abnormalities that occur in children whose parents show no trace of them. So I would think that judging a person by their relatives' health, past or present, seems unfair. There is no guarantee that you will develop something that other members of the family did, and you could develop something new to the family. But I guess something like cancer, diabetes, or Alzheimer's might work in this situation, since those are common inherited illnesses.
Well then if we can now prove that we humans are related to each other, then we can move forward to treating each other more fairly, and with greater respect, and with infinitely improved willingness to cooperate in the joint human endeavor collectively called humanity.
If we can be linked via DNA to all of our fellow humans in this way, then we can better know about our ancestors back to the beginning. If this can be done then the next great step in human development would be to determine how our DNA and every other bit of DNA found in this planet, in its infinite diversity is related to us and finally to see most clearly that the Earth belongs to everyone.
(Or we can just continue to do what we are doing to spread hate, misunderstanding, and division and to murder both the bodies and the souls of all our precious children for every generation left to come.)
If the spirit moves you, then by any and all means move.
https://genographic.nationalgeographic.com/
This was a special on Nat Geo concerning the origins of humans and how we came to consist of different "races". In the special they show humans starting in Africa, and the migrations of different groups into different areas. They did genetic testing on all the volunteers and were able to tell them where their ancestors came from, etc. A really fascinating show. I will see if I can find the original program. The link above is where you can join to participate in the project, though at the time of the show they charged for the test.
We need to hook everyone up to pain sharing devices. That will straighten things out in a hurry.
This just amounts to a form of statistical profiling - which is just an educated guess backed by some degree of statistical methods applied to some data. One could use this to derive a probability factor for particular illnesses - but significant uncertainty would still exist for a particular individual.
I could think of a number of ways this could be abused. The most obvious: insurance & health coverage providers could use this method to screen existing subscribers (and applicants) - bypassing the need to notify anyone what they're doing and for what purpose.
One interesting legal implication: does the resulting data derived from such a practice trip over the boundaries of HIPAA regulations? At first glance (assuming this article is reasonably accurate), it seems to me that a case could be made that it does.
@ e-o-k
You are correct to be concerned! All it would take is a stroke of a pen (executive order, if you will) to link genetic databases to the MIB (Medical Information Bureau)!! The MIB is to the insurance industry as the credit bureaus are to the financial industry.
Scary Huh!!??
Just wait until the "murderer gene" or some other such thing is found, then we can jail everyone before they can commit a crime. Something to look forward to. Hope that you are not predisposed.
It seems that the Americans with Disabilities Act of 1990 could be used against such abuses.
Be leery of those who are overly interested in your genes.
Unless it's your personal trusted general practitioner.
Actually, most people asking for your DNA are researchers trying to get enough patients to reach statistical significance to determine if a disease has a genetic link.
Well, I am sure our government won't object to selling the federal prison DNA database. After all, we are broke and need any revenue available.
Is the author related to Robert Boyle or the Duke of Orrery?
If someone wants to get into my genes, I'll be happy to give them my number...
I'm sure our current form of Corporatocracy will use this genetic blueprint of our being in a careful and conscientious manner
save guns ban genetics
The authors of the study went to a lot of work to identify names of people behind the genetic profiles that are readily available. It seems like an exercise to raise unwarranted concerns about the use of genetic-based medical information. This does a disservice to an increasing number of individuals who assert their right to obtain their own genetic testing and have ready access to the data instead of having it locked away in a doctor's office. Current law in the US prohibits discrimination for health insurance coverage based on genetic disposition to certain diseases.
Too bad the original journal article cited is behind a pay-to-read firewall at Science Magazine. If there is another way to read the entire journal article it would be nice if the author could provide the link.
Yes; and No. . .
What a laugh this is for all the blue bloods. The rest of the world is Heinz 57. How in the hell are they ever going to get enough info to figure that out unless some fool gives them the key to their history?
I want to see what it says about Obama... this will prove he is not a US Citizen, and all the names he used during college were in fact him.
It will prove that he has the incompetent gene.
(sigh) Your genes will say something about your regional ancestry, but it won't say squat about where you were conceived or born. There's no 'Hawaii gene,' no 'Kenya gene.'
Nothing is safe with this administration's Big brother attitude!
Hackers are scary enough! But I worry about insurance underwriters getting their hands on genetic databases! You think insurance premiums are high now!!
By the way, can anyone tell me what "federally sponsored genetic databases" are all about?
Prisons, military, and research facilities.
In other words, if you don't post anything online or keep any digital records of your personal information then hackers can't do zip on you.
That said, this article talks about "can" and not "will" unless given a true intent (usually targeted). If hackers do this for fun... well, they're really no use to humanity.
The only thing that I can see such technology being used for is Genetic Brainwashing. Basically the monkeys in the cage would check your DNA and find traits, such as cancer, that you do not have but another relative might have had. The group wanting to control your life to benefit their superiority would then insist that, because you have the same DNA traits as someone else in your family who may have already died, you are in fact a reincarnated form of that person, and that you have to live your life based upon how that person lived theirs, thus creating a perfectly controlled society.
And how many of us are worth all that trouble? Really?
Sounds like a good sci-fi drama.