The search for John Doe – Who’s running the queries (Algorithms) and wants to know
Posted Feb 24 2009 10:44pm
De-identification of database material all comes down to the query and who’s running it. As far as I know, it’s not illegal to run queries, and once matches are made, someone has the rap sheet on any one of us. De-identified databases are marketed and sold all the time for all types of purposes: genomic study, R&D, insurance research (the MIB), and so on.
The only thing one can count on for sure is the initial database. Matching and running queries (algorithms) is a way of life today with technology, so matches on healthcare and other records are not that difficult or time consuming to create. Again, it all depends on who is creating the query and what they intend to do with the information. This is one of the reasons I speak out about being careful with what you put on the Internet: it is more data to be mined and queried.
Computers do a real nice, quick job of that for us, as we can see every day when we are on the Internet. Some companies made millions and billions last year by creating matches when it came to money. Perhaps Ingenix, the recently closed down subsidiary of United Health Care, made use of some algorithms, as they also had drug store databases to see what drugs were being taken, again perhaps de-identified to be legal. The money in the whole game here is in the queries.
As technology continues to roll on, perhaps someday certain queries (algorithms) run against certain databases might need to be registered. That would be one big nightmare of sorts, unless the companies elected to provide audit tables of all software activity in that area to the SEC, as a simple for instance. That would require more data banks at the SEC and folks with IT skills to run and manage them. Maybe not a bad idea after all, with leaders at the SEC in IT who could understand the process.
You can read below and see an example of how easily matches were made to find the former governor of Massachusetts through a query matching date of birth, gender and ZIP code. It found him, even in data that had been stripped of identifiers such as names and addresses. Again, de-identified data is sold to the highest bidder in some instances and generates money for those that sell the databases. We really are for sale in one form or another.
Most folks in science probably don’t even give this any thought, as their work is to find cures for diseases. It’s not the same with others who market data and have financial gain at hand. So again, whose query is it after all, and what do they plan to do with the matches? That’s a very big question we should all ponder and not take for granted, as there’s been much of this going on behind the scenes for years. We need some folks in high positions who have the “hands on” knowledge here and know how to create solutions, an area where politicians fall short on technical expertise, except for promises. One thing is for sure: if you hang around this blog long enough, you’re bound to figure out what an algorithm is and how algorithms produce both data and revenue. We need to shop at the “smart” stores real soon for our leaders. Otherwise the folks who run algorithms for a profit keep socking the money away, and at times that can mean denying healthcare to individuals who may end up dying as a result. BD
A new era for medical privacy dawned in 1997, when a computer scientist named Latanya Sweeney showed she could identify then-Gov. William Weld of Massachusetts on a list of patients discharged from a hospital, even though the data had been stripped of identifiers such as names, addresses and Social Security numbers.
Using a publicly available list of registered voters, Sweeney zeroed in on Weld’s ZIP code in Cambridge, Mass., and matched dates of birth and genders on two lists downloaded from the Internet. Weld emerged as the only match.
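To make the matching step concrete, here is a minimal sketch of that kind of linkage query in Python. It is not Sweeney's actual code or data: the names, dates, ZIP codes, and diagnoses are all invented, and the function name `link_records` is a hypothetical helper. It only shows the idea of joining a named public list to a "de-identified" list on date of birth, gender, and ZIP code.

```python
from collections import defaultdict

# Hypothetical toy data. Every name and value below is invented for illustration.
voter_list = [
    {"name": "Alice Example", "dob": "1950-01-15", "gender": "F", "zip": "02138"},
    {"name": "Bob Sample", "dob": "1962-07-04", "gender": "M", "zip": "02138"},
    {"name": "Carol Placeholder", "dob": "1962-07-04", "gender": "F", "zip": "02139"},
]

# "De-identified" records: no names, but the quasi-identifiers are still present.
discharge_records = [
    {"dob": "1962-07-04", "gender": "M", "zip": "02138", "diagnosis": "condition A"},
    {"dob": "1971-03-09", "gender": "F", "zip": "02140", "diagnosis": "condition B"},
]

QUASI_IDENTIFIERS = ("dob", "gender", "zip")


def link_records(public_rows, deidentified_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where exactly one public row matches."""
    # Index the named list by the combination of quasi-identifier values.
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for row in deidentified_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        # Only a single candidate counts as a re-identification.
        if len(names) == 1:
            matches.append((names[0], row["diagnosis"]))
    return matches


matches = link_records(voter_list, discharge_records)
print(matches)
```

With this toy data, only the second voter lines up with a discharge record. The point is how little it takes: no names are needed in the medical list as long as the same three fields appear in both places.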
Sweeney said 87 percent of Americans could be similarly identified in a dataset even if it reveals only their birth dates, genders and ZIP codes. Lawmakers took her comments into account when they crafted the Health Insurance Portability and Accountability Act’s Privacy Rule, which took effect in 2003, nearly seven years after Congress passed HIPAA.
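A figure like that comes from asking how many people are the only ones with their particular combination of birth date, gender and ZIP code. As a rough sketch of that calculation, assuming a list of records with hypothetical field names `dob`, `gender` and `zip` (the sample rows are invented, and `unique_fraction` is an illustrative helper, not Sweeney's method):

```python
from collections import Counter


def unique_fraction(rows, keys=("dob", "gender", "zip")):
    """Fraction of rows whose combination of key values appears exactly once."""
    if not rows:
        return 0.0
    # Count how many rows share each combination of quasi-identifiers.
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if counts[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


# Hypothetical sample: two rows share a combination, two are one of a kind.
sample = [
    {"dob": "1980-05-01", "gender": "F", "zip": "90210"},
    {"dob": "1980-05-01", "gender": "F", "zip": "90210"},
    {"dob": "1975-11-23", "gender": "M", "zip": "90210"},
    {"dob": "1990-02-14", "gender": "F", "zip": "10001"},
]

print(unique_fraction(sample))
```

Anyone counted as unique here is exposed to the kind of match described above. Dropping or coarsening a field (for example, keeping only the birth year) lowers the fraction.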
Today, medical data is increasingly being stripped of identifying information and sold to the highest bidders. However, a growing number of mathematics and computer science experts are saying that such de-identified datasets lend themselves to re-identification with today’s advanced data-mining techniques.
For example, one of Malin’s students, Allison Beck McCoy, recently showed that laboratory test results such as blood sugar values could be linked to individuals in de-identified records. McCoy used a de-identified dataset available to researchers at the National Institutes of Health and matched it with de-identified lab data from a DNA databank at Vanderbilt.