Some say we’ve only touched the tip of the iceberg when it comes to mining medical data. If we could put all this medical data into giant databases, we could find problems in service delivery, spot comorbidity patterns, do root-cause analysis, cure analysis, diagnosis analysis, yeegads, it is mind-boggling.
But that is a very popular fallacy: the idea that if we got all this data, we could data-mine the daylights out of it and discover god knows what. No, we can’t. It’s bad science. We would discover a lot of new relationships, but most of those discoveries would be wrong and misleading.
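Why would most discoveries be wrong? Because if you mine enough candidate relationships, chance alone guarantees a pile of "significant" findings. A small simulation (my sketch, not from the original; all numbers are illustrative) makes the point: test 1,000 purely random "exposures" against a purely random outcome, and roughly 5% of them will clear the conventional p < 0.05 bar anyway.

```python
import math
import random

random.seed(0)

n_patients = 200
n_variables = 1000  # candidate "exposures" mined from the warehouse

# For +/-1 coded data, |r| > 1.96/sqrt(n) corresponds roughly to p < 0.05.
threshold = 1.96 / math.sqrt(n_patients)

# The outcome is pure coin flips: by construction, NOTHING truly predicts it.
outcome = [random.choice((-1, 1)) for _ in range(n_patients)]

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

hits = 0
for _ in range(n_variables):
    exposure = [random.choice((-1, 1)) for _ in range(n_patients)]
    if abs(correlation(exposure, outcome)) > threshold:
        hits += 1

print(hits)  # roughly 5% of 1000: dozens of entirely spurious "discoveries"
```

Every one of those hits would look like a publishable finding in a registry trawl; every one of them is noise. This is the multiple-comparisons problem, and it is why mining first and asking questions later produces mostly false leads.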
To understand this, you have to step back to how we discover new cause-effect relationships in medical research. The only reliable way of doing that, with a few exceptions, is with prospective, controlled, and preferably double-blind studies. We subject one group to a potential cause (environment, drug, etc.), then measure the difference in outcome against another group not exposed to that factor. Since the effect we’re looking for can be the result of a myriad of factors, most of which are unknown (this is key!), we have to be sure that any other confounding factors are evenly distributed between the study and control group (think “common mode” rejection, if you’re into electronics) to make the effect from the input factor stand out.
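The “common mode rejection” point can also be shown numerically. Below is a small sketch of mine (illustrative numbers, not from the original): randomized assignment spreads a hidden risk factor evenly across the two groups, while the kind of self-selected assignment you get in retrospective data lets the confounder ride along with the “treatment”.

```python
import random
import statistics

random.seed(1)

# A hidden confounder we cannot measure (one of the "myriad unknown factors").
risk = [random.gauss(0, 1) for _ in range(10_000)]

# Prospective trial: assignment is a coin flip, independent of risk.
randomized = [random.random() < 0.5 for _ in risk]

# Retrospective data: higher-risk patients are somewhat more likely
# to have ended up in the "treated" group (self-selection).
selected = [random.random() < (0.7 if r > 0 else 0.5) for r in risk]

def group_gap(assignment):
    """Difference in mean hidden risk between treated and control groups."""
    treated = [r for r, a in zip(risk, assignment) if a]
    control = [r for r, a in zip(risk, assignment) if not a]
    return statistics.mean(treated) - statistics.mean(control)

print(group_gap(randomized))  # near zero: confounder evenly distributed
print(group_gap(selected))    # clearly nonzero: confounder mimics an effect
```

In the second case, any difference in outcome between the groups is partly the hidden risk factor talking, and no amount of after-the-fact statistics can fully untangle it, precisely because the factor was never measured.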
On the other hand, retrospective case studies have very low evidence value, since we have no control over confounding factors. If you check the UK NHS classification of medical evidence quality, those retrospective designs end up second to last on the desirability scale, at level II-3. The same goes for the US Preventive Services Task Force classification, where they sit at level C. You find both here:
Laws like HIPAA cripple these research teams, because the privacy and legality requirements apply no matter how many NDAs, contracts, and other things one signs. Other laws also hinder these misguided initiatives, namely the rule that patients can only agree to having their data used for a predefined purpose. It stipulates that if you want to do something with the patient’s data other than what you stated when you obtained informed consent, you have to ask for a new informed consent. The problem is that if you collect data before you have decided what you will do with it, this doesn’t compute. Rightly so. The phrase “if you don’t know where you’re going, you won’t know when you’ve arrived” comes to mind.
The short of it is this: many politicians, and far too many IT people who really should know better, have fallen for the lure of the data warehouse, thinking that if we only have enough data, we’ll find the holy grail, the truth, the answer to everything. We almost certainly will, in fact, but since we skipped the question, the answer will have no meaning. Think “42”.
But what’s worse, much worse, is that funds are being redirected from proper scientific research into the building of “Deep Thought”, and any results that come out of it, even when fed into “good” scientific studies, will largely be false leads.
We’ve already seen terrible examples of this in Sweden, from the national registries. Even when proven wrong, these results keep sucking funding from good research. It’s a bad, bad move.
Registry studies and data warehouse analysis are actually nothing more than anecdotal medicine on a grandiose scale, something we dumped with great satisfaction a couple of decades back, only to see it return as a computerized zombie that will soon proceed to eat our brains.