Since high school, I have been interested in fractals and chaos theory. The infinitely complex patterns that recur throughout everyday nature, and the frustrating limits on how far ahead we can predict atmospheric and weather patterns, are as fascinating as they are approachable to budding mathematicians and physicists toying with simple non-linear equations that produce fantastically complex outputs. The recent Ebola outbreak in Africa opens up a whole new space of complexity that piques the data scientist's curiosity: infectious disease predictability. Given the advances in medicine, combined with recent massive data capture and computing capabilities, it raises the question of whether we can rein in, or at least more accurately predict, the transmission of viruses such as influenza. In many ways, the hysteria induced by dire Ebola predictions demonstrated a strong need for a deeper understanding of contagion, and for dedicated study that draws on all the advances in machine learning and large-scale data analysis.
Early this year, NPR published an article, "All Predictions About Ebola Are Unpredictable," that highlights some great questions we were all asking when the head of the USA's CDC got on television to announce a state of emergency last year: How bad is this going to get? And when is it going to end? The situation appears to be stabilizing at the moment, with the WHO releasing an update on the Ebola outbreak in West Africa saying, "Liberia has been declared free of EVD transmission for a second time (3 September 2015), the overall case incidence in Guinea and Sierra Leone has been below 10 cases per week, and the Sierra Leonean capital city of Freetown has remained free of EVD transmission for over 42 days." However, Scientific American reminds us in "Ebola Arises Again and Again" that the disease has re-emerged 23 times since it arose in 1976, with varying severity and seemingly without any sort of predictability. The NPR article nicely demonstrates the inability of various reputable medical bodies to make even remotely accurate predictions of the trajectory or severity of outbreaks. Why is this so, and does it have to be?
The article mentions the work of Columbia's Jeffrey Shaman, who is surely not shy when it comes to attacking the difficult area of infectious disease prediction. He led the team that won the 2014 CDC ‘Predict the Influenza Season Challenge’, a challenge to forecast the timing, peak and intensity of the 2013-14 flu season using digital data. Shaman has interesting demonstrations of the team's work on his website. By getting into the muck of the data, they have been able to tease out and explore solutions to many problems shared by data scientists in the business world:
- unreliable and biased information sources with uncertainty even on the sources themselves (getting, cleaning and labeling data)
- inability to accurately select and characterize pertinent parameters (e.g. with Ebola, which factors most affect transmission rates and probabilities in various regions: culture, medical infrastructure, population density, etc.?)
- a changing situation on the ground (e.g. the introduction of experimental vaccinations and better controls and procedures for Ebola)
I believe the influenza challenge is a fantastic idea and look forward to further progress in our understanding of what is possible. The sharing of Ebola data, the CDC's openness in sharing its techniques for Influenza Surveillance in the United States, and the Columbia team's advancements are all wonderful for DIY data scientists looking to start building and improving solutions.
The non-linear dynamics and complex nature of disease spread, combined with the grim and alarming potential of another deadly pandemic, make the study of infectious disease predictability even more interesting than chaos theory and fractals.
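To get a feel for that non-linearity, here is a minimal sketch of the textbook SIR (susceptible, infected, recovered) compartment model, integrated with a simple forward-Euler step. This is not the Columbia team's method or anything used by the CDC; the function name `simulate_sir`, the population size, and the `beta` (transmission rate) and `gamma` (recovery rate) values are all illustrative assumptions, not figures fitted to Ebola or influenza data. The point is only that a modest change in one poorly known parameter can shift the projected peak dramatically.

```python
def simulate_sir(beta, gamma, population=10_000, initial_infected=1,
                 days=365, dt=0.1):
    """Integrate the classic SIR equations with forward-Euler steps.

    beta  : assumed transmission rate per day (hypothetical value)
    gamma : assumed recovery rate per day (hypothetical value)
    Returns (peak_infected, total_recovered) over the simulated period.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered
    peak = i

    for _ in range(int(days / dt)):
        # The s * i product is the non-linear term that couples the groups.
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)

    return peak, r


if __name__ == "__main__":
    gamma = 0.1  # illustrative: average infectious period of about 10 days
    for beta in (0.12, 0.15, 0.20):
        peak, total = simulate_sir(beta, gamma)
        print(f"beta={beta:.2f}  R0={beta / gamma:.1f}  "
              f"peak infected={peak:,.0f}  total recovered={total:,.0f}")
```

Running the script prints the peak and cumulative case counts for three nearby transmission rates. Even in this toy setting the outcomes differ widely, which hints at why real-world forecasts built on uncertain, shifting parameters are so hard to get right.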
What are your thoughts? Is disease prediction different from weather pattern prediction? Is it possible to harness our data analysis capabilities to accurately understand transmission patterns and risk profiles? What area of research will be the secret sauce to success?
Created: October 10, 2015