Two doctoral students in the department of biostatistics at the Columbia University Mailman School of Public Health beat the clock in their datathon debut and impressed the judges with their insights into the sharing economy. As runners ready themselves for the New York City Marathon, Ms. Shanghong Xie and Ms. Wodan Ling squared off in their own version of going the extra mile. Datathons, similar to hackathons, bring together computer scientists, engineers, and statisticians, challenging them to untangle complex datasets and solve problems for the thrill of victory, and sometimes, a substantial cash prize.
Ms. Wodan Ling and Ms. Shanghong Xie competed in a six-hour datathon sponsored by the financial services firm Citadel. From a pool of over 1,000 applicants, the Mailman School students were chosen to face off against 180 entrants challenged to uncover insights from real-world datasets supplied by participating companies.
Joined by another Columbia student and a student from NYU, Ms. Ling and Ms. Xie began by analyzing the structure of the data, scrubbing it to remove false information, and organizing the figures. These were critical and time-consuming steps leading up to the question they proposed to answer.
[Photo: Ms. Shanghong Xie (left) and Ms. Wodan Ling]
“Our first task was to pick an interesting question to answer,” Ms. Ling said. With time quickly passing, the team settled on a question: how could the company select its most crucial features in different scenarios, including season and geography? “It’s a simple but useful question,” Ms. Ling explained.
Ms. Ling and Ms. Xie tested and re-tested computer models, machine-learning techniques, and data-mining techniques, “training” them on the dataset to gather results for interpretation. At 3:00 p.m., it was laptops down, and each team's results were submitted to a panel of three judges from Columbia’s department of electrical engineering, Yale University’s department of statistics and data science, and AIG, the multinational insurance corporation. Although it was their first datathon, the Mailman competitors felt confident in their work, particularly in the strength of the question they posed with the dataset they were given.
“We knew our question would be interesting to the company,” Ms. Xie said. Even so, when the judges concluded their assessment, the students were caught off guard. Ms. Xie and Ms. Ling’s team was victorious, earning a $20,000 cash prize.
Both women saw the competition as a way to strengthen their skills as data scientists who develop statistical methodologies to solve health-related challenges in epidemiology, neuroimaging, and genetics. Likewise, they say, what they learned at the Mailman School helped them excel in the competition.
On November 27, their skills will be tested again as the team returns for the Citadel datathon finals, facing off against competitors from a wide range of schools including Carnegie Mellon, Duke, and MIT for the chance at another cash prize.
[Photo: Dr. DuBois Bowman]
Dr. DuBois Bowman, chair of biostatistics, was inspired to see two of his students take home honors. “It is encouraging that we attract such talented students at Mailman and provide advanced training on analytic skills with applications to real-world societal matters at a level where our students shine brightly in an open competition that cuts across all sectors.”