As a member of the strategy and analytics team here at Zion & Zion, I recently participated in a Kaggle competition with a group of other data scientists from across the country. I wanted to write this article to highlight the importance of Kaggle competitions, their influence in today’s organizations, and key takeaways from my participation. As crowdsourcing has revolutionized donations and funding, Kaggle has used the same concept to revolutionize project workflow, quality, and cost.
What is a Kaggle Competition?
Kaggle was founded in 2010 as an online stage for predictive modeling and analytics competitions. It uses a crowdsourcing approach to complete work and generate data strategy. Companies, governments, and researchers can post their data on the platform, where data scientists and analysts from around the globe, like myself, download it and compete to produce the best model.
Submissions are scored based on predictive accuracy relative to a hidden solutions file. The winners typically receive prize money or employment in exchange for an irrevocable and royalty-free license to use the winning entry’s model and model strategy. Essentially, all intellectual property is turned over to the host in exchange for prize money or employment, as specified in the particular competition’s guidelines.
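To make the scoring idea concrete, here is a minimal sketch of how a submission could be compared against a hidden solutions file. It assumes a simple classification task scored by plain accuracy; actual competitions each define their own metric, and the function name `score_submission` and the toy ids and labels below are hypothetical, not part of Kaggle's platform.

```python
def score_submission(submission, solution):
    """Return the fraction of solution rows the submission predicted correctly.

    Both arguments are dicts mapping a row id to a label. Rows missing
    from the submission count as wrong. This is an illustrative accuracy
    metric only; each real competition specifies its own scoring rule.
    """
    if not solution:
        raise ValueError("solution must not be empty")
    correct = sum(
        1 for row_id, truth in solution.items()
        if submission.get(row_id) == truth
    )
    return correct / len(solution)


# Made-up example: the host keeps `solution` hidden from competitors.
solution = {"a": 1, "b": 0, "c": 1, "d": 1}
submission = {"a": 1, "b": 1, "c": 1, "d": 1}
score = score_submission(submission, solution)
print(score)
```

Because competitors never see the solutions file, the only feedback they get is the score itself, which is what makes the leaderboard a fair comparison between teams.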
Kaggle is the largest and most diverse data community in the world, with over 536,000 users in 194 countries. It receives over 3,500 competition submissions per day. Data scientists who participate in Kaggle competitions come from diverse backgrounds, including computer science, public health, biology, psychology, anthropology, engineering, medicine, and more. Many of these researchers will turn their competition experience into papers for peer-reviewed journals.
Why is it important?
Since it was founded, Kaggle has held over 200 competitions across an array of industries and for an array of organizations, such as animal shelters, CERN (on the Higgs boson), Microsoft, State Farm Insurance, the City of San Francisco, Expedia, Facebook, and the NBA.
These competitions have not only advanced business and marketing intelligence for companies, but have also contributed to health research, recruiting, menu items and pricing, photo classification, traffic forecasting, election forecasting, budget projections for non-profits, and more.
What did I experience?
The particular Kaggle competition I participated in involved predicting the 2016 presidential primary results, state by state. My team consisted of two developers and two analysts. As an analyst, I was responsible for determining methodology, developing the strategy, and interpreting the results.
Some states were more straightforward to predict than others. The data sets provided on Kaggle included registered voters, party affiliations, and historical voting data. We added further factors to the model from secondary sources, such as a candidate’s air time, digital media presence, social media sentiment, and campaign cash flow. Because this was a learning algorithm, each variable’s weight was subject to change as the algorithm learned. Even so, some of the best predictors were registered voter counts and affiliation, candidate air time, and candidate cash flow. Social media sentiment was strongly affected by air time: as air time increased, sentiment shifted more sharply, which in turn increased its predictive power. Unlike previous elections, where air time tended to correlate positively with a candidate’s likelihood of winning, this election showed that more air time did not necessarily help. This was supported by negative social media sentiment following increases in air time for several candidates.
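For readers curious what checking the air time relationship might look like, here is a minimal sketch that computes a Pearson correlation between air time and social media sentiment. It is not our team's actual model: the `pearson` helper is written out by hand for illustration, and the numbers are made up purely to show a negative relationship, not taken from the competition data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures for one candidate (illustrative only).
air_time_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
net_sentiment = [0.4, 0.3, 0.1, -0.1, -0.2]  # positive minus negative share

r = pearson(air_time_hours, net_sentiment)
print(round(r, 3))  # a negative value: more air time, lower sentiment
```

A coefficient near -1 in a check like this would suggest that additional air time coincided with worse sentiment, though correlation on its own says nothing about cause.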
How can this help your business?
Using data science to examine correlation and build predictive models can help establish top-notch marketing campaigns for our clients. It can help determine which marketing efforts were most effective or carry the most weight in customers’ behavior. It can determine which media impressions drive more website visits or store visits. And it can help determine which imagery not only attracts customers’ attention, but also drives consumption, and at what frequency.
Our strategy and analytics team frequently takes advantage of opportunities to apply data science to our Zion & Zion clients. In addition to optimizing the marketing mix, focusing on correlation and predictive modeling through data science can help the client with revenue forecasting and with knowing what to expect operationally as their campaigns are rolled out.