We're quite happy to present our first guest Canvas blog post! Edan Shahmoon from Micro Focus was kind enough to share his experience with Canvas. Have a similar story you'd like to share? Feel free to tweet me at @alexf_elastic.
While watching the World Cup games, some questions came to mind. Are there any patterns in the timing of goals? Does the number of fouls affect the chances of winning or losing? There were many other questions that a professional football player wouldn’t ask, but we will!
I chose Elasticsearch for this project for its ability to analyze time series data, and Kibana to visualize the data and turn it into knowledge. I had also heard about Canvas, and I saw this project as an opportunity to experiment with it. So, let’s get started!
The data source for this project is “world_cup_json”, a JSON-based REST API for World Cup statistics.
When I found this API, I thought that the ETL process would be a piece of cake. I would just have to upload these JSONs into Elasticsearch and, thanks to the dynamic mapping of Elasticsearch, my data would be ready for analysis.
I created a Python script which implements this logic, prepared a Dockerfile to handle the dependencies of this script elegantly, and it worked!
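As a rough illustration of the upload step, the sketch below builds the newline-delimited body that the Elasticsearch `_bulk` API accepts. It is not the script from the project: the function name `to_bulk_body`, the index name `worldcup-matches`, and the sample documents are all assumptions made for this example, and sending the body to a cluster is left out.

```python
import json


def to_bulk_body(docs, index_name):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each document becomes two lines: an "index" action line and the
    document source itself. Depending on the Elasticsearch version,
    the action line may also need a "_type" entry.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline; an empty input yields an empty body.
    return "\n".join(lines) + "\n" if lines else ""


# Made-up sample documents; the real API responses are much richer.
sample_matches = [
    {"venue": "Example Stadium", "home_team": "A", "away_team": "B"},
    {"venue": "Another Stadium", "home_team": "C", "away_team": "D"},
]

bulk_body = to_bulk_body(sample_matches, "worldcup-matches")
print(bulk_body)
```

With dynamic mapping, no index mapping has to be defined before posting such a body, which is what makes this first pass so short.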
I could see all the data in Elasticsearch, but that wasn’t good enough. The raw data contained lists (or multilevel objects) whose items I wanted to analyze as standalone documents.
Normalization – Since my data is structured and relatively flat, I decided to be “conservative” and to transform it as I would for a traditional SQL database. I decomposed the original documents into smaller documents which represent the entities:
- Matches – General information about a match (location, weather, teams...)
- Team stats – The statistics of each team in each match (goals, corners, offsides…)
- Events – Information about events like goals, cards and substitutions.
These steps eventually helped me to analyze the data with the new SQL feature.
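The decomposition described above could be sketched roughly as follows. The field names (`id`, `venue`, `home`, `away`, `stats`, `events`) and the sample match are hypothetical and chosen only for illustration; the real API uses its own schema.

```python
def normalize_match(raw):
    """Split one nested match document into flat match, team-stats and event documents."""
    match_id = raw["id"]
    match_doc = {
        "match_id": match_id,
        "venue": raw["venue"],
        "home_team": raw["home"]["country"],
        "away_team": raw["away"]["country"],
    }
    team_stats_docs = []
    event_docs = []
    for side in ("home", "away"):
        team = raw[side]
        # One team-stats document per team per match.
        team_stats_docs.append(
            {"match_id": match_id, "country": team["country"], **team["stats"]}
        )
        # One standalone document per event, linked back to its match and team.
        for event in team["events"]:
            event_docs.append(
                {"match_id": match_id, "country": team["country"], **event}
            )
    return match_doc, team_stats_docs, event_docs


# Made-up sample input with hypothetical field names.
raw_match = {
    "id": 1,
    "venue": "Example Stadium",
    "home": {
        "country": "A",
        "stats": {"goals": 2, "corners": 5},
        "events": [{"type": "goal", "minute": 18}, {"type": "goal", "minute": 59}],
    },
    "away": {
        "country": "B",
        "stats": {"goals": 1, "corners": 3},
        "events": [{"type": "goal", "minute": 28}],
    },
}

match_doc, team_stats_docs, event_docs = normalize_match(raw_match)
```

Carrying `match_id` on every derived document plays the role of a foreign key, so the three document types can still be related to each other after they are indexed separately.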
Create artificial timestamps for Timelion – In the raw data the events (goals, yellow cards, substitutions…) are represented as a list. Each event has a field holding the minute of the match in which it occurred. I wanted to use the time series analysis features of Timelion, so I decided to break these lists into standalone documents and to set a timestamp for each event. To compare events from different matches, I mapped the first minute of all the matches to the first minute of 2018. Thanks to this trick, I could use all the features Timelion offered.
Now that the data is loaded into Elasticsearch, we can begin to analyze it in Canvas!
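The timestamp trick can be sketched in a few lines. This example assumes the minute is already available as an integer and that minute 1 corresponds to 2018-01-01 00:00; the names `KICKOFF` and `event_timestamp` are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: every match "kicks off" at the first minute of 2018.
KICKOFF = datetime(2018, 1, 1, 0, 0)


def event_timestamp(minute):
    """Map a match minute (1-based) to an artificial timestamp shared by all matches."""
    return KICKOFF + timedelta(minutes=minute - 1)


first_minute = event_timestamp(1)
ninetieth_minute = event_timestamp(90)
print(first_minute.isoformat(), ninetieth_minute.isoformat())
```

Because every match shares the same artificial kickoff, events from different matches land on one common time axis and can be bucketed together by minute.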
Visualize with Canvas
For this project, I decided to try out Canvas. It’s highly customizable and very promising. As with every innovation, it takes some time to get used to it.
On the first page of my workpad I used different elements of Canvas to show some statistics about the final match of the World Cup:
Here I used Markdown and repeatImage to show the statistics in a user-friendly way.
I asked myself whether ball possession can help predict the winner. It wasn’t a significant factor, as you can see in the final match. My hypothesis was that the reason for this is the high number of goals that came from penalties. I found that about 25% of the goals in the World Cup actually came from penalties or own goals, which aren’t affected by ball possession!
The second page shows the distribution of goals over time:
The third one shows the distribution of substitutions over time:
Want to analyze the World Cup data by yourself?
The project is available on GitHub. I created a Docker Compose file which sets up everything automatically. The README file provides all the technical details, so head on over to try it out!
About the Author
Edan Shahmoon is a SaaS DevOps Engineer at Micro Focus. He has been a part of a team that is responsible for Elasticsearch logging clusters and helps gain insights from logs with Kibana. Edan is passionate about data mining, monitoring, Elasticsearch or any combination of them!