Data Visualisation & Buy-In
Data visualisation (“data viz”) is an important skill for all sports scientists. If you cannot convey your message, the data will remain a puzzle of numbers and graphs rather than becoming information that tells a story and helps stakeholders make decisions.
Some of the most practical recent work on data viz for sports science has come from the team at Paris Saint-Germain Football Club, in three articles published in the Aspetar Sports Medicine Journal.
Buchheit (2017) lists considerations for presenting data that, in brief, include:
- Keep reports simple and powerful by limiting the number of variables, removing unnecessary decimals, orienting all text horizontally, labelling values directly on graphs, highlighting meaningful differences, including error bars, and using advanced visualisation tools.
- Align the message with coach and athlete expectations and preferences, considering visual vs verbal, paper vs digital, quantitative vs qualitative, and tables vs graphs.
Of course, data viz is just one step within a wider process of handling data. Mathieu Lacome lists data visualisation, along with capture, storage, transformation, and analysis, as data processes that should undergo continual optimisation. Ultimately, all of these processes work towards supporting staff in their decision making.
“[G]ood data visualisation should help practitioners improve coaching staff ‘buy-in’ and, therefore, staff ability to make informed decisions.”
Lacome, M., Simpson, B., & Buchheit, M. (2018). Monitoring training status with player-tracking technology: still on the road to Rome. Aspetar Sports Med J, 7, 54-63.
Given the importance of data viz, let’s explore it in relation to designing figures. The ideas below on plotting continuous data originally came from another field of science.
Revealing, Not Concealing Data
One researcher whose work on data visualisation I have found particularly interesting is Dr Tracey Weissgerber (@T_Weissgerber). She is an Assistant Professor at the Mayo Clinic, focusing her research on preeclampsia, a complication of pregnancy. Although there is no obvious link between her research area and sports science, her ideas on designing figures are relevant to anyone presenting graphs.
Dr Weissgerber and collaborators at the Mayo Clinic have discussed how bar charts are frequently used in scientific research to present continuous data from small samples. However, as they demonstrate, very different datasets can produce identical bar graphs, because the bar hides the underlying data. This is closely linked to the limitations of the mean, which we discussed in our last post on the “End of Average” book and its implications for sports science.
Even with normally distributed data, bar graphs do not allow you to critically evaluate continuous data, making reading a passive rather than an active experience. They also distort the perception of the range of observed values because of what the authors call the zones of irrelevance and invisibility.
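To see how this happens, here is a minimal Python sketch using three invented groups of countermovement jump heights (cm). The numbers were chosen so that each group has the same mean and standard deviation, which are exactly the values a bar with SD error bars would encode, even though the raw data look very different. The group names and values are hypothetical and for illustration only.

```python
from statistics import mean, median, stdev

# Hypothetical countermovement jump heights (cm) for three groups of six athletes.
# Values are invented so that every group shares the same mean and SD.
groups = {
    "two clusters": [37, 37, 37, 43, 43, 43],
    "one outlier": [37, 38, 38, 40, 41, 46],
    "spread out": [35, 38, 39, 42, 42, 44],
}

def bar_summary(values):
    """Return what a bar chart with SD error bars encodes: the mean and the SD."""
    return round(mean(values), 2), round(stdev(values), 2)

for name, values in groups.items():
    bar_mean, bar_sd = bar_summary(values)
    # The bar summary is identical for every group; the medians and raw data are not.
    print(f"{name:<13} bar: {bar_mean} ± {bar_sd}  "
          f"median: {median(values)}  raw: {sorted(values)}")
```

All three groups would be drawn as the same bar (mean 40 cm, SD about 3.29 cm), yet one is bimodal, one contains a single high value, and one is fairly evenly spread. Only plotting the individual points reveals the difference.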
So this leaves us with two key questions:
1. Are there any instances when we should use bar graphs?
2. In the instances when we shouldn’t use bar graphs, what should we use?
Well, bar graphs are best used for counts or proportions. This explains why we were taught that the y axis of a bar graph has to start at zero: bar graphs are designed for counts, so the height of each bar is meaningful.
As for continuous data, the answer to what we should use is the familiar “it depends” … although it is not a bar graph! Once again, Dr Weissgerber provides guidelines on which visualisation might work best for the data you are working with (see below).
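The zero baseline matters because readers compare bar lengths. A small worked example makes the point: the Python sketch below computes the ratio of drawn bar lengths for two hypothetical counts (45 and 50 training sessions completed by two squads) with the axis starting at zero and with an axis truncated at 40. The function name and the counts are illustrative assumptions.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar lengths for counts a and b when the y axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Hypothetical counts: training sessions completed by two squads.
true_ratio = apparent_ratio(45, 50)                   # axis starts at zero
truncated_ratio = apparent_ratio(45, 50, axis_start=40)  # axis starts at 40

print(round(true_ratio, 2))
print(round(truncated_ratio, 2))
```

With a zero baseline the second bar is about 1.11 times the length of the first, matching the real difference in counts; starting the axis at 40 makes it look twice as long.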
When using such plots to display continuous data, Dr Weissgerber outlines two important approaches to revealing your data:
- Make all data points visible by (from least to most effective) decreasing point size, using semi-transparent points, applying a random jitter, or applying a symmetric jitter.
- Emphasise summary statistics by making summary lines wider and more prominent (e.g., a black median line) while de-emphasising the individual points.
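To make the difference between the two jitter strategies concrete, here is a minimal Python sketch that computes horizontal offsets for the points in a dot plot. Random jitter scatters each point by a random amount; symmetric jitter groups points with similar values and spreads them evenly either side of the centre line, so overlapping points never hide each other. The function names, the bin size, and the step width are hypothetical choices, not Dr Weissgerber's own implementation; plotting libraries have their own versions (for example seaborn's `stripplot` and `swarmplot`).

```python
import random
from collections import defaultdict

def random_jitter(values, width=0.2, seed=1):
    """Random jitter: each point gets a uniform random x-offset in [-width, width]."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [rng.uniform(-width, width) for _ in values]

def symmetric_jitter(values, bin_size=1.0, step=0.1):
    """Symmetric jitter: points whose values fall in the same bin are placed at
    evenly spaced offsets centred on zero, so they spread out left and right."""
    bins = defaultdict(list)
    for i, v in enumerate(values):
        bins[round(v / bin_size)].append(i)  # group points with similar values
    offsets = [0.0] * len(values)
    for indices in bins.values():
        n = len(indices)
        for rank, i in enumerate(indices):
            # Offsets run from -(n - 1) / 2 to (n - 1) / 2 in units of `step`.
            offsets[i] = (rank - (n - 1) / 2) * step
    return offsets

heights = [37, 38, 38, 40, 41, 46]  # hypothetical jump heights (cm)
print(symmetric_jitter(heights))
print(random_jitter(heights))
```

Here the two 38 cm values sit at -0.05 and +0.05 either side of the centre line while every other point stays on it, whereas random jitter moves every point and can still, by chance, leave two points overlapping.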
Such work has led numerous journals, including the Journal of Biological Chemistry, PLOS Biology, eLife, and Nature, to encourage or require authors to show data distributions in their figures. This editorial in Molecular Pharmacology explains the rationale for these requirements, and the full paper includes useful figures demonstrating how these approaches improve the transparency of reporting.
“Depiction of data in figures should provide as much granularity as possible, e.g., by replacing bar graphs with scatter plots wherever feasible and violin or box-and-whisker plots when not.”
Michel, M. C., Murphy, T. J., & Motulsky, H. J. (2020). New Author Guidelines for Displaying Data and Reporting Data Analysis and Statistical Methods in Experimental Biology. Drug Metabolism and Disposition, 48(1), 64-74.
For more on Dr Weissgerber’s work, you can explore these resources:
- “Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm” open access on PLOS Biology
- “Reveal, Don’t Conceal. Transforming Data Visualization to Improve Transparency” in Circulation in late 2019
- The themes of this paper are also outlined in this Twitter Q&A thread and in this presentation available on YouTube.
Applications in Sports Science
Sports science often deals with precisely this type of data: continuous data with a relatively small sample size. In these cases, a single outlier can distort the mean (see “The End of Average”). The limitations of using bar charts to present such data, along with recommendations for dot, box, and violin plots, have already been highlighted by a number of people in sports science, often citing Dr Weissgerber’s work:
“Univariate scatterplots or dot plots are recommended, showing the raw data when there is a small sample size, or use box plots with interquartile ranges (25th and 75th percentile of the sample), where whiskers may be included to demonstrate outliers. Violin plots are effective in demonstrating the distribution of the data in medium and large sample sizes, and bar graphs should be avoided in presenting continuous data, particularly with small sample sizes.”
Thornton, H. R., Delaney, J. A., Duthie, G. M., & Dascombe, B. J. (2019). Developing Athlete Monitoring Systems in Team Sports: Data Analysis and Visualization. International Journal of Sports Physiology and Performance, 14(6), 698-705.
Some recent Twitter posts have also shared figures suggesting that sports science research is starting to reveal data more widely, following some of the principles discussed above.
Dr Alice Sweeting, a Sports Scientist working for Victoria University and the Western Bulldogs AFL club, has also recently written on this topic. In this excellent post, she shares code for creating these kinds of visualisations in R with the ggplot2 package.
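As a rough illustration of what a box plot summarises, and of how one outlier pulls the mean away from the median, here is a minimal Python sketch using the standard library's `statistics.quantiles`. It assumes the common Tukey convention (whiskers reach the most extreme values within 1.5 × IQR of the quartiles; anything beyond is shown as an outlier), which is one of several conventions in use. The jump-height data and the function name are hypothetical, and quartile methods differ between tools, so R or a spreadsheet may give slightly different quartiles.

```python
from statistics import mean, median, quantiles

def box_plot_summary(values, k=1.5):
    """Box plot summary under the Tukey convention: whiskers reach the most
    extreme values within k * IQR of the quartiles; values beyond are outliers."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: 25th to 75th percentile
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

heights = [37, 38, 38, 40, 41, 46]  # hypothetical jump heights (cm)
summary = box_plot_summary(heights)
print(summary)
# The single high value pulls the mean above the median.
print("mean:", mean(heights), "median:", median(heights))
```

With these numbers the quartiles are 38 and 40.75 cm, the 46 cm jump falls beyond the upper fence and is flagged as an outlier, and the mean (40 cm) sits a full centimetre above the median (39 cm).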
Final Thoughts
The importance of data viz in communicating information to key stakeholders is well established in sports science. However, bar charts are clearly limited in the accuracy and detail with which they tell the true story. As we discussed in the End of Average post, we need to consider the individual, even when working within a team sport, and our data visualisation approaches need to reflect that.