The actual data will be encoded to the y-axis of the chart, and they need to be assigned different x-axis values so they don’t sit on top of each other. The data are shown in the gray area of the worksheet, with the original data in the columns with all capital letters. The final element to plot is the dots representing the data.
Because the whisker will reach up to the 90 th percentile and down to the 10 th percentile, the length of the lines will be equal to the differences between the median and those percentiles. With the scatters plotted, I then added the error bars, the data for which are shown in the blue area.
#Box and whiskers plot excel how to#
For the x-values, I need to find the right-edge of each box, and while I’m sure there is some rule to how to find it, I experimented and it turned out to be 1.15, 2.15, and 3.15.
The y-values are simple-they are equal to the 50 th percentile and thus the equation points to the original in the orange section above. The scatterplots require x- and y- values, and are shown in the green section of the worksheet. To add the whiskers on the right edge of the boxes, I’m going to add a scatterplot point and then add a vertical error bar to denote the 10 th and 90 th percentiles. To make it clear the middle of the box is the 50 th percentile, I add an outline around both segments. The second stack will be the difference between the 50 th and 25 th percentiles, and the third/top stack will be the difference between the 75 th and 50 th percentiles. I’ll plot it and then set the fill color to No Color. Thus, the bottom/base segment of the stacked column chart is simply the 25 th percentile. The bottom of the box in this chart shows the 25 th percentile the middle shows the median or 50 th percentile and the top shows the 75 th percentile. In the yellow section below, I pull out the percentiles and generate differences I need to create the box. This looks in the data column (A2:A51) and looks for the percentile point specified in cell H3. The percentiles are calculated below the orange header row in the screenshots using the PERCENTILE formula. This works here because I only have 50 observations if I had a larger dataset, I would make these calculations in Stata or R and bring them over. I’m going to calculate five percentile points from the data directly in Excel. It takes a little longer to build the initial chart, but that will pay off with time savings later on. It turns out, however, that you can’t combine it with other chart types, so I had to use a different approach, one that combines three different elements:Īs usual, I try to set up my Excel file so that this can be more easily replicated with other data later on. I haven’t yet had a good excuse to try the new box-and-whisker chart type in Excel 2016, so I thought this was my chance. Perhaps the big issue about visualizing distributions and uncertainty, therefore, is that they require more significant annotation to help explain not just how to read the graph, but what is being graphed.Īnyways, I digress.
It may be because many people simply don’t understand statistics and distributions, nor do they understand the mathematical concept of uncertainty and how it can be introduced into data and mathematical or statistical models. In my mind, visualizing distributions and uncertainty is a big data visualization challenge. I have no idea what I’d call it-a box-and-whisker-and-scatterplot chart? I’m also not sure whether it’s better than a violin chart, which shows the entire distribution, but it’s certainly intriguing. This graph popped up in my Twitter feed a few weeks ago.