"Living by coast improves health" graphic
It's always nice when one of my colleagues approaches me with some data they're really excited about. In this case it was Dr Ben Wheeler, who had been analysing UK census data, and found a relationship between how close people live to the coast and their self-reported quality of health. It fell to me to try and present the relationship he had found visually - and in a very small space, for publication in a scientific journal. I can talk about the development process now, as the paper has recently been published:
I knew from the outset that I wanted the resulting figure to look authoritative, given the context of use in a scientific journal. It of course needed to communicate the results accurately and clearly, but I also wanted it to quickly reveal to the reader what the data points were showing them. This is what I had to work with - two bar charts, some raw data, and a map of England:
Essentially, he had divided the census data into five groups - those who lived further than 50km from the sea, and four bands of people who lived closer than this. The map explained this grouping. The bar charts showed what happened to the health of these people when subdivided into those that lived in cities, towns or rural areas. There was also a separate analysis of people in different income deprivation groups.
The bar charts and map I was provided with were a decent first look at the data, but a couple of things were bothering me. First of all, the groups were not evenly spaced - the first group presented were living 20-50km from the sea, the next group were 5-20km, then 1-5km, then only 0-1km for the final group. At first glance, the equal-width bars made it look as though the groups covered equal distances.
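To see how uneven the bands are, here is a minimal Python sketch that scales each band's width to the distance it covers. The band limits come from the description above; the function names and the 100-unit axis are assumptions made purely for illustration.

```python
# Distance bands (km from the coast), in the order they were presented.
BANDS_KM = [(20, 50), (5, 20), (1, 5), (0, 1)]


def band_widths(bands):
    """Return the width of each band in km."""
    return [upper - lower for lower, upper in bands]


def proportional_widths(bands, total_width=100.0):
    """Scale the band widths so that together they span total_width units."""
    widths = band_widths(bands)
    span = sum(widths)
    return [total_width * width / span for width in widths]


for (lower, upper), share in zip(BANDS_KM, proportional_widths(BANDS_KM)):
    print(f"{lower}-{upper}km takes up {share:.0f}% of the axis")
```

Drawn this way, the 20-50km band would fill well over half of the axis while the 0-1km band would be a sliver, which is the opposite of the impression given by four equal-width bars.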
Also, bar charts (or histograms) are generally used for counting quantities of something. Here, we're actually presenting the mean values, and the confidence intervals of these values (i.e. the range of values in which we are 95% sure that the true mean lies, statistically speaking). I decided that the audience would likely be more familiar with single circular points with error bars - which is the way that statistics packages like SPSS present this data.
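For readers less familiar with the idea, the following is a minimal sketch of how a mean and a 95% confidence interval can be computed for one group. It uses a normal approximation (1.96 standard errors either side of the mean), which is an assumption for illustration and not necessarily the method used in the paper. The scores are invented.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to a 95% interval. This is a simplification chosen
    for illustration, not the analysis used in the paper.
    """
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical scores, invented purely for this example.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, lower, upper = mean_with_ci(scores)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The point on the chart corresponds to `mean`, and the bar through it runs from `lower` to `upper`.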
This was my first sketch trying to combine all the data in one graphic:
I decided that I would reverse the colours used for the map, with the darker blue for those people closer to the sea (and therefore "more watery"). This also linked to the data points so that it was (hopefully) fairly obvious which linked to which. I also decided not to colour the central area of the map at all - as there was no data point for the people living here anyway (they were actually the baseline for the analysis - those that the other data points were compared to).
I had to use quite a large bar to show which colour linked to which distance so that all groups were visible, which isn't ideal, as this mental mapping isn't natural. I also wasn't happy with the way that the group closest to the sea (0-1km) was hard to pick out from the next group (1-5km).
A second major problem in my mind was the fact that the map was quite hard to pick out as the shape of England. The country's shape is much more recognisable with the other countries that make up the United Kingdom, as they form the shape of the whole landmass of Great Britain (and the North of Ireland, which helps recognisability too). Recognising what the figure represents also depends on knowing the shape of England - and for the international audience of the paper, this could not be guaranteed.
Going back to the drawing board, I decided that a side-on representation would make the message clearer: that this data relates to the sea and how far people live from it:
However, as soon as I started to draw this idea to scale, I realised that there were problems with this kind of representation. To scale, the highest peak in England is just less than 1km above sea level, so there wasn't going to be much height if we imagine looking at England from the side.
Also, the sea is quite shallow around the coast of the UK (it sits on a shelf) - in fact, about 3km out, it only goes down about 300m. Added to this, towns and cities would only be tiny:
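To put rough numbers on the scale problem, here is a minimal Python sketch. It assumes the 50km of land is drawn across an 85mm column; both the 85mm width and the helper name are assumptions, not figures from the project.

```python
def to_scale_mm(real_km, span_km=50.0, drawing_width_mm=85.0):
    """Drawn size in mm of a real length when span_km fills drawing_width_mm.

    drawing_width_mm=85 is an assumed column width, not a value from the post.
    """
    return real_km * drawing_width_mm / span_km


peak_mm = to_scale_mm(1.0)    # highest peak in England, just under 1km
seabed_mm = to_scale_mm(0.3)  # roughly 300m of water depth
print(f"peak: {peak_mm:.2f}mm, sea depth: {seabed_mm:.2f}mm")
```

Under these assumptions the tallest mountain would be under 2mm high on the page and the sea about half a millimetre deep, so individual buildings would be far too small to draw.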
While this representation was technically accurate, it didn't convey much of a sense of what the viewer was looking at directly:
In the next version, I tried to make the landmass more distinct by knocking back the colour of the scale at the top of the diagram, but text legibility was a problem.
Also, I was concerned that the main message of the paper - that living near the sea improves your health - was getting lost among all the data presented. Going back to the authors, I established that the income deprivation groups were all based on the urban dwellers anyway, so I asked to remove the town and rural data, which would make the message clearer.
Also note the changed wording on the scale - from "regression coefficient" to "health". Technical terms have their place, but not when they get in the way of understanding.
Having only one group of main results to present now, I initially thought that I might relate the data points directly to the scale at the top, by placing them directly below:
This had two serious problems. Firstly, the relationship was far harder to see with the data points spaced so widely apart. Secondly, it was much harder to compare the overall results to the income deprivation groups.
Looking back at my earlier sketch at this point, I realised that my initial idea of placing objects such as trees and turbines on the graphic was the way to go:
While the trees in this version are, technically, about 3km high, I decided that the figurative representation was important for explanation purposes. Also, I replaced the (however accurate) side-view of the sea with a more illustrative set of waves, with a seagull flying over them.
Also, I returned the overall data to sit alongside the income deprivation groups - trying to keep some of the relationship to the scale using patches of colour at 30% opacity. However, I felt that these blocks of colour were also interfering with the message told by the data. After removing them, though:
The graphic was relying entirely on the colour of the dots to show which group was which. I never like to assume that people will be able to perceive colour differences like this if I can avoid it, or at least reinforce it in other ways (because of different abilities to see colour, seeing a bad photocopy, etc).
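One rough way to check whether two colours would still be distinguishable on a greyscale photocopy is to compare their relative luminance. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas; the two blues are hypothetical stand-ins, not the colours used in the figure.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour given as 0-255 integers."""
    def linearise(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio, from 1 (identical luminance) up to 21 (black on white)."""
    lum_a = relative_luminance(rgb_a)
    lum_b = relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical blues, chosen only to illustrate the check.
DARK_BLUE = (8, 48, 107)
LIGHT_BLUE = (198, 219, 239)
print(f"contrast ratio: {contrast_ratio(DARK_BLUE, LIGHT_BLUE):.1f}")
```

A ratio close to 1 would mean the two groups look almost identical once the colour is gone, which is the case where a second cue beyond colour matters most.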
In the end, I added in some arrows leading from the houses to the data points - which serve two purposes. Firstly, they relate the different dwelling places to the data points that they represent. Secondly, they draw the viewer's attention to the key data - the fact that there is a difference in the health of people living near the sea that can't be explained within the bounds of statistical uncertainty (the error bars do not overlap).
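The "error bars do not overlap" check can be expressed as a simple interval comparison. This is a minimal sketch with invented intervals; the function name and the numbers are assumptions for illustration only.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical confidence intervals, invented for this example.
near_coast = (0.8, 1.6)
further_inland = (0.1, 0.6)

if intervals_overlap(near_coast, further_inland):
    print("Error bars overlap: the chart alone does not settle it")
else:
    print("Error bars do not overlap")
```

Non-overlapping intervals are a conservative visual signal of a real difference. Overlapping intervals do not by themselves prove that there is no difference, so the statistical test in the paper remains the authority.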
The resulting figure is something of a compromise between scientific accuracy and aesthetic qualities. It is clearly more appropriate for a scientific publication than a glossy design magazine. However, it does (hopefully) convey the message embodied in the data in a way that also clearly presents the subject matter - its relationship to the area in which people live, and their access to the coast - using visual language that is, in my opinion, not employed often enough in scientific publications.








