Python visualization demystified

Beautiful Bar Charts in Matplotlib

Transforming the default Matplotlib bar chart into a simple, stylish visualization

Bar charts are ubiquitous in the data visualization world. They may not be the sexiest of choices when plotting data, but their simplicity allows data to be presented in a straightforward way that's usually easy to understand for the intended audience.

That being said, there is a big difference (at least, in my humble opinion) between a good and bad bar chart. One of the more important pillars of making a bar chart a great bar chart is to make it visually "smart". That means a few main things:

  1. Make the base plot itself high quality and visually appealing
  2. Remove redundancies and elements that are not mandatory from an information perspective
  3. Add annotations to give the chart "at a glance" understandability

What does all that mean? Easiest to walk through it with an example.

The default Matplotlib bar chart

Let's first get some data. For this example, we'll use the popular cars dataset available in several sample data repositories.

# Load Matplotlib and data wrangling libraries.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load cars dataset from Vega's dataset library.
from vega_datasets import data
df = data.cars()
df.head()
   Acceleration  Cylinders  Displacement  Horsepower  Miles_per_Gallon                       Name  Origin  Weight_in_lbs        Year
0          12.0          8         307.0       130.0              18.0  chevrolet chevelle malibu     USA           3504  1970-01-01
1          11.5          8         350.0       165.0              15.0          buick skylark 320     USA           3693  1970-01-01
2          11.0          8         318.0       150.0              18.0         plymouth satellite     USA           3436  1970-01-01
3          12.0          8         304.0       150.0              16.0              amc rebel sst     USA           3433  1970-01-01
4          10.5          8         302.0       140.0              17.0                ford torino     USA           3449  1970-01-01

Let's look at the average miles per gallon of cars over the years. To do that, we'll need to use pandas to group and aggregate.

mpg = df[['Miles_per_Gallon', 'Year']].groupby('Year').mean()
mpg.head()
            Miles_per_Gallon
Year
1970-01-01         17.689655
1971-01-01         21.250000
1972-01-01         18.714286
1973-01-01         17.100000
1974-01-01         22.703704
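To make the group-and-aggregate step concrete, here is a minimal sketch on a tiny hand-made frame. The `toy` frame and its values are made up for illustration and are not taken from the cars dataset; only the `groupby('Year').mean()` pattern mirrors the code above.

```python
import pandas as pd

# Toy data standing in for the cars dataset (values are made up).
toy = pd.DataFrame({
    'Year': pd.to_datetime(['1970-01-01', '1970-01-01', '1971-01-01']),
    'Miles_per_Gallon': [18.0, 16.0, 25.0],
})

# Group rows by year, then average the remaining column within each group.
toy_mpg = toy[['Miles_per_Gallon', 'Year']].groupby('Year').mean()
print(toy_mpg)
```

The two 1970 rows collapse into a single row holding their average, and `Year` becomes the index of the result, which is why the later plotting code reads the years from `mpg.index`.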

Let's create our first bar chart.

plt.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon']
)

[Figure: default Colab Matplotlib bar chart]
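One detail worth knowing about `np.arange(mpg.size)`: `DataFrame.size` counts cells (rows times columns), so it only equals the number of bars because `mpg` has a single column. A small sketch, using made-up frames (`one_col` and `two_col` are hypothetical names), shows the difference and why `len(df)` is a safer habit if you later add columns:

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame shaped like `mpg` (values are made up).
one_col = pd.DataFrame({'Miles_per_Gallon': [17.7, 21.3, 18.7]})
# The same rows with a second, hypothetical column added.
two_col = one_col.assign(Horsepower=[150.0, 110.0, 120.0])

# `size` is rows * columns, while `len` is always the number of rows.
print(one_col.size, len(one_col))
print(two_col.size, len(two_col))

# `len(df)` gives exactly one x position per row, whatever the column count.
x_positions = np.arange(len(two_col))
```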

Interesting. We're using Google's Colaboratory (aka "Colab") to create our visualizations. Colab applies some default styles to Matplotlib using the Seaborn visualization library, hence the gray ggplot2-esque background instead of the Matplotlib defaults.

As a first step, let's remove those Seaborn styles to get back to base Matplotlib and re-plot.

# Colab sets some Seaborn styles by default; let's revert to the default
# Matplotlib styles and plot again.
plt.rcdefaults()

plt.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon']
)

[Figure: default Matplotlib bar chart]

I actually prefer this to Colab's default. Nice and clean and a better blank canvas from which to start.
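If you're curious what `plt.rcdefaults()` actually does, the sketch below changes one style setting by hand (standing in for whatever the environment configured), then resets it. The `'Agg'` backend line and the specific color are assumptions made so the example runs outside a notebook; you can drop that line in Colab/Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# Simulate an environment that has changed a style setting.
plt.rcParams['axes.facecolor'] = '#EAEAF2'
changed = plt.rcParams['axes.facecolor']

# Restore Matplotlib's built-in defaults.
plt.rcdefaults()
restored = plt.rcParams['axes.facecolor']

print(changed, '->', restored)
```

Because the reset is global, it's best called once near the top of a notebook, before any of your own `rcParams` tweaks.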

Now we need to fix the x-axis to actually be labeled with the year and we're good to go.

plt.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

[Figure: default Matplotlib bar chart]

Not bad! It's a pretty nice default chart honestly. But we can make it significantly better with just a few more tweaks.

Create a high-resolution chart

The first thing we'll change is the size and resolution of the chart to make sure it looks good on all screens and can be copy/pasted easily into a presentation or website.

We'll start by increasing the resolution via IPython's "retina" setting, which outputs high-resolution PNGs. There are two ways to do this, both shown below.

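# Increase the quality and resolution of our charts so we can copy/paste or just
# directly save from here.
# See:
# https://ipython.org/ipython-doc/3/api/generated/IPython.display.html
# Note: recent IPython releases deprecate `IPython.display.set_matplotlib_formats`
# in favor of the same function in `matplotlib_inline.backend_inline`.
# The `quality` argument is a JPEG setting, so it is dropped here since
# 'retina' produces PNGs.
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina')

# You can also just do this in Colab/Jupyter, some "magic":
# %config InlineBackend.figure_format='retina'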
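The "retina" setting only affects how figures are displayed inline. If you save charts to a file instead, a rough equivalent is the `dpi` argument of `savefig`. The sketch below is an illustration with made-up bar heights and an assumed `'Agg'` backend (so it runs outside a notebook); it renders to an in-memory PNG and reads the pixel dimensions back from the file header.

```python
import io
import struct

import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(x=[0, 1, 2], height=[17.7, 21.3, 18.7])  # made-up values

# Render to an in-memory PNG at 200 dots per inch.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=200)

# A PNG stores its pixel width and height as big-endian integers
# in bytes 16 through 23 of the file.
width_px, height_px = struct.unpack('>II', buffer.getvalue()[16:24])
print(width_px, height_px)
plt.close(fig)
```

An 8 x 5 inch figure at 200 dpi comes out at 1600 x 1000 pixels, so doubling `dpi` doubles the pixel dimensions without changing the layout.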

Let's plot again, and make two more additions:

  1. Set the default size of the image to be a bit larger
  2. Use tight_layout() to take advantage of all the space allocated to the figure

# Set default figure size.
plt.rcParams['figure.figsize'] = (8, 5)

fig, ax = plt.subplots()

ax.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

# Make the chart fill out the figure better.
fig.tight_layout()

[Figure: high-resolution Matplotlib bar chart]
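Setting `plt.rcParams['figure.figsize']` changes the default for every later figure. If you only want to size one chart, a possible alternative is to pass `figsize` directly to `plt.subplots`. This is a sketch with made-up bar heights and an assumed `'Agg'` backend, not a replacement for the approach above:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# Size a single figure without touching the global default.
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(x=[0, 1, 2], height=[17.7, 21.3, 18.7])  # made-up values
fig.tight_layout()

size_in_inches = tuple(float(v) for v in fig.get_size_inches())
print(size_in_inches)
plt.close(fig)
```

The global setting is convenient in a notebook full of similar charts; the per-figure argument is handier when sizes vary from chart to chart.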

Simple Axes: remove unnecessary lines

Our next step is to make the chart even simpler, but to also add back some horizontal gridlines. The latter is definitely optional, especially after the next step (we'll add text annotations for each bar value), but it does sometimes help make the chart more interpretable.

fig, ax = plt.subplots()

ax.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

# First, let's remove the top, right and left spines (figure borders)
# which really aren't necessary for a bar chart.
# Also, make the bottom spine gray instead of black.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_color('#DDDDDD')

# Second, remove the ticks as well.
ax.tick_params(bottom=False, left=False)

# Third, add a horizontal grid (but keep the vertical grid hidden).
# Color the lines a light gray as well.
ax.set_axisbelow(True)
ax.yaxis.grid(True, color='#EEEEEE')
ax.xaxis.grid(False)

fig.tight_layout()

[Figure: bar chart with simple axes]

Looking pretty sweet, right? Almost there...
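Since the same axis formatting gets repeated for every chart, one option is to wrap it in a small helper. The function name `simplify_axes` and its parameters are hypothetical (they are not part of Matplotlib); the body is just the spine, tick, and grid calls from above, applied to made-up data.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt


def simplify_axes(ax, spine_color='#DDDDDD', grid_color='#EEEEEE'):
    """Apply the spine, tick, and grid styling from the walkthrough to `ax`."""
    # Hide every spine except the bottom one, which is softened to gray.
    for side in ('top', 'right', 'left'):
        ax.spines[side].set_visible(False)
    ax.spines['bottom'].set_color(spine_color)
    # Remove tick marks but keep the tick labels.
    ax.tick_params(bottom=False, left=False)
    # Draw a light horizontal grid behind the bars.
    ax.set_axisbelow(True)
    ax.yaxis.grid(True, color=grid_color)
    ax.xaxis.grid(False)
    return ax


fig, ax = plt.subplots()
ax.bar(x=[0, 1, 2], height=[17.7, 21.3, 18.7])  # made-up values
simplify_axes(ax)
fig.tight_layout()
```

With a helper like this, each later chart only needs a single `simplify_axes(ax)` call in place of the eight formatting lines.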

Adding text annotations

If the specific value of each bar is relevant or meaningful (as opposed to just the general trend), it's often useful to annotate each bar with the value it represents. To do this in Matplotlib, you basically loop through each of the bars and draw a text element right above.

fig, ax = plt.subplots()

# Save the chart so we can loop through the bars below.
bars = ax.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

# Axis formatting.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_color('#DDDDDD')
ax.tick_params(bottom=False, left=False)
ax.set_axisbelow(True)
ax.yaxis.grid(True, color='#EEEEEE')
ax.xaxis.grid(False)

# Grab the color of the bars so we can make the
# text the same color.
bar_color = bars[0].get_facecolor()

# Add text annotations to the top of the bars.
# Note, you'll have to adjust this slightly (the 0.3)
# with different data.
for bar in bars:
  ax.text(
      bar.get_x() + bar.get_width() / 2,
      bar.get_height() + 0.3,
      round(bar.get_height(), 1),
      horizontalalignment='center',
      color=bar_color,
      weight='bold'
  )

fig.tight_layout()

[Figure: bar chart with text annotations]

It may be a little much, but it's a great option to keep in mind when the exact values are important.
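If your Matplotlib version is 3.4 or newer, `Axes.bar_label` offers a shorter route to the same annotations. The sketch below assumes that version and uses made-up bar heights; the manual loop above remains the way to go on older installs.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(x=[0, 1, 2], height=[17.69, 21.31, 18.71])  # made-up values

# `bar_label` places one label per bar. `padding` is measured in points,
# so it does not need re-tuning when the data range changes.
labels = ax.bar_label(
    bars,
    fmt='%.1f',
    padding=3,
    color=bars[0].get_facecolor(),
    weight='bold',
)
label_texts = [label.get_text() for label in labels]
print(label_texts)
plt.close(fig)
```

The main practical difference from the manual loop is the offset: the `0.3` used there is in data units and has to be adjusted per dataset, while `padding` here is in points.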

Finishing Touches: add nicely formatted labels and title

Up until now the plot hasn't had any labels or a title, so let's add those now. In many cases, axis labels or a title actually aren't needed, so always ask yourself whether they're redundant or necessary.

fig, ax = plt.subplots()

# Save the chart so we can loop through the bars below.
bars = ax.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

# Axis formatting.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_color('#DDDDDD')
ax.tick_params(bottom=False, left=False)
ax.set_axisbelow(True)
ax.yaxis.grid(True, color='#EEEEEE')
ax.xaxis.grid(False)

# Add text annotations to the top of the bars.
bar_color = bars[0].get_facecolor()
for bar in bars:
  ax.text(
      bar.get_x() + bar.get_width() / 2,
      bar.get_height() + 0.3,
      round(bar.get_height(), 1),
      horizontalalignment='center',
      color=bar_color,
      weight='bold'
  )

# Add labels and a title. Note the use of `labelpad` and `pad` to add some
# extra space between the text and the tick labels.
ax.set_xlabel('Year of Car Release', labelpad=15, color='#333333')
ax.set_ylabel('Average Miles per Gallon (mpg)', labelpad=15, color='#333333')
ax.set_title('Average MPG in Cars [1970-1982]', pad=15, color='#333333',
             weight='bold')

fig.tight_layout()

[Figure: bar chart with labels and title]
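The year range in the title is hard-coded above. If the data might change, one option is to build the title from the index instead. In this sketch, `years` is a hypothetical stand-in for `mpg.index`:

```python
import pandas as pd

# Hypothetical index standing in for `mpg.index` (one entry per year).
years = pd.to_datetime(['1970-01-01', '1976-01-01', '1982-01-01'])

# Format the earliest and latest dates as four-digit years.
first_year = years.min().strftime('%Y')
last_year = years.max().strftime('%Y')
title = f'Average MPG in Cars [{first_year}-{last_year}]'
print(title)
```

The resulting string can be passed straight to `ax.set_title` with the same `pad`, `color`, and `weight` arguments as before.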

Extra Credit: change the font

# Download the fonts we want from GitHub into a local font directory.
font_dir = 'fonts'
!wget --recursive --no-parent 'https://github.com/google/fonts/raw/master/apache/opensans/OpenSans-Regular.ttf' -P {font_dir}
!wget --recursive --no-parent 'https://github.com/google/fonts/raw/master/apache/opensans/OpenSans-Light.ttf' -P {font_dir}
!wget --recursive --no-parent 'https://github.com/google/fonts/raw/master/apache/opensans/OpenSans-SemiBold.ttf' -P {font_dir}
!wget --recursive --no-parent 'https://github.com/google/fonts/raw/master/apache/opensans/OpenSans-Bold.ttf' -P {font_dir}

# Register the downloaded fonts with Matplotlib's font manager. Older
# Matplotlib versions could rebuild the font cache with the private
# `mpl.font_manager._rebuild()`; recent releases no longer provide it, and
# `fontManager.addfont()` is the public alternative.
from pathlib import Path
from matplotlib import font_manager
for font_path in Path(font_dir).rglob('*.ttf'):
    font_manager.fontManager.addfont(str(font_path))

# Use the newly registered Open Sans font family for all text.
plt.rc('font', family='Open Sans')

fig, ax = plt.subplots()

# Save the chart so we can loop through the bars below.
bars = ax.bar(
    x=np.arange(mpg.size),
    height=mpg['Miles_per_Gallon'],
    tick_label=mpg.index.strftime('%Y')
)

# Axis formatting.
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_color('#DDDDDD')
ax.tick_params(bottom=False, left=False)
ax.set_axisbelow(True)
ax.yaxis.grid(True, color='#EEEEEE')
ax.xaxis.grid(False)

# Add text annotations to the top of the bars.
bar_color = bars[0].get_facecolor()
for bar in bars:
  ax.text(
      bar.get_x() + bar.get_width() / 2,
      bar.get_height() + 0.3,
      round(bar.get_height(), 1),
      horizontalalignment='center',
      color=bar_color,
      weight='bold'
  )

# Add labels and a title.
ax.set_xlabel('Year of Car Release', labelpad=15, color='#333333')
ax.set_ylabel('Average Miles per Gallon (mpg)', labelpad=15, color='#333333')
ax.set_title('Average MPG in Cars [1970-1982]', pad=15, color='#333333',
             weight='bold')

fig.tight_layout()

[Figure: bar chart with Open Sans]
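If the font download fails, Matplotlib will quietly fall back to its default font and log a warning. A small guard can make that fallback explicit. The helper name `font_available` is hypothetical; it relies on `font_manager.findfont` raising `ValueError` when `fallback_to_default=False` and no match is found, and the `'Agg'` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside a notebook
from matplotlib import font_manager
import matplotlib.pyplot as plt


def font_available(family):
    """Return True if Matplotlib can resolve `family` without falling back."""
    try:
        font_manager.findfont(family, fallback_to_default=False)
    except ValueError:
        return False
    return True


# Prefer Open Sans when it is registered; otherwise keep Matplotlib's
# bundled DejaVu Sans.
family = 'Open Sans' if font_available('Open Sans') else 'DejaVu Sans'
plt.rc('font', family=family)
print(family)
```

Printing the chosen family is a quick way to confirm which font the chart will actually use before you compare screenshots.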

Beautiful, no? It could maybe still use a small tweak here or there, but in general it's much cleaner than where we started. Have thoughts on ways to make it better? Let me know.