Python visualization demystified

An Intro to Plotnine

Getting started with Plotnine, Python's ggplot2 clone

Many data scientists, analysts, and visualization gurus start their careers (or academic work) using the R language and statistical framework. Most of them, this author included, become intimately familiar with R's most popular visualization library: ggplot2. The syntax of most Python visualization libraries differs considerably from ggplot2, so there have been a few attempts to recreate ggplot2 in Python to make the transition easier.

The most recent of those efforts is plotnine [documentation, GitHub], a library that describes itself as "A grammar of graphics for Python" (in other words, a clone of ggplot2).

A Basic Chart

Star imports are usually frowned upon because they pollute the global namespace, but the common way to import the library so you can use it as you would in R is from plotnine import *. If you're using the Google Colaboratory environment, plotnine is not included as of this post, so you'll have to install it with the command !pip install plotnine.

# Load plotnine.
from plotnine import *

# Import vega datasets and load iris dataset.
from vega_datasets import data

df = data.iris()

# Create a simple scatter plot.
# Note, the parens wrapping the statement allow you to use `+` at the end of the line
# without escaping with a backslash.
(ggplot(df, aes('petalWidth', 'petalLength')) +
  geom_point())

plotnine scatter plot

Let's break that down quickly:

  • Use ggplot() to create the base figure.
  • ggplot() takes your data as its first argument and the "aesthetic mapping" as the second; basically, how you want to map your data to the figure and axes.
  • aes() defines your mapping, the first argument being the x and the second the y. You can also map explicitly by keyword, e.g. aes(x='petalWidth', y='petalLength').
  • We add layers to the plot using the plus sign +; the main layer here being the points we want to add for each x, y pair. We use geom_point() to do that.

Simple Style Changes

Style changes are easy and intuitive in Plotnine. For the marks themselves, just add arguments to the geom_<type>() function.

(ggplot(df, aes('petalWidth', 'petalLength')) +
  geom_point(color='darkgreen', size=4)
)

plotnine scatter plot with styling

Adding More Dimensions to the Aesthetic

What about adding another dimension to the chart, e.g. the species of the flower? Again, it's very simple and pretty intuitive: we just add another mapping to the aesthetic (aes()). For example, aes(..., color='species') to map different colors to the species column of the dataset.

Just to see how powerful the grammar of graphics is, let's also add trendlines with confidence bands by adding a stat_smooth(method='lm') layer.

(ggplot(df, aes('petalWidth', 'petalLength', color='species')) +
  geom_point() +
  stat_smooth(method='lm')
)

plotnine scatter plot by species with fit

This library is immensely powerful, with an intuitive and consistent API. There is much more to show, which we'll cover in future posts. Hopefully this gives you a basic feel for plotnine.