
I have a pandas DataFrame of "factors" (categorical columns), floats and integers. I would like to make "R Lattice"-like plots from it, using conditioning and grouping on the categorical variables. I've used R extensively and wrote custom panel functions to get the plots formatted exactly how I wanted them, but I'm struggling to do the same types of plots succinctly with matplotlib. I am playing around with layouts and subplot2grid, but just can't seem to get it right.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

nRows = 500
df = pd.DataFrame({'c1' : np.random.choice(['A','B','C','D'], size=nRows),
               'c2' : np.random.choice(['P','Q','R'], size=nRows),
               'i1' : np.random.randint(20,50, nRows),
               'i2' : np.random.randint(0,10, nRows),
               'x1' : 3 * np.random.randn(nRows) + 90,
               'x2' : 2 * np.random.randn(nRows) + 89})

I would like to plot things such as the following (R lattice code examples):

x1 vs. x2 for each level of c1 (lattice code)

xyplot(x1 ~ x2 | c1, data = df)

x1 vs. x2 for each level of c1, with a "global" legend for c2 (symbols or colors)

xyplot(x1 ~ x2 | c1, groups = c2, data = df)

histograms of x1 for each level of c1

histogram(~x1 | c1, data = df)
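For reference, a minimal pure-matplotlib sketch of the second lattice example, `xyplot(x1 ~ x2 | c1, groups = c2)`: one panel per level of `c1`, one colour per level of `c2`, and a single figure-level legend. It regenerates a trimmed copy of the data above with a seeded `np.random.default_rng` (an assumption, for reproducibility), and the panels are simply laid out in one row.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Trimmed, seeded copy of the question's data (only the columns used here)
rng = np.random.default_rng(0)
nRows = 500
df = pd.DataFrame({'c1': rng.choice(['A', 'B', 'C', 'D'], size=nRows),
                   'c2': rng.choice(['P', 'Q', 'R'], size=nRows),
                   'x1': 3 * rng.standard_normal(nRows) + 90,
                   'x2': 2 * rng.standard_normal(nRows) + 89})

panels = sorted(df['c1'].unique())   # conditioning variable: one panel per level
groups = sorted(df['c2'].unique())   # grouping variable: one colour per level
colors = dict(zip(groups, plt.rcParams['axes.prop_cycle'].by_key()['color']))

fig, axes = plt.subplots(1, len(panels), sharex=True, sharey=True,
                         figsize=(3 * len(panels), 3), squeeze=False)
for ax, p in zip(axes.flat, panels):
    sub = df[df['c1'] == p]
    for g in groups:
        pts = sub[sub['c2'] == g]
        # x1 ~ x2: x2 on the horizontal axis, x1 on the vertical axis
        ax.scatter(pts['x2'], pts['x1'], s=10, color=colors[g], label=g)
    ax.set_title(f'c1 = {p}')

# One figure-level ("global") legend built from the first panel's handles
handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, title='c2', loc='center right')
fig.tight_layout(rect=(0, 0, 0.9, 1))
```

Fixing the colour per group explicitly keeps the same `c2` level the same colour in every panel, which is what makes one shared legend valid.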

I am also trying to make "conditioned" contour plots such as those produced here (section 1.4.4.4):

https://scipy-lectures.github.io/intro/matplotlib/matplotlib.html
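A conditioned contour plot can be sketched along the same lines: bin each subset of the data on a common grid with `np.histogram2d` and draw the counts with `ax.contourf`, one panel per level of `c1`, with shared contour levels and one colorbar. Using binned counts is an assumption here (a kernel density estimate would give smoother contours), and the 20 × 20 grid and two-column layout are arbitrary choices; the data is a seeded stand-in for the question's DataFrame.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({'c1': rng.choice(list('ABCD'), size=n),
                   'x1': 3 * rng.standard_normal(n) + 90,
                   'x2': 2 * rng.standard_normal(n) + 89})

# Common bin edges, so every panel is drawn on the same grid
xedges = np.linspace(df['x2'].min(), df['x2'].max(), 21)
yedges = np.linspace(df['x1'].min(), df['x1'].max(), 21)
xc = (xedges[:-1] + xedges[1:]) / 2   # bin centres used as contour coordinates
yc = (yedges[:-1] + yedges[1:]) / 2

levels = sorted(df['c1'].unique())
counts = {}
for lev in levels:
    sub = df[df['c1'] == lev]
    # histogram2d(x, y) returns H[i, j] with i indexing x; transpose so rows follow y
    H, _, _ = np.histogram2d(sub['x2'], sub['x1'], bins=[xedges, yedges])
    counts[lev] = H.T

# One set of contour levels for all panels, so the colours are comparable
zlevels = np.linspace(0, max(h.max() for h in counts.values()), 8)

ncols = 2
nrows = math.ceil(len(levels) / ncols)   # rows follow from the number of levels
fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                         squeeze=False, figsize=(3.5 * ncols, 3 * nrows))
for ax, lev in zip(axes.flat, levels):
    cs = ax.contourf(xc, yc, counts[lev], levels=zlevels)
    ax.set_title(f'c1 = {lev}')
fig.supxlabel('x2')
fig.supylabel('x1')
fig.colorbar(cs, ax=axes, label='count per bin')
```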

I have read through these examples: http://nbviewer.ipython.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section2_4-Matplotlib.ipynb

However, I would like the layout to be generated from the number of levels in the categorical conditioning (or "by") variable(s), i.e., I specify a number of columns and the number of rows is computed from the number of levels.
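One way to derive the layout from the number of levels, sketched with plain matplotlib: fix the number of columns, compute the rows with `math.ceil`, draw each panel through a user-supplied panel function (in the spirit of a lattice panel function), and hide the unused slots. The `trellis` helper below is hypothetical, not part of matplotlib or pandas, and the data is a seeded stand-in for the question's DataFrame.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def trellis(data, by, ncols, panel, sharex=True, sharey=True):
    """Hypothetical helper: one panel per level of `by`, wrapped into `ncols` columns.

    `panel(ax, subset)` draws a single panel (like a lattice panel function).
    """
    levels = sorted(data[by].unique())
    nrows = math.ceil(len(levels) / ncols)  # rows follow from the number of levels
    fig, axes = plt.subplots(nrows, ncols, sharex=sharex, sharey=sharey,
                             squeeze=False, figsize=(3 * ncols, 2.5 * nrows))
    for i, ax in enumerate(axes.flat):
        if i < len(levels):
            panel(ax, data[data[by] == levels[i]])
            ax.set_title(f'{by} = {levels[i]}')
        else:
            ax.set_visible(False)  # empty slot in the last row
            if i >= ncols:
                # with sharex, the panel above an empty slot had its x tick labels hidden
                axes.flat[i - ncols].xaxis.set_tick_params(labelbottom=True)
    return fig, axes


# Usage: the equivalent of histogram(~x1 | c1), wrapped into three columns
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({'c1': rng.choice(list('ABCD'), size=n),
                   'x1': 3 * rng.standard_normal(n) + 90})
fig, axes = trellis(df, 'c1', ncols=3,
                    panel=lambda ax, d: ax.hist(d['x1'], bins=15))
```

With three columns, the four levels of `c1` give a 2 × 3 grid with two hidden slots; passing a different `panel` function (for example one that calls `ax.scatter`) reuses the same layout logic.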

I'd appreciate any good advice or steps in the right direction. I'd prefer not to use rpy2 or Python ggplot (I messed around with them and found them frustrating and limiting too).

Thanks! Randall

  • There is some experimental code in pandas for trellis plots: pandas.pydata.org/pandas-docs/stable/rplot.html. Would that help? See also ggplot.yhathq.com, which is like ggplot in R; it supports facet grids. Commented Sep 14, 2014 at 10:25
  • Can you add some examples for the contour plot questions? Seaborn has functionality for hexbins and 2-dimensional KDE plots, which I think would fulfil what you are looking for. Commented Sep 14, 2014 at 17:06

2 Answers


Seaborn is the most effective library I have found for doing faceted plots in Python. It's a pandas-aware wrapper around matplotlib which takes care of all the subplotting for you and updates the matplotlib styling to look more modern. It produces some really lovely output.

The faceting is done using the grid part of the library (FacetGrid).

It works a little differently from R in that you create the grid first and pass it the data, along with the facets you want (rows, columns, colours, etc.). You then map plotting functions onto that grid, passing any required arguments to the mapped plotting functions.

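# Scatter plot, one conditioning factor: the equivalent of xyplot(x1 ~ x2 | c1)
# (x2 on the horizontal axis, x1 on the vertical axis, as in the lattice formula)
import matplotlib.pyplot as plt
import seaborn as sns
grid1 = sns.FacetGrid(df, col='c1', col_wrap=2)  # wrap into 2 columns; rows follow from the levels of c1
grid1.map(plt.scatter, 'x2', 'x1')


# Scatter plot with a column factor and a hue (grouping) factor: xyplot(x1 ~ x2 | c1, groups = c2)
grid2 = sns.FacetGrid(df, col='c1', hue='c2')
grid2.map(plt.scatter, 'x2', 'x1')
grid2.add_legend()  # a single figure-level legend for c2


# Histogram with one factor: histogram(~x1 | c1)
grid3 = sns.FacetGrid(df, col='c1')
grid3.map(plt.hist, 'x1', alpha=.7)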

2 Comments

This is great, but I just wanted to point out that some of these plots can be achieved a bit more easily with the lmplot function. You could make the first one with sns.lmplot(data=df, x="x2", y="x1", col="c1"). This will fit a regression line too, which may or may not be useful, but it can be disabled by adding fit_reg=False.
Many thanks for the answer. I did spend some time running through the seaborn tutorial (and code) and will call this answered. I still miss R Lattice a bit, though. Conditioning and grouping do not seem as "natural" in seaborn/pandas as with the R Lattice formula interface. I'm also used to working with data in long rather than wide form (although I see that many sns plotting functions accept both). I do think your description will help me understand seaborn and begin leveraging it more.

Unlike lattice, ggplot uses facet_wrap and facet_grid to create trellis plots of numerical variables by a categorical one. Some people describe plotnine as the translation of ggplot into Python, but I believe Lets-Plot is a closer counterpart, with its own aesthetics. Moving from an R ggplot visualization to a Python Lets-Plot visualization is almost seamless.
