
I have several pandas dataframes. I want to plot several columns against one another in separate scatter plots, and combine them as subplots in a figure. I want to label each subplot accordingly. I had a lot of trouble with getting subplot labels working, until I discovered that there are two ways of plotting directly from dataframes, as far as I know; see SO and pandasdoc:

ax0 = plt.scatter(df.column0, df.column5)
type(ax0): matplotlib.collections.PathCollection

and

ax1 = df.plot(0,5,kind='scatter')
type(ax1): matplotlib.axes._subplots.AxesSubplot

set_title('title') works on ax1 but not on ax0, where it raises AttributeError: 'PathCollection' object has no attribute 'set_title'

I don't understand why the two separate ways exist. What is the purpose of the first method, which returns a PathCollection? The second one was added in pandas 0.17.0; is the first one obsolete, or does it have a different purpose?

    Since this may be useful to anyone visiting the question: I have found df.plot(0,5,style='.') to work better than df.plot(0,5,kind='scatter'), since the former works after using groupby() whereas the latter does not. Commented Mar 13, 2018 at 22:33

2 Answers

As you have found, the pandas function returns an Axes object. plt.scatter returns a PathCollection instead, but it draws onto the current Axes, which you can retrieve with the "get current axes" function, plt.gca(). For instance:

plot = plt.scatter(df.column0, df.column5)
ax0 = plt.gca()
type(ax0)

matplotlib.axes._subplots.AxesSubplot

A more standard way you might see this is the following:

fig = plt.figure()
ax0 = fig.add_subplot(111)  # add_subplot is a method of the Figure, not of pyplot
ax0.scatter(df.column0, df.column5)

At this point you can call the "set" methods on ax0, such as your set_title.
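Since the question is about combining several scatter plots as labelled subplots, here is a minimal sketch of how both routes can share one figure. The dataframe contents, the extra column1, and the titles are made up for illustration; only column0 and column5 come from the question. It assumes a pandas version that accepts the ax= keyword in df.plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 3)), columns=["column0", "column1", "column5"])

# One figure with two subplots; axes is an array holding two Axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# matplotlib route: call scatter on the Axes and keep the returned PathCollection
points = axes[0].scatter(df["column0"], df["column5"])
axes[0].set_title("column0 vs column5")

# pandas route: pass the target Axes via ax=; df.plot hands the same Axes back
returned_ax = df.plot(x="column1", y="column5", kind="scatter", ax=axes[1])
returned_ax.set_title("column1 vs column5")

fig.tight_layout()
```

Either way, the title is set on the Axes, never on the object that scatter returns.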

Hope this helps.


The two differ because they come from different libraries: the first is from matplotlib, the second from pandas. Both do the same thing, which is to create a matplotlib scatter plot, but the matplotlib version returns the collection of points (a PathCollection), whereas the pandas version returns the matplotlib Axes (subplot) the points were drawn on. This makes the matplotlib version a bit more versatile, as you keep a handle on the points themselves and can keep working with them, for example to restyle them or to attach a colorbar.
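As a rough illustration of what the returned collection is good for, the sketch below colours the points by a third column and then keeps working with the PathCollection. The "value" column, the random data, and the styling numbers are assumptions made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data; "value" is an invented column used only for colouring
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((20, 3)), columns=["column0", "column5", "value"])

fig, ax = plt.subplots()
points = ax.scatter(df["column0"], df["column5"], c=df["value"])

# The PathCollection represents the drawn points, so it can feed a colorbar
cbar = fig.colorbar(points, ax=ax)

# It can also be restyled after the fact
points.set_sizes([40])
points.set_alpha(0.6)

# The Axes that owns the points stays reachable from the collection itself
owner = points.axes
owner.set_title("coloured by value")
```

The last two lines also show a way back from the PathCollection to the Axes when set_title is needed.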
