I feel like I'm missing something ridiculously basic here.

If I'm trying to create a bar chart with values from a dataframe, what's the difference between calling .plot on the dataframe object and just entering the data within plt.plot's parentheses?

e.g.

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

VERSUS

df.groupby('category').count().plot(kind='bar')?

Can someone please walk me through what the difference is and when I should use either? I get that with plt.plot I'm calling the plot method of the plt (Matplotlib) library, whereas when I do df.plot I'm calling plot on the dataframe? What does that mean exactly -- that the dataframe has a plot object?

3 Answers

These are two different plotting interfaces. Fundamentally, both produce matplotlib objects, which are rendered by one of the matplotlib backends.

There is, however, an important difference. Pandas bar plots are categorical in nature. This means the bars are positioned at consecutive integers (0, 1, 2, ...), and each bar gets a tick labelled with the corresponding entry of the dataframe's index. For example:
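To make the "both produce a matplotlib object" point concrete, here is a minimal sketch using the two calls from the question. The dataframe contents and the column names `category` and `value` are made up for illustration, and the `Agg` backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() to see the plots
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data, invented for this example.
df = pd.DataFrame({"category": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

# pyplot function: draws on the current Axes and returns the line artists.
lines = plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
pyplot_axes = plt.gca()

# pandas method: hands the data to matplotlib and returns the Axes it drew on.
counts = df.groupby("category").count()
pandas_axes = counts.plot(kind="bar")

print(type(lines[0]).__name__)         # Line2D
print(isinstance(pyplot_axes, Axes))   # True
print(isinstance(pandas_axes, Axes))   # True
```

Either way you end up holding ordinary matplotlib objects, so anything you can do to an Axes applies to both.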

import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30,20,10,40], index=[1,4,5,9])
s.plot.bar()

plt.show()

Here there are four bars. The first is at position 0 and carries the first label of the series' index, 1. The second is at position 1 with the label 4, and so on.

In contrast, a matplotlib bar plot is numeric in nature. Compare the pandas example with:
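If you want to check the positions yourself rather than read them off the picture, a small sketch like the following inspects the bars pandas created. It assumes the same series as above; the explicit `fig, ax` and the `Agg` backend are additions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()
s.plot.bar(ax=ax)

# Each bar is a Rectangle patch; its centre is the left edge plus half the width.
centers = [round(float(p.get_x() + p.get_width() / 2), 6) for p in ax.patches]
labels = [t.get_text() for t in ax.get_xticklabels()]

print(centers)  # [0.0, 1.0, 2.0, 3.0]
print(labels)   # ['1', '4', '5', '9']
```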

import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30,20,10,40], index=[1,4,5,9])
plt.bar(s.index, s.values)

plt.show()

Here the bars sit at the numerical positions given by the index: the first bar at 1, the second at 4, and so on. The axis ticks are chosen independently of where the bars are.

Note that you can achieve a categorical bar plot with matplotlib by casting the x values (here, the index) to strings.
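The same kind of check works for the matplotlib version. This is only a sketch under the same assumptions (same series, explicit Axes, `Agg` backend added so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()
bars = ax.bar(s.index, s.values)

# With numeric x values the bars are centred on the numbers themselves.
centers = [round(float(b.get_x() + b.get_width() / 2), 6) for b in bars]

print(centers)  # [1.0, 4.0, 5.0, 9.0]
```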

plt.bar(s.index.astype(str), s.values)

The result looks similar to the pandas plot, apart from minor differences such as label rotation and bar width. If you want to tweak more advanced properties, this is easier with a matplotlib bar plot, because plt.bar directly returns the bar container holding all the bars.

# plt.bar returns a BarContainer; iterating over it yields one Rectangle per bar
bc = plt.bar(s.index.astype(str), s.values)
for bar in bc:
    # placeholder: call any Rectangle setter here, e.g. bar.set_edgecolor("black")
    bar.set_some_property(...)
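
As one concrete instance of the loop above, this sketch outlines every bar and hatches the tallest one. The choice of edge colour and hatch is arbitrary and only there for illustration; `set_edgecolor`, `set_hatch` and `get_height` are regular methods of matplotlib's Rectangle patches.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()
bc = ax.bar(s.index.astype(str), s.values)  # BarContainer of Rectangle patches

# Style all bars, then single out the tallest one.
for bar in bc:
    bar.set_edgecolor("black")
tallest = max(bc, key=lambda bar: bar.get_height())
tallest.set_hatch("//")
```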

The pandas plot function uses Matplotlib's pyplot to do the actual plotting; it is essentially a shortcut.

I was similarly confused when I started trying to visualise my data, but I eventually decided to learn matplotlib because it gives you more control over the visualisation.
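
Since pandas draws with matplotlib, the two can also be mixed: you do not have to convert the data to a list first. A minimal sketch, assuming a small made-up series, that lets pandas draw the bars and then customises the resulting Axes with plain matplotlib calls (the title and label texts are invented):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()
s.plot.bar(ax=ax, rot=0)  # pandas draws onto the Axes we pass in

# From here on it is ordinary matplotlib.
ax.set_title("Values by index label")
ax.set_ylabel("value")
for bar in ax.containers[0]:  # the BarContainer that pandas' bar call created
    bar.set_edgecolor("black")
```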

1 Comment

Hi, I'm a beginner. Are you saying Matplotlib gives more customization options than Pandas' built-in plot functions? So I should export a Series into a list and graph using Matplotlib?

I think it depends on the data you have. If you have a clean dataframe and you just want to plot something quickly, then you can use df.plot. For example, you can group by a column and then specify the x and y axes.

If you want a more complicated graph, then working directly with matplotlib is better. In the end, matplotlib will give you more options.
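
To compare the two routes side by side, here is a rough sketch that draws the same grouped counts once through pandas and once through matplotlib directly. The dataframe and its column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({"category": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
counts = df.groupby("category")["value"].count()

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Quick route: one pandas call, ticks and labels taken from the index.
counts.plot(kind="bar", ax=left, title="pandas")

# Manual route: pass labels and heights to matplotlib yourself.
bars = right.bar(counts.index.tolist(), counts.values)
right.set_title("matplotlib")

heights = [int(b.get_height()) for b in bars]
print(heights)  # [2, 1, 1]
```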

This is a good reference to start with: http://jonathansoma.com/lede/algorithms-2017/classes/fuzziness-matplotlib/understand-df-plot-in-pandas/
