matplotlib scatter plot: How to use the data= argument

Question

The matplotlib documentation for scatter() states:

In addition to the above described arguments, this function can take a data keyword argument. If such a data argument is given, the following arguments are replaced by data[<arg>]:

All arguments with the following names: ‘s’, ‘color’, ‘y’, ‘c’, ‘linewidths’, ‘facecolor’, ‘facecolors’, ‘x’, ‘edgecolors’.

However, I cannot figure out how to get this to work. The minimal example

import matplotlib.pyplot as plt
import numpy as np

data = np.random.random(size=(3, 2))
props = {'c': ['r', 'g', 'b'],
         's': [50, 100, 20],
         'edgecolor': ['b', 'g', 'r']}

plt.scatter(data[:, 0], data[:, 1], data=props)
plt.show()

produces a plot with the default colors and sizes instead of the supplied ones.

Has anyone used this functionality?

user2699 · Accepted Answer · 2017-10-24 15:36:50Z

8

This seems to be an overlooked feature added about two years ago, in matplotlib 1.5. The release notes have a short example (https://matplotlib.org/users/prev_whats_new/whats_new_1.5.html#working-with-labeled-data-like-pandas-dataframes). Besides this question and a short blog post (https://tomaugspurger.github.io/modern-6-visualization.html), that's all I could find.

Basically, any dict-like object ("labeled data" as the docs call it) is passed in the data argument, and plot parameters are specified based on its keys. For example, you can create a structured array with fields a, b, and c

coords = np.random.randn(250, 3).view(dtype=[('a', float), ('b', float), ('c', float)])

You would normally create a plot of a vs b using

pyplot.plot(coords['a'], coords['b'], 'x')

but using the data argument it can be done with

pyplot.plot('a', 'b','x', data=coords)

The label b can be confused with a style string setting the line to blue, but the third argument clears up that ambiguity. It's not limited to x and y data either,

pyplot.scatter(x='a', y='b', c='c', data=coords)

will set the point color based on field 'c'.
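Applied to the example in the question, the same idea can be sketched with a plain dict. This is only an illustration: the key names (xpos, ypos, fill, area, rim) are made up, and the Agg backend is selected just so the snippet runs without a display. The point is that data= holds the values, while the other arguments name keys in it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
xy = rng.random(size=(3, 2))

# Everything scatter needs lives in one dict-like object.
# The key names are arbitrary (hypothetical, chosen for this sketch).
table = {
    "xpos": xy[:, 0],
    "ypos": xy[:, 1],
    "fill": ["r", "g", "b"],
    "area": [50, 100, 20],
    "rim": ["b", "g", "r"],
}

fig, ax = plt.subplots()
# Each string argument is looked up in `table` because data= is given.
pc = ax.scatter("xpos", "ypos", c="fill", s="area", edgecolors="rim", data=table)
fig.canvas.draw()
```

In the question's snippet, props was passed as data= but no argument referred to any of its keys, so nothing was looked up and the defaults were used.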

It looks like this feature was added for pandas dataframes, and it handles them better than other objects. It also seems to be poorly documented and somewhat unstable: using x and y keyword arguments fails with the plot command but works fine with scatter, and the error messages are not helpful. That being said, it gives a nice shorthand when the data you want to plot has labels.

answered Oct 24, 2017 at 15:36


1 Comment

Diziet Asahi Over a year ago

Thank you for your answer. After a year, I guess I've mostly given up on this syntax, and I can't say I have missed it much after all. But in any case, I had totally misunderstood the documentation on this one; it does make sense with your examples now.

tsj · Accepted Answer · 2016-10-07 20:58:07Z

1

In reference to your example, I think the following does what you want:

plt.scatter(data[:, 0], data[:, 1], **props)

That bit in the docs is confusing to me, and looking at the sources, scatter in axes/_axes.py seems to do nothing with this data argument. Remaining kwargs end up as arguments to a PathCollection, maybe there is a bug there.

You could also set these parameters after calling scatter, using the various set methods of PathCollection, e.g.:

pc = plt.scatter(data[:, 0], data[:, 1])
pc.set_sizes([500,100,200])
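A slightly fuller sketch of that approach, setting all three properties from the question after the fact. The values are taken from the question's props dict; the Agg backend is an assumption made only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

xy = np.random.default_rng(0).random(size=(3, 2))

fig, ax = plt.subplots()
pc = ax.scatter(xy[:, 0], xy[:, 1])  # returns a PathCollection

# Adjust the collection after it has been created.
pc.set_sizes([50, 100, 20])          # marker areas in points**2
pc.set_facecolor(["r", "g", "b"])    # one fill color per point
pc.set_edgecolor(["b", "g", "r"])    # one outline color per point
fig.canvas.draw()
```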

answered Oct 7, 2016 at 20:58


2 Comments

Diziet Asahi Over a year ago

Thanks for your answer. Obviously I could directly pass the arrays as arguments to the function. I'm working on a large code base where using the data= argument could greatly simplify things, which is why I was curious. I also checked the code of scatter() and traced the use of data to the update function in the Artist class, but even then I cannot figure out what it does.

tsj Over a year ago

Wouldn't using **props versus what we expect data=props to do be just as simple? I'm assuming you just don't want to spell out each keyword every time.
