
The matplotlib documentation for scatter() states:

In addition to the above described arguments, this function can take a data keyword argument. If such a data argument is given, the following arguments are replaced by data[<arg>]:

All arguments with the following names: ‘s’, ‘color’, ‘y’, ‘c’, ‘linewidths’, ‘facecolor’, ‘facecolors’, ‘x’, ‘edgecolors’.

However, I cannot figure out how to get this to work. The minimal example

import matplotlib.pyplot as plt
import numpy as np

data = np.random.random(size=(3, 2))
props = {'c': ['r', 'g', 'b'],
         's': [50, 100, 20],
         'edgecolor': ['b', 'g', 'r']}

plt.scatter(data[:, 0], data[:, 1], data=props)
plt.show()

produces a plot with the default color and sizes instead of the supplied ones.

Has anyone used this functionality?

2 Answers


This seems to be an overlooked feature added about two years ago, in matplotlib 1.5. The release notes have a short example (https://matplotlib.org/users/prev_whats_new/whats_new_1.5.html#working-with-labeled-data-like-pandas-dataframes). Apart from that, this question, and a short blog post (https://tomaugspurger.github.io/modern-6-visualization.html), I could not find anything else about it.

Basically, you pass any dict-like object ("labeled data", as the docs call it) as the data argument, and then give the other plot arguments as strings that name its keys. For example, you can create a structured array with fields a, b, and c

coords = np.random.randn(250, 3).view(dtype=[('a', float), ('b', float), ('c', float)])

You would normally create a plot of a vs b using

pyplot.plot(coords['a'], coords['b'], 'x')

but using the data argument it can be done with

pyplot.plot('a', 'b','x', data=coords)

The label b could be mistaken for a format string that sets the line color to blue, but passing the third argument 'x' removes that ambiguity. It's not limited to x and y data either:

pyplot.scatter(x='a', y='b', c='c', data=coords)

will set the point color based on column 'c'.
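Applied to the scatter example from the question, a minimal sketch could look like the following. The key names xpos, ypos, size, face and edge are made up for illustration, and the Agg backend is only selected so that the snippet runs without a display. The point is that the coordinates and the properties all live in the object passed as data, and every other argument is a string naming one of its keys. Note that the list of replaced names quoted in the question contains 'edgecolors' (plural), so that spelling is used here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
points = rng.random(size=(3, 2))

# Hypothetical key names; any strings work as long as the call below uses the same ones.
table = {
    "xpos": points[:, 0],
    "ypos": points[:, 1],
    "size": [50, 100, 20],
    "face": ["r", "g", "b"],
    "edge": ["b", "g", "r"],
}

fig, ax = plt.subplots()
# Each string is looked up in `table` and replaced by table[<string>].
pc = ax.scatter("xpos", "ypos", s="size", c="face", edgecolors="edge", data=table)

print(pc.get_sizes())      # marker sizes taken from table["size"]
print(pc.get_facecolor())  # one RGBA row per point, taken from table["face"]
```

A string that is not a key of the data object is passed through unchanged, so a plain color name still works as long as it does not collide with a key.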

It looks like this feature was added for pandas dataframes, and it handles them better than other objects. It also seems to be poorly documented and somewhat unstable: using x and y keyword arguments fails with the plot command but works fine with scatter, and the error messages are not helpful. That being said, it gives a nice shorthand when the data you want to plot has labels.


1 Comment

Thank you for your answer. After a year, I guess I've mostly given up on this syntax, and I can't say I have missed it much after all. In any case, I had totally misunderstood the documentation on this one; it does make sense with your examples now.


In reference to your example, I think the following does what you want:

plt.scatter(data[:, 0], data[:, 1], **props)

That bit in the docs is confusing to me, and looking at the sources, scatter in axes/_axes.py seems to do nothing with this data argument. The remaining kwargs end up as arguments to a PathCollection, so maybe there is a bug there.

You could also set these parameters after scatter with the various set methods in PathCollection, e.g.:

pc = plt.scatter(data[:, 0], data[:, 1])
pc.set_sizes([500,100,200])
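A slightly fuller sketch of the same idea, assuming you also want per-point face and edge colors. The values are arbitrary, and the Agg backend is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(size=(3, 2))

fig, ax = plt.subplots()
pc = ax.scatter(data[:, 0], data[:, 1])  # returns a PathCollection

# Override the defaults after the fact, one value per point.
pc.set_sizes([500, 100, 200])
pc.set_facecolor(["r", "g", "b"])
pc.set_edgecolor(["b", "g", "r"])

print(pc.get_sizes())
```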

2 Comments

Thanks for your answer. Obviously I could pass the arrays directly as arguments to the function. I'm working on a large code base where the data= argument could greatly simplify things, which is why I was curious. I also checked the code of scatter() and traced the use of data to a function update in the class Artist, but even then I cannot figure out what it does.
Wouldn't using **props versus what we expect data=props to do be just as simple? I'm assuming you just don't want to spell out each keyword every time.
