scatter plots with string arrays in matplotlib

Question

This seems like it should be an easy one, but I can't figure it out. I have a pandas data frame and would like to do a 3D scatter plot with 3 of the columns. The X and Y columns are not numeric, they are strings, but I don't see how this should be a problem.

import numpy as np
import matplotlib.pyplot as pl

X = myDataFrame.columnX.values  # string
Y = myDataFrame.columnY.values  # string
Z = myDataFrame.columnZ.values  # float

fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()

Isn't there an easy way to do this? Thanks.

How would you plot points without numerical coordinates? I don't see how you could think that would not be a problem.
– M4rtini, Commented Feb 28, 2014 at 12:57
What? Create as many bins in the X and Y axes as different strings you find in the X and Y arrays. And for every bin in X and Y, plot the value of Z in the Z axis. It's really not that hard.
– elelias, Commented Feb 28, 2014 at 13:02
No, it's not that hard, but it's a problem hard enough that scatter won't do it automatically for you. And it sounds like you know the solution; did you try to do what you just said?
– M4rtini, Commented Feb 28, 2014 at 13:05
Well, I can do some array manipulation and come up with it. But I thought this is something that a lot of people would encounter on a daily basis and thus there'd be a way of doing it automatically. If there isn't... well, that's OK I guess. Maybe I'm spoiled by how good Python libraries generally are (and matplotlib is certainly an example).
– elelias, Commented Feb 28, 2014 at 13:07
Using a combination of enumerate and a set/dictionary should easily give you sensible coordinates for the unique strings in your list. Matplotlib is good for plotting, not preparing your data for plotting.
– M4rtini, Commented Feb 28, 2014 at 13:26

unutbu · Accepted Answer · 2014-02-28 14:17:52Z

11

You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,

In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)

In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])

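Note that X is an array of integer indices into uniques (the exact integer dtype depends on your platform and NumPy version), so uniques[X] reconstructs the original strings. Applied to a DataFrame: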
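As a quick check of what return_inverse gives you, here is a minimal sketch (the sample labels and the names `labels`, `codes`, and `roundtrip` are made up for illustration). It shows that indexing the unique values with the inverse array gives back the original input, which is why the integers can stand in for the strings as coordinates.

```python
import numpy as np

# Made-up sample labels, only for illustration
labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])

# uniques is sorted; codes[i] is the position of labels[i] within uniques
uniques, codes = np.unique(labels, return_inverse=True)

# Indexing uniques with the codes reconstructs the original labels
roundtrip = uniques[codes]
print(uniques)
print(codes)
print((roundtrip == labels).all())
```

Because the codes are dense integers starting at 0, range(len(uniques)) gives the tick positions and uniques gives the tick labels.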

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d

N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques) 
plt.show()
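If the data is already in pandas, pd.factorize is a possible alternative to np.unique for getting integer codes. This is a different technique from the one used above, shown only as a sketch; the small DataFrame is made up for illustration. With sort=True the codes follow the sorted order of the labels; without it they follow the order of first appearance.

```python
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({'X': ['foo', 'baz', 'bar', 'foo'],
                   'Z': [1.0, 10.0, 100.0, 1000.0]})

# codes: one integer per row; uniques: the distinct labels, sorted
codes, uniques = pd.factorize(df['X'], sort=True)

print(codes)
print(list(uniques))
```

The codes can then be passed to ax.scatter as coordinates, with uniques used for the tick labels as in the example above.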

edited Feb 28, 2014 at 14:17

answered Feb 28, 2014 at 13:51

unutbu

2 Comments

Joe Kington Over a year ago

You might want to demonstrate labeling the x and y ticks with their respective strings. E.g. ax.set(xticks=range(len(xuniques)), xticklabels=xuniques, ...) Either way, nice answer!

unutbu Over a year ago

@JoeKington: Thanks! That's much better.

naught101 · Answer · 2019-07-23 01:29:29Z

3

Scatter does this automatically now (from at least matplotlib 2.1.0):

plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])
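To see what matplotlib does with the strings, here is a small sketch for the 2D case only, assuming a matplotlib version recent enough to accept string coordinates. The Agg backend and the names `points` and `x_positions` are choices made for this illustration. It inspects the numeric x-positions that the string labels were converted to.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
points = ax.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])

# The scatter collection stores numeric offsets; the first column holds
# the x-positions that the string labels were mapped to
x_positions = points.get_offsets()[:, 0]
print(x_positions)
```

Repeated labels share one position, which matches the "one bin per distinct string" behaviour asked for in the question.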

edited Jul 23, 2019 at 1:29

answered Nov 21, 2017 at 8:10

naught101

6 Comments

astudentofmaths Over a year ago

Doing this, I get ValueError: could not convert string to float: 'A'

naught101 Over a year ago

@Arthurim you'll need to update matplotlib then. Not sure what version is required, but it works in 2.1.0 at least.

Yuan Tao Over a year ago

Python 2.7.12 doesn't work. Also, the graph you showed here doesn't match the code?

naught101 Over a year ago

@YuanTao: Python 3 has been out for more than 10 years. Python 2 is at end of life in 5 months. Upgrade.

Yuan Tao Over a year ago

@naught101 Alright, I just realized you meant the matplotlib version. But the figure still doesn't match? Plus: thank you for the advice. I have both installed but haven't completely moved to Python 3.

jmetz · Answer · 2014-02-28 15:09:29Z

2

Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.

Using hash

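You could use the hash function for the conversion:

import numpy as np
from matplotlib.pyplot import figure, show
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

xlab = myDataFrame.columnX.values  # string labels
ylab = myDataFrame.columnY.values  # string labels

# hash() maps each string to an integer (by default the values differ between runs on Python 3)
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]

Z = myDataFrame.columnZ.values  # float

fig = figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# put ticks at the hashed positions and label them with the original strings
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
show()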

As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings.

Nondegenerate uniform spacing

If you wanted to have the points uniformly spaced then you would have to use a different conversion. For example you could use

X = list(range(len(xlab)))

though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.

Degenerate uniform spacing

A third alternative is to first get the unique members of xlab (using e.g. set) and then map each xlab to a position using the unique set for the mapping; e.g.

xmap = {sn: i for i, sn in enumerate(set(xlab))}
X = [xmap[l] for l in xlab]
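A set has no guaranteed iteration order, so the positions assigned this way can change between runs. A minimal sketch of one way around that is to sort the unique labels first. The sample labels and the names `unique_labels` and `tick_positions` are made up for illustration.

```python
# Made-up labels, only for illustration
xlab = ['foo', 'baz', 'bar', 'foo', 'baz', 'bar']

# Sort the distinct labels so each one gets a reproducible position
unique_labels = sorted(set(xlab))
xmap = {label: i for i, label in enumerate(unique_labels)}

# Identical labels map to the same coordinate
X = [xmap[label] for label in xlab]

# One tick per distinct label, at positions 0, 1, 2, ...
tick_positions = list(range(len(unique_labels)))
print(X)
print(unique_labels)
```

The ticks can then be labelled with ax.set_xticks(tick_positions) and ax.set_xticklabels(unique_labels), and the same approach applies to the y labels.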

edited Feb 28, 2014 at 15:09

answered Feb 28, 2014 at 13:07

jmetz

4 Comments

M4rtini Over a year ago

Using the hash values for coordinates is not really a good idea. The magnitude of those numbers will mess up the scales.

jmetz Over a year ago

@M4rtini: It's not exactly clear what the scales should be for string-based coordinates. I don't see there being a clear answer to that issue, while hashes still give a workable result.

jmetz Over a year ago

@M4rtini - I've added in two alternative schemes that both give uniform scaling; one gives every point a new x/y position, while using e.g. set allows for the same label to map to the same x/y position.

M4rtini Over a year ago

That's almost exactly what I had in mind when I wrote my last comment on the question. +1
