10

This seems like it should be an easy one, but I can't figure it out. I have a pandas DataFrame and would like to make a 3D scatter plot from three of its columns. The X and Y columns are not numeric, they are strings, but I don't see why this should be a problem.

X= myDataFrame.columnX.values #string
Y= myDataFrame.columnY.values #string
Z= myDataFrame.columnZ.values #float

fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
pl.show()

Isn't there an easy way to do this? Thanks.

5
  • 2
    How would you plot points without numerical coordinates? I don't see how you could think that would not be a problem. Commented Feb 28, 2014 at 12:57
  • what? create as many bins in the X and Y axes as different strings you find in the X and Y arrays. And for every bin in X and Y, plot the value of Z in the Z axis. It's really not that hard. Commented Feb 28, 2014 at 13:02
  • 2
    No, it's not that hard, but it's a problem hard enough that scatter won't do it automatically for you. And it sounds like you know the solution; did you try to do what you just said? Commented Feb 28, 2014 at 13:05
  • well, I can do some array manipulation and come up with it. But I thought this is something that a lot of people would encounter on a daily basis and thus there'd be a way of doing it automatically. If there isn't...well, that's ok I guess. Maybe I'm spoiled by how good python libraries generally are (and matplotlib is certainly an example) Commented Feb 28, 2014 at 13:07
  • Using a combination of enumerate and a set/dictionary should easily give you sensible coordinates for the unique strings in your list. Matplotlib is good for plotting, not for preparing your data for plotting. Commented Feb 28, 2014 at 13:26

3 Answers

11

You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,

In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)

In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])

Note that X is an integer index array. Its dtype is NumPy's platform index type: int32 in the session above, typically int64 on 64-bit builds.
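To make the relationship between the two return values concrete, here is a minimal sketch (the names `labels`, `codes`, and `recovered` are just illustrative). It shows that the unique values come back sorted and that indexing them with the inverse array reproduces the original strings, which is why the unique values can later serve as tick labels.

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])
uniques, codes = np.unique(labels, return_inverse=True)

# uniques is sorted; codes[i] is the position of labels[i] within uniques
recovered = uniques[codes]

print(uniques)
print(codes)
print((recovered == labels).all())
```

Because the codes are positions in `uniques`, tick `k` on the axis corresponds to the label `uniques[k]`.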


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d

N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
       yticks=range(len(Yuniques)), yticklabels=Yuniques) 
plt.show()


2 Comments

You might want to demonstrate labeling the x and y ticks with their respective strings. E.g. ax.set(xticks=range(len(xuniques)), xticklabels=xuniques, ...) Either way, nice answer!
@JoeKington: Thanks! That's much better.
3

Scatter does this automatically now (from at least matplotlib 2.1.0):

plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])   

scatter plot

6 Comments

Doing this, I get ValueError: could not convert string to float: 'A'
@Arthurim you'll need to update matplotlib then. Not sure what version is required, but it works in 2.1.0 at least.
Python 2.7.12 doesn't work. Also, the graph you showed here doesn't match the code?
@YuanTao: Python 3 has been out for more than 10 years. Python 2 is at end of life in 5 months. Upgrade.
@naught101 Alright, I just realized you meant the matplotlib version. But the figure still doesn't match? Also, thank you for the advice. I have both installed but haven't completely moved to Python 3.
2

Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.

Using hash

You could use the hash function for the conversion:

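import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

xlab = myDataFrame.columnX.values
ylab = myDataFrame.columnY.values

# Map each string label to an integer via its hash
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]

Z = myDataFrame.columnZ.values  # float

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# Put the original strings back as the tick labels
ax.set_xticks(X)
ax.set_xticklabels(xlab)
ax.set_yticks(Y)
ax.set_yticklabels(ylab)
plt.show()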

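As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings. In Python 3, string hashes are also randomized per interpreter session unless PYTHONHASHSEED is set, so the positions can differ between runs.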

Nondegenerate uniform spacing

If you wanted to have the points uniformly spaced then you would have to use a different conversion. For example you could use

X = list(range(len(xlab)))

though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.

Degenerate uniform spacing

A third alternative is to first get the unique members of xlab (using e.g. set) and then map each xlab to a position using the unique set for the mapping; e.g.

xmap = {sn: i for i, sn in enumerate(set(xlab))}
X = [xmap[l] for l in xlab]
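As a rough sketch of how the set-based mapping could be packaged, the snippet below uses a hypothetical helper, `label_positions`, which is not part of any library. It sorts the unique labels first, an assumption added here so that the layout is the same on every run, since the iteration order of a set of strings is not guaranteed to be stable between sessions.

```python
def label_positions(labels):
    """Map each distinct label to an integer position (hypothetical helper)."""
    ordered = sorted(set(labels))  # sorted so the layout is reproducible
    mapping = {name: i for i, name in enumerate(ordered)}
    coords = [mapping[name] for name in labels]
    return coords, ordered

xlab = ['foo', 'baz', 'bar', 'foo']
X, xticklabels = label_positions(xlab)
print(X)            # [2, 1, 0, 2]
print(xticklabels)  # ['bar', 'baz', 'foo']
```

The returned list of ordered labels can then be passed to ax.set_xticks(range(len(xticklabels))) and ax.set_xticklabels(xticklabels), so that repeated labels share one tick.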

4 Comments

Using the hash values for coordinates is not really a good idea. The magnitude of those numbers will mess up the scales.
@M4rtini: It's not exactly clear what the scales should be for string-based coordinates. I don't see there being a clear answer to that issue, while hashes still give a workable result.
@M4rtini - I've added in two alternative schemes that both give uniform scaling; one gives every point a new x/y position, while using e.g. set allows for the same label to map to the same x/y position.
That's almost exactly what I had in mind when I wrote my last comment on the question. +1
