I am trying to display some points using matplotlib. I can display them with print, but matplotlib raises an error. The command that works is also included (commented out).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data = np.array([[-1,-1,'C1'],[-2,-1,'C1'],[-3,-2,'C1'],[1,1,'C2'],[2,1,'C2'],[3,2,'C2']])
query=[-2.5,-1.5]

df=pd.DataFrame(data)
df.columns =['x','y','Cat']
df

for i in range(6):
    if df.iloc[i]['Cat'] == 'C1':
        plt.scatter(df.iloc[i]['x'], df.iloc[i]['y'], s=150, c='r') #error line
        #working line below
        #print(df.iloc[i]['x'],df.iloc[i]['y'])
    else:
        plt.scatter(df.iloc[i]['x'], df.iloc[i]['y'], s=150, c='b')
        #working line below
        #print(df.iloc[i]['x'],df.iloc[i]['y'])

Please help. Thanks in advance

Thanks @Haleemur Ali for your help. I am able to run it now, but it is still not fully functional: not all points are showing, and I am not sure why.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data = np.array([[-1,-1,'r'],[-2,-1,'r'],[-3,-2,'r'],[1,1,'b'],[2,1,'b'],[3,2,'b'],[-2.5,-1.5,'y']])
query=[-2.5,-1.5]

df=pd.DataFrame(data)
df.columns =['x','y','Cat']
print(df)

plt.scatter(df.x, df.y, s=150, c=df.Cat)

Graph generated: [image]

  • Please post the full error message. Commented Mar 2, 2018 at 15:14
  • In contrast to built-in lists, NumPy arrays can only hold data of a single type. In your case the data is cast to a string type (<U21) because of the strings in the third position of each row. Matplotlib does not know how to plot non-numeric data; that's the problem here. Commented Mar 2, 2018 at 17:25

2 Answers

If the numbers are strings, they are not recognized as numbers, hence they are plotted as categories, just as you would expect if you plotted ["apple", "banana", "cherry"]. You would need to convert your data to floats:

df[['x', 'y']] = df[['x', 'y']].astype(float)

Complete code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[-1,-1,'r'],[-2,-1,'r'],[-3,-2,'r'],[1,1,'b'],
                 [2,1,'b'],[3,2,'b'],[-2.5,-1.5,'y']])

df=pd.DataFrame(data, columns=['x','y','Cat'])
df[['x', 'y']] = df[['x', 'y']].astype(float)

plt.scatter(df.x, df.y, s=150, c=df.Cat)

plt.show()
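
As a rough sketch of why the conversion is needed, the snippet below compares the dtypes before and after. It uses a shortened copy of the data from the question; the variable names (rows, as_array, df_strings, df_fixed, df_direct) are made up for illustration. It also shows one possible alternative, assuming you are free to skip np.array altogether: building the DataFrame straight from the list lets pandas infer a dtype per column.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data (illustrative only).
rows = [[-1, -1, 'r'], [-2, -1, 'r'], [3, 2, 'b'], [-2.5, -1.5, 'y']]

# A NumPy array holds a single dtype, so mixing numbers and strings
# turns every element into a string.
as_array = np.array(rows)
print(as_array.dtype.kind)  # 'U' stands for unicode string

# Fix from the answer: convert the coordinate columns back to floats.
df_strings = pd.DataFrame(as_array, columns=['x', 'y', 'Cat'])
df_fixed = df_strings.copy()
df_fixed[['x', 'y']] = df_fixed[['x', 'y']].astype(float)

# Alternative (an assumption, not from the answer): skip np.array so that
# pandas infers numeric dtypes for x and y on its own.
df_direct = pd.DataFrame(rows, columns=['x', 'y', 'Cat'])
print(df_direct.dtypes)
```

Either df_fixed or df_direct can then be passed to plt.scatter as in the complete code above.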


3 Comments

This is true! What I find interesting is that matplotlib correctly inferred that the series passed to scatter were actually numeric, as you can see in my answer, but that is clearly not the case for the OP. BTW, I'm using the following versions on Windows Anaconda: pandas=0.20.3, matplotlib=2.0.2. What versions are you using?
In matplotlib < 2.1, strings are converted to floats if possible; otherwise an error is raised. In matplotlib >= 2.1, strings are interpreted as categorical values.
I am using version 2.1.0 of matplotlib. That clears up the confusion. But even if matplotlib interpreted the values as categories, why did it not display all the data? Thanks – ImportanceOfBeingErnest & – Haleemur Ali.

Scatter plots aren't built by iterating through the data.

You can build the scatter plot for a particular category, like this:

plt.scatter(df.x[df.Cat=='C1'], df.y[df.Cat=='C1'], s=150, c='r')

scatter plot for 1 category

You can also create a scatter plot where each category gets a distinct colour:

plt.scatter(df.x, df.y, s=150, c=df.Cat)

scatter plot for all categories, where category determines point colour
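
If the category labels are not themselves colour names, one possible approach is to make one scatter call per category with an explicit colour lookup. This is only a sketch: the colours dictionary is a hypothetical helper that mirrors the red/blue choice in the question, and the data is built from a plain list so that x and y stay numeric.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data from the question, built without np.array so x and y stay numeric.
df = pd.DataFrame(
    [[-1, -1, 'C1'], [-2, -1, 'C1'], [-3, -2, 'C1'],
     [1, 1, 'C2'], [2, 1, 'C2'], [3, 2, 'C2']],
    columns=['x', 'y', 'Cat'],
)

# Hypothetical lookup from category label to colour (red for C1 and
# blue for C2, as in the question's loop).
colours = {'C1': 'r', 'C2': 'b'}

fig, ax = plt.subplots()
# One scatter call per category, not one per point.
for cat, group in df.groupby('Cat'):
    ax.scatter(group['x'], group['y'], s=150, c=colours[cat], label=cat)
ax.legend()
```

Outside a notebook, call plt.show() afterwards to display the figure. Grouping keeps the plot at one scatter call per category and gives each category its own legend entry.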
