Let's say I have the following dataframe: for each month separately I have a bunch of data, stored in arrays, for three variables:

ID  Y            X1         X2           month
0   [2,4,6,8]    [2,4,6,8]  [2,4,6,8]    01
1   [NaN,4,6,8]  [1,3,5,4]  [4,3,3,3]    02
2   [3,4,5,6]    [1,9,7,7]  [2,2,6,NaN]  03
3   [1,2,3,4]    [5,6,7,8]  [9,9,NaN,6]  04
4   [2,4,6,8]    [2,4,6,8]  [2,4,6,8]    05


What I ultimately want is a scatterplot of Y against X1, with the markers for month 01 in dark blue, for month 02 in light blue, and so on. I might also want the scatterplot of Y against X2 in different shades of red in the same plot.
I tried this:

df.iloc[0:1].plot.scatter(x = 'X1', y='Y')

but I get a message that there are no numeric objects to plot.
Are the NaN values a problem?

Any ideas?! Thanks a lot for helping!

1 Answer

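The NaN values are not the problem. Each cell holds a whole array, so pandas treats the columns as objects rather than numbers. You need to change the structure of your data frame so that each row holds a single observation: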

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


data =  {"ID":[0,1,2,3,4],
         "Y":[np.array([2,4,6,8]), 
              np.array([np.nan,4,6,8]),
              np.array([3,4,5,6]), 
              np.array([1,2,3,4]), 
              np.array([2,4,6,8])],
        "X1":[np.array([2,4,6,8]), 
              np.array([1,2,5,4]),
              np.array([1,9,7,7]), 
              np.array([5,6,7,8]), 
              np.array([2,4,6,8])],
        "X2":[np.array([2,4,6,8]), 
              np.array([4,3,3,3]),
              np.array([2,2,6,np.nan]), 
              np.array([9,9,np.nan,6]), 
              np.array([2,4,6,8])],
        "month":[1,2,3,4,5]
}


df = pd.DataFrame(data)

# Build one small frame per original row, then concatenate them all.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
frames = []
for v in range(len(df)):
    frames.append(pd.DataFrame({"ID": df["ID"][v],
                                "Y": list(df["Y"][v]),
                                "X1": list(df["X1"][v]),
                                "X2": list(df["X2"][v]),
                                "month": df["month"][v]}))

new_df = pd.concat(frames, ignore_index=True)

new_df now looks like this:

    ID    Y  X1   X2  month
0    0  2.0   2  2.0      1
1    0  4.0   4  4.0      1
2    0  6.0   6  6.0      1
3    0  8.0   8  8.0      1
4    1  NaN   1  4.0      2
5    1  4.0   2  3.0      2
6    1  6.0   5  3.0      2
7    1  8.0   4  3.0      2
8    2  3.0   1  2.0      3
9    2  4.0   9  2.0      3
10   2  5.0   7  6.0      3
11   2  6.0   7  NaN      3
12   3  1.0   5  9.0      4
13   3  2.0   6  9.0      4
14   3  3.0   7  NaN      4
15   3  4.0   8  6.0      4
16   4  2.0   2  2.0      5
17   4  4.0   4  4.0      5
18   4  6.0   6  6.0      5
19   4  8.0   8  8.0      5
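
As an alternative to the loop, the same reshaping can probably be done with DataFrame.explode. The sketch below is not part of the original answer: it uses a shortened two-month copy of the data, the names value_cols and long_df are made up for illustration, and the multi-column form of explode assumes pandas 1.3 or newer.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the data above: every cell holds a whole array.
df = pd.DataFrame({
    "ID": [0, 1],
    "Y": [np.array([2, 4, 6, 8]), np.array([np.nan, 4, 6, 8])],
    "X1": [np.array([2, 4, 6, 8]), np.array([1, 2, 5, 4])],
    "X2": [np.array([2, 4, 6, 8]), np.array([4, 3, 3, 3])],
    "month": [1, 2],
})

value_cols = ["Y", "X1", "X2"]  # hypothetical helper name

# The array columns are stored as generic objects, not as numbers.
print(df[value_cols].dtypes)

# explode() gives every array element its own row; ID and month are repeated.
# All arrays in one row must have the same length for this to work.
long_df = df.explode(value_cols, ignore_index=True)

# The exploded columns are still object dtype, so convert them explicitly.
long_df = long_df.astype({col: float for col in value_cols})
print(long_df)
```

Unlike the loop, this converts X1 to float as well, which makes no difference for plotting.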

Now it is easy to plot the values:

plt.scatter(new_df["X1"],new_df["Y"],c=new_df["month"], marker='^',label="X1")
plt.scatter(new_df["X2"],new_df["Y"],c=new_df["month"], marker='o',label="X2")
plt.legend()
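
The question asked for shades of blue for X1 and shades of red for X2. One way to get there, sketched here on a small stand-in for new_df, is to pass a sequential colormap to each scatter call. The choice of "Blues_r" and "Reds_r" and the vmax offset are my own assumptions, not something from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for new_df (long format: one row per observation).
new_df = pd.DataFrame({
    "Y": [2.0, 4.0, 6.0, 4.0, 6.0, 8.0],
    "X1": [2, 4, 6, 2, 5, 4],
    "X2": [2.0, 4.0, 6.0, 3.0, 3.0, 3.0],
    "month": [1, 1, 1, 2, 2, 2],
})

months = new_df["month"]
# The reversed maps run from dark (low month) to light (high month).
# vmax is pushed one past the last month so that month is not almost white.
shade_range = dict(vmin=months.min(), vmax=months.max() + 1)

fig, ax = plt.subplots()
sc_x1 = ax.scatter(new_df["X1"], new_df["Y"], c=months, cmap="Blues_r",
                   marker="^", label="X1", **shade_range)
sc_x2 = ax.scatter(new_df["X2"], new_df["Y"], c=months, cmap="Reds_r",
                   marker="o", label="X2", **shade_range)
ax.set_xlabel("X1 / X2")
ax.set_ylabel("Y")
ax.legend()
# In an interactive session, remove the matplotlib.use line and call plt.show().
```

Month 1 then comes out darkest and later months get progressively lighter, with blue triangles for X1 and red circles for X2.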

Edit: If you want to plot only one specific month:

month4 = new_df[new_df["month"]==4]
plt.scatter(month4["X1"],month4["Y"], marker='^',label="X1")
plt.scatter(month4["X2"],month4["Y"], marker='o',label="X2")
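
The same filtering can be applied to every month in turn by looping over a groupby, which is roughly what the groupby attempt in the comments was after. This is a sketch on a small stand-in for new_df, not the original author's code. Each scatter call takes the next colour from matplotlib's default colour cycle and carries its own label, so a plain legend call lists the months.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small stand-in for new_df (long format: one row per observation).
new_df = pd.DataFrame({
    "Y": [2.0, 4.0, 6.0, np.nan, 4.0, 6.0],
    "X1": [2, 4, 6, 1, 2, 5],
    "month": [1, 1, 1, 2, 2, 2],
})

fig, ax = plt.subplots()
# groupby yields (month value, sub-frame with only that month's rows).
for month, group in new_df.groupby("month"):
    # Points whose x or y is NaN are simply not drawn by scatter.
    ax.scatter(group["X1"], group["Y"], marker="^", label=f"Month {month}")

ax.set_xlabel("X1")
ax.set_ylabel("Y")
ax.legend()
# In an interactive session, remove the matplotlib.use line and call plt.show().
```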

Edit 2: I found a way to show in the legend which colour belongs to which month, based on this answer:

sc = plt.scatter(new_df["X1"],new_df["Y"],c=new_df["month"], marker='^',label="X1")
plt.scatter(new_df["X2"],new_df["Y"],c=new_df["month"], marker='o',label="X2")
lp = lambda i: plt.plot([],color=sc.cmap(sc.norm(i)),
                        label="Month {:g}".format(i))[0]
handles = [lp(i) for i in np.unique(new_df["month"])]
# One legend call is enough: it picks up every labelled artist (X1, X2 and the
# month proxies). A second plt.legend call would replace the first legend.
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
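
A shorter route to a colour legend may be the legend_elements method of the object that scatter returns. This assumes matplotlib 3.1 or newer, and it is an alternative sketch on a small stand-in for new_df rather than the method used above. With only a few distinct months, it should produce one legend entry per month.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for new_df (long format: one row per observation).
new_df = pd.DataFrame({
    "Y": [2.0, 4.0, 6.0, 4.0, 6.0, 8.0],
    "X1": [2, 4, 6, 2, 5, 4],
    "month": [1, 1, 1, 2, 2, 2],
})

fig, ax = plt.subplots()
sc = ax.scatter(new_df["X1"], new_df["Y"], c=new_df["month"], marker="^")

# legend_elements() builds proxy handles and text labels from the colour values.
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="Month",
          bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.)
# In an interactive session, remove the matplotlib.use line and call plt.show().
```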

3 Comments

Is there a possibility to get the colours in the legend? That I know which month is represented by which colour? I tried new_df.groupby('month').plot.scatter(x='X1', y='Y',c=new_df["month"], ax=ax, legend=True) but that doesn't work?! And how do I plot only month 4 values for example?
I edited my post. I however do not know how to create the additional Legend you need. I hope it helped anyway.
Yes, a lot! Thanks!!
