
I have a dataframe, df1 as shown below:

Observed PeakFlow (cfs)      Modelled Peak Flow (cfs)
     9.78768                       10.93963
     1.999368                      2.037152
     11.63652                      8.541796
     3.237471                      3.970588
     54.04929                      22.94427
     4.68197                       3.139319
     16.41346                      12.17337
     14.97399                      7.224458
     2.114172                      5.775542
     22.80021                      22.69659
     25.3347                       13.0805
     33.4092                       11.3452
     13.81051                      7.640867
     6.794793                      4.26161
     9.008561                      6.634675
     5.957804                      4.176471
     2.337406                      2.071208
     32.6419                       4.368421
     3.567871                      2.894737
     5.776844                       3.0387
     39.54993                      5.849845
     4.511765                       2.28483
     6.989101                      3.218266
     14.63979                      9.024768

I also have another dataframe, df2 as shown below:

        1-1 Match    |  -15% Peak Flow  |   +25% Peak Flow
      ----------------------------------------------------- 
      X-Axis| Y-Axis |  X-Axis| Y-Axis  |   X-Axis| Y-Axis
      -----------------------------------------------------
          0 |  0     |     0  |   0     |      0  |   0
        200 | 200    |    200 |  170    |     200 |  250

I would like to draw a scatter plot from these two dataframes. The desired output is shown in the image below. How can this be done?

[Image: desired output]

When I load df2 from a CSV file, I get the result shown in the image below. How can I remove the "Unnamed" labels and get merged column headers like those in the table above?

[Image: df2 as loaded from the CSV file]

1 Answer

You can use:

print (df2)
 1-1 Match        -15% Peak Flow        +25% Peak Flow       
     X-Axis Y-Axis         X-Axis Y-Axis         X-Axis Y-Axis
0         0      0              0      0              0      0
1       200    200            200    170            200    250

print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], ['X-Axis', 'Y-Axis']],
           labels=[[2, 2, 1, 1, 0, 0], [0, 1, 0, 1, 0, 1]])

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)', y='Observed PeakFlow (cfs)', s=50)

for i, df3 in df2.groupby(level=0, axis=1):
    df3 = df3.set_index([(i, 'X-Axis')])
    df3.index.name = None
    df3.columns = [i]
#    print (df3)
    df3.plot(ax=ax)

[Image: resulting graph]

If you need customized colors and markers:

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                     y='Observed PeakFlow (cfs)', 
                     s=50, 
                     marker='d', 
                     color='r')

df21 = df2.xs('1-1 Match', axis=1).set_index('X-Axis')
df21.index.name = None
df21.columns = ['1-1 Match']
df21.plot(c='black', ax=ax)

df22 = df2.xs('-15% Peak Flow', axis=1).set_index('X-Axis')
df22.index.name = None
df22.columns = ['-15% Peak Flow']
df22.plot(c='blue',ls='--', ax=ax)

df23 = df2.xs('+25% Peak Flow', axis=1).set_index('X-Axis')
df23.index.name = None
df23.columns = ['+25% Peak Flow']
df23.plot(c='blue',ls='--', ax=ax)
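
The three blocks above differ only in the label and the line style, so they could also be folded into a loop. The following is a minimal sketch, not the only way to do it: the `styles` dictionary is a hypothetical helper, df1 is cut down to the first four rows of the question's data, df2 is built by hand instead of being read from a file, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line in a notebook
import pandas as pd

# First four rows of df1 from the question (shortened for the example)
df1 = pd.DataFrame({
    'Observed PeakFlow (cfs)': [9.78768, 1.999368, 11.63652, 3.237471],
    'Modelled Peak Flow (cfs)': [10.93963, 2.037152, 8.541796, 3.970588],
})

# df2 with the two-level column header shown in the question
columns = pd.MultiIndex.from_product(
    [['1-1 Match', '-15% Peak Flow', '+25% Peak Flow'], ['X-Axis', 'Y-Axis']])
df2 = pd.DataFrame([[0, 0, 0, 0, 0, 0],
                    [200, 200, 200, 170, 200, 250]], columns=columns)

# Hypothetical mapping: top-level label -> matplotlib line style
styles = {
    '1-1 Match': {'color': 'black'},
    '-15% Peak Flow': {'color': 'blue', 'linestyle': '--'},
    '+25% Peak Flow': {'color': 'blue', 'linestyle': '--'},
}

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                      y='Observed PeakFlow (cfs)',
                      s=50, marker='d', color='r')

for name, style in styles.items():
    # xs() selects one top-level label and drops that level,
    # leaving plain 'X-Axis' and 'Y-Axis' columns
    line = df2.xs(name, axis=1)
    ax.plot(line['X-Axis'], line['Y-Axis'], label=name, **style)

ax.legend()
```

Drawing the lines with `ax.plot` directly avoids the `set_index` and column renaming steps, since the label is passed explicitly.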

[Image: resulting graph with customized colors and markers]

EDIT1:

Reading a MultiIndex header from a CSV file is problematic, so you need:

df2 = pd.read_csv('file', header=[0,1])

print (df2)
  1-1 Match Unnamed: 1_level_0 -15% Peak Flow Unnamed: 3_level_0  \
     X-Axis             Y-Axis         X-Axis             Y-Axis   
0         0                  0              0                  0   
1       200                200            200                170   

  +25% Peak Flow Unnamed: 5_level_0  
          X-Axis             Y-Axis  
0              0                  0  
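1            200                250

# Top-level labels; the empty header cells were read as 'Unnamed: ...'
cols = df2.columns.get_level_values(0)
# Mask the 'Unnamed' labels and forward-fill from the previous real label
# (Index.where needs pandas 0.19.0+)
cols = cols.where(~cols.str.contains('Unnamed')).to_series().ffill().tolist()
df2.columns = [cols, df2.columns.get_level_values(1)]
df2 = df2.sort_index(level=0, axis=1)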
print (df2)
  +25% Peak Flow        -15% Peak Flow        1-1 Match       
          X-Axis Y-Axis         X-Axis Y-Axis    X-Axis Y-Axis
0              0      0              0      0         0      0
1            200    250            200    170       200    200

print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], 
                   ['X-Axis', 'Y-Axis']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
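
For reference, the header cleanup can be tried end to end without a file on disk. This is only a sketch under assumptions: `csv_text` is a made-up stand-in for the real CSV (merged header cells exported as empty fields), and the labels are held in a `Series` so that `Index.where` is not needed.

```python
import io
import pandas as pd

# Assumed CSV layout: the merged top-level header cells become empty fields
csv_text = """1-1 Match,,-15% Peak Flow,,+25% Peak Flow,
X-Axis,Y-Axis,X-Axis,Y-Axis,X-Axis,Y-Axis
0,0,0,0,0,0
200,200,200,170,200,250
"""

df2 = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Top-level labels as a Series with a plain 0..n-1 index
top = pd.Series(list(df2.columns.get_level_values(0)))
# Blank out the 'Unnamed: ...' placeholders, then forward-fill the real labels
top = top.where(~top.str.contains('Unnamed')).ffill()

df2.columns = pd.MultiIndex.from_arrays(
    [top.tolist(), list(df2.columns.get_level_values(1))])

print(df2)
```

The column order of the file is kept here; `sort_index(level=0, axis=1)` can still be applied afterwards if a sorted header is wanted.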

8 Comments

Thanks, Jezrael. Please check the edits to my question. Because of that, I'm also getting a 'Multiindex not defined' error.
Sorry, now I get this error: 'Index' object has no attribute 'where'
But this works: cols = cols.to_series().where(~cols.str.contains('Unnamed')).ffill().tolist()
I'm using IPython notebook. I just checked: the version is '0.18.1'.
It was implemented in version 0.19.0 - pandas.pydata.org/pandas-docs/stable/…