How to add a column to a DataFrame from CSV data

Question

I have CSV data:

index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min

and a DataFrame holding the clustering result:

index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2

Is it possible to add a username column to the pd.DataFrame based on the CSV data?

seven7e · Accepted Answer · 2016-11-25 01:39:05Z

I use 'DataFrame.merge' for this. Here is the code:

>>> import io as sio  # on Python 3, StringIO lives in the io module
>>> import pandas as pd

>>> s1='''index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min'''
>>> s2 = '''index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2'''

>>> df1 = pd.read_csv(sio.StringIO(s1), index_col=0, sep=r'\s+')
>>> df2 = pd.read_csv(sio.StringIO(s2), index_col=0, sep=r'\s+')

>>> df1
      username
index
1        ailee
2         yura
3         sony
4         lily
5         alex
6        eunji
7         hyun
8        jingo
9          kim
10         min
>>> df2
       cluster
index
1            1
3            1
5            1
7            1
8            1
9            2
4            2
2            2
10           2
6            2

>>> df1.merge(df2, left_index=True, right_index=True)
      username  cluster
index
1        ailee        1
3         sony        1
5         alex        1
7         hyun        1
8        jingo        1
9          kim        2
4         lily        2
2         yura        2
10         min        2
6        eunji        2
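
If the order of the cluster rows matters, one option is a left merge that starts from the cluster frame. The following is only a minimal sketch, assuming comma-separated input and a shared column literally named "index"; the names users, clusters and result are illustrative and do not come from the answer above.

```python
import io

import pandas as pd

# Shortened sample data; commas are an assumption for this sketch.
users_csv = """index,username
1,ailee
2,yura
3,sony
4,lily
5,alex
"""
clusters_csv = """index,cluster
1,1
3,1
5,1
4,2
2,2
"""

users = pd.read_csv(io.StringIO(users_csv))
clusters = pd.read_csv(io.StringIO(clusters_csv))

# Left merge on the shared "index" column: every cluster row is kept,
# in its original order, and the matching username is attached.
result = clusters.merge(users, on="index", how="left")
print(result)
```

Because the merge is keyed on a regular column here, "index" stays an ordinary column in the result; call result.set_index("index") if it should become the row index again.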

jezrael · Accepted Answer · 2016-11-25 07:36:39Z

You can use join:

print(df2.join(df1))
       cluster username
index                  
1            1    ailee
3            1     sony
5            1     alex
7            1     hyun
8            1    jingo
9            2      kim
4            2     lily
2            2     yura
10           2      min
6            2    eunji
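
Since join defaults to a left join on the index, a cluster row with no matching username is kept and gets NaN rather than being dropped. A small sketch of that behaviour, using made-up frames (the label 99 and the names users, clusters, joined and missing are hypothetical):

```python
import pandas as pd

users = pd.DataFrame(
    {"username": ["ailee", "yura", "sony"]},
    index=pd.Index([1, 2, 3], name="index"),
)
clusters = pd.DataFrame(
    {"cluster": [1, 1, 2]},
    index=pd.Index([1, 3, 99], name="index"),  # 99 has no username
)

# DataFrame.join performs a left join on the index by default.
joined = clusters.join(users)

# Rows whose label is absent from `users` end up with NaN in `username`.
missing = joined[joined["username"].isna()]
print(joined)
print(missing.index.tolist())
```

Checking for NaN this way is a quick test of whether every clustered row actually had a username in the CSV.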

Or map:

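# map by the cluster column: each cluster id (1 or 2) is looked up in df1's index
df2['username'] = df2.cluster.map(df1.username)
# map by the index: each row label is looked up in df1's index, giving the matching username
df2['username1'] = df2.index.to_series().map(df1.username)
print(df2)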
       cluster username username1
index                            
1            1    ailee     ailee
3            1    ailee      sony
5            1    ailee      alex
7            1    ailee      hyun
8            1    ailee     jingo
9            2     yura       kim
4            2     yura      lily
2            2     yura      yura
10           2     yura       min
6            2     yura     eunji
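
The difference between the two columns comes from what is being looked up: Series.map with a Series argument treats that Series's index as the lookup keys. A minimal sketch with smaller, made-up frames (users, clusters, by_index and by_cluster are illustrative names):

```python
import pandas as pd

users = pd.DataFrame(
    {"username": ["ailee", "yura", "sony", "lily"]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)
clusters = pd.DataFrame(
    {"cluster": [1, 1, 2, 2]},
    index=pd.Index([1, 3, 4, 2], name="index"),
)

# Mapping the index: each row label is looked up in users["username"],
# so every row receives its own username.
by_index = clusters.index.to_series().map(users["username"])

# Mapping the cluster column: each cluster id is looked up instead, so
# all rows of cluster 1 receive the username stored at label 1.
by_cluster = clusters["cluster"].map(users["username"])

print(by_index.tolist())
print(by_cluster.tolist())
```

For the question as asked, mapping by the index is the variant that attaches each row's own username.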
