I have CSV data:

index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min

and a DataFrame holding the clustering result:

index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2

Is it possible to add a username column to the pandas DataFrame based on the CSV data?

2 Answers

I am using 'DataFrame.merge' for this, joining the two frames on their index. Here is the code:

>>> import io as sio  # Python 3: StringIO lives in the io module
>>> import pandas as pd

>>> s1='''index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min'''
>>> s2 = '''index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2'''

>>> df1=pd.read_csv(sio.StringIO(s1), index_col=0, sep=r'\s+')
>>> df2=pd.read_csv(sio.StringIO(s2), index_col=0, sep=r'\s+')

>>> df1
      username
index
1        ailee
2         yura
3         sony
4         lily
5         alex
6        eunji
7         hyun
8        jingo
9          kim
10         min
>>> df2
       cluster
index
1            1
3            1
5            1
7            1
8            1
9            2
4            2
2            2
10           2
6            2

>>> df1.merge(df2, left_index=True, right_index=True)
      username  cluster
index
1        ailee        1
3         sony        1
5         alex        1
7         hyun        1
8        jingo        1
9          kim        2
4         lily        2
2         yura        2
10         min        2
6        eunji        2
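If the row order of the cluster frame should be kept, one option is to merge from the df2 side with how="left". This is only a minimal sketch: the frames are built directly instead of being read from CSV, and the validate="one_to_one" check is an optional extra that is not part of the code above.

```python
import pandas as pd

# Rebuild the two frames from the question (constructed directly here
# instead of being read from CSV, purely for illustration).
df1 = pd.DataFrame(
    {"username": ["ailee", "yura", "sony", "lily", "alex",
                  "eunji", "hyun", "jingo", "kim", "min"]},
    index=pd.Index(range(1, 11), name="index"),
)
df2 = pd.DataFrame(
    {"cluster": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]},
    index=pd.Index([1, 3, 5, 7, 8, 9, 4, 2, 10, 6], name="index"),
)

# Left merge on the index: every row of df2 is kept, in df2's order.
# validate raises a MergeError if an index value is duplicated on either side.
merged = df2.merge(df1, left_index=True, right_index=True,
                   how="left", validate="one_to_one")
print(merged)
```

With how="left", a cluster row whose index is missing from df1 would still appear, with NaN as its username.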

You can use join:

print (df2.join(df1))
       cluster username
index                  
1            1    ailee
3            1     sony
5            1     alex
7            1     hyun
8            1    jingo
9            2      kim
4            2     lily
2            2     yura
10           2      min
6            2    eunji
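Keep in mind that join is a left join by default, so a row of df2 whose index does not occur in df1 is kept and gets NaN as its username. A small sketch of that case, where index 11 is a hypothetical extra row that is not in the question's data:

```python
import pandas as pd

# Username table from the question.
df1 = pd.DataFrame(
    {"username": ["ailee", "yura", "sony", "lily", "alex",
                  "eunji", "hyun", "jingo", "kim", "min"]},
    index=pd.Index(range(1, 11), name="index"),
)
# Shortened cluster table; index 11 is hypothetical and has no username.
df2 = pd.DataFrame(
    {"cluster": [1, 2, 2]},
    index=pd.Index([1, 9, 11], name="index"),
)

joined = df2.join(df1)  # how="left" is the default

# Rows that found no matching username in df1.
missing = joined[joined["username"].isna()]
print(missing)
```

Checking for such rows after the join is a cheap way to spot indexes that are absent from the CSV.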

Or map:

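#map by column cluster: looks up each cluster value (1 or 2) in df1's index,
#so every row gets the username stored at index 1 or 2
df2['username'] = df2.cluster.map(df1.username)
#map by index: looks up each row's own index in df1, which is what the question asks for
df2['username1'] = df2.index.to_series().map(df1.username)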
print (df2)
       cluster username username1
index                            
1            1    ailee     ailee
3            1    ailee      sony
5            1    ailee      alex
7            1    ailee      hyun
8            1    ailee     jingo
9            2     yura       kim
4            2     yura      lily
2            2     yura      yura
10           2     yura       min
6            2     yura     eunji
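As the output shows, only the index-based mapping gives each row its own username. A slightly shorter variant, sketched here with the frames built directly rather than read from CSV, is to call map on the index itself, since Index.map also accepts a Series as the mapper:

```python
import pandas as pd

# Rebuild the two frames from the question (constructed directly here
# instead of being read from CSV, purely for illustration).
df1 = pd.DataFrame(
    {"username": ["ailee", "yura", "sony", "lily", "alex",
                  "eunji", "hyun", "jingo", "kim", "min"]},
    index=pd.Index(range(1, 11), name="index"),
)
df2 = pd.DataFrame(
    {"cluster": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]},
    index=pd.Index([1, 3, 5, 7, 8, 9, 4, 2, 10, 6], name="index"),
)

# Look up every index label of df2 in df1's username Series.
# The result is positional, so it lines up with df2's rows.
df2["username"] = df2.index.map(df1["username"])
print(df2)
```

Unlike merge and join, this modifies df2 in place and adds just the one column.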
