Merge columns with different number of rows based on two first columns in Pandas

Question

I have two files with the same number of columns but different numbers of rows, e.g.:

file1.txt

1650,A,1,1,1
1650,A,1,1,1
1650,A,1,1,1
1650,B,2,2,2
1650,B,2,2,2
1650,B,2,2,2
1650,B,2,2,2
1650,B,2,2,2

file2.txt

1650,A,3,3,3
1650,A,3,3,3
1650,A,3,3,3
1650,A,3,3,3
1650,A,3,3,3
1650,B,4,4,4
1650,B,4,4,4

I want to concatenate both of them using pandas such that the result is as follows:

1650,A,1,1,1,3,3,3
1650,A,1,1,1,3,3,3
1650,A,1,1,1,3,3,3
1650,A,NaN,NaN,NaN,3,3,3
1650,A,NaN,NaN,NaN,3,3,3
1650,B,2,2,2,4,4,4
1650,B,2,2,2,4,4,4
1650,B,2,2,2,NaN,NaN,NaN
1650,B,2,2,2,NaN,NaN,NaN
1650,B,2,2,2,NaN,NaN,NaN

I used the following code, but it does not produce the result above:

df1 = read_data('file1')
df2 = read_data('file2')
result = pd.merge_ordered(df1,df2, how='outer', on=['a', 'b'])

How can I solve this problem?

jezrael · Accepted Answer · 2021-02-08 06:58:42Z


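Merging on a and b alone pairs every row of one file with every row of the other that has the same keys. Use GroupBy.cumcount to add a counter column (group) that numbers the rows within each (a, b) group, merge on a, b and group, then drop the helper column: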

df1['group'] = df1.groupby(['a', 'b']).cumcount()
df2['group'] = df2.groupby(['a', 'b']).cumcount()
result = pd.merge(df1,df2, how='outer', on=['a', 'b', 'group']).drop('group', axis=1)
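For reference, here is a minimal self-contained sketch of the same approach on the sample data from the question. Several things in it are assumptions and not part of the original post: this read_data helper is a hypothetical stand-in for the asker's function, the data is read from in-memory strings instead of files, and the value columns are given distinct made-up names (c1, d1, e1 and c2, d2, e2) so the merge does not need suffixes. The explicit sort_values call is also an addition, so that the row order does not depend on how a particular pandas version orders an outer merge.

```python
import io

import pandas as pd

# Sample data from the question, held in strings instead of files (assumption).
FILE1 = "\n".join(["1650,A,1,1,1"] * 3 + ["1650,B,2,2,2"] * 5)
FILE2 = "\n".join(["1650,A,3,3,3"] * 5 + ["1650,B,4,4,4"] * 2)


def read_data(text, value_cols):
    """Hypothetical stand-in for the asker's read_data: parse headerless CSV text."""
    return pd.read_csv(io.StringIO(text), header=None, names=["a", "b"] + value_cols)


df1 = read_data(FILE1, ["c1", "d1", "e1"])
df2 = read_data(FILE2, ["c2", "d2", "e2"])

# Merging on the keys alone matches every row with every row that shares
# the same (a, b): 3*5 rows for A plus 5*2 rows for B.
naive = pd.merge(df1, df2, how="outer", on=["a", "b"])
print(len(naive))  # 25

# Number the rows within each (a, b) group: 0, 1, 2, ...
df1["group"] = df1.groupby(["a", "b"]).cumcount()
df2["group"] = df2.groupby(["a", "b"]).cumcount()

# Merge on the keys plus the counter so the n-th row of a group in file1
# lines up with the n-th row of the same group in file2.
result = (
    pd.merge(df1, df2, how="outer", on=["a", "b", "group"])
    .sort_values(["a", "b", "group"])  # explicit, version-independent order
    .drop(columns="group")
    .reset_index(drop=True)
)
print(result)
```

The result has 10 rows: rows with no counterpart in the other file are filled with NaN, which also turns those value columns into floats.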

edited Feb 8, 2021 at 6:58

answered Feb 8, 2021 at 6:51
