I'm trying to merge two CSVs based on a condition. The 'KEYS' value in CSV2 has to be matched against 'TC_NUM' in CSV1 and appended as a third column. The CSVs are very large, so it has to be done in code.

df1 - CSV1:

ID                                       TC_NUM
dialog_testcase_0101.0001_greeting.xml  101.0001
dialog_testcase_0101.0002_greeting.xml  101.0002
dialog_testcase_0101.0003_greeting.xml  101.0003
dialog_testcase_0101.0004_greeting.xml  101.0004
dialog_testcase_0101.0005_greeting.xml  101.0005
dialog_testcase_0101.0006_greeting.xml  101.0006
dialog_testcase_0901.0008_greeting.xml  901.0007
dialog_testcase_0101.0008_greeting.xml  101.0008
dialog_testcase_0501.001_greeting.xml   501.001
dialog_testcase_0801.0011_greeting.xml  801.0011

df2 - CSV2:

KEYS             TC_NUM
FIT-3982    TC 101.0011, 101.0004
FIT-3980    TC 801.0011, 901.007
FIT-3979    TC 101.0006, 501.001, 1907.0019, 1907.0020, 1907.0021

What I want:

csvFinal:

ID                                       TC_NUM        Keys
dialog_testcase_0101.0001_greeting.xml  101.0011     FIT-3982  
dialog_testcase_0101.0002_greeting.xml  101.0002       
dialog_testcase_0101.0003_greeting.xml  101.0006     FIT_3979
dialog_testcase_0101.0004_greeting.xml  101.0004     FIT-3982
dialog_testcase_0101.0005_greeting.xml  101.0005
dialog_testcase_0101.0006_greeting.xml  101.0011     FIT_3982
dialog_testcase_0901.0008_greeting.xml  901.0007     FIT_3979
dialog_testcase_0101.0008_greeting.xml  101.0008
dialog_testcase_0501.001_greeting.xml   501.001      FIT-3979
dialog_testcase_0801.0011_greeting.xml  801.0011     FIT-3980

My code:

mergedOpen = pd.merge(df1, df2, on=['TC_NUM'])
mergedOpen.set_index('TC_NUM', inplace=True)

mergedOpen.to_csv('MergedCSVOPEN.csv')

1 Answer

After set_index, you can remove the first 3 characters ('TC ') from the TC_NUM column, split on ', ', and use unstack with reset_index to build a new DataFrame for the merge. Both TC_NUM columns must have the same dtype, either string or numeric. I chose numeric, so I convert df2.TC_NUM with to_numeric:

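df2.set_index('KEYS', inplace=True)

# Drop the leading 'TC ', split on ', ', and reshape to one TC_NUM per row.
# The chain is wrapped in parentheses so it can span several lines.
df2 = (df2.TC_NUM.str[3:]
          .str.split(', ', expand=True)
          .unstack()
          .reset_index(drop=True, level=0)
          .reset_index(name='TC_NUM'))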

df2['TC_NUM'] = pd.to_numeric(df2['TC_NUM'])
print (df2)
        KEYS     TC_NUM
0   FIT-3982   101.0011
1   FIT-3980   801.0011
2   FIT-3979   101.0006
3   FIT-3982   101.0004
4   FIT-3980   901.0070
5   FIT-3979   501.0010
6   FIT-3982        NaN
7   FIT-3980        NaN
8   FIT-3979  1907.0019
9   FIT-3982        NaN
10  FIT-3980        NaN
11  FIT-3979  1907.0020
12  FIT-3982        NaN
13  FIT-3980        NaN
14  FIT-3979  1907.0021
mergedOpen = pd.merge(df1, df2, on='TC_NUM', how='left')
print (mergedOpen)
                                       ID    TC_NUM      KEYS
0  dialog_testcase_0101.0001_greeting.xml  101.0001       NaN
1  dialog_testcase_0101.0002_greeting.xml  101.0002       NaN
2  dialog_testcase_0101.0003_greeting.xml  101.0003       NaN
3  dialog_testcase_0101.0004_greeting.xml  101.0004  FIT-3982
4  dialog_testcase_0101.0005_greeting.xml  101.0005       NaN
5  dialog_testcase_0101.0006_greeting.xml  101.0006  FIT-3979
6  dialog_testcase_0901.0008_greeting.xml  901.0007       NaN
7  dialog_testcase_0101.0008_greeting.xml  101.0008       NaN
8   dialog_testcase_0501.001_greeting.xml  501.0010  FIT-3979
9  dialog_testcase_0801.0011_greeting.xml  801.0011  FIT-3980

mergedOpen.set_index('TC_NUM', inplace=True)
mergedOpen.to_csv('MergedCSVOPEN.csv')
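
For reference, here is a self-contained sketch of the same idea that can be run without the original files. It is not the exact code above: it swaps unstack for DataFrame.explode (which needs pandas 0.25 or later) and keeps TC_NUM as a string on both sides, the other dtype option mentioned earlier, so no float comparison is involved. The inline CSV text and the names csv1, csv2, lookup and merged are assumptions made for illustration; only a few rows from the question are used.

```python
import io
import pandas as pd

# Inline stand-ins for the two CSV files (a few rows taken from the question).
csv1 = io.StringIO(
    "ID,TC_NUM\n"
    "dialog_testcase_0101.0004_greeting.xml,101.0004\n"
    "dialog_testcase_0101.0005_greeting.xml,101.0005\n"
    "dialog_testcase_0501.001_greeting.xml,501.001\n"
)
csv2 = io.StringIO(
    "KEYS,TC_NUM\n"
    'FIT-3982,"TC 101.0011, 101.0004"\n'
    'FIT-3979,"TC 101.0006, 501.001"\n'
)

# Read everything as strings so both TC_NUM columns share one dtype.
df1 = pd.read_csv(csv1, dtype=str)
df2 = pd.read_csv(csv2, dtype=str)

# Drop the leading 'TC ', split the list on commas, and emit one row per number.
tc_lists = df2["TC_NUM"].str[3:].str.split(",")
lookup = df2.assign(TC_NUM=tc_lists).explode("TC_NUM")
lookup["TC_NUM"] = lookup["TC_NUM"].str.strip()

# Left merge keeps every df1 row; rows without a matching key get NaN in KEYS.
merged = df1.merge(lookup, on="TC_NUM", how="left")
print(merged)
```

Because the keys are compared as text here, '501.001' and '501.0010' would not match. If the two files format the same number differently, converting both columns with to_numeric as shown above is the safer choice.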