2

Can I merge multiple csv files that have the same header in one csv using python? I would like to keep the same column headers but if i merge my files using the terminal it creates a file with repeating headers; so there's a way to do it in Python?

1
  • Does this answer your question? Pandas Merging 101 Commented Sep 17, 2022 at 9:57

2 Answers 2

3

Quickest way is to use csvkit in particular csvstack.

cat csv_1.csv                                                                                                                                                              
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'

cat csv_2.csv                                                                                                                                                              
id,col1,col2,col3
3,'bar','owl','crow'
4,'spam','eel','cow'

# To stdout
csvstack csv_1.csv csv_2.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'



# To new file.
csvstack csv_1.csv csv_2.csv > csv_1_2.csv

Using the global library csv.

import csv

header = None
new_file = []
for f in ('csv_1.csv', 'csv_2.csv'):
    with open(f, newline='') as csv_file:
        reader = csv.reader(csv_file)
        if not header:
            new_file.append(next(reader))
            header = True
        else:
            next(reader)
        for row in reader:
            new_file.append(row)

with open('csv_new_file.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerows(new_file)


cat csv_new_file.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'
Sign up to request clarification or add additional context in comments.

2 Comments

Why reader.__next__() and not just next(reader)? I’m pretty sure the latter is Pythonic and recommend, and the former is discouraged. And, I’m glad to see non-programming suggestions like csvkit! Also, writer.writerows(new_file) might be cleaner, if there’s nothing to do per-row while writing.
@ZachYoung. Updated with suggestions, thanks. To me the point of SO is to solve problems, if the best solution is an existing program then I see no issue in suggesting it.
2

i hope you find this helpful

import pandas as pd

# this two dataframes are just for showcasing the merge
# But can be replaced but any other valid DataFrame
# even more than 2 dataframes, just need to repeat this process

df1 = pd.DataFrame()
df1['col1'] = 1,2,3
# it even supports dataframes with not exactly the same columns
df1['col3'] = 4,4,8

df2 = pd.DataFrame()
df2['col1'] = 4,5,6
df2['col2'] = 1,4,8

# here df3 gets the resulting merged DataFrame
df3 = df1.append(df2)

# As you can see they get merged
# but as the df1 didnt had the "col2" its values are NaN
# Same with df2 and "col3"
>>   test1  col3  col2
0      1   1.0   NaN
1      2   4.0   NaN
2      3   8.0   NaN
0      4   NaN   4.0
1      5   NaN   4.0
2      6   NaN   8.0

# dumps dataframes to "file.csv" in the current folder
df3.to_csv("file.csv")

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.