Merge two CSV files in Python

Question

I have 2 CSV files.

File1.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       1        1
    1          0       0        1
    2          1       1        1
    3          0       0        0
    4          0       0        0
    5          1       0        1
    6          0       0        0
    7          0       0        0
   11          0       1        1
   12          1       1        1

File2.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       0        0
    1          0       0        0
    2          0       0        0
    3          0       0        0
    4          0       0        0
    5          0       0        0
    6          0       0        0
    7          0       0        0
    8          0       0        0
    9          0       0        0
   10          0       0        0

I want the output to look something like the following. It should merge file2.csv with file1.csv: where file1.csv has a row for a frame, use the data from file1.csv; otherwise keep the data from file2.csv.

Expected output.csv

    Frame_Nr; Data1; Data2; Labeled
    0          0       1        1
    1          0       0        1
    2          1       1        1
    3          0       0        0
    4          0       0        0
    5          1       0        1
    6          0       0        0
    7          0       0        0
    8          0       0        0
    9          0       0        0
   10          0       0        0
   11          0       1        1
   12          1       1        1

My code :

import csv
import os

f = open('file2', 'r')
reader = csv.reader(f, delimiter=';')   
reader = list(reader)
f1 = open('file1', 'r')
reader1 = csv.reader(f1, delimiter=';')
next(reader1)
reader1 = list(reader1)


for line1 in reader1:
    for line in reader:
        if line1[0] != line[0]:
            print(line1)
        else:
            print(line)

These are not valid CSV files: your header uses "; " as a separator, while the other lines use spaces.
– Laurent LAPORTE, Commented Nov 22, 2019 at 17:19
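
To illustrate the comment above, here is a minimal sketch of what happens when such a file is read with `csv.reader` and `delimiter=';'`. The `sample` string is an assumption that mirrors the layout printed in the question (the real files may differ): the header splits into four fields, but each data row comes back as a single field. Splitting on whitespace and stripping the trailing `;` is one possible way to handle both kinds of line.

```python
import csv
import io

# Assumed sample in the layout shown in the question:
# "; " between header names, plain spaces between data values.
sample = "Frame_Nr; Data1; Data2; Labeled\n0          0       1        1\n"

# With delimiter=';' the header is split, but the data row is not.
rows = list(csv.reader(io.StringIO(sample), delimiter=';'))
header_fields = len(rows[0])
data_fields = len(rows[1])

# Splitting on whitespace and stripping ';' copes with both line types.
parsed = [[field.rstrip(';') for field in line.split()]
          for line in sample.splitlines()]

print(header_fields, data_fields)
print(parsed)
```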

Lukas Thaler · Accepted Answer · 2019-11-22 17:34:57Z

2

Pandas has two very nice functions to help you avoid a nested for loop and make the process more efficient:

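import pandas as pd

# Assumes the layout shown in the question: whitespace-separated rows under a header that uses "; "
df1 = pd.read_csv('file1.csv', sep=r'\s+')
df2 = pd.read_csv('file2.csv', sep=r'\s+')
for frame in (df1, df2):
    frame.columns = frame.columns.str.rstrip(';')  # 'Frame_Nr;' -> 'Frame_Nr'
# file1 rows come first, so they win when a Frame_Nr appears in both files
df = pd.concat([df1, df2]).drop_duplicates('Frame_Nr')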

Optionally, if you want the resulting DataFrame sorted by Frame_Nr, chain .sort_values('Frame_Nr') onto the last line.

To explain the code snippet: pd.concat concatenates both DataFrames, so you first have all rows from file1 and then all rows from file2. drop_duplicates then removes rows with a duplicate value in Frame_Nr, keeping the first occurrence. Since file1 came first in the concatenation, all of its rows are kept, and rows from file2 are retained only if their frame number was not in file1.
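
As a small worked example of that explanation, the sketch below runs the same concat / drop_duplicates / sort_values chain on a few rows taken from the question, held in strings rather than files. The `load` helper, the `sep=r"\s+"` setting and the header clean-up are assumptions made so the sample parses; they are not part of the original answer.

```python
import io
import pandas as pd

# A few rows from the question's file1.csv and file2.csv, kept in memory.
file1_text = """Frame_Nr; Data1; Data2; Labeled
0 0 1 1
1 0 0 1
11 0 1 1
"""
file2_text = """Frame_Nr; Data1; Data2; Labeled
0 0 0 0
1 0 0 0
2 0 0 0
"""

def load(text):
    """Hypothetical helper: parse whitespace-separated rows, strip ';' from headers."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    df.columns = df.columns.str.rstrip(";")
    return df

df1, df2 = load(file1_text), load(file2_text)

merged = (
    pd.concat([df1, df2])          # file1 rows first, then file2 rows
    .drop_duplicates("Frame_Nr")   # keep="first" is the default, so file1 wins
    .sort_values("Frame_Nr")
    .reset_index(drop=True)
)
print(merged)
```

Frames 0 and 1 keep their file1 values, frame 2 comes from file2, and frame 11 exists only in file1.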

edited Nov 22, 2019 at 17:34

answered Nov 22, 2019 at 17:20

Lukas Thaler


5 Comments

jack Over a year ago

Can you please explain the code a little? It works, but I don't know why :))

Lukas Thaler Over a year ago

pd.concat concatenates both DataFrames so that you first have all rows in file 1 and after that all rows from file 2, the drop_duplicates after that removes all rows with duplicate values in Frame_Nr, keeping the first. Since file1 was the first file in the concatenation, all lines from that file are kept and lines from file2 are only retained if they have a frame number that was not in file1

Lukas Thaler Over a year ago

I'll edit the description into my answer for easier reference

jack Over a year ago

One last question :D In front of my data I have a number equal to the frame number. How can I remove this?

Lukas Thaler Over a year ago

That is Pandas' default index. You can choose index=False when calling df.to_csv() if you don't want the index saved to a file, but as long as your data is in a DataFrame, you'll have to "cope" with it. You may also want to choose to make Frame_Nr your index, in which case you can use df.set_index('Frame_Nr')
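
Both options from the comment above can be sketched as follows. The DataFrame contents and the `sep=";"` choice are illustrative assumptions, and the CSV is written to an in-memory buffer instead of a file so the result can be inspected directly.

```python
import io
import pandas as pd

# Illustrative data only; the real DataFrame would come from the merge.
df = pd.DataFrame({"Frame_Nr": [0, 1, 2], "Data1": [0, 0, 1]})

# Option 1: index=False leaves the 0, 1, 2, ... row labels out of the output.
buffer = io.StringIO()
df.to_csv(buffer, sep=";", index=False)
without_index = buffer.getvalue()

# Option 2: make Frame_Nr the index. set_index returns a new DataFrame,
# so the result has to be assigned to a name.
indexed = df.set_index("Frame_Nr")

print(without_index)
print(indexed)
```

To write to disk instead, pass a file path as the first argument to `to_csv` in place of the buffer.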

seralouk · Accepted Answer · 2019-11-22 17:25:59Z

2

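import pandas as pd

# sep=r"\s+" splits on runs of whitespace; it replaces delim_whitespace=True, which newer pandas versions deprecate
df1 = pd.read_csv("file1.csv", sep=r"\s+")
df2 = pd.read_csv("file2.csv", sep=r"\s+")

# The header token keeps its trailing ';', so the column is named 'Frame_Nr;'
df = pd.concat([df1, df2]).drop_duplicates("Frame_Nr;").sort_values("Frame_Nr;")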

answered Nov 22, 2019 at 17:25

seralouk
