Combining text files with similar values into one file using Python

Question

I have searched the site but can't find anything quite like what I'm trying to accomplish. I have two text files that I want to merge into one file based on the first column of each file (let's call this column x). For example, if x exists in both file1 and file2, I want to output x followed by the remaining fields from file1 and file2 on one line. Note that file1 contains a header. Below is a preview of each file:

File 1:

X, DES1, DES2, DES3, NUMBERS
123, text, text, text, 456
321, text, text, text, 43222
124, text, text, text, 3254
125, text, text, text, 2352634
279, text, text, text, 3243
567, text, text, text, 00001
345, text, text, text, 02

File 2:

123, 152352364
124, 32535
125, 745734
345, 4000

And so on. Each element (or x) in file2 exists in file1. However, file1 contains other values of x that are not in file2. Can I still combine the data from the two files in a new file? Below is what I tried, but I get a KeyError on my print statement. I'm sure the code is very wrong, FYI.

f1 = {}
with open ("file1.txt") as my1:
    for line in my1.readlines():
        f1[line.split(",")[0]] = line.strip().split(",")[1:]

f2={}
with open ("file2.txt") as my2:
    for line in my2.readlines():
        f2[line.split(",")[0]] = line.strip().split(",")[1:]

for key in f1.keys():
    print(key, str.join(",",f1[key]), str.join(",",f2[key]))

Any help would be appreciated. I understand I will likely have to heavily rework or scrap what I have so far. My expected output would look as follows:

X, DES1, DES2, DES3, NUMBERS, NEWNUMB
123, text, text, text, 456, 152352364
321, text, text, text, 43222, 0
124, text, text, text, 3254, 32535
125, text, text, text, 2352634, 745734
279, text, text, text, 3243, 0
567, text, text, text, 00001, 0
345, text, text, text, 02, 4000

Valid question, indeed. I have updated my original post to help further clarify what I am trying to end up with. Thanks for your reply! – user13764245, Commented Jun 17, 2020 at 18:09

Diptangsu Goswami · Accepted Answer · 2020-06-17 19:46:33Z

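You are not skipping the header line of file1.txt. In addition, f2[key] raises a KeyError for every key that appears in file1 but not in file2; f2.get(key, default) avoids that.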

f1 = {}
with open ("file1.txt") as file1:
    next(file1)  # skip the header (first line)
    for line in file1:  # for loop iterates over lines by default
        f1[line.split(",")[0]] = line.strip().split(",")[1:]

f2 = {}
with open ("file2.txt") as file2:
    for line in file2:
        f2[line.split(",")[0]] = line.strip().split(",")[1:]


# generate the contents of the new file
lines = [
    ['X', 'DES1', 'DES2', 'DES3', 'NUMBERS', 'NEWNUMB']  # headings
]
for key, value in f1.items():
    # get will return the second argument if the key doesn't exist
    new_num = f2.get(key, ['0'])
    # unpack the values into a new list and append it to lines
    lines.append([key, *value, *new_num])

for line in lines:
    print(','.join(line))

Your code still needs further changes beyond this. You should experiment with it and try to make them yourself; I have only fixed the error.

disciple@diptangsu:~/Desktop/sample$ cat file1.txt 
X, DES1, DES2, DES3, NUMBERS
123, text, text, text, 456
321, text, text, text, 43222
124, text, text, text, 3254
125, text, text, text, 2352634
279, text, text, text, 3243
567, text, text, text, 00001
345, text, text, text, 02
disciple@diptangsu:~/Desktop/sample$ cat file2.txt 
123, 152352364
124, 32535
125, 745734
345, 4000 
disciple@diptangsu:~/Desktop/sample$ python3 code.py 
X,DES1,DES2,DES3,NUMBERS,NEWNUMB
123, text, text, text, 456, 152352364
321, text, text, text, 43222,0
124, text, text, text, 3254, 32535
125, text, text, text, 2352634, 745734
279, text, text, text, 3243,0
567, text, text, text, 00001,0
345, text, text, text, 02, 4000
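
One possible direction for those further changes is sketched below. It is not part of the original answer: it assumes the standard csv module is acceptable, that whitespace around fields should be stripped so the output is spaced consistently, and that the result should go to a file rather than to the terminal. The function name merge_files, its parameters, and the output file name are hypothetical.

```python
import csv


def merge_files(path1, path2, out_path, default="0"):
    """Hypothetical helper: merge two comma-separated files on their first column."""
    # Map each key in the second file to its remaining fields.
    extra = {}
    with open(path2, newline="") as f2:
        for row in csv.reader(f2, skipinitialspace=True):
            if row:  # ignore blank lines
                extra[row[0].strip()] = [field.strip() for field in row[1:]]

    with open(path1, newline="") as f1, open(out_path, "w", newline="") as out:
        reader = csv.reader(f1, skipinitialspace=True)
        writer = csv.writer(out)

        # The first line of the first file is the header; add the new column name.
        header = next(reader)
        writer.writerow([name.strip() for name in header] + ["NEWNUMB"])

        for row in reader:
            if not row:
                continue
            row = [field.strip() for field in row]
            # Keys missing from the second file fall back to the default value.
            writer.writerow(row + extra.get(row[0], [default]))
```

Calling merge_files("file1.txt", "file2.txt", "merged.txt") would then write the merged rows to merged.txt. Fields stay strings throughout, so values such as 00001 keep their leading zeros.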

If you don't know what next is, I suggest you read about iterators and generators in Python.
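
As a small illustration of what next does, here is a sketch that uses io.StringIO as a stand-in for an open file, so it runs without any file on disk. The variable names and sample text are made up for the example.

```python
import io

# io.StringIO behaves like an open text file, which is itself an iterator over lines.
fake_file = io.StringIO("X, DES1\n123, text\n321, text\n")

# next() consumes and returns one item from the iterator: here, the header line.
header = next(fake_file)

# Iteration resumes where next() left off, so the header is not seen again.
remaining = [line.strip() for line in fake_file]

print(header.strip())  # X, DES1
print(remaining)       # ['123, text', '321, text']

# With a second argument, next() returns that value instead of raising
# StopIteration once the iterator is exhausted.
print(next(fake_file, None))  # None
```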

edited Jun 17, 2020 at 19:46

answered Jun 17, 2020 at 17:43



8 Comments

user13764245 Over a year ago

Hm, this seems to work better, but I still get a KeyError "114". I think the "114" is coming from file1 and is showing up because "114" is not found in file2.

Diptangsu Goswami Over a year ago

Then please post your actual data so I can edit my answer. And also please post your expected output.

Diptangsu Goswami Over a year ago

Oh, you've edited the question. Let me look into it.

user13764245 Over a year ago

Yes, I just updated the original post again per your request so you can see the full data in both files! Again, I am getting stuck on how to get values that are in file1 but not file2 to populate correctly. If it's easier, I can assign a value of 0 to NEWNUMB if the element is in file1 but not file2. Hopefully I am making sense.

Diptangsu Goswami Over a year ago

Give me a few minutes.
