
I am trying to merge two csv files with a common column and write it to a new file. For example product.csv table will have columns

      product_id     name        
       1           Handwash      
       2           Soap          

and subproduct.csv will have columns

      product_id   subproduct_name   volume
       1           Dettol            20
       1           Lifebuoy          50
       2           Lux               100

The output sales.csv file should be like:

      product_id   name        subproduct_name   volume
       1           Handwash    Dettol            20
       1           Handwash    Lifebuoy          50
       2           Soap        Lux               100

I have tried to create two dictionaries:

import csv

with open('product.csv', 'r') as f:
    r = csv.reader(f)
    dict1 = {row[0]: row[1:] for row in r}

with open('subproduct.csv', 'r') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}
    What have you tried so far? Please always post code with your questions so we can make suggestions/edits to what you are doing and not writing your project from scratch. Commented May 10, 2020 at 6:35

6 Answers

2

Others have proposed ways using pandas. You should consider it if your files are big, or if you need to do this operation often. But the csv module is enough here.

You cannot use plain dicts here because the keys are not unique: subproduct.csv has 2 different rows with the same id 1. So I would use dicts of lists instead.
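To illustrate the point about duplicate keys, here is a minimal sketch. The rows are typed in by hand to mimic what csv.reader would yield from subproduct.csv after the header; the variable names are only for illustration.

```python
import collections

# Rows as csv.reader would yield them from subproduct.csv (header skipped).
rows = [
    ["1", "Dettol", "20"],
    ["1", "Lifebuoy", "50"],
    ["2", "Lux", "100"],
]

# A plain dict keeps only the last row seen for each key.
plain = {row[0]: row[1:] for row in rows}

# A defaultdict of lists keeps every row for each key.
grouped = collections.defaultdict(list)
for row in rows:
    grouped[row[0]].append(row[1:])

print(plain["1"])    # ['Lifebuoy', '50']
print(grouped["1"])  # [['Dettol', '20'], ['Lifebuoy', '50']]
```

With the plain dict, the Dettol row is silently overwritten, which is why the merged file would be missing a line.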

I will assume here that all keys are present in product.csv, but that some products may have no associated subproducts (a left outer join, in database terms).

So I will use:

  • a dict for product.csv because I assume that product_id are unique per product
  • a defaultdict of lists for subproduct.csv because a single product may have many subproducts
  • the list of ids from product.csv to build the final file
  • a default empty list for subproduct.csv if a product had no subproducts
  • and process headers separately

Code could be:

import collections
import csv

with open('product.csv', newline='') as f:
    r = csv.reader(f)
    header1 = next(r)                      # read the header row separately
    dict1 = {row[0]: row[1:] for row in r}

dict2 = collections.defaultdict(list)
with open('subproduct.csv', newline='') as f:
    r = csv.reader(f)
    header2 = next(r)
    for row in r:
        dict2[row[0]].append(row[1:])

with open('merged.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(header1 + header2[1:])
    # one row of blank fields, used when a product has no subproducts
    empty2 = [[''] * (len(header2) - 1)]
    for k in sorted(dict1):                # note: ids are sorted as strings
        for row2 in dict2.get(k, empty2):
            w.writerow([k] + dict1[k] + row2)

Assuming that your csv files are truly comma-separated values files, this gives:

product_id,name,subproduct_name,volume
1,Handwash,Dettol,20
1,Handwash,Lifebuoy,50
2,Soap,Lux,100
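The left outer join case (a product with no subproducts) does not occur in the sample data, so here is a small self-contained sketch of it. It uses in-memory io.StringIO streams instead of real files, and the helper name left_join and the "Shampoo" product are made up for the illustration.

```python
import collections
import csv
import io

def left_join(product_file, subproduct_file, out_file):
    """Left outer join of two CSV streams on their first column."""
    r1 = csv.reader(product_file)
    header1 = next(r1)
    products = {row[0]: row[1:] for row in r1}

    r2 = csv.reader(subproduct_file)
    header2 = next(r2)
    subproducts = collections.defaultdict(list)
    for row in r2:
        subproducts[row[0]].append(row[1:])

    w = csv.writer(out_file, lineterminator="\n")
    w.writerow(header1 + header2[1:])
    # One row of blank fields for products without subproducts.
    padding = [[""] * (len(header2) - 1)]
    for key, values in products.items():
        for sub in subproducts.get(key, padding):
            w.writerow([key] + values + sub)

# Hypothetical data: product 3 has no subproducts.
products_csv = io.StringIO("product_id,name\n1,Handwash\n3,Shampoo\n")
subproducts_csv = io.StringIO("product_id,subproduct_name,volume\n1,Dettol,20\n")
out = io.StringIO()
left_join(products_csv, subproducts_csv, out)
print(out.getvalue())
```

The unmatched product is still written, with empty subproduct_name and volume fields, so every output row has the same number of columns.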

2

Use pandas:

import pandas as pd

products_df = pd.read_csv('product.csv')
subproducts_df = pd.read_csv('subproduct.csv')

sales_df = pd.merge(products_df, subproducts_df, on='product_id')
sales_df.to_csv('sales.csv', index=False)


1

Merging with Pandas

Stage 1: Install pandas with pip if you have not already done so

Stage 2: Creating the data

import pandas as pd

data1 = {'product_id': [1, 2],
         'name': ['Handwash', 'Soap']}
data2 = {'product_id': [1, 1, 2],
         'subproduct_name': ['Dettol', 'Lifebuoy', 'Lux'],
         'volume': [20, 50, 100]}

Stage 3: Putting it into dataframes

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

Stage 4: Merging the dataframes

output = pd.merge(df1, df2, how="inner")

Merging CSV files with Pandas

df1 = pd.read_csv('product.csv')
df2 = pd.read_csv('subproduct.csv')

Then merge the dataframes as in Stage 4.


1

You can write a script in pure Python. The standard library's csv module should do the trick.

import csv

with open('product.csv') as csv_produto:
    with open('subproduct.csv') as csv_subproduct:
        produto_reader = list(csv.reader(csv_produto, delimiter=','))
        subproduct_reader = list(csv.reader(csv_subproduct, delimiter=','))
        for p in produto_reader:
            for sp in subproduct_reader:
                if(p[0]==sp[0]):
                    print('{},{},{},{}'.format(p[0], p[1], sp[1], sp[2]))

That's the main idea. From here you can write the output to a CSV file, add a header row, and handle exceptions.
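As a rough sketch of that next step, the same nested loop can write through csv.writer and emit the header explicitly. The function name merge_nested is hypothetical, the input comes from in-memory strings rather than real files, and error handling (for example empty input) is left out.

```python
import csv
import io

def merge_nested(product_rows, subproduct_rows, out_file):
    """Nested-loop join on the first column, written with csv.writer."""
    # Split each table into its header row and its data rows.
    header_p, *products = product_rows
    header_sp, *subproducts = subproduct_rows
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(header_p + header_sp[1:])
    for p in products:
        for sp in subproducts:
            if p[0] == sp[0]:
                writer.writerow(p + sp[1:])

product_rows = list(csv.reader(io.StringIO(
    "product_id,name\n1,Handwash\n2,Soap\n")))
subproduct_rows = list(csv.reader(io.StringIO(
    "product_id,subproduct_name,volume\n1,Dettol,20\n1,Lifebuoy,50\n2,Lux,100\n")))

result = io.StringIO()
merge_nested(product_rows, subproduct_rows, result)
print(result.getvalue())
```

To write a real file, pass a handle opened with open('sales.csv', 'w', newline='') instead of the StringIO object. Note that the nested loop compares every product with every subproduct, so it becomes slow on large files.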

3 Comments

You can only iterate over a csv reader once. You should at least use: subproduct_reader = list(csv.reader(csv_subproduct, delimiter=',')) to save the result in a list and process it once per row from the first file.
I'm iterating only once, inside the loops.
With for p in produto_reader: for sp in subproduct_reader: you try to iterate over subproduct_reader once for each row from produto_reader.
0

You can read the data straight into pandas dataframes, and then merge the two dataframes:

import pandas as pd

# load data
product = pd.read_csv('product.csv')
subproduct = pd.read_csv('subproduct.csv')

# merge data
merged = pd.merge(product,subproduct)

# write results to csv
merged.to_csv('sales.csv',index=False)

This works for your example. Depending on what your actual data looks like, you might need to set some of the additional arguments of pd.merge, such as on or how.

Edit: added the write to csv part


0

Please try this:

import pandas as pd

product = pd.read_csv('product.csv')
sub_product = pd.read_csv('subproduct.csv')

output = pd.merge(product, sub_product, how='outer', on='product_id')

This joins the two data frames (product and sub_product) on the product_id column, which is common to both. An outer join keeps every row from both data frames and fills in missing values where a key has no match in the other frame. how='inner', which keeps only the keys present in both frames, would give the same rows in this case.
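The two join types only differ when some key is missing from one side. A small sketch with hand-built frames shows this; the extra "Shampoo" product is made up for the illustration and is not part of the question's data.

```python
import pandas as pd

# Hypothetical data: product 3 has no subproducts.
product = pd.DataFrame({
    "product_id": [1, 2, 3],
    "name": ["Handwash", "Soap", "Shampoo"],
})
sub_product = pd.DataFrame({
    "product_id": [1, 1, 2],
    "subproduct_name": ["Dettol", "Lifebuoy", "Lux"],
    "volume": [20, 50, 100],
})

# Inner join: only keys present in both frames.
inner = pd.merge(product, sub_product, how="inner", on="product_id")
# Outer join: every key from either frame, with NaN where there is no match.
outer = pd.merge(product, sub_product, how="outer", on="product_id")

print(len(inner))  # 3
print(len(outer))  # 4
```

The extra row in the outer result is the product without subproducts, with NaN in subproduct_name and volume.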

