I have 3 CSV files: the 1st has 1M records, the 2nd has 2M, and the 3rd has 5M. File 1 has columns cust_id, fname, lname; file 2 has columns cust_id, prod_id, price, date; file 3 has columns prod_id, prod_code, price, quantity.

What I want is to select the details of 10 customers from the above three files and place the results in 3 different new CSV files. That is, for each of the 10 customers: cust_id, fname, lname from file 1 go into one new CSV file; cust_id, prod_id, price, date from file 2 go into a second; and prod_id, prod_code, price, quantity from file 3 go into a third.

code:

import pandas as pd

# IDs of the 10 customers to extract
customers = pd.read_csv("customers10.csv")

customer_details = pd.read_csv("file1.csv")  # cust_id, fname, lname
products = pd.read_csv("file2.csv")          # cust_id, prod_id, price, date
product_items = pd.read_csv("file3.csv")     # prod_id, prod_code, price, quantity

# rows of file1 belonging to the selected customers
table1 = customer_details[customer_details['cust_id'].isin(customers['cust_id'])]

# rows of file2 belonging to the selected customers
table2 = products[products['cust_id'].isin(customers['cust_id'])]

# rows of file3 for the products those customers bought
table3 = product_items[product_items['prod_id'].isin(table2['prod_id'])]

table1.to_csv("new_file1.csv", index=False)
table2.to_csv("new_file2.csv", index=False)
table3.to_csv("new_file3.csv", index=False)

I want to run this on files with millions of records. Is this approach efficient, or are there better ways to do it?

1 Answer


pandas read_csv() has parameters that may be useful for relatively large data sets like these. See iterator, chunksize and memory_map in the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

pandas is an in-memory system, so 'large data set' is relative to the amount of RAM in the computer.
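As a sketch of how chunksize helps here: instead of loading a whole file, you can stream it and keep only the matching rows, so memory use is bounded by the chunk size plus the (small) result. The helper below is a hypothetical illustration, not part of the question's code; the column names and chunk size are placeholders.

import pandas as pd

def filter_large_csv(path, id_col, wanted_ids, chunksize=100_000):
    """Stream a CSV in chunks, keeping only rows whose id_col is in wanted_ids."""
    parts = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        parts.append(chunk[chunk[id_col].isin(wanted_ids)])
    return pd.concat(parts, ignore_index=True)

For example, the file2 step of the question could then be written as table2 = filter_large_csv("file2.csv", "cust_id", set(customers["cust_id"])) without ever holding all 2M rows in memory at once.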
