I have 3 CSV files: the 1st has 1M records, the 2nd has 2M, and the 3rd has 5M. File 1 has columns cust_id, fname, lname; file 2 has columns cust_id, prod_id, price, date; file 3 has columns prod_id, prod_code, price, quantity.

What I want is to select the details of 10 customers from the above three files and place the results in 3 different new CSV files. That is, for each of the 10 customers: cust_id, fname, lname from file 1 go into one new CSV file; cust_id, prod_id, price, date from file 2 go into a second; and prod_id, prod_code, price, quantity from file 3 go into a third.

code:

import pandas as pd

# IDs of the 10 customers to extract
customers = pd.read_csv("customers10.csv")

customer_details = pd.read_csv("file1.csv")  # cust_id, fname, lname
products = pd.read_csv("file2.csv")          # cust_id, prod_id, price, date
product_items = pd.read_csv("file3.csv")     # prod_id, prod_code, price, quantity

# rows of file1 belonging to the selected customers
table1 = customer_details[customer_details['cust_id'].isin(customers['cust_id'])]

# rows of file2 belonging to the selected customers
table2 = products[products['cust_id'].isin(customers['cust_id'])]

# rows of file3 for the products those customers bought
table3 = product_items[product_items['prod_id'].isin(table2['prod_id'])]

table1.to_csv("new_file1.csv", index=False)
table2.to_csv("new_file2.csv", index=False)
table3.to_csv("new_file3.csv", index=False)

I want to run this on files with millions of records. Is this approach efficient, or are there better ways to do it?

1 Answer


pandas read_csv() has parameters that may be useful for relatively large data sets like these. See iterator, chunksize and memory_map in the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

pandas is an in-memory system, so 'large data set' is relative to the amount of RAM in the computer.
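As a sketch of how chunksize helps here: instead of loading a whole file, you can stream it and keep only the matching rows, so memory use is bounded by the chunk size plus the (small) result. The helper below is a hypothetical illustration, not part of the question's code; the column names and chunk size are placeholders.

import pandas as pd

def filter_large_csv(path, id_col, wanted_ids, chunksize=100_000):
    """Stream a CSV in chunks, keeping only rows whose id_col is in wanted_ids."""
    parts = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        parts.append(chunk[chunk[id_col].isin(wanted_ids)])
    return pd.concat(parts, ignore_index=True)

For example, the file2 step of the question could then be written as table2 = filter_large_csv("file2.csv", "cust_id", set(customers["cust_id"])) without ever holding all 2M rows in memory at once.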
