
I have multiple CSV files in the following format. All of the files have the same structure.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |

First, I want to calculate the total bill amount for one file in Python. Then I need to calculate the total bill amount for all the CSV files at the same time, i.e. in a multi-threaded manner.

I need to do it using multithreading.

  • Do all your files have the same items? Also, please share what you have done and where you are stuck. Commented Feb 25, 2020 at 5:43
  • Please provide more information in your question! Commented Feb 25, 2020 at 5:43
  • Yes, all files have the same items. Commented Feb 25, 2020 at 5:44
  • Made some changes, @Jai. Hope you'll get it. Commented Feb 25, 2020 at 5:59
  • As you didn't share any code, I would do it my way: first merge all the files, then sum them up. I'm posting the answer. Commented Feb 25, 2020 at 6:00

2 Answers


This would be my way: first merge all the CSV files, then sum each item:

import glob
import os
import pandas as pd

# the path to your csv file directory
mycsvdir = 'C:\\your csv location\\your csv location'

# select all csv files; you could apply some kind of filter here too
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))

# loop through the files and read them in with pandas
dataframes = []  # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)

# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)

# write out to a new csv file (index=False avoids writing an extra index column)
result.to_csv('all.csv', index=False)

Now you have an all.csv file that is the merge of your CSV files. We can sum any item with the code below:

dff = pd.read_csv('C:\\output folder\\output folder\\all.csv')


# aggfunc='sum' is needed; pivot_table averages by default
table = pd.pivot_table(dff, index=['items', 'per_unit_amount'], aggfunc='sum')
print(table)
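Since the question asks for multithreading, the file-reading step above (the only I/O-bound part) could be parallelised with a thread pool. A minimal sketch, assuming the same column layout as the question; the temp directory and sample files here are only stand-ins for your own CSV directory:

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# assumed sample data: two csv files written to a temp directory,
# standing in for your real csv location
mycsvdir = tempfile.mkdtemp()
sample = "items,per_unit_amount,number of units\nbook,25,5\npencil,3,10\n"
for name in ("bills1.csv", "bills2.csv"):
    with open(os.path.join(mycsvdir, name), "w") as f:
        f.write(sample)

csvfiles = glob.glob(os.path.join(mycsvdir, "*.csv"))

# read the files concurrently; threads overlap the I/O waits of pd.read_csv
with ThreadPoolExecutor(max_workers=4) as pool:
    dataframes = list(pool.map(pd.read_csv, csvfiles))

# merging afterwards is the same as in the sequential version
result = pd.concat(dataframes, ignore_index=True)
```

The merge and pivot steps stay sequential; only the reads run in parallel.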

1 Comment

I want to do this using multithreading. Is it possible, @Mohsen?

You can use the pandas library to achieve that. Install pandas via `pip install pandas`.

The workflow should go like this:

  • Get a list of the filenames (file paths, actually) of the CSV files via glob
  • Iterate over the filenames, load the files using pandas, and keep them in a list
  • Concat the list of dataframes into one big dataframe
  • Perform your desired calculations
from glob import glob
import pandas as pd

# getting a list of all the csv files' paths
filenames = glob('./*.csv')

# list of dataframes
dfs = [pd.read_csv(filename) for filename in filenames]

# concat all dataframes into one dataframe
big_df = pd.concat(dfs, ignore_index=True)

The big_df should look like this. Here I have used two CSV files, each with two rows of input, so the concatenated dataframe has 4 rows in total.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |
|  2 | book    |                25 |                 5 |
|  3 | pencil  |                 3 |                10 |

Now let's multiply per_unit_amount with number of units to get unit_total:

big_df['unit_total'] = big_df['per_unit_amount'] * big_df['number of units']

Now the dataframe has an extra column:

|    | items   |   per_unit_amount |   number of units |   unit_total |
|---:|:--------|------------------:|------------------:|-------------:|
|  0 | book    |                25 |                 5 |          125 |
|  1 | pencil  |                 3 |                10 |           30 |
|  2 | book    |                25 |                 5 |          125 |
|  3 | pencil  |                 3 |                10 |           30 |

You can calculate the total by summing all the entries in the unit_total column:

total_amount = big_df['unit_total'].sum()
> 310
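If the totals really must be computed concurrently, one way (a sketch under the same assumed column names, using `concurrent.futures` from the standard library) is to give each thread one file, compute that file's total, and sum the partial results. The temp directory and sample files below are stand-ins for your own data:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from glob import glob

import pandas as pd

def file_total(filename):
    """Total bill for one csv file: sum of per_unit_amount * number of units."""
    df = pd.read_csv(filename)
    return (df['per_unit_amount'] * df['number of units']).sum()

# assumed sample data: two identical csv files in a temp directory
csvdir = tempfile.mkdtemp()
sample = "items,per_unit_amount,number of units\nbook,25,5\npencil,3,10\n"
for name in ("a.csv", "b.csv"):
    with open(os.path.join(csvdir, name), "w") as f:
        f.write(sample)

filenames = glob(os.path.join(csvdir, '*.csv'))

# one task per file; the thread pool reads and totals them concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    total_amount = sum(pool.map(file_total, filenames))

print(total_amount)  # 155 per sample file -> 310 for the two files
```

Note the comment below still applies: the arithmetic is CPU-bound, so threads mainly help with the file reads, not the sums.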

4 Comments

What if all my csv files have different formats and I want to process all of them at the same time?
In that case you have to manually pick out the columns that you want to sum and format the dataframes so that they are uniform, then concat and perform the operations. You can't concat dataframes that aren't uniform.
Also, this is a CPU-heavy operation, not an I/O-heavy one, so multithreading is somewhat redundant here.
The data will be more than 1000 rows in one csv file, so I think multithreading can help here?
