
I have multiple CSV files in the following format. All of the files have the same structure.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |

First, I want to calculate the total bill amount for one file in Python. Then I need to calculate the total bill amount for all the CSV files at the same time, i.e. in a multi-threaded manner.

I need to do it using multithreading.

  • Do all your files have the same items? Also, please share what you have done and where you are stuck. Commented Feb 25, 2020 at 5:43
  • Please provide more information in your question! Commented Feb 25, 2020 at 5:43
  • Yes, all files have the same items. Commented Feb 25, 2020 at 5:44
  • Made some changes, @Jai. Hope you'll get it. Commented Feb 25, 2020 at 5:59
  • As you didn't share any code, I would do it my way: first merge all the files, then sum them up. I'm posting the answer. Commented Feb 25, 2020 at 6:00

2 Answers


This would be my way: first merge all the CSV files, then sum each item:

import glob
import os
import pandas as pd

# the path to your csv file directory
mycsvdir = 'C:\\your csv location\\your csv location'

# select all csv files; you could apply some kind of filter here too
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))

# loop through the files and read them in with pandas
dataframes = []  # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)

# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)

# write out to a new csv file (index=False avoids writing an extra index column)
result.to_csv('all.csv', index=False)

Now you have an all.csv file that is the merge of your CSV files. We can sum any item with the code below:

dff = pd.read_csv('C:\\output folder\\output folder\\all.csv')


# aggfunc='sum' is needed; pivot_table averages by default
table = pd.pivot_table(dff, index=['items', 'per_unit_amount'], aggfunc='sum')
print(table)
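Since the question asks for multithreading, the file-reading step above (the only I/O-bound part) could be parallelised with a thread pool. A minimal sketch, assuming the same column layout as the question; the temp directory and sample files here are only stand-ins for your own CSV directory:

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# assumed sample data: two csv files written to a temp directory,
# standing in for your real csv location
mycsvdir = tempfile.mkdtemp()
sample = "items,per_unit_amount,number of units\nbook,25,5\npencil,3,10\n"
for name in ("bills1.csv", "bills2.csv"):
    with open(os.path.join(mycsvdir, name), "w") as f:
        f.write(sample)

csvfiles = glob.glob(os.path.join(mycsvdir, "*.csv"))

# read the files concurrently; threads overlap the I/O waits of pd.read_csv
with ThreadPoolExecutor(max_workers=4) as pool:
    dataframes = list(pool.map(pd.read_csv, csvfiles))

# merging afterwards is the same as in the sequential version
result = pd.concat(dataframes, ignore_index=True)
```

The merge and pivot steps stay sequential; only the reads run in parallel.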

1 Comment

I want to do this using multithreading. Is it possible, @Mohsen?

You can use the pandas library to achieve that. Install pandas via `pip install pandas`.

The workflow should go like this:

  • Get a list of the filenames (file paths, actually) of the CSV files via glob
  • Iterate over the filenames, load the files using pandas, and keep them in a list
  • Concat the list of dataframes into one big dataframe
  • Perform your desired calculations
from glob import glob
import pandas as pd

# getting a list of all the csv files' paths
filenames = glob('./*.csv')

# list of dataframes
dfs = [pd.read_csv(filename) for filename in filenames]

# concat all dataframes into one dataframe
big_df = pd.concat(dfs, ignore_index=True)

The big_df should look like this. Here I have used two CSV files, each with two rows of input, so the concatenated dataframe has 4 rows in total.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |
|  2 | book    |                25 |                 5 |
|  3 | pencil  |                 3 |                10 |

Now let's multiply per_unit_amount with number of units to get unit_total:

big_df['unit_total'] = big_df['per_unit_amount'] * big_df['number of units']

Now the dataframe has an extra column:

|    | items   |   per_unit_amount |   number of units |   unit_total |
|---:|:--------|------------------:|------------------:|-------------:|
|  0 | book    |                25 |                 5 |          125 |
|  1 | pencil  |                 3 |                10 |           30 |
|  2 | book    |                25 |                 5 |          125 |
|  3 | pencil  |                 3 |                10 |           30 |

You can calculate the total by summing all the entries in the unit_total column:

total_amount = big_df['unit_total'].sum()
> 310
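If the totals really must be computed concurrently, one way (a sketch under the same assumed column names, using `concurrent.futures` from the standard library) is to give each thread one file, compute that file's total, and sum the partial results. The temp directory and sample files below are stand-ins for your own data:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from glob import glob

import pandas as pd

def file_total(filename):
    """Total bill for one csv file: sum of per_unit_amount * number of units."""
    df = pd.read_csv(filename)
    return (df['per_unit_amount'] * df['number of units']).sum()

# assumed sample data: two identical csv files in a temp directory
csvdir = tempfile.mkdtemp()
sample = "items,per_unit_amount,number of units\nbook,25,5\npencil,3,10\n"
for name in ("a.csv", "b.csv"):
    with open(os.path.join(csvdir, name), "w") as f:
        f.write(sample)

filenames = glob(os.path.join(csvdir, '*.csv'))

# one task per file; the thread pool reads and totals them concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    total_amount = sum(pool.map(file_total, filenames))

print(total_amount)  # 155 per sample file -> 310 for the two files
```

Note the comment below still applies: the arithmetic is CPU-bound, so threads mainly help with the file reads, not the sums.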

4 Comments

What if all my csv files have different formats and I want to process all of them at the same time?
In that case you have to manually pick out the columns that you want to sum and format the dataframes so that they are uniform, then concat and perform the operations. You can't concat dataframes that aren't uniform.
Also, this is a CPU-heavy operation, not an I/O-heavy one, so multithreading is somewhat redundant here.
The data will be more than 1000 rows in one csv file, so I think multithreading can help here?
