Python - Combining Columns in a CSV file

Question

I'm trying to create code that will take data from certain columns in a CSV file and combine them into a new CSV file. I was directed to use Pandas but I'm not sure if I'm even on the right track. I'm fairly new to Python so prepare yourselves for potentially awful code.

I want to use data.csv:

Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699

and create a new CSV that looks like this:

Customer_ID,ABC
1003,152
1003,132
1003,132
1004,110

What I have so far is:

import csv
import pandas as pd

df = pd.read_csv('test.csv', delimiter = ',')
custID = df.customer_ID
choiceA = df.A
choiceB = df.B
choiceC = df.C

ofile  = open('answer.csv', "wb")
writer = csv.writer(ofile, delimiter = ',')
writer.writerow(custID + choiceA + choiceB + choiceC)

Unfortunately all that does is add each row together, then create a CSV of each row summed together as one row. My true end goal would be to find the most occurring values in columns A-C and combine each customer into the same row, using the most occurring values. I'm awful at explaining. I'd want something that takes data.csv and makes this:

Customer_ID,ABC
1003,132
1004,110

"the most occurring values"? What do you want to happen if there are two ID/ABC pairs with the same number of occurrences? (E.g. 1003, 132 and 1003, 142, say.)
– DSM, Commented Mar 2, 2014 at 1:02
I don't particularly care which is chosen for now but I'd like to know for future reference how to manipulate which is chosen based on other calculations. Perhaps if the sale was during the first half of the year, it chooses the lower value, but if it's the second half of the year it chooses the higher. I'm still learning Python as I said so I greatly appreciate your help.
– SgtSeamonkey, Commented Mar 2, 2014 at 1:09

Atalajaka · Accepted Answer · 2021-07-13 09:33:40Z

3

You can sum the columns you're interested in once they are converted to strings, because summing strings concatenates them:

In [11]: df = pd.read_csv('data.csv', index_col='Customer_ID')

In [12]: df
Out[12]:
                Date  Time OtherColumns  A  B  C  Cost
Customer_ID
1003         January  2:00        Stuff  1  5  2   519
1003         January  2:00        Stuff  1  3  2   530
1003         January  2:00        Stuff  1  3  2   530
1004             Feb  2:00        Stuff  1  1  0   699

In [13]: res = df[list('ABC')].astype(str).sum(1)  # cols = list('ABC')

In [14]: res
Out[14]:
Customer_ID
1003           152
1003           132
1003           132
1004           110
dtype: float64

To write the CSV, first use to_frame to give the column the desired name:

In [15]: res.to_frame(name='ABC')  # ''.join(cols)
Out[15]:
             ABC
Customer_ID
1003         152
1003         132
1003         132
1004         110

In [16]: res.to_frame(name='ABC').to_csv('new.csv')
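For anyone who prefers a plain script over the IPython session above, here is a minimal sketch of the same idea. It assumes the sample rows from the question, uses io.StringIO as a stand-in for data.csv, and reads every column as text with dtype=str so that "+" concatenates rather than adds. The explicit "+" chain is a swapped-in alternative to astype(str).sum(1), not the code from the answer.

```python
import io

import pandas as pd

# Sample rows copied from data.csv in the question; StringIO stands in for the file.
CSV_TEXT = """Customer_ID,Date,Time,OtherColumns,A,B,C,Cost
1003,January,2:00,Stuff,1,5,2,519
1003,January,2:00,Stuff,1,3,2,530
1003,January,2:00,Stuff,1,3,2,530
1004,Feb,2:00,Stuff,1,1,0,699
"""

# dtype=str keeps A, B and C as text, so "+" concatenates instead of adding.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

out = pd.DataFrame({
    "Customer_ID": df["Customer_ID"],
    "ABC": df["A"] + df["B"] + df["C"],
})

# With no path argument, to_csv returns the CSV as a string;
# index=False leaves the row numbers out.
csv_text = out.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path such as out.to_csv('new.csv', index=False).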

edited Jul 13, 2021 at 9:33 by Atalajaka

answered Mar 2, 2014 at 1:18 by Andy Hayden


3 Comments

SgtSeamonkey Over a year ago

Amazing that it can be done in only 3 lines. I was clearly over-thinking it, thank you so much. Do you know of a way to code: if customer_ID = customer_ID on the last line, then find the mode of all the A's, all the B's, and all the C's and return one row for each customer?

Andy Hayden Over a year ago

Not sure what you're asking; one part is perhaps res.groupby(level=0).last(). Best to ask a new question explicitly in terms of DataFrames rather than CSVs! :)

SgtSeamonkey Over a year ago

I will look into it more, thank you! I posted a new question about it, but while I wait for a response I will look into res.groupby.
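As a rough sketch of the follow-up goal (one row per customer with the most frequent ABC value), something like the following might work. It is not taken from the thread: it assumes the combined ABC string is treated as a unit, most_common is a hypothetical helper name, and ties are resolved by taking the first value that Series.mode() returns, which is an arbitrary choice.

```python
import pandas as pd

# Rows mirror data.csv from the question (only the columns used here).
df = pd.DataFrame({
    "Customer_ID": [1003, 1003, 1003, 1004],
    "A": [1, 1, 1, 1],
    "B": [5, 3, 3, 1],
    "C": [2, 2, 2, 0],
})

# Build the combined code per row by concatenating the digits as text.
df["ABC"] = df["A"].astype(str) + df["B"].astype(str) + df["C"].astype(str)


def most_common(values: pd.Series) -> str:
    """Hypothetical helper: return the most frequent value in a group."""
    # Series.mode() returns every tied value, sorted; taking the first one
    # means a tie resolves to the smallest string (an assumption, not a rule
    # from the question).
    return values.mode().iloc[0]


# One row per customer, holding that customer's most frequent ABC value.
result = df.groupby("Customer_ID")["ABC"].agg(most_common).reset_index()
print(result.to_csv(index=False))
```

A different tie rule, such as the first-half/second-half-of-year idea mentioned in the question's comments, would go inside most_common.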
