
I would like to sum amounts by company name, but the company name is often formatted differently: for example, Apple Inc sometimes appears as Apple computer or Apple Inc. I also don't know how to handle the header line.

My file format is CSV.

company amount
a   20
b   10
A'  30
bb  20

I would like to do like this:

line = readline()
if line == '':
    break
if 'Apple' in line:
    sum(amount)
  • You should post the first few lines of the CSV file Commented Nov 20, 2011 at 13:44

2 Answers


Your data is not in true CSV format. The apparent columns are separated neither by a comma, nor by a tab, nor even by a single space; sometimes there are multiple spaces. If this were a space-separated-values file, each space would indicate a new column, so multiple spaces would mean you have more than two columns per line.

This detail is important since CSV files are easily parsed by the csv module. But since this is not a true CSV file, we can't use the csv module.

Assuming there are always supposed to be just two columns separated by spaces, and the last column represents a numeric amount (except for the first header line):

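total = 0.0
with open('data.csv', 'r') as f:
    next(f)  # skip the first (header) line
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which cannot be split into two fields
        # sep=None splits on the last run of whitespace and ignores trailing spaces/newlines
        company, amount = line.rsplit(None, 1)
        if 'Apple' in company:
            total += float(amount)
print(total)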

1 Comment

thank you sooo much! but Python said "empty in string" I have no idea.. ohh

You're going to need to map the name variations somehow, either by totaling each name separately and combining afterward by hand, or by making a dictionary up front that identifies all the aliases used by each company. if 'Apple' in line: fails hard because it can undetectably mix the amounts from different companies together.

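from collections import defaultdict

Company = {"Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
           "Apple Vacations": 2, "Applebee's Restaurant": 3 }

totals = defaultdict(float)  # company ID -> running total (avoids shadowing the built-in sum)
totals[Company[name]] += amount  # run once per (name, amount) parsed from the file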
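
A minimal, runnable sketch of the alias-dictionary idea, assuming the rows have already been parsed into (name, amount) pairs. The sample rows, the helper name total_by_company, and the choice to collect unmapped names in a separate list are assumptions made for illustration, not part of the original answer. The last lines show how a plain substring test would lump Apple Vacations in with Apple.

```python
from collections import defaultdict

# Alias table from the answer above: every known spelling maps to one company ID.
COMPANY_IDS = {
    "Apple": 1, "Apple Computer": 1, "AAPL": 1, "Apple, Inc": 1,
    "Apple Vacations": 2, "Applebee's Restaurant": 3,
}

def total_by_company(rows, aliases):
    """Sum amounts per company ID; names missing from the alias table are returned separately."""
    totals = defaultdict(float)
    unknown = []
    for name, amount in rows:
        key = aliases.get(name.strip())
        if key is None:
            unknown.append(name)  # left for manual review instead of being guessed
        else:
            totals[key] += amount
    return dict(totals), unknown

# Hypothetical sample rows, for illustration only.
rows = [("Apple", 300), ("Apple Computer", 1000), ("Apple Vacations", 40), ("Acme", 5)]

totals, unknown = total_by_company(rows, COMPANY_IDS)
print(totals)
print(unknown)

# For comparison: a substring test also counts "Apple Vacations" as Apple.
substring_total = sum(amount for name, amount in rows if "Apple" in name)
print(substring_total)
```

Exact lookups keep the two companies apart, while the substring total silently includes the 40 that belongs to Apple Vacations.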

Edit 2: If you don't know all the company names beforehand, then the best you can do is keep track of the unique names contained in the input file and decide whether to merge them later:

Company = {}
for name, amount in rows:  # rows: (name, amount) pairs parsed from the input (parsing not shown)
    if name in Company:
        Company[name] += amount
    else:
        Company[name] = amount

2 Comments

thank you so much! file is like this: company,amount Apple,300 Apple.inc,500 Apple Computer,1000 aa,750 bb,250
If you don't know all the names beforehand, then you'll need to keep track of each unique name contained in the input. If your input is Apple 10, Apple 20, Applebees 75, then your output would have 2 companies: Apple 30, Applebees 75. You'd have to decide yourself whether to add 30 and 75 together as one company (as you would for Apple and Apple Computer) or leave them separate as two different companies.
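
For the comma-separated sample quoted in the first comment, the standard csv module can take care of the header row. Below is a sketch, assuming the column names are exactly company and amount. The text is inlined through io.StringIO only to keep the example self-contained, and the merge mapping is a hypothetical decision someone would make after reviewing the unique names.

```python
import csv
import io

# Sample rows from the comment above, inlined here;
# a real script would use open('data.csv', newline='') instead.
sample = "company,amount\nApple,300\nApple.inc,500\nApple Computer,1000\naa,750\nbb,250\n"

# Step 1: total each unique name exactly as it appears in the file.
per_name = {}
for row in csv.DictReader(io.StringIO(sample)):  # DictReader consumes the header line itself
    name = row["company"].strip()
    per_name[name] = per_name.get(name, 0.0) + float(row["amount"])

# Step 2: hypothetical merge decisions, made by hand after looking at the names.
merge = {"Apple.inc": "Apple", "Apple Computer": "Apple"}

merged = {}
for name, amount in per_name.items():
    canonical = merge.get(name, name)  # names without an entry stay as they are
    merged[canonical] = merged.get(canonical, 0.0) + amount

print(per_name)
print(merged)
```

Keeping the per-name totals separate from the merge step means a wrong merge decision can be corrected without re-reading the file.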
