
I can't figure out how to copy my header row from master to matched. I need to grab the first row of my master CSV and write it first in matched, then write the remaining lines only if they match the criteria.

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    for line in master:
        if any(city in line.split('","')[5] for city in citys) and \
           any(state in line.split('","')[6] for state in states) and \
           not any(category in line.split('","')[2] for category in categorys):
            matched.write(line)

Please help. I am new to Python and don't know how to use pandas or anything else yet.

  • What's the need for the single-double-comma-single-double pattern? Is that so that it ignores commas embedded within quotes? Commented Dec 28, 2016 at 22:12
  • Do you need the "for city in citys"? You are only running the IF statement on one line at a time, right? Commented Dec 28, 2016 at 22:13
  • @ScottEdwards2000 The single-double-comma-single-double pattern is due to the format of my csv Commented Dec 28, 2016 at 22:14
  • ah, ok - could you elaborate on the specific format of your csv that requires that pattern? just haven't seen it before Commented Dec 28, 2016 at 22:15
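To make the comment thread concrete, here is a small sketch of what `split('","')` does to a line where every field is quoted, compared with `csv.reader`. The sample line and its column layout are made up (chosen so that indices 2, 5 and 6 line up with the question's category, city and state); the real master.csv format isn't shown.

```python
import csv
import io

# Hypothetical line: every field quoted, and field 2 contains an embedded comma.
line = '"Joe\'s Diner","123 Main St","Restaurants, Diners","555-0100","x","Austin","TX"\n'

# Naive split: the embedded comma survives, but the first field keeps a
# leading quote and the last field keeps a trailing quote plus the newline.
naive = line.split('","')

# csv.reader removes the quoting and keeps the embedded comma inside its field.
parsed = next(csv.reader(io.StringIO(line)))

print(repr(naive[0]), repr(naive[6]))
print(repr(parsed[0]), repr(parsed[6]))
```

This is also why the question's filter uses substring tests (`state in ...[6]`): an exact comparison against the naive last field would fail because of the leftover `"` and newline.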

2 Answers


You can just consume the first line of the input file with next() and write it straight to the output file:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master))  # copy the header line; a later "for line in master" starts at line 2
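A self-contained sketch of the same idea, using `io.StringIO` in place of the real files (an assumption so it runs without master.csv) and a placeholder criterion instead of the question's filter:

```python
import io

# Stand-ins for master.csv and match.csv so the sketch runs on its own.
master = io.StringIO("name,city\nA,Austin\nB,Boston\n")
matched = io.StringIO()

matched.write(next(master))  # the header line is consumed here...
for line in master:          # ...so the loop starts at the second line
    if "Austin" in line:     # placeholder criterion, not the question's filter
        matched.write(line)

result = matched.getvalue()
print(result)
```

Because a file object is its own iterator, `next()` and the `for` loop share the same position, so the header is written exactly once.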

It seems that you really need the csv module for the rest, though.

With the csv module, there's no need for those unsafe split() calls: comma is the default delimiter, and quoted fields (including commas inside quotes) are handled properly. So I'd just write:

import csv

# citys, states and categorys are the question's filter lists, defined elsewhere.
# newline='' is what the csv docs recommend for files used with csv (Python 3).
with open('master.csv', 'r', newline='') as master, \
        open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row

    for row in cr:  # each row is already split into a list of fields
        if (any(city in row[5] for city in citys)
                and any(state in row[6] for state in states)
                and not any(category in row[2] for category in categorys)):
            cw.writerow(row)
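A runnable version of that substring filter, with an in-memory CSV and hypothetical filter lists (the question doesn't show the real `citys`, `states` or `categorys`). It shows that `"York"` also matches a `"New York"` row:

```python
import csv
import io

# Hypothetical criteria; the real lists aren't shown in the question.
citys = ["York"]
states = ["NY"]
categorys = ["Bars"]

# In-memory stand-in for master.csv: a header plus two data rows.
master = io.StringIO(
    "name,addr,category,phone,x,city,state\n"
    "A,1 Main,Diners,555,x,New York,NY\n"
    "B,2 Main,Diners,555,x,Albany,NY\n"
)
matched = io.StringIO()

cr = csv.reader(master)
cw = csv.writer(matched)
cw.writerow(next(cr))  # copy the header row
for row in cr:
    # Substring tests: "York" in "New York" is True, so row A is kept.
    if (any(city in row[5] for city in citys)
            and any(state in row[6] for state in states)
            and not any(category in row[2] for category in categorys)):
        cw.writerow(row)

rows = list(csv.reader(io.StringIO(matched.getvalue())))
print(rows)
```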

BTW, your filter checks whether a city is a substring of row[5], but you may want an exact match. For example, "York" would match "New York", which is probably not what you want. So I'd propose using `in` to test whether the whole field is in the list of strings, for each criterion:

import csv

with open('master.csv', 'r', newline='') as master, \
        open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row
    for row in cr:
        # exact matches: the whole field must equal one of the listed values
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)

This can be improved further by passing a generator expression to writerows(), which writes all the matching rows at once:

import csv

with open('master.csv', 'r', newline='') as master, \
        open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy the header row
    # writerows() consumes the generator, writing every matching row in one call
    cw.writerows(row for row in cr
                 if row[5] in citys and row[6] in states and row[2] not in categorys)
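A runnable check of the exact-match version with `writerows()`, again using an in-memory CSV and hypothetical criteria. With exact membership, `"New York"` no longer matches `"York"`:

```python
import csv
import io

# Hypothetical criteria, stored as sets (the question doesn't show them).
citys = {"York"}
states = {"NY"}
categorys = {"Bars"}

# In-memory stand-in for master.csv.
master = io.StringIO(
    "name,addr,category,phone,x,city,state\n"
    "A,1 Main,Diners,555,x,New York,NY\n"
    "B,2 Main,Diners,555,x,York,NY\n"
)
matched = io.StringIO()

cr = csv.reader(master)
cw = csv.writer(matched)
cw.writerow(next(cr))  # copy the header row
# Exact membership: "New York" is not in {"York"}, so row A is dropped.
cw.writerows(row for row in cr
             if row[5] in citys and row[6] in states and row[2] not in categorys)

rows = list(csv.reader(io.StringIO(matched.getvalue())))
print(rows)
```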

Note that citys, states, and categorys would be better as sets than lists: a membership test on a set is a hash lookup (O(1) on average) instead of a scan of the whole list (you didn't say which type they are).
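A small sketch of the list-versus-set point, using a made-up list of 10,000 city names. Timings vary by machine, so no numbers are claimed here; the point is that both containers give the same answer while the set avoids a linear scan:

```python
import timeit

# Hypothetical list of city names; the question doesn't show the real data.
city_list = [f"City{i}" for i in range(10_000)]
city_set = set(city_list)  # build the set once, before looping over the CSV rows

# Same result either way; only the cost of each lookup differs.
assert ("City9999" in city_list) == ("City9999" in city_set)

# The list lookup scans element by element; the set lookup hashes the key.
list_time = timeit.timeit(lambda: "City9999" in city_list, number=1_000)
set_time = timeit.timeit(lambda: "City9999" in city_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```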


2 Comments

Thanks for the help and advice. I keep getting this error after adding your code:

    Traceback (most recent call last):
      File "yelpscrape.py", line 51, in <module>
        cw.writerow(next(cr))  # copy title
    ValueError: I/O operation on closed file
Never mind, I got it working. I hadn't pieced it in quite right. I believe it's working now. Thank you for your help!
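That `ValueError` typically appears when the reading code runs after the `with` block has already closed the file. The full yelpscrape.py isn't shown, so this is an assumption about the cause; the sketch reproduces it with `io.StringIO` standing in for master.csv:

```python
import csv
import io

# Leaving the with block closes the StringIO, just as it would a real file.
with io.StringIO("name,city\nA,Austin\n") as master:
    cr = csv.reader(master)

# This line sits outside the with block (dedented), so the file is closed.
error = None
try:
    header = next(cr)
except ValueError as exc:
    error = str(exc)
print(error)
```

Keeping every `cr`/`cw` call indented inside the `with` block avoids the error.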

If you don't want to think too hard about how the line-producing iterator works, one straightforward way to do it is to treat the first line specially:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    first_line = True
    for line in master:
        fields = line.split('","')  # split once per line instead of three times
        if first_line or (any(city in fields[5] for city in citys) and
                          any(state in fields[6] for state in states) and
                          not any(category in fields[2] for category in categorys)):
            matched.write(line)
        first_line = False
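A variant of the same "treat the first line specially" idea, swapping the flag for `enumerate()` so line 0 (the header) always passes. The in-memory data and criteria are hypothetical, and every field is quoted so `split('","')` lines up with the question's indices:

```python
import io

# Hypothetical criteria and an in-memory stand-in for master.csv.
citys = ["Austin"]
states = ["TX"]
categorys = ["Bars"]
master = io.StringIO(
    '"name","addr","category","phone","x","city","state"\n'
    '"A","1 Main","Diners","555","x","Austin","TX"\n'
    '"B","2 Main","Bars","555","x","Austin","TX"\n'
)
matched = io.StringIO()

# enumerate() numbers the lines, so index 0 (the header) is always written.
for i, line in enumerate(master):
    fields = line.split('","')
    if i == 0 or (any(city in fields[5] for city in citys)
                  and any(state in fields[6] for state in states)
                  and not any(category in fields[2] for category in categorys)):
        matched.write(line)

result = matched.getvalue().splitlines()
print(result)
```

Row B is dropped because its category field is `Bars`; the header is kept without being tested against the criteria.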

