I have a similar problem to the one in this question: find position of a substring in a string

The difference is that I don't know what my "mystr" is. I know my substring, but the string in the input file could be any number of words in any order. I only know that one of those words contains the substring cola.

For example, a CSV file with the columns fanta,coca_cola,sprite in any order.

If my substring is "cola", how can I write code that does

mystr.find('cola')

or

match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)

or

if "cola" in mystr

when I don't know what my "mystr" is?

This is my code:

import csv

with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
        reader = csv.DictReader(fp_in)
        rows = [row for row in reader]
        writer = csv.writer(fp_out, delimiter = ',')

        writer.writerow(["new_cola"])

        def headers1(name):
            if "cola" in name:
                    return row.get("cola")


        for row in rows:
                writer.writerow([headers1("cola")])

And this is first.csv:

fanta,cocacola,banana
0,1,0
1,2,1

So it prints out

new_cola
""
""

when it should print out

new_cola
1
2
  • What do these numbers in first.csv mean? Are they the desired results? Commented Apr 15, 2014 at 11:01
  • You should explain how you get mystr and why you expect 1,2 under "new_cola". Commented Apr 15, 2014 at 11:04
  • When you call headers1("cola"), of course "cola" in name is true, because name == "cola"! I think you need to rethink your approach. Try looking at what is actually in rows. mystr is just a filler variable: it is whatever string you are trying to process, in this case name. Commented Apr 15, 2014 at 11:05

1 Answer

Here is a working example (Python 2):

import csv

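with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter=",")

    # Header of the output file
    writer.writerow(["new_cola"])

    def filter_cola(row):
        # Yield the value of every column whose header contains "cola"
        for k, v in row.iteritems():
            if "cola" in k:
                yield v

    # Iterate over the reader directly instead of building a list first
    for row in reader:
        writer.writerow(list(filter_cola(row)))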

Notes:

  • rows = [row for row in reader] is unnecessary and inefficient: it reads every row from the reader into a list, which consumes a lot of memory for large files
  • instead of return row.get("cola") you meant return row.get(name)
  • in the statement return row.get("cola") you access a variable (row) that is defined outside the function's scope
  • you can also use the Unix tool cut. For example:

    cut -d "," -f 2 < first.csv > second.csv
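
For readers on Python 3, the same idea can be sketched as follows. This is an adaptation, not the code from the answer above: the function name extract_matching_columns and the in-memory io.StringIO buffers standing in for first.csv and second.csv are assumptions made for illustration. With real files, the csv module documentation recommends opening them in text mode with newline="".

```python
import csv
import io


def extract_matching_columns(fp_in, fp_out, substring, out_header):
    """Write the values of every column whose header contains `substring`.

    Hypothetical helper for illustration; it is not part of the csv module.
    """
    reader = csv.DictReader(fp_in)
    # lineterminator="\n" keeps the in-memory output easy to compare
    writer = csv.writer(fp_out, lineterminator="\n")
    writer.writerow([out_header])
    for row in reader:
        # dict.items() replaces Python 2's dict.iteritems()
        writer.writerow([v for k, v in row.items() if substring in k])


# In-memory stand-ins for first.csv and second.csv
src = io.StringIO("fanta,cocacola,banana\n0,1,0\n1,2,1\n")
dst = io.StringIO()
extract_matching_columns(src, dst, "cola", "new_cola")
result = dst.getvalue()
print(result)
```

The substring test runs against the header names (the dictionary keys), never against the cell values, which is the part the original headers1 function was missing.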
    

2 Comments

Thanks, this was helpful, but what if I have two filters? With for row in reader: writer.writerow(list(filter_cola(row)), list(filter_fanta(row))) I get an error (writerow takes only 1 argument). What am I not understanding here?
writer.writerow(list(filter_cola(row)) + list(filter_fanta(row))) – you have to concatenate the two returned lists with +
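
The concatenation suggested in the comment above can be sketched in Python 3 as follows. The helper values_matching is a hypothetical generalisation of filter_cola and filter_fanta, and the io.StringIO buffers stand in for the real files; both are assumptions made for illustration.

```python
import csv
import io


def values_matching(row, substring):
    """Return the values of all columns whose header contains `substring`.

    Hypothetical helper that generalises filter_cola / filter_fanta.
    """
    return [v for k, v in row.items() if substring in k]


# In-memory stand-ins for first.csv and second.csv
src = io.StringIO("fanta,cocacola,banana\n0,1,0\n1,2,1\n")
dst = io.StringIO()
writer = csv.writer(dst, lineterminator="\n")
writer.writerow(["new_cola", "new_fanta"])

for row in csv.DictReader(src):
    # writerow() accepts exactly one sequence, so the two lists are
    # joined with + into a single row before being written
    writer.writerow(values_matching(row, "cola") + values_matching(row, "fanta"))

combined = dst.getvalue()
print(combined)
```

Passing the two lists as separate arguments fails because writerow() takes one row at a time; joining them first produces a single row with one cell per matching column.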
