1

I have a CSV file with the columns "name, place, thing". The thing column often contains "word\nanotherword\nanotherword\n". I'm trying to figure out how to split this into individual rows instead of a multiline entry in a single column, i.e.

name, place, word

name, place, anotherword

name, place, anotherword

I'm certain this is simple, but I'm having a hard time grasping what I need to do.

  • Have you written any code for this yourself? Commented Jan 7, 2014 at 21:41
  • Check out the csv module: docs.python.org/2/library/csv.html Commented Jan 7, 2014 at 21:43

4 Answers

3

Without going into the code, essentially what you want to do is check to see if there are any newline characters in your 'thing'. If there are, you need to split them on the newline characters. This will give you a list of tokens (the lines in the 'thing') and since this is essentially an inner loop, you can use the original name and place along with your new thing_token. A generator function lends itself well to this.

This brings me to kroolik's answer. However, there's a slight error in kroolik's answer:

If you want to go with the column_wrapper generator, note that the separators in this file appear to be the literal two characters backslash and n rather than real newline characters. The csv reader passes them through unchanged, so in Python source they have to be written as '\\n' instead of '\n'. Also, you need to skip blank 'things'.

def column_wrapper(reader):
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\\n'):
            if split_thing:
                yield name, place, split_thing

Then you can obtain the data like this:

import csv

with open('filewithdata.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    data = [[name, place, thing] for name, place, thing in column_wrapper(reader)]

OR (without column_wrapper):

import csv

data = []
with open('filewithdata.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        name, place, thing = row
        # Split on the literal backslash-n sequence and skip empty pieces.
        # Rows without the sequence produce a single item and are kept.
        for item in thing.strip().split('\\n'):
            if item:
                data.append([name, place, item])

I recommend using column_wrapper as generators are more generic and pythonic.

Be sure to add import csv to the top of your file (although I'm sure you knew that already). Hope that helps!
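Since it isn't always clear whether a cell contains the literal characters backslash and n or real newline characters, here is a minimal sketch that splits on either. The name explode_rows and the sample data are made up for illustration, and it assumes Python 3 and exactly three columns per row (any other column count raises a ValueError when unpacking).

```python
import csv
import io
import re

# Matches either the literal two characters backslash + n, or a real newline.
_SPLIT = re.compile(r'\\n|\r?\n')

def explode_rows(reader):
    """Yield one (name, place, thing) tuple per non-empty piece of 'thing'."""
    for name, place, thing in reader:
        for piece in _SPLIT.split(thing):
            piece = piece.strip()
            if piece:  # skip blanks left by a trailing separator
                yield name, place, piece

# Made-up sample: the first row uses literal backslash-n sequences,
# the second has a quoted cell containing real newlines.
sample = 'alice,paris,word\\nanotherword\\n\nbob,rome,"x\ny\n"\n'
rows = list(explode_rows(csv.reader(io.StringIO(sample, newline=''))))

for row in rows:
    print(row)
```

With a real file, you would pass csv.reader(csvfile) to explode_rows instead of the in-memory sample.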


3

Wrap your csv reader with this column_wrapper:

def column_wrapper(reader):
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\n'):
            yield name, place, split_thing

And you will be golden.

0

You could always read the file line by line:

#!/usr/bin/env python
with open("demo.csv", "r") as f:
    for line in f:
        # Split on commas; this assumes no quoted fields or embedded commas.
        words = line.strip().split(",")
        if len(words) < 3:
            continue  # skip blank or incomplete lines
        print(words[0])
        print(words[1])
        print(words[2])

Assuming the file content is

name1,place1,word1
name2,place2,anotherword2
name3,place3,anotherword3

0

In case someone runs into the same issue I had: if you have multiline strings in one of your cells, use the quotechar parameter as described in this answer:

how to read a csv file that has multiple lines within the same cell?
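As a rough illustration of that point (not taken from the linked answer), the sketch below shows how csv.reader keeps a quoted cell with real newlines together as one field, whereas reading physical lines would break it apart. The sample data is made up, and quotechar='"' is already the default, so it is only spelled out here for clarity.

```python
import csv
import io

# Made-up sample: the third cell is quoted and spans several physical lines.
raw = 'name,place,"word\nanotherword\nanotherword"\n'

# Reading physical lines sees three fragments instead of one record.
physical_lines = raw.splitlines()

# csv.reader treats everything between the quote characters as one cell.
# With a real file, open it with newline='' as the csv docs recommend.
records = list(csv.reader(io.StringIO(raw, newline=''), quotechar='"'))

# Produce one output row per non-empty line inside the cell.
flat = [[name, place, part]
        for name, place, thing in records
        for part in thing.splitlines() if part]

for row in flat:
    print(row)
```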
