I have a CSV file of interview transcripts exported from an h5 file. When I read the rows into Python, the output looks something like this:

    line[0]=['title,date,responses']
    line[1]=['[\'Transcript 1 title\'],"[\' July 7, 1997\']","[ '\nms. vogel: i look at all sectors of insurance, although to date i\nhaven\'t really focused on the reinsurers and the brokers.\n']'] 
    line[2]=['[\'Transcript 2 title\'],"[\' July 8, 1997\']","[ '\nmr. tozzi: i formed cambridge in 1981. we are top-down sector managers,\nconstantly searching for non-consensus companies and industries.\n']']
    etc...

I'd like to extract the text from the "responses" column ONLY into separate .txt files for every row in the CSV file, saving the .txt files into a specified directory and naming them as "t1.txt", "t2.txt", etc. according to the row number. The CSV file has roughly 30K rows.

Drawing from what I've already been able to find online, this is the code I have so far:

    import csv
    with open("twst.csv", "r") as f:
        reader = csv.reader(f)
        rownumber = 0
        for row in reader:
             g=open("t"+str(rownumber)+".txt","w")
             g.write(row)
             rownumber = rownumber + 1
             g.close()

My biggest problem is that this pulls all columns from the row into the .txt file, but I only want the text from the "responses" column. Once I have that, I know I can loop through the various rows in the file (right now, what I have set up is just to test the first row), but I haven't found any guidance on pulling specific columns in the Python documentation. I'm also not familiar enough with Python to figure out the code on my own.

Thanks in advance for the help!

  • Sorry this is not "rent a coder". You should try something and then post your specific problem. Commented Aug 19, 2015 at 5:29
  • Apologies, this is my first attempt to get help on here and I was just trying to be as concise as possible. I actually did try a few things, but because I'm just learning Python, I assumed any code I included would be irrelevant. I've edited the post to show what I've already tried and will make sure to do so in any future posts as well. Commented Aug 20, 2015 at 1:22

1 Answer

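There may be something that can be done with the built-in csv module. However, if the format of the CSV does not change, the following code should work using just for loops and built-in read/write. Note that it splits naively on commas and line breaks, so it only picks out the right column when no field contains a comma or a line break itself.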

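    # Read the whole file and split it into lines
    with open('test.csv', 'r') as f:
        data = f.read().splitlines()

    # Skip the header (index 0); x doubles as the row number in the file name
    for x in range(1, len(data)):
        columns = data[x].split(',')
        with open('t' + str(x) + '.txt', 'w') as output:
            output.write(columns[2])  # third comma-separated piece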
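If the fields are quoted and contain commas or line breaks of their own (the dates in the sample, such as "July 7, 1997", do contain a comma), the built-in csv module is probably the safer route. Below is a minimal sketch, assuming the file really has a header row with a column named responses and that the fields are quoted in the usual CSV way. The function name export_responses, its parameters, and the UTF-8 encoding are my own choices for illustration, not anything from the question.

```python
import csv
from pathlib import Path


def export_responses(csv_path, out_dir, column="responses"):
    """Write the `column` field of each data row to out_dir/t<N>.txt.

    Assumes the first row of the CSV is a header containing `column`.
    Returns the number of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    count = 0
    # newline="" lets the csv module deal with line breaks inside quoted fields
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the first row as column names
        # start=1 so the first data row becomes t1.txt
        for count, row in enumerate(reader, start=1):
            target = out_dir / f"t{count}.txt"
            target.write_text(row[column], encoding="utf-8")
    return count
```

Calling export_responses("twst.csv", "transcripts") would then write t1.txt, t2.txt, and so on into a directory named transcripts (a placeholder name). If a single transcript is longer than the csv module's default field size limit, csv.field_size_limit can be used to raise it.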