Reading columns of a csv file with python

Question

I am using python's CSV module to iterate over the rows of a column.

What I need to do is:

Get the value of the "title" column in each row
Remove any Spanish characters (accents, Ñ)
Remove single quotes
Finally, replace spaces with dashes and convert everything to lowercase.
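Here is a minimal sketch of these four steps applied to a single title string. The function name clean_title is hypothetical. It assumes that "Spanish characters" means accented letters and Ñ, which are reduced to their base letters through Unicode NFD decomposition. It also treats both the ASCII apostrophe and the acute accent ´ as "single quotes" to remove.

```python
import unicodedata


def clean_title(title: str) -> str:
    """Hypothetical helper: normalize one title string."""
    # NFD splits "ñ" into "n" + a combining tilde (category "Mn"); dropping "Mn" chars removes accents
    decomposed = unicodedata.normalize("NFD", title)
    no_accents = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Remove single quotes (ASCII apostrophe and the acute accent character)
    no_quotes = no_accents.replace("'", "").replace("´", "")
    # Spaces become dashes, then lowercase everything
    return no_quotes.replace(" ", "-").lower()


print(clean_title("El Niño's Canción"))  # el-ninos-cancion
```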

I got this to work with a simple text file, not a CSV. I also managed to print each title on its own line.

But now I'm using this code to go over the CSV file (sorry for the VERY ugly code, I'm a newbie programmer):

import csv
import unicodedata
import ast

def strip_accents(s):
  return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

dic_read = csv.DictReader(open("output.csv", encoding = "utf8"))

for line in dic_read:

    #print(line)     #I get each line of the csv file as a dictionary.
    #print(line["title"])  # I get only the "title" column on each line

    line = line.replace(' ', '-').lower()
    line = line.replace("´", "")
    line = strip_accents(line)
    fp=open("cleantitles.txt", "a")
    fp.write(line)
    fp.close()

I get the following error:

Traceback (most recent call last):
  File "C:/csvreader3.py", line 15, in <module>
    line = strip_accents(line)
  File "C:/csvreader3.py", line 7, in strip_accents
    return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))
TypeError: must be str, not dict

I also get a similar error when I try to do a .replace only. I understand now that these methods only apply to strings.
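To illustrate the point about strings, here is a small sketch showing what csv.DictReader actually yields. The CSV content is an in-memory stand-in for output.csv, and the "year" column is made up. Each row is a dict keyed by the header line, while the value of a single column is a str.

```python
import csv
import io

# In-memory stand-in for output.csv; the "year" column is made up for illustration
data = io.StringIO("title,year\nEl Niño,1998\nCanción de Cuna,2001\n")
rows = list(csv.DictReader(data))

first = rows[0]
print(isinstance(first, dict))          # True: a row is a dict (an OrderedDict on 3.6/3.7)
print(isinstance(first["title"], str))  # True: one column's value is a str

# String methods belong on the value, not on the row
print(first["title"].replace(" ", "-").lower())  # el-niño

try:
    first.replace(" ", "-")  # the same mistake as in the question
except AttributeError as exc:
    print("replace on the row fails:", exc)
```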

How can I get this to work? I searched around for a way to convert a dict to a string object but that didn't work.

Also, any suggestions for optimizing my code and making it more readable are welcome!

What exactly are you trying to do here? Read in a csv file, modify all 'title' entries to the format you want, then output that same line with the modified title column into a new file?
– James Hurford, Commented Jul 30, 2011 at 19:35
Exactly. I actually got my script to read all the title entries, but since I couldn't find a way to treat them as strings, I had to output all the title entries to a new text file AND THEN process that text file into another, final text file. Not very efficient, but I got the result in the end.
– RafaelM, Commented Jul 30, 2011 at 20:01

James Hurford · Accepted Answer · 2011-07-30 20:42:13Z

With the new information at hand, I think you might find this method to be simpler.

Use the built-in function map. I'll leave the explanation of what map does to the Python documentation.
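As a quick sketch of what map does: map(function, iterable) calls the function on each item. In Python 3 it returns a lazy iterator, so nothing runs until something consumes it. The rows and the dash_title helper below are hypothetical, shaped like the dicts csv.DictReader yields.

```python
# Hypothetical row dicts, shaped like what csv.DictReader yields
rows = [{"title": "El Niño", "year": "1998"}, {"title": "Hola Mundo", "year": "2001"}]


def dash_title(row):
    # Returns a new dict with only the title changed; the input row is left untouched
    return {**row, "title": row["title"].replace(" ", "-").lower()}


result = map(dash_title, rows)  # Python 3: a lazy iterator, dash_title has not run yet
cleaned = list(result)          # consuming it calls dash_title once per row
print(cleaned)

# An equivalent list comprehension
same = [dash_title(r) for r in rows]
```

Because map is lazy, the rows are only read from the input file when the writer consumes the iterator. This is why the writing in the code below happens inside the `with` block that keeps the input file open.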

Here is what I think you should do:

Create a function that takes a line (a row dict) and processes it into the format you want.

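import csv
import unicodedata

def strip_unwanted(line):
    # line is a dict; only its 'title' value is a string
    title = line['title'].replace(' ', '-').replace("´", "").lower()
    # NFD splits accented letters into base letter + combining mark ('Mn'); drop the marks
    title = ''.join(c for c in unicodedata.normalize('NFD', title) if unicodedata.category(c) != 'Mn')
    line['title'] = title
    return line

with open("output.csv", encoding="utf8", newline="") as infile:
    dic_entries = csv.DictReader(infile)
    # map() is lazy in Python 3, so rows are processed as writerows() consumes them
    new_entries = map(strip_unwanted, dic_entries)

    with open('some.csv', 'w', encoding="utf8", newline="") as outfile:
        # DictWriter needs the column names; reuse the header that DictReader read
        writer = csv.DictWriter(outfile, fieldnames=dic_entries.fieldnames)
        writer.writeheader()
        writer.writerows(new_entries)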

amit kumar · Accepted Answer · 2011-07-30 17:56:04Z

line is a dict, so you probably want to call replace on line['title'].
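Here is a sketch of the question's loop with that change applied: replace and strip_accents operate on line["title"] instead of on the whole row. It reuses the question's strip_accents helper. It writes one cleaned title per line and opens the output once instead of on every row. The wrapper function write_clean_titles is hypothetical, and the demo uses in-memory files; it assumes the CSV header contains a "title" column.

```python
import csv
import io
import unicodedata


def strip_accents(s):
    # Same helper as in the question: decompose, then drop combining marks
    return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')


def write_clean_titles(src, dst):
    """Hypothetical wrapper: read CSV rows from src, write one cleaned title per line to dst."""
    for line in csv.DictReader(src):
        title = line["title"]              # a str, so string methods work on it
        title = title.replace(' ', '-').lower()
        title = title.replace("´", "")
        title = strip_accents(title)
        dst.write(title + "\n")            # newline so each title is on its own line


# Demo with in-memory files. With real files you might instead use:
#   with open("output.csv", encoding="utf8", newline="") as src, \
#        open("cleantitles.txt", "w", encoding="utf8") as dst:
#       write_clean_titles(src, dst)
out = io.StringIO()
write_clean_titles(io.StringIO("title\nEl Niño\nCanción Triste\n"), out)
print(out.getvalue())
```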

5 Comments

RafaelM Over a year ago

I tried line = line['title'].replace("´", "") and got this error: TypeError: string indices must be integers

James Hurford Over a year ago

What is the first line of your csv file?

amit kumar Over a year ago

DictReader forms the keys for the dict line from the first row of the CSV file, unless the keys are passed explicitly with the fieldnames argument. See doughellmann.com/PyMOTW/csv
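A small sketch of the two behaviours described in the comment above, using an in-memory CSV whose content is made up. By default the header row becomes the keys. When fieldnames is given, the header row is returned as ordinary data.

```python
import csv
import io

text = "title,year\nEl Niño,1998\n"

# Default: the first row supplies the keys and is not returned as data
reader = csv.DictReader(io.StringIO(text))
print(reader.fieldnames)      # ['title', 'year']
default_rows = list(reader)
print(len(default_rows))      # 1

# Explicit fieldnames: the first row is treated as an ordinary data row
named_rows = list(csv.DictReader(io.StringIO(text), fieldnames=["name", "when"]))
print(len(named_rows))        # 2
print(named_rows[0]["name"])  # title

# With no "title" column among the keys, row["title"] raises KeyError
try:
    named_rows[0]["title"]
except KeyError:
    print("no 'title' key in these rows")
```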

James Hurford Over a year ago

Yes, which would explain why "line = line['title'].replace("´", "")" would come up with the exception "TypeError: string indices must be integers" if the first line did not have a delimited list of column headers, or if the header row is missing the heading "title". Otherwise, I would say your answer is correct.

RafaelM Over a year ago

@james-hurford yes, the first line of the CSV file has the headers for each column. I actually managed to get it to print the key values for the column I want. The problem is I cannot modify the values as strings.

2 revs · Accepted Answer · 2011-07-30 17:50:50Z

When you have problems with a function, try making it output something instead of trying to return it. That way, you can verify that it works and isolate the problem. You have too many statements on one line, which makes it difficult to know where the problem is. Do you realize what a dict is? Of course there is no straightforward way to convert a dict to a string. You need to find out what data you want to keep.

Also, did you mean to make a list comprehension? You should use square brackets then.

edited Jul 30, 2011 at 17:50 · community wiki · Janus Troelsen

3 Comments

RafaelM Over a year ago

Yes, I did manage to print out each dict using print(line). See the code that I commented out. I'm not sure about the list comprehension part you mention. I'm only trying to get the value for title in the dict.

James Hurford Over a year ago

The syntax used in a list comprehension doesn't have to go inside square brackets; what RafaelM has done here (a generator expression) is perfectly correct. Try (x for x in range(10) if x % 2 == 0) and see what you get.
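Here is a short sketch of the difference discussed in this thread. A generator expression produces items lazily, while a list comprehension builds a list. str.join accepts either because it takes any iterable of strings. The sample word is arbitrary.

```python
import unicodedata

s = unicodedata.normalize("NFD", "Niño")

gen = (c for c in s if unicodedata.category(c) != "Mn")  # generator expression: lazy
lst = [c for c in s if unicodedata.category(c) != "Mn"]  # list comprehension: a real list

print(type(gen).__name__, type(lst).__name__)  # generator list
print("".join(gen) == "".join(lst))            # True: join accepts any iterable

# Inside a call, the generator's own parentheses can be dropped
print("".join(c for c in s if unicodedata.category(c) != "Mn"))  # Nino

# The example from the comment, materialized as a list
evens = list(x for x in range(10) if x % 2 == 0)
print(evens)  # [0, 2, 4, 6, 8]
```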

RafaelM Over a year ago

I just understood what @Ysangkok meant. The actual problem is not in the strip_accents() function; the problem is that neither that function nor .replace works with dicts.
