1

I am using Python's csv module to iterate over the values of a column in a CSV file.

What I need to do is:

  1. Get the first row for column "title"
  2. Remove any Spanish characters (accents, Ñ)
  3. Remove single quotes
  4. Finally, replace spaces with dashes and convert everything to lowercase.

I got this to work with a simple test file, not a CSV. I also managed to print each title on its own separate line.

But now I'm using this code to go over the CSV file (sorry for the VERY ugly code, I'm a newbie programmer):

import csv
import unicodedata
import ast

def strip_accents(s):
  return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

dic_read = csv.DictReader(open("output.csv", encoding = "utf8"))

for line in dic_read:

    #print(line)     #I get each line of the csv file as a dictionary.
    #print(line["title"])  # I get only the "title" column on each line

    line = line.replace(' ', '-').lower()
    line = line.replace("´", "")
    line = strip_accents(line)
    fp=open("cleantitles.txt", "a")
    fp.write(line)
    fp.close()

I get the following error:

Traceback (most recent call last):
  File "C:/csvreader3.py", line 15, in <module>
    line = strip_accents(line)
  File "C:/csvreader3.py", line 7, in strip_accents
    return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))
TypeError: must be str, not dict

I also get a similar error when I try to do a .replace only. I understand now that these methods only apply to strings.

How can I get this to work? I searched around for a way to convert a dict to a string object but that didn't work.

Also, any criticism that helps me optimize my code and make it more readable is welcome!

2
  • What exactly are you trying to do here? Read in a csv file, modify all 'title' entries to the format you want, then output that same line with the modified title column into a new file? Commented Jul 30, 2011 at 19:35
  • Exactly. I actually got my script to read all the title entries, but since I couldn't find a way to treat these as strings, I had to output all the title entries to a new text file AND THEN process this text file into another new, final text file. Not very efficient, but I got the result in the end. Commented Jul 30, 2011 at 20:01

3 Answers

1

With the new information at hand, I think you might find this method to be simpler.

Use the built-in function 'map'. I'll leave the explanation of what 'map' does to the Python documentation.
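
As a rough illustration of what 'map' does (a minimal sketch; the `shout` helper and the sample rows are made up for this example and are not part of the asker's script): it calls the given function on each item and, in Python 3, produces the results lazily instead of building a list up front.

```python
def shout(row):
    # hypothetical helper: return a copy of the row with an upper-cased title
    return {**row, "title": row["title"].upper()}

rows = [{"title": "uno"}, {"title": "dos"}]

lazy = map(shout, rows)   # nothing has been processed yet
result = list(lazy)       # iterating the map object runs shout() on each row
print(result)
```

Because the map object is consumed as it is iterated, it has to be read while the underlying file is still open when the items come from a csv.DictReader.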

Here is what I think you should do:

Create a function that takes a line/dict and processes it into the format you want, then map it over the rows and write the result to a new file:

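import csv
import unicodedata

def strip_unwanted(line):
    # line is a dict from DictReader; only the 'title' value is a string we can clean
    title = str(line['title']).replace(' ', '-').replace("´", "").lower()
    # decompose accented characters, then drop the combining marks (category 'Mn')
    title = ''.join(c for c in unicodedata.normalize('NFD', title) if unicodedata.category(c) != 'Mn')
    line['title'] = title
    return line

with open("output.csv", encoding="utf8", newline="") as infile:
    dic_entries = csv.DictReader(infile)
    # use the 'map' function
    new_entries = map(strip_unwanted, dic_entries)

    # in Python 3 the csv module needs a text-mode file, not 'wb'
    with open('some.csv', 'w', encoding="utf8", newline="") as outfile:
        # DictWriter requires the column names; reuse the ones read from the input
        writer = csv.DictWriter(outfile, fieldnames=dic_entries.fieldnames)
        writer.writeheader()
        writer.writerows(new_entries)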

1

line is a dict. Probably you want to call replace on line['title'].
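
A minimal sketch of that idea, assuming the header row contains a "title" column. The sample data and the `clean_title` helper are made up for illustration, and `io.StringIO` stands in for the real output.csv. Keeping the cleaned value in its own variable rather than rebinding `line` means `line` stays a dict for the rest of the loop body.

```python
import csv
import io
import unicodedata

def strip_accents(s):
    # decompose accented characters, then drop the combining marks
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def clean_title(title):
    # hypothetical helper combining the steps listed in the question
    title = title.replace("'", "").replace("´", "")
    title = title.replace(' ', '-').lower()
    return strip_accents(title)

# in-memory stand-in for output.csv (invented sample rows)
sample = io.StringIO("id,title\n1,El Niño\n2,Canción d'Amor\n")

titles = []
for line in csv.DictReader(sample):
    # line is a dict; line['title'] is the str that the string methods need
    title = clean_title(line['title'])
    titles.append(title)

print(titles)
```

With a real file, each cleaned title could be written out inside the loop, ideally to a file opened once before the loop rather than reopened for every row.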

5 Comments

I tried line = line['title'].replace("´", "") and got this error TypeError: string indices must be integers
What is the first line of your csv file?
DictReader forms the keys for the dict line using the first row of the CSV file, unless keys are otherwise specified. See doughellmann.com/PyMOTW/csv
Yes, which would explain why "line = line['title'].replace("´", "")" raises the exception "TypeError: string indices must be integers" if the first line did not have a delimited list of column headers, or if the one that is there is missing the heading "title". Otherwise I would say your answer is correct.
@james-hurford yes, the first line of the CSV file has the headers for each column. I actually managed to get it to print the key values for the column I want. The problem is I cannot modify the values as strings.
0

When you have problems with a function, try making it print something instead of returning it. That way, you can verify that it works and isolate the problem. You have too many statements on one line, which makes it difficult to know where the problem is. Do you realize what a dict is? There is no straightforward way to convert a dict to a string; you need to decide which data you want to keep.

Also, did you mean to make a list comprehension? You should use square brackets then.

3 Comments

Yes, I did manage to print out each dict using print(line) . See the code that I commented out. I'm not sure about the list comprehension part you mention. I'm only trying to get the value for title in the dict.
The methods used in list comprehension don't have to be done inside square braces, what RafaelM has done in this case is perfectly correct. Try doing (x for x in range(10) if x % 2 == 0) and see what you get.
I just understood what @Ysangkok meant. The actual problem is not in the strip_accents() function, the problem is that the function or the .replace won't work with dicts.
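
The point made in the comments above about generator expressions can be checked with a small sketch (the sample values are invented for illustration): a generator expression in parentheses and a list comprehension in square brackets produce the same items, and str.join accepts either.

```python
gen = (x for x in range(10) if x % 2 == 0)    # generator expression
lst = [x for x in range(10) if x % 2 == 0]    # list comprehension

print(type(gen).__name__)   # generator
print(list(gen) == lst)     # True: same items, produced lazily

# str.join works with either form
joined_gen = ''.join(c for c in "a-b-c" if c != '-')
joined_lst = ''.join([c for c in "a-b-c" if c != '-'])
print(joined_gen == joined_lst)
```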
