python: list element in CSV file

Question

I have a CSV file with this structure:

Id,Country,Cities
1,Canada,"['Toronto','Ottawa','Montreal']"
2,Italy,"['Rome','Milan','Naples', 'Palermo']"
3,France,"['Paris','Cannes','Lyon']"
4,Spain,"['Seville','Alicante','Barcelona']"

The last column contains a list, but it is stored as a quoted string, so it is read as a single element. When parsing the file, I need this element as a list, not a string. So far I've found this way to convert it:

import ast

L = "['Toronto','Ottawa','Montreal']"
seq = ast.literal_eval(L)  # ['Toronto', 'Ottawa', 'Montreal']
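Applied to the whole file, this conversion might look like the sketch below. It is only a sketch: the sample data is inlined with io.StringIO instead of being read from a real file, and csv.DictReader is used so the column can be addressed by name. The csv module already removes the surrounding double quotes, so what reaches ast.literal_eval is just the Python list literal.

```python
import ast
import csv
import io

# Sample data from the question, inlined so the sketch runs without a file.
DATA = """Id,Country,Cities
1,Canada,"['Toronto','Ottawa','Montreal']"
2,Italy,"['Rome','Milan','Naples', 'Palermo']"
3,France,"['Paris','Cannes','Lyon']"
4,Spain,"['Seville','Alicante','Barcelona']"
"""

rows = []
for record in csv.DictReader(io.StringIO(DATA)):
    # The csv module has already stripped the outer double quotes, so the
    # field is a Python list literal such as "['Toronto','Ottawa','Montreal']".
    record["Cities"] = ast.literal_eval(record["Cities"])
    rows.append(record)

print(rows[1]["Cities"])  # ['Rome', 'Milan', 'Naples', 'Palermo']
```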

Since I'm new to Python, my question is: is this the normal way of doing it, is there a proper way to represent lists in CSV so that I don't have to convert them at all, or is there a simpler way to convert?

Thanks!

I'm sure this link will help you: stackoverflow.com/questions/1894269/… – kederrac, Commented Jan 29, 2020 at 22:44

damon · Accepted Answer · 2020-02-11 19:50:04Z

2

Using ast.literal_eval(...) will work, but it relies on Python-specific list syntax that other CSV-reading software won't recognize, and it uses an eval function, which is a red flag.

Using eval can be dangerous, even though in this case you're using the safer literal_eval, which is more restricted than the raw eval function.

CSV files that store several values in a single column usually join them with a simple delimiter and quote the field.

For instance:

ID,Country,Cities
1,Canada,"Toronto;Ottawa;Montreal"

Then in Python, or any other language, it becomes trivial to read without resorting to eval:

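import csv

# newline="" is what the csv module documentation recommends for CSV files
with open("data.csv", newline="") as fobj:
    reader = csv.reader(fobj)
    field_names = next(reader)  # header row: ID, Country, Cities

    rows = []
    for row in reader:
        # The quotes are already removed; split the last column into a list
        row[-1] = row[-1].split(";")
        rows.append(row)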
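The writing side can be sketched the same way, assuming the cities start out as Python lists. Here io.StringIO stands in for a real file. Note that csv.writer only quotes a field when it has to (its default is csv.QUOTE_MINIMAL), so a ";"-joined field without commas is written unquoted; either form reads back the same.

```python
import csv
import io

data = [
    (1, "Canada", ["Toronto", "Ottawa", "Montreal"]),
    (2, "Italy", ["Rome", "Milan", "Naples", "Palermo"]),
]

# Write: join each list with ";" so it fits in one CSV column.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ID", "Country", "Cities"])
for id_, country, cities in data:
    writer.writerow([id_, country, ";".join(cities)])

# Read back: split the last column on ";" to restore the list.
buf.seek(0)
reader = csv.reader(buf)
field_names = next(reader)
rows = []
for row in reader:
    row[-1] = row[-1].split(";")
    rows.append(row)

print(rows[0])  # ['1', 'Canada', ['Toronto', 'Ottawa', 'Montreal']]
```

The only requirement is that ";" never appears inside a city name; commas inside a field are fine because the csv module quotes them.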

Issues with `ast.literal_eval`

Even though the ast.literal_eval function is much safer than using a regular eval on user input, it still might be exploitable. The documentation for literal_eval has this warning:

Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

Here is a demonstration (the exact behavior depends on the Python version and platform):

>>> import ast
>>> ast.literal_eval("()" * 10 ** 6)
[1]    48513 segmentation fault  python

I'm definitely not an expert, but giving a user the ability to crash a program, and potentially exploit some obscure memory vulnerability, is bad, and in this use case it can be avoided.

If you want to use literal_eval to get proper typing, and you're positive that the input data is 100% trusted, then I suppose it's fine to use. However, you could always wrap the function to perform some sanity checks:

import ast

def sanely_eval(value: str, max_size: int = 100_000) -> object:
    # Refuse oversized input before handing it to the parser
    if len(value) > max_size:
        raise ValueError(f"len(value) is greater than the max_size={max_size!r}")
    return ast.literal_eval(value)
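One further sanity check worth considering: literal_eval accepts any literal (numbers, dicts, tuples, ...), so it can help to verify that the result really is a list of strings. The sketch below does that; the name parse_city_list and the 10_000-character default are hypothetical choices, not part of any library. A length limit is only a heuristic and does not by itself guarantee that every string under the limit is safe to parse, so pick a limit that fits the real data.

```python
import ast

def parse_city_list(value: str, max_size: int = 10_000) -> list:
    """Hypothetical helper: parse a field like "['Rome','Milan']" into a list of str."""
    # Assumed limit: real city lists are short, so reject anything longer.
    if len(value) > max_size:
        raise ValueError(f"len(value) is greater than the max_size={max_size!r}")
    result = ast.literal_eval(value)
    # literal_eval accepts any literal, so check the shape of the result too.
    if not isinstance(result, list) or not all(isinstance(x, str) for x in result):
        raise ValueError("expected a list of strings")
    return result

print(parse_city_list("['Toronto','Ottawa','Montreal']"))  # ['Toronto', 'Ottawa', 'Montreal']
```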

Either way, depending on how you create and use the CSV files, this format may make the data less portable, since the list syntax is Python-specific.

edited Feb 11, 2020 at 19:50

answered Jan 29, 2020 at 21:31

damon



4 Comments

Mark Over a year ago

Thanks for the feedback. Did you perhaps mean rows.append(row[-1].split(";"))? Otherwise it complains: AttributeError: 'list' object has no attribute 'split'

damon Over a year ago

@Mark Thank you for spotting that! I've updated my example. The code is slightly different from yours so that it preserves the whole row, and not just the cities column.

Mark Over a year ago

Could you explain why an eval statement is a big red flag? Is it considered dangerous?

damon Over a year ago

@Mark I've added detail to my answer. I'm assuming you meant literal_eval, rather than eval. If you want info about why calling eval("user input...") is dangerous, look here: nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

kederrac · Accepted Answer · 2020-01-29 22:42:20Z

2

If you can control the CSV, you could separate the items with some other known character that will never appear in a city name and isn't a comma; for example, a colon (:).

Then row one, for example, would look like this:

1,Canada,Toronto:Ottawa:Montreal

When it comes to processing the data, you'll get that whole element as one field, and you can just do:

cities.split(':')

If you want to go the other way (you have the cities in a Python list and want to create this string), you can use join():

':'.join(['Toronto', 'Ottawa', 'Montreal'])
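A small round-trip sketch of this idea follows; SEP, cities_to_field and field_to_cities are hypothetical names. One detail worth handling: "".split(":") returns [""] rather than [], so a country with no cities needs a special case. The writer side also refuses city names that contain the separator, since those could not be split back correctly.

```python
SEP = ":"  # assumed separator; must never appear inside a city name

def cities_to_field(cities: list[str]) -> str:
    """Hypothetical helper: join a list of cities into one CSV field."""
    if any(SEP in city for city in cities):
        raise ValueError(f"city name contains the separator {SEP!r}")
    return SEP.join(cities)

def field_to_cities(field: str) -> list[str]:
    """Hypothetical helper: split a CSV field back into a list of cities."""
    # "".split(":") would give [""], so map an empty field to an empty list.
    return field.split(SEP) if field else []

print(field_to_cities(cities_to_field(["Toronto", "Ottawa", "Montreal"])))
# ['Toronto', 'Ottawa', 'Montreal']
print(field_to_cities(cities_to_field([])))  # []
```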

edited Jan 29, 2020 at 22:42

kederrac


answered Jan 29, 2020 at 21:28

blueteeth


1 Comment

Slam Over a year ago

Separating with special characters is reinventing the wheel. The CSV format already uses double quotes to allow delimiters inside a field. For the TS, there's no difference between calling .split and deserializing JSON.
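Following this comment, a minimal sketch that stores each list as JSON text in one CSV field and lets the csv module handle the quoting; io.StringIO stands in for a real file. Unlike the Python list literal in the question, JSON can be read by practically any language.

```python
import csv
import io
import json

# Write: store the list as JSON text in a single field.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Id", "Country", "Cities"])
# json.dumps gives '["Toronto", "Ottawa", "Montreal"]'; csv.writer quotes the
# field (it contains commas) and doubles the embedded double quotes.
writer.writerow([1, "Canada", json.dumps(["Toronto", "Ottawa", "Montreal"])])

print(buf.getvalue().splitlines()[1])
# 1,Canada,"[""Toronto"", ""Ottawa"", ""Montreal""]"

# Read: csv.reader undoes the quoting, json.loads restores the list.
buf.seek(0)
reader = csv.reader(buf)
header = next(reader)
rows = []
for row in reader:
    row[-1] = json.loads(row[-1])
    rows.append(row)

print(rows)  # [['1', 'Canada', ['Toronto', 'Ottawa', 'Montreal']]]
```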

Giannis Clipper · Accepted Answer · 2020-01-29 23:17:56Z

0

For this specific CSV structure, you could convert the cities to a list like this:

cities = '''"['Rome','Milan','Naples', 'Palermo']"'''

cities = cities[2:-2]  # remove "[ and ]"
print(cities)  # 'Rome','Milan','Naples', 'Palermo'

cities = cities.split(',')  # convert to list
print(cities)  # ["'Rome'", "'Milan'", "'Naples'", " 'Palermo'"]

cities = [x.strip() for x in cities]  # remove leading or trailing spaces (if any)
print(cities)  # ["'Rome'", "'Milan'", "'Naples'", "'Palermo'"]

cities = [x[1:-1] for x in cities]  # remove the quotes '' around each city
print(cities)  # ['Rome', 'Milan', 'Naples', 'Palermo']
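If the row is read with csv.reader instead of as a raw line, the surrounding double quotes are already gone, so only the brackets need stripping. Below is a sketch of the same steps packaged as a function; the name split_list_field is hypothetical, and like the code above it assumes no city name contains a comma or a quote character.

```python
import csv
import io

def split_list_field(field: str) -> list[str]:
    """Hypothetical helper: turn "['Rome','Milan']" into ['Rome', 'Milan']."""
    inner = field.strip()[1:-1]  # drop the [ and ]
    # Split on commas, trim spaces, then drop the surrounding single quotes.
    return [item.strip()[1:-1] for item in inner.split(",") if item.strip()]

line = """2,Italy,"['Rome','Milan','Naples', 'Palermo']"\n"""
row = next(csv.reader(io.StringIO(line)))
print(split_list_field(row[-1]))  # ['Rome', 'Milan', 'Naples', 'Palermo']
```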

answered Jan 29, 2020 at 23:17

Giannis Clipper

