274

I have a CSV file with about 2000 records.

Each record has a string and a category:

This is the first line,Line1
This is the second line,Line2
This is the third line,Line3

I need to read this file into a list that looks like this:

data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

How can I import this CSV into the list I need using Python?

2
  • 2
    Then use csv module: docs.python.org/2/library/csv.html Commented Jul 9, 2014 at 19:53
  • 6
    If there is an answer that suits your question, please accept it. Commented Mar 24, 2015 at 21:37

13 Answers

462

Using the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

print(data)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

If you need tuples:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = [tuple(row) for row in reader]

print(data)

Output:

[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]

Old Python 2 answer, also using the csv module:

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

18 Comments

Why do you use 'rb' instead of 'r'?
@DrunkenMaster, b causes the file to be opened in binary mode as opposed to text mode. On some systems text mode means that \n will be converted to a platform-specific newline when reading or writing. See docs.
This does not work in Python 3.x: "csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)" See below for the answer that works in Python 3.x
to save a few seconds of time debugging, you should probably add a note for the first solution, like "Python 2.x version"
How to use your 1st solution but with only some columns from the csv file?
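
A minimal sketch answering that comment, assuming the file.csv name from the answer above: index into each row the reader yields to keep only the columns you want.

import csv

# Hypothetical sketch: keep only the first column (index 0) of each row
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    first_column = [row[0] for row in reader]

print(first_column)
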
72

Updated for Python 3:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

1 Comment

'r' is the default mode, so specifying it is unnecessary. The docs also mention: If csvfile is a file object, it should be opened with newline=''.
59

Pandas is pretty good at dealing with data. Here is one example of how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict('records')

One big advantage is that pandas deals automatically with header rows.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?
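
For the OP's file, which has no header row, a rough sketch could look like this (header=None and the column names are assumptions, not part of the answer above):

import pandas as pd

# The OP's CSV has no header line, so don't treat the first row as one
df = pd.read_csv('file.csv', header=None, names=['text', 'category'])

# Convert the rows to the requested list of tuples
data = [tuple(x) for x in df.to_numpy()]
print(data)
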

Pandas #2

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is:

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of lists is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]

1 Comment

tuples = [tuple(x) for x in df.values] can be written tuples = list(df.itertuples(index=False)) instead. Do note that the Pandas docs discourage the use of .values in favour of .to_numpy(). The third example is confusing to me. First, because the variable is named tuples, which would imply that it is a list of tuples, whereas it's actually a list of lists. Second, because as far as I can tell that entire expression can be replaced with df.to_numpy().tolist(). I also don't know if the second example is really relevant here.
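
A tiny sketch of the itertuples variant mentioned in that comment (assuming the df from the first snippet); note that name=None is needed to get plain tuples rather than namedtuples:

# itertuples(index=False) yields namedtuples by default;
# name=None makes it yield plain tuples instead.
tuples = list(df.itertuples(index=False, name=None))
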
12

Update for Python3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))

pprint(res)

Output:

[('This is the first line', ' Line1'),
 ('This is the second line', ' Line2'),
 ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
csv module

1 Comment

Why use list(map()) over a list comprehension? Also, notice the whitespace at the beginning of each element of the second column.
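
For reference, a sketch of the list-comprehension variant, with skipinitialspace=True added as an assumption to drop that leading whitespace:

import csv

with open('text.csv', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    res = [tuple(row) for row in reader]  # same result, built with a comprehension
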
5

If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split each line on ,, and append the result to a list.

That said, it looks like you are looking at a CSV file, so you might consider using the csv module for it.
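
A rough sketch of that line-by-line approach (the file name file.csv is borrowed from the accepted answer):

# Only safe if no field itself contains a comma
data = []
with open('file.csv') as f:
    for line in f:
        text, category = line.rstrip('\n').split(',')
        data.append((text, category))

print(data)
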

Comments

5
with open('file.csv') as f:      # assuming the file name used in other answers
    text = f.read()

result = []
for line in text.splitlines():   # one iteration per line of the file
    result.append(tuple(line.split(",")))

4 Comments

Can you please add a bit of explanation to this post? Code only is (sometimes) good, but code and explanation is (most times) better
I know Barranka's comment is over a year old, but for anyone who stumbles upon this and can't figure it out: for line in text.splitlines(): puts each individual line in temp variable "line". line.split(",") creates a list of strings that are split on the comma. tuple(~) puts that list in a tuple and append(~) adds it to the result. After the loop, result is a list of tuples, with each tuple a line, and each tuple element an element in the csv file.
In addition to what @Louis said, there is no need to use .read().splitlines(), you can iterate over each line of the file directly: for line in in_file: res.append(tuple(line.rstrip().split(","))) Also, do note that using .split(',') means that every element of the second column will begin with extra whitespace.
Addendum to the code I just shared above: line.rstrip() -> line.rstrip('\n').
5

You can use the list() function to convert the csv reader object to a list:

import csv

with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)

Comments

3

A simple loop would suffice:

lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        l, name = line.strip().split(',')
        lines.append((l, name))

print(lines)

2 Comments

What if some of the entries have commas in them?
@TonyEnnis Then you would need to use a more advanced processing loop. The answer by Maciej above shows how to use the csv parser that comes with Python to perform this operation. This parser most likely has all of the logic you need.
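
To illustrate that point, a small sketch with made-up sample data: csv.reader handles quoted fields containing commas, which a plain split(',') would break apart.

import csv
import io

# Hypothetical rows; the first field of the first row contains a comma
sample = '"Hello, world",Line1\nPlain text,Line2\n'

rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # [['Hello, world', 'Line1'], ['Plain text', 'Line2']]
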
3

As already said in the comments, you can use the csv library in Python. CSV means comma-separated values, which is exactly your case: a label and a value separated by a comma.

Since the data is a category and a value, I would rather use a dictionary instead of a list of tuples.

Anyway, in the code below I show both ways: d is the dictionary and l is the list of tuples.

import csv

file_name = "test.txt"
try:
    csvfile = open(file_name, 'rt')
except OSError:
    print("File not found")
    raise

csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l = list()
for row in csvReader:
    d[row[1]] = row[0]
    l.append((row[0], row[1]))
print(d)
print(l)

5 Comments

Why not use a context manager to handle the file? Why are you mixing two different variable naming conventions? Isn't (row[0], row[1]) weaker/more error-prone than just using tuple(row)?
Why do you think doing tuple(row) is less error-prone? What variable naming convention are you referring to? Please link an official Python naming convention. As far as I know, try-except is a good way to handle files: what do you mean by context manager?
Why do u think doing tuple(row) is less error prone? Because it doesn’t require that you write out every single index manually. If you make a mistake, or the number of elements changes, you have to go back and change your code. The try-except is fine, context managers are the with statement. You can find plenty of resources on the subject, such as this one.
I don't see how the context manager would be better than the good ol' try-except block. For the other point, the positive aspect is that you type less code; as for the rest, if the number of elements (I guess you mean the number of columns) changes, mine is better because it extracts only the desired values, while the other extracts the whole file. Without any specific requirement you cannot say which is better, so it's a waste of time arguing about it: in this case both are valid.
I don't see how the context manager would be better than the good ol' try-except block. Please see my previous comment, the context manager would not replace the try-except.
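
For what it's worth, a sketch of the same answer rewritten with a with statement, as the commenters suggest (this is an illustration, not the original author's code):

import csv

d = {}
l = []
try:
    with open("test.txt", 'rt', newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=","):
            d[row[1]] = row[0]
            l.append((row[0], row[1]))
except OSError:
    print("File not found")

print(d)
print(l)
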
1

Unfortunately I find none of the existing answers particularly satisfying.

Here is a straightforward and complete Python 3 solution, using the csv module.

import csv

with open('../resources/temp_in.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    rows = list(reader)

print(rows)

Notice the skipinitialspace=True argument. This is necessary since, unfortunately, OP's CSV contains whitespace after each comma.

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Comments

0

Extending your requirements a bit and assuming you do not care about the order of lines and want to get them grouped under categories, the following solution may work for you:

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all the relevant lines available in the dictionary, keyed by category.

Comments

0

Here is the easiest way in Python 3.x to import a CSV into a multidimensional array, and it's only 4 lines of code without importing anything!

#pull a CSV into a multidimensional array in 4 lines!

L=[]                            #Create an empty list for the main array
for line in open('log.txt'):    #Open the file and read all the lines
    x=line.rstrip()             #Strip the \n from each line
    L.append(x.split(','))      #Split each line into a list and add it to the
                                #Multidimensional array
print(L)

1 Comment

Be careful, it's a list, not an array! Why not use a context manager to properly handle the file object? Note that this solution leaves extra whitespace on the second item in each row, and that it will fail if any of the data contains a comma.
-1

Next is a piece of code which uses the csv module but extracts the file.csv contents into a list of dicts, using the first line, which is the header of the CSV table.

import csv

def csv2dicts(filename):
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        lines = list(reader)
        if len(lines) < 2: return None
        names = lines[0]            # the first row is the header
        if len(names) < 1: return None
        dicts = []
        for values in lines[1:]:    # build one dict per data row
            if len(values) != len(names): return None
            d = {}
            for i, name in enumerate(names):
                d[name] = values[i]
            dicts.append(d)
        return dicts

if __name__ == '__main__':
    your_list = csv2dicts('file.csv')
    print(your_list)

1 Comment

Why not just use csv.DictReader?
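
For comparison, a minimal csv.DictReader sketch that does the same job, assuming (as the answer does) that the first row of file.csv is a header:

import csv

with open('file.csv', newline='') as f:
    your_list = list(csv.DictReader(f))  # one dict per row, keyed by the header

print(your_list)
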
