
I read a CSV file using pandas:

data_raw = pd.read_csv(filename, chunksize=chunksize)
print(data_raw['id'])

Then, it reports TypeError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'TextFileReader' object has no attribute '__getitem__'

What can I do to resolve the problem, and how can I convert data_raw into a DataFrame object? I am using Python 2.7 and pandas v0.19.1.

  • Show your csv file. It is not clear what your objective is; make it clear what you are trying to do. data_raw is already a DataFrame object. Check with print(type(data_raw)) Commented Jan 25, 2017 at 6:03
  • Thanks. But the type of data_raw is TextFileReader because of the chunksize (pandas.pydata.org/pandas-docs/stable/generated/…). You can also see my other question (stackoverflow.com/questions/41843342/…). The purpose of the code is to read a big CSV file (4 GB) into a DataFrame, but the computer has only about 3 GB of RAM. Commented Jan 25, 2017 at 9:17

4 Answers


When you pass the chunksize option to read_csv(), it returns a TextFileReader - an open-file-like object that can be used to read the original file in chunks. See a usage example here: How to read a 6 GB csv file with pandas. When this option is not provided, the function reads the whole file content into a DataFrame.
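A minimal sketch of both behaviours (the inline sample data here stands in for the real file, purely for illustration):

```python
import pandas as pd
from io import StringIO  # on Python 2.7 use: from StringIO import StringIO

csv_text = "id,value\n1,a\n2,b\n3,c\n"

# Without chunksize, read_csv returns a DataFrame, so column access works.
df = pd.read_csv(StringIO(csv_text))
print(df['id'].tolist())  # [1, 2, 3]

# With chunksize, read_csv returns a TextFileReader; index each chunk instead.
reader = pd.read_csv(StringIO(csv_text), chunksize=2)
for chunk in reader:
    print(chunk['id'].tolist())  # [1, 2], then [3]
```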


2 Comments

Thank you very much. But I still don't know how to modify the code if I want to read the data into a DataFrame. Could you help me? By the way, the computer has only about 3 GB of RAM, and the CSV file is more than 3 GB.
And I have read "How to read a 6 GB csv file with pandas". I used pd.DataFrame.append(chunk, ....) in a for loop, but it reports a "not enough memory" error.

One way around this problem is to set the nrows parameter in the pd.read_csv() function; that way you select a subset of the data you want to load into the dataframe. The drawback, of course, is that you won't be able to see and work with the full dataset. Code example:

data = pd.read_csv(filename, nrows=100000)
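If later portions of the file are needed, nrows can be combined with skiprows to read the file slice by slice. A sketch with a small inline sample in place of the real file (note that the header line must not be skipped):

```python
import pandas as pd
from io import StringIO

# Small inline sample standing in for the real file.
csv_text = "id\n" + "\n".join(str(i) for i in range(10))

# First 4 data rows (the header is read automatically).
first = pd.read_csv(StringIO(csv_text), nrows=4)

# Next 4 data rows: skip file lines 1-4 (line 0 is the header).
second = pd.read_csv(StringIO(csv_text), nrows=4, skiprows=range(1, 5))

print(first['id'].tolist())   # [0, 1, 2, 3]
print(second['id'].tolist())  # [4, 5, 6, 7]
```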



You can convert the TextFileReader to a DataFrame. For small data, use:

df = pd.concat(MyTextFileReader, ignore_index=True)

See How to read data in Python dataframe without concatenating?, also for a solution for large data.
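For example, a reader produced with chunksize can be collapsed back into a single DataFrame — only sensible when the result fits in memory; sample data is inlined here for illustration:

```python
import pandas as pd
from io import StringIO

csv_text = "id,value\n1,a\n2,b\n3,c\n4,d\n"

# Read in chunks of 2 rows, then stitch the chunks back together.
reader = pd.read_csv(StringIO(csv_text), chunksize=2)
df = pd.concat(reader, ignore_index=True)

print(len(df))            # 4
print(df['id'].tolist())  # [1, 2, 3, 4]
```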



Passing chunksize to read_csv() creates an iterator of "chunks", i.e. a TextFileReader, which needs to be processed chunk by chunk.

Like this:

chunk_size = 100000  # rows per chunk; tune to the available memory
df_chunks = pd.read_csv(csv_file_path, sep=',', engine='python',
                        dtype='unicode', chunksize=chunk_size)

for chunk in df_chunks:
    # do something with chunk; each chunk is a regular DataFrame
    pass
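When the full file does not fit in memory, each chunk can be reduced (filtered or aggregated) inside the loop so that only the small reduced pieces are kept. A sketch with inline sample data and a hypothetical filter threshold:

```python
import pandas as pd
from io import StringIO

# Inline sample standing in for a large file; 'value' is just id * 2.
csv_text = "id,value\n" + "\n".join("%d,%d" % (i, i * 2) for i in range(10))

reader = pd.read_csv(StringIO(csv_text), chunksize=3)

# Keep only the rows of interest from each chunk; the filtered pieces
# are small enough to concatenate afterwards.
parts = [chunk[chunk['value'] > 10] for chunk in reader]
result = pd.concat(parts, ignore_index=True)

print(result['id'].tolist())  # [6, 7, 8, 9]
```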

