Errors when loading a .csv file using pandas in Python

Question

I have a large CSV file, approximately 6 GB, and it takes a long time to load into Python. I get the following error:

import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)


Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
  File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

I don't think I understand the error code; the last line seems to suggest that the file is too big to load? I also tried the low_memory=False option, but that did not work either.

I'm not sure what "can't allocate region" means. Could it be that the header includes a column named 'region' and pandas cannot locate the column underneath it?

You need to read the file in chunks. Use the chunksize parameter. – Ted Petrou, Commented Feb 8, 2017 at 4:24
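A minimal sketch of the chunked approach, assuming the goal is a per-column summary rather than the whole table in memory. The in-memory CSV text, the "Borough" column, and the helper name count_values_in_chunks are hypothetical stand-ins for nyc311.csv. usecols keeps only the one column that is needed, so each chunk stays small.

```python
import io
from collections import Counter

import pandas as pd

# Small in-memory stand-in for nyc311.csv; the "Borough" column is hypothetical.
csv_text = "Unique Key,Borough\n1,BRONX\n2,QUEENS\n3,BRONX\n4,BROOKLYN\n5,BRONX\n"


def count_values_in_chunks(source, column, chunksize):
    """Count the values of one column without loading the whole file at once."""
    counts = Counter()
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        # Fold this chunk's counts into the running total, then discard the chunk.
        counts.update(chunk[column].value_counts().to_dict())
    return counts


borough_counts = count_values_in_chunks(io.StringIO(csv_text), "Borough", chunksize=2)
print(borough_counts.most_common())
```

For the real file, the source would be the path 'nyc311.csv' and a chunksize in the tens or hundreds of thousands of rows.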
Just a heads-up on another cause of this in pandas 0.20.3: I get the *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug error in a script that I had last run under a previous pandas version. The cause in this case, or at least the thing that fixed the error, was the low_memory = False option. The script loads a large (1.2 GB) dataset, but with 32 GB of RAM available, it and larger datasets load happily on the same machine; my script failed at df = pd.read_csv(datasetName, low_memory = False) until low_memory = False was removed. – jnPy, Commented Feb 10, 2018 at 13:52
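On the low_memory option raised in the question and in this comment: the pandas documentation describes the default low_memory=True as parsing the file in internal chunks, which can lead to mixed type inference, and suggests either low_memory=False or explicit dtypes to avoid that. A sketch of the dtype route, which leaves low_memory at its default; the column names and sample values are hypothetical:

```python
import io

import pandas as pd

# Hypothetical sample: "Incident Zip" mixes numeric-looking values with text.
csv_text = "Unique Key,Incident Zip\n1,10001\n2,N/A\n3,11201\n"

# Declaring dtypes up front means pandas does not have to infer a type per
# internal chunk, so low_memory=False is not needed to avoid mixed types.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Unique Key": "int64", "Incident Zip": "string"},
)
print(df.dtypes)
```

Keeping zip codes as strings also preserves leading zeros, which a numeric dtype would drop.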

Shubham R · Accepted Answer · 2017-02-08 05:52:52Z


Out-of-memory errors occur because the process has run out of RAM. There's no other explanation for that.

Here, the total memory needed by all in-RAM objects (data plus overhead) is not less than the available RAM.
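One way to check this before loading all 6 GB is to read a sample of rows, measure it with DataFrame.memory_usage(deep=True), and scale by file size. This is a rough sketch, assuming one record per physical line (no quoted newlines) and a representative sample; estimate_in_ram_bytes is a hypothetical helper and the demo file is synthetic:

```python
import os
import tempfile

import pandas as pd


def estimate_in_ram_bytes(path, sample_rows=1000):
    """Extrapolate a CSV's in-RAM DataFrame size from its first rows.

    Assumes one record per line and that the first `sample_rows` rows
    are representative of the whole file.
    """
    with open(path, "rb") as f:
        header = f.readline()
        # On-disk size of the sampled data lines (readline returns b"" past EOF).
        sample_disk_bytes = sum(len(f.readline()) for _ in range(sample_rows))
    sample = pd.read_csv(path, nrows=sample_rows)
    # deep=True also counts the string objects held in text columns.
    sample_ram_bytes = int(sample.memory_usage(deep=True).sum())
    if sample_disk_bytes == 0:
        return sample_ram_bytes
    data_disk_bytes = os.path.getsize(path) - len(header)
    return sample_ram_bytes * data_disk_bytes / sample_disk_bytes


# Demo on a small synthetic file standing in for nyc311.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n")
    for i in range(2000):
        tmp.write(f"{i:05d},name{i % 10}\n")
    demo_path = tmp.name

estimate = estimate_in_ram_bytes(demo_path, sample_rows=1000)
actual = int(pd.read_csv(demo_path).memory_usage(deep=True).sum())
print(f"estimated {estimate / 1e6:.3f} MB, actual {actual / 1e6:.3f} MB")
os.remove(demo_path)
```

If the estimate is close to or above the machine's RAM, chunked reading (or loading fewer columns) is the way forward.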

You can see this clearly from the error statement: malloc: *** mach_vm_map(size=18446744071562067968) failed
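The requested size itself is worth a closer look: 18446744071562067968 equals 2**64 - 2**31, which is exactly what -2**31 (the smallest 32-bit signed integer) becomes when its bits are read as an unsigned 64-bit size. That suggests, though it does not prove, that a negative size value reached malloc, which was therefore asked for just under 16 EiB, an amount no machine can provide. The arithmetic can be checked with the standard struct module:

```python
import struct

# The size value copied from the malloc error message.
requested = 18446744071562067968

# Pack as an unsigned 64-bit integer, then unpack the same bytes as signed.
as_signed = struct.unpack("<q", struct.pack("<Q", requested))[0]

print(as_signed)                    # -2147483648, i.e. -(2**31)
print(requested == 2**64 - 2**31)   # True
print(requested / 2**60)            # just under 16 (EiB)
```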

Try reading the file in chunks:

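# chunksize makes read_csv return an iterator of DataFrames; lineterminator='\r' only fits files with bare CR line endings
reader = pd.read_csv('nyc311.csv', chunksize=5000, lineterminator='\r')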
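With chunksize set, read_csv hands back a TextFileReader rather than a DataFrame, so the chunks still have to be processed one by one. A minimal sketch that keeps only the needed rows from each chunk and concatenates them, assuming the filtered result fits in memory; the column names and data are hypothetical, and lineterminator is left at its default because the sample uses ordinary \n line endings:

```python
import io

import pandas as pd

# Stand-in for nyc311.csv; "Borough" and "Complaint Type" are hypothetical columns.
csv_text = (
    "Borough,Complaint Type\n"
    "BRONX,Noise\n"
    "QUEENS,Heating\n"
    "BRONX,Heating\n"
    "BROOKLYN,Noise\n"
)

# read_csv with chunksize returns an iterator that yields DataFrames.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Filter each chunk as it arrives so only the matching rows are kept in memory.
filtered = pd.concat(
    (chunk[chunk["Borough"] == "BRONX"] for chunk in reader),
    ignore_index=True,
)
print(filtered)
```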

Alternatively, if reading this CSV is only one part of your program and other DataFrames were created earlier, delete them once they are no longer in use.

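import gc
del old_df          # drop a DataFrame that is no longer needed (old_df is a placeholder name)
gc.collect()        # run the garbage collector to free unreferenced objects
del gc.garbage[:]   # empty gc.garbage, the list of objects the collector found but could not free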


answered Feb 8, 2017 at 5:52

Shubham R


2 Comments

Asteroid098 Over a year ago

Hi, thank you for your comment. Why would I get the following error message:

Python(5431,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Python(5431,0x7fffa37773c0) malloc: *** error for object 0x104623257: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Shubham R Over a year ago

@song0089 malloc means memory allocation. It seems there's some issue with allocating free memory to store your DataFrame. It begins with a pointer, then each row of your DataFrame is saved in memory and the pointer is incremented each time. As you can see, at object 0x104623257 (which may be some nth row) the pointer has no more free addresses (i.e., memory) at which to store that row, and that's why you are getting this error. If you're satisfied, kindly upvote/accept the answer, as is common practice here.
