How to read a CSV with pandas from the /tmp directory in AWS Lambda

Question

I am writing a Python Lambda function that reads data from a CSV into a DataFrame, manipulates that data, converts it back to a CSV, and then makes an API call with the new CSV.
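The read, manipulate and write-back steps described above can be sketched roughly as follows. This is a minimal sketch, not the asker's code: the transform function is a hypothetical stand-in for the manipulation (here it just adds a column derived from a "value" column), and the API call is left out because its endpoint and format are not given. DataFrame.to_csv called without a path returns the CSV as a string, which could then be used as the request body.

```python
import io

import pandas as pd


def transform(df):
    """Hypothetical manipulation step: add a column that doubles 'value'."""
    df = df.copy()
    df["doubled"] = df["value"] * 2
    return df


def csv_round_trip(csv_text):
    """Read CSV text into a DataFrame, transform it, and return new CSV text."""
    df = pd.read_csv(io.StringIO(csv_text))
    df = transform(df)
    # With no path argument, to_csv returns the CSV content as a string.
    return df.to_csv(index=False)


if __name__ == "__main__":
    print(csv_round_trip("name,value\na,1\nb,2\n"))
```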

I am running into an issue with pandas.read_csv: it ends my Lambda's execution without any error.

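```python
import os

import pandas as pd

testdic = {}  # maps each CSV file name to the directory it was found in

os.chdir('/tmp')
for root, dirs, files in os.walk('/tmp', topdown=True):
    for name in files:
        if '.csv' in name:
            testdic[name] = root
            print(os.path.isfile('/tmp/' + name))  # printed True
            print(os.path.isfile(name))            # printed True (cwd is /tmp)
            df = pd.read_csv(name)                 # attempt 1: relative path
            df = pd.read_csv('/tmp/' + name)       # attempt 2: absolute path
```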

Both os.path.isfile calls return True, and I have tried both versions of read_csv. Neither works: both end the Lambda prematurely without an error.

I have confirmed that the CSV is downloaded into the Lambda's /tmp directory; I can open it and print rows from it. However, when I run df = pd.read_csv('/tmp/file.csv'), or change my directory to /tmp and run df = pd.read_csv('file.csv'), the Lambda ends with no error and never gets past that line. I am using pandas 0.23.4 because that is the version I need to use, and the code works locally. Any suggestions would be helpful.

Expected result: the CSV is read into a DataFrame so I can manipulate it.

FIXED: I could not just use '/tmp/' + filename; I had to use os.path.join(root, filename). I also had to increase my Lambda's timeout because of the file size.
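Put together, the fix might look like the sketch below. It is a minimal sketch, not the asker's exact code: the function name read_csvs_under and its base_dir parameter are hypothetical, added so the loop can also be tried outside Lambda; inside the function base_dir would be /tmp. The key change is building each path with os.path.join(root, name), so a CSV that os.walk finds in a subdirectory of /tmp is opened at its real location.

```python
import os

import pandas as pd


def read_csvs_under(base_dir="/tmp"):
    """Return a dict mapping each CSV's full path to its DataFrame.

    Hypothetical helper: walks base_dir (default /tmp, Lambda's writable
    directory) and reads every file whose name ends in .csv.
    """
    frames = {}
    for root, dirs, files in os.walk(base_dir):
        for name in files:
            if name.endswith(".csv"):
                # root is the directory os.walk is currently visiting, which
                # may be a subdirectory of base_dir, so join it with the name.
                file_path = os.path.join(root, name)
                frames[file_path] = pd.read_csv(file_path)
    return frames
```

Reading several or large files this way can still take longer than the function's configured timeout, which is why the timeout also had to be raised.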

Use file_path = os.path.join(root, name) and then pd.read_csv(file_path)? – Vishnudev Krishnadas, Commented Jun 3, 2019 at 16:57
I used chdir based on other Stack Overflow advice. With os.path.join the smaller file was read in, which showed me that my timeout was also too short. Thanks! – Gabe Maurer, Commented Jun 4, 2019 at 16:02
Yes, increasing the Lambda timeout and using os.path.join got it working. Thank you. – Gabe Maurer, Commented Jun 5, 2019 at 17:05

Vishnudev Krishnadas · Accepted Answer · 2019-06-06 08:49:58Z


os.path.join builds the path correctly on different platforms.

Use the following inside the os.walk loop:

```python
file_path = os.path.join(root, name)  # full path of the file os.walk found
df = pd.read_csv(file_path)
```
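To see why the joined path matters, compare the two ways of building a path for a file that os.walk reports in a subdirectory of /tmp. In this sketch the directory /tmp/data and the file name report.csv are hypothetical; no files are touched, only the path strings are built.

```python
import os


def concat_path(name):
    """The question's approach: assume the file sits directly in /tmp."""
    return "/tmp/" + name


def joined_path(root, name):
    """The answer's approach: join the directory os.walk reported with the name."""
    return os.path.join(root, name)


if __name__ == "__main__":
    root, name = "/tmp/data", "report.csv"  # hypothetical os.walk result
    print(concat_path(name))        # /tmp/report.csv (wrong directory)
    print(joined_path(root, name))  # /tmp/data/report.csv on POSIX systems
```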

NOTE: Also increase the AWS Lambda timeout, as suggested in the comments by @Gabe Maurer.
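When a Lambda stops at a slow step without a Python traceback, logging the time left before and after that step can show whether it is running into its timeout. Below is a minimal sketch assuming a handler that reads one CSV; the default path /tmp/file.csv follows the question, and the optional "path" key in the event is a hypothetical addition. The Lambda context object passed to the handler provides get_remaining_time_in_millis().

```python
import logging
import time

import pandas as pd

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Read a CSV and log how much of the Lambda's time budget the read used."""
    # Assumption: the event may carry a "path" key; otherwise use the
    # file name from the question.
    path = event.get("path", "/tmp/file.csv")
    logger.info("Remaining before read_csv: %d ms",
                context.get_remaining_time_in_millis())
    start = time.monotonic()
    df = pd.read_csv(path)
    elapsed_ms = (time.monotonic() - start) * 1000
    logger.info("read_csv took %.0f ms for %d rows; remaining: %d ms",
                elapsed_ms, len(df), context.get_remaining_time_in_millis())
    return {"rows": len(df)}
```

If the first message appears in the logs but the second never does, the read is a likely place where the time runs out, and raising the function's timeout setting is the fix the asker used.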

