I'm new to AWS and Lambda and would like to trigger a messy file to be converted to JSON format using Python pandas. The file has no file extension, but it can be read in Notepad. I have the Python file and it's working properly, but now I'm struggling with how to incorporate it into AWS. Do I transform the file within Lambda itself, or do I use another service?

I'd like to be able to run my Python script and clean up the data within Lambda or another service (whichever is easiest).

Here's a copy of what the file looks like (columns are all over the place and it has no headers):

option19971675181  ACHILLE BLA BLA BLA1         randomblablalba           blabla    88   498
option19971675182  ACHILLE BLA BLA BLA  1                                blabla       176498
option19971675183  ACHILLE BLA BLA BLA1                                  blabla   191   498
option19971675184  ACHILLE BLA BLA BLA1               randomblablalba   blabla   521   498
option19971675185  ACHILLE BLA BLA BLA1                                  blabla   919   498
option19971675186  ACHILLEBLABLABLA134234531          randomblablalba    blabla    10    498
option19971675187  ACHILLEBLABLABLA134234531 7 65                        blabla     0 176498
option19971675188  ACHILLE BLA BLA BLA1342 90345 31                      blabla      1764980
option19971675189  ACHILLEBLABLABLA13423N09487OP531   randomblablalba     blabla     1764980
option19971675190  ACHILLE BLA BLA BLA 134 23N 094 87  OP53                blabla     0     0

In Lambda I have the following. (I've also added a layer so that AWS Lambda can import pandas. I've tested this with dummy data and it's working :) )

import io
import json

import boto3
import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response['Body'].read().decode('utf-8')
    buf = io.StringIO(data)
    fileRow = buf.readline()
    # continue python script to extract the data

I'd like my data to be in JSON format.
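For the extraction step, here is a minimal sketch using `pandas.read_fwf`, which infers fixed-width column boundaries from the whitespace layout. The helper name `file_to_json` is hypothetical, and the inferred columns may need adjusting (via `colspecs`) to match the real file:

```python
import io
import json
import pandas as pd

def file_to_json(raw_text):
    """Parse the headerless, fixed-width-ish file into a JSON array of rows.

    Hypothetical helper: read_fwf infers column boundaries from the
    whitespace layout; header=None because the file has no header row.
    """
    df = pd.read_fwf(io.StringIO(raw_text), header=None)
    return df.to_json(orient="records")

sample = (
    "option19971675181  ACHILLE BLA BLA BLA1         randomblablalba           blabla    88   498\n"
    "option19971675182  ACHILLE BLA BLA BLA  1                                blabla       176498\n"
)
result = file_to_json(sample)
print(result)
```

If the inferred boundaries are wrong, `read_fwf` accepts an explicit `colspecs` list of `(start, end)` character positions, which is more reliable for a file this messy.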

  • How long does it take to convert the file? It sounds like this would be a good use case for Lambda as long as this isn't a giant file. Commented Oct 12, 2022 at 18:39
  • Yeah, it doesn't take too long, maybe 2-3 min max. There are many rows and the file size is 250+ MB. Commented Oct 12, 2022 at 18:49
  • Then I'd continue down the Lambda route. You'll need to make sure you've allocated enough memory for the Lambda, but it sounds like it should work fine. You'll also need to increase the maximum run time, as the default is, I believe, 3 seconds. You can run a Lambda for up to 15 minutes. Commented Oct 12, 2022 at 19:39

1 Answer

The Python script as it currently stands could be used as the target for an S3 put event notification (i.e. when an object is written to an S3 bucket, the event data for that object is sent to the target Lambda function).

The issue is that I can't see anything in the script that is set up to write the output back to S3, or anywhere else.

I would recommend using the local storage within the Lambda function (e.g. /tmp) to write out the file, and then use another boto3 call to copy or move it to a destination bucket/key that you define.
