I'm new to AWS and Lambda and would like to trigger a messy file to be converted to JSON format using Python pandas. The file has no file extension, but it can be read in Notepad. I have the Python file and it's working properly, but now I'm struggling with how to incorporate it into AWS. Do I transform the file within Lambda itself, or do I use another service?

I'd like to be able to run my Python script and clean up the data within Lambda or another service (whichever is easiest).

Here's a copy of what the file looks like (columns are all over the place and it has no headers):

option19971675181  ACHILLE BLA BLA BLA1         randomblablalba           blabla    88   498
option19971675182  ACHILLE BLA BLA BLA  1                                blabla       176498
option19971675183  ACHILLE BLA BLA BLA1                                  blabla   191   498
option19971675184  ACHILLE BLA BLA BLA1               randomblablalba   blabla   521   498
option19971675185  ACHILLE BLA BLA BLA1                                  blabla   919   498
option19971675186  ACHILLEBLABLABLA134234531          randomblablalba    blabla    10    498
option19971675187  ACHILLEBLABLABLA134234531 7 65                        blabla     0 176498
option19971675188  ACHILLE BLA BLA BLA1342 90345 31                      blabla      1764980
option19971675189  ACHILLEBLABLABLA13423N09487OP531   randomblablalba     blabla     1764980
option19971675190  ACHILLE BLA BLA BLA 134 23N 094 87  OP53                blabla     0     0

In Lambda I have the following. (I've also added a layer so that AWS Lambda can import pandas. I've tested this with dummy data and it's working :) )

import io
import json

import boto3
import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response['Body'].read().decode('utf-8')
    buf = io.StringIO(data)
    fileRow = buf.readline()
    # continue python script to extract the data

I'd like my data to be in JSON format.
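For the extraction step, here is a minimal sketch using `pandas.read_fwf`, which infers fixed-width column boundaries from the whitespace layout. The helper name `file_to_json` is hypothetical, and the inferred columns may need adjusting (via `colspecs`) to match the real file:

```python
import io
import json
import pandas as pd

def file_to_json(raw_text):
    """Parse the headerless, fixed-width-ish file into a JSON array of rows.

    Hypothetical helper: read_fwf infers column boundaries from the
    whitespace layout; header=None because the file has no header row.
    """
    df = pd.read_fwf(io.StringIO(raw_text), header=None)
    return df.to_json(orient="records")

sample = (
    "option19971675181  ACHILLE BLA BLA BLA1         randomblablalba           blabla    88   498\n"
    "option19971675182  ACHILLE BLA BLA BLA  1                                blabla       176498\n"
)
result = file_to_json(sample)
print(result)
```

If the inferred boundaries are wrong, `read_fwf` accepts an explicit `colspecs` list of `(start, end)` character positions, which is more reliable for a file this messy.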

  • How long does it take to convert the file? It sounds like this would be a good use case for Lambda as long as this isn't a giant file. Commented Oct 12, 2022 at 18:39
  • Yeah, it doesn't take too long, maybe 2-3 min max. There are many rows and the file size is 250+ MB. Commented Oct 12, 2022 at 18:49
  • Then I'd continue down the Lambda route. You'll need to make sure you've allocated enough memory for the Lambda, but it sounds like it should work fine. You'll also need to increase the maximum run time, as the default is, I believe, 3 seconds. You can run a Lambda for up to 15 minutes. Commented Oct 12, 2022 at 19:39

1 Answer

The Python script as it currently stands could be used as the target for an S3 put event notification (i.e. when an object is written to an S3 bucket, the event data for that object is sent to the target Lambda function).

The issue is that I can't see anything in the script that is set up to write the output back to S3, or anywhere else.

I would recommend using the local storage within the Lambda function (e.g. /tmp) to write out the file, and then use another boto3 call to copy or move it to a destination bucket/key that you define.
