I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.

with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)

All of this is working fine but now I have to validate the headers in the csv file before making the put call.

When I try to run the code below, I get an error.

with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(DATA)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)

This throws

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

As a workaround, I have created a new function which opens the file in "r" mode and does validation on the csv headers and this works ok.

def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)

I do not want to read the same file twice, once for header validation and once for uploading to S3. The upload also doesn't work if I open the file in "r" mode.

Is there a way I can achieve this while reading the file only once in "rb" mode? I have to make this work using the csv module and not the pandas library.

  • Why not read twice? The code you want to use moves the stream's position past the headers. If you want to upload them as well, you'd have to go back to the start of the file. Commented May 16, 2022 at 7:45
  • You cannot use a csv.reader object with a file opened in binary mode. It requires strings, not bytes, so you must use text mode. Why does it really matter if you read the header twice? Commented May 16, 2022 at 7:49
  • To do what you want you'd have to use raw=DATA.readline() to read the first line as bytes, convert that to a string with line=raw.decode(), put that into a list with lines=[line], parse it with csvreader=csv.reader(lines), and finally go back to the start with DATA.seek(0). You'd end up doing the same thing you would with open(..., 'r'), with a lot more code. Commented May 16, 2022 at 7:57

1 Answer

Doing what you want is possible but not very efficient. Simply opening a file isn't that expensive, and the CSV reader reads only one line at a time, not the entire file.

To do what you want you have to:

  1. Read the first line as bytes
  2. Decode it into a string (using the correct encoding)
  3. Convert it to a list of strings
  4. Parse it with csv.reader and finally
  5. Seek to the start of the stream.

The final seek matters. Without it you'll end up uploading only the data without the headers:

with open(csv_file, 'rb') as DATA:
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)

    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
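The five steps can also be wrapped in a small helper so the rewind is never forgotten. This is only a sketch: the function name read_header_and_rewind and the default utf-8 encoding are assumptions of mine, not something from the question, and it assumes the header fits on one physical line (no quoted newlines inside a column name). An in-memory io.BytesIO stands in for the file opened in 'rb' mode.

```python
import csv
import io


def read_header_and_rewind(stream, encoding="utf-8"):
    """Return the CSV header columns from a binary stream, then rewind it.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    raw_line = stream.readline()           # step 1: first line, as bytes
    text_line = raw_line.decode(encoding)  # step 2: bytes -> str
    lines = [text_line]                    # step 3: wrap it in a list of strings
    columns = next(csv.reader(lines), [])  # step 4: parse that single line
    stream.seek(0)                         # step 5: back to the start for the upload
    return columns


# In-memory stand-in for open(csv_file, 'rb')
sample = io.BytesIO(b"id,name,email\r\n1,Ann,ann@example.com\r\n")
columns = read_header_and_rewind(sample)
```

After the call the stream is back at position 0, so passing it to requests.put as in the snippet above would send the header line as well as the data.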

Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.

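You can also pass buffering=1 to request line buffering. Note that for reads Python may still pull more than one line from disk into its internal buffer; either way, csv.reader only parses the header line here.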

def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations

    with open(csv_file, 'rb') as DATA:
        s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)

Or

def check_csv_headers(file_path):
    with open(file_path, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # if successful
        return True

def upload_csv(file_path):
    if check_csv_headers(file_path):
        with open(file_path, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
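As for the run-some-validations placeholder, one possible shape is a function that reports which required columns are missing from the header. Everything here is hypothetical: the question doesn't say what the validation rules are, so the name missing_columns, the REQUIRED list, and the choice to ignore case and surrounding whitespace are my assumptions.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent from the header.

    Assumption: names are compared case-insensitively, ignoring
    surrounding whitespace.
    """
    present = {name.strip().lower() for name in columns}
    return [name for name in required if name.strip().lower() not in present]


# Hypothetical required header; replace with whatever your file must contain.
REQUIRED = ["id", "name", "email"]

missing = missing_columns(["ID", " name"], REQUIRED)
```

check_csv_headers could then return True only when the list of missing columns is empty.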
