How to read the headers of a csv file using csv module in "rb" mode?

Question

I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.

with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)

All of this is working fine but now I have to validate the headers in the csv file before making the put call.

When I try to run the code below, I get an error.

with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(DATA)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)

This throws

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

As a workaround, I have created a new function which opens the file in "r" mode and does validation on the csv headers and this works ok.

def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)

I do not want to read the same file twice, once for header validation and once for uploading to s3. The upload part also doesn't work if I do it in "r" mode.

Is there a way I can achieve this while reading the file only once in "rb" mode? I have to make this work using the csv module and not the pandas library.

Why not read twice? The code you want to use moves the stream's position past the headers. If you want to upload them as well, you'd have to go back to the start of the file.
– Panagiotis Kanavos, Commented May 16, 2022 at 7:45
You cannot use a csv.reader object with a file opened in binary mode. It requires strings, not bytes, so you must use text mode. Why does it really matter if you read the header twice?
– juanpa.arrivillaga, Commented May 16, 2022 at 7:49
To do what you want, you'd have to read the first line as bytes with header = DATA.readline(), convert it to a string with line = header.decode(), put that into a list with lines = [line], parse it with csvreader = csv.reader(lines), and finally go back to the start with DATA.seek(0). You'd be doing the same thing you would with open(..., 'r'), with a lot more code.
– Panagiotis Kanavos, Commented May 16, 2022 at 7:57

Panagiotis Kanavos · Accepted Answer · 2022-05-16 08:21:30Z

Doing what you want is possible but not very efficient. Simply opening a file isn't that expensive. The CSV reader reads only one line at a time, not the entire file.

To do what you want you have to:

1. Read the first line as bytes
2. Decode it into a string (using the correct encoding)
3. Convert it to a list of strings
4. Parse it with csv.reader and finally
5. Seek to the start of the stream.

Otherwise you'll end up uploading only the data without the headers:

with open(csv_file, 'rb') as DATA:
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)

    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)
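
The steps above can also be wrapped in a small helper. This is only a minimal sketch: the function name read_header_columns, the encoding parameter and its utf-8-sig default are assumptions for illustration, not part of the original code. It also assumes the header row sits on a single physical line (no quoted newlines inside header fields).

```python
import csv
import io


def read_header_columns(binary_stream, encoding="utf-8-sig"):
    """Return the first CSV row of a binary stream, then rewind the stream.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    first_line = binary_stream.readline()        # bytes up to and including the first newline
    text_line = first_line.decode(encoding)      # csv.reader needs str, not bytes
    columns = next(csv.reader([text_line]), [])  # parse just that one line
    binary_stream.seek(0)                        # rewind so the upload includes the header
    return columns


# In-memory stream standing in for open(csv_file, 'rb')
sample = io.BytesIO(b"id,name,email\r\n1,Ann,ann@example.com\r\n")
columns = read_header_columns(sample)
```

Because the helper seeks back to position 0 before returning, the same stream object can be passed to requests.put afterwards and the header line is uploaded along with the data.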

Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.

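You can also pass buffering=1 to request line buffering. Note that Python documents this setting as applying to writes in text mode, so when reading it does not guarantee that exactly one line is fetched; either way only a small buffered chunk is read, not the entire file.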

def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations

    with open(csv_file, 'rb') as DATA:
        s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)

Or

def check_csv_headers(file_path):
    with open(file_path, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # If successful
        return True

def upload_csv(file_path):
    if check_csv_headers(file_path):
        with open(file_path, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)
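
As one possible way to fill in the run-some-validations step, here is a minimal sketch that checks the header row against a set of required column names. The names REQUIRED_COLUMNS and missing_columns, the example column names and the utf-8 encoding are all hypothetical and would need to be adapted to the real file.

```python
import csv

# Hypothetical set of required column names; replace with your own.
REQUIRED_COLUMNS = {"id", "name"}


def missing_columns(file_path, required=REQUIRED_COLUMNS, encoding="utf-8"):
    """Return the required column names that are absent from the header row."""
    # newline='' is what the csv module documentation recommends for file objects.
    with open(file_path, "r", newline="", encoding=encoding) as f:
        columns = next(csv.reader(f), [])  # [] when the file is empty
    return set(required) - set(columns)
```

upload_csv could then go ahead with the put call only when missing_columns(file_path) returns an empty set, which keeps the validation rules in one place and the upload code unchanged.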
