I am new to the cloud and to data engineering as well.

I have a large CSV file stored in a GCS bucket. I would like to write a Python script that bulk-inserts the data into a PostgreSQL database on my local machine using a COPY statement. I cannot figure out the authentication, though.

I would like to do something like this:

import psycopg2

conn = psycopg2.connect(database=database,
                        user=user,
                        password=password,
                        host=host,
                        port=port)

cursor = conn.cursor()
file = 'https://storage.cloud.google.com/<my_project>/<my_file.csv>'
sql_query = f"COPY <MY_TABLE> FROM '{file}' WITH CSV"
cursor.execute(sql_query)
conn.commit()
conn.close()

I get this error message:

psycopg2.errors.UndefinedFile: could not open file "https://storage.cloud.google.com/<my_project>/<my_file.csv>" for reading: No such file or directory
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.

The same happens when I run the query in psql.

I assume the problem is authentication. I have set up Application Default Credentials (ADC) with the Google Cloud CLI, and as the authenticated user I can easily download the file using wget. When I switch to the postgres user, I get an "access denied" error.

ADC seems to work only with client libraries and command-line tools.

I use Ubuntu 22.04.1 LTS.

Thanks for any help.

1 Answer

This is not going to work for you. A server-side COPY FROM expects a path on the database server's own filesystem, so the file must be in a location the server process is permitted to read. It will not be fetched over HTTP.

You can use COPY ... FROM PROGRAM to supply a program or script that fetches the file and prints it to STDOUT, which the server then consumes as its input.
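
A minimal sketch of the FROM PROGRAM route, under several assumptions: gsutil is installed on the database host, the OS user running the PostgreSQL server (typically postgres) has its own Google Cloud credentials, and the connecting database role is a superuser or a member of pg_execute_server_program. The helper names (quote_ident, build_fetch_command, copy_from_program) and the bucket, object, and table names are hypothetical.

```python
import shlex


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_fetch_command(bucket_name: str, blob_name: str) -> str:
    """Shell command that prints a GCS object to STDOUT (assumes gsutil)."""
    return "gsutil cat " + shlex.quote(f"gs://{bucket_name}/{blob_name}")


def copy_from_program(conn, table: str, command: str) -> None:
    """Ask the server to run `command` and load its STDOUT as CSV.

    `conn` is an open psycopg2 connection. The command runs on the
    database host as the server's OS user, not as the Python client.
    """
    sql = f"COPY {quote_ident(table)} FROM PROGRAM %s WITH (FORMAT csv)"
    with conn.cursor() as cursor:
        # psycopg2 interpolates %s client-side as a quoted string literal.
        cursor.execute(sql, (command,))
    conn.commit()
```

Usage would look something like `copy_from_program(conn, "my_table", build_fetch_command("my-bucket", "my_file.csv"))`. Because the command runs as the server's OS user, the "access denied" error seen when switching to the postgres user would likely reappear here unless that user is given credentials of its own.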

Or do what the error message suggests and handle it client-side with psycopg's COPY support, for example copy_expert with COPY ... FROM STDIN.
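
A rough sketch of the client-side route, assuming psycopg2 and the google-cloud-storage package are installed and that Application Default Credentials are configured for the user running the script. The function names and the bucket, object, and table names are hypothetical. The object is streamed straight from GCS into COPY ... FROM STDIN, so the server never needs to reach the bucket.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table: str, header: bool = False) -> str:
    """Build a COPY statement that reads CSV data sent by the client."""
    options = "FORMAT csv, HEADER true" if header else "FORMAT csv"
    return f"COPY {quote_ident(table)} FROM STDIN WITH ({options})"


def copy_stream_to_table(conn, stream, table: str, header: bool = False) -> None:
    """Send a file-like object to the server through psycopg2's copy_expert.

    `conn` is an open psycopg2 connection; `stream` needs a read() method.
    """
    with conn.cursor() as cursor:
        cursor.copy_expert(build_copy_sql(table, header), stream)
    conn.commit()


def load_gcs_csv(conn, bucket_name: str, blob_name: str, table: str) -> None:
    """Open a GCS object as a stream and COPY it into `table`."""
    # Imported lazily so the helpers above work without the GCS package.
    from google.cloud import storage  # picks up ADC for the current user

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    with blob.open("rb") as stream:
        copy_stream_to_table(conn, stream, table)
```

With the connection from the question, a call such as `load_gcs_csv(conn, "my-bucket", "my_file.csv", "my_table")` would stream the file in chunks rather than downloading it first. Authentication happens in the Python process, which is where ADC is available.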

1 Comment

Try reading the file with pandas.read_csv; if pandas can read it, then it is not an authentication issue. I currently have the same problem: I can read the file in pandas, but I can't use open/copy.
