I have a table containing the paths of csv files, along with a boolean indicating whether each file has already been loaded into the database. I would like to load each of those csv files and, if the load succeeds, switch the "loaded" indicator of the corresponding entry in the file list to TRUE.

The table containing the filenames looks as follows:

CREATE TABLE files
(
id serial,
path varchar(256),
loaded bool DEFAULT FALSE,
PRIMARY KEY (id)
)

It's really important that the loaded column in the files table only gets switched to TRUE if the csv has actually been loaded. Some of the csv files are malformed, and I expect those to throw errors.

  • How do you load your csv into the database? Also, serial is the older way to define an auto-incrementing column; use the more modern generated always as identity. Commented Apr 22, 2021 at 14:49
  • @eshirvana I was planning on using COPY but am also looking into pg_bulkload. I am very new to SQL and Postgres, so I'm not sure what the best way to do it is. Whatever solution I choose should be fast, as I will be loading a lot of data with this regularly. Thank you for the tip on serial being outdated. Commented Apr 22, 2021 at 14:56
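
For reference, the identity-column suggestion from the first comment could look roughly like the sketch below. It assumes PostgreSQL 10 or later and otherwise keeps the table from the question unchanged.

```sql
-- Same files table as in the question, with an identity column
-- in place of serial (assumes PostgreSQL 10+).
CREATE TABLE files
(
    id     integer GENERATED ALWAYS AS IDENTITY,
    path   varchar(256),
    loaded boolean DEFAULT FALSE,
    PRIMARY KEY (id)
);
```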

2 Answers

I would write a function that uses a cursor to go through the files and update them, something like this.

This is PL/pgSQL pseudocode to give you the idea:

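create or replace function load_csvs()
returns void
language plpgsql
as $$
declare
    files_record record;
    -- cursor over the files that have not been loaded yet
    curs cursor for
        select id, path
        from files where loaded = false;
begin
    open curs;
    loop
        fetch curs into files_record;
        exit when not found;
        -- a nested block with an exception clause behaves like a savepoint:
        -- if anything inside fails, everything done inside is rolled back
        begin
            -- tablename and <list of columns> are placeholders for your target table
            execute format('COPY tablename (<list of columns>)
                            FROM %L
                            DELIMITER '',''
                            CSV HEADER', files_record.path);
            update files
            set loaded = true
            where id = files_record.id;
        exception when others then
            -- the COPY and the UPDATE for this file are undone; move on to the next file
            raise notice 'could not load %: %', files_record.path, sqlerrm;
        end;
    end loop;
    close curs;
end;
$$;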

Explanation: a cursor goes through the files table and fetches the rows that have not been imported yet. Inside the cursor loop, the COPY command is built with dynamic SQL, each time using the file path fetched from the table. If the import succeeds, the record in the files table is flagged as loaded. If it fails, the exception handler catches the error and the changes made for that file are rolled back (you can change this to whatever logic you want). The loop then moves on to the next file until no records are left.
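
A minimal usage sketch, assuming the function has been created under the name load_csvs() as above. Note that COPY ... FROM with a file name reads the file on the database server, so the paths must be readable by the server process, and the calling role needs to be a superuser or a member of pg_read_server_files.

```sql
-- Run the loader once; each file is attempted independently.
SELECT load_csvs();

-- Files still flagged as not loaded either failed to import
-- or were added to the table after the run.
SELECT id, path
FROM files
WHERE NOT loaded
ORDER BY id;
```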

2 Comments

Thank you for your answer. I have some follow-up questions: 1) I have been reading about cursors, and it seems they are not very efficient but are sometimes necessary. Is this one of those times, or could this be accomplished without a cursor? 2) Is the execute format ... statement atomic, i.e. will either all of the rows in the csv be loaded or none? 3) Is there an end statement missing after `where id = files_record.id`?
1) This is exactly one of the use cases for a cursor. 2) If the COPY command fails midway, some rows may already have been written, so roll back in the exception handler instead of doing nothing (see the updated answer). 3) It might; this is pseudocode, so it may contain some syntax errors.
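
On the first follow-up question: PL/pgSQL's FOR ... IN SELECT loop iterates over a query result without declaring a cursor explicitly (it uses one internally). The sketch below assumes the same files table. The name load_csvs_for is hypothetical, and tablename and <list of columns> remain placeholders that must be filled in before this will run.

```sql
CREATE OR REPLACE FUNCTION load_csvs_for()  -- hypothetical name
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    files_record record;
BEGIN
    FOR files_record IN
        SELECT id, path FROM files WHERE NOT loaded ORDER BY id
    LOOP
        -- The nested block acts as a savepoint: an error inside it
        -- undoes both the COPY and the UPDATE for this file only.
        BEGIN
            EXECUTE format(
                'COPY tablename (<list of columns>) FROM %L WITH (FORMAT csv, HEADER true, DELIMITER '','')',
                files_record.path);
            UPDATE files SET loaded = true WHERE id = files_record.id;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'skipping %: %', files_record.path, SQLERRM;
        END;
    END LOOP;
END;
$$;
```

Because all changes made inside a block are rolled back when its exception clause catches an error, a file that fails partway leaves no rows behind in the target table and its loaded flag stays FALSE.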

You can use the code below, assuming that the files load successfully and that the path column in the table holds the location of each loaded file.

UPDATE files
SET loaded = 'TRUE'  
WHERE LENGTH(path) > 1

1 Comment

I think your answer assumes that a path is only entered into the files table once the csv it points to has been loaded. Actually, the path is already in the files table to begin with. (I have a different program that adds csv paths to the table at an earlier point, and those files now need to be loaded.)
