I have an existing table (NameList) into which I would like to load the contents of multiple CSV files (fileA.csv, fileB.csv, ...). Apart from one extra column, the table has the same columns as the CSV files; in that extra column I want to record, for each row, the id of the CSV file it came from. The id would be taken from another table which stores the properties of each file.

The table with the list of files would look like this:

CREATE TABLE files
(
    id serial,
    fileName varchar(128),
    path varchar(256),
    PRIMARY KEY (id)
);

The table to insert the CSV contents into would look like this:

CREATE TABLE NameList
(
    FirstName varchar(40),
    LastName varchar(40),
    SourceFile_ID int,
    FOREIGN KEY (SourceFile_ID) REFERENCES files(id)
);

The CSV files would look as follows:

Name of file:
fileA.csv

Contents:
FirstName,LastName
John,Smith
...

The only related question I could find so far is "Add extra column while importing csv data in table in SQL server table". However, the answers there suggest using a default value on the additional column, which would not solve my problem, since I need a different value for each file I add.

  • You can import the CSV and then update the id column. Commented Apr 22, 2021 at 12:25
  • Hi @Marko, how can I update only the rows that I just imported without touching the ones that are already in the table? Commented Apr 22, 2021 at 12:26
  • You only need to update those records that have the default value... Commented Apr 22, 2021 at 12:27
  • @Luuk Interesting, but would this be considered good practice? Wouldn't it create a problem if, for instance, two CSVs are added at the same time? Both would have the default value associated with them, so updating based on the default value would assign the wrong id. Also, this would involve an extra lookup on every insert, which would probably not be very efficient on this table, as it's very large. Commented Apr 22, 2021 at 12:33
  • @sev you could insert the data into a temporary table (postgresqltutorial.com/postgresql-temporary-table), update the column, then move the data to the main table. This would avoid problems with two CSVs being loaded at once, because they'd be using different temp tables (as long as two different db sessions are used for the inserts). Even if only one connection is used, you could have different names for the temp table for different CSVs. Commented Apr 22, 2021 at 12:48
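The import-then-update idea from the first comments can be sketched as follows. This is a minimal sketch, assuming the tables from the question exist, that fileA.csv is already registered in files under a unique fileName, and that /data/csv/fileA.csv (a placeholder path) is readable by the database server. Since SourceFile_ID has no declared default, the "default value" here is NULL.

```sql
-- Sketch: load straight into NameList, then tag the rows that are still untagged.
-- Assumptions: fileName is unique in files; '/data/csv/fileA.csv' is a placeholder
-- path on the database server (server-side COPY needs superuser or
-- membership in pg_read_server_files).
BEGIN;

-- Columns left out of the column list (SourceFile_ID) are filled with NULL.
COPY NameList (FirstName, LastName)
    FROM '/data/csv/fileA.csv' WITH (FORMAT csv, HEADER true);

-- Every row that is still NULL is assumed to come from this load, which only
-- holds if no other load into NameList runs at the same time.
UPDATE NameList
SET SourceFile_ID = (SELECT id FROM files WHERE fileName = 'fileA.csv')
WHERE SourceFile_ID IS NULL;

COMMIT;
```

As sev notes, nothing in the UPDATE distinguishes rows from different loads, and without an index covering SourceFile_ID it has to scan the whole NameList table to find the NULL rows.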

1 Answer

You could load the CSV into a temporary table (https://www.postgresqltutorial.com/postgresql-temporary-table), fill in the SourceFile_ID column there, and then move the rows to the main table.
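Those three steps for a single file might look like the sketch below. It assumes the tables from the question exist, registers fileA.csv in files within the same transaction, and reads from /data/csv/fileA.csv, a placeholder path on the database server; the staging table name namelist_staging is hypothetical.

```sql
-- Sketch of the staging-table approach for one file.
-- Assumptions: '/data/csv' is a placeholder directory on the database server;
-- namelist_staging is a hypothetical name.
BEGIN;

-- Register the file; its id is drawn from the sequence behind files.id (serial).
INSERT INTO files (fileName, path) VALUES ('fileA.csv', '/data/csv');

-- 1. Session-private staging table with the same columns as NameList.
--    ON COMMIT DROP removes it automatically when the transaction ends.
CREATE TEMP TABLE namelist_staging (LIKE NameList) ON COMMIT DROP;

-- Load the CSV, skipping the header row; SourceFile_ID stays NULL for now.
-- Server-side COPY needs superuser or pg_read_server_files; from psql,
-- \copy reads a client-side file instead.
COPY namelist_staging (FirstName, LastName)
    FROM '/data/csv/fileA.csv' WITH (FORMAT csv, HEADER true);

-- 2. Tag every staged row with the id just assigned to fileA.csv.
--    currval() returns this session's last value, so other sessions cannot interfere.
UPDATE namelist_staging
SET SourceFile_ID = currval(pg_get_serial_sequence('files', 'id'));

-- 3. Move the tagged rows into the main table.
INSERT INTO NameList (FirstName, LastName, SourceFile_ID)
SELECT FirstName, LastName, SourceFile_ID
FROM namelist_staging;

COMMIT;
```

The separate UPDATE mirrors the answer's steps; the id could equally be selected directly in the final INSERT ... SELECT, which saves one pass over the staging table.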

This avoids problems with two CSVs being loaded at once, because each load uses its own temporary table (as long as the two loads run in different database sessions, since a temporary table is visible only to the session that created it). Even if only one session is used, you could give the temp table a different name for each CSV.
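When a single session loads the files one after another, a single staging table can also be reused by emptying it before each file, as a variant of giving each CSV its own temp table name. This sketch assumes every file to load is already registered in files, that path holds the directory and fileName the bare file name (so the two can be joined with '/'), and that the role may use server-side COPY; namelist_staging is again a hypothetical name. Files that already have rows in NameList are skipped.

```sql
-- Sketch: load every registered file that has no rows in NameList yet,
-- reusing one staging table inside a single session.
-- Assumptions: path is a directory and fileName a bare file name, joined with '/';
-- the role may use server-side COPY.
DO $$
DECLARE
    f record;
BEGIN
    CREATE TEMP TABLE namelist_staging (FirstName varchar(40), LastName varchar(40));

    FOR f IN
        SELECT fl.id, fl.path, fl.fileName
        FROM files fl
        WHERE NOT EXISTS (SELECT 1 FROM NameList n WHERE n.SourceFile_ID = fl.id)
        ORDER BY fl.id
    LOOP
        -- Remove the previous file's rows before loading the next one.
        TRUNCATE namelist_staging;

        -- COPY cannot take the file name as a parameter, so build the statement
        -- with format(); %L quotes the path as a literal.
        EXECUTE format(
            'COPY namelist_staging (FirstName, LastName) FROM %L WITH (FORMAT csv, HEADER true)',
            f.path || '/' || f.fileName);

        -- Attach this file's id while moving the rows into NameList.
        INSERT INTO NameList (FirstName, LastName, SourceFile_ID)
        SELECT s.FirstName, s.LastName, f.id
        FROM namelist_staging s;
    END LOOP;

    DROP TABLE namelist_staging;
END
$$;
```

Because the whole DO block runs in one transaction, an error while loading any file rolls back all of the loads from that call.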
