Adding a column while loading CSV files in SQL

Question

I have an existing table (NameList) into which I would like to load the contents of multiple CSV files (fileA.csv, fileB.csv, ...). The columns of the table are identical to those of the CSV files, except that for each row I also want to record the id of the CSV file it came from. The id would be taken from another table that contains the properties of each file.

The table with the list of files would look like this:

CREATE TABLE files
(
    id serial,
    fileName varchar(128),
    path varchar(256),
    PRIMARY KEY (id)
);

The table to insert the CSV contents into would look like this:

CREATE TABLE NameList
(
    FirstName varchar(40),
    LastName varchar(40),
    SourceFile_ID int,
    FOREIGN KEY (SourceFile_ID) REFERENCES files(id)
);

The CSV files would look as follows:

Name of file:
fileA.csv

Contents:
FirstName,LastName
John,Smith
...

The only related question I have found so far is "Add extra column while importing csv data in table in SQL server table". However, its answers suggest using a default value on the additional column, which would not solve my problem, since I need a different value for each file I add.

Hi @Marko, how can I update only the rows that I just imported without touching the ones that are already in the table? – sev, commented Apr 22, 2021 at 12:26
You only need to update those records that have the default value... – Luuk, commented Apr 22, 2021 at 12:27
@Luuk Interesting, but would this be considered good practice? Wouldn't it create a problem if, for instance, two CSVs were added at the same time? Both would have the default value, so updating based on the default value could assign rows to the wrong file. It would also involve an extra lookup on every insert, which would probably not be efficient on this table, as it is very large. – sev, commented Apr 22, 2021 at 12:33
@sev you could insert the data into a temporary table (postgresqltutorial.com/postgresql-temporary-table), update the column, then move the data to the main table. This would avoid problems with two CSVs being loaded at once, because they would be using different temp tables (as long as two different db sessions are used for the inserts). Even if only one connection is used, you could give the temp table a different name for each CSV. – Marko, commented Apr 22, 2021 at 12:48
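Luuk's suggestion (insert with the default, then update the rows that still have it) can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (INTEGER PRIMARY KEY instead of serial) and in-memory CSV text instead of real files; `load_then_backfill`, `SCHEMA` and the path value are hypothetical. New rows are inserted with SourceFile_ID left NULL, playing the role of the default, and one UPDATE then fills in the file's id. As sev points out, that UPDATE cannot tell whose NULL rows it is touching, so it relies on no other load leaving NULL rows behind in the meantime, and it must locate those rows in what may be a very large table.

```python
import csv
import io
import sqlite3

# SQLite stand-in for the question's schema (assumption: not PostgreSQL).
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,                 -- plays the role of PostgreSQL serial
    fileName VARCHAR(128),
    path VARCHAR(256)
);
CREATE TABLE NameList (
    FirstName VARCHAR(40),
    LastName VARCHAR(40),
    SourceFile_ID INT REFERENCES files(id)  -- left NULL on insert, acting as the "default"
);
"""


def load_then_backfill(conn: sqlite3.Connection, file_name: str, csv_text: str) -> int:
    """Hypothetical helper: insert CSV rows without a file id, then UPDATE the NULL rows."""
    # Register the file and read back its generated id.
    cur = conn.execute(
        "INSERT INTO files (fileName, path) VALUES (?, ?)",
        (file_name, "/hypothetical/path"),
    )
    file_id = cur.lastrowid
    # Load the CSV; SourceFile_ID is not supplied, so it stays NULL.
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO NameList (FirstName, LastName) VALUES (:FirstName, :LastName)",
        rows,
    )
    # Backfill: every row that is still NULL is assumed to belong to this file.
    conn.execute(
        "UPDATE NameList SET SourceFile_ID = ? WHERE SourceFile_ID IS NULL",
        (file_id,),
    )
    conn.commit()
    return file_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_then_backfill(conn, "fileA.csv", "FirstName,LastName\nJohn,Smith\n")
    print(conn.execute("SELECT * FROM NameList").fetchall())  # [('John', 'Smith', 1)]
```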

Marko · Accepted Answer · 2021-04-22 13:57:23Z

You could insert the data into a temporary table (https://www.postgresqltutorial.com/postgresql-temporary-table), set the SourceFile_ID column to the file's id there, and then move the data to the main table.

This avoids problems with two CSVs being loaded at once, because they would be using different temp tables (as long as two different database sessions are used for the inserts, since a temporary table is visible only to the session that created it). Even if only one session is used, you could use a different temp table name for each CSV.
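A minimal sketch of these steps (load into a temporary table, set the id, move the rows), assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL and in-memory CSV text instead of real files. `load_via_staging`, `SCHEMA` and the `staging_<id>` naming scheme are hypothetical; the per-file name illustrates the suggestion for loading several CSVs over a single session. In PostgreSQL the temporary table would more typically be filled with COPY or psql's `\copy` than with row-by-row inserts.

```python
import csv
import io
import sqlite3

# SQLite stand-in for the question's schema (assumption: not PostgreSQL).
SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, fileName VARCHAR(128), path VARCHAR(256));
CREATE TABLE NameList (
    FirstName VARCHAR(40),
    LastName VARCHAR(40),
    SourceFile_ID INT REFERENCES files(id)
);
"""


def load_via_staging(conn: sqlite3.Connection, file_name: str, path: str, csv_text: str) -> int:
    """Hypothetical helper: load one CSV through its own temporary table."""
    # Register the file first so its id is known.
    cur = conn.execute("INSERT INTO files (fileName, path) VALUES (?, ?)", (file_name, path))
    file_id = cur.lastrowid
    # One temp table per file; file_id is a database-generated integer,
    # so interpolating it into the table name is safe here.
    staging = f"staging_{file_id}"
    conn.execute(
        f"CREATE TEMP TABLE {staging} "
        "(FirstName VARCHAR(40), LastName VARCHAR(40), SourceFile_ID INT)"
    )
    # Step 1: load the CSV rows into the temp table.
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        f"INSERT INTO {staging} (FirstName, LastName) VALUES (:FirstName, :LastName)",
        rows,
    )
    # Step 2: set the id; the temp table holds only this file's rows, so no WHERE is needed.
    conn.execute(f"UPDATE {staging} SET SourceFile_ID = ?", (file_id,))
    # Step 3: move the rows to the main table and discard the temp table.
    conn.execute(
        "INSERT INTO NameList (FirstName, LastName, SourceFile_ID) "
        f"SELECT FirstName, LastName, SourceFile_ID FROM {staging}"
    )
    conn.execute(f"DROP TABLE {staging}")
    conn.commit()
    return file_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_via_staging(conn, "fileA.csv", "/data", "FirstName,LastName\nJohn,Smith\n")
    print(conn.execute("SELECT * FROM NameList").fetchall())  # [('John', 'Smith', 1)]
```

The UPDATE could also be folded into the final INSERT ... SELECT by selecting the file id as a constant column; the separate step is kept here only to mirror the answer.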
