11

I'm using SQLite, and I need to load hundreds of CSV files into one table. I didn't manage to find a way to do this on the web. Is it possible?

Please note that I started out with Oracle, but since Oracle has a limit of 1000 columns per table and my CSV files have more than 1500 columns each, I had to find another solution. I want to try SQLite, since I can install it quickly and easily. The CSV files were supplied with that many columns and I can't change or split them (never mind why).

Please advise.

  • Are all the CSVs going into the same table? If so, you can do cat *.csv > big.csv and just load big.csv. Commented Jan 17, 2015 at 22:05
  • Yes. Some of the files are bigger than 1GB. Merging so many huge files into one file will create an enormous file. I'm afraid it would be problematic somehow... Commented Jan 17, 2015 at 22:28
  • If your system can't handle a multi-GB CSV file, it is going to have trouble with a multi-GB database. Commented Jan 18, 2015 at 11:40
  • It is unclear, to me at least, whether your problem is that you don't know how to load a single CSV file into SQLite at all, or if the problem is that you don't know how to handle hundreds of files. Commented Jan 18, 2015 at 11:41
  • The problem is that I don't know how to handle hundreds of files. Commented Jan 18, 2015 at 15:20

4 Answers

16

I ran into a similar problem, and the comments on your question actually gave me the answer that finally worked for me.

Step 1: Merge the many CSVs into a single file. Keep the header row from just one of them at the top and exclude the headers of all the others.

Step 2: Load the single merged csv into SQLite.

For step 1 I used:

$ head -1 one.csv > all_combined.csv
$ tail -n +2 -q *.csv >> all_combined.csv

The first command writes only the first line of one CSV file (any file will do, since they all share the same header); the second command appends every file starting from line 2, thereby excluding the headers. The -q option makes sure that tail never writes the file name as a header.

Make sure to put all_combined.csv in a separate folder; otherwise the *.csv glob also matches it, and tail ends up reading from the very file it is appending to, so the output grows until the disk fills. A sketch of a safe layout follows.
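
For example, writing the combined file to a separate directory keeps it out of the glob's reach (a minimal sketch; the directory name out/ is just an example):

$ mkdir -p out
$ head -1 one.csv > out/all_combined.csv
$ tail -n +2 -q *.csv >> out/all_combined.csv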

To load into SQLite (Step 2) the answer given by Hot Licks worked for me:

 sqlite> .mode csv
 sqlite> .import all_combined.csv my_new_table

This assumes that my_new_table hasn't been created yet, so .import creates it and treats the first row as column names. Alternatively, you can create the table beforehand and then load, but in that case exclude the header in Step 1 (or skip it at import time, as sketched below).
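
For the create-beforehand variant, newer versions of the sqlite3 shell (3.32.0 and later) can skip the header row at import time with --skip, so Step 1 doesn't have to strip it. A rough sketch, with a hypothetical two-column schema standing in for your real 1500+ columns:

 sqlite> CREATE TABLE my_new_table (col_a TEXT, col_b TEXT); -- your real columns here
 sqlite> .mode csv
 sqlite> .import --skip 1 all_combined.csv my_new_table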


2 Comments

I'm not sure exactly why, but step 1 here made all_combined.csv grow recursively on my Ubuntu 20.04 until I ran out of disk space. Using a different extension, or storing it in a different folder, solves the issue.
@FonsMA That happens because the second command also reads all_combined.csv, the very file it is writing to. The best fix is to give the files you want to concatenate a distinguishing prefix (tail -n +2 -q some_*.csv >> all_combined.csv) or to write the output under a temporary name such as all_combined.commasv until done.
4

I didn't find a nicer way to solve this, so I used find along with xargs to avoid creating a huge intermediate .csv file:

find . -type f -name '*.csv' | xargs -I% sqlite3 database.db ".mode csv" ".import % new_table" ".exit"

find prints out the file names, and the -I% parameter to xargs causes the command after it to be run once for each line, with % replaced by the name of a CSV file.
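
One caveat: after the first import has created new_table, the header row of every subsequent file is inserted as ordinary data. If your sqlite3 is 3.32.0 or newer, a loop that imports the first file as-is (letting its header become the column names) and skips the header for the rest avoids that. A sketch, using a plain shell glob (non-recursive, unlike find) and surviving file names that contain spaces:

first=1
for f in ./*.csv; do
    if [ "$first" = 1 ]; then
        # first file: the table is created and its header row becomes the column names
        sqlite3 database.db ".mode csv" ".import \"$f\" new_table"
        first=0
    else
        # remaining files: the table already exists, so skip each header row
        sqlite3 database.db ".mode csv" ".import --skip 1 \"$f\" new_table"
    fi
done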


2

From http://www.sqlite.org/cli.html:

Use the ".import" command to import CSV (comma separated value) data into an SQLite table. The ".import" command takes two arguments which are the name of the disk file from which CSV data is to be read and the name of the SQLite table into which the CSV data is to be inserted.

Note that it is important to set the "mode" to "csv" before running the ".import" command. This is necessary to prevent the command-line shell from trying to interpret the input file text as some other format.

sqlite> .mode csv
sqlite> .import C:/work/somedata.csv tab1

There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.

In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.

For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.


Note that you need to make sure that the files DO NOT have an initial line defining the field names. And for "hundreds" of files you will probably want to prepare a script rather than typing each import command individually; one possible sketch follows.
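
For example, you could generate the .import commands into a file and feed that to the sqlite3 shell (a sketch; it assumes the header lines have already been stripped from the files, per the note above, and that the file names contain no spaces):

$ { echo ".mode csv"; for f in *.csv; do echo ".import $f tab1"; done; } > import.sql
$ sqlite3 database.db < import.sql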

1 Comment

As can be read at the link in your answer, in the section named "Importing CSV files" (section 7.5 at the time of writing; the numbering may have changed since): if the CSV file contains an initial row of column labels, you can cause the .import command to skip that initial row using the "--skip 1" option.
0

You can use DB Browser for SQLite to do this pretty easily. File > Import > Table from CSV file... and then select all the files to open them together into a single table.

I just tested this out with a dozen CSV files and got a single 1 GB table from them without any work. As long as they have the same schema, DB Browser is able to put them together. You'll want to keep the 'Column Names in first line' option checked.

