
I have two very large csv files that contain the same variables. I want to combine them into one table inside an SQLite database, if possible using R.

I successfully managed to put both csv files into separate tables inside one database using inborutils::csv_to_sqlite, which imports small chunks of data at a time.

Is there a way to create a third table where both tables are simply appended, using R (keeping in mind the limited RAM)? And if not, how else can I perform this task? Maybe via the terminal?
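For illustration, the per-file import might look roughly like this (file and table names are placeholders, and I am assuming the usual csv_file, sqlite_file, table_name argument order of inborutils::csv_to_sqlite):

library(inborutils)

# placeholders: point these at the real csv files and the target database file
csv_to_sqlite("file1.csv", "combined.sqlite", "table1")
csv_to_sqlite("file2.csv", "combined.sqlite", "table2")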

1 Answer


We assume that when the question refers to the "same variables" it means that the two tables have the same column names. Below we create two such test tables, BOD and BOD2, and then combine them in the create statement, producing the table both. This does the combining entirely on the SQLite side. Finally, we look at both.

library(RSQLite)
con <- dbConnect(SQLite())  # modify to refer to existing SQLite database

dbWriteTable(con, "BOD", BOD)
dbWriteTable(con, "BOD2", 10 * BOD)

dbExecute(con, "create table both as select * from BOD union select * from BOD2")

dbReadTable(con, "both")

dbDisconnect(con)
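Note that UNION discards duplicate rows, which takes extra work; if the goal is a plain append that keeps every row from both tables, UNION ALL expresses that directly. A minimal variation, to be run while the connection is still open:

# keep all rows from both tables, including duplicates
dbExecute(con, "create table both as
                select * from BOD union all select * from BOD2")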

6 Comments

I can reproduce your example; however, when I run the code with my data it has already been running for ~3 hours now. Of course this depends on the structure and amount of data as well as on my notebook, but would you be able to take an educated guess how long this might take for two tables with around 20 million rows and 36 columns? Just so I know whether I should expect it to take days. (I know, I specifically asked for a solution in R, and if it takes a few hours I do not care, since I only have to fulfill that task once. However, would there be other (faster) options as well?)
As stated in the answer, this is done entirely on the SQLite side. The create statement does not pass the data through R at all.
Ah, of course, my bad. Still, regarding the expected time - putting the csv files inside the database chunk by chunk took around 1 hour per table/csv file. Since I do not know what is going on on the SQLite side when I append both tables, could you maybe explain? Is the creation of the appended table so much more time consuming than the creation of the tables in the database from the csv files?
You can try it with different numbers of records, timing each one, and then plot the time vs. number of records to see if you can determine the shape of the curve and extrapolate it. Since you have both as csv, you could concatenate the csv files and then read the result in. You could also try different databases.
I had to stop the process after ~10 hours, but I found another workaround: combining the two csv files via the terminal and then putting the result into the SQLite database via inborutils::csv_to_sqlite. I accept the answer though, since it works in the example you provided (and possibly also for my data if I gave it more time).
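For completeness, the same concatenation can also be done from R itself without loading either file into memory; a minimal sketch, with placeholder file names and assuming both files share an identical header row:

# combined.csv ends up with file1's header plus all data rows from both files
file.copy("file1.csv", "combined.csv", overwrite = TRUE)

inp <- file("file2.csv", open = "r")
out <- file("combined.csv", open = "a")
readLines(inp, n = 1)                  # skip file2's duplicate header row
repeat {
  chunk <- readLines(inp, n = 100000)  # read a bounded number of lines at a time
  if (length(chunk) == 0) break
  writeLines(chunk, out)
}
close(inp)
close(out)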