1

I have a created a table "SQL_table" in SQL and copied contents from various existing .csv files to "SQL_table" in R using the code given below.

Code:

library(RPostgres)
library(DBI)

#Step 2: make a dataframe of all files
file_names1 <- dir("D:/Data/", full.names = TRUE, recursive = T) #where you have your files
my_data_frame <- do.call(rbind,lapply(file_names1,read.csv))

#Step 3: Establish R & PostgreSQL Connection using RPostgres
dsn_database = "...."   # Specify the name of your Database
dsn_hostname = "localhost"  # Specify host name
dsn_port = "...."                # Specify your port number. e.g. 98939
dsn_uid = "...."         # Specify your username. e.g. "admin"
dsn_pwd = "...."         # Specify your password. e.g. "xxx"

tryCatch({
  drv <- dbDriver("Postgres")
  print("Connecting to Database…")
  connec <- dbConnect(drv, 
                      dbname = dsn_database,
                      host = dsn_hostname, 
                      port = dsn_port,
                      user = dsn_uid, 
                      password = dsn_pwd)
  print("Database Connected!")
},
error=function(cond) {
  print("Unable to connect to Database.")
})

#Step 4: Run using RPostgres
dbWriteTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbAppendTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbReadTable(connec, "SQL_table")

The structure of my SQL_table is as follows:

Y_m_d_Time  Location_ID  Popul  Literacy  Mortality  
2001-01-01 00:00:00  NAM  5000  77  8.8
2002-01-01 00:00:00  NAM  8700  58  7.7
2003-01-01 00:00:00  NAM  3410  98  9.8
..
..
1990-01-01 00:00:00  KEN  2000  87  9.8
1991-01-01 00:00:00  KEN  3200  88  9.7
1991-01-01 00:00:00  KEN  4910  78  8.8
..
..
2010-01-01 00:00:00  PUB  6000  97  9.8
2011-01-01 00:00:00  PUB  4200  89  9.7
2012-01-01 00:00:00  PUB  5910  88  8.8
..
..
..

I have another .csv file "CSV_1" and I want to copy few columns (Latitude, Longitude, Altitude, Start_Date, End_Date) from "CSV_1" to "SQL_table" using R. The common column between both the tables is Location_ID in "SQL_table" and City in "CSV_1".

The structure of CSV file is as follows:

City  Latitude  Longitude  Altitude  Start_Date  End_Date  No. of Events  Event_Type
NAM  35  79  218  3/1/2001    10  Flood
KEN  30  81  129  2/1/1990  5/31/1999  5  Earthquake
PUB  22  76  220  1/1/2010    11  Landslide
..
..

My desired output is as follows:

Y_m_d_Time  Location_ID  Popul  Literacy  Mortality  Latitude  Longitude  Altitude  Start_Date  End_Date
2001-01-01 00:00:00  NAM  5000  77  8.8  35  79  218  3/1/2001
2002-01-01 00:00:00  NAM  8700  58  7.7  35  79  218  3/1/2001
2003-01-01 00:00:00  NAM  3410  98  9.8  35  79  218  3/1/2001
..
..
1990-01-01 00:00:00  KEN  2000  87  9.8  30  81  129  2/1/1990  5/31/1999
1991-01-01 00:00:00  KEN  3200  88  9.7  30  81  129  2/1/1990  5/31/1999
1991-01-01 00:00:00  KEN  4910  78  8.8  30  81  129  2/1/1990  5/31/1999
..
..
2010-01-01 00:00:00  PUB  6000  97  9.8  22  76  220  1/1/2010
2011-01-01 00:00:00  PUB  4200  89  9.7  22  76  220  1/1/2010
2012-01-01 00:00:00  PUB  5910  88  8.8  22  76  220  1/1/2010
..
..
..

Could anyone please help me how I can extend my code in R to obtain the desired results.

1

1 Answer 1

1

Suppose your variable Location_ID is origin from the flights table and City is faa from the airports table. If you agreed to use {dplyr}, you could simply do a left_join().

con <- DBI::dbConnect(duckdb::duckdb())

dbplyr::copy_nycflights13(con)
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather

flights <- dplyr::tbl(con, "flights")
airports <- dplyr::tbl(con, "airports")

flights |>
  dplyr::left_join(airports, by = c("origin" = "faa"))
#> # Source:   SQL [?? x 26]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.1/:memory:]
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#>  1  2013     2    14     1045           1056       -11     1204           1230
#>  2  2013     2    14     1048           1049        -1     1412           1400
#>  3  2013     2    14     1048           1040         8     1333           1337
#>  4  2013     2    14     1051           1100        -9     1424           1424
#>  5  2013     2    14     1051           1100        -9     1203           1214
#>  6  2013     2    14     1057           1100        -3     1408           1420
#>  7  2013     2    14     1057           1100        -3     1244           1300
#>  8  2013     2    14     1057           1059        -2     1353           1441
#>  9  2013     2    14     1058           1056         2     1211           1220
#> 10  2013     2    14     1058           1100        -2     1337           1338
#> # ℹ more rows
#> # ℹ 18 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>, name <chr>, lat <dbl>,
#> #   lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>

DBI::dbDisconnect(con, shutdown = TRUE)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.