Push R code to PL/R code in postgresql database

Question

I am trying to convert a successful R code using Rpostgresql into a PL/R code in order to avoid pushing/pulling data in and out of the postgreql database.

Code is a dcast on a data.table:

#libs
library(RPostgreSQL);
library(data.table);

# connect
drv <- dbDriver("PostgreSQL");
con <- dbConnect(drv, dbname="postgres", user="postgres");

# load
cli_ranges <- dbGetQuery(con, "SELECT custid, prod_ranges, is_cli from cli_ranges;")

# DT 
setDT(cli_ranges )
setkeyv(cli_ranges , c("prod_ranges"))

# pivot
cli_ranges.pivoted <- dcast(cli_ranges, custid ~ paste0("is_cli_", prod_ranges), fun=sum, value.var = "is_cli")

# send back to DB
dbWriteTable(con, "cli_ranges_pivoted", cli_ranges.pivoted, row.names=F)

Code in R is working fine & fast.

I am now trying to push the code in a PL/R function,

CREATE OR REPLACE FUNCTION public.pivot()
  RETURNS void AS
$BODY$

[copy/paste R code]

$BODY$
  LANGUAGE plr;

... but the last line of the R code (dbWriteTable) throws :

ERROR:  R interpreter expression evaluation error
DETAIL:  Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'dbWriteTable' for signature '"logical", "character", "data.frame"'
CONTEXT:  In PL/R function pivot

Changing the data.table to a dataframe (as.data.frame(cli_ranges.pivoted)) is not working either.

One trick could be to return the data.table/frame in order to perform a CREATE TABLE cli_ranges_pivoted AS SELECT pivot(); but I don't really know how to push the data.frame as an output...

cli_ranges table :

custid  prod_ranges is_cli
1       A           1
1       B           1
1       C           0
2       A           1
2       B           0
2       C           1
3       A           0
3       B           1
3       C           0
4       A           1
...     ...         ...

After dcast (i.e. pivoting) datafram is as follow:

custid  prod_ranges_A   prod_ranges_B   prod_ranges_C
1       1               1               0
2       1               0               1
3       0               1               0
4       1               ...
...

Number of distinct values in prod_ranges changes often, so I can define in advance the number of columns after pivoting.

Env : Postgresql 9.5, R 3.3, PL/R 08.03.00.16, Win 10 64bits

Community · Accepted Answer · 2017-05-23 12:09:08Z

2

You could start with http://gpdb.docs.pivotal.io/4330/ref_guide/pl_r.html Example 2 and 3.

Or, You could try modifying the unnest function like Lukas Eklund and Erwind Brandstetter did in this post (I am using Lukas' solution): Unnest array by one level.

CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
  SELECT array_agg($1[series2.i][series2.x]) FROM
    (SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
     FROM 
     (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1 
    ) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;

and then you could try returning an array, and do something like this:

CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, std_dev float8)
 RETURNS float8[] 
AS $$
 x<-rnorm(n,mean,std_dev); 
 y<-rnorm(n,mean,std_dev);
 final<-cbind(as.data.frame(x), as.data.frame(y));
return(final) 

$$ LANGUAGE 'plr';

CREATE TABLE test_norm_var AS SELECT R_output[1] as col1, R_output[2] as col2 FROM unnest_multidim(r_norm(10,0,1)) R_output;

SELECT col1 FROM test_norm_var;

EDIT

I could not get dbWriteTable to work the way it was intended as a PL/R function... BUT, You could also try this method

CREATE OR REPLACE FUNCTION pivot() 
  RETURNS VOID as $$

library(RPostgreSQL);
library(data.table);

drv <- dbDriver("PostgreSQL");
con <- dbConnect(drv, dbname ="postgres");

fields <- list(custid = "numeric",prod_ranges = "varchar(128)", is_cli = "numeric")

custid <- c(1,1,1,2,2,2)
prod_ranges <- c("A","B","C","A","B","C")
is_cli <- c(1,1,0,1,0,1)

cli_ranges <- data.frame(custid,prod_ranges,is_cli, stringsAsFactors = default.stringsAsFactors())

setDT(cli_ranges )
setkeyv(cli_ranges , c("prod_ranges"))

cli_ranges.pivoted <- as.data.frame(dcast(cli_ranges, custid ~ paste0("is_cli_", prod_ranges), fun=sum, value.var = "is_cli"))

create_query <- paste0("CREATE TABLE cli_ranges (",paste0(colnames(cli_ranges.pivoted), collapse = " numeric, "),
" numeric) DISTRIBUTED BY (",colnames(cli_ranges)[1],")")

dbGetQuery(con, create_query);

values_string <- "("
for ( i in 1:dim(cli_ranges.pivoted)[1]){
 for ( j in 1:dim(cli_ranges.pivoted)[2] ){
    if ( j != dim(cli_ranges.pivoted)[2]) { 
        values_string <- paste0(values_string,cli_ranges.pivoted[i,j],",")
    } else { 
        values_string <- paste0(values_string,cli_ranges.pivoted[i,j])
    }
  } 
    if ( i != dim(cli_ranges.pivoted)[1] ){
        values_string <- paste0(values_string,"),(")
    } else {
        values_string <- paste0(values_string,")")
    }
}

insert_query <- paste0("INSERT INTO cli_ranges (",paste0(colnames(cli_ranges.pivoted), collapse = ", "),
") VALUES ", values_string )

dbGetQuery(con, insert_query);

$$ LANGUAGE plr;

edited May 23, 2017 at 12:09

CommunityBot

11 silver badge

answered Nov 9, 2016 at 20:24

DDrake

3181 silver badge9 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

ant1j Over a year ago

The problem is that the number of prod_ranges changes over time, i.e. I cannot define how much columns there will be resulting by the dcast (that transpose my table over the prod_ranges values)

DDrake Over a year ago

Could you provide an example of your input to dcast or output (fill in fake numbers, etc.) ?

ant1j Over a year ago

illustration provided in the original question

ant1j Over a year ago

DISTRIBUTED BY is more for Pivotal/Greenplum/Redshift than postgresql, but thanks I got the idea!

ant1j Over a year ago

Thanks for all the code, but I was actually looking for a more "direct" solution. I'd better use crosstab then.

Collectives™ on Stack Overflow

Push R code to PL/R code in postgresql database

1 Answer 1

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related