I need to do principal component regression (PCR) with cross-validation, and I could not find a Python package that does it. I wrote my own PCR class, but when tested against R's pls package it performs significantly worse and is much slower on high-dimensional data (~50000 features); I am still not sure why, but that is another question. Because all of my other code is in Python, and in the interest of saving time, I decided the best approach might be to write an R function that uses R's pls package. Here is the function:

R_pls <-function(X_train,y_train,X_test){
  library(pls)
  X<-as.matrix(X_train)
  y<-as.matrix(y_train)
  tdata<-data.frame(y,X=I(X))
  REGmodel <- pcr(y~X,scale=FALSE,data=tdata,validation="CV")
  B<-RMSEP(REGmodel)
  C<-B[[1]]
  q<-length(C)
  degs<-c(1:q)
  allvals<-C[degs%%2==0]
  allvals<-allvals[-1]
  comps<-which.min(allvals)
  xt<-as.matrix(X_test)
  ndata<-data.frame(X=I(xt))
  ypred_test<-as.data.frame(predict(REGmodel,ncomp=comps,newdata=ndata,se.fit=TRUE))
  ntdata<-data.frame(X=I(X))
  ypred_train<-as.data.frame(predict(REGmodel,ncomp=comps,newdata=ntdata,se.fit=TRUE))
  data_out=list(ypred_test=ypred_test,ypred_train=ypred_train)
  return(data_out)
}

I have found a good amount of information on how to access R's built-in functions from Python, but nothing that covers calling a user-defined function like this one. So I tried the following:

import rpy2.robjects as ro
prs=ro('R_pls')

where R_pls is the R function above. This produces

TypeError: 'module' object is not callable.

Any idea how I might get this to work? I am also open to suggestions if there is a better method.

Thanks

  • I'm fairly certain that there is partial least squares regression and PCA decomposition in sklearn. Have you tried looking there to see if it has what you need? I realize it's not a direct answer to the title, but it might help. Commented Apr 17, 2017 at 15:58
  • I did use sklearn.decomposition's PCA along with sklearn's LinearRegression to build a PCR class; however, it does not perform as well as R's pls, and I am not sure why. Commented Apr 17, 2017 at 16:09
  • While it doesn't do pcr by itself, you could always calculate the components and then do an lm/glm model with h2o, which has both python and R interfaces. docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html Commented Apr 17, 2017 at 16:47

1 Answer

Consider importing the arbitrary user-defined R function as a package with rpy2's SignatureTranslatedAnonymousPackage (STAP):

# numpy2ri() is the converter name in rpy2 2.x; pandas2ri is a separate module and is not needed here
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import STAP
# for rpy2 < 2.6.1
# from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP    

r_fct_string = """    
R_pls <- function(X_train, y_train, X_test){
  library(pls)

  X <- as.matrix(X_train)
  y <- as.matrix(y_train)
  xt <- as.matrix(X_test)

  tdata <- data.frame(y,X=I(X))
  REGmodel <- pls::pcr(y~X,scale=FALSE,data=tdata,validation="CV")
  B <- RMSEP(REGmodel)
  C <- B[[1]]
  q <- length(C)
  degs <- c(1:q)
  allvals <- C[degs%%2==0]
  allvals <- allvals[-1]
  comps <- which.min(allvals)
  ndata <- data.frame(X=I(xt))

  ypred_test <- as.data.frame(predict(REGmodel,ncomp=comps,newdata=ndata,se.fit=TRUE))
  ntdata <- data.frame(X=I(X))
  ypred_train <- as.data.frame(predict(REGmodel,ncomp=comps,newdata=ntdata,se.fit=TRUE))
  data_out <- list(ypred_test=ypred_test, ypred_train=ypred_train)

  return(data_out)
}
"""

r_pkg = STAP(r_fct_string, "r_pkg")

# CONVERT PYTHON NUMPY MATRICES TO R OBJECTS
r_X_train, r_y_train, r_X_test = map(numpy2ri, [py_X_train, py_y_train, py_X_test])

# PASS R OBJECTS INTO FUNCTION (WILL NEED TO EXTRACT DFs FROM RESULT)
p_res = r_pkg.R_pls(r_X_train, r_y_train, r_X_test)
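
For context on the error in the question: rpy2.robjects is a module, so ro('R_pls') tries to call the module object itself, which Python rejects. R code is evaluated through the module's r attribute instead, as in ro.r(...). The sketch below reproduces that behaviour with a hypothetical stand-in module (fake_ro, built with the standard library) so that it runs without R or rpy2 installed; it only illustrates the Python-side mistake and is not rpy2 itself.

```python
import types

# Hypothetical stand-in for the rpy2.robjects module, so the sketch runs without R.
fake_ro = types.ModuleType("fake_robjects")
# Plays the role of ro.r: a callable attribute that lives on the module.
fake_ro.r = lambda code: "evaluated: " + code


def call_module_directly():
    """Mimic ro('R_pls'): call the module object itself and capture the error."""
    try:
        fake_ro("R_pls")
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_module_directly()   # a module object cannot be called
result = fake_ro.r("R_pls")              # calling the attribute works

print(error_message)
print(result)
```

With real rpy2 the same distinction applies: the module is only a namespace, and the callable entry points are attributes such as ro.r and ro.globalenv.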

Alternatively, if the function is saved in a separate .R script, you can source it, as @agstudy shows here, and then call it like any Python function.

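import rpy2.robjects as ro
from rpy2.robjects.numpy2ri import numpy2ri

# LOAD THE SCRIPT SO R_pls IS DEFINED IN R'S GLOBAL ENVIRONMENT
ro.r('''source('my_R_pls_func.r')''')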

r_pls = ro.globalenv['R_pls']

# CONVERT PYTHON NUMPY MATRICES TO R OBJECTS
r_X_train, r_y_train, r_X_test = map(numpy2ri, [py_X_train, py_y_train, py_X_test])

# PASS R OBJECTS INTO FUNCTION (WILL NEED TO EXTRACT DFs FROM RESULT)
p_res = r_pls(r_X_train, r_y_train, r_X_test)
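
As a side note on what R_pls computes before predicting, the index arithmetic that picks the number of components can be mirrored in plain Python. This is a minimal sketch under an assumption about the layout of the RMSEP values: that they flatten to alternating CV and adjusted-CV estimates, starting with the intercept-only (0 component) model. The function name select_ncomp and the numbers in the example are made up for illustration.

```python
def select_ncomp(rmsep_values):
    """Return the component count with the lowest adjusted-CV RMSEP.

    Assumes rmsep_values is laid out as
    [CV(0), adjCV(0), CV(1), adjCV(1), ...], where 0 components is the
    intercept-only model (an assumption about the RMSEP output).
    """
    # R's C[degs %% 2 == 0] keeps the 1-based even positions,
    # which are the odd indices in 0-based Python.
    adj_cv = rmsep_values[1::2]
    # R's allvals[-1] drops the first element: the intercept-only model.
    per_component = adj_cv[1:]
    # which.min is 1-based, so the position equals the number of components.
    return per_component.index(min(per_component)) + 1


# Made-up RMSEP values for 0..3 components, for illustration only.
example = [2.0, 2.1, 1.5, 1.6, 0.9, 1.0, 1.2, 1.3]
best = select_ncomp(example)
print(best)
```

Reading the logic this way makes it easier to check that the ncomp passed to predict is the one intended, for instance when comparing against a hand-written PCR class.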