I have an R function that removes all HTML markup from an HTML page. It works when I run it in R, but when I run it through Rserve it produces this error:

Exception in thread "main" org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: syntax error

at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234)
at CereScope_Data.main(CereScope_Data.java:80)

The Java eval call where I get the error:

REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");

My R function (rawdata is an HTML page):

RemoveHtml <- function(rawdata) {
  
  library("tm")
  
  ## Converting data to UTF-8 format
  ## Creating Corpus
  Encoding(rawdata) <- "latin1"
  docs <- Corpus(VectorSource(iconv(rawdata, from = "latin1", to = "UTF-8", sub = "")))
  
  toSpace <- content_transformer(function(x , pattern) gsub(pattern, " ", x))
  
  docs <- gsub("[^\\b]*(<style).*?(</style>)", " ", docs)
  docs <- Corpus(VectorSource(gsub("[^\\b]*(<script).*?(</script>)", " ", docs)))
  docs <- tm_map(docs, toSpace, "<.*?>")
  docs <- tm_map(docs, toSpace, "(//).*?[^\n]*")
  docs <- tm_map(docs, toSpace, "/")
  docs <- tm_map(docs, toSpace, "\\\\t")
  docs <- tm_map(docs, toSpace, "\\\\n")
  docs <- tm_map(docs, toSpace, "\\\\")
  docs <- tm_map(docs, toSpace, "@")
  docs <- tm_map(docs, toSpace, "\\|")
  
  docs <- tm_map(docs, toSpace, "\\\"")
  docs <- tm_map(docs, toSpace, ",")
  RemoveHtmlDocs <- tm_map(docs, stripWhitespace)
  
  return(as.character(RemoveHtmlDocs)[1])
}

Update - Things I have already tried

  1. Escaping characters that might cause problems, such as single quotes, double quotes, and backslashes.
  2. Assigning the whole data to an R variable through eval and then running the function on that variable.

New update - Question solved

  1. Unescaped characters (single quotes, double quotes, and backslashes) were causing the problem.
  2. A line that was no longer necessary was also causing the problem, because I had not commented it out or removed it.

Thanks, all! :) Check my answer for a description. :)

2 Answers

The error lies in:

REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");

In Java, \ is an escape character, so a double quote that is meant to be part of the R expression has to be written as \" inside the Java string literal.

Solution: build the expression with escaped double quotes around lstrRawData, then pass it to eval:

String exp = "RemoveHtml(\"" + lstrRawData + "\")";
REXP lstrRemoveHtml = cobjConn.eval(exp);

4 Comments

Thanks but that didn't work. My java code works with other R functions correctly. I think the error is because of multiple single quotes occurring in the data.
@RugvedModak just adding '\' will solve the issue, see updated answer
Thanks but, I know about regex and that didn't work either.
Thanks for your quick replies.

The characters that needed escaping were the issue. To solve it, I escaped the backslashes and the quotes. I wrote this method to make that simpler:

public static String Regexer(String Data) {
    String RegexedData = Data.replaceAll("\\\\", "\\\\\\\\").replaceAll("'", "\\\\'").replaceAll("\"", "\\\\\"");
    return (RegexedData);
}

The function above escapes the backslashes and quotes a second time, so that they are still escaped when the string reaches the R parser.
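To make the effect of the escaping easier to see, here is a minimal, self-contained sketch of the same idea. It uses the literal String.replace instead of regex-based replaceAll, so each backslash only has to be doubled once in the Java source. The class name RStringEscaper and the methods escapeForR and buildCall are hypothetical names chosen for this illustration. They are not part of Rserve, and the sketch only builds and prints the expression strings without connecting to a server.

```java
final class RStringEscaper {

    // Hypothetical helper: prepares text for use inside a single-quoted R string literal.
    // Backslashes must be escaped first; otherwise the backslashes added for the
    // quotes would themselves be doubled.
    public static String escapeForR(String data) {
        return data
                .replace("\\", "\\\\")   // \  becomes \\
                .replace("'", "\\'")     // '  becomes \'
                .replace("\"", "\\\"");  // "  becomes \"
    }

    // Hypothetical helper: builds an R call such as RemoveHtml('...') as a string.
    public static String buildCall(String functionName, String argument) {
        return functionName + "('" + escapeForR(argument) + "')";
    }

    public static void main(String[] args) {
        // Sample input containing a double quote, a single quote, and a backslash.
        String raw = "<p class=\"x\">It's a C:\\temp path</p>";

        // Unescaped: the apostrophe ends the R string early, which the R parser rejects.
        System.out.println("RemoveHtml('" + raw + "')");

        // Escaped: the whole page stays inside one R string literal.
        System.out.println(buildCall("RemoveHtml", raw));
    }
}
```

The string returned by buildCall is what would be passed to cobjConn.eval in place of the hand-concatenated expression from the question.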

Tip: don't forget to convert the returned REXP to a Java variable. :)
