I have this sample R scraper script (I can't use the actual website):
#!/usr/bin/Rscript
library(RCurl)
library(httr)
library(rvest)
library(lubridate)
library(stringi)
new_files <- Map(function(ln, y, bn) {
  # Open a session for each link (SSL verification turned off)
  fun1 <- html_session(
    URLencode(paste0("https://example.com", ln)),
    config(ssl_verifypeer = FALSE)
  )
  # Only write the file to disk if its date is today
  if (y == Sys.Date()) {
    writeBin(fun1$response$content, bn)
  } else {
    "He's dead, Jim"
  }
  return(fun1$response$content)
}, links, dates, names)
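The links, dates, and names objects are created elsewhere in the real script. To give an idea of the shapes the Map() call expects, here are made-up placeholder values (not the real data):

# Placeholder inputs, only to show the shapes the Map() call expects
links <- c("/reports/a.csv", "/reports/b.csv")          # paths appended to the base URL
dates <- as.Date(c("2020-01-01", format(Sys.Date())))   # one old date, one for today
names <- c("a.csv", "b.csv")                            # local file names passed to writeBin()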
I'm running this script in a Docker container, through Apache NiFi (the ExecuteProcess processor). But when I set it to run, I keep getting this error:
Process execution failed due to java.io.IOException: Stream closed: java.io.IOException: Stream closed
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.nifi.processors.standard.ExecuteProcess$4.call(ExecuteProcess.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I was reading this answer about streams being closed before they should be, but I have no idea why this would throw a "Stream closed" exception when the script works fine on my local computer / in RStudio.
It fails as soon as it's executed in the Docker container. Is it something to do with the if/else statement inside the Map() call, or with loading the lubridate package? I have no clue. For reference, this is how I start the NiFi container:
docker run -p 8080:8080 -d nifi-container-name
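In case it helps narrow things down, here is a stripped-down version of just the if/else inside Map(), with made-up values and no network calls, that I can run by hand inside the container:

# Dummy data only -- no scraping, just the branching logic from the real script
test_dates <- c(Sys.Date() - 30, Sys.Date())
test_names <- c("old.bin", "today.bin")
out <- Map(function(y, bn) {
  if (y == Sys.Date()) {
    paste("would call writeBin for", bn)
  } else {
    "He's dead, Jim"
  }
}, test_dates, test_names)
print(out)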