
I'm trying to get a JSON array from a data frame, where each element of the array is a subset of the data frame.

> x <- 1:5  
> y <-c('a','b','c','d','e')  
> z <-c(1,1,1,2,2)  
> df <-data.frame(x,y,z)  
> df  
    x y z  
  1 1 a 1  
  2 2 b 1  
  3 3 c 1  
  4 4 d 2  
  5 5 e 2  
> rjson::toJSON(df)  
[1] "{\"x\":[1,2,3,4,5],\"y\":[\"a\",\"b\",\"c\",\"d\",\"e\"],\"z\":[1,1,1,2,2]}"  
> library(rCharts)  # provides toJSONArray2
> df1 = toJSONArray2(na.omit(df), json = F, names = F)  
> rjson::toJSON(df1)  
[1] "[[1,\"a\",1],[2,\"b\",1],[3,\"c\",1],[4,\"d\",2],[5,\"e\",2]]"  
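The nesting in that last output follows from how rjson maps R objects: an unnamed list becomes a JSON array, a named list becomes a JSON object, and a length-1 vector becomes a scalar. A minimal check of those rules (the values are only illustrative):

```r
library(rjson)

# Unnamed list -> JSON array; each length-1 element -> scalar
row_array <- rjson::toJSON(list(1L, "a"))

# Named list -> JSON object keyed by the names
row_object <- rjson::toJSON(list(x = 1L, y = "a"))

# List of unnamed lists -> array of arrays (the shape of df1 above)
rows <- rjson::toJSON(list(list(1L, "a"), list(2L, "b")))

cat(row_array, row_object, rows, sep = "\n")
```

So any approach that produces an unnamed list of unnamed per-row lists will encode to nested arrays, whichever function built the lists.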

The output I require is:

[ [ [1,"a"],[2,"b"],[3,"c"] ],[ [4,"d"],[5,"e"] ] ]

With the method below I'm able to get the expected list of data frames, but not the required JSON output.

> library(foreach)  
> x <- foreach(i=1:2) %do% { subset(df,df$z==i)[c(1,2)]}  
> x  
 [[1]]   
   x y  
 1 1 a  
 2 2 b  
 3 3 c  

 [[2]]
   x y
 4 4 d
 5 5 e

Found a solution.

> x <- foreach(i=1:2) %do% {
   tmp <-subset(df,df$z==i)[c(1,2)]  
   toJSONArray2(na.omit(tmp), json = F, names = F)  
   }
> rjson::toJSON(x) 

I need an implementation without toJSONArray2, which is quite slow.



The toJSONArray2 function in rCharts is slow mainly due to the use of RJSONIO. I am in the process of updating it to a faster implementation using rjson. Here is what I have so far. I have borrowed the idea of the orient argument from pandas.

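# Convert a data frame to JSON (or, with json = FALSE, to the nested R list
# that would be encoded). The orient argument follows pandas:
#   columns -> {"x": [...], "y": [...]}
#   records -> [{"x": .., "y": ..}, ...]
#   values  -> [[.., ..], ...]
to_json = function(df, orient = "columns", json = TRUE){
  dl = as.list(df)
  dl = switch(orient, 
    columns = dl,
    records = do.call('zip_vectors_', dl),
    values = do.call('zip_vectors_', setNames(dl, NULL)),
    stop("unknown orient: ", orient)
  )
  if (json){
    dl = rjson::toJSON(dl)
  }
  return(dl)
}

# Transpose column vectors into a list of rows: row i holds the i-th element
# of every column (named if the columns were passed with names)
zip_vectors_ = function(..., names = FALSE){
  x = list(...)
  y = lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
  if (names) names(y) = seq_along(y)
  return(y)
}

# Return a function that extracts element `element` from its argument
pluck_ = function (element){
  function(x) x[[element]]
}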
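The only difference between the "records" and "values" orients is whether the column list keeps its names when do.call() spreads it into arguments. A small sketch of that mechanism, using zip_rows(), a hypothetical trimmed-down mirror of zip_vectors_ written only for illustration:

```r
# Hypothetical mirror of zip_vectors_ (without its names argument)
zip_rows <- function(...) {
  cols <- list(...)
  # row i = the i-th element of every column; lapply keeps column names if present
  lapply(seq_along(cols[[1]]), function(i) lapply(cols, function(col) col[[i]]))
}

dl <- list(x = 1:2, y = c("a", "b"))

# do.call() passes each column as a separate argument; named arguments keep their names
records <- do.call(zip_rows, dl)                  # like orient = "records"
values  <- do.call(zip_rows, setNames(dl, NULL))  # like orient = "values"

str(records)
str(values)
```

Given the rjson rules, records then encodes as an array of objects and values as an array of arrays.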

The example below shows that to_json is 20x faster than toJSONArray2, mostly because it uses rjson rather than RJSONIO.

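library(rCharts)         # provides toJSONArray2
library(microbenchmark)
library(ggplot2)         # autoplot() generic used to plot microbenchmark results

N = 10^3

df <- data.frame(
  x = rpois(N, 10),
  y = sample(LETTERS, N, replace = TRUE),
  z = rpois(N, 5)
)

autoplot(microbenchmark(
  to_json(df, orient = "values", json = TRUE),
  toJSONArray2(df, names = FALSE),
  times = 5
))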


UPDATE: On reading your question more carefully, I realized we could speed it up further by using dplyr together with to_json:

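library(dplyr)

# %.% and do(function(x) ...) are defunct in current dplyr; %>% with
# group_map() is the modern equivalent. group_map() hands each group over
# without the grouping column z, so there is no need to drop column 3.
dfl = df %>%
  group_by(z) %>%
  group_map(~ to_json(.x, orient = 'values', json = FALSE))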
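The dplyr snippet stops at the grouped list. As a base-R sketch of the same pipeline, split() can stand in for group_by(), and a final rjson::toJSON() call produces the array asked for in the question. row_values() is a hypothetical helper that mirrors to_json(x, orient = 'values', json = FALSE) so the block runs on its own; it assumes character (not factor) columns.

```r
library(rjson)

df <- data.frame(x = 1:5,
                 y = c("a", "b", "c", "d", "e"),
                 z = c(1, 1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Hypothetical helper, similar to to_json(d, orient = "values", json = FALSE):
# one unnamed list per row, built by indexing each column vector
row_values <- function(d) {
  lapply(seq_len(nrow(d)), function(i) unname(lapply(d, `[[`, i)))
}

# split() plays the role of group_by(z); drop z (column 3) first, and unname()
# the groups so rjson emits a JSON array rather than an object keyed by z
groups <- unname(split(df[-3], df$z))
out <- rjson::toJSON(lapply(groups, row_values))
cat(out, "\n")
```

This avoids toJSONArray2 entirely; whether it beats the dplyr version on 100,000 rows would need to be measured.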


Thanks a lot Ramnath. With the to_json implementation I was able to reduce execution time from 370 secs to 10 secs :-)
Great! This is a good use case to push me to switch over to the newer to_json implementation.

For others trying to answer, the toJSONArray[2] functions are in the rCharts package. Your solution is pretty compact, but can be de-looped and tightened a bit with sapply and split:

library(rjson)
library(rCharts)

x <- 1:5  
y <- c('a', 'b' ,'c' ,'d' ,'e')  
z <- c(1, 1, 1, 2, 2)  

df <- data.frame(x, y, z) 

toJSON(df)

out <- toJSONArray(sapply(split(df[,1:2], df$z), function(x) {
  toJSONArray2(x, names=FALSE, json = FALSE)
}))

# doing gsub only for SO example output
cat(gsub("\\n", "", out))

## [ [ [ 1,"a" ],[ 2,"b" ],[ 3,"c" ] ],[ [ 4,"d" ],[ 5,"e" ] ] ]
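One caveat with sapply() here: it returns a list only because the two groups have different sizes (3 and 2 rows). When every group has the same number of rows, sapply() simplifies the per-group lists into a list-matrix, which changes the structure handed to the JSON encoder; lapply() always returns a plain list. A small check, using a hypothetical four-row data frame with two equal groups and a hypothetical rows_of() in place of toJSONArray2:

```r
# Hypothetical data: two groups of equal size (2 rows each)
eq <- data.frame(x = 1:4, y = c("a", "b", "c", "d"), z = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for toJSONArray2(x, names = FALSE, json = FALSE)
rows_of <- function(d) lapply(seq_len(nrow(d)), function(i) unname(as.list(d[i, ])))

groups <- split(eq[, 1:2], eq$z)
via_sapply <- sapply(groups, rows_of)  # equal lengths: simplified to a 2 x 2 list-matrix
via_lapply <- lapply(groups, rows_of)  # always a list with one element per group

print(dim(via_sapply))
print(length(via_lapply))
```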

As requested, here are the implementations of toJSONArray() and toJSONArray2() in rCharts:

toJSONArray <- function(obj, json = TRUE, nonames = TRUE){
  list2keyval <- function(l){
    keys = names(l)
    lapply(keys, function(key){
      list(key = key, values = l[[key]])
    })
  }
  obj2list <- function(df){
    l = plyr::alply(df, 1, as.list)
    if(nonames){ names(l) = NULL }
    return(l)
  }
  if (json){
    toJSON(obj2list(obj))
  } else {
    obj2list(obj)
  }
}

toJSONArray2 <- function(obj, json = TRUE, names = TRUE, ...){
  value = lapply(1:nrow(obj), function(i) {
    res <- as.list(obj[i, ])
    if (!names) names(res) <- NULL  # remove names (e.g. {x = 1, y = 2} => {1, 2})
    return(res)
  })
  if (json){
    return(toJSON(value, .withNames = F, ...))
  } else {
    names(value) <- NULL;
    return(value)
  }
}
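Besides the RJSONIO encoding, toJSONArray2 above calls obj[i, ] once per row, and each such call builds a new one-row data frame. The zip_vectors_ approach in the other answer instead indexes each column vector directly. A minimal sketch, assuming character (not factor) columns, showing that both routes yield the same unnamed per-row lists for the question's data:

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"), z = c(1, 1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Row-wise, as in toJSONArray2: df[i, ] allocates a one-row data frame per row
by_row <- lapply(seq_len(nrow(df)), function(i) unname(as.list(df[i, ])))

# Column-wise, as in zip_vectors_: take the i-th element of each column vector
by_col <- lapply(seq_len(nrow(df)), function(i) unname(lapply(df, `[[`, i)))

print(identical(by_row, by_col))
```

The result is the same; only the column-wise form skips the per-row data-frame subsetting.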

Those functions are fairly well optimized, but toJSONArray2 essentially uses lapply() as a row-by-row for loop, so let's see whether hand-encoding the JSON for your specific needs does any better. The following might be faster for you, but you'll probably need to tweak it further for production code (for example, if you need the integers unquoted):

out <- sapply(split(df[,1:2], df$z), function(x) {
  out.2 <- apply(x, 1, function(y) {
    # .withNames is an RJSONIO argument (rjson::toJSON has no such argument),
    # so call RJSONIO explicitly; append a trailing comma to each encoded row
    paste0(RJSONIO::toJSON(y, .withNames = FALSE), ",")
  })
  out.2 <- paste(out.2, collapse=" ")
  out.2 <- gsub(",$", "", out.2)   # drop the final trailing comma
  return(sprintf("[ %s ], ", out.2))
})

cat(sprintf("[ %s ]", gsub(", $", "", paste(unlist(out), collapse=""))))
## [ [ [ "1", "a" ], [ "2", "b" ], [ "3", "c" ] ], [ [ "4", "d" ], [ "5", "e" ] ] ]

It shares some patterns with the rCharts implementation but focuses entirely on turning the rows of a factor-split data frame into the format you need.
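The quoted numbers in that output come from apply(): it first converts the data frame to a matrix, and a data frame with any character column becomes a character matrix, with numeric columns passed through format(). Besides quoting, format() pads numbers to a common width, so multi-digit values would also need trimming. A small check, using hypothetical values 9 and 10:

```r
library(rjson)

d <- data.frame(x = c(9L, 10L), y = c("a", "b"), stringsAsFactors = FALSE)

m <- as.matrix(d)          # the matrix that apply(d, 1, ...) iterates over
print(storage.mode(m))     # every cell is now a string
print(m[, "x"])            # format() pads 9 to " 9" to match the width of "10"

# Encoding a matrix row therefore quotes (and keeps the padding of) the number ...
quoted <- rjson::toJSON(unname(m[1, ]))
# ... whereas indexing the original columns keeps the integer type
unquoted <- rjson::toJSON(unname(lapply(d, `[[`, 1)))
cat(quoted, unquoted, sep = "\n")
```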


Thanks :-). Is there any other implementation without toJSONArray2? This function is really slow: on a dataset of 100,000 elements it alone takes 80% of the processing time (around 300 secs).
