Hello community, I am new to Stack Overflow and hope I can describe this problem correctly.

I have an R data frame that looks like this:

ID | Product1 | Type    | Product_JSON_string
1  | Bread    | Grocery | {"ID":"1","Product1":"Bread","Type":"Automotive"}
2  | Butter   | Grocery | {"ID":"2","Product1":"Butter","Type":"Grocery"}

It can be created with:
df <- data.frame(ID  = c("1", "2"),
                  Product1 = c("Bread", "Butter"),
                  Type= c("Grocery", "Grocery"),
                  Product_JSON_string= c('{"ID":"1","Product1":"Bread","Type":"Automotive"}',
                                         '{"ID":"2","Product1":"Butter","Type":"Grocery"}'
                                         ))

I want to parse each JSON string and check whether it matches the corresponding entries in the data frame, i.e. whether ID, Product1 and Type are the same in the JSON and in the data frame. I can parse one JSON string at a time and turn each field into a column with this clumsy piece of code, which uses the jsonlite library:

as.data.frame(matrix(unlist(parse_json(minify(df$Product_JSON_string[1]), simplifyVector = FALSE)), nrow=1))

However, this is not enough: it only gives me the first row, and I am not able to vectorize it for multiple rows. Less importantly, it does not keep the column names, so the output looks like this:

  V1    V2         V3
1  1 Bread Automotive

Can someone help me write better code, or improve this so it works for multiple rows? I actually have to run it on thousands of JSON strings like these.

3 Answers

You can use jsonlite::fromJSON on each Product_JSON_string and combine the values.

result <- do.call(rbind, lapply(df$Product_JSON_string, function(x) 
                  as.data.frame(jsonlite::fromJSON(x))))
result

#  ID Product1       Type
#1  1    Bread Automotive
#2  2   Butter    Grocery
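
The result above only parses the JSON. To finish the check the question asks for, the parsed columns can be compared with the original ones row by row. This is a minimal sketch, not part of the original answer: it assumes every JSON string contains the same three keys as the data frame, that no values are NA, and the column name json_matches is made up for illustration.

```r
library(jsonlite)

df <- data.frame(
  ID = c("1", "2"),
  Product1 = c("Bread", "Butter"),
  Type = c("Grocery", "Grocery"),
  Product_JSON_string = c(
    '{"ID":"1","Product1":"Bread","Type":"Automotive"}',
    '{"ID":"2","Product1":"Butter","Type":"Grocery"}'
  ),
  stringsAsFactors = FALSE
)

# Parse every JSON string into a one-row data frame and stack the rows
parsed <- do.call(rbind, lapply(df$Product_JSON_string, function(x)
  as.data.frame(fromJSON(x), stringsAsFactors = FALSE)))

# Columns to compare (assumed to exist in both data frames)
cols <- c("ID", "Product1", "Type")

# Element-wise comparison of the two character matrices
same <- as.matrix(df[cols]) == as.matrix(parsed[cols])

# A row matches only if all compared columns are equal
# (json_matches is a hypothetical column name)
df$json_matches <- rowSums(same) == length(cols)
df[c(cols, "json_matches")]
```

With the example data, the first row is flagged as a mismatch (Grocery vs. Automotive) and the second as a match. The matrix `same` also shows which column differs in each row.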

1 Comment

do.call(bind_rows, lapply(df$Product_JSON_string, function(x) as.data.frame(fromJSON(minify(x))))) this worked for me.
library(tidyverse)
library(jsonlite)
library(daff)


df <- data.frame(
  ID  = c("1", "2", "99"),
  Product1 = c("Bread", "Butter", "err"),
  Type = c("Grocery", "Grocery", "bad"),
  Product_JSON_string = c(
    '{"ID":"1","Product1":"Bread","Type":"Automotive"}',
    '{"ID":"2","Product1":"Butter","Type":"Grocery"}',
    '{"ID":"3","Product1":"Butter","Type":"Grocery"}'
  )
)

df %>%
  select(-Product_JSON_string)
#>   ID Product1    Type
#> 1  1    Bread Grocery
#> 2  2   Butter Grocery
#> 3 99      err     bad

JSON_data <- purrr::map_df(df$Product_JSON_string, ~unlist(jsonlite::parse_json(.)))
JSON_data
#> # A tibble: 3 x 3
#>   ID    Product1 Type      
#>   <chr> <chr>    <chr>     
#> 1 1     Bread    Automotive
#> 2 2     Butter   Grocery   
#> 3 3     Butter   Grocery

differences <- daff::diff_data(df %>%
                 select(-Product_JSON_string),
               JSON_data)
differences
#> Daff Comparison: 'df %>% select(-Product_JSON_string)' vs. 'JSON_data' 
#>     ID Product1 Type               
#> ->  1  Bread    Grocery->Automotive
#>     2  Butter   Grocery            
#> +++ 3  Butter   Grocery            
#> --- 99 err      bad

daff::render_diff(differences)
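
If installing daff is not an option, a rougher version of the same comparison can be sketched with dplyr::anti_join, which returns the rows of one table that have no exact match in the other. This is an alternative technique, not what the answer above uses. The two data frames below are copied by hand from the outputs shown above, and the names only_in_df and only_in_json are made up for illustration.

```r
library(dplyr)

# Original columns (without the JSON string), as printed above
df_cols <- data.frame(
  ID = c("1", "2", "99"),
  Product1 = c("Bread", "Butter", "err"),
  Type = c("Grocery", "Grocery", "bad"),
  stringsAsFactors = FALSE
)

# Parsed JSON values, as printed above
JSON_data <- data.frame(
  ID = c("1", "2", "3"),
  Product1 = c("Bread", "Butter", "Butter"),
  Type = c("Automotive", "Grocery", "Grocery"),
  stringsAsFactors = FALSE
)

keys <- c("ID", "Product1", "Type")

# Rows of the data frame with no identical row in the JSON data
only_in_df <- anti_join(df_cols, JSON_data, by = keys)

# Rows of the JSON data with no identical row in the data frame
only_in_json <- anti_join(JSON_data, df_cols, by = keys)

only_in_df
only_in_json
```

Unlike the daff output, this does not say which column changed; a row that differs in one field simply shows up in both results (ID 1 here).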

2 Comments

When I run map_df I get the error "argument 1 must be named". I am not sure which argument this refers to.
I have edited my answer: map_df belongs to the package purrr.

Just adding minify to @ronak shah's answer worked for me (bind_rows comes from dplyr, fromJSON and minify from jsonlite):

do.call(bind_rows, lapply(df$Product_JSON_string, function(x) as.data.frame(fromJSON(minify(x)))))
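
Since the question mentions thousands of strings, one possible way to avoid calling the parser once per row is to paste all strings into a single JSON array and parse that in one call. This is a sketch under assumptions, not something tested on the asker's real data: it assumes every string is a valid, flat JSON object with the same keys, and a single malformed string would make the whole parse fail.

```r
library(jsonlite)

df <- data.frame(
  ID = c("1", "2"),
  Product1 = c("Bread", "Butter"),
  Type = c("Grocery", "Grocery"),
  Product_JSON_string = c(
    '{"ID":"1","Product1":"Bread","Type":"Automotive"}',
    '{"ID":"2","Product1":"Butter","Type":"Grocery"}'
  ),
  stringsAsFactors = FALSE
)

# Wrap all objects in [ ... ] so they form one JSON array
json_array <- paste0("[", paste(df$Product_JSON_string, collapse = ","), "]")

# fromJSON simplifies an array of flat objects into a data frame,
# one row per object, with the JSON keys as column names
parsed <- fromJSON(json_array)
parsed
```

The rows of `parsed` come out in the same order as the rows of df, so the two can be compared position by position.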
