
I have a mydata.txt/.json file containing data like this:

[{"num":1,"name":"Swab Summer: Transformation At the United States Coast Guard Academy","link":"http:\/\/www.amazon.com\/dp\/0982168594\/ref=wl_it_dp_v_nS_ttl\/176-1400914-4673658?_encoding=UTF8&colid=1GM97SGAP8NLI&coliid=I1ELS7DSQ6QV5C","old-price":"N\/A","new-price":"","date-added":"January 10, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51MtOOm493L._SL500_SL135_.jpg","page":1}]
[{"num":1,"name":"Vibomex","link":"http:\/\/www.amazon.com\/dp\/B00BR1CUFY\/ref=wl_it_dp_v_S_ttl\/175-5687209-2417046?_encoding=UTF8&colid=C0XVZ38E5WD9&coliid=I1EPDGRY73N5Q2","old-price":"N\/A","new-price":"","date-added":"July 20, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/31GBqOHskyL._SL500_SL135_.jpg","page":1}]

Basically, the file holds multiple JSON documents, one per line; the two lines above are two separate rows. When I import the data into R and turn it into a data frame, only the first row is read. Below is my code:

library(rjson)
json_file <- fromJSON(file="mydata.txt")
json_file <- lapply(json_file, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})
do.call("rbind", json_file)

Can anybody help me read every line of mydata.txt/.json into a single data frame, in R or Python? Any help is appreciated!

1 Answer


Here's one way: read the file line by line and parse each line separately with fromJSON from the jsonlite package:

do.call(rbind, lapply(readLines('mydata.json'), jsonlite::fromJSON))

#   num                                                                 name                                                                                                                                   link
# 1   1 Swab Summer: Transformation At the United States Coast Guard Academy http://www.amazon.com/dp/0982168594/ref=wl_it_dp_v_nS_ttl/176-1400914-4673658?_encoding=UTF8&colid=1GM97SGAP8NLI&coliid=I1ELS7DSQ6QV5C
# 2   1                                                              Vibomex   http://www.amazon.com/dp/B00BR1CUFY/ref=wl_it_dp_v_S_ttl/175-5687209-2417046?_encoding=UTF8&colid=C0XVZ38E5WD9&coliid=I1EPDGRY73N5Q2
#   old-price new-price       date-added priority rating total-ratings comment                                                             picture page
# 1       N/A           January 10, 2014             N/A                       http://ecx.images-amazon.com/images/I/51MtOOm493L._SL500_SL135_.jpg    1
# 2       N/A              July 20, 2014             N/A                       http://ecx.images-amazon.com/images/I/31GBqOHskyL._SL500_SL135_.jpg    1

If the set of column names varies from line to line, you can use:

library(dplyr)
# bind_rows() fills columns missing from a line with NA; it supersedes the deprecated rbind_all()
bind_rows(lapply(readLines('mydata.json'), jsonlite::fromJSON))
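
Since the question also asks about Python, here is a minimal sketch of the same line-by-line approach using only the standard library. The function names `read_json_lines` and `to_table` are made up for illustration, and the sketch assumes each non-blank line is either a JSON array of objects (as in the sample data) or a single JSON object.

```python
import json


def read_json_lines(path):
    """Parse a file with one JSON document per line into a list of row dicts."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            parsed = json.loads(line)
            if isinstance(parsed, dict):
                parsed = [parsed]  # assumption: a bare object counts as one row
            rows.extend(parsed)
    return rows


def to_table(rows, fill=None):
    """Return (columns, records); keys missing from a row are set to `fill`."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    records = [[row.get(col, fill) for col in columns] for row in rows]
    return columns, records
```

Calling `read_json_lines("mydata.json")` returns one dict per row. If pandas is installed, passing that list to `pandas.DataFrame(...)` should give a data frame directly, with missing keys filled as NaN.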

3 Comments

Thanks for your help but I'm getting this error: Error in parseJSON(txt) : parse error: premature EOF. Do you know the reason for this kind of error? Cheers!
@warwick12 Maybe you're missing a closing ] on one of the lines. You can do something like: which(lapply(readLines('mydata.json'), function(x) tryCatch({jsonlite::fromJSON(x); 1}, error=function(e) 0)) == 0) to see which lines throw errors, then inspect those lines in your .json file.
Superb. Thanks for your help! works perfectly fine now. Cheers!!
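
For anyone doing this in Python, the check suggested in the comments above (finding which lines fail to parse) could be sketched roughly as follows. `find_bad_lines` is a hypothetical helper name, not part of any library.

```python
import json


def find_bad_lines(path):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are skipped, not reported
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                bad.append((lineno, err.msg))
    return bad
```

The returned line numbers are 1-based, so they can be looked up directly in a text editor.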
