
I am trying to merge multiple JSON files into one data frame, and despite trying all the approaches I found on SO, it fails.

The files provide sensor data. The stages I've completed are:

1. Unzip the files - produces json files saved as '.txt' files
2. Remove the old zip files
3. Parse the '.txt' files to remove some bugs in the content: random three-letter + comma combos at the start of some lines, e.g. 'prm,{...'
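
A minimal sketch of step 3, assuming the stray prefix is always exactly three lowercase letters plus a comma sitting directly in front of the opening brace. The temporary file and the sample lines are made up for illustration and are not the real sensor data.

```r
# Hypothetical sample: the first line carries a stray 'prm,' prefix
path <- tempfile(fileext = ".txt")
writeLines(c('prm,{"sensor":"a","value":1}',
             '{"sensor":"b","value":2}'), path)

# Strip a leading three-letter + comma prefix, but only when it is
# immediately followed by '{' (lookahead needs perl = TRUE)
lines <- readLines(path)
cleaned <- sub("^[a-z]{3},(?=\\{)", "", lines, perl = TRUE)

# Overwrite the file with the cleaned lines
writeLines(cleaned, path)
```

Lines without the prefix pass through unchanged, so the same call can be run over every file.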

I've got code which will turn them into data frames individually:

library(jsonlite)

stream <- stream_in(file("1.txt"))
flat <- flatten(stream)
df_it <- as.data.frame(flat)

But when I put it into a function:

df_loop <- function(x) {
  stream <- stream_in(x)
  flat <- flatten(stream)
  df_it <- as.data.frame(flat)
  df_it
}

And then try to run through it:

df_all <- sapply(file.list, df_loop)

I get:

Error: Argument 'con' must be a connection.

Then I've tried to merge the json files with rbind.fill and merge to no avail.

Not really sure where I'm going so terribly wrong so would appreciate any help.

  • Is file.list a list of file paths? In that case you need to do stream <- stream_in(file(x)) in your function. Commented Jan 3, 2019 at 5:21
  • That worked a treat but would you help me understand why? Commented Jan 3, 2019 at 7:12
  • Added an answer, please check. Commented Jan 3, 2019 at 7:24

1 Answer

You need one small change in your function. Change the first line to:

stream <- stream_in(file(x))

Explanation

Start by looking at your original implementation:

stream <- stream_in(file("1.txt"))

The "1.txt" here is the file path, which is passed as an input to the file() function. A quick ?file will tell you that it is a

Function to create, open and close connections, i.e., “generalized files”, such as possibly compressed files, URLs, pipes, etc.

Now if you do a ?stream_in, you will find that it is a

function that implements line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe

The key phrase here is "over a connection", such as a socket, url, file or pipe.

Your file.list is just a list of file paths, i.e. character strings. But for stream_in() to work, you need to pass in a connection object, which is what file() returns when given a file path as a string.
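
The difference between a path and a connection can be checked directly in base R. This is only an illustrative sketch; the temporary file and the variable names are assumptions, not part of the original code.

```r
# Hypothetical one-line file, just so there is a real path to point at
path <- tempfile(fileext = ".txt")
writeLines('{"sensor":"a"}', path)

# A file path is only a character string, not a connection
is_conn_path <- inherits(path, "connection")   # FALSE

# file() wraps the path in a connection object
con <- file(path)
is_conn_file <- inherits(con, "connection")    # TRUE
close(con)
```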

Chaining that together, you needed to do stream_in(file("/path/to/file.txt")).

Once you do that, your sapply iterates over each path, creates the connection and passes it as input to stream_in().
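
Putting it together, here is a sketch of the corrected function plus one possible way to combine the results. It assumes the jsonlite package is installed and uses made-up sample files in a temporary directory in place of the real sensor data. It also swaps sapply for lapply, since sapply tries to simplify its result, which does not suit a list of data frames, and uses jsonlite's rbind_pages() as an alternative to rbind.fill.

```r
library(jsonlite)

# Hypothetical sample files standing in for the cleaned '.txt' files
dir <- tempfile("sensors")
dir.create(dir)
writeLines(c('{"id":1,"reading":{"temp":20.5}}',
             '{"id":2,"reading":{"temp":21.0}}'),
           file.path(dir, "1.txt"))
writeLines('{"id":3,"reading":{"temp":19.8,"hum":40}}',
           file.path(dir, "2.txt"))
file.list <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Wrap each path in file() so stream_in() receives a connection
df_loop <- function(path) {
  stream <- stream_in(file(path), verbose = FALSE)
  jsonlite::flatten(stream)   # qualified in case another package masks flatten()
}

# lapply always returns a list, one data frame per file
dfs <- lapply(file.list, df_loop)

# Row-bind the data frames; columns missing from a file are filled with NA
df_all <- rbind_pages(dfs)
```

Whether this scales to the real files depends on their size and structure, so treat it as a starting point rather than a finished solution.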

Hope that helps!


2 Comments

Thank you - much appreciated! Will get back to work trying to merge them with rbind.fill or something similar.
I've followed your advice but now merging into one dataframe seems to crash. What do you think I'm missing to stream_in the files, flatten them and append them to one large data frame?
