1

My data.table consists of hourly observations of the power produced by an engine (output) and a system state descriptor tag which tells which all components of the engine are turned on.

DATA

structure(list(time = structure(c(1517245200, 1517247000, 1517248800, 
1517250600, 1517252400, 1517254200, 1517256000, 1517257800, 1517259600, 
1517261400, 1517263200, 1517265000, 1517266800, 1517268600, 1517270400, 
1517272200, 1517274000, 1517275800, 1517277600, 1517279400, 1517281200, 
1517283000, 1517284800, 1517286600), class = c("POSIXct", "POSIXt"
), tzone = ""), output1 = c(160.03310020928, 159.706274495615, 
159.803834736236, 159.753928429527, 159.54807802046, 159.21298848298, 
158.904290018581, 158.683643772917, 158.670475839199, 158.793901799427, 
158.886487460894, 159.167829223303, 159.66751884913, 159.1288534448, 
159.141463186901, 160.116892086363, 160.517879769862, 160.615925580417, 
160.915687799509, 161.590897854561, 161.568455821241, 161.411642091721, 
161.811137570257, 162.193040254917), tag1 = c("evap only", "evap only", 
"fog & evap", "fog & evap", "evap only", "evap only", "evap only", 
"neither fog nor evap", "neither fog nor evap", "fog & evap", "evap only", "evap only", 
"evap only", "fog & evap", "evap only", "fog & evap", "evap only", 
"evap only", "evap only", "evap only", "fog & evap", "fog & evap", 
"bad data", "neither fog nor evap")), row.names = c(NA, -24L
), class = c("data.table", "data.frame"))

You can also generate some sample data using:

sample_data <- data.table(time = seq.POSIXt(from = Sys.time(), by = 60*60*3, length.out = 100), 
           output = runif(n = 100, min = 130, max = 172), 
           tag = sample(x = c('evap only', 'bad data', 'neither fog nor evap', 'fog and evap'), 
                        size = 100, replace = T))

I want to group this by day (sample data above has only two days but actual data has 3 years worth of data) and find the mean power corresponding to each tag. I would like the output to be something like:

          time  evap only fog & evap  neither fog nor evap bad data
 1: 2018-01-29  159.8391  160.0825    159.8491             161.8111

I've tried the following piece of code but the result is not in the form that I want. I'm using .SDcols because the actual dataset has a large number of other columns.

sample_data[, lapply(.SD, function(z){mean(z, na.rm = T)}), .SDcols = c('output1'), by = .(round_date(time, 'day'), tag1)]
   round_date                 tag1  output1
1: 2018-01-30            evap only 159.8391
2: 2018-01-30           fog & evap 160.0825
3: 2018-01-30 neither fog nor evap 159.8491
4: 2018-01-30             bad data 161.8111

I've seen the below questions posted on stack overflow.

  1. Create new data.table columns based on other columns
  2. Loop through data.table and create new columns basis some condition
  3. R data.table create new columns with standard names
  4. Add new columns to a data.table containing many variables
  5. Add multiple columns to R data.table in one function call?
  6. Assign multiple columns using := in data.table, by group
  7. Dynamically create new columns in data.table
  8. Creating new columns in data.table

Is there a data.table way of achieving this?

6
  • 1
    r u looking for dcast(DT[, mean(output1), .(d=as.Date(time), tag1)], d ~ tag1, value.var="V1")? since your desired output only has 1 date, its hard to tell what you are looking for Commented Sep 30, 2019 at 10:17
  • If you already have the mean by date isn't this just a reshaping question ? stackoverflow.com/questions/5890584/… Commented Sep 30, 2019 at 10:18
  • @chinsoon12 data from additional dates would end up as additional rows in the output. I've added a section to generate some random data with additional dates. Commented Sep 30, 2019 at 12:30
  • @RonakShah I realized what I was missing, it works now but I'm wondering if there's a data.table way of achieving this. Commented Sep 30, 2019 at 12:42
  • What is your desired output for the sample data set Commented Sep 30, 2019 at 13:01

2 Answers 2

1

Here is a data.table approach

#explanation of mean(.SD[[1]] ..), see akrun's comment here:
# https://stackoverflow.com/questions/29568732/using-mean-with-sd-and-sdcols-in-data-table#comment47286876_29568732
ans <- DT[, .(mean_output1 = mean(.SD[[1]], na.rm = TRUE )), 
          by = .( date = as.Date( time ), tag1 ), 
          .SDcols = c("output1") ]

dcast( ans, date~tag1, value.var = "mean_output1" )

#          date bad data evap only fog & evap neither fog nor evap
# 1: 2018-01-29       NA  159.3908   159.3701             158.6771
# 2: 2018-01-30 161.8111  160.5564   161.0323             162.1930
Sign up to request clarification or add additional context in comments.

Comments

0
library(dplyr)
library(lubridate)
# test is the dataframe provided in question
test1 = test %>% group_by(date = date(time), tag1) %>% 
          summarise(mean_power = mean(output1))

convert tibble produced by above code to a dataframe

test1_df = data.frame(test1)

reshape data to wide format

reshape(test1_df, idvar = "date", timevar = 
            "tag1", direction = "wide")

Output:

> output
        date evap only fog & evap bad data neither fog nor evap
1 2018-01-29  159.8697   159.8038       NA                   NA
3 2018-01-30  159.8335   160.1289 161.8111             159.8491

The row number is appearing as 3 after 1 since the date 2018-01-30 first appeared on 3rd row in test1_df.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.