Say I have a data.table like this:

library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c"), each=5), date = rep(1:5,3), value = sample(c(95:105,""),15, replace=TRUE))
DT  # print the table

Within each group, I would like a simple way to check whether the value column contains a "" (empty string), or a run of empty strings, that is preceded by a value, whether or not another value follows it.

So, this is fine: "", 95, 103, etc. (the empty string is first within the group), but the patterns below are examples of "missing data" that I would like to detect:

95, "", 103, ... (empty string in the middle)

95, "", "", 103, ... (several empty strings in the middle)

95, 103, "" (empty string at the end)

So, in the output below, I would be able to get the row/group A, and if there are many groups, I should get all groups (or rows)

    group date value
 1:     a    1   105
 2:     a    2   103
 3:     a    3   104
 4:     a    4      
 5:     a    5   101
 6:     b    1   102
 7:     b    2   100
 8:     b    3   101
 9:     b    4    97
10:     b    5   102
11:     c    1   104
12:     c    2   101
13:     c    3   104
14:     c    4    96
15:     c    5   102

Edit: What I need to do is select the rows that have the wrong pattern (empty string(s) in the middle or at the end), in order to detect whether there are any errors in a large dataset. So in the table in my example, the desired output would be the 4th row, as it has a "missing value" (an empty string in between values):

   group date value
1:     a    4      

(If there were more unwanted rows, of course, I would like to get all of them)

  • No "" in your data Commented Mar 31, 2020 at 22:06
  • What is your desired output, the rows that meet the criteria or the ones that don't? Commented Apr 1, 2020 at 3:24
  • @Edward The ones that do (e.g. I want to check whether there are gaps in a large dataset, and ideally there would be zero) Commented Apr 1, 2020 at 7:13
  • @arg0naut91 4th row? Otherwise just use different seed number... Commented Apr 1, 2020 at 7:22

2 Answers

1

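If your data.table is not sorted by the 'date' column, you can use the following:

# Number the rows of each group in date order (1 = earliest date)
DT[order(date), order := seq_len(.N), by = group]
# Keep empty values that are not on the first date of their group
DT[value == "" & order > 1L]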

output:

   group date value order
1:     a    4           4

The data is the same as yours:

library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c"), each=5), date = rep(1:5,3), 
                 value = sample(c(95:105,""),15, replace=TRUE))
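For groups that may start with several empty rows in a row, a possible variant (a sketch, not part of the answer above) is to flag only the empty values that occur after the first real value of the group. The table `DT2`, the column `seen_value`, and the result `gaps` are hypothetical names introduced for illustration.

```r
library(data.table)

# Hypothetical test data: group "a" starts with two empty values,
# group "b" has a gap in the middle and one at the end.
DT2 <- data.table(
  group = rep(c("a", "b"), each = 5),
  date  = rep(1:5, 2),
  value = c("", "", "95", "96", "97",
            "95", "", "103", "104", "")
)

# seen_value is TRUE from the first non-empty value of each group onwards
# (cumsum counts the non-empty values encountered so far, in date order).
DT2[order(date), seen_value := cumsum(value != "") > 0L, by = group]

# Empty values that occur after at least one real value
gaps <- DT2[value == "" & seen_value]
print(gaps)
```

With this data, the leading empty rows of group "a" are skipped, while the middle and trailing empty rows of group "b" are selected.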

3 Comments

I don't think this would work when a group has more than one row with an empty string at the beginning (e.g. the 1st, 2nd, and 3rd rows all have an empty string)?
It will work. You can test with the following example: DT <- data.table(group = rep(c("a","b"), each=3), date = c(1,2,3), value = c("","","","95","","100"))
The answer basically returns all the rows that have an empty string as 'value', except those on the first date of each group.
0

Here is an option:

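# Number the consecutive runs of empty / non-empty values within each group
DT[, rw := rleid(value==""), group]
# Empty values outside the first run, i.e. preceded by at least one value
DT[value=="" & rw>1L]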

output:

   group date value rw
1:     a    4        2

data:

library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c","d"), each=5), 
    date = rep(1:5,4), value = c(sample(c(95:105,""),15, replace=TRUE), c("",2,3,4,5)))
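To see how the run ids behave on each of the patterns from the question, here is a small hand-built check. It is only a sketch: the table `DT3` and the names `bad_rows` and `bad_groups` are hypothetical, and the rows are assumed to be already sorted by date within each group.

```r
library(data.table)

# Hypothetical data: "a" has leading empties only (fine),
# "b" has a run of empties in the middle, "c" has one at the end.
DT3 <- data.table(
  group = rep(c("a", "b", "c"), each = 4),
  date  = rep(1:4, 3),
  value = c("", "", "95", "96",
            "95", "", "", "103",
            "95", "103", "104", "")
)

# rleid() numbers consecutive runs of equal values within each group,
# so a leading run of empties gets run id 1 and any later run gets > 1.
DT3[, rw := rleid(value == ""), by = group]

bad_rows   <- DT3[value == "" & rw > 1L]   # the offending rows
bad_groups <- unique(bad_rows$group)       # groups containing a gap
print(bad_rows)
print(bad_groups)
```

Taking `unique()` of the group column gives the affected groups instead of the individual rows, which covers both readings of the question.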

7 Comments

Thanks, but what I would like to do is select the rows that have the wrong pattern (an empty string or strings in the middle), in order to detect whether there are any errors in a large dataset.
Can you post your desired output?
In the DT you created, it would be: 1: c 5 (the 15th row of the DT table, which has an empty string as the last value within the group). In the table in my example, it would be the 4th row, as it has an empty string in between values.
By "post" I mean type the desired output into your post, as the wording is quite vague.
I have also updated my post. The previous answer was based on "I would be able to get the row/group A, and if there are many groups, I should get all groups (or rows)": it seemed like you wanted the whole group.