Say I have a data.table like this:
library(data.table)
set.seed(10)
data.table(group = rep(c("a", "b", "c"), each = 5), date = rep(1:5, 3), value = sample(c(95:105, ""), 15, replace = TRUE))
Within each group, in the value column, I would like to check (in a simple way) whether there is a "" (empty character), or a run of empty characters, that is both preceded and followed by a value.
So, this is fine: "", 95, 103, etc. (the empty character is first within the group), but the patterns below are examples of "missing data" that I would like to detect:
95, "", 103, ... (empty character in the middle)
95, "", "", 103, ... (several empty characters in the middle)
95, 103, "" (empty character at the end)
So, from the output below, I would like to be able to get the affected row (or group a), and if several groups are affected, I should get all of those groups (or rows):
    group date value
 1:     a    1   105
 2:     a    2   103
 3:     a    3   104
 4:     a    4
 5:     a    5   101
 6:     b    1   102
 7:     b    2   100
 8:     b    3   101
 9:     b    4    97
10:     b    5   102
11:     c    1   104
12:     c    2   101
13:     c    3   104
14:     c    4    96
15:     c    5   102
Edit: What I need to do is select the rows that have the wrong pattern (empty string(s) in the middle or at the end of a group), so that I can detect whether there are any errors in a large dataset. In the table in my example, the desired output would be the 4th row, as it has a "missing value" (an empty character in between values):
   group date value
1:     a    4
(If there were more unwanted rows, of course, I would like to get all of them)
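To make the rule concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the rows are already ordered by date within each group, and it uses a small made-up table (the names dt and bad are only for illustration) rather than the sampled one above, so that every pattern appears at least once. An empty string is flagged when at least one non-empty value has already been seen earlier in the same group:

```r
library(data.table)

# Hypothetical example data, chosen so each pattern occurs once:
# group a: empty in the middle; group b: empty first (fine);
# group c: a run of empties in the middle plus an empty at the end.
dt <- data.table(
  group = rep(c("a", "b", "c"), each = 5),
  date  = rep(1:5, 3),
  value = c("105", "103", "104", "", "101",
            "", "100", "101", "97", "102",
            "104", "", "", "96", "")
)

# cumsum(value != "") counts the non-empty values seen so far in the group;
# an empty string is "bad" once that count is greater than zero.
dt[, bad := value == "" & cumsum(value != "") > 0, by = group]

# Rows with the unwanted pattern
dt[bad == TRUE, .(group, date, value)]
```

With this made-up table the flagged rows should be a/4, c/2, c/3 and c/5, while the leading empty string in group b is left alone. Whether something like this is the simplest or most idiomatic way is exactly what I am unsure about.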