I want to change the default value (which is 255) to NA.

library(data.table)
dt <- data.table(x = c(1,5,255,0,NA), y = c(1,7,255,0,0), z = c(4,2,7,8,255))
coords <- c('x', 'y')

Which gives the following table:

     x   y   z
1:   1   1   4
2:   5   7   2
3: 255 255   7
4:   0   0   8
5:  NA   0 255

The furthest I have come is this:

dt[.SD == 255, (.SD) := NA, .SDcols = coords]

Please note that column z should stay the same: only the columns specified in coords should be changed, not all columns.

But that doesn't give me the desired result:

     x   y   z
1:   1   1   4
2:   5   7   2
3:  NA  NA   7
4:   0   0   8
5:  NA   0 255

I am looking for a scalable solution because the original dataset has a couple of million rows.

EDIT:

I have found a solution, but it is quite ugly and definitely too slow: it takes almost 10 seconds to get through a data frame of 22009 x 86. Does anyone have a better solution?

The code:

dt[, replace(.SD, .SD == 255, NA), .SDcols = coords, by = c(colnames(dt)[!colnames(dt) %in% coords])]

  • You can try dt[, replace(.SD, .SD == 255, NA)] Commented Nov 13, 2018 at 13:46
  • Thank you for your reply, Sotos. I edited my post. I am looking for a solution that scales easily when the number of rows increases heavily. I am not sure the replace function is that efficient. Commented Nov 13, 2018 at 13:51
  • couple of million rows is not very big. replace will do just fine Commented Nov 13, 2018 at 13:55
  • okay, thank you. But it doesn't include the other columns. Commented Nov 13, 2018 at 13:59
  • You can do this when you read in the table: fread("path/to/file", na.strings=c("NA", "255")) Commented Nov 13, 2018 at 14:02

2 Answers

Here is how you can keep the columns outside .SDcols,

library(data.table)
dt[, (coords) := replace(.SD, .SD == 255, NA), .SDcols = coords]

which gives,

    x  y   z
1:  1  1   4
2:  5  7   2
3: NA NA   7
4:  0  0   8
5: NA  0 255
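
For tables with many rows or many target columns, a loop over set() is another way to do the same replacement by reference. This is only a sketch and not part of the answer above: it rebuilds the question's dt and coords, and it assumes the target columns are of type double, so that NA_real_ matches the column type.

```r
library(data.table)

# Example data from the question.
dt <- data.table(x = c(1, 5, 255, 0, NA), y = c(1, 7, 255, 0, 0), z = c(4, 2, 7, 8, 255))
coords <- c("x", "y")

# Update each target column in place.
# which() drops the NA results of the comparison, so existing NA values are left alone.
for (col in coords) {
  set(dt, i = which(dt[[col]] == 255), j = col, value = NA_real_)
}

print(dt)
```

Because set() modifies dt by reference, no copy of the table is made, and column z is never touched since it is not listed in coords.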
You could also do:

require(data.table)
dt[ ,
    (coords) := lapply(.SD, function(x) fifelse(x == 255, NA_real_, x)),
    .SDcols = coords ]

Having compared it to Sotos' answer, it also seems a little bit faster.
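
The speed comparison can be checked on your own machine with a sketch like the following. The test data here is hypothetical (1e5 rows of made-up values, not the asker's dataset), timings will vary between systems, and fifelse() needs a reasonably recent data.table release (1.12.4 or later, as far as I know).

```r
library(data.table)

# Hypothetical test data: values drawn so that both 255 and NA occur.
set.seed(42)
n <- 1e5
vals <- c(0, 1, 5, 7, 255, NA)
big <- data.table(x = sample(vals, n, replace = TRUE),
                  y = sample(vals, n, replace = TRUE),
                  z = sample(vals, n, replace = TRUE))
coords <- c("x", "y")

# Work on copies, because := modifies a data.table by reference.
a <- copy(big)
b <- copy(big)

t_replace <- system.time(
  a[, (coords) := replace(.SD, .SD == 255, NA), .SDcols = coords]
)
t_fifelse <- system.time(
  b[, (coords) := lapply(.SD, function(v) fifelse(v == 255, NA_real_, v)), .SDcols = coords]
)

# Elapsed seconds for each approach.
print(c(replace = t_replace[["elapsed"]], fifelse = t_fifelse[["elapsed"]]))

# Both approaches should give the same table.
stopifnot(isTRUE(all.equal(a, b)))
```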

1 Comment

I find this approach much more intuitive, especially when swapping out the fifelse for any other function you might want to create or use.
