0

I have a data.table that looks like the following:

> head(mydt)
    name  b      c
1:  ao    2      1 GiB
2:  bo    2      1.4 Gib

Now I am trying to do some cleansing: I want to remove the unit from the values in column c without any loops. I did the following:

mydt[,4 :=substr(c,0,gregexpr(pattern=' ',c)[[1]][1]-1)]

What I get is something like this:

> head(mydt)
    name  b      c
1:  ao    2      1 G
2:  bo    2      1.4

What I expect is the following

> head(mydt)
    name  b      c
1:  ao    2      1
2:  bo    2      1.4

However, it doesn't work: it seems to use the same endpoint for all values. What am I doing wrong? How can I access the "current" value?

4
  • My question is not about extracting numbers from vectors; it is about how to access the current value in order to modify it! Commented Dec 17, 2018 at 21:08
  • "i try to remove the unit from the values in column c" can easily be interpreted as that you want to remove the unit, i.e. extract the number ;) Given your LHS of :=, it seems like you need to study ?:= and an introductory data.table text. Commented Dec 17, 2018 at 21:24
  • You are right, I clarified my question! Commented Dec 17, 2018 at 21:27
  • Extracting numbers from vectors of strings Commented Dec 17, 2018 at 21:31

2 Answers

1

You can use dplyr on a data.table object. Because gsub() is vectorized, it processes every value in the column rather than reusing the result from the first row. For example:

library(dplyr)
library(data.table)

mydt<-data.table(name = c('ao','bo'), b = c(2,2), c = c("1 GiB", "1.4 GiB"))
mydt %>% 
  mutate(d = as.numeric(gsub(" GiB","",c)))

  name b       c   d
1   ao 2   1 GiB 1.0
2   bo 2 1.4 GiB 1.4

0
library(data.table)
library(tidyverse)  # provides dplyr, tidyr and readr used below

mydt <- data.table(name = c("ao","bo"),
                   b = c(2,2),
                   c = c("1 GiB", "1.4 Gib"))

We can take several approaches.

Using tidyverse:

mydt %>% mutate(c = parse_number(c))

Or,

mydt %>% separate(col = c, into = "c", sep = " ", extra = "drop", convert = TRUE)

Using the data.table approach:

mydt[,
     c := tstrsplit(c," ", fixed = TRUE, keep = 1L)]

which leaves column c as character.

To get c as a numeric output:

mydt[,
     c := tstrsplit(c," ", fixed = TRUE, keep = 1L)
     ][,
       c := as.numeric(c)]
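
As a possible alternative to the two chained steps, the split and the conversion could be done in a single assignment. This is only a sketch: it assumes the data.table package is installed, rebuilds the sample table from the question, and swaps tstrsplit() for base R's sub() to drop everything from the first space onward.

```r
library(data.table)

# Sample data mirroring the question
mydt <- data.table(name = c("ao", "bo"),
                   b = c(2, 2),
                   c = c("1 GiB", "1.4 Gib"))

# Inside j, `c` is the whole column (a character vector), so the
# vectorized sub() strips the unit from every row at once
mydt[, c := as.numeric(sub(" .*", "", c))]

print(mydt)
```

Because the assignment covers every row, data.table replaces the character column with the new numeric one.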

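Your original code takes the match position from the first row only (the [[1]][1] part). It can be replaced by the code below, which uses the vectorized regexpr() to get one position per row:

mydt[, c := substr(c, 1, regexpr(" ", c, fixed = TRUE) - 1)]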
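
To see why the expression in the question ends up with the same endpoint for every row, here is a minimal base R sketch. It works on a plain character vector (the variable names are made up for illustration), which is what column c is inside j of a data.table call.

```r
# Hypothetical sample values mirroring column c from the question
x <- c("1 GiB", "1.4 Gib")

# gregexpr() returns a list with one element per string;
# [[1]][1] keeps only the first match of the first string, a single number
end_scalar <- gregexpr(" ", x, fixed = TRUE)[[1]][1] - 1

# regexpr() returns the first match position for each string
end_vector <- regexpr(" ", x, fixed = TRUE) - 1

# substr() recycles the single endpoint across all strings
same_cut <- substr(x, 1, end_scalar)

# With one endpoint per string, each value gets its own cut
own_cut <- substr(x, 1, end_vector)
values <- as.numeric(own_cut)

print(same_cut)
print(own_cut)
```

So there is no "current" value to access: j sees the whole column at once, and the fix is to use functions that return one result per element.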
