0

I'm having a very simple problem, but have not found a way out of it. I was given about 200 tables from different stores, and have to analyze the whole body of data as a unit. As the tables are fairly big, I'm working with data.table::fread. My plan is loading each table, rbinding them together and proceed with the analysis. In order to retain which store is which, I'd like to add a variable store to each table, with the name of the store(same as the table). For that I planned to run a for loop with the names of the tables and create the new variable.

# I'll create some dummy data tables here. In reality they're loaded with fread
library(data.table)
centro <- data.table(x=letters[1:5], y = 1:5)
sur <- data.table(x=letters[2:7], y = 3:8)
...
norte <- data.table(x=letters[2:10], y = 3:11)

I need each table to have a variable "store" whose value is "centro", "sur" or "norte", depending on the store it belongs to. So I tried:

st <- c("centro", "sur", "norte")
for (i in st){
   i[, store := i]
}

Which, of course, didn't work (it complains that "i is not a data.table"). I then tried to create a list and try it via lapply:

sts <- list(centro, sur, norte)
sts <- lapply(sts, function(z) z[, store := names(z)])

Which doesn't work because names(z) are "x", "y" and "store".

How can I do this with a loop (or a *pply function), instead of manually doing xyz[, store := "xyz"] ?

side notes

  • the naming of the variables is almost impossible to adjust, and doesn't follow any pattern (nothing like store_1, store_2, etc)
  • I may be very biased with the alternative I described. Any other thing that works will be ok.
1
  • Your chunk of code with i[, store := i] is not reproducible, i is a character scalar, and you are subsetting it with [ and trying to use := inside. Start from there. Commented May 5, 2016 at 21:37

3 Answers 3

3

The best way is to store your objects in a list and then access them by indexing the list.

library(data.table)

my_dts <- list (
  centro = data.table(x=letters[1:5], y = 1:5),
  sur = data.table(x=letters[2:7], y = 3:8),
  norte = data.table(x=letters[2:10], y = 3:11)
)

st <- c("centro", "sur", "norte")
for (i in st) my_dts[[i]][, store := i]
Sign up to request clarification or add additional context in comments.

4 Comments

This one was very obvious, thanks. Those are the kind of solutions that DOESN'T come to mind after too many hours / too few coffee.
my_dts[[i]][, store := i] doesn't actually work. for (i in 1:length(my_dts)) my_dts[[i]][, store := st[i]] does.
works for me. you can index a list by name, if the list has names, which it does in the example above. you can also index by number though, so your comment works too
Sure, I just listed the variables I had in workspace, so that's why it didn't work. It does, as you mention.
1

Direct answer, very little change

Wrap get around i, and assign the value back to the name contained in i.

for (i in st){
  assign(i, get(i)[, store := i])
}

Better system, some manipulation

Store the tables in a list and use the method proposed @arvi1000. A couple very obvious advantages

  1. Since your individual stores are all items under a list, your workspace has about 200 fewer objects in it, which is nice when you ls().
  2. You can use lapply instead of a for loop, which usually means cleaner code
  3. You can use AllStore <- do.call(what = rbind, args = my_dts).

'More Better', back to the beginning

  • When you read in the data from different stores, set the value of the store variable then. Use lapply and an anonymous function to do this.
  • Don't bother saving the my_dts list, because it's just duplication of data that will be saved in all_stores anyway.

Coincidentally, I had a similar business problem last week, and this is the method I used, working with about 250 csvs. Here's how I think your code would look.

st <- c("centro", "sur", "norte")

all_stores <- do.call(
  what = rbind,
  args = lapply(
    X = st,
    FUN = function(storeNM){
      temp <- fread(input = paste0(storeNM, '.csv'))
      temp[, store := storeNM]
      return(temp)
    }
  )
)

3 Comments

Thanks for introducing get, That was the function I was looking for. However, it doesn't work as you proposed (get(i)[, store := i]), as it doesn't actually modify the object. An assignment will be required: i <- get(i)[, store := i]
In my previous comment, should be assign(i, get(i)[, store := i])
True statement. I didn't think about (but I should have); get returns by value, which means that the by reference qualities of [.data.table all go out the window. Fixed.
0

Try this:

for (i in st)
{
eval(parse(text=i))[, ':='(store = i)]
}

On second thoughts , probably this is better. Assuming you have all your datasets read in using s<-lapply(filepathlist,fread) then use:

st <- c("centro", "sur", "norte") # need to change this to a list of datasets you read in    
st2<-lapply(st,function(i){eval(parse(text=i))[, ':='(store = i)]})
st3<-rbindlist(st2)

Output:

    > st3
        x  y  store
     1: a  1 centro
     2: b  2 centro
     3: c  3 centro
     4: d  4 centro
     5: e  5 centro
     6: b  3    sur
     7: c  4    sur
     8: d  5    sur
     9: e  6    sur
    10: f  7    sur
    11: g  8    sur
    12: b  3  norte
    13: c  4  norte
    14: d  5  norte
    15: e  6  norte
    16: f  7  norte
    17: g  8  norte
    18: h  9  norte
    19: i 10  norte
    20: j 11  norte

2 Comments

as with get in @Adam's answer, the eval(parse(text = i))[, store := i] piece doesn't actually change the variable, it does require the assignment: assign(i, eval(parse(text = i))[, store := i])
Agreed, assignment is required, eval(parse(...)) will just evaluate references to variable. I find @arvi1000 answer better with list of datasets.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.