
In data.table it is possible to have columns of type list, and I'm trying to benefit from this feature for the first time. For each row of my table dt, I need to store several comments taken from an rApache web service. Each comment will have a username, datetime, and body item.

Instead of using long strings with some weird, unusual character to separate each message from the others (like |), and a ; to separate each item in a comment, I thought to use lists like this:

library(data.table)
dt <- data.table(id=1:2,
        comment=list(list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world")),
          list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world"))))

> dt
   id comment
1:  1  <list>
2:  2  <list>

This stores all the comments added for one particular row, and it will also be easier to convert to JSON later on when I need to send it back to the UI.

However, when I try to simulate how I will actually be filling my table in production (adding a single comment to a particular row), R either crashes, or doesn't assign what I would like and then crashes:

> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL

[[2]]
NULL

> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1

[[2]]
NULL

> set(dt, 1L, "comment", list(1, "a"))  # assigns only `1`, and R crashes when I try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
  Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)

> dt[1L, comment := list(1, "a")]       # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # any of these two

I know I'm trying to misuse data.table, e.g. the way the j argument has been designed allows this:

dt[1L, c("id", "comment") := list(1, "a")] # lists in RHS are seen as different columns! not parts of one

Question: Is there a way to do the assignment I want? Or do I just have to take dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to do an update?

  • You could probably use rbind and/or merge to successively update your data.table, but that sounds very inefficient. Other than that, I can only say that I ran into the following warning message: "Column 'comment' is type 'list' which is not supported as a key column type, currently." Commented Mar 20, 2014 at 13:27
  • dt[1L, comment := list(1L)] - you have to use list(.) as the column type is list. set(dt, 1, "comment", list(1)) - list(1, "a") is of length 2, and you're assigning it to i=1 (which is of length 1). Commented Mar 20, 2014 at 14:05
  • @arun can you please write an answer with a reproducible code? because I don't I understood. Just to clarify: I need something like list(1, "a") inside 1 cell of the table, precisely in an element of a column of type list as defined above Commented Mar 20, 2014 at 14:23
  • @shadow I think you need to update data.table Commented Mar 20, 2014 at 14:24
  • @Arun I don't I understood means I think I haven't understood sorry... I also tried to assign list(list(1, "a")), which is of length 1, but R still crashes. Commented Mar 20, 2014 at 14:29

2 Answers


Using :=:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]

# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]

# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]

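For the last case, you'll need one more list because data.table uses the outermost list(.) to look for the values to assign to columns by reference. The outer list holds one entry per column, the next list holds one entry per assigned row of that column, and only what is inside that becomes the content of the cell.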
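A minimal sketch of single-cell assignment with `:=`, assuming a recent data.table release (warnings and error messages may differ between versions). It stores a character vector in one cell and a plain list in another; the variable names are only illustrative.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# one cell holding a character vector: list(list(<cell content>))
dt[1L, comment := list(list(c("a", "b")))]

# one cell holding a plain list: the cell content is itself a list
dt[2L, comment := list(list(list(1, "a")))]

# inspect what ended up in each cell
str(dt$comment)
```

Reading a cell back requires `[[`, for example `dt$comment[[1L]]`, because `dt$comment[1L]` returns a list of length one rather than the cell itself.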

Using set:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)

# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))

# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))
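
For the use case in the question (adding one comment at a time to a given row), the same nesting can be wrapped in a small function. This is only a sketch: `add_comment()` is a hypothetical helper, not part of data.table, and it assumes the list column is called `comment`.

```r
library(data.table)

# add_comment() is a hypothetical helper, not a data.table function
add_comment <- function(dt, row, username, message, date = Sys.time()) {
  new_comment <- list(username = username, date = date, message = message)
  # current cell content: NULL for an empty cell, otherwise a list of comments
  updated <- c(dt$comment[[row]], list(new_comment))
  # same nesting as in the last set() call above: list(list(<cell content>))
  set(dt, i = row, j = "comment", value = list(list(updated)))
  invisible(dt)
}

dt <- data.table(id = 1:2, comment = vector("list", 2L))
add_comment(dt, 1L, "michele", "hello")
add_comment(dt, 1L, "michele", "world")

length(dt$comment[[1L]])  # number of comments stored for row 1
```

Because `set()` modifies `dt` by reference, the helper does not need to return a copy; only the one cell is rewritten on each call.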

HTH


I'm using the current development version, 1.9.3, but this should work just fine on any other version.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.0.99  reshape2_1.2.2 stringr_0.6.2  tools_3.0.3   

7 Comments

Thanks a million, I was missing one list: I tried only with 2, not 3. I was misled by dt[1, comment := 1] inserting 1 inside the first list element, so I thought := list(list(1, "a")) should put list(1, "a") inside the first list element. One question, Arun: why do dt[1, comment := list(list(1))] and dt[1, comment := 1] give the same result?
@Michele, Great question! I think it shouldn't. It should give a type mismatch error, IIUC. I'm not sure if I'd call it a bug, but it's still an inconsistency. So, could you please file a bug report here?
awesome. := list(list(1, "a")) solved my problem. Great answer. Thanks!
@Arun, I'm using data.table 1.9.7 and got type coercion warnings for several of the := examples. And how can I assign a vector to a cell? dt[, comment := list(c("a", "b"), 1)] works for all rows, but assigning one cell with dt[1, comment := list(c("a", "b"))] or dt[1, comment := c("a", "b")] doesn't even give the right result.
OK, the problem is probably that we need [[]] to access the list item. The regular method of dt$comment[[1]] <- c("a", "b") works, though I still don't know how to do it in data.table j syntax.

Just to add more info, what list columns are really designed for is when each cell is itself a vector:

> DT = data.table(a=1:2, b=list(1:5,1:10))
> DT
   a            b
1: 1    1,2,3,4,5
2: 2 1,2,3,4,5,6,

> sapply(DT$b, length)
[1]  5 10 

Notice the pretty printing of the vectors in the b column. Those commas are just for display; each cell is actually a vector (as shown by the sapply command above). Note also the trailing comma on the 2nd item of b. That indicates that the vector is longer than displayed (data.table just displays the first 6 items).

Or, more like your example :

> DT = data.table(id=1:2, comment=list( c("michele", Sys.time(), "hello"),
                                        c("michele", Sys.time(), "world") ))
> DT
   id                       comment
1:  1 michele,1395330180.9278,hello
2:  2 michele,1395330180.9281,world 

What you're trying to do is not only have a list column, but put a list into each cell as well, which is why <list> is being displayed. Additionally, if you place named lists into each cell, beware that all those names will use up space. Where possible, a list column of vectors may be easier.
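
As a small sketch of working with a list column of vectors (assuming R 3.2.0 or later for `lengths()`; the column name `n` is only illustrative), cells are read back with `[[`. The last lines show why the timestamps in the printed example above appear as numbers: `c()` starting with a character value coerces every element to character.

```r
library(data.table)

DT <- data.table(a = 1:2, b = list(1:5, 1:10))

DT$b[[2L]]             # the full vector stored in row 2
DT[, n := lengths(b)]  # length of each cell, added as a new column

# mixing a string and a timestamp in c() yields a character vector
x <- c("michele", Sys.time(), "hello")
class(x)
```

If the date needs to keep its type, a list per cell (as in the question) or separate columns avoid that coercion.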

1 Comment

Thanks a lot for the hint. I think I'll be fine however. I'm not using data.table for its speed (this time).
