create list from columns of data table expression

Question

Consider the following dt:

dt <- data.table(a=c(1,1,2,3),b=c(4,5,6,4))

That looks like that:

> dt
   a b
1: 1 4
2: 1 5
3: 2 6
4: 3 4

I'm here aggregating each column by it's unique values and then counting how many uniquye values each column has:

 > dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   2
2:     2   1     5   1
3:     3   1     6   1

So 1 appears twice in dt and thus a.N is 2, the same logic goes on for the other values.

But the problem is if this transformations of the original datatable have different dimensions at the end, things will get recycled.

For example this dt:

dt <- data.table(a=c(1,1,2,3,7),b=c(4,5,6,4,4))

> dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   3
2:     2   1     5   1
3:     3   1     6   1
4:     7   1     4   3
Warning message:
In as.data.table.list(jval, .named = NULL) :
  Item 2 has 3 rows but longest item has 4; recycled with remainder.

That is no longer the right answer because b.N should have now only 3 rows and things(vector) got recycled.

This is why I would like to transform the expression dt[,lapply(.SD,function(agg) dt[,.N,by=agg])] in a list with different dimensions, with the name of items in the list being the name of the columns in the new transformed dt.

A sketch of what I mean is:

newlist
$a.agg
1 2 3 7
$a.N
2 1 1 1
$b.agg
4 5 6 4
$b.N
3 1 1

Or even better solution would be to get a datatable with a track of the columns on another column:

    dt_final
   agg N column
    1 2 a
    2 1 a
    3 1 a
    7 1 a
    4 3 b
    5 1 b
    6 1 b

Ronak Shah · Accepted Answer · 2021-11-24 06:03:29Z

2

Get the data in long format and then aggregate by group.

library(data.table)

dt_long <- melt(dt, measure.vars = c('a', 'b'))
dt_long[, .N, .(variable, value)]

#   variable value N
#1:        a     1 2
#2:        a     2 1
#3:        a     3 1
#4:        a     7 1
#5:        b     4 3
#6:        b     5 1
#7:        b     6 1

In tidyverse -

library(dplyr)
library(tidyr)

dt %>%
  pivot_longer(cols = everything()) %>%
  count(name, value)

answered Nov 24, 2021 at 6:03

Ronak Shah

391k20 gold badges173 silver badges237 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

moth Over a year ago

beautiful :) . measure.vars = colnames(d) if you have several thousand columns.

Collectives™ on Stack Overflow

create list from columns of data table expression

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related