0

Consider the following dt:

dt <- data.table(a=c(1,1,2,3),b=c(4,5,6,4))

That looks like that:

> dt
   a b
1: 1 4
2: 1 5
3: 2 6
4: 3 4

I'm here aggregating each column by it's unique values and then counting how many uniquye values each column has:

 > dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   2
2:     2   1     5   1
3:     3   1     6   1

So 1 appears twice in dt and thus a.N is 2, the same logic goes on for the other values.

But the problem is if this transformations of the original datatable have different dimensions at the end, things will get recycled.

For example this dt:

dt <- data.table(a=c(1,1,2,3,7),b=c(4,5,6,4,4))

> dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   3
2:     2   1     5   1
3:     3   1     6   1
4:     7   1     4   3
Warning message:
In as.data.table.list(jval, .named = NULL) :
  Item 2 has 3 rows but longest item has 4; recycled with remainder.

That is no longer the right answer because b.N should have now only 3 rows and things(vector) got recycled.

This is why I would like to transform the expression dt[,lapply(.SD,function(agg) dt[,.N,by=agg])] in a list with different dimensions, with the name of items in the list being the name of the columns in the new transformed dt.

A sketch of what I mean is:

newlist
$a.agg
1 2 3 7
$a.N
2 1 1 1
$b.agg
4 5 6 4
$b.N
3 1 1

Or even better solution would be to get a datatable with a track of the columns on another column:

    dt_final
   agg N column
    1 2 a
    2 1 a
    3 1 a
    7 1 a
    4 3 b
    5 1 b
    6 1 b

1 Answer 1

2

Get the data in long format and then aggregate by group.

library(data.table)

dt_long <- melt(dt, measure.vars = c('a', 'b'))
dt_long[, .N, .(variable, value)]

#   variable value N
#1:        a     1 2
#2:        a     2 1
#3:        a     3 1
#4:        a     7 1
#5:        b     4 3
#6:        b     5 1
#7:        b     6 1

In tidyverse -

library(dplyr)
library(tidyr)

dt %>%
  pivot_longer(cols = everything()) %>%
  count(name, value)
Sign up to request clarification or add additional context in comments.

1 Comment

beautiful :) . measure.vars = colnames(d) if you have several thousand columns.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.