create list based on data frame in R

Question

I have a data frame A in the following format

user         item
10000000     1      # each user is an 8-digit integer; each item is an integer of up to 5 digits
10000000     2
10000000     3
10000001     1
10000001     4
..............

What I want is a list B whose element names are the user IDs and whose elements are vectors of the items belonging to each user.

e.g.

B = list(c(1,2,3),c(1,4),...)

I also need to attach the names to B. To apply association rule learning, the items need to be converted to characters.

Originally I used tapply(A$user,A$item, c), but the result is not compatible with the association rule package. See my post:

data format error in association rule learning R

But @sgibb's solution also seems to generate an array, not a list.

library("arules")
temp <- as(C, "transactions")    # C is the output of @sgibb's solution

This throws an error: Error in as(C, "transactions") : 
no method or default for coercing “array” to “transactions”

Please please please use dput to share your data. See here for reasons and more details; it makes it much easier to help.
– Gregor Thomas, Commented Apr 5, 2014 at 22:21
Also, in your previous question you mentioned split. See split(A$item, A$user).
– alexis_laz, Commented Apr 5, 2014 at 22:27
@Jin the output of tapply and split is the same. The only difference is that class(tapply(...)) == "array" and class(split(...)) == "list".
– sgibb, Commented Apr 5, 2014 at 23:27
Perhaps try something like lapply(split(A$item, A$user), unique). Should there be duplicated items, though? If not, maybe you've made a miscalculation somewhere when building A? I only say this because neither split nor tapply has anything to do with a possible duplication of values.
– alexis_laz, Commented Apr 6, 2014 at 0:22

sgibb · Accepted Answer · 2014-04-05 23:09:51Z

3

Have a look at tapply:

df <- read.table(textConnection("
user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4"), header=TRUE)

B <- tapply(df$item, df$user, FUN=as.character)
B
# $`10000000`
# [1] "1" "2" "3"
#
# $`10000001`
# [1] "1" "4"
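The comments on the question note that tapply and split group the values identically and differ only in the class of the result. A minimal sketch of that difference, rebuilding the same example data (the names B_array and B_list are illustrative, not from the original post):

```r
# Same example data as above
df <- read.table(text = "
user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4", header = TRUE)

# tapply() returns a one-dimensional array whose cells hold the item vectors
B_array <- tapply(df$item, df$user, FUN = as.character)
class(B_array)   # "array"

# split() groups the same way but returns a plain named list
B_list <- split(as.character(df$item), df$user)
class(B_list)    # "list"
names(B_list)    # the user IDs, as character strings
```

If a plain list is what downstream code expects, split avoids the conversion step altogether.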

EDIT: I do not know the arules package, but here is the solution proposed by @alexis_laz:

library("arules")
as(split(df$item, df$user), "transactions")
# transactions in sparse format with
#  2 transactions (rows) and
#  4 items (columns)
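A comment on the question also suggests lapply(split(...), unique) in case a user has the same item more than once. A hedged sketch of that variant follows; the data frame df_dup and its repeated row are made up for illustration, and the coercion step only runs if the arules package happens to be installed:

```r
# Hypothetical data: item 4 appears twice for the second user
df_dup <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L, 4L)
)

# Group items by user as character vectors, then drop repeats within each user
B <- lapply(split(as.character(df_dup$item), df_dup$user), unique)
B

# Coerce to transactions only when arules is available (assumption: it may not be)
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library("arules"))
  trans <- as(B, "transactions")
  print(trans)
}
```

Deduplicating first keeps each transaction a set of distinct items, which is the form the list-to-transactions coercion works with.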

edited Apr 5, 2014 at 23:09

answered Apr 5, 2014 at 22:25


8 Comments

Jin Over a year ago

Your solution produces an array, not a list. If your output is B, then brand_table <- as(B, "transactions") will complain (after you load library("arules")).

sgibb Over a year ago

@Jin: You should have mentioned the arules package, your aim and your link to the previous question from the beginning. See my edit.

Jin Over a year ago

You are not using B in the as command? Why use split(df$item, df$user) here?

sgibb Over a year ago

@Jin Because it is a short example. You could use B <- split(df$item, df$user); as(B, "transactions") instead. (And the B in my answer before the edit answered the original question and its title.)

Jin Over a year ago

I see your point. But are we using association rules correctly if we first collect all items belonging to the same user? Can we split directly?
