I have a data frame A in the following format:

user         item
10000000     1      # each user is an 8-digit integer; each item is an integer of up to 5 digits
10000000     2
10000000     3
10000001     1
10000001     4
..............

What I want is a list B whose elements are named by user, where each element is a vector of the items belonging to that user.

e.g.

B = list(c(1,2,3),c(1,4),...)

I also need to attach the user names to B. To apply association rule learning, the items need to be converted to characters.

Originally I used tapply(A$user,A$item, c), but its output is not compatible with the association rule package. See my post:

data format error in association rule learning R

But @sgibb's solution also seems to generate an array, not a list.

library("arules")
temp <- as(C, "transactions")    # C is output using @sgibb's solution

throws error: Error in as(C, "transactions") : 
no method or default for coercing “array” to “transactions”
  • Please please please use dput to share your data. See here for reasons and more details, it makes it much easier to help. Commented Apr 5, 2014 at 22:21
  • ?dlply or ?tapply Commented Apr 5, 2014 at 22:26
  • Also, in your previous question you mentioned split. See split(A$item, A$user) Commented Apr 5, 2014 at 22:27
  • @Jin the output of tapply and split is the same. The only difference is class(tapply(...)) == "array" and class(split(...)) == "list". Commented Apr 5, 2014 at 23:27
  • Perhaps try something like lapply(split(A$item, A$user), unique). Should there be duplicated items, though? If not, maybe you've made a miscalculation somewhere when building A? I only say this because neither split nor tapply has anything to do with a possible duplication of values. Commented Apr 6, 2014 at 0:22

1 Answer

Have a look at tapply:

df <- read.table(textConnection("
user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4"), header=TRUE)

B <- tapply(df$item, df$user, FUN=as.character)
B
# $`10000000`
# [1] "1" "2" "3"
#
# $`10000001`
# [1] "1" "4"

EDIT: I do not know the arules package, but here is the solution proposed by @alexis_laz:

library("arules")
as(split(df$item, df$user), "transactions")
# transactions in sparse format with
#  2 transactions (rows) and
#  4 items (columns)
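
As a rough base-R sketch of the difference discussed in the comments (tapply yields an array, split yields a plain list), the following rebuilds the example data and builds the named list of character items per user. The names B_array and B_list are only illustrative, and the user and item columns are assumed to be integers, as read.table would produce for the data above.

```r
# Same example data as above; the L suffix keeps both columns integer
df <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

# tapply returns a one-dimensional array whose cells hold the item vectors
B_array <- tapply(df$item, df$user, FUN = as.character)
is.array(B_array)

# split returns a plain list named by user; lapply converts each user's
# items to character and unique() drops repeated items within a user
B_list <- lapply(split(df$item, df$user),
                 function(x) unique(as.character(x)))
is.array(B_list)
names(B_list)
B_list[["10000000"]]
```

B_list is the kind of object passed to as(..., "transactions") in the snippet above; the unique() step follows the suggestion in the comments and is only needed if a user can have the same item more than once.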

8 Comments

Your solution provides an array, not a list. Assuming your output is B, brand_table <- as(B, "transactions") will complain (after you load library("arules")).
@Jin: You should have mentioned the arules package, your aim and your link to the previous question from the beginning. See my edit.
You are not using B in the as command? Why use split(df$item, df$user) here?
@Jin because it is a short example. You could use B <- split(df$item, df$user); as(B, "transactions") instead. (And my B answer before the edit was the answer to the original question (and to the title of the question).)
I see your point. But are we using association rule learning correctly if we first collect all items belonging to the same user? Can we just split directly?