
I have two data frames df1 and df2:

group=c("Group 1", "Group 2", "Group3","Group 1", "Group 2", "Group3")
year=c("2000","2000","2000", "2015", "2015", "2015")
items=c("12", "10", "15", "5", "10", "7")
df1=data.frame(group, year, items)

year=c("2000", "2015")
items=c("37", "22")
df2=data.frame(year,items)

df1 contains the number of items per year, separated by group, and df2 contains the total number of items per year.

I'm trying to create a for loop that will calculate the proportion of items for each group type. I'm trying to do something like:

df1$Prop="" #create empty column called Prop in df1
for(i in 1:nrow(df1)){
  df1$Prop[i]=df1$items/df2$items[df2$year==df1$year[i]]
} 

The loop is supposed to get the proportion for each group (by taking the value from df1 and dividing it by the matching yearly total in df2) and store it in a new column, but this code isn't working.

Just a question: why the quotes (") in the items vector? The values are in fact numbers, but with your syntax they are converted to factors. Commented Jul 13, 2015 at 22:16

2 Answers


You don't really need df2. Here's a simple solution using data.table and only df1 (I'm assuming items is a numeric column; if it isn't, you'll need to convert it first with setDT(df1)[, items := as.numeric(as.character(items))])

library(data.table)
setDT(df1)[, Prop := items/sum(items), by = year]
df1
#      group year items      Prop
# 1: Group 1 2000    12 0.3243243
# 2: Group 2 2000    10 0.2702703
# 3:  Group3 2000    15 0.4054054
# 4: Group 1 2015     5 0.2272727
# 5: Group 2 2015    10 0.4545455
# 6:  Group3 2015     7 0.3181818
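A note on the conversion mentioned above: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values, which is why the as.character() step is there. A minimal sketch, where f is a hypothetical stand-in for a factor-typed items column:

```r
# f is a hypothetical stand-in for an items column stored as a factor
f <- factor(c("12", "10", "15", "5", "10", "7"))

# Levels are sorted as strings: "10" "12" "15" "5" "7"
codes <- as.numeric(f)                  # 2 1 3 4 1 5  (level codes, not the data)
values <- as.numeric(as.character(f))   # 12 10 15 5 10 7  (the actual numbers)
```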

Another way, if you already have df2, is to join the two and calculate Prop while doing so (again, I'm assuming items is numeric in the real data)

setkey(setDT(df1), year)[df2, Prop := items/i.items]

A base R alternative

with(df1, ave(items, year, FUN = function(x) x/sum(x)))
## [1] 0.3243243 0.2702703 0.4054054 0.2272727 0.4545455 0.3181818
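If you want to keep the loop from the question, it has two problems as written: items is not numeric, and df1$items is missing the [i] index, so the whole column is divided on every iteration. Below is a sketch of a repaired loop next to a vectorised equivalent using match(). It assumes the data is rebuilt with numeric items and stringsAsFactors = FALSE (my assumptions, not part of the original example); prop_vec is just an illustrative name.

```r
# Same data as in the question, but with numeric items and
# stringsAsFactors = FALSE so that year stays a character vector.
df1 <- data.frame(
  group = c("Group 1", "Group 2", "Group3", "Group 1", "Group 2", "Group3"),
  year  = c("2000", "2000", "2000", "2015", "2015", "2015"),
  items = c(12, 10, 15, 5, 10, 7),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  year  = c("2000", "2015"),
  items = c(37, 22),
  stringsAsFactors = FALSE
)

# Repaired loop: start from a numeric column and index items with [i].
df1$Prop <- NA_real_
for (i in seq_len(nrow(df1))) {
  df1$Prop[i] <- df1$items[i] / df2$items[df2$year == df1$year[i]]
}

# Vectorised equivalent: match() gives, for each row of df1,
# the row of df2 that holds the total for the same year.
prop_vec <- df1$items / df2$items[match(df1$year, df2$year)]

# Both approaches should agree.
all.equal(df1$Prop, prop_vec)
```

The match() version avoids the loop entirely and relies on df2 having exactly one row per year.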

2 Comments

items is a factor the way @shrimp32 wrote the example.
I know, I said I'm assuming it's a mistake and that's actually a numeric value.

dplyr equivalent to David's data.table solution

library(dplyr)

df1$items = as.integer(as.vector(df1$items))
df1 %>% group_by(year) %>% mutate(Prop = items / sum(items))

#Source: local data frame [6 x 4]
#Groups: year

#    group year items      Prop
#1 Group 1 2000    12 0.3243243
#2 Group 2 2000    10 0.2702703
#3  Group3 2000    15 0.4054054
#4 Group 1 2015     5 0.2272727
#5 Group 2 2015    10 0.4545455
#6  Group3 2015     7 0.3181818

plyr alternative

library(plyr)
ddply(df1, .(year), mutate, prop = items/sum(items))

lapply alternative

do.call(rbind, lapply(split(df1, df1$year),
        function(x) { x$prop = x$items / sum(x$items); x }))
