I want to create variables age10, age20, age30, etc. for a given data set. The function add_ages takes a data frame, df, and creates the new variables based on their relation to the existing variable age.

df <- data.frame(age=sample(1:100,10,replace=T))

add_ages <- function(d){
  for(i in seq(10,100,10)){
    d[,paste0("age",i)] <<- ifelse(i>=d[,"age"] & d[,"age"]<i+10,1,0)
  }
}

add_ages(d=df)

However, when I run the code above, I get the following error:

Error in d[, paste0("age", i)] <<- ifelse(i >= d[, "age"] & d[, "age"] <  : 
  object 'd' not found

I'm not sure I understand why d cannot be found, when I am defining it to be df. Any thoughts?

2 Answers

2

It sounds like you are trying to create dummy variables using your data.

Note that most modeling functions in R do this automatically in the modeling step, using the model.matrix() function under the hood.

Here is an example:

df <- data.frame(age=sample(1:100,10,replace=T))

# Create a categorical variable using cut()
df$agegroup <- cut(df$age, breaks=seq(0, 100, by = 10))

You now have a categorical variable with age groups:

head(df)
  age agegroup
1  82  (80,90]
2  79  (70,80]
3  99 (90,100]
4  12  (10,20]
5  82  (80,90]
6  66  (60,70]
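
One thing to be aware of: by default cut() builds intervals that are closed on the right, so an age of exactly 20 lands in (10,20]. If the bins are meant to be closed on the left instead (10 up to but not including 20, which is an assumption about what the question intends), a sketch using the right = FALSE argument could look like this. The fixed ages and the labels vector are purely illustrative.

```r
# Hypothetical fixed ages so the result is easy to check by hand
ages <- c(9, 10, 19, 20, 99, 100)

# Left-closed bins [0,10), [10,20), ..., [100,110)
# The breaks run to 110 so that an age of 100 still falls in a bin
breaks <- seq(0, 110, by = 10)
agegroup <- cut(ages, breaks = breaks, right = FALSE,
                labels = paste0("age", seq(0, 100, by = 10)))

data.frame(age = ages, agegroup = agegroup)
```

With these breaks, 10 and 19 share a group while 20 starts the next one.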

Convert to a model matrix:

# Create the model matrix
model.matrix(~agegroup - 1, df)
   agegroup(0,10] agegroup(10,20] agegroup(20,30] agegroup(30,40] agegroup(40,50]
1               0               0               0               0               0
2               0               0               0               0               0
3               0               0               0               0               0
4               0               1               0               0               0
5               0               0               0               0               0
6               0               0               0               0               0
7               0               0               0               0               0
8               0               1               0               0               0
9               0               0               0               0               1
10              0               0               0               0               0
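
If the goal is to keep the indicator columns alongside age, as in the question, the model matrix can be bound back onto the data frame. The following is a minimal sketch under that assumption; the fixed ages, the name df_wide, and the column renaming via sub() are illustrative choices, not part of the approach described above.

```r
# Hypothetical fixed ages so the example is reproducible
df <- data.frame(age = c(82, 79, 99, 12, 45))
df$agegroup <- cut(df$age, breaks = seq(0, 100, by = 10))

# One 0/1 column per age group; "- 1" drops the intercept
# so that every level gets its own column
dummies <- model.matrix(~ agegroup - 1, data = df)

# Optional: strip the "agegroup" prefix that model.matrix() adds
colnames(dummies) <- sub("^agegroup", "", colnames(dummies))

# Bind the indicator columns onto the original data
df_wide <- cbind(df, dummies)
```

Each row of dummies contains exactly one 1, in the column for that row's age group.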

2

Use <- instead of <<-. The <<- operator skips the function's own environment and looks for d in the enclosing environment (here, the global one), where d does not exist. Finally, return d.

add_ages <- function(d) {
  for (i in seq(10,100,10)){
    d[,paste0("age",i)] <- ifelse(i>=d[,"age"] & d[,"age"]<i+10,1,0)
  }
  d
}
df <- add_ages(df)
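
A side note on the condition itself: as written, i >= age & age < i+10 reduces to age <= i, so a single row can be flagged in several columns. If the intent is one flag per decade (an assumption; the question does not say so explicitly), the comparison would need to be flipped. Here is a sketch of the same function under that assumption, given the hypothetical name add_age_flags to keep it distinct from the original:

```r
# Hypothetical variant: assumes the intended test is i <= age < i + 10
add_age_flags <- function(d, starts = seq(10, 100, by = 10)) {
  for (i in starts) {
    # as.integer() turns TRUE/FALSE into 1/0 without needing ifelse()
    d[[paste0("age", i)]] <- as.integer(d$age >= i & d$age < i + 10)
  }
  d  # return the modified copy; the caller's data frame is untouched
}

# Fixed ages chosen to hit the bin edges
df <- data.frame(age = c(5, 10, 19, 20, 100))
df <- add_age_flags(df)
df[, c("age", "age10", "age20", "age100")]
```

Note that an age below 10 gets a 0 in every column under this scheme.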

Edit:

If you really want to avoid doing df <- add_ages(df), you could do the following:

add_ages <- function() {
  for (i in seq(10,100,10)){
    df[,paste0("age",i)] <<- ifelse(i>=df[,"age"] & df[,"age"]<i+10,1,0)
  }
}

add_ages()

I'd recommend against this for at least two reasons. First, this does not generalize at all: the function only ever works on an object named df. There's really no point in making a function that does this; you'd be better off just running the loop immediately after creating df, e.g.

df <- data.frame(age=sample(1:100,10,replace=T))
for (i in seq(10,100,10)){
  df[,paste0("age",i)] <- ifelse(i>=df[,"age"] & df[,"age"]<i+10,1,0)
}

Second, functions should strive to avoid side effects. In other words, if I call a function, the only object that should be modified is the one I assign the output to. Side effects like this may seem harmless, but if you were to write this as one of several functions in the middle of some code and then come back to it 6 months later, it's likely you'll have forgotten about the side effects, which can cause all sorts of headaches.
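
The difference between the two styles can be seen in a small sketch. Both helper names below (flag_adult and flag_adult_global) and the age cutoff are hypothetical and exist only for this illustration.

```r
df <- data.frame(age = c(15, 42))

# Pure version: works on a local copy and returns it
flag_adult <- function(d) {
  d$adult <- as.integer(d$age >= 18)
  d
}

# Side-effect version: reaches outside the function and modifies `df`
flag_adult_global <- function() {
  df$adult <<- as.integer(df$age >= 18)
  invisible(NULL)
}

result <- flag_adult(df)
names(df)      # still just "age": the call did not touch df
names(result)  # "age" "adult"

flag_adult_global()
names(df)      # now "age" "adult", with no visible assignment at the call site
```

Nothing at the call site of flag_adult_global() hints that df changes, which is exactly what makes such code hard to revisit later.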

2 Comments

Is there a good way to return a data frame while keeping the original name? For example, without having to specify df <- d in the above code?
See my edit above. Note that in the original version, you wouldn't use df <- d; you'd use df <- add_ages(df).
