I have some lines of code with a for loop that look like this:

somevector2 <- c(length = somevector2_length)

for(string in somevector1){

  df2 <- df1[df1$col1 == string, ]
  ff <- somefunction(df2$col2)
  somevector2 <- c(somevector2, ff)

}

From what I understood, initializing the vector with the correct length should make the loop faster, but it still takes quite some time, even though somefunction(df2$col2) only does some simple operations. somevector1 is just a vector of strings.

Is there a way to make this loop faster in R? Thank you very much.

1 Answer

Sorry, but that's not how you are supposed to post a question on SO. :( You should provide a working example. Also, that's not the way to create a vector of a fixed length.
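
For reference, here is a minimal sketch of the difference, using only base R. The value 3 and the names wrong, right and same are assumptions for illustration. c() simply combines its arguments, so c(length = n) builds a one-element vector whose element is named "length", whereas numeric(n) or vector("numeric", n) allocates n zeros.

```r
somevector2_length <- 3  # assumed length, for illustration only

# c() combines its arguments; "length" becomes an element name, not a size
wrong <- c(length = somevector2_length)
length(wrong)  # 1
names(wrong)   # "length"

# these allocate a numeric vector of the requested length, filled with 0
right <- numeric(somevector2_length)
same  <- vector("numeric", somevector2_length)
length(right)           # 3
identical(right, same)  # TRUE
```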


Let's see a reproducible example of what you posted:

##### this makes your example reproducible

somevector1 <- unique(iris$Species)
df1 <- iris
names(df1) <- paste0("col", 5:1)
somefunction <- sum
somevector2_length <- 3



##### this is your code

# somevector2 <- c(length = somevector2_length) # <- this was wrong
somevector2 <- c()


for(string in somevector1){
 
 df2 <- df1[df1$col1 == string, ]
 ff <- somefunction(df2$col2)
 somevector2 <- c(somevector2, ff)
 
}

So this is the final result:

somevector2
#> [1]  12.3  66.3 101.3
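
If you do want to keep a loop, the usual pattern is to allocate the result once and assign into it by index, rather than growing it with c() on every iteration. The following is a sketch under the same setup as the reproducible example; the setup lines are repeated so the block runs on its own, and the name groups is introduced here purely for illustration.

```r
# setup repeated from the reproducible example
df1 <- iris
names(df1) <- paste0("col", 5:1)
somefunction <- sum
groups <- as.character(unique(df1$col1))  # illustrative name, not in the question

# allocate the result once, then fill position i on each iteration
somevector2 <- numeric(length(groups))
for (i in seq_along(groups)) {
  somevector2[i] <- somefunction(df1$col2[df1$col1 == groups[i]])
}
somevector2
#> [1]  12.3  66.3 101.3
```

With only a few groups the gain from preallocating is likely negligible; much of the cost in a loop like this probably comes from subsetting the data frame on every pass, which a grouped call avoids.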

What I suggest is to use the line of code below instead of your loop. You will get a similar result, except that the values are named by group.

tapply(df1$col2, df1$col1, somefunction)
#>    setosa versicolor  virginica 
#>      12.3       66.3      101.3 

You can get rid of the names with unname().
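
As a small sketch of that last step (setup repeated so it runs on its own; res and plain are illustrative names): unname() drops the names, and as.vector() is another base R option that also returns a plain numeric vector.

```r
df1 <- iris
names(df1) <- paste0("col", 5:1)

res <- tapply(df1$col2, df1$col1, sum)

# drop the group names from the result
unname(res)

# as.vector() strips the names as well and gives a plain numeric vector
plain <- as.vector(res)
names(plain)  # NULL
```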

4 Comments

Thanks, I didn't know I always had to provide a reproducible example; I'll make sure to do so in future questions.
Not always; there are some situations where you can't, but generally yes. Also, if this is what you were looking for, I would invite you to accept the answer.
I'm trying to understand the solution you provided, but I don't quite see how the line tapply(df1$col2, df1$col1, somefunction) takes into account the filtering of the data frame by the string value in the for loop, as in for(string in somevector1){ df2 <- df1[df1$col1 == string, ].
That line of code replaces your entire code: tapply() splits df1$col2 into groups by the values of df1$col1 and applies the function to each group. If you want just some specific strings instead of all of them, filter your data frame before that line, like this: df1 <- df1[df1$col1 %in% somevector1, ]
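
A sketch of that filtering step, assuming a subset of two species for illustration (sub, r1 and r2 are illustrative names). One caveat in this particular example: col1 is a factor, so after filtering the unused level is still present and tapply() reports NA for it; droplevels() removes it.

```r
df1 <- iris
names(df1) <- paste0("col", 5:1)
somevector1 <- c("setosa", "virginica")  # assumed subset, for illustration

# keep only the rows whose group is in somevector1
sub <- df1[df1$col1 %in% somevector1, ]

# col1 is a factor here, so "versicolor" is still a level and comes back as NA
r1 <- tapply(sub$col2, sub$col1, sum)
r1

# dropping unused factor levels leaves only the requested groups
sub <- droplevels(sub)
r2 <- tapply(sub$col2, sub$col1, sum)
r2
```

If col1 were a character column rather than a factor, the droplevels() step would not be needed.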
