Given a matrix, is there a way to apply a function to its rows in groups of a fixed number of rows?

As an example: I would like to solve a least squares problem using QR decomposition for every ten of my thousand rows. This might look like:

set.seed(128)
f <- function(x) x^2 -x + 1
x <- runif(1000, -1, 1)
y <- f(x) + rnorm(1000, 0, 0.2)

morpheus <- cbind(1,x,x^2)
# apply qr.solve(morpheus, y) 100 times on 10 rows at a time 
# in such way that the correspondence between morpheus and y is not broken

Does anybody know how this problem could be solved? If possible, I'd prefer an approach using some form of apply or another functional solution, but any help is welcome.

3 Answers

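Using dplyr. This assumes morpheus is a data frame with columns const, x, x_sq and y. Note that rep(1:10, 100) forms 10 interleaved groups of 100 rows each; use rep(1:100, each = 10) to get 100 groups of 10 consecutive rows instead.

library(dplyr)
# group_by() needs a data frame with named columns, so build one from x and y first
morpheus <- data.frame(const = 1, x = x, x_sq = x^2, y = y)
morpheus %>% group_by(rep(1:10, 100)) %>% do(as.data.frame(rbind(qr.solve(cbind(.$const, .$x, .$x_sq), .$y))))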
Source: local data frame [10 x 4]
Groups: rep(1:10, 100)

   rep(1:10, 100)        V1         V2        V3
1               1 1.0410480 -0.9616138 0.8777193
2               2 0.9883532 -0.9751688 1.0431504
3               3 1.0263414 -1.0053184 0.8811848
4               4 1.0114099 -1.0024364 0.9341063
5               5 1.0059417 -0.9694164 0.9322200
6               6 1.0501467 -1.0186771 0.9048468
7               7 0.9748101 -1.0045796 1.0932815
8               8 0.9784629 -0.9572418 1.0008312
9               9 0.9559010 -1.0271767 1.0823086
10             10 0.9435522 -1.0583352 1.0804009
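
For anyone unsure which grouping vector to pass, here is a small base R sketch of the difference between interleaved groups and blocks of consecutive rows. The names g_interleaved, g_blocks and g_ceiling are made up for this illustration and do not come from the answer.

```r
n <- 1000

# rep(1:10, 100) cycles 1..10, so rows 1, 11, 21, ... share a group:
# 10 groups of 100 rows each
g_interleaved <- rep(1:10, 100)

# rep(1:100, each = 10) labels consecutive rows:
# 100 groups of 10 rows each
g_blocks <- rep(1:100, each = 10)

# An equivalent way to label blocks of 10 consecutive rows
g_ceiling <- ceiling(seq_len(n) / 10)

head(g_interleaved, 12)
head(g_blocks, 12)
all(g_blocks == g_ceiling)  # TRUE
```

Either g_blocks or g_ceiling can be used as the grouping variable if the goal is ten consecutive rows per fit.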

I think the simplest solution, apart from a for loop, is to use by:

f <- function(x) x^2 -x + 1
x <- runif(1000, -1, 1)
y <- f(x) + rnorm(1000, 0, 0.2)

morpheus <- cbind(1,x,x^2,y, rep(1:100,each=10))

by(morpheus[,1:4], morpheus[,5], function(x)qr.solve(x[,1:3],x[,4]))

     INDICES: 1
        V1          x         V3 
     1.1359248 -0.7800506  0.6642460 
    --------------------------------------------------------------------------------- 
    INDICES: 2
       V1          x         V3 
     0.9156199 -1.0999112  1.0019637 
    --------------------------------------------------------------------------------- 
    INDICES: 3
       V1          x         V3 
     0.9901892 -0.8275427  1.2576495 

### etc.

UPDATE: you can use do.call to get the results into a matrix for further use:

do.call('rbind',as.list(
  by(morpheus[,1:4], morpheus[,5], function(x){
    qr.solve(x[,1:3],x[,4])
  })
))

# results:

          V1          x        V3
1   0.9445907 -1.0655362 0.9471155
2   1.0370279 -0.8100258 0.7440526
3   0.9681344 -0.7442517 0.9108040
### etc.
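
If you would rather stay with the apply family, the same idea can be sketched in base R with split and vapply. This is only one possible way to do it, not part of the original answer; the names X, grp, idx and coefs are chosen for this example.

```r
set.seed(128)
f <- function(x) x^2 - x + 1
x <- runif(1000, -1, 1)
y <- f(x) + rnorm(1000, 0, 0.2)

# Design matrix with explicit column names
X <- cbind(const = 1, x = x, x_sq = x^2)

# Label blocks of 10 consecutive rows, then split the row indices by label
grp <- rep(1:100, each = 10)
idx <- split(seq_len(nrow(X)), grp)

# Solve one least squares problem per block; indexing X and y with the
# same row indices keeps the correspondence between them intact
coefs <- t(vapply(idx,
                  function(i) qr.solve(X[i, , drop = FALSE], y[i]),
                  numeric(3)))

dim(coefs)   # one row per group, one column per coefficient
head(coefs, 3)
```

Splitting the row indices rather than the data means X and y never have to be bound together into one object.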

2 Comments

Don't use by, it is usually useless for further operations as it is designed to be the last output.
OK, but that can be worked around, e.g. with do.call (see the updated answer).

If you have an additional variable that labels the set of rows you want to independently apply your function on, you may want to try

 library('data.table')
 iris <- as.data.table(iris)
 iris[,
      apply(.SD,1, mean),
      by = Species
      ]

       Species    V1
  1:    setosa 2.550
  2:    setosa 2.375
  3:    setosa 2.350
  4:    setosa 2.350
  5:    setosa 2.550
 ---                
146: virginica 4.300
147: virginica 3.925
148: virginica 4.175
149: virginica 4.325
150: virginica 3.950

Replace mean with any other function of your choice. The by = variable is the one that defines the groups, for example a label that puts every ten rows in the same group.
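
Applied to the least squares problem from the question, this might look roughly like the sketch below. It assumes the data.table package is installed; the column name grp and the object names dt and fits are invented for this example.

```r
library(data.table)

set.seed(128)
f <- function(x) x^2 - x + 1
x <- runif(1000, -1, 1)
y <- f(x) + rnorm(1000, 0, 0.2)

# One row per observation, plus a label for blocks of 10 consecutive rows
dt <- data.table(x = x, y = y, grp = rep(1:100, each = 10))

# For each group, build the design matrix and solve by QR;
# as.list() turns the named coefficient vector into one column per coefficient
fits <- dt[, as.list(qr.solve(cbind(const = 1, x = x, x_sq = x^2), y)),
           by = grp]

head(fits, 3)
```

The result has one row per group, so it can be used directly in further computations.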

2 Comments

In that case I would suggest aggregate, or the dplyr or plyr packages, which are faster.
@Zbynek aggregate, as well as dplyr and plyr, are by no means faster: see github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping.
