Revisions to Why is addressing an array of vectors more efficient than addressing an array of matrices in Cg?

added 219 characters in body

edited Jul 14, 2015 at 21:10

There is definitely a performance implication to using an array of matrices. Personally, I can see two reasons for this: memory layout and index calculations.

Generally speaking, a contiguous memory layout is much faster to access than a non-contiguous one, which is why it's common practice to flatten 2D arrays into 1D arrays. As noted, the downside is that this leaks the implementation detail, so you have to handle it yourself by changing the indexing.
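As a rough illustration of what "changing the indexing" means, here is a minimal C sketch of a row-major flattened 3x4 matrix. The names `ROWS`, `COLS`, `flat_index` and `flat_get` are made up for this example and are not taken from the article or from Cg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dimensions, chosen only for illustration. */
enum { ROWS = 3, COLS = 4 };

/* Map a (row, col) pair to an offset in a row-major flat array. */
static size_t flat_index(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Read element (row, col) from a matrix stored as one contiguous array. */
static float flat_get(const float *m, size_t row, size_t col)
{
    return m[flat_index(row, col)];
}
```

The caller now has to know the layout (row-major, `COLS` elements per row), which is exactly the leak mentioned above.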

I could be wrong, but I suspect that a matrix in Cg is actually implemented as a 1D array. That brings us to the second point: if so, the only difference between an array of matrices and an array of vectors (a flattened matrix) is the index calculation. In the article they use floating point to calculate the index, and a single-precision floating-point multiply, add, or multiply-add takes 4 clock cycles per warp. The array of vectors needs only one index while the array of matrices needs two, which means fewer instructions per lookup, and remember that in shaders every instruction matters.
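To make the "one index versus two" point concrete, here is a small C sketch of the two layouts. It only shows the index arithmetic; it is not a benchmark, and a C compiler may well emit the same code for both. The types and names (`Vec4`, `Mat3x4`, `row_from_matrices`, `row_from_vectors`) are hypothetical, and I'm assuming the base index into the flat array arrives already scaled by the number of rows per bone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { BONES = 4, ROWS_PER_BONE = 3 };

typedef struct { float x, y, z, w; } Vec4;

/* Layout A: array of matrices, addressed by (bone, row). */
typedef struct { Vec4 row[ROWS_PER_BONE]; } Mat3x4;

static Vec4 row_from_matrices(const Mat3x4 *bones, size_t bone, size_t row)
{
    assert(bone < BONES && row < ROWS_PER_BONE);
    /* Offset is bone * ROWS_PER_BONE + row: a multiply and an add. */
    return bones[bone].row[row];
}

/* Layout B: flat array of vectors, addressed by one pre-scaled index. */
static Vec4 row_from_vectors(const Vec4 *rows, size_t base, size_t row)
{
    assert(base + row < BONES * ROWS_PER_BONE);
    /* base is assumed to be pre-multiplied, so only an add remains. */
    return rows[base + row];
}
```

With layout B the multiply can be done once outside the shader (or once per vertex), and fetching the three rows of a bone is just `base + 0`, `base + 1`, `base + 2`.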

Update regarding your question in the comments:

What you said makes sense. My speculation is that the compiler notices that boneMatrix doesn't change, so it doesn't allocate a new matrix and just references the existing values. In other words, it isn't actually constructing a new matrix, just aliasing the vectors so that matrix operations can be used. But how can we be sure? Someone needs to check the generated code.

Update: this has been confirmed by @EternalWind (check the comments). The compiler doesn't construct a new matrix but references the vectors directly; moreover, it was able to vectorize the operation using dot products.
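For intuition, this is roughly what "reference the vectors and use dot products" amounts to, written as a C sketch. It is not the compiler's actual output; `Vec3`, `Vec4`, `dot4` and `transform_by_rows` are names invented for this example.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float x, y, z; } Vec3;

/* Four-component dot product. */
static float dot4(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Transform p by the 3x4 matrix whose rows are rows[base..base+2],
   reading the rows in place instead of copying them into a matrix first. */
static Vec3 transform_by_rows(const Vec4 *rows, int base, Vec4 p)
{
    Vec3 out;
    assert(base >= 0);
    out.x = dot4(rows[base + 0], p);
    out.y = dot4(rows[base + 1], p);
    out.z = dot4(rows[base + 2], p);
    return out;
}
```

Each output component is one dot product of a stored row with the input position, so no intermediate matrix has to exist.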

added 736 characters in body

edited Jul 14, 2015 at 15:37

concept3d

added 67 characters in body

edited Jul 14, 2015 at 13:48

concept3d

answered Jul 14, 2015 at 12:45

concept3d
