
UPDATE: Note that in Julia v1+ the relevant function is view (sub no longer exists).
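On Julia v1+, a minimal sketch of the zero-copy approach described below (note that mean now lives in the Statistics standard library):

```julia
using Statistics

x = randn(10_000_000)
xv = view(x, 1:length(x))   # or equivalently: @view x[1:end]
mean(xv)                    # computed over the view; no copy of x is made
```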

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around the sub function, but I can't quite see how...

Working Example: I build a large vector of Float64 (x) and then an index to every observation in x.

N = 10000000
x = randn(N)
inds = [1:N]

Now I time the mean function over x and x[inds] (I run mean(randn(2)) first to avoid any compiler irregularities in the timing):

@time mean(x)
@time mean(x[inds])

It's an identical calculation, but as expected the results of the timings are:

elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)

So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choice of array and function)?

2 Answers


Just use xs = sub(x, 1:N). Note that this is different from xs = sub(x, [1:N]); on julia 0.3 the latter will fail, and on julia 0.4-pre the latter will be considerably slower than the former. On julia 0.4-pre, sub(x, 1:N) is just as fast as view:

julia> N = 10000000;

julia> x = randn(N);

julia> xs = sub(x, 1:N);

julia> using ArrayViews

julia> xv = view(x, 1:N);

julia> mean(x)
-0.0002491126429772525

julia> mean(xs)
-0.0002491126429772525

julia> mean(xv)
-0.0002491126429772525

julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)

julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)

julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)

There are several reasons why sub(x, inds) is slower than sub(x, 1:N):

  • Each access xs[i] corresponds to x[inds[i]]; we have to look up two memory locations rather than one
  • If inds is not in order, you will get poor cache behavior in accessing the elements of x
  • It destroys the ability to use SIMD vectorization

In this case, the latter is probably the most important effect. This is not a Julia limitation; the same thing would happen were you to write the equivalent code in C, Fortran, or assembly.
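A sketch of the two access patterns the bullets describe (hypothetical helper functions, not from the answer): the range loop is one sequential load per element and can be vectorized, while the gather loop needs two loads per element and cannot.

```julia
# Range indexing: sequential access, SIMD-friendly.
function sum_range(x::Vector{Float64}, r::UnitRange{Int})
    s = 0.0
    @inbounds @simd for i in r
        s += x[i]
    end
    return s
end

# Vector-of-indices ("gather") access: two loads per element
# (first inds[j], then x[inds[j]]), which defeats SIMD and has
# poor cache locality when inds is unordered.
function sum_gather(x::Vector{Float64}, inds::Vector{Int})
    s = 0.0
    @inbounds for j in eachindex(inds)
        s += x[inds[j]]
    end
    return s
end
```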

Note that it's still faster to say sum(sub(x, inds)) than sum(x[inds]) (until the latter becomes the former, which it should by the time julia 0.4 is officially out). But if you have to do many operations with xs = sub(x, inds), in some circumstances it will be worth your while to make a copy, even though it allocates memory, just so you can take advantage of the optimizations possible when values are stored in contiguous memory.
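The copy-once strategy from the last sentence might look like this (Julia v1+ syntax; the index choice here is arbitrary, just for illustration):

```julia
using Statistics

x = randn(10_000_000)
inds = rand(1:length(x), 1_000_000)  # arbitrary, unordered indices

xc = x[inds]   # pay for one allocation and one gather up front...
mean(xc)       # ...then every subsequent pass over the subset
std(xc)        # runs over contiguous memory at full speed
```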


2 Comments

This is very interesting. I hadn't considered the advantage inherent in creating a new object in contiguous memory. Many thanks.
Out of curiosity, to transform array subsetting into index subsetting, is it ever worth it to sort in place, subindex, and then resort in place?

EDIT: Read tholy's answer too to get a full picture!

When using an array of indices, the situation is not great right now on Julia 0.4-pre (start of Feb 2015):

julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds);  # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)
  • So while we can avoid the memory allocation, we are actually still slower(!).
  • The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
  • If we can index by a range, then we can use ArrayViews.jl on 0.3, and it's built in on 0.4. This case is pretty much as good as mean over the original array.
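The range-versus-vector distinction these bullets draw can be seen directly in Julia v1+ syntax, where collect(1:N) plays the role of the old [1:N]:

```julia
r = 1:5             # UnitRange: lazy, constant size regardless of length
v = collect(1:5)    # Vector{Int}: materialized, O(N) memory
x = randn(10)
view(x, r)          # cheap: the view stores only the range endpoints
view(x, v)          # legal, but every access indirects through v
```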

I noticed that with a smaller number of indices (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worthwhile:

N = 100000000
x = randn(N)
inds = [1:div(N,10)]

@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)

gives

elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)

6 Comments

Very informative (as always). Many thanks. Perhaps it is worth adding that sub does not appear to work for vector indices at all in v0.3.5 (sub(x, inds) throws an error on my machine, but works if the second argument to sub is a range object)
Note the distinction between 1:N and [1:N], see my answer.
IainDunning, to clarify: where you say "it's not great now, but will be fixed before 0.4 is released," that's simply not true. It's already as well-optimized as computer hardware permits, and when indexing with a range sub is just as performant as view. There is no way to make indexing with a vector as performant as indexing with a range.
Hopefully I've fixed my answer so it's not actively misleading; can you check?
See github.com/JuliaLang/julia/issues/10093. Thanks for pursuing this, IainDunning.
