1

I have two 2D arrays in Rust like this

    let d1: [[T; 3]; 3] = ...
    let d2: [[T; 3]; 3] = ...

Now I want to add the elements element-wise and assign the result to a new 2D array.

My current solution looks like this:


    let mut result = [[T::default(); ROW]; COL];
    
    for (i, row) in d1.iter().enumerate() {
        for (j, element) in row.iter().enumerate() {
            result[i][j] = *element + d2[i][j];
        }
    }

This is inefficient as it first fills result with default values only to override them later.

Therefore I want to solve this using iterators by flattening the 2D arrays and then adding the elements.

However, I don't know how I can construct a 2D array from the resulting flattened 1D iterator :

    let result = for sum in d1.iter().flatten().zip(
                            d2.iter().flatten())
                            .map(|(x, y)| x+y)
                            .collect().???
5
  • Have you benchmarked that it indeed makes a difference? Commented Jul 13, 2022 at 10:05
  • As I don't have a solution for the second approach, I can't benchmark it yet. I will when I have both approaches implemented. But I think, it is a valid question regardless of the performance. Commented Jul 13, 2022 at 10:39
  • 1
    This is surely a valid question out of curiousity, but if you actually want to use it I'd recommend you to benchmark. As for how to benchmark, you can us3 MaybeUninit to avoid the initialization overhead. Commented Jul 13, 2022 at 10:45
  • Yes, I'm currently learning Rust and decided to implement a simple Matrix class to learn in practice. I will definitely try benchmarking with MaybeUninit, haven't heard of it yet. Thank you! Commented Jul 13, 2022 at 10:53
  • Note that all of this runs on the stack, so you will run into stack overflows pretty quickly. Commented Jul 13, 2022 at 12:14

2 Answers 2

1

You can do a hybrid approach.

You can initialize a fixed-size array via first constructing a [(); N] and then calling .map() on it. Combined with iterators over the input arrays, this should be somewhat performant.

Although I still recommend benchmarking it vs the naive solution.

fn add_2d<'a, 'b, T, const W: usize, const H: usize>(
    d1: &'a [[T; W]; H],
    d2: &'b [[T; W]; H],
) -> [[T; W]; H]
where
    T: 'static,
    &'a T: std::ops::Add<&'b T, Output = T>,
{
    let mut iter_row = d1.iter().zip(d2.iter());

    [(); H].map(move |()| {
        let (row1, row2) = iter_row.next().unwrap();
        let mut elem_iter = row1.iter().zip(row2.iter());
        [(); W].map(move |()| {
            let (elem1, elem2) = elem_iter.next().unwrap();
            elem1 + elem2
        })
    })
}

fn main() {
    let d1: [[u32; 2]; 3] = [[1, 2], [4, 5], [7, 8]];
    let d2: [[u32; 2]; 3] = [[10, 20], [40, 50], [70, 80]];

    let sums = add_2d(&d1, &d2);
    println!("{:?}", sums);
}
[[11, 22], [44, 55], [77, 88]]

Alternatively, if you don't mind unsafe, this is what I would probably go for to achieve maximum performance:

use std::mem::MaybeUninit;

fn add_2d<'a, 'b, T, const W: usize, const H: usize>(
    d1: &'a [[T; W]; H],
    d2: &'b [[T; W]; H],
) -> [[T; W]; H]
where
    T: 'static,
    &'a T: std::ops::Add<&'b T, Output = T>,
{
    let mut result: MaybeUninit<[[T; W]; H]> = MaybeUninit::uninit();

    let arr_ptr = result.as_mut_ptr() as *mut [T; W];

    unsafe {
        for y in 0..H {
            let row_ptr = arr_ptr.add(y) as *mut T;
            let row1 = d1.get_unchecked(y);
            let row2 = d2.get_unchecked(y);
            for x in 0..W {
                row_ptr
                    .add(x)
                    .write(row1.get_unchecked(x) + row2.get_unchecked(x));
            }
        }

        result.assume_init()
    }
}

fn main() {
    let d1: [[u32; 2]; 3] = [[1, 2], [4, 5], [7, 8]];
    let d2: [[u32; 2]; 3] = [[10, 20], [40, 50], [70, 80]];

    let sums = add_2d(&d1, &d2);
    println!("{:?}", sums);
}
[[11, 22], [44, 55], [77, 88]]

Be sure to use .write() instead of a simple assignment operation, because the assignment operation would run the Drop function of the previous, uninitialized element. Which would cause undefined behaviour.


Benchmarks

I added hkBst's .map().collect().try_into().unwrap() based method to the benchmarks, as well:

#![feature(bench_black_box)]

use std::mem::MaybeUninit;

fn add_2d_iter<'a, 'b, T, const W: usize, const H: usize>(
    d1: &'a [[T; W]; H],
    d2: &'b [[T; W]; H],
) -> [[T; W]; H]
where
    T: 'static,
    &'a T: std::ops::Add<&'b T, Output = T>,
{
    let mut iter_row = d1.iter().zip(d2.iter());

    [(); H].map(move |()| {
        let (row1, row2) = iter_row.next().unwrap();
        let mut elem_iter = row1.iter().zip(row2.iter());
        [(); W].map(move |()| {
            let (elem1, elem2) = elem_iter.next().unwrap();
            elem1 + elem2
        })
    })
}

fn add_2d_uninit<'a, 'b, T, const W: usize, const H: usize>(
    d1: &'a [[T; W]; H],
    d2: &'b [[T; W]; H],
) -> [[T; W]; H]
where
    T: 'static,
    &'a T: std::ops::Add<&'b T, Output = T>,
{
    let mut result: MaybeUninit<[[T; W]; H]> = MaybeUninit::uninit();

    let arr_ptr = result.as_mut_ptr() as *mut [T; W];

    unsafe {
        for y in 0..H {
            let row_ptr = arr_ptr.add(y) as *mut T;
            let row1 = d1.get_unchecked(y);
            let row2 = d2.get_unchecked(y);
            for x in 0..W {
                row_ptr
                    .add(x)
                    .write(row1.get_unchecked(x) + row2.get_unchecked(x));
            }
        }

        result.assume_init()
    }
}

fn add_2d_unwrap<'a, 'b, T, const W: usize, const H: usize>(
    a: &'a [[T; W]; H],
    b: &'b [[T; W]; H],
) -> [[T; W]; H]
where
    T: std::fmt::Debug + 'static,
    &'a T: std::ops::Add<&'b T, Output = T>,
{
    a.iter()
        .zip(b.iter())
        .map(|(ai, bi)| {
            ai.iter()
                .zip(bi.iter())
                .map(|(aij, bij)| aij + bij)
                .collect::<Vec<_>>()
                .try_into()
                .unwrap()
        })
        .collect::<Vec<_>>()
        .try_into()
        .unwrap()
}

const W: usize = 30;
const H: usize = 30;
const ITERATIONS: usize = 1000000;

fn check_result(d1: &[[i32; W]; H], d2: &[[i32; W]; H], result: &[[i32; W]; H]) -> bool {
    for y in 0..H {
        for x in 0..W {
            if d1[y][x] + d2[y][x] != result[y][x] {
                return false;
            }
        }
    }
    true
}

fn bench<'a, 'b>(
    a: &'a [[i32; W]; H],
    b: &'b [[i32; W]; H],
    function: fn(&'a [[i32; W]; H], &'b [[i32; W]; H]) -> [[i32; W]; H],
) -> Vec<u128> {
    // Assert correctness and cache warmup
    {
        let sums = function(a, b);
        assert!(check_result(a, b, &sums));
        std::hint::black_box(sums);
    }

    // Bench runs
    (0..5)
        .into_iter()
        .map(|_| {
            let t0 = std::time::Instant::now();
            for _ in 0..ITERATIONS {
                std::hint::black_box(function(a, b));
            }
            t0.elapsed().as_micros()
        })
        .collect::<Vec<_>>()
}

fn main() {
    let mut num_iter = (1..).into_iter();
    let mut init_arr = move || [(); H].map(|()| [(); W].map(|()| num_iter.next().unwrap()));

    let d1 = init_arr();
    let d2 = init_arr();

    let times_iter = bench(&d1, &d2, add_2d_iter);
    let times_uninit = bench(&d1, &d2, add_2d_uninit);
    let times_unwrap = bench(&d1, &d2, add_2d_unwrap);

    println!("iter: {:?}", times_iter);
    println!("uninit: {:?}", times_uninit);
    println!("unwrap: {:?}", times_unwrap);
}
> cargo +nightly run --release
iter: [223519, 207772, 208043, 208497, 208807]
uninit: [202719, 202888, 201159, 202685, 201941]
unwrap: [864150, 866233, 863023, 859486, 852098]

Seems like unwrap is a bit slower, while the other two don't really make a difference.

Sign up to request clarification or add additional context in comments.

2 Comments

For me it seems that the "naive" implementation is the fastest, but requires the Default trait. The results are: naive: [402429, 401400, 401249, 400432, 400355] iter: [749904, 747838, 748133, 747185, 748023] uninit: [424482, 421838, 421431, 421345, 421134] unwrap: [3491545, 3465526, 3555841, 3514036, 3461563]
Interesting. I wonder why that is.
1

This code adds a 2D array without initializing it with default elements, but it does go through a Vec to array conversion, because you cannot collect directly into an array. I estimate the likelihood of this being optimized away to be much lower than for your default elements solution.

fn add_3by3(a: [[u32; 3]; 3], b: [[u32; 3]; 3]) -> [[u32; 3]; 3] {
    a.iter()
        .zip(b.iter())
        .map(|(ai, bi)| {
            ai.iter()
                .zip(bi.iter())
                .map(|(aij, bij)| aij + bij)
                .collect::<Vec<_>>()
                .try_into()
                .unwrap()
        })
        .collect::<Vec<_>>()
        .try_into()
        .unwrap()
}

Incidentally, I think it is probably a better idea to keep your data in a 1D array no matter what actual shape it has, and keep shape information on the side. That way every element-wise operation is very simple.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.