3

I am creating a class that needs to store several array-like data members. The arrays will be resizable, but all of the arrays inside the class will always have the same size. The arrays will later be used for number crunching in methods provided by the class.

What is the best/standard way of declaring that kind of data inside the class?

Solution 1 – Raw arrays

class Example {
    double *Array_1;
    double *Array_2;
    double *Array_3;
    int size; //used to store size of all arrays
};

Solution 2 – std::vector for each array

class Example {
    vector<double> Array_1;
    vector<double> Array_2;
    vector<double> Array_3;
};

Solution 3 – A struct that stores each vertex, and a std::vector of that struct

struct Vertex{
    double Var_1;
    double Var_2;
    double Var_3;
};
class Example {
    vector<Vertex> data;
};

My conclusion as a beginner would be:

Solution 1 would have the best performance but would be the hardest to implement.

Solution 3 would be elegant and easier to implement, but I would run into problems when performing some calculations because the information would not be in an array format. This means regular numeric functions that receive arrays/vectors would not work (I would need to create temporary vectors in order to do the number crunching).

Solution 2 might be the middle ground.

Any ideas for a 4th solution would be greatly appreciated.

9
  • 1
    Solution 1 won't have any better performance than solution 2. When compiling with optimizations, vector pretty much gets optimized away. Commented Jul 31, 2019 at 14:46
  • 1
    The general recommendation is that closely related data should be structured as a single unit. Like your Vertex structure. And using a vector (or array) of vertex structures is very common and often used in "mathematical calculations" using parallelism or CUDA kernels or the like. Commented Jul 31, 2019 at 14:55
  • 1
    std::array or std::vector. Not raw arrays. Commented Jul 31, 2019 at 15:11
  • 2
    AoS vs. SoA is something of a field of research all of its own. Conceptually, parallel arrays are plainly inferior, but that doesn’t always carry the day. Commented Jul 31, 2019 at 15:14
  • 1
    @JesperJuhl: std::array is obviously wrong here. This sits between unique_ptr and vector, and is really the long-rejected dynarray (plus the struct business). Commented Jul 31, 2019 at 15:17

3 Answers

2

Don't use raw arrays. Options 2 and 3 are both reasonable; which one is better depends on how you'll traverse the data. If you'll frequently iterate over the arrays individually, store them as in solution 2, because each vector is stored contiguously in memory. If you'll process them as sets of points, then solution 3 is probably better.

If you go with solution 2 and it's critical that the arrays always stay synchronized (same size, etc.), then I would make them private and control access to them through member functions. Example:

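#include <cstddef>
#include <stdexcept>
#include <vector>

class Example
{
private:
    // All three vectors are only modified together, so their sizes always match.
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;

public:
    // Appends one value to every array in a single call.
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
    }

    // Returns the three values stored at the given index.
    std::vector<double> Get_all_points_at_index(std::size_t index) const {
        if (index < Array_1.size())
            return {Array_1[index], Array_2[index], Array_3[index]};
        else
            throw std::runtime_error("Error: index out of bounds");
    }

    // Read-only access to one whole array.
    const std::vector<double>& Get_array1() const {
        return Array_1;
    }

    // Empties every array in a single call.
    void Clear_all() {
        Array_1.clear();
        Array_2.clear();
        Array_3.clear();
    }
};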

This way, users of the class aren't burdened with the responsibility of making sure they add/remove values from all the vectors evenly. You do that in your class's member functions, where you have complete control over the underlying data. The accessor functions should be written such that it's impossible for a user (including you) to desynchronize the data.


1 Comment

One recommendation: Get_all_points_at_index should probably return std::array<double, 3>, or a special Vertex type, just to communicate the size guarantee (not to mention avoiding a performance hit from repeated dynamic allocation).
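
A minimal sketch of that recommendation, assuming the same three private vectors as in the answer's class. The accessor returns std::array<double, 3>, so the size is part of the type and no heap allocation happens per call. The Size member is a hypothetical addition for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Example {
public:
    // Appends one value to every array so the sizes stay equal.
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
    }

    // Fixed-size return type: the caller knows exactly three values come
    // back, and nothing is allocated dynamically.
    std::array<double, 3> Get_all_points_at_index(std::size_t index) const {
        if (index >= Array_1.size())
            throw std::runtime_error("Error: index out of bounds");
        return {Array_1[index], Array_2[index], Array_3[index]};
    }

    // Hypothetical helper (not in the original answer): the common size.
    std::size_t Size() const { return Array_1.size(); }

private:
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};
```

A dedicated Vertex struct would work the same way and gives the three values names instead of indices.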
1

If you are going to process large amounts of data, then solutions 1 and 2 are pretty much the same. The only meaningful difference is that solution 1 is hard to protect against memory leaks, while solution 2 deallocates your data automatically when needed.

The difference between solutions 2 and 3 is what people often call "structure of arrays" vs. "array of structures". The runtime efficiency of these solutions depends on what your code does with them. The general principle is locality of reference. If your code frequently does number crunching only on the first component of your vertex data, then use a structure of arrays (solution 2). However, any complex code will work on all of the data, so I guess solution 3 (array of structures) is the best.
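
To make the two access patterns concrete, here is a rough sketch of both layouts. VertexArrays and the three function names are hypothetical, chosen only for this illustration; which layout is actually faster depends on the hardware and the workload, so measure before committing.

```cpp
#include <numeric>
#include <vector>

// Array of structures (solution 3): one element per vertex.
struct Vertex {
    double Var_1;
    double Var_2;
    double Var_3;
};

// Structure of arrays (solution 2): one vector per component.
// The name VertexArrays is an assumption made for this sketch.
struct VertexArrays {
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};

// Reads only Array_1, which is contiguous: every value fetched is used.
double Sum_first_soa(const VertexArrays& arrays) {
    return std::accumulate(arrays.Array_1.begin(), arrays.Array_1.end(), 0.0);
}

// Reads only Var_1, but Var_2 and Var_3 sit between successive Var_1
// values, so they are pulled through the cache as well.
double Sum_first_aos(const std::vector<Vertex>& data) {
    double sum = 0.0;
    for (const Vertex& v : data)
        sum += v.Var_1;
    return sum;
}

// Uses all three components of each vertex; here the array-of-structures
// layout keeps everything needed for one iteration adjacent in memory.
double Sum_squared_norms_aos(const std::vector<Vertex>& data) {
    double sum = 0.0;
    for (const Vertex& v : data)
        sum += v.Var_1 * v.Var_1 + v.Var_2 * v.Var_2 + v.Var_3 * v.Var_3;
    return sum;
}
```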

Note that this example is rather idealized. If your data contains elements that are sometimes used in number crunching and sometimes not (e.g. the code transforms two coordinates of the vertices while leaving the third untouched), then you might need to implement some kind of in-between solution: copy only the needed data to some place, transform it, and copy the results back.
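
One possible shape of that copy, transform, copy-back approach, sketched under the assumption that the data is stored as in solution 3 and that the numeric routine expects a plain contiguous array. Scale_in_place stands in for such a routine and Scale_first_two is a hypothetical wrapper; neither name comes from a real library.

```cpp
#include <cstddef>
#include <vector>

struct Vertex {
    double Var_1;
    double Var_2;
    double Var_3;
};

// Stand-in for an array-based numeric routine (hypothetical).
void Scale_in_place(double* values, std::size_t count, double factor) {
    for (std::size_t i = 0; i < count; ++i)
        values[i] *= factor;
}

// Transforms Var_1 and Var_2 of every vertex and leaves Var_3 untouched.
void Scale_first_two(std::vector<Vertex>& data, double factor) {
    const std::size_t n = data.size();

    // Gather: copy only the needed components into one contiguous buffer.
    std::vector<double> buffer(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        buffer[i] = data[i].Var_1;
        buffer[n + i] = data[i].Var_2;
    }

    // Transform the contiguous copy with the array-based routine.
    Scale_in_place(buffer.data(), buffer.size(), factor);

    // Scatter: copy the results back into the vertices.
    for (std::size_t i = 0; i < n; ++i) {
        data[i].Var_1 = buffer[i];
        data[i].Var_2 = buffer[n + i];
    }
}
```

The two extra passes over the data are the price of the temporary buffer, so this only pays off when the transformation itself is expensive enough to dominate.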


0

Forget about approach 1 (as the others have mentioned) and stick to approach 2 or 3, whichever best fits your needs. Your code looks to me like part of an application or library that manages coordinates/data in a 3D space. So you should think about which operations you need to perform on that data and which approach makes your code simpler or more efficient. For example, if at some point you need to pass the raw data of one dimension to a third-party library (e.g. for visualization), you should go for approach 2.
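
As a rough illustration of that last point: with approach 2, one dimension is already a contiguous block of doubles, so it can be handed to a C-style interface without copying. Third_party_sum is a made-up stand-in for such a library function, and Data_1 and Size are hypothetical accessors added for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a third-party function taking a raw array (hypothetical).
double Third_party_sum(const double* values, std::size_t count) {
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}

class Example {
public:
    // Appends one value to every array so the sizes stay equal.
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
    }

    // Pointer to the contiguous storage of the first dimension. It is only
    // valid until the next call that modifies the arrays.
    const double* Data_1() const { return Array_1.data(); }

    std::size_t Size() const { return Array_1.size(); }

private:
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};
```

A call then looks like Third_party_sum(example.Data_1(), example.Size()). With approach 3, the same call would first need the values gathered into a temporary array.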

As a concrete example, VTK (the Visualization Toolkit) has lots of data structures that keep 3D data in both ways, either like your 2nd approach (see vtkTypedDataArray) or like your 3rd approach (see vtkAOSDataArrayTemplate). Taking a look at them may give you some ideas.
