Declaring array data inside a class C++

Question

I am creating a class that needs to store several array-like data members. The arrays can change size, but all of the arrays inside the class will always have the same size. The arrays will later be used for number crunching in methods provided by the class.

What is the best/standard way of declaring that kind of data inside the class?

Solution 1 – Raw arrays

class Example {
    double *Array_1;
    double *Array_2;
    double *Array_3;
    int size; //used to store size of all arrays
};

Solution 2 – std::vector for each array

class Example {
    vector<double> Array_1;
    vector<double> Array_2;
    vector<double> Array_3;
};

Solution 3 – A struct that stores each vertex, and a std::vector of that struct

struct Vertex{
    double Var_1;
    double Var_2;
    double Var_3;
};
class Example {
    vector<Vertex> data;
};

My conclusion as a beginner would be:

Solution 1 would have the best performance but would be the hardest to implement.

Solution 3 would be elegant and easier to implement, but I would run into problems when performing some calculations because the information would not be in an array format. This means regular numeric functions that receive arrays/vectors would not work (I would need to create temporary vectors in order to do the number crunching).

Solution 2 might be the middle ground.

Any ideas for a 4th solution would be greatly appreciated.

Solution 1 won't have any better performance than 2. When compiling with optimizations, vector pretty much gets optimized away.
– NathanOliver, Commented Jul 31, 2019 at 14:46
The general recommendation is that closely related data should be structured as a single unit, like your Vertex structure. Using a vector (or array) of vertex structures is very common and often used in "mathematical calculations" using parallelism or CUDA kernels or the like.
– Some programmer dude, Commented Jul 31, 2019 at 14:55
AoS vs. SoA is something of a field of research all of its own. Conceptually, parallel arrays are plainly inferior, but that doesn't always carry the day.
– Davis Herring, Commented Jul 31, 2019 at 15:14
@JesperJuhl: array is obviously wrong here. This is between unique_ptr and vector, and is really the long-rejected dynarray (plus the struct business).
– Davis Herring, Commented Jul 31, 2019 at 15:17

Carlton · Accepted Answer · 2019-07-31 15:26:24Z

2

Don't use raw arrays. Options 2 and 3 are both reasonable; which is better depends on how you'll be traversing the data. If you'll frequently be going through the arrays individually, you should store them as in solution #2, because each vector is stored contiguously in memory. If you'll be going through them as sets of points, then solution #3 is probably better. If you want to go with solution #2 and it's critical that the arrays always be synchronized (same size, etc.), then I would make them private and control access to them through member functions. Example:

#include <cstddef>
#include <stdexcept>
#include <vector>

class Example
{
private:
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;

public:
    // Always append to all three vectors so their sizes stay equal.
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
    }

    // Return the three values stored at the same index.
    std::vector<double> Get_all_points_at_index(std::size_t index) const {
        if (index < Array_1.size())
            return {Array_1[index], Array_2[index], Array_3[index]};
        else
            throw std::runtime_error("Error: index out of bounds");
    }

    // Read-only access, so callers cannot resize a single vector.
    const std::vector<double>& Get_array1() const {
        return Array_1;
    }

    void Clear_all() {
        Array_1.clear();
        Array_2.clear();
        Array_3.clear();
    }
};

This way, users of the class aren't burdened with the responsibility of making sure they add/remove values from all the vectors evenly. You do that in your class's member functions, where you have complete control over the underlying data. The accessor functions should be written such that it's impossible for a user (including you) to un-synchronize the data.

1 Comment

hegel5000 Over a year ago

One recommendation: Get_all_points_at_index should probably return std::array<double, 3>, or a special Vertex type, just to communicate the size guarantee (not to mention avoiding a performance hit from repeated dynamic allocation).
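A minimal sketch of what this comment suggests, assuming the same member names as the answer above. Only the return type of Get_all_points_at_index changes. The size check in Push_data is an addition for illustration, not part of the original answer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Example {
public:
    // Append one value to every vector so the sizes stay equal.
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
        assert(Array_1.size() == Array_2.size() &&
               Array_2.size() == Array_3.size());
    }

    // The return type itself states that exactly three values come back.
    // std::array holds its elements inline, so no heap allocation per call.
    std::array<double, 3> Get_all_points_at_index(std::size_t index) const {
        if (index >= Array_1.size())
            throw std::runtime_error("Error: index out of bounds");
        return {Array_1[index], Array_2[index], Array_3[index]};
    }

private:
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};
```

A caller can then write `auto p = e.Get_all_points_at_index(i);` and index `p[0]` to `p[2]` without checking a size at run time.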

anatolyg · Accepted Answer · 2019-07-31 15:27:43Z

If you are going to process large amounts of data, then solutions 1 and 2 are pretty much the same. The only meaningful difference is that solution 1 is hard to protect against memory leaks, while solution 2 automatically deallocates your data when it is no longer needed.

The difference between solutions 2 and 3 is what people often call "Structure of arrays" vs "Array of structures". The runtime efficiency of these solutions depends on what your code does with them. The general principle is locality of reference. If your code frequently does number crunching only on the first component of your vertex data, then use structure of arrays (solution 2). However, any complex code will work on all of the data, so I guess solution 3 (array of structures) is the best.
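To make the locality argument concrete, here is a small sketch of both layouts with the two access patterns described above. The type name VertexArrays and the function names are made up for illustration. Which layout is actually faster for a given workload should be measured, not assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (solution 3): the values of one vertex are adjacent.
struct Vertex {
    double Var_1;
    double Var_2;
    double Var_3;
};

// Structure of arrays (solution 2): the values of one component are adjacent.
struct VertexArrays {
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};

// Uses only the first component. With AoS, Var_2 and Var_3 are pulled
// into cache alongside Var_1 but never read.
double sum_first_aos(const std::vector<Vertex>& data) {
    double sum = 0.0;
    for (const Vertex& v : data) sum += v.Var_1;
    return sum;
}

// Same computation on SoA: one contiguous run of doubles.
double sum_first_soa(const VertexArrays& data) {
    double sum = 0.0;
    for (double x : data.Array_1) sum += x;
    return sum;
}

// Uses all three components of every vertex. AoS reads one stream here.
double sum_squared_norms_aos(const std::vector<Vertex>& data) {
    double sum = 0.0;
    for (const Vertex& v : data)
        sum += v.Var_1 * v.Var_1 + v.Var_2 * v.Var_2 + v.Var_3 * v.Var_3;
    return sum;
}

// Same computation on SoA: three separate streams read in lockstep.
double sum_squared_norms_soa(const VertexArrays& d) {
    assert(d.Array_1.size() == d.Array_2.size() &&
           d.Array_2.size() == d.Array_3.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < d.Array_1.size(); ++i)
        sum += d.Array_1[i] * d.Array_1[i] + d.Array_2[i] * d.Array_2[i] +
               d.Array_3[i] * d.Array_3[i];
    return sum;
}
```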

Note that this example is rather pure. If your data contains elements that are sometimes used in number crunching and sometimes not (e.g. the code transforms two coordinates of the vertices while leaving the third untouched), then you might need to implement some kind of in-between solution: copy only the needed data to some place, transform it and copy the results back.
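One possible reading of that in-between solution, as a sketch. The function rotate_90 is a hypothetical stand-in for any numeric routine that expects plain arrays, and rotate_first_two is likewise an illustrative name. The data is stored as an array of structures, two components are gathered into temporary contiguous buffers, transformed, and written back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    double Var_1;
    double Var_2;
    double Var_3;
};

// Hypothetical array-based routine: rotates each (x, y) pair by
// 90 degrees counterclockwise, i.e. (x, y) -> (-y, x).
void rotate_90(double* x, double* y, std::size_t n) {
    assert((x != nullptr && y != nullptr) || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        const double old_x = x[i];
        x[i] = -y[i];
        y[i] = old_x;
    }
}

// Transforms Var_1 and Var_2 of every vertex and leaves Var_3 untouched.
void rotate_first_two(std::vector<Vertex>& data) {
    const std::size_t n = data.size();
    std::vector<double> xs(n), ys(n);

    // Gather: copy only the needed components into contiguous buffers.
    for (std::size_t i = 0; i < n; ++i) {
        xs[i] = data[i].Var_1;
        ys[i] = data[i].Var_2;
    }

    rotate_90(xs.data(), ys.data(), n);

    // Scatter: copy the results back into the structures.
    for (std::size_t i = 0; i < n; ++i) {
        data[i].Var_1 = xs[i];
        data[i].Var_2 = ys[i];
    }
}
```

The two extra copies cost time and memory, so this only pays off when the array-based routine does enough work per element to amortize them.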

TonySalimi · Accepted Answer · 2019-07-31 15:58:46Z

Forget about approach 1 (as the others have mentioned) and stick to approach 2 or 3, whichever best fits your needs. I see your code as part of an application/library that manages coordinates/data of a 3D space. So you should think about which operations you need to perform on these 3D coordinates/data and which approach makes your code simpler or more efficient. As an example, if at some moment you need to pass the raw data of one dimension to a third-party library (e.g. for visualization), you should go for approach 2.
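The point about third-party libraries can be sketched as follows. Here third_party_mean is a hypothetical stand-in for a C-style API that takes a pointer and a length; it is not a function from any real library. The accessors Data_1 and Size are likewise illustrative names. With approach 2, one dimension can be handed over directly via std::vector::data(), with no copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a third-party function that expects
// one dimension as a contiguous block of doubles.
double third_party_mean(const double* values, std::size_t count) {
    assert(values != nullptr && count > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += values[i];
    return sum / static_cast<double>(count);
}

class Example {
public:
    void Push_data(double val1, double val2, double val3) {
        Array_1.push_back(val1);
        Array_2.push_back(val2);
        Array_3.push_back(val3);
    }

    std::size_t Size() const { return Array_1.size(); }

    // Read-only pointer to the first dimension. It is invalidated by any
    // later call that changes the vector's size or capacity.
    const double* Data_1() const { return Array_1.data(); }

private:
    std::vector<double> Array_1;
    std::vector<double> Array_2;
    std::vector<double> Array_3;
};
```

With approach 3, the same call would first need the first component of every vertex copied into a temporary buffer, unless the library accepts a stride.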

As a concrete example, VTK (the Visualization Toolkit) has lots of data structures that keep 3D data in both ways, either like your 2nd approach (see vtkTypedDataArray) or like your 3rd approach (see vtkAOSDataArrayTemplate). Taking a look at them may give you some ideas.
