I have been struggling with this problem for a day and cannot find a solution anywhere online. I have four cell arrays with data per country, on which I perform operations to find the countries that I want to analyse. I have saved these countries in a 27x1 cell array of character vectors (non-numeric), whose output looks like:

'Belgium'
'Bulgaria'
'Croatia'
'Cyprus'
'Czechia'
'Denmark'
'Estonia'

This is an example of the rows that I want to extract from the other cell arrays with data per country. The problem is that a cell array of country names cannot be used directly as an index, which means that I cannot use it to extract data from the other cell arrays. So what I want as output is an array that can be used as an index, so that I can use it to extract information from the other cell arrays.

What I have tried:

  • I tried str2double to create rows that can be used as indices. This resulted in NaN values, which did not allow any operation.
  • I tried cell2mat, which gave the error: Dimensions of arrays being concatenated are not consistent.
  • I tried to create a table from the cell arrays, but I couldn't put all the data from the different cell arrays in it because I couldn't extract it.

I am new here, so I don't know how to attach my .m file and cell arrays. Therefore, I add part of my code here:

[~,ia,ib] = intersect(pop(:,1),gdp(:,1));
Com_popgdp = [pop(ia,1:2),gdp(ib,2)];

[~,ia,ib] = intersect(fp(:,1),lr(:,1));
Com_fplr = [fp(ia,1:2),lr(ib,2)];

[~,ia,ib] = intersect(Com_popgdp(:,1),Com_fplr(:,1));
Com_all = [Com_popgdp(ia,1:2),Com_fplr(ib,2)]; 

Com_all = Com_all(:,1);

%Com_all is the resulting cell array with all countries that I want to
%analyse resulting from the intersections of cell arrays. For the analysis, 
%I must extract the Com_all rows from
%pop/gdp/fp/lr. However, this is not possible with cell arrays. How can I
%access and extract the rows from pop/gdp/fp/lr for my analysis?

Could anyone help me find a way to use the cell array of selected countries as an index to extract data from the other cell arrays? Which method would be appropriate?

  • How are the initial cell arrays getting created? Are you reading them in from a file? Are they hard-coded in the script somewhere? My initial thought is that you should create a table with columns for each variable: country name, GDP, etc. Initialize it with all of the countries from all of the datasets and NaNs for the numeric data. Then fill in the values that you have. Then only keep rows without NaNs. If you can find a way to share the data I will post an answer with more explicit details. Commented Nov 7, 2019 at 19:55
  • Hi goryh, thanks for replying. I downloaded .tsv files from Eurostat statistics. These are translated by a tsv2cell function in Matlab and from then I use standard Matlab operations. Commented Nov 7, 2019 at 19:58
  • How can I add such a part of the data? Are the countries that I provided not sufficient? Commented Nov 7, 2019 at 20:26
  • Instead of creating a new question, you should have edited your old one so it can be reopened. Note that the system will automatically ban you from asking questions if you have too many negatively-received questions, so it is always better to improve your questions than to ignore and/or delete them. Commented Nov 7, 2019 at 20:30
  • Hi Cris, I thought that it was closed and could not be reopened. Thanks for your input! Commented Nov 7, 2019 at 20:33

2 Answers

There is a simpler solution than I initially thought.

First, change your cell arrays into tables:

gdp = cell2table(gdp,'VariableNames',{'country','gdp'})

Or you could read them in directly as tables (https://www.mathworks.com/help/matlab/ref/readtable.html).

As long as all the tables use the same name for the column with the country name, you can then use innerjoin to get the intersection of the tables based on the country.

Here is the example I ran to test it:

gdp = {'Belgium',1;'Bulgaria',2;'Croatia',3};
pop = {'Croatia',30; 'Cyprus', 40; 'Czechia', 50};
gdpTable = cell2table(gdp,'VariableNames',{'country','gdp'})
gdpTable =

  3×2 table

     country      gdp
    __________    ___

    'Belgium'      1 
    'Bulgaria'     2 
    'Croatia'      3 

popTable = cell2table(pop,'VariableNames',{'country','pop'})
popTable =

  3×2 table

     country     pop
    _________    ___

    'Croatia'    30 
    'Cyprus'     40 
    'Czechia'    50

innerjoin(gdpTable,popTable)
ans =

  1×3 table

     country     gdp    pop
    _________    ___    ___

    'Croatia'     3     30 
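
For the four datasets in the question, the same idea can be applied by joining one table at a time. The following is only a minimal sketch: the data values are made up for illustration, the names fpTable, lrTable and all4 are hypothetical, and it assumes every table uses 'country' as the key column. The 'Keys' option is passed explicitly so that the join does not depend on which other column names happen to match.

```matlab
% Hypothetical miniature versions of the four datasets (made-up values).
gdp = {'Belgium',1; 'Bulgaria',2; 'Croatia',3};
pop = {'Croatia',30; 'Cyprus',40; 'Belgium',50};
fp  = {'Belgium',0.1; 'Croatia',0.3; 'Czechia',0.5};
lr  = {'Croatia',7; 'Belgium',9};

% Convert each cell array to a table with a shared key column name.
gdpTable = cell2table(gdp,'VariableNames',{'country','gdp'});
popTable = cell2table(pop,'VariableNames',{'country','pop'});
fpTable  = cell2table(fp, 'VariableNames',{'country','fp'});
lrTable  = cell2table(lr, 'VariableNames',{'country','lr'});

% Join one table at a time; 'Keys' names the matching column explicitly.
all4 = innerjoin(gdpTable,popTable,'Keys','country');
all4 = innerjoin(all4,fpTable,'Keys','country');
all4 = innerjoin(all4,lrTable,'Keys','country');

% all4 has one row per country that is present in all four inputs.
disp(all4)
```

Because the inputs do not need to list the countries in the same order, this avoids any manual sorting before the join.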

5 Comments

Thanks goryh! I will try to implement this in my script and let you know if it solves the problem.
I receive an error because I cannot convert the cell arrays into tables. I tried this with the gdp array: gdp = cell2table(gdp,'VariableNames','gdp'); which gave the error: Error using cell2table (line 69) The VariableNames property is a cell array of character vectors. To assign multiple variable names, specify names in a string array or a cell array of character vectors.
The third argument of cell2table needs to be a cell array. Putting 'gdp' will throw an error. Putting {'gdp'} will not (at least not the same error). If you do not have a name for every column, you will also get an error, hence my example: cell2table(gdp,'VariableNames',{'country','gdp'})
It is not letting me edit my comment. Let's try that last sentence again: If you do not have a name for every column in the data, you will also get an error, hence my example named both columns: cell2table(gdp,'VariableNames',{'country','gdp'})
Goryh, I had to sleep a night to fully understand and implement it. Now, after some modifications I have the gdp table. I work on modifying the other cell arrays and then try to innerjoin them. Many thanks!

It seems to me that you first want to compute the intersection of all lists of country names, then index the cell arrays accordingly. intersect finds the intersection of two lists; you can call it multiple times in sequence to intersect multiple lists. ismember then tells you, for each row of an array, whether its country is among the selected ones. For example:

A1 = {
'Bulgaria',2
'Croatia',3
'Cyprus',4
'Czechia',5
'Denmark',6
'Estonia',7
};

A2 = {
'Belgium',11
'Bulgaria',12
'Croatia',13
'Cyprus',14
'Denmark',16
'Estonia',17
};

A3 = {
'Belgium',21
'Croatia',23
'Cyprus',24
'Czechia',25
'Denmark',26
'Estonia',27
};

[countries] = intersect(A1(:,1),A2(:,1));
[countries] = intersect(countries,A3(:,1));
i1 = ismember(A1(:,1),countries); A1 = A1(i1,:);
i2 = ismember(A2(:,1),countries); A2 = A2(i2,:);
i3 = ismember(A3(:,1),countries); A3 = A3(i3,:);
A = [A1,A2(:,2),A3(:,2)];

The code above assumes that the three input cell arrays all have the countries in the same order. If this is not the case, sort the rows of each array by country name before matching them up with the list of selected countries:

[~,order] = sort(A1(:,1)); A1 = A1(order,:);
i1 = ismember(A1(:,1),countries); A1 = A1(i1,:);
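
An alternative that avoids sorting altogether is to use the second output of ismember, which returns row locations instead of a logical mask. The sketch below uses made-up data and hypothetical variable names (loc1, loc2); it is one possible way to line up the rows, not the only one.

```matlab
A1 = {'Denmark',6; 'Bulgaria',2; 'Croatia',3};   % deliberately unsorted
A2 = {'Croatia',13; 'Denmark',16; 'Belgium',11};

countries = intersect(A1(:,1),A2(:,1));          % sorted list of common names

% loc1(k) is the row of A1 that holds countries{k}; same for loc2 and A2.
[~,loc1] = ismember(countries,A1(:,1));
[~,loc2] = ismember(countries,A2(:,1));

% Both selections follow the order of 'countries', so the columns line up.
A = [A1(loc1,:), A2(loc2,2)];
```

Since every entry of countries is guaranteed to occur in both arrays, the location vectors contain no zeros and can be used directly as row indices.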

4 Comments

Hello Cris, thanks for your elaborate answer! Your understanding of my goal is correct and I think ismember is the solution I was looking for. It took me some time to modify it to work with specific columns. But this solution works! Thanks again, and I have accepted your answer!
I do have one more question. Is there a way that I can reduce the number of lines in which the same operations are performed? Instead of 3 ismember lines, I use four which perform the same operation. Can this be simplified by implementing a loop?
@WouterWizard: If you do A={A1,A2,A3,A4} you can then address the arrays as A{i}, which you can use in a loop.
Great, it works perfectly! Thanks for your commitment!
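
The loop suggested in the comments could look roughly like the sketch below. The four arrays hold made-up values and the variable name keep is hypothetical; the only assumption is that the country names sit in the first column of each array.

```matlab
A1 = {'Bulgaria',2; 'Croatia',3; 'Cyprus',4};
A2 = {'Belgium',11; 'Croatia',13; 'Cyprus',14};
A3 = {'Croatia',23; 'Cyprus',24; 'Czechia',25};
A4 = {'Croatia',33; 'Cyprus',34; 'Denmark',36};
A = {A1,A2,A3,A4};                  % collect the arrays so they can be looped over

% Intersect the country columns one array at a time.
countries = A{1}(:,1);
for i = 2:numel(A)
    countries = intersect(countries,A{i}(:,1));
end

% Keep only the rows of the selected countries in every array.
for i = 1:numel(A)
    keep = ismember(A{i}(:,1),countries);
    A{i} = A{i}(keep,:);
end
```

Adding a fifth dataset then only requires appending it to A; the two loops stay unchanged.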
