I am working on a large table in Python using the pandas library.
I would like to perform various kinds of vector operations, such as correlation, between the rows of the table.
It might be a simple problem, but I find the DataFrame structure difficult to work with. I do not know how to convert each row (or column) into a list (or NumPy array).
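For reference, here is a minimal sketch of the kind of conversion I mean. It assumes the table is loaded with the Label column as the index. The toy DataFrame is built by hand to mirror the sample data further down, and the variable names are only illustrative.

```python
import pandas as pd

# Toy table mirroring the sample data; the real table would come from pd.read_csv.
df = pd.DataFrame(
    {"Col1": [1, 3, 5], "Col2": [2, 4, 6]},
    index=pd.Index(["Row1", "Row2", "Row3"], name="Label"),
)

row_array = df.loc["Row1"].to_numpy()  # one row as a 1-D NumPy array
col_list = df["Col1"].tolist()         # one column as a plain Python list
all_rows = df.to_numpy()               # whole table as a 2-D array, one entry per row
```

Iterating over `all_rows` then yields one NumPy array per table row.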
Even counting the number of rows does not seem simple, because a function like df.count() seems to ignore null data.
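A small sketch of the difference, using a made-up table with one missing value: `len(df)` and `df.shape` count rows regardless of nulls, while `df.count()` reports the number of non-null entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in Col1.
df = pd.DataFrame(
    {"Col1": [1.0, np.nan, 5.0], "Col2": [2.0, 4.0, 6.0]},
    index=["Row1", "Row2", "Row3"],
)

n_rows = len(df)                # total rows, nulls included
n_rows_alt, n_cols = df.shape   # the same row count, plus the column count
non_null = df.count()           # non-null entries per column
```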
A simple data table and the expected result table are shown below. In this case, I would like to calculate the sum of each pair of rows.
The real table is much bigger (more than 1000 rows and columns) and contains some null values.
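For the correlation case, one possible approach (a sketch, not necessarily the best one) is to transpose the table so that rows become columns and then call `DataFrame.corr()`, which computes pairwise correlations and excludes null values pair by pair. The table below is invented for illustration and has four columns so that the correlations are meaningful.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in Row2.
df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 4, 6, np.nan], [4, 3, 2, 1]],
    index=["Row1", "Row2", "Row3"],
    columns=["Col1", "Col2", "Col3", "Col4"],
)

# corr() works column-wise, so transpose to correlate rows with each other.
# For each pair, only positions where both rows are non-null are used.
row_corr = df.T.corr()
```

`row_corr` is a square table indexed by row label on both axes, so `row_corr.loc["Row1", "Row2"]` is the correlation between those two rows.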
Data.csv:
Label Col1 Col2
Row1 1 2
Row2 3 4
Row3 5 6
Output.csv:
Label Col3
Row1,Row2 4,6
Row1,Row3 6,8
Row2,Row3 8,10
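To make the expected output concrete, here is a minimal sketch that produces the pair sums for the sample table using `itertools.combinations`. It assumes Label is the index, and it keeps the two sums in separate columns rather than joining them into one "Col3" string.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    {"Col1": [1, 3, 5], "Col2": [2, 4, 6]},
    index=pd.Index(["Row1", "Row2", "Row3"], name="Label"),
)

# One entry per unordered pair of rows, keyed by "RowA,RowB".
pair_sums = {
    f"{a},{b}": df.loc[a] + df.loc[b]
    for a, b in combinations(df.index, 2)
}

# Each dict value is a Series; transpose so that each pair becomes a row.
result = pd.DataFrame(pair_sums).T
result.index.name = "Label"
```

With plain `+`, a null in either row gives a null in the sum. `df.loc[a].add(df.loc[b], fill_value=0)` would instead treat a single missing value as zero. With more than 1000 rows there are roughly half a million pairs, so a Python-level loop like this may be slow.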
null values? Is there an empty or a NaN value, or is the value just equal to zero? What do you want the output to look like if there is such a null value?
shape method: df.shape[0] will be the number of rows.