I downloaded this e-commerce dataset from Kaggle:
https://www.kaggle.com/datasets/kolawale/focusing-on-mobile-app-or-website
After converting it to a CSV file, there seems to be an issue. The first row contains the column names (Email, Address, Avatar, Avg. Session Length, Time on App, Yearly Amount Spent, etc.), but from the second row onwards the data seems to spill over between adjacent rows.
For example, the comma-separated data for one customer is not contained in a single row. Part of it sits in one row and the rest sits in the row below it. This is shown below:
When I do 'Text to Columns' in Excel, the data is split across the columns in a way that makes no sense, i.e. the Address data of a customer ends up in the 'Email' column, the 'Avg. Session Length' data ends up in the 'Time on App' column, and so on. Moreover, most of the cells become empty and lose their data, so they are read as NaN by pandas' read_csv() function. This is shown below:
What is the workaround for this? How do I combine the data in this CSV file so that the data for each customer is contained in one row only? Any help or advice is much appreciated. Thanks.
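
In case it helps frame the question, here is a minimal sketch of the behaviour I am hoping for. It rests on an assumption I have not confirmed: that the spill-over comes from line breaks inside a quoted Address field. The sample rows below are made up for illustration and are not taken from the dataset. If the assumption holds, a parser that honours quotes, such as pandas' read_csv() with its default settings, should keep each customer in one row without going through 'Text to Columns' at all.

```python
import io

import pandas as pd

# Made-up sample mimicking the suspected layout: the Address field is quoted
# and contains a line break, so one customer spans two physical lines.
raw = (
    "Email,Address,Avatar,Avg. Session Length,Time on App,Yearly Amount Spent\n"
    'a@example.com,"1 Main St\nSpringfield, IL 00000",Violet,34.5,12.6,587.95\n'
    'b@example.com,"2 Oak Ave\nShelbyville, IL 11111",Green,31.9,11.1,392.20\n'
)

# read_csv honours the quotes, so each customer becomes exactly one row.
df = pd.read_csv(io.StringIO(raw))

# Optional: replace the embedded line breaks so each address is a single line.
df["Address"] = df["Address"].str.replace("\n", ", ", regex=False)

print(df.shape)               # number of (rows, columns) parsed
print(df.isna().sum().sum())  # total count of missing cells
```

With the real file, io.StringIO(raw) would be replaced by the path to the downloaded CSV. If the rows still come out broken, the cause is presumably something other than quoted line breaks.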


