$\begingroup$

I downloaded this Ecommerce dataset from Kaggle:
https://www.kaggle.com/datasets/kolawale/focusing-on-mobile-app-or-website

After converting it to a CSV file, there seems to be an issue. From the 2nd row onwards (the 1st row contains the column names, such as Email, Address, Avatar, Avg. Session Length, Time on App, Yearly Amount Spent, etc.), the data seems to spill over between adjacent rows.

For example, the comma-separated data for one customer is not contained in a single row: part of it is in one row and the rest is in the row below it. When I use 'Text to Columns' in Excel, the data ends up in the wrong columns, i.e. a customer's Address data lands in the 'Email' column, 'Avg. Session Length' data lands in the 'Time on App' column, and so on. Moreover, most of the cells become empty and lose their data, so they are converted to NaN when read in by pandas' read_csv() function. This is shown below:

Data Spilling Over between adjacent rows

Doing 'Text to Columns' on this data splits it between columns in such a way that important data is lost. Reading this CSV file with pandas' read_csv() function then converts the empty cells to NaN. This is shown below:

Half the cells become empty and data is lost

What is the workaround for this? How do I combine the data in this CSV file so that the data for each customer is contained in one row only? Any help or advice is much appreciated. Thanks.

$\endgroup$

1 Answer

$\begingroup$

The issue is that the Address field is a string with a newline character in the middle. Excel doesn't handle that well, but the pandas CSV reader can deal with it:

import pandas as pd

df = pd.read_csv("Ecommerce Customers.csv")

df1
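To see why pandas copes with this, here is a minimal sketch on an in-memory sample rather than the real file. The rows and addresses are made up for illustration; the only assumption carried over from the dataset is that the Address field is quoted and contains a line break.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the file's layout: each Address is quoted
# and contains a newline, so one record spans two physical lines.
raw = (
    "Email,Address,Time on App\n"
    'a@example.com,"1 Example Road\nSampletown, ZZ 00000",12.5\n'
    'b@example.com,"2 Placeholder Lane\nTestville, YY 11111",11.0\n'
)

# Splitting on line breaks gives more lines than there are records,
# which is the "spilling over" seen in a spreadsheet view.
physical_lines = raw.splitlines()

# read_csv treats a newline inside a quoted field as part of the value,
# so each customer still ends up in exactly one row.
df = pd.read_csv(io.StringIO(raw))

print(len(physical_lines), "physical lines")
print(df.shape[0], "rows,", df.shape[1], "columns")
```

The count of physical lines is larger than the number of rows pandas returns, which matches the symptom in the question: the file is fine, but a line-by-line view splits each record in two.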

If you want to replace the newline character with an ordinary space:

df["Address"] = df["Address"].str.replace("\n", " ")

Then you could save the DataFrame as a CSV file with df.to_csv("Ecommerce Customers2.csv") and open it in Excel if you wish, but I guess you will keep using pandas to analyze the data.
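The clean-and-save steps can be sketched end to end as follows. The sample rows are invented, and two details are my own additions rather than part of the answer above: regex=False, to make the replacement a literal one, and index=False, to keep the row index from being written as an extra unnamed column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the dataset.
df = pd.DataFrame(
    {
        "Email": ["a@example.com", "b@example.com"],
        "Address": [
            "1 Example Road\nSampletown, ZZ 00000",
            "2 Placeholder Lane\nTestville, YY 11111",
        ],
    }
)

# Replace the embedded newline with a space (literal match, not a regex).
df["Address"] = df["Address"].str.replace("\n", " ", regex=False)

# Write to an in-memory buffer here; pass a file path instead to save to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Every record now occupies a single physical line: header plus one per row.
lines = buffer.getvalue().splitlines()
print(len(lines))
```

With the newlines gone, the saved file should have one line per customer, which is the layout a spreadsheet expects.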

$\endgroup$
  • $\begingroup$ Did you do anything before calling pd.read_csv() to read in the CSV file? The data seems to be converted to a DataFrame exactly as it should be, but when I do the same thing, only one column appears, whose header contains all the column names separated by commas, like 'Email, Address, Avg Session Length, Time on App, Time on Website' etc., instead of separate columns with their own names and data. $\endgroup$ Commented Jan 30 at 9:07
  • $\begingroup$ No, I just added ".csv" to the file name, and I've just checked that it also works with the original file name. Did you modify the file? Maybe delete the one you have and download it again to start fresh. What you get looks as if pandas didn't use the comma separator, or as if the header row were not row 0, but both are the defaults, so that's strange. What is your pandas version? $\endgroup$ Commented Jan 30 at 9:46
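For anyone hitting the same single-column symptom, here is a small diagnostic sketch. It rests on an assumption the thread does not confirm: that the file was re-saved from a spreadsheet in a way that wrapped each whole record in one pair of quotes. The sample text is made up to reproduce that situation.

```python
import io

import pandas as pd

# The version asked about in the comment above.
print(pd.__version__)

# Hypothetical damaged file: each whole record is wrapped in one pair of
# quotes, so the commas inside are no longer treated as separators.
damaged = (
    '"Email,Address,Time on App"\n'
    '"a@example.com,1 Example Road,12.5"\n'
)

df_bad = pd.read_csv(io.StringIO(damaged))

# A single column whose name is the entire header line.
print(df_bad.shape)
print(list(df_bad.columns))

# Looking at the first raw lines usually shows what went wrong.
first_lines = damaged.splitlines()[:2]
print(first_lines)
```

If the first raw lines of the real file look like this, downloading the original again and reading it without opening it in Excel first, as suggested above, is probably the simplest fix.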
