I have already asked a question about this, but now I am facing a problem when I execute the following code on files with over a million rows.
Code:
import numpy as np
import pandas as pd
import xlrd
import xlsxwriter
df = pd.read_excel('full-cust-data-nonconcat.xlsx')
df = df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
writer = pd.ExcelWriter('PythonExport-Data.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
print df
Error:
Traceback (most recent call last):
File "grouping-data.py", line 9, in <module>
df =df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2668, in aggregate
result = self._aggregate_named(func_or_funcs, *args, **kwargs)
File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2786, in _aggregate_named
output = func(group, *args, **kwargs)
TypeError: sequence item 0: expected string, int found
Since it's a huge file, how can I find where it has an int instead of a string?
Is there a way to convert the whole column to strings first?
Sample Data (these IDs are alphanumeric):
ID1 Some_other_id1
ID2 Some_other_id2
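A sketch of one way to do both things: locate the rows where ASIN is not a string, then cast the whole column with `astype(str)` before the join. The small DataFrame here is hypothetical stand-in data, and the snippet assumes Python 3 pandas (on Python 2.7, as in the traceback, you would check `isinstance(v, basestring)` instead).

```python
import pandas as pd

# Hypothetical frame standing in for the real file: ASIN holds a mix of
# strings and ints, which is exactly what makes ','.join raise TypeError.
df = pd.DataFrame({
    'ORDER_ID': ['ID1', 'ID1', 'ID2'],
    'ASIN': ['B000123', 42, 'B000456'],
})

# Find the offending rows: keep entries whose value is not a str.
bad = df[~df['ASIN'].map(lambda v: isinstance(v, str))]
print(bad)  # rows where ASIN came in as an int

# Cast the whole column to string; the join aggregation then works.
df['ASIN'] = df['ASIN'].astype(str)
out = df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
print(out)
```

Note that `astype(str)` turns the int 42 into the string `'42'`; if the IDs are supposed to be purely alphanumeric codes, the `bad` rows may also point at data-quality problems in the source spreadsheet worth inspecting.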