"expected string, int found" while using a pandas DataFrame on a large file

Question

I have already asked a related question, but I run into a problem when I execute the following code on files with over a million rows.

Code:

import numpy as np
import pandas as pd
import xlrd
import xlsxwriter


df = pd.read_excel('full-cust-data-nonconcat.xlsx')

df = df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()  # raises the TypeError below

with pd.ExcelWriter('PythonExport-Data.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

print(df)

Error:

Traceback (most recent call last):
  File "grouping-data.py", line 9, in <module>
    df  =df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2668, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2786, in _aggregate_named
    output = func(group, *args, **kwargs)
TypeError: sequence item 0: expected string, int found

Since it's a huge file, how can I find the rows where it expects a string but gets an int?

Is there a way to convert everything to strings first?
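One way to locate the offending values is to inspect the Python type of each cell in the column. A minimal sketch, assuming the frame is already loaded as df; the toy data and the helper non_string_rows are hypothetical stand-ins for the real spreadsheet. (Under Python 3 the same failure reads "sequence item 0: expected str instance, int found".)

```python
import pandas as pd

def non_string_rows(df, col):
    """Hypothetical helper: return the rows whose value in `col` is not a str."""
    # Check each cell's actual Python type; object columns can mix str and int
    mask = [not isinstance(v, str) for v in df[col]]
    return df.loc[mask]

# Hypothetical toy data standing in for the spreadsheet
df = pd.DataFrame({
    "ORDER_ID": ["o1", "o1", "o2", "o3"],
    "ASIN": ["B00X1", 12345, "B00X2", 67890],
})

# Count how many cells of each Python type the column holds
print(df["ASIN"].map(type).value_counts())

# Show the offending rows together with their index labels
print(non_string_rows(df, "ASIN"))
```

On a million-row file, printing only the type counts first is a cheap way to confirm the diagnosis before listing rows.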

Sample data (these IDs are alphanumeric):

ID1 Some_other_id1
ID2 Some_other_id2
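Because the IDs are alphanumeric, an ID that happens to consist only of digits is probably stored as a number in the workbook, which would explain an int showing up in an otherwise string column. A sketch that avoids this at load time by forcing the column to str. So that it runs without the workbook, it uses read_csv on a hypothetical in-memory CSV; read_excel accepts the same dtype mapping (pandas 0.20 and later), as the comment shows.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: one ASIN is all digits
csv_text = """ORDER_ID,ASIN
o1,B00X1
o1,12345
o2,B00X2
"""

# dtype={"ASIN": str} keeps pandas from inferring a number for "12345".
# With the real workbook the equivalent call would be:
#   pd.read_excel("full-cust-data-nonconcat.xlsx", dtype={"ASIN": str})
df = pd.read_csv(io.StringIO(csv_text), dtype={"ASIN": str})

# Every ASIN is now a str, so ','.join needs no per-group conversion
grouped = df.groupby("ORDER_ID")["ASIN"].agg(",".join).reset_index()
print(grouped)
```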

akuiper · Accepted Answer · 2017-02-17 04:49:22Z


You can pass a lambda to agg that converts each group to strings before joining:

df.groupby('ORDER_ID')['ASIN'].agg(lambda x: ','.join(x.astype(str))).reset_index()

Or convert the data type before aggregation:

df['ASIN'].astype(str).groupby(df['ORDER_ID']).agg(','.join).reset_index()
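Both forms should give the same result. The sketch below runs them on hypothetical data that mixes str and int ASINs and compares the values. Option 2 converts the whole column once instead of once per group, so it is likely the cheaper of the two when there are many orders.

```python
import pandas as pd

# Hypothetical toy data: mixed str/int ASINs, as read_excel may return them
df = pd.DataFrame({
    "ORDER_ID": ["o1", "o1", "o2", "o3"],
    "ASIN": ["B00X1", 12345, "B00X2", 67890],
})

# Option 1: convert inside the aggregation, one group at a time
opt1 = (
    df.groupby("ORDER_ID")["ASIN"]
    .agg(lambda x: ",".join(x.astype(str)))
    .reset_index()
)

# Option 2: convert the whole column once, then group by the original key
opt2 = df["ASIN"].astype(str).groupby(df["ORDER_ID"]).agg(",".join).reset_index()

print(opt1)
print(opt2)
```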

edited Feb 17, 2017 at 4:49

answered Feb 17, 2017 at 4:43



2 Comments

user2696258 Over a year ago

Thanks, it worked! But when I try it on another file with one more string column, it gives this error:

  File "grouping-data.py", line 11, in <module>
    df = df['ASIN'].astype(str).groupby(df['ORDER_ID']).agg(','.join).reset_index()
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = sel

akuiper Over a year ago

I'm not sure what this is. Maybe you can share some of the data that makes the command fail.
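For a file with additional string columns, one possible approach is to apply the same str conversion to every non-key column and join each of them per order. A sketch under that assumption; the helper join_columns_as_str and the TITLE column are hypothetical. Since the traceback above passes through the df['ASIN'] lookup, printing df.columns.tolist() is also a quick way to confirm the exact column names pandas read from the new file.

```python
import pandas as pd

def join_columns_as_str(df, key, sep=","):
    """Hypothetical helper: group by `key` and join every other column.

    All non-key columns are cast to str first, so str.join never sees an
    int (note that a float NaN would become the string "nan").
    """
    value_cols = [c for c in df.columns if c != key]
    as_str = df.astype({c: str for c in value_cols})
    return as_str.groupby(key)[value_cols].agg(sep.join).reset_index()

# Hypothetical data with one extra string column (TITLE)
df = pd.DataFrame({
    "ORDER_ID": ["o1", "o1", "o2"],
    "ASIN": ["B00X1", 12345, "B00X2"],
    "TITLE": ["Book", "Cable", "Lamp"],
})

# Confirm the exact column names before selecting any of them
print(df.columns.tolist())
print(join_columns_as_str(df, "ORDER_ID"))
```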
