
I am new to Python and not familiar with iterating using the groupby function in pandas. I modified the code below, and it works fine for creating a pandas DataFrame:

from itertools import groupby
import pandas as pd

i=['J,Smith,200 G Ct,',
'E,Johnson,200 G Ct,',
'A,Johnson,200 G Ct,',
'M,Simpson,63 F Wy,',
'L,Diablo,60 N Blvd,',
'H,Simpson,63 F Wy,',
'B,Simpson,63 F Wy,']

dbn=[]
dba=[]

for z, g in groupby(
    sorted([l.split(',') for l in i],
           key=lambda x: x[1:]),
    lambda x: x[2:]
):
    l = list(g)
    r = len(l)
    Address = ','.join(z)
    o = l[0]
    if r > 2:
        dbn.append('The ' + o[1] + " Family,")
    elif r > 1:
        dbn.append(o[0] + " and " + l[1][0] + ", " + o[1] + ",")
    else:
        dbn.append(o[0] + " " + o[1])
    dba.append(Address)

Hdf=pd.DataFrame({'Address':dba,'Name':dbn})
print(Hdf)

      Address                 Name
0  60 N Blvd,             L Diablo
1   200 G Ct,    E and A, Johnson,
2    63 F Wy,  The Simpson Family,
3   200 G Ct,              J Smith

How would I modify the for loop to yield the same results if I am using a pandas DataFrame instead of raw CSV data?

df=pd.DataFrame({'Name':['J','E','A','M','L','H','B'],
'Lastname':['Smith','Johnson','Johnson','Simpson','Diablo','Simpson','Simpson'],
'Address':['200 G Ct','200 G Ct','200 G Ct','63 F Wy','60 N Blvd','63 F Wy','63 F Wy']})
  • Erm, just use read_csv? Commented Nov 21, 2013 at 8:14
  • @AndyHayden This is just a small portion; the CSV file is huge and has more fields. read_csv will just get me the DataFrame df. What I want is the end result, the DataFrame Hdf shown above. Commented Nov 21, 2013 at 8:24

1 Answer

Version with loop/generator:

First, create a helper function and group the data by Address and Lastname:

def helper(k, g):
    r = len(g)
    address, lastname = k
    if r > 2:
        lastname = 'The {} Family'.format(lastname)
    elif r > 1:
        lastname = ' and '.join(g['Name']) + ', ' + lastname
    else:
        lastname = g['Name'].squeeze() + ' ' + lastname
    return (address, lastname)

grouped = df.groupby(['Address', 'Lastname'])

Then create a generator that applies the helper function to each group:

vals = (helper(k, g) for k, g in grouped)

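Then build the resulting DataFrame from it. If your pandas version rejects a generator here, pass a list instead, i.e. [helper(k, g) for k, g in grouped]: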

pd.DataFrame(vals, columns=['Address','Name'])

     Address                Name
0   200 G Ct    E and A, Johnson
1   200 G Ct             J Smith
2  60 N Blvd            L Diablo
3    63 F Wy  The Simpson Family
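
For reference, here is the loop version put together as one script. Treat it as a minimal sketch rather than the only way to do it: it assumes the df from the question, uses a list comprehension instead of a generator so the constructor always receives concrete rows, and the names rows and result are just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['J', 'E', 'A', 'M', 'L', 'H', 'B'],
    'Lastname': ['Smith', 'Johnson', 'Johnson', 'Simpson',
                 'Diablo', 'Simpson', 'Simpson'],
    'Address': ['200 G Ct', '200 G Ct', '200 G Ct', '63 F Wy',
                '60 N Blvd', '63 F Wy', '63 F Wy'],
})


def helper(k, g):
    """Turn one (Address, Lastname) group into an (address, label) row."""
    address, lastname = k
    names = list(g['Name'])
    if len(names) > 2:
        label = 'The {} Family'.format(lastname)
    elif len(names) == 2:
        label = ' and '.join(names) + ', ' + lastname
    else:
        label = names[0] + ' ' + lastname
    return (address, label)


# Iterating a groupby over two keys yields (key_tuple, sub_frame) pairs.
rows = [helper(k, g) for k, g in df.groupby(['Address', 'Lastname'])]
result = pd.DataFrame(rows, columns=['Address', 'Name'])
print(result)
```

Note that groupby sorts the group keys by default, so the row order differs from the itertools version in the question, which sorts by last name first.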

More vectorized version:

Group the data by Address and Lastname, then build a new DataFrame holding the size of each group and a string with its first two first names joined by ' and ':

grouped = df.groupby(['Address', 'Lastname'])
res = grouped.apply(lambda x: pd.Series({'Len': len(x), 'Names': ' and '.join(x['Name'][:2])})).reset_index()

     Address Lastname  Len    Names
0   200 G Ct  Johnson    2  E and A
1   200 G Ct    Smith    1        J
2  60 N Blvd   Diablo    1        L
3    63 F Wy  Simpson    3  M and H
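
If your pandas is recent enough to support named aggregation (0.25 or later, as far as I know), the same intermediate table can probably be built without apply. This is only a sketch under that assumption; it reuses the question's df and keeps the column names Len and Names from above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['J', 'E', 'A', 'M', 'L', 'H', 'B'],
    'Lastname': ['Smith', 'Johnson', 'Johnson', 'Simpson',
                 'Diablo', 'Simpson', 'Simpson'],
    'Address': ['200 G Ct', '200 G Ct', '200 G Ct', '63 F Wy',
                '60 N Blvd', '63 F Wy', '63 F Wy'],
})

# Each keyword becomes an output column: (source column, aggregation).
res = (
    df.groupby(['Address', 'Lastname'])
      .agg(
          Len=('Name', 'size'),
          # Join at most the first two first names of the group.
          Names=('Name', lambda s: ' and '.join(s.iloc[:2])),
      )
      .reset_index()
)
print(res)
```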

Now just apply the usual pandas transformations and delete the unnecessary columns:

res.loc[res['Len'] > 2, 'Lastname'] = 'The ' + res['Lastname'] + ' Family'
res.loc[res['Len'] == 2, 'Lastname'] = res['Names'] + ', ' + res['Lastname']
res.loc[res['Len'] < 2, 'Lastname'] = res['Names'] + ' ' + res['Lastname']
del res['Len']
del res['Names']

     Address            Lastname
0   200 G Ct    E and A, Johnson
1   200 G Ct             J Smith
2  60 N Blvd            L Diablo
3    63 F Wy  The Simpson Family
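
The same three-way rule can also be written without assigning into slices of res, by starting from the single-person label and overriding it with Series.mask. A rough sketch, assuming the intermediate table shown earlier (rebuilt by hand here so the snippet runs on its own; name and out are illustrative names):

```python
import pandas as pd

# Intermediate table from the previous step, typed in by hand.
res = pd.DataFrame({
    'Address': ['200 G Ct', '200 G Ct', '60 N Blvd', '63 F Wy'],
    'Lastname': ['Johnson', 'Smith', 'Diablo', 'Simpson'],
    'Len': [2, 1, 1, 3],
    'Names': ['E and A', 'J', 'L', 'M and H'],
})

# Default label: "<first name> <last name>" for single-person groups.
name = res['Names'] + ' ' + res['Lastname']
# Override where the group has exactly two people.
name = name.mask(res['Len'] == 2, res['Names'] + ', ' + res['Lastname'])
# Override where the group has three or more people.
name = name.mask(res['Len'] > 2, 'The ' + res['Lastname'] + ' Family')

out = pd.DataFrame({'Address': res['Address'], 'Name': name})
print(out)
```

This leaves res untouched, which can be handy if you still need Len or Names afterwards.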

8 Comments

Got an error: PandasError: DataFrame constructor not properly called! I am guessing it is a problem with passing a generator when forming a DataFrame, but I am not sure, as I have never used generators.
What version of pandas do you have?
I will try reinstalling with version 0.120, but I am using Python 2.7, not 3.0. And it still gave the error.
@user2872701 try another version
@user2872701 Well, can you change the generator to a list? Just change () to [], like vals = [helper(k, g) for k, g in grouped]