python pandas applying for loop and groupby function

Question

I am new to Python and not familiar with iterating using the groupby function in pandas. I modified the code below, and it works fine for creating a pandas DataFrame from raw CSV lines:

from itertools import groupby

import pandas as pd

i = ['J,Smith,200 G Ct,',
     'E,Johnson,200 G Ct,',
     'A,Johnson,200 G Ct,',
     'M,Simpson,63 F Wy,',
     'L,Diablo,60 N Blvd,',
     'H,Simpson,63 F Wy,',
     'B,Simpson,63 F Wy,']

dbn = []  # display names
dba = []  # addresses

# Sort rows by (lastname, address), then group consecutive rows by address.
for z, g in groupby(
    sorted([l.split(',') for l in i], key=lambda x: x[1:]),
    lambda x: x[2:]
):
    l = list(g)
    r = len(l)
    Address = ','.join(z)
    o = l[0]
    if r > 2:
        # Three or more people: use the family name only.
        dbn.append('The ' + o[1] + " Family,")
    elif r > 1:
        # Exactly two people: join both first names.
        dbn.append(o[0] + " and " + l[1][0] + ", " + o[1] + ",")
    else:
        # One person: first name plus last name.
        dbn.append(o[0] + " " + o[1])
    dba.append(Address)

Hdf = pd.DataFrame({'Address': dba, 'Name': dbn})
print(Hdf)

      Address                 Name
0  60 N Blvd,             L Diablo
1   200 G Ct,    E and A, Johnson,
2    63 F Wy,  The Simpson Family,
3   200 G Ct,              J Smith

How would I modify the for loop to yield the same results if I am using a pandas dataframe instead of raw csv data?

df=pd.DataFrame({'Name':['J','E','A','M','L','H','B'],
'Lastname':['Smith','Johnson','Johnson','Simpson','Diablo','Simpson','Simpson'],
'Address':['200 G Ct','200 G Ct','200 G Ct','63 F Wy','60 N Blvd','63 F Wy','63 F Wy']})

@AndyHayden This is just a small portion; the CSV file is huge and has more fields. read_csv will just get me the data frame df. What I want is the end result, the dataframe Hdf shown above.
– user2872701, Commented Nov 21, 2013 at 8:24

roman · Accepted Answer · 2013-11-21 14:43:45Z


Version with loop/generator:

First, we create a helper function and group the data by Address and Lastname:

def helper(k, g):
    r = len(g)
    address, lastname = k
    if r > 2:
        lastname = 'The {} Family'.format(lastname)
    elif r > 1:
        lastname = ' and '.join(g['Name']) + ', ' + lastname
    else:
        lastname = g['Name'].squeeze() + ' ' + lastname
    return (address, lastname)

grouped = df.groupby(['Address', 'Lastname'])

Then create a generator that applies the helper function to each group:

vals = (helper(k, g) for k, g in grouped)

And then create the resulting DataFrame from it:

pd.DataFrame(vals, columns=['Address','Name'])

     Address                Name
0   200 G Ct    E and A, Johnson
1   200 G Ct             J Smith
2  60 N Blvd            L Diablo
3    63 F Wy  The Simpson Family
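
For reference, the steps above can be combined into one self-contained sketch. It passes a list rather than a generator to the DataFrame constructor, on the assumption that some older pandas versions may not accept a generator there. The names label_group and build_household_frame are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['J', 'E', 'A', 'M', 'L', 'H', 'B'],
    'Lastname': ['Smith', 'Johnson', 'Johnson', 'Simpson',
                 'Diablo', 'Simpson', 'Simpson'],
    'Address': ['200 G Ct', '200 G Ct', '200 G Ct', '63 F Wy',
                '60 N Blvd', '63 F Wy', '63 F Wy'],
})


def label_group(key, group):
    """Return (address, display name) for one (Address, Lastname) group."""
    address, lastname = key
    names = list(group['Name'])
    if len(names) > 2:
        label = 'The {} Family'.format(lastname)
    elif len(names) == 2:
        label = ' and '.join(names) + ', ' + lastname
    else:
        label = names[0] + ' ' + lastname
    return address, label


def build_household_frame(frame):
    """Build the Address/Name frame, one row per (Address, Lastname) group."""
    grouped = frame.groupby(['Address', 'Lastname'])
    # A list (not a generator) is handed to the constructor.
    rows = [label_group(k, g) for k, g in grouped]
    return pd.DataFrame(rows, columns=['Address', 'Name'])


households = build_household_frame(df)
print(households)
```

Because groupby sorts the group keys by default, the rows come out ordered by Address and then Lastname, as in the output shown above.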

More vectorized version:

Group the data by Address and Lastname, then generate a new DataFrame holding the size of each group and a string with the first two first names joined by " and ":

grouped = df.groupby(['Address', 'Lastname'])
res = grouped.apply(lambda x: pd.Series({'Len': len(x), 'Names': ' and '.join(x['Name'][:2])})).reset_index()

     Address Lastname  Len    Names
0   200 G Ct  Johnson    2  E and A
1   200 G Ct    Smith    1        J
2  60 N Blvd   Diablo    1        L
3    63 F Wy  Simpson    3  M and H

Now just apply the usual pandas transformations and delete the unnecessary columns:

res.loc[res['Len'] > 2, 'Lastname'] = 'The ' + res['Lastname'] + ' Family'
res.loc[res['Len'] == 2, 'Lastname'] = res['Names'] + ', ' + res['Lastname']
res.loc[res['Len'] < 2, 'Lastname'] = res['Names'] + ' ' + res['Lastname']
del res['Len']
del res['Names']

     Address            Lastname
0   200 G Ct    E and A, Johnson
1   200 G Ct             J Smith
2  60 N Blvd            L Diablo
3    63 F Wy  The Simpson Family
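
As a possible variation on the vectorized version, the sketch below swaps in named aggregation (agg with keyword arguments) for the apply call and Series.mask for the indexed assignments. This is not the approach from the answer itself, only an illustration under the assumption of a reasonably recent pandas (named aggregation needs 0.25 or later). The variable names res, name, and out are chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['J', 'E', 'A', 'M', 'L', 'H', 'B'],
    'Lastname': ['Smith', 'Johnson', 'Johnson', 'Simpson',
                 'Diablo', 'Simpson', 'Simpson'],
    'Address': ['200 G Ct', '200 G Ct', '200 G Ct', '63 F Wy',
                '60 N Blvd', '63 F Wy', '63 F Wy'],
})

# One row per (Address, Lastname): group size plus the first two first names.
res = (
    df.groupby(['Address', 'Lastname'])
      .agg(Len=('Name', 'size'),
           Names=('Name', lambda s: ' and '.join(s.iloc[:2])))
      .reset_index()
)

# Start from the single-person form, then overwrite rows for larger groups.
name = res['Names'] + ' ' + res['Lastname']
name = name.mask(res['Len'] == 2, res['Names'] + ', ' + res['Lastname'])
name = name.mask(res['Len'] > 2, 'The ' + res['Lastname'] + ' Family')

out = pd.DataFrame({'Address': res['Address'], 'Name': name})
print(out)
```

Building the label in a separate Series leaves the Lastname column untouched, so the intermediate frame res stays available for inspection.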

edited Nov 21, 2013 at 14:43

answered Nov 21, 2013 at 8:38


8 Comments

user2872701 Over a year ago

Got an error: "PandasError: DataFrame constructor not properly called!" I am guessing it is a problem with passing a generator when forming a dataframe, but I am not sure, as I have never used generators.

roman Over a year ago

what version of Pandas do you have?

user2872701 Over a year ago

I will try reinstalling with version 0.120 but I am using python 2.7 not 3.0. And it still gave the error

roman Over a year ago

@user2872701 try another version

roman Over a year ago

@user2872701 Well, can you change the generator to a list? Just change () to [], like vals = [helper(k, g) for k, g in grouped]
