I am trying to take the results I get from a regex search, e.g.

['11J']
['4C']
['5,']
[]
['04 ', '05 ', '48T']

and store those values in a new column (Apt) of an existing pandas DataFrame.

Sample data (Excel file)

index  id           apt     address           job description
0     122092476     207     EAST 74 STREET    blah blah 11J blah               
1     122096043     2092    8TH AVENUE        blah 4C blah blah

Code

import pandas as pd
import re

df = pd.read_excel('/Users/abc/Desktop/Apartment.xlsx', sheetname=0)
df['Apt'] = 'None'
top5 = df.head()
t5jobs = top5['Job Description']    

d = []

for index, job in enumerate(t5jobs):
    result = re.findall(r'\d\d\D', job) or re.findall(r'\d\D', job) or re.findall(r'PH\D', job)

#print(str(result))
d.append(str(result))

df2 = pd.DataFrame([[d]], columns=list('Apt'))
df.append(df2)

I am getting this error:

AssertionError: 3 columns passed, passed data had 1 columns

How can I get these values inserted in the Apt column (overwrite None)?

Desired Output:

index  id           apt     address           job description         apt 
 0     122092476     207     EAST 74 STREET    blah blah 11J blah      11J         
 1     122096043     2092    8TH AVENUE        blah 4C blah blah        4C
  • I don't understand your question. Can you edit your question to include an MCVE? Commented Sep 14, 2016 at 19:32
  • replace: columns=list('Apt') --> columns=['Apt'] Commented Sep 14, 2016 at 19:35
  • columns=list('Apt') this means three columns: ['A', 'p', 't'] Commented Sep 14, 2016 at 19:36
  • How do I overwrite "None"? It still shows up in the Apt column. Commented Sep 14, 2016 at 19:39
  • @user3062459, can you post your desired DF? Commented Sep 14, 2016 at 19:42
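
The comments about columns=list('Apt') explain the AssertionError. The "None" placeholder most likely survives because df.append returned a new frame instead of changing df, and the question's code discards that result (DataFrame.append was also removed in pandas 2.0). A minimal sketch of both points, assuming a small stand-in frame and a hypothetical list `apts` holding one extracted value per row (neither comes from the asker's real file):

```python
import pandas as pd

# list() on a string splits it into characters: three column names, not one.
print(list('Apt'))   # ['A', 'p', 't']

# Stand-in frame mirroring the question's sample data (assumed, not the real file).
df = pd.DataFrame({
    'id': [122092476, 122096043],
    'job description': ['blah blah 11J blah', 'blah 4C blah blah'],
})
df['Apt'] = 'None'

# Hypothetical per-row results collected from the regex loop.
apts = ['11J', '4C']

# Assigning a list with one value per row replaces the whole column,
# so the 'None' placeholders are overwritten in place.
df['Apt'] = apts
print(df)
```

The list must have exactly as many entries as the frame has rows, otherwise pandas raises a ValueError on assignment.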

1 Answer

Try this (for pandas 0.18.0+):

In [11]: df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b', expand=True)

In [12]: df
Out[12]:
              id   apt         address     job description  Apt
index
0      122092476   207  EAST 74 STREET  blah blah 11J blah  11J
1      122096043  2092      8TH AVENUE   blah 4C blah blah   4C

For pandas versions < 0.18.0:

df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b')
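
For readers who want to run the idea end to end, here is a self-contained sketch. It assumes pandas 0.18.0 or later (where the `expand` keyword exists) and uses a stand-in frame; the third row is invented to show what happens when nothing matches. With a single capture group, `expand=False` returns a Series, which can be assigned directly to a column.

```python
import pandas as pd

# Stand-in data; the last row is a made-up example with no apartment code.
df = pd.DataFrame({
    'job description': [
        'blah blah 11J blah',
        'blah 4C blah blah',
        'no unit here',
    ],
})

# One or two digits followed by a non-digit, bounded by word boundaries.
pattern = r'\b(\d{1,2}\D)\b'

# expand=False with one capture group yields a Series (pandas 0.18.0+).
df['Apt'] = df['job description'].str.extract(pattern, expand=False)
print(df)
```

Rows where the pattern does not match end up as missing values (NaN) rather than an empty string.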

5 Comments

df['Apt'] = df['Job Description'].str.extract(r'\b(\d{1,2}\D)\b', expand=True) raises TypeError: extract() got an unexpected keyword argument 'expand'
@user3062459, what is your pandas version?
pandas version: pandas==0.17.1
It only fills in the first two values. How do I get it to fill in all the values, as I will run for entire dataset?
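
One possible reason for gaps is that the single pattern in the answer does not match every row. The following is only a sketch of an alternative, not the answerer's method: it reuses the three patterns from the question as ordered fallbacks and applies them to the whole column. The helper name `first_apt`, the stand-in data, and the `.strip()` call (which trims results such as '04 ') are assumptions added for illustration.

```python
import re
import pandas as pd

# Patterns from the question, tried in order; the first one that matches wins.
PATTERNS = [r'\d\d\D', r'\d\D', r'PH\D']

def first_apt(text):
    """Return the first match of the first pattern that hits, else None."""
    for pattern in PATTERNS:
        match = re.search(pattern, text)
        if match:
            # strip() is an added assumption to trim trailing spaces.
            return match.group(0).strip()
    return None

# Stand-in data; the last two rows are invented to exercise the fallbacks.
df = pd.DataFrame({
    'job description': [
        'blah blah 11J blah',
        'blah 4C blah blah',
        'blah PHB blah',
        'no unit here',
    ],
})

# apply() runs the helper on every row, not just the first few.
df['Apt'] = df['job description'].apply(first_apt)
print(df)
```

Because the function runs once per row through `apply`, it covers the entire dataset rather than only `df.head()`.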
