I have the following table and would like to split each row into three columns: state, postcode and city. State and postcode are easy, but I'm unable to extract the city. I thought about splitting each string after the street synonyms and before the state, but I seem to be getting the loop wrong as it will only use the last item in my list.
Input data:
Address Text
0 11 North Warren Circle Lisbon Falls ME 04252
1 227 Cony Street Augusta ME 04330
2 70 Buckner Drive Battle Creek MI
3 718 Perry Street Big Rapids MI
4 14857 Martinsville Road Van Buren MI
5 823 Woodlawn Ave Dallas TX 75208
6 2525 Washington Avenue Waco TX 76710
7 123 South Main St Dallas TX 75201
The output I'm trying to achieve (for all rows, but I only wrote out the first two to save time)
City State Postcode
0 Lisbon Falls ME 04252
1 Augusta ME 04330
My code:
# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand = True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand = True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
# This is where I got stuck
df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df