A sufficiently elaborate regex could probably handle this, but a naive pandas approach would be:
import pandas as pd
import numpy as np
col = pd.Series(['10 rue des Treuils BP 12 33023, Bordeaux France',
'10 rue des Treuils BP 12 33023, Les Deux Alpes France',
'10 rue des Treuils BP 12 33023, New York United States'])
cities = np.where(col.str.endswith('United States'),
col.str.split(', ').str[1].str.split().str[:-2].str.join(' '),
col.str.split(', ').str[1].str.split().str[:-1].str.join(' '))
print(cities)
#['Bordeaux' 'Les Deux Alpes' 'New York']
A more general, though less efficient, solution (but who needs speed, right?):
import pandas as pd
col = pd.Series(['10 rue des Treuils BP 12 33023, Bordeaux France',
'10 rue des Treuils BP 12 33023, New York United States',
'10 rue des Treuils BP 12 33023, Seoul South Korea',
'10 rue des Treuils BP 12 33023, Brazzaville Republic of Congo'])
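# multi-word countries mapped to their word count; any other country is assumed to be one word
countries = {'United States': 2, 'South Korea': 2, 'Republic of Congo': 3}
# number of trailing words to drop from each address (defaults to 1)
n = [next((v for k, v in countries.items() if i.endswith(k)), 1) for i in col]
cities = [' '.join(i.split(', ')[1].split()[:-y]) for i, y in zip(col, n)]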
print(cities)
# ['Bordeaux', 'New York', 'Seoul', 'Brazzaville']
And then simply assign back with:
df['city'] = cities
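Putting the pieces together, here is a minimal end-to-end sketch. The frame `df`, the column name `address`, and the helper `extract_city` are assumptions made for illustration; only the `countries` lookup idea comes from the code above.

```python
import pandas as pd

# Hypothetical frame; the column name 'address' is an assumption.
df = pd.DataFrame({'address': [
    '10 rue des Treuils BP 12 33023, Bordeaux France',
    '10 rue des Treuils BP 12 33023, New York United States',
    '10 rue des Treuils BP 12 33023, Brazzaville Republic of Congo',
]})

# Multi-word countries mapped to their word count; anything else counts as one word.
countries = {'United States': 2, 'South Korea': 2, 'Republic of Congo': 3}

def extract_city(address):
    """Return the text after ', ' with the trailing country words removed."""
    n = next((v for k, v in countries.items() if address.endswith(k)), 1)
    return ' '.join(address.split(', ')[1].split()[:-n])

df['city'] = df['address'].map(extract_city)
print(df['city'].tolist())
# ['Bordeaux', 'New York', 'Brazzaville']
```

Going through `map` keeps the result aligned with the frame's index, which matters if `df` has been filtered or reordered before the assignment.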
(?<=\d{5}, ).*(?=France|United States)
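As written, the lookaround pattern above also matches the space before the country name, so the result would need stripping. A possible variant, offered as a sketch rather than a definitive answer, uses a capture group with `Series.str.extract` instead; the pattern below is my own adaptation and only knows the two countries listed in it.

```python
import pandas as pd

col = pd.Series(['10 rue des Treuils BP 12 33023, Bordeaux France',
                 '10 rue des Treuils BP 12 33023, Les Deux Alpes France',
                 '10 rue des Treuils BP 12 33023, New York United States'])

# Assumed pattern: 5-digit postcode, comma, then the city (captured lazily),
# then whitespace and one of the known countries at the end of the string.
pattern = r'\d{5}, (.*?)\s+(?:France|United States)$'

# expand=False with a single capture group returns a Series
cities = col.str.extract(pattern, expand=False)
print(cities.tolist())
# ['Bordeaux', 'Les Deux Alpes', 'New York']
```

Rows whose country is not in the alternation come back as `NaN`, so the list of countries still has to be maintained by hand, just as with the dictionary approach.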