I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'course_code': ['BUS225 - DC - 02-21-17',
'N320L - EM8 - 01-21-20 - Sect1', 'N495 - LA8 - 05-14-19 - Sect3']})
I am trying to write a regular expression (with pandas) that returns the following output:
pd.DataFrame({'course_code': ['BUS225', 'N320L', 'N495']})
At the moment here is my code:
df.course_code.str.extract(r'(\A\D\D\D\d\d\d)')
I know I'm missing something here. I'm having a hard time capturing the "L", as well as handling course codes that start with three letters versus one.
df.course_code.str.split(r' - ').str[0]
df.course_code.str.extract(r'^([A-Z]+\d+[A-Z]*)')
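For reference, here is a minimal runnable sketch of the two approaches above, using the sample data from the question. The variable names `by_split` and `by_regex` are only illustrative, and `expand=False` is an addition so that `str.extract` returns a Series rather than a one-column DataFrame. The original attempt fails because `\D\D\D` requires exactly three non-digit characters, so a code like `N320L` never matches; `[A-Z]+\d+[A-Z]*` instead allows one or more leading letters, then digits, then an optional trailing letter suffix. This assumes the course codes always use uppercase letters.

```python
import pandas as pd

df = pd.DataFrame({'course_code': ['BUS225 - DC - 02-21-17',
                                   'N320L - EM8 - 01-21-20 - Sect1',
                                   'N495 - LA8 - 05-14-19 - Sect3']})

# Option 1: split on the ' - ' delimiter and keep the first piece.
by_split = df.course_code.str.split(r' - ').str[0]

# Option 2: anchor at the start, then capture letters, digits,
# and any trailing letters (e.g. the "L" in N320L).
by_regex = df.course_code.str.extract(r'^([A-Z]+\d+[A-Z]*)', expand=False)

print(by_split.tolist())
print(by_regex.tolist())
```

The split version is probably the simpler choice if the course code is always the first ' - '-separated field; the regex version may be preferable if you also want to reject rows that do not look like a course code, since non-matching rows come back as NaN.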