Using Python 3.
I would like to parse a set of strings which are of the same format. One example I have is a list of books in the format:
title (year), author
e.g. "The Hitchhiker's Guide to the Galaxy (1979), Douglas Adams"
I'd like to extract the book's title, year and author from these strings using something elegant.
Something like:
book = "The Hitchhiker's Guide to the Galaxy (1979), Douglas Adams"
data = parsing_function(book, format)
where:
format is some input that describes the layout of the input string: a coded way of saying "title first, then the year in the brackets, then the author after the comma". Something like format = '{title} ({year}), {author}'
data is the extracted title, year, etc. This could be a list or, even better, a dictionary.
This is inspired by the way pandas parses date/time strings into datetime objects (see pandas.to_datetime).
A format variable is passed in to the function to show how the date/time is represented, like:
>>> pandas.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
Is there a similar method of separating data in a string into different variables?
I can see a way to write a function for this specific case (e.g. using str.split() on the brackets/comma and separating that way), but I'm looking for a generic function that can be used on strings in any consistent format.
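For reference, the one-off version I have in mind is roughly the sketch below. The name parse_book is just something I made up for illustration, and it assumes every string ends in " (year), author" with a numeric year:

```python
def parse_book(book):
    """Parse 'title (year), author' into a dict (hypothetical helper, this format only)."""
    # Split off the author at the last "), " so commas inside the title are left alone.
    head, author = book.rsplit("), ", 1)
    # Split the title from the year at the last " (".
    title, year = head.rsplit(" (", 1)
    return {"title": title, "year": int(year), "author": author}


book = "The Hitchhiker's Guide to the Galaxy (1979), Douglas Adams"
data = parse_book(book)
print(data["title"], data["year"], data["author"])
```

This handles the book list, but the format is baked into the code, so every new string layout would need its own function. That is what I'd like to avoid.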
Thank you