I'm having an issue with a regex and I'm hoping somebody can help me out.
Basically, I have a regex pattern that I pass to re.findall() in a Python script. My goal is to search strings of varying length for references to Bible verses (e.g. John 3:16, Romans 6, etc.). The pattern mostly works, but it often tacks an extra leading space onto the Bible book name. Here's the call:
versesToFind = re.findall(r'\d?\s?\w+\s\d+:?\d*', str)
To illustrate the problem, here are my results when running this pattern on the following test string:
str = 'testing testing John 3:16 adsfbaf John 2 1 Kings 4 Romans 4'
Result (from www.pythonregex.com):
[u' John 3:16', u' John 2', u'1 Kings 4', u' Romans 4']
As you can see, John 3:16, John 2, and Romans 4 each have an extra space at the beginning that I want to get rid of; only 1 Kings 4 comes back clean. Hopefully my explanation makes sense. Thanks in advance!
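In case it helps anyone reproduce this, here's a minimal self-contained sketch written for Python 3 (so the results print without the u prefix). The names text, original, and adjusted are my own for illustration; I used text instead of str so the built-in isn't shadowed. The second pattern is only one possible adjustment: it assumes the optional space should be allowed only when the optional leading digit is present, and I've only checked it against this one sample string.

```python
import re

# Sample input from the question; named `text` to avoid shadowing the built-in `str`.
text = 'testing testing John 3:16 adsfbaf John 2 1 Kings 4 Romans 4'

# Pattern from the question: `\d?` and `\s?` are independently optional,
# so `\s?` can consume the space in front of a book name with no leading digit.
original = r'\d?\s?\w+\s\d+:?\d*'

# Possible adjustment (an assumption, not a verified general fix):
# group the digit and the space so they are matched together or not at all.
adjusted = r'(?:\d\s)?\w+\s\d+:?\d*'

found_original = re.findall(original, text)
found_adjusted = re.findall(adjusted, text)

print(found_original)  # [' John 3:16', ' John 2', '1 Kings 4', ' Romans 4']
print(found_adjusted)  # ['John 3:16', 'John 2', '1 Kings 4', 'Romans 4']
```

Calling .strip() on each match would also remove the space after the fact, but I'd rather fix the pattern itself if there's a cleaner way.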