Extract Substring from String Python

Question

I'm trying to extract a substring from the following string:

-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $

String I want to extract: $Revision: 1.14 (or just 1.14)

My code is as follows:

from sys import *
from os.path import *
import re 

script, filename = argv

print "Filename: %s\n" % filename

def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
        if not found:
            print "No Header exists in %s" % filename

check_string()

This does not seem to be working.

Any suggestions?

Thanks!

What does "This does not seem to be working" mean, exactly? – jonrsharpe, Commented Nov 8, 2014 at 23:07

Padraic Cunningham · Accepted Answer · 2014-11-09 00:54:11Z

2

If I understand you correctly, `in` and `split` should do what you want:

if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14


In [6]: line ="""
   ...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
   ...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
   ...: """

In [7]: line.split("$Revision: ")  # split the line at $Revision: 
Out[7]: 
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
 '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']

# we use indexing to get the first element after $Revision:  in the string
In [8]: line.split("$Revision: ")[1] 
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'

# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']

# using indexing again we extract the first element which is the  revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'

It is the same for $Date:

date = line.split("$Date: ")[1].split()[0]

Or just use `in` if you only want to check for a substring in the string:

if "$Revision: 1.14" in line:
    print line
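
As a rough sketch of how the split approach above could cover both `$Revision:` and `$Date:` without repeating the `if` check, the lookup can be wrapped in one helper. The name `extract_keyword` is hypothetical (it is not part of the answer), and the code assumes Python 3:

```python
def extract_keyword(line, keyword):
    """Return the first whitespace-delimited token after '$<keyword>: ', or None."""
    marker = "$%s: " % keyword
    if marker not in line:
        return None
    # Split once at the marker, then split the remainder on whitespace.
    tokens = line.split(marker, 1)[1].split()
    return tokens[0] if tokens else None


line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
print(extract_keyword(line, "Revision"))  # 1.14
print(extract_keyword(line, "Date"))      # 2014/09/23
print(extract_keyword(line, "Missing"))   # None
```

Like the one-liner in the answer, this returns only the first token, so for `$Date:` it yields the date without the time.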

edited Nov 9, 2014 at 0:54

answered Nov 8, 2014 at 23:23


9 Comments

Padraic Cunningham Over a year ago

No worries, you're welcome. I was not fully sure whether you wanted to check for membership or extract the revision number, but the code will do both.

Andrew Hummel Over a year ago

Can you please explain what the [1].split()[0] is doing? Also, what if I wanted to extract the date in $Date: 2014/09/23 21:41:15? I realize the same syntax would work fine, but is there a way to avoid another `if "$Date" in line` check?

Padraic Cunningham Over a year ago

Are $Date and $Revision always together?

Andrew Hummel Over a year ago

They'll always need to be extracted, yes.

Padraic Cunningham Over a year ago

Are they always on the same line? I mean, if $Revision is in the line, does that mean $Date will be too, and vice versa?


Kent · Accepted Answer · 2014-11-08 23:09:49Z

2

if re.search("(?<=\$Revision: ) 1.14", line):

Your line won't work because it tries to match two spaces between the colon and 1.14: one inside the lookbehind and one after it. Try:

if re.search("(?<=\$Revision: )1.14", line):

or

if re.search("\$Revision:\s+1.14", line):
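
A quick way to see the difference is to run all three patterns against the sample line. This is only a sketch for Python 3; the raw strings and the escaped dot (`1\.14`) are my additions, not part of the answer:

```python
import re

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Original pattern: the lookbehind already accounts for the single space
# after the colon, and the pattern then demands a second one.
broken = re.search(r"(?<=\$Revision: ) 1\.14", line)

# Corrected patterns: drop the extra space, or allow any run of whitespace.
lookbehind = re.search(r"(?<=\$Revision: )1\.14", line)
flexible = re.search(r"\$Revision:\s+1\.14", line)

print(broken)               # None
print(lookbehind.group(0))  # 1.14
print(flexible.group(0))    # $Revision: 1.14
```

Note that the lookbehind version matches only the number, while the `\s+` version includes the `$Revision:` prefix in the match.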

answered Nov 8, 2014 at 23:09


NPE · Accepted Answer · 2014-11-08 23:10:38Z

1

Your regex requires two spaces between the colon and the version number, and the input only contains one.

answered Nov 8, 2014 at 23:10


Irshad Bhat · Accepted Answer · 2014-11-08 23:18:00Z

0

>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # if more than one such value is to be searched
['1.14']   
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # if only one such value needs to be found
'1.14'
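
One caveat with calling `.group(1)` directly: `re.search` returns `None` when nothing matches, so the call raises `AttributeError` on a file without the header. A minimal sketch of a guarded version for Python 3 follows; the helper name `find_revision` and the stricter `\d+(?:\.\d+)*` pattern are assumptions of mine, not part of the answer:

```python
import re

# One or more dot-separated groups of digits after "$Revision:".
REVISION_RE = re.compile(r"\$Revision:\s*(\d+(?:\.\d+)*)")


def find_revision(text):
    """Return the revision number as a string, or None if no header is found."""
    match = REVISION_RE.search(text)
    return match.group(1) if match else None


print(find_revision("ls,v $, $Revision: 1.14 $, $Author: $"))  # 1.14
print(find_revision("no CVS header here"))                     # None
```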

answered Nov 8, 2014 at 23:18


gboffi · Accepted Answer · 2014-11-09 01:32:23Z

import sys

def check_string(f,target):
    for line in f:
        if line.find(target)>=0:
            return line

script, filename = sys.argv

f = open(filename)
rev_line = check_string(f,'Revision: 1.14')
if rev_line:
    ...
else:
    ...

The `check_string` function

- No need for a regexp.
- `line.find(target)` returns -1 on failure, or the index of target in line on success.
- If the index is no less than 0 we have a match, so we return line.
- If we don't find a match, we fall off the end of the function, implicitly returning None.

The calling program

After the usual boilerplate we assign to the variable rev_line what is returned by check_string. If we have not found 'Revision: 1.14', rev_line is None; otherwise it is the full line containing the target. Go on to do what is to be done in both cases.
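
To try the function without a file on disk, here is a small sketch for Python 3 that uses `io.StringIO` as a stand-in for the opened file. The sample text is made up for illustration:

```python
import io


def check_string(f, target):
    for line in f:
        if line.find(target) >= 0:
            return line
    # Falling off the end returns None implicitly.


# io.StringIO behaves like an open text file, so it can be iterated line by line.
sample = io.StringIO("-- first line\nls,v $, $Revision: 1.14 $, $Author: $\n")
rev_line = check_string(sample, "Revision: 1.14")
missing = check_string(io.StringIO("nothing here\n"), "Revision: 1.14")

print(repr(rev_line))  # 'ls,v $, $Revision: 1.14 $, $Author: $\n'
print(missing)         # None
```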

Edit

If the revision number is not known at the time of writing the program, you have two cases:

1. The revision number is sourced from a file, or otherwise computed, and is known at time of execution:
```
target = 'Revision: %d.%d' % (major, minor)
rev_line = check_string(f, target)
```
2. The revision number is not fully known at the time of checking. In this case you build a target string containing a regexp and modify the innards of check_string: in place of `if line.find(target)>=0:` you write `if re.search(target, line):` (with `import re` at the top). That is very similar to what you've written in the first place, but the regexp is no longer hardcoded into the function and you're free to determine it in the main program body.

All in all, 2. is better, because you can always build a "constant" regexp...
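
A minimal sketch of the regexp variant for Python 3, assuming the major version is known and the minor version can be any number. The pattern and the sample lines are my own illustration:

```python
import io
import re


def check_string(f, pattern):
    """Return the first line matching the regexp pattern, or None."""
    for line in f:
        if re.search(pattern, line):
            return line
    return None


major = 1
# Built in the calling code: matches "$Revision: 1.<any digits>".
pattern = r"\$Revision: %d\.\d+" % major

found = check_string(io.StringIO("-- header\nls,v $, $Revision: 1.14 $\n"), pattern)
not_found = check_string(io.StringIO("ls,v $, $Revision: 2.3 $\n"), pattern)

print(repr(found))  # 'ls,v $, $Revision: 1.14 $\n'
print(not_found)    # None
```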

I like this logic here, but one thing I forgot to mention: the 1.14 will not always be that value. It could be 1.x (x being any value).
