I'm trying to extract the following substring from the string

-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $

String I want to extract: $Revision: 1.14 (or just 1.14)

My code is as follows:

from sys import *
from os.path import *
import re 

script, filename = argv

print "Filename: %s\n" % filename

def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
        if not found:
            print "No Header exists in %s" % filename

check_string()

This does not seem to be working.

Any suggestions?

Thanks!

    What does "This does not seem to be working" mean, exactly? Commented Nov 8, 2014 at 23:07

5 Answers

2

If I understand you correctly, the in operator and str.split should do what you want:

if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14


In [6]: line ="""
   ...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
   ...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
   ...: """

In [7]: line.split("$Revision: ")  # split the line at $Revision: 
Out[7]: 
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
 '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']

# we use indexing to get the first element after $Revision:  in the string
In [8]: line.split("$Revision: ")[1] 
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'

# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']

# using indexing again we extract the first element which is the  revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'

It is the same for $Date:

    # yields '2014/09/23'; the time is the next whitespace-separated token
    date = line.split("$Date: ")[1].split()[0]

Or use the in operator on its own if you only want to check whether the substring is present:

if "$Revision: 1.14" in line:
    print line
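
To avoid writing a separate if test for each keyword, the split approach can be wrapped in a small helper. This is only a sketch: the name keyword_value is made up for illustration, and it assumes every value is terminated by " $" as in the header shown in the question. The single-argument print calls run under both Python 2 and Python 3.

```python
def keyword_value(line, keyword):
    """Return the text between '$<keyword>: ' and the next ' $', or None if absent."""
    marker = "$%s: " % keyword
    if marker not in line:
        return None
    rest = line.split(marker, 1)[1]
    if rest.startswith("$"):
        return ""  # empty keyword such as "$Author: $"
    return rest.split(" $", 1)[0]


line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
print(keyword_value(line, "Revision"))  # 1.14
print(keyword_value(line, "Date"))      # 2014/09/23 21:41:15
```

Unlike split()[0], splitting on " $" keeps the whole date and time rather than only the first whitespace-separated token.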

9 Comments

No worries, you're welcome. I was not fully sure whether you wanted to check for membership or extract the revision number, but the code will do both.
Can you please explain what the [1].split()[0] is doing? Also, what if I wanted to extract the date in $Date: 2014/09/23 21:41:15? I realize the same syntax would work fine, but is there a way to avoid another if "$Date" in line check?
Are $Date and $Revision always together?
They'll always need to be extracted, yes.
Are they always on the same line? I mean, if $Revision is in the line, does that mean $Date will be too, and vice versa?
2
if re.search("(?<=\$Revision: ) 1.14", line):

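Your line won't work because the pattern requires two spaces between the colon and 1.14: one inside the lookbehind and one right after it. The input has only one. Try:

if re.search(r"(?<=\$Revision: )1\.14", line):

or

if re.search(r"\$Revision:\s+1\.14", line):

The dot is escaped so that it matches only a literal period rather than any character.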
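
A quick way to see the difference is to run both patterns against the header line from the question. This is a minimal sketch; the variable names are only for illustration, and the dot is escaped so that it matches a literal period.

```python
import re

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Original pattern: the lookbehind already accounts for the space after the
# colon, so the extra space before 1.14 demands a second space and fails.
two_spaces = re.search(r"(?<=\$Revision: ) 1\.14", line)

# Corrected pattern: nothing between the lookbehind and the version number.
one_space = re.search(r"(?<=\$Revision: )1\.14", line)

print(two_spaces)          # None
print(one_space.group(0))  # 1.14
```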


1

Your regex requires two spaces between the colon and the version number, and the input only contains one.


0
>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # if more than one such value may be present
['1.14']
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # if only one such value needs to be found
'1.14'


0
import sys

def check_string(f,target):
    for line in f:
        if line.find(target)>=0:
            return line

script, filename = sys.argv

f = open(filename)
rev_line = check_string(f,'Revision: 1.14')
if rev_line:
    ...
else:
    ...

The check_string function

  1. No need for regexp
  2. line.find(target) returns -1 on failure, index of target in line on success
  3. if the index is no less than 0 we have a match, so we return line
  4. if we don't find a match, we fall off the end of the function, which returns None

The calling program

After the usual boilerplate, we assign to rev_line whatever check_string returns. If 'Revision: 1.14' was not found, rev_line is None; otherwise it is the full line containing the target. The if/else then does whatever is needed in each case.

Edit

If the revision number is not known at the time of writing the program, you have two cases

  1. the revision number is sourced from a file, or otherwise computed, and is known at execution time

    target = 'Revision: %d.%d' % (major, minor)
    rev_line = check_string(f, target)
    
  2. the revision number is not fully known at the time of checking. In this case, build a target string containing a regexp and change the innards of check_string: in place of if line.find(target)>=0: write if re.search(target, line): (and add import re). This is very similar to what you wrote in the first place, but the regexp is no longer hardcoded into the function, so you are free to define it in the main program body.

All in all, 2. is better, because you can always build a "constant" regexp...
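
Case 2 could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in replacement: the pattern name revision_1x and the sample lines are invented, and a list of strings stands in for the open file (any iterable of lines works).

```python
import re


def check_string(lines, pattern):
    """Return the first line matching the compiled pattern, or None."""
    for line in lines:
        if pattern.search(line):
            return line
    return None


# Hypothetical pattern: any revision of the form 1.x, x being one or more digits.
revision_1x = re.compile(r"\$Revision: 1\.\d+")

sample = [
    "-- some other comment\n",
    "ls,v $, $Revision: 1.27 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n",
]

rev_line = check_string(sample, revision_1x)
if rev_line:
    print(rev_line)
else:
    print("No header found")
```

Because the pattern is passed in, the same function also covers case 1: compile re.escape(target) for a fixed revision string.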

1 Comment

I like this logic, but one thing I forgot to mention: the 1.14 will not always be that value. It could be 1.x (x being any value).
