I have a few file names:

xyz-1.23.35.10.2.rpm

xyz-linux-version-90.12.13.689.tar.gz

xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm

Here xyz can be any string of any length (letters only, no digits).

The dot-separated numbers are the version of each file.

Can I have one common function to extract the version from each filename? I tried, but my function is getting too big and relies heavily on hard-coded constants. Please suggest a simple way.

3 Answers

We can use the re module to do this. Let's define the pattern we're trying to match.

We'll need to match a string of digits:

\d+

These digits may be followed by either a period or a hyphen:

\d+[\-\.]?

And this pattern can repeat many times:

(\d+[\-\.]?)*

Finally, we always end with at least one digit:

(\d+[\-\.]?)*\d+

This pattern can be used to define a function that returns a version number from a filename:

import re

def version_from(filename, pattern=r'(\d+[\-\.]?)*\d+'):
    match = re.search(pattern, filename)
    if match:
        return match.group(0)
    else:
        return None

Now we can use the function to extract all the versions from the data you provided:

data = ['xyz-1.23.35.10.2.rpm', 'xyz-linux-version-90-12-13-689.tar.gz', 'xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm']

versions = [version_from(filename) for filename in data]

The result is the list you asked for:

['1.23.35.10.2', '90-12-13-689', '13.23.789.0']
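
As a quick check of the same idea against the filenames as they are listed in the question (dots only), here is a minimal sketch. The name extract_version and the stricter pattern \d+(?:\.\d+)+ are my own assumptions, not part of the function above; the non-capturing group requires each dot to be followed by more digits, so the dot before the extension is never included.

```python
import re

# Assumed pattern: digit groups joined by dots, with at least one dot.
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")

def extract_version(filename):
    """Return the first dotted version found in filename, or None."""
    match = VERSION_RE.search(filename)
    return match.group(0) if match else None

filenames = [
    "xyz-1.23.35.10.2.rpm",
    "xyz-linux-version-90.12.13.689.tar.gz",
    "xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm",
]

print([extract_version(name) for name in filenames])
# ['1.23.35.10.2', '90.12.13.689', '13.23.789.0']
```

Because the pattern insists on at least one dot, a lone number with no dots would not count as a version here; loosen the final + to * if that matters for your files.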

2 Comments

Perfect, awesome. I love you. It solved my problem.
@AshishKumar Pleased to help. I notice that your question changed to exclude hyphens from the requirement. If that's correct, you'll only need r'(\d+\.?)*\d+' for your pattern.

I'm not sure whether there's a better way, since regular expressions aren't really my thing, but here's one way to get the version of each file, assuming the only numbers in a filename are the version in this format.

import re
strings = [
    "xyz-1.23.35.10.2.rpm",
    "xyz-linux-version-90.12.13.689.tar.gz",
    "xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm",
]
for string in strings:
    matches = re.findall(r"\d+", string)
    version = ".".join(matches)
    print(version)

Result:

1.23.35.10.2
90.12.13.689
13.23.789.0
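
To illustrate why the assumption above matters, here is a small sketch comparing the join-all-digits approach with a search for the first dotted run. The function names and the filename "xyz-1.2.3-el7.rpm" are hypothetical; the question says the non-version parts contain no digits, so this case only arises if that guarantee is broken.

```python
import re

def version_by_joining(filename):
    """Join every run of digits in filename with dots."""
    return ".".join(re.findall(r"\d+", filename))

def version_by_search(filename):
    """Return only the first run of dot-separated digit groups."""
    match = re.search(r"\d+(?:\.\d+)*", filename)
    return match.group(0) if match else None

name = "xyz-1.2.3-el7.rpm"  # hypothetical filename with a stray digit

print(version_by_joining(name))  # 1.2.3.7
print(version_by_search(name))   # 1.2.3
```

For the three filenames in the question, both functions return the same versions.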

Assuming that the only numbers in your string are the version you need to extract, you could try something like this:

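def func(someString):
    version = ''
    found = False
    for character in someString:
        if character.isdigit():
            found = True    # a digit starts (or continues) the version
        elif character.isalpha():
            found = False   # a letter ends the version
        if found:
            version += character
    # Drop the separator picked up between the version and the next letter
    return version.rstrip('.-')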

Basically, we walk through each character of the string. When the version part begins, found becomes True (because isdigit() returns True for a digit character), and it stays True until a letter is reached. While found is True, each character is appended to the version string. isdigit() and isalpha() are built-in str methods, so you don't need to import anything.

P.S. I haven't tested this for errors
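
For comparison, the same character-by-character idea can be sketched with itertools. This is only an illustration under the same assumption (the version is the first run of digits and dots in the name); scan_version is a hypothetical name, not something from the code above.

```python
from itertools import dropwhile, takewhile

def scan_version(filename):
    """Return the first run of digits and dots, without a trailing dot."""
    # Skip everything before the first digit.
    tail = dropwhile(lambda ch: not ch.isdigit(), filename)
    # Keep digits and dots until any other character appears.
    run = "".join(takewhile(lambda ch: ch.isdigit() or ch == ".", tail))
    # The dot that starts the extension (".rpm") may trail the version.
    return run.rstrip(".")

for name in [
    "xyz-1.23.35.10.2.rpm",
    "xyz-linux-version-90.12.13.689.tar.gz",
    "xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm",
]:
    print(scan_version(name))
```

A filename with no digits yields an empty string rather than None, so callers may want to check for that.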

2 Comments

I was expecting use of the re module. This works, but it's very static.
I was trying to figure this out with a basic algorithm, since my knowledge of re is limited to none. You are right, though :)
