1

I have the following string. The leading letters can differ, and there can be two, three, or four of them.

PR191030.213101.ABD

I want to extract the 191030 and convert that to a valid date.

My current attempt:

filename_without_ending.split(".")[0][-6:]

Sometimes the string looks like this instead:

PZA191030_392001_USB

My attempt is not valid for this, since the separator might also differ from time to time. The only real pattern is the first six digits.

How do I do this?

Thank you!

1
  • 1
    With the rules you gave, it works every time, no? Can you give us a counterexample where it doesn't work? Commented Nov 13, 2019 at 10:34

4 Answers 4

3

You could get the first 6 digits using a pattern and a capturing group:

^[A-Z]{2,4}(\d{6})\.
  • ^ Start of string
  • [A-Z]{2,4} Match 2, 3 or 4 uppercase chars
  • ( Capture group 1
    • \d{6} Match 6 digits
  • )\. Close group and match trailing dot


For example

import re

regex = r"^[A-Z]{2,4}(\d{6})\."
test_str = "PR191030.213101.ABD"
matches = re.search(regex, test_str)

if matches:
    print(matches.group(1))

Output

191030
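
The pattern above requires a dot right after the six digits, so it would not match the second sample from the question (PZA191030_392001_USB). A minimal sketch of a relaxed variant, assuming the only fixed rule is "2 to 4 uppercase letters followed by six digits"; the helper name first_six_digits is hypothetical and only for illustration:

```python
import re

# Same pattern as above, but without the trailing "\." so any separator is accepted.
PREFIX_DIGITS = re.compile(r"^[A-Z]{2,4}(\d{6})")

def first_six_digits(name):
    """Return the six digits after a 2-4 letter prefix, or None if there is no match."""
    match = PREFIX_DIGITS.match(name)
    return match.group(1) if match else None

print(first_six_digits("PR191030.213101.ABD"))   # 191030
print(first_six_digits("PZA191030_392001_USB"))  # 191030
```

Returning None instead of raising keeps the "no match" case explicit, in the same spirit as the if matches: check above.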

2 Comments

Thank you, really advanced solution. I hope it's OK if I accept the one with the list comprehension, since I feel more comfortable using that ;-)
@DataMastery Of course, no problem at all. You should select the solution that works for you. Good luck!
3

You can do:

a = 'PR191030.213101.ABD'
int(''.join([c for c in a if c.isdigit()][:6]))

Output:

191030
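
The question also asks for a valid date. A possible sketch of that step, assuming the six digits are year, month, and day (YYMMDD); the question does not state the format, so this is a guess based on the sample. Note that int() would drop a leading zero (for example "091030" becomes 91030), so the digits stay a string here:

```python
from datetime import datetime

a = "PR191030.213101.ABD"

# Keep the first six digits as a string so a leading zero is not lost.
digits = "".join([c for c in a if c.isdigit()][:6])

# Assumption: the digits are in YYMMDD order.
parsed = datetime.strptime(digits, "%y%m%d").date()
print(parsed)  # 2019-10-30
```

strptime raises ValueError when the digits do not form a real date, which doubles as a validity check.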


1

This can also be done by:

filename_without_ending.split(".")[0][2::]

This takes the part before the first dot and slices it from the 3rd character to the end, so it only works when the prefix is exactly two letters.
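
If the prefix length varies, one way to stay regex-free is to strip the leading letters first and then slice. This is only a sketch: the helper name digits_after_prefix is made up for illustration, and it assumes the prefix consists of ASCII letters only.

```python
import string

def digits_after_prefix(name, count=6):
    # Strip every leading ASCII letter, then keep the next `count` characters.
    return name.lstrip(string.ascii_letters)[:count]

print(digits_after_prefix("PR191030.213101.ABD"))   # 191030
print(digits_after_prefix("PZA191030_392001_USB"))  # 191030
```

Unlike a regex with \d{6}, this does not check that the six characters are really digits, so it trusts the input format.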


0

Since the leading letters can differ, we have to ignore the letters and extract the digits.

Using the re module (regular expressions), apply a regex pattern to the string. It returns the parts of the string that match the pattern.

\d matches a digit [0-9], and the + operator matches one or more of them.

findall() finds all occurrences of the pattern in the string, while search() finds only the first occurrence.

import re

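text = "PR191030.213101.ABD"  # named text to avoid shadowing the built-in str

# findall() returns every run of digits; [0] picks the first one
print(re.findall(r"\d+", text)[0])

# search() stops at the first run of digits
print(re.search(r"\d+", text).group())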
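
To make the difference between the two calls concrete, here is a small sketch on the sample string. The second string, with no separator after the date, is a made-up input to show where \d+ grabs too much and \d{6} may be the safer choice:

```python
import re

text = "PR191030.213101.ABD"

print(re.findall(r"\d+", text))         # ['191030', '213101']
print(re.search(r"\d+", text).group())  # 191030

# Hypothetical input without a separator: \d+ keeps matching past six digits.
joined = "PR191030213101"
print(re.search(r"\d+", joined).group())    # 191030213101
print(re.search(r"\d{6}", joined).group())  # 191030
```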
