1

I am trying to extract a reference number from a string. In most cases it is preceded by "Ref." or something similar.

e.g.:

Explorer II Ref.16570 Box

regex with further examples

The problem is that there are many different variations, as this is user-generated content. How can I retrieve the number that is preceded by e.g. "Ref." with Python?

The number does not always follow the same pattern (e.g. digits only). It may be mixed with letters, dots, and slashes, but to the human eye there is almost always such a number identifiable in each line.

E.g.:

Ref.16570
Ref. 16570
Referenz 216570
Referenz 01 733 7653 4159-07 4 26
331.12.42.51.01.002
166.0173
AB012012/BB01
Ref. 167.021
PAM00292
14000M
L3.642.4.56.6
161.559.50
801
666
753
116400GV
Ref.: 231.10.39.21.03.002
3233
Ref: 233.32.41.21.01.002
T081.420.97.057.01
16750
... almost every line in the example provided contains such an ID.

A small number of false positives would not be a problem.

10
  • Do you need to match or extract? Commented Dec 11, 2018 at 14:07
  • The link you shared doesn't show any further examples. It would be better if you could share some more examples in your question. Commented Dec 11, 2018 at 14:08
  • Probably you are looking for Ref(?:erenz)?\. *(\d+). It shouldn't start with ^. What you need is in 1st capturing group. Commented Dec 11, 2018 at 14:09
  • @PedroLobito I am looking to extract the number. It is always one per line. The example shows the titles of several cases which are typical. Each contains a reference number. Commented Dec 11, 2018 at 14:14
  • "...but for a human eye there is almost always such a number in each line identifiable." No sir. Not unless you give us some rules. Commented Dec 11, 2018 at 14:46

3 Answers

1

I'm not totally sure whether you need to match or extract, but Ref\.?([ \d.]+) captures any run of digits, spaces, and dots that follows "Ref" or "Ref." (case-insensitive when used with re.IGNORECASE), e.g.:

import re
# `subject` holds the text to search, e.g. the listing titles, one per line
result = re.findall(r"Ref\.?([ \d.]+)", subject, re.IGNORECASE | re.MULTILINE)

['16570', '16570', '167.021', '3527']

Regex Demo
Python Demo
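
As a rough sketch building on the pattern above, the keyword part can be widened to cover the variants listed in the question ("Ref", "Ref.", "Ref:", "Ref.:", "Referenz"), and the capture can be limited to digits and dots so that no surrounding spaces are captured. The names REF_RE and extract_refs are made up for this illustration, and the list of keyword variants is an assumption based only on the examples in the question.

```python
import re

# Assumed keyword variants: "Ref", "Ref.", "Ref:", "Ref.:", "Referenz".
# The capture group starts at a digit and continues over digits and dots.
REF_RE = re.compile(r"\bRef(?:erenz)?\b[.:]*\s*(\d[\d.]*)", re.IGNORECASE)


def extract_refs(text):
    """Return all keyword-prefixed reference numbers found in text."""
    # rstrip(".") drops a sentence-ending dot that the capture may pick up.
    return [m.group(1).rstrip(".") for m in REF_RE.finditer(text)]


sample = """Explorer II Ref.16570 Box
Ref. 16570
Referenz 216570
Ref. 167.021
Ref.: 231.10.39.21.03.002
Ref: 233.32.41.21.01.002"""

print(extract_refs(sample))
```

This only handles purely numeric references that follow a keyword. IDs with letters (e.g. PAM00292) or lines without any keyword would need a separate rule.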




1 Comment

I want to extract the reference number. For a better explanation I took your regex, added some code, and updated the link in the question. You can see that there are some matches now, but in different colors, and not all reference IDs are matched. E.g. L.3.674.4.50.0 is one I want to get, and also 331.12.42.51.01.002. I will update the question to make this clearer.
0

This ought to do the trick:

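import re
text = 'Explorer II Ref.16570 Box'
# re.search scans the whole string; re.match only matches at the very start
m = re.search(r'Ref\.[0-9]+', text)
if m:
    print(m.group(0)[4:])  # strip the leading "Ref."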

For more info:

2 Comments

I am getting: >>> print(m.group(0)[4:]) Traceback (most recent call last): File "<console>", line 1, in <module> AttributeError: 'NoneType' object has no attribute 'group'
Then the string likely doesn't have a match. I've updated my answer to account for this possibility.
0

Try the following code. It collects everything after "Ref" up to one of several predefined stoppers. Stoppers are used because the question does not clearly define what counts as a reference ("not always the same pattern", "might be mixed with", "for a human eye there is almost always"). I guess additional processing of the matches is needed to extract the actual references more accurately.

import re

ref_re = re.compile(r'(?P<ref_keyword>Referenz|Ref\.|Ref)[ ]*(?P<ref_value>.*?)(?P<ref_stopper> - | / |,|\n)')

with open('1.txt', mode='r', encoding='UTF-8') as file:
    data = file.read()

for match in ref_re.finditer(data):
    print('key:', match.group('ref_keyword'))
    print('value:', match.group('ref_value'))
    # print('stopper:', match.group('ref_stopper'))

Output starts with the lines:

key: Ref.
value: 16570 Box&Papiere mit Revision
key: Ref.
value: 16570 Box&Papiere mit Revision
key: Referenz
value: 216570 mit schwarzem Zifferblatt 
key: Referenz
value: 01 733 7653 4159-07 4 26 34EB 
key: Ref.
value: 167.021
key: Ref.
value: 3527
key: Referenz
value: 01 733 7653 4159-07 4 26 34EB
key: Ref.
value: 16570 Box&Papiere mit Revision
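
The additional processing mentioned above could look roughly like the sketch below. It is only a heuristic and not derived from any rule in the question: it keeps the leading whitespace-separated tokens that consist solely of digits, dots, dashes, and slashes, and if the very first token contains letters but also a digit (e.g. 116400GV), it keeps just that token. The names NUMERIC_TOKEN and trim_reference are hypothetical.

```python
import re

# Hypothetical heuristic: a "numeric" token contains at least one digit
# and otherwise only digits, dots, slashes, and dashes.
NUMERIC_TOKEN = re.compile(r"[\d./-]*\d[\d./-]*")


def trim_reference(value):
    """Cut a captured value down to its leading reference-like tokens."""
    tokens = value.split()
    kept = []
    for token in tokens:
        if NUMERIC_TOKEN.fullmatch(token):
            kept.append(token)
        else:
            break  # stop at the first token that is not purely numeric
    # Fallback: an alphanumeric ID such as "116400GV" as the first token.
    if not kept and tokens and any(ch.isdigit() for ch in tokens[0]):
        kept.append(tokens[0])
    return " ".join(kept)


for value in ["16570 Box&Papiere mit Revision",
              "216570 mit schwarzem Zifferblatt ",
              "01 733 7653 4159-07 4 26 34EB "]:
    print(repr(trim_reference(value)))
```

A trailing number that is not part of the reference (for example a year right after the ID) would still be kept, so some false positives are to be expected.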
