I have a text file that has the data:

B4-B9   4
B1-B3   8
B5-B6   1
B7  4
B8 - B9 5
B12-B19 6
B17 - B24 3
B22, B23 3
B24-B29 8
B30,B31 10
B32-B39 12
B45-B47 12
B48-B49 15
B50 14
B17, B18 18
B41,B42 19

For each line, I would like to capture each 'B' value (the letter B followed by a number) in its own group, and the trailing number (the one without a B) in a third group.

I wrote the regex (B\d+)*\s*-*,*(B\d+)+\s(\d+), but it does not capture correctly on lines that have spaces around the dash or after the comma:

B8 - B9 5
B17 - B24 3
B22, B23 3
B17, B18 18
Comment: Would you please try (B\d+)(?:\s*[-,]\s*(B\d+))?\s+(\d+)? (May 19, 2021 at 0:02)

1 Answer

You could use this regex to capture each B value and the final number into a dictionary (or tuples of values):

(?P<b1>B\d+)\s*(?:[-,]\s*(?P<b2>B\d+))?\s+(?P<num>\d+)

It looks for:

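  • a B followed by one or more digits ((?P<b1>B\d+)), captured in group b1;
  • optional whitespace, then an optional - or , followed by optional whitespace and another B with one or more digits (\s*(?:[-,]\s*(?P<b2>B\d+))?), with that B value captured in group b2 (b2 is None when this part is absent); and finally
  • at least one whitespace character followed by one or more digits (\s+(?P<num>\d+)), with the digits captured in group num.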
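To make the three parts above easier to see, here is a sketch of the same pattern written with re.VERBOSE so each piece carries its own comment. The name PATTERN and the three test lines are only for illustration; the lines are taken from the sample data in the question.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
PATTERN = re.compile(r"""
    (?P<b1>B\d+)        # first B value, e.g. B8
    \s*                 # optional whitespace before the separator
    (?:                 # optional second B value:
        [-,]            #   a dash or comma separator
        \s*             #   optional whitespace after the separator
        (?P<b2>B\d+)    #   second B value, e.g. B9
    )?
    \s+                 # whitespace before the final number
    (?P<num>\d+)        # the trailing number
""", re.VERBOSE)

for line in ["B8 - B9 5", "B22, B23 3", "B7  4"]:
    m = PATTERN.match(line)
    print(m.groupdict() if m else None)
```

The lines with spaces around the separator match because \s* is allowed on both sides of [-,], and the single-value line matches because the whole separator group is optional.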

Demo on regex101

In Python (assuming each line is in the variable s; the trailing $ additionally requires the number to be the last thing on the line):

import re

m = re.match(r'(?P<b1>B\d+)\s*(?:[-,]\s*(?P<b2>B\d+))?\s+(?P<num>\d+)$', s)
if m is not None:
    print(m.groupdict())

Output (for your sample data):

{'b1': 'B4', 'b2': 'B9', 'num': '4'}
{'b1': 'B1', 'b2': 'B3', 'num': '8'}
{'b1': 'B5', 'b2': 'B6', 'num': '1'}
{'b1': 'B7', 'b2': None, 'num': '4'}
{'b1': 'B8', 'b2': 'B9', 'num': '5'}
{'b1': 'B12', 'b2': 'B19', 'num': '6'}
{'b1': 'B17', 'b2': 'B24', 'num': '3'}
{'b1': 'B22', 'b2': 'B23', 'num': '3'}
{'b1': 'B24', 'b2': 'B29', 'num': '8'}
{'b1': 'B30', 'b2': 'B31', 'num': '10'}
{'b1': 'B32', 'b2': 'B39', 'num': '12'}
{'b1': 'B45', 'b2': 'B47', 'num': '12'}
{'b1': 'B48', 'b2': 'B49', 'num': '15'}
{'b1': 'B50', 'b2': None, 'num': '14'}
{'b1': 'B17', 'b2': 'B18', 'num': '18'}
{'b1': 'B41', 'b2': 'B42', 'num': '19'}

If you would prefer plain tuples, call m.groups() instead of m.groupdict(); the output is then:

('B4', 'B9', '4')
('B1', 'B3', '8')
('B5', 'B6', '1')
('B7', None, '4')
('B8', 'B9', '5')
('B12', 'B19', '6')
('B17', 'B24', '3')
('B22', 'B23', '3')
('B24', 'B29', '8')
('B30', 'B31', '10')
('B32', 'B39', '12')
('B45', 'B47', '12')
('B48', 'B49', '15')
('B50', None, '14')
('B17', 'B18', '18')
('B41', 'B42', '19')
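
If the final number is needed as an integer, or some lines in the file might not match at all, a small helper along these lines may be useful. This is only a sketch: parse_line, PATTERN, and the sample list are hypothetical names introduced here, and the "not a data line" entry is made up to show how non-matching lines are skipped.

```python
import re

# Same regex as in the answer; fullmatch() below plays the role of the trailing $.
PATTERN = re.compile(r'(?P<b1>B\d+)\s*(?:[-,]\s*(?P<b2>B\d+))?\s+(?P<num>\d+)')

def parse_line(line):
    """Return (b1, b2, num) with num as an int, or None if the line does not match."""
    m = PATTERN.fullmatch(line.strip())
    if m is None:
        return None
    return m['b1'], m['b2'], int(m['num'])

# Stand-in for lines read from the text file (hypothetical sample).
sample = ["B4-B9   4\n", "B7  4\n", "B17 - B24 3\n", "not a data line\n"]

rows = [parsed for line in sample if (parsed := parse_line(line)) is not None]
print(rows)
```

For a line with a single B value, the second element of the tuple is None, as in the outputs above.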