multiple split in string using regex

Question

I have a string:

 Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8 StaMAC:00:9F:0B:00:38:B8 BSSID:00 9F Radioid:2

I want to split this string so that the result looks like this:

'Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8' 'StaMAC:00:9F:0B:00:38:B8' 'BSSID:00 9F' 'Radioid:2'

I tried this regex, msgRegex = re.compile('[\w\s]+:'), and the split function as well. How can I do this? Please help. Thank you.

@RedBassett I used the split function and also a regex: msgRegex = re.compile('[\w\s]+:')
– Prafulla, Commented Feb 6, 2017 at 10:46
You can use the "edit" button below your original question to add this to the question!
– RedBassett, Commented Feb 6, 2017 at 10:49
You can use split and then manually rejoin the pieces that were split apart, such as Station Disconnect: and BSSID:00 9F.
– Rahul K P, Commented Feb 6, 2017 at 11:06

Wiktor Stribiżew · Accepted Answer · 2017-02-08 10:56:29Z

1

From what I see, the problem arises when a value made of hex pairs contains whitespace (as in BSSID:00 9F).

Because of that, I believe you cannot use a splitting approach here. Match your tokens with a regex like

(?<!\S)\b([^:]+):((?:[a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})*|\S)+)\b

See the regex demo

Python code:

import re

rx = r"(?<!\S)\b([^:]+):((?:[a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})*|\S)+)\b"
ss = ["Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8 StaMAC:00:9F:0B:00:38:B8 BSSID:00 9F Radioid:2",
    "Station Deassoc:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.5 StaMac1:40:83:DE:34:04:75 StaMac2:40:83:DE:34:04:75 UserName:4083de340475 StaMac3:40:83:DE:34:04:75 VLANId:1 Radioid:2 SSIDName:Devices SessionDuration:12 APID:CN58G6749V AP Name:1023-noida-racking-zopnow BSSID:BC:EA:FA:DC:A6:F1"]
for s in ss:
    matches = re.findall(rx, s)
    print(matches)

Result:

[('Station Disconnect', '1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8'), ('StaMAC', '00:9F:0B:00:38:B8'), ('BSSID', '00 9F'), ('Radioid', '2')]
[('Station Deassoc', '1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.5'), ('StaMac1', '40:83:DE:34:04:75'), ('StaMac2', '40:83:DE:34:04:75'), ('UserName', '4083de340475'), ('StaMac3', '40:83:DE:34:04:75'), ('VLANId', '1'), ('Radioid', '2'), ('SSIDName', 'Devices'), ('SessionDuration', '12'), ('APID', 'CN58G6749V'), ('AP Name', '1023-noida-racking-zopnow'), ('BSSID', 'BC:EA:FA:DC:A6:F1')]

NOTE: If you need no tuples in the result, remove the capturing parentheses from the pattern.
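
As a rough sketch of what the note above means: with both capturing groups dropped, re.findall returns the whole text of each match, which is the 'key:value' shape the question asked for. The names rx_whole and tokens are illustrative only, and only the first sample line is used here.

```python
import re

# Same pattern as above, but without the two capturing groups, so
# re.findall returns each full match as a string instead of a tuple.
rx_whole = r"(?<!\S)\b[^:]+:(?:[a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})*|\S)+\b"

s = "Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8 StaMAC:00:9F:0B:00:38:B8 BSSID:00 9F Radioid:2"

tokens = re.findall(rx_whole, s)
for token in tokens:
    print(repr(token))
```

For this line the result should be the four strings listed in the question, with 'BSSID:00 9F' kept as one item.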

Pattern details:

(?<!\S)\b - start of string or a position right after whitespace, followed by a word boundary (the next char must be a letter, digit or _)
([^:]+) - Capturing group 1: one or more chars other than :
: - a colon
((?:[a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})*|\S)+) - Capturing group 2, matching one or more occurrences of:
- [a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})* - 2 hex chars followed by zero or more occurrences of a space or : and 2 more hex chars
- | - or
- \S - a non-whitespace char
\b - trailing word boundary.
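
Since the two groups capture the key and the value, the tuples can be fed straight into dict() when a lookup by key is more convenient than a list. This is only a sketch: the name fields is illustrative, and it assumes no key repeats within a line (a repeated key would silently keep only the last value).

```python
import re

rx = r"(?<!\S)\b([^:]+):((?:[a-fA-F0-9]{2}(?:[ :][a-fA-F0-9]{2})*|\S)+)\b"
s = "Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8 StaMAC:00:9F:0B:00:38:B8 BSSID:00 9F Radioid:2"

# Each findall item is a (key, value) tuple, which dict() accepts directly.
fields = dict(re.findall(rx, s))

print(fields["BSSID"])
print(fields["Radioid"])
```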


5 Comments

Prafulla Over a year ago

'Desc:Unknown Failure APID:CN58G6749V BSSID:BC:EA:FA:DC:A6:F0 AuthMode: 1 AP MAC:BC:EA:FA:DC:A6:E0' For this type of line it doesn't work

Prafulla Over a year ago

The output should be: ['Desc:Unknown Failure' 'APID:CN58G6749V' 'BSSID:BC:EA:FA:DC:A6:F0' 'AuthMode: 1' 'AP MAC:BC:EA:FA:DC:A6:E0']

Wiktor Stribiżew Over a year ago

Just a sec: how do you delimit the key from the value? That example ruins the whole logic, and keys cannot be distinguished from the values with regex since both keys and values can have arbitrary number of whitespace inside them.

Wiktor Stribiżew Over a year ago

Ok, a regex is still possible, but only if all the keys are known. If you can provide their list, I can adjust the pattern.

Prafulla Over a year ago

The keys are not fixed. The keys may change, or their order may change.
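
For readers whose key names can be enumerated, the "only if all the keys are known" idea from the comments above could look roughly like the sketch below. The key list is hypothetical, made up from the sample line in this thread, and the approach does not apply when the keys are genuinely open-ended, as the asker says theirs are.

```python
import re

# Hypothetical key list: it would have to cover every key that can
# appear in the log lines being parsed.
known_keys = ["Desc", "APID", "BSSID", "AuthMode", "AP MAC"]

# Escape each key; longest first is a common precaution when one key
# could be a prefix of another.
alternation = "|".join(
    re.escape(key) for key in sorted(known_keys, key=len, reverse=True)
)

# Split on whitespace only when it is immediately followed by "<key>:".
splitter = re.compile(r"\s+(?=(?:%s):)" % alternation)

line = "Desc:Unknown Failure APID:CN58G6749V BSSID:BC:EA:FA:DC:A6:F0 AuthMode: 1 AP MAC:BC:EA:FA:DC:A6:E0"
parts = splitter.split(line)
for part in parts:
    print(repr(part))
```

Whitespace inside a key ("AP MAC") or inside a value ("Unknown Failure", "AuthMode: 1") is left alone because it is not followed by a listed key and a colon. A value that itself contains a listed key followed by a colon would still be split incorrectly.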

Mohammad Yusuf · Accepted Answer · 2017-02-06 13:54:21Z

1

In this particular case you can implement it like so:

import re

a = 'Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8 StaMAC:00:9F:0B:00:38:B8 BSSID:00 9F Radioid:2'
print(re.split(r'(?<=[A-Z0-9]) (?=[A-Z])', a))

Output:

['Station Disconnect:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.8', 'StaMAC:00:9F:0B:00:38:B8', 'BSSID:00 9F', 'Radioid:2']

Regex:

(?<=[A-Z0-9]) - Positive lookbehind for A-Z or 0-9

(a literal space) - 1 space character

(?=[A-Z]) - Positive look ahead for A-Z
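
The lookarounds only split correctly when every value ends in an uppercase letter or a digit and no key contains a space after an uppercase letter. A quick check against the longer sample line from this thread (a sketch; splitter and parts are illustrative names) shows where that assumption breaks:

```python
import re

splitter = re.compile(r"(?<=[A-Z0-9]) (?=[A-Z])")

line = ("Station Deassoc:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.5 "
        "StaMac1:40:83:DE:34:04:75 StaMac2:40:83:DE:34:04:75 "
        "UserName:4083de340475 StaMac3:40:83:DE:34:04:75 VLANId:1 Radioid:2 "
        "SSIDName:Devices SessionDuration:12 APID:CN58G6749V "
        "AP Name:1023-noida-racking-zopnow BSSID:BC:EA:FA:DC:A6:F1")

parts = splitter.split(line)
for part in parts:
    print(repr(part))

# "AP Name" is cut in two because "P" is uppercase and "N" follows the space.
# "SSIDName:Devices SessionDuration:12" stays glued together because the
# value "Devices" ends in a lowercase letter.
```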


3 Comments

Prafulla Over a year ago

It still gives wrong output for this line: line = ' Station Deassoc:1.3.6.1.4.1.11.2.14.11.15.2.75.3.2.0.5 StaMac1:40:83:DE:34:04:75 StaMac2:40:83:DE:34:04:75 UserName:4083de340475 StaMac3:40:83:DE:34:04:75 VLANId:1 Radioid:2 SSIDName:Devices SessionDuration:12 APID:CN58G6749V AP Name:1023-noida-racking-zopnow BSSID:BC:EA:FA:DC:A6:F1'

Mohammad Yusuf Over a year ago

Give me some time, I'll have a look.

Prafulla Over a year ago

Have you checked this problem? Thank you @MYGz
