
I have the following example strings:

- FCF_VD_ID
- [FCF_VD_Alert_L1, ..., FCF_VD_Alert_L8]
- FCF_VD_SyncID
- [FCF_VRU_Alert_FCV, FCF_VRU_Alert_A, ..., FCF_VRU_Alert_H]
- [COM_Cam_Frame_1, ..., COM_Cam_Frame_8]

I need to extract specific parts from these strings. Specifically, I need the core name of each string, which in the cases above is everything up to the enumerator. As enumerators I treat L1 to L8, FCV, A to H, and 1 to 8.

As output I need two strings, ideally obtained like this:

core, enum = re.match(regex, string).groups()

Example:

FCF_Alert_L1 -> FCF_Alert, L1
FCF_SyncID -> FCF_SyncID, None
FCF_VRU_Alert_FCV -> FCF_VRU_Alert, FCV

Unfortunately, my regex ^([A-Za-z_]+(ID)?)([A-Z]+\d+|[A-Z]+|\d+)?$ does not work. Can anybody point out the problem with it? For FCF_VD_ID_L1 I get ('FCF_VD_ID_L', None, '1'), which is not at all what I need.
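The behaviour is easy to reproduce. In the minimal sketch below (sample strings taken from the question), [A-Za-z_]+ is greedy, so it consumes every letter and underscore up to the first digit. The optional (ID)? then has nothing left to match, and the \d+ alternative picks up the lone trailing digit. Since the overall match already succeeds at that point, the engine never backtracks into the first group to try a shorter core.

```python
import re

# The regex from the question, written as a raw string.
pattern = re.compile(r'^([A-Za-z_]+(ID)?)([A-Z]+\d+|[A-Z]+|\d+)?$')

for s in ['FCF_VD_ID_L1', 'FCF_VD_Alert_L1', 'FCF_VRU_Alert_FCV']:
    # [A-Za-z_]+ is greedy: it swallows all letters and underscores,
    # so group 3 can only ever receive a trailing run of digits.
    print(s, '->', pattern.match(s).groups())
```

This prints:

FCF_VD_ID_L1 -> ('FCF_VD_ID_L', None, '1')
FCF_VD_Alert_L1 -> ('FCF_VD_Alert_L', None, '1')
FCF_VRU_Alert_FCV -> ('FCF_VRU_Alert_FCV', None, None)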

  • can you provide exact input and exact output? Commented May 7, 2021 at 14:31
  • Maybe you just want re.findall(r'(\w+)_(\w+)', text)? Or, (\w+)_(L?\d+|FCV|[AH]|[A-Za-z]*ID)\b? See this regex demo. Commented May 7, 2021 at 14:32
  • @WiktorStribiżew this is way too robust and hardcoded Commented May 7, 2021 at 14:42
  • Perhaps like this ^([^_\n]+(?:_[^_\n]+)*?)(?:_(L[1-8]|FCV|[A-Z]|[1-8]|ID))?$ regex101.com/r/4QnKNy/1 Commented May 7, 2021 at 14:54
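Running the last comment's regex over some of the question's strings shows how it splits them. This is a sketch for illustration only; the sample list is taken from the question and answer. Note that it treats a trailing _ID as an enum, and because every enum must follow an _, a bare trailing digit (as in idObject1) stays in the core.

```python
import re

# Regex suggested in the comment above (the \n only matters for multiline input).
pattern = re.compile(r'^([^_\n]+(?:_[^_\n]+)*?)(?:_(L[1-8]|FCV|[A-Z]|[1-8]|ID))?$')

samples = ['FCF_VD_ID', 'FCF_VD_Alert_L1', 'FCF_VD_SyncID',
           'FCF_VRU_Alert_FCV', 'COM_Cam_Frame_1', 'idObject1']

for s in samples:
    # The lazy *? adds one _segment at a time until the optional
    # enum group plus $ can consume the rest of the string.
    print(s, '->', pattern.match(s).groups())
```

For example, FCF_VD_ID gives ('FCF_VD', 'ID'), COM_Cam_Frame_1 gives ('COM_Cam_Frame', '1'), and idObject1 gives ('idObject1', None).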

1 Answer


It looks like you're looking for this regex:

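(\w+?)(?:_(L[1-8]|FCV|[A-H])|_?([1-8]))?$

which matches a minimal number of word characters (\w+?) followed by an optional enum part: either _ followed by L1-L8, FCV or A-H, or a single digit in the range 1-8 with an optional leading _. The trailing $ forces the lazy \w+? to keep growing until the rest of the pattern can reach the end of the string.

Note that since you are using re.match, no ^ is required at the beginning, because re.match only tries to match at the start of the string.

In Python:

import re

strs = [
  'FCF_VD_ID', 'FCF_VD_Alert_L1', 'FCF_VD_Alert_L8',
  'FCF_VD_SyncID', 'FCF_VRU_Alert_FCV', 'FCF_VRU_Alert_A',
  'FCF_VRU_Alert_H', 'COM_Cam_Frame_1', 'COM_Cam_Frame_8',
  'idObject1'
]

# Raw string, so that \w reaches the regex engine unchanged.
# The _? before the digit lets 'COM_Cam_Frame_1' split as ('COM_Cam_Frame', '1').
regex = r'(\w+?)(?:_(L[1-8]|FCV|[A-H])|_?([1-8]))?$'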

for s in strs:
    core, enum1, enum2 = re.match(regex, s).groups()
    enum = enum1 if enum1 else enum2
    print(s + ' => ', (core, enum))

Output:

FCF_VD_ID =>  ('FCF_VD_ID', None)
FCF_VD_Alert_L1 =>  ('FCF_VD_Alert', 'L1')
FCF_VD_Alert_L8 =>  ('FCF_VD_Alert', 'L8')
FCF_VD_SyncID =>  ('FCF_VD_SyncID', None)
FCF_VRU_Alert_FCV =>  ('FCF_VRU_Alert', 'FCV')
FCF_VRU_Alert_A =>  ('FCF_VRU_Alert', 'A')
FCF_VRU_Alert_H =>  ('FCF_VRU_Alert', 'H')
COM_Cam_Frame_1 =>  ('COM_Cam_Frame', '1')
COM_Cam_Frame_8 =>  ('COM_Cam_Frame', '8')
idObject1 =>  ('idObject', '1')
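The $ at the end of the regex is doing real work: re.match anchors only at the start of the string, not at the end. The sketch below uses a reduced version of the answer's pattern (only the _L1-L8 / FCV / A-H branch, kept short for illustration) to show what happens without it, and that re.fullmatch (available since Python 3.4) is an alternative to the trailing $.

```python
import re

# Reduced form of the answer's regex: only the "_L1-L8 / FCV / A-H" branch.
core_then_enum = r'(\w+?)(?:_(L[1-8]|FCV|[A-H]))?'
s = 'FCF_VD_Alert_L1'

# Without $, the lazy \w+? stops after one character, because the
# optional enum group is allowed to match nothing.
print(re.match(core_then_enum, s).groups())        # ('F', None)

# With $, \w+? keeps growing until the enum group can reach the end.
print(re.match(core_then_enum + '$', s).groups())  # ('FCF_VD_Alert', 'L1')

# re.fullmatch requires the whole string to match, so no $ is needed.
print(re.fullmatch(core_then_enum, s).groups())    # ('FCF_VD_Alert', 'L1')

# re.match only tries position 0, which is why ^ is redundant.
print(re.match(r'Alert', s))                       # None
```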

9 Comments

That's probably very close to what I am looking for. But specifically, I am after the last enum, which may or may not start with '_'. For something like idObject1, I want the 1 to be the enum. The input in the question is just an example.
@FilipSzczybura this is why it's really important to include all forms of sample data in your question. Your problem though is what about something like abcID; is that (abcID, None) or (abcI, D)?
'All' means thousands. r'(\w+?)(?:_?(\d+|[A-Z]+\d+|[A-Z]+))?$' is the closest to what I am looking for, but I want to exclude 'ID|Id|id' from enum extraction. So for COM_Cam_Sync_ID I don't want to get an enum.
Do any of the enums have to have _ before them, or is it optional for all of them? If it is optional, I don't see how you can tell whether ABC should be (ABC, None) or (AB, C), since C is a valid enum.
Most of these enums will have a leading _. The exception is a single-digit enum like idObject1, where the enum is 1. Otherwise I want to get any enum that is _[A-Z]+\d+, _[A-Z]+, or just \d+ without a _. Note that enums after _ are always uppercase.
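The ambiguity raised in the comments can be made concrete by running the thread's regex next to a revision that encodes the last comment's rules. The revision (sketch_regex) and the helper split_core_enum are hypothetical, written only for illustration and not a tested spec: uppercase enums must follow _, a digit enum may appear with or without _, and a bare trailing _ID is never treated as an enum.

```python
import re

# Regex from the comment thread: the _ before every enum is optional.
thread_regex = re.compile(r'(\w+?)(?:_?(\d+|[A-Z]+\d+|[A-Z]+))?$')

# Hypothetical revision (assumption): uppercase enums need a leading _,
# digits may appear with or without _, and _ID at the end is excluded.
# Id/id can never match [A-Z]+, so only ID needs the (?!ID$) lookahead.
sketch_regex = re.compile(r'(\w+?)(?:_(?!ID$)([A-Z]+\d+|[A-Z]+)|_?(\d+))?$')


def split_core_enum(s):
    """Hypothetical helper: return (core, enum), enum is None if absent."""
    m = sketch_regex.match(s)
    if m is None:  # e.g. strings containing non-word characters
        return s, None
    core, upper_enum, digit_enum = m.groups()
    return core, upper_enum or digit_enum


samples = ['COM_Cam_Sync_ID', 'FCF_VD_SyncID', 'abcID', 'ABC',
           'FCF_VD_Alert_L1', 'FCF_VRU_Alert_FCV', 'COM_Cam_Frame_8',
           'idObject1']

for s in samples:
    print(s, 'thread:', thread_regex.match(s).groups(),
          'sketch:', split_core_enum(s))
```

With the optional _, the lazy core in the thread's regex is as short as possible, so COM_Cam_Sync_ID becomes ('COM_Cam_Sync', 'ID'), abcID becomes ('abc', 'ID') and ABC becomes ('A', 'BC'). The sketch keeps those whole (enum None) while still returning ('FCF_VD_Alert', 'L1'), ('COM_Cam_Frame', '8') and ('idObject', '1').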