I am trying to extract the content between Point ID and Point Name.

import re
import pandas as pd
import numpy as np

sent1 = 'Date:2020/07/11 13:53  Low Alarm OFF\nAlarm Priority:Urgent\nPoint ID0000294.AI.0017707\nPoint Name:BOM-DC3-B2-2F-Q1-TEMP 3\nAlarm:Normal\nStatus:18.6 øC'
sent2 = 'Date:2020/07/11 13:42  Low AlarmAlarm Priority:UrgentPoint ID0000294.AI.0017707Point Name:BOM-DC3-B2-2F-Q1-TEMP 3Alarm:AbnormalStatus:Analog Lower Limit Alarm 18.0 øC'
def extract_id(sent):
    lst=re.split(r'\W+', sent)
    lst=str(lst[13]) + str(lst[14]) + str(lst[15])
    return(lst)

With sent1 I can extract the content between Point ID and Point Name, but with sent2 I cannot. The reason is that I split the whole sentence into a list and then take list indices 13, 14 and 15, and those positions are not the same in sent2. I need a solution using a regular expression: how can I fetch the content in Point ID[Required content]Point Name?
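To make the index problem concrete, here is a minimal sketch that prints the tokens at positions 13 to 15 for both inputs. The names tokens1 and tokens2 are only illustrative and are not part of the code above.

```python
import re

sent1 = 'Date:2020/07/11 13:53  Low Alarm OFF\nAlarm Priority:Urgent\nPoint ID0000294.AI.0017707\nPoint Name:BOM-DC3-B2-2F-Q1-TEMP 3\nAlarm:Normal\nStatus:18.6 øC'
sent2 = 'Date:2020/07/11 13:42  Low AlarmAlarm Priority:UrgentPoint ID0000294.AI.0017707Point Name:BOM-DC3-B2-2F-Q1-TEMP 3Alarm:AbnormalStatus:Analog Lower Limit Alarm 18.0 øC'

# Split on runs of non-word characters, exactly as extract_id does.
tokens1 = re.split(r'\W+', sent1)
tokens2 = re.split(r'\W+', sent2)

# Without newlines, neighbouring fields fuse into single tokens
# (for example "UrgentPoint"), so every later index shifts.
print(tokens1[13:16])
print(tokens2[13:16])
```

For sent1 the three tokens are ID0000294, AI and 0017707, while for sent2 they are Name, BOM and DC3, so fixed positions cannot work for both.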

  • You could use a capturing group followed by an optional newline: Point ID(\S.*?)[\r\n]*Point Name\b regex101.com/r/FS2HdC/1 Commented Jul 12, 2020 at 14:59

1 Answer

You could match an optional newline before matching Point Name.

For the required content, you could match at least one non-whitespace character \S directly after Point ID.

Point ID(\S.*?)[\r\n]*Point Name\b

The pattern matches

  • Point ID Match literally
  • (\S.*?) Capture group 1: match one non-whitespace char, then any char except a newline, as few times as possible (non-greedy)
  • [\r\n]* Match 0+ carriage returns or newlines
  • Point Name\b Match Point Name followed by a word boundary
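The same breakdown can be written directly into the pattern with re.VERBOSE. This is only a sketch for readability: the name POINT_ID_VERBOSE and the shortened sample string are assumptions made for illustration.

```python
import re

# Same pattern, spelled out with re.VERBOSE so each part carries a comment.
# In verbose mode unescaped whitespace is ignored, so literal spaces are written as [ ].
POINT_ID_VERBOSE = re.compile(r"""
    Point[ ]ID      # literal text before the required content
    (\S.*?)         # group 1: one non-whitespace char, then as few chars as possible
    [\r\n]*         # zero or more carriage returns or line feeds
    Point[ ]Name\b  # literal text after the content, then a word boundary
""", re.VERBOSE)

# Shortened, hypothetical sample in the same shape as sent1.
sample = "Point ID0000294.AI.0017707\nPoint Name:BOM-DC3-B2-2F-Q1-TEMP 3"
match = POINT_ID_VERBOSE.search(sample)
print(match.group(1))
```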

import re

def extract_id(sent):
    regex = r"Point ID(\S.*?)[\r\n]*Point Name\b"
    return re.findall(regex, sent)

sent1 = 'Date:2020/07/11 13:53  Low Alarm OFF\nAlarm Priority:Urgent\nPoint ID0000294.AI.0017707\nPoint Name:BOM-DC3-B2-2F-Q1-TEMP 3\nAlarm:Normal\nStatus:18.6 øC'
sent2 = 'Date:2020/07/11 13:42  Low AlarmAlarm Priority:UrgentPoint ID0000294.AI.0017707Point Name:BOM-DC3-B2-2F-Q1-TEMP 3Alarm:AbnormalStatus:Analog Lower Limit Alarm 18.0 øC'

print(extract_id(sent1))
print(extract_id(sent2))

Output

['0000294.AI.0017707']
['0000294.AI.0017707']
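
re.findall returns a list. If a single string per message is preferred, a variant with re.search could look like the sketch below. The function name extract_point_id and the shortened test strings are hypothetical, and returning None for a missing match is one possible choice.

```python
import re
from typing import Optional

def extract_point_id(sent: str) -> Optional[str]:
    """Return the text between 'Point ID' and 'Point Name', or None if absent."""
    m = re.search(r"Point ID(\S.*?)[\r\n]*Point Name\b", sent)
    # re.search returns None when nothing matches, so guard before reading group 1.
    return m.group(1) if m else None

with_newline = "Alarm Priority:Urgent\nPoint ID0000294.AI.0017707\nPoint Name:BOM-DC3"
without_newline = "Alarm Priority:UrgentPoint ID0000294.AI.0017707Point Name:BOM-DC3"

print(extract_point_id(with_newline))
print(extract_point_id(without_newline))
print(extract_point_id("no point id in this message"))
```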

1 Comment

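We can also precompile the pattern once with id = re.compile(r"Point ID(\S.*?)[\r\n]*Point Name\b") and reuse it, which can save a little time when many messages are processed. Note that the name id shadows the Python builtin of the same name.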
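
A minimal sketch of that idea, assuming the messages arrive as a list of strings. POINT_ID_RE and extract_ids are illustrative names chosen to avoid shadowing the builtin id. The re module already caches recently compiled patterns, so any speed-up is likely to be modest.

```python
import re

# Compile once at import time and reuse the compiled object for every message.
POINT_ID_RE = re.compile(r"Point ID(\S.*?)[\r\n]*Point Name\b")

def extract_ids(messages):
    """Return the list of findall results, one entry per message."""
    return [POINT_ID_RE.findall(msg) for msg in messages]

# Shortened, hypothetical messages in the same two layouts as sent1 and sent2.
messages = [
    "Alarm Priority:Urgent\nPoint ID0000294.AI.0017707\nPoint Name:BOM-DC3",
    "Alarm Priority:UrgentPoint ID0000294.AI.0017707Point Name:BOM-DC3",
]
results = extract_ids(messages)
print(results)
```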
