regex for blank string

Question

I have a string as:

s=

"(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane 
(2021-06-29T11:58:29.053Z) [Nicholas]: 
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"

I want to extract the message text from each line. Expected output:

comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##']

What I have tried is:

comments=re.findall(r']:\s+(.*?)\n',s)

The regex works well, but I'm not able to get the blank text as ''.

Could you please provide the code you use to process your text? The string literal you provided does not compile.
– Wiktor Stribiżew, Commented Nov 9, 2021 at 11:33

The fourth bird · Accepted Answer · 2021-11-09 11:41:45Z

You can exclude ] from the capture group by using a negated character class. If you also want to match the value on the last line, assert the end of the line with $ (together with re.MULTILINE) instead of matching a mandatory newline with \n.

Note that \s can match a newline, and so can the negated character class [^]]*.

]:\s+([^]]*)$

import re

regex = r"]:\s+([^]]*)$"

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, s, re.MULTILINE))

Output

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']

If you don't want to cross lines:

]:[^\S\n]+([^]\n]*)$
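
To see what "crossing lines" means in practice, here is a small sketch comparing the two patterns. The input string is a made-up example (it is not from the question) in which one entry has nothing after the colon and the following line carries free text.

```python
import re

# Hypothetical input: the first entry has no space or text after the
# colon, and the next line is plain text without a "[name]:" prefix.
text = "[A]:\nnext line\n[B]: x"

# \s+ and [^]]* may both match "\n", so the match spills onto the next line.
crossing = re.findall(r"]:\s+([^]]*)$", text, re.MULTILINE)

# [^\S\n] is "whitespace except newline", and [^]\n] also excludes "\n",
# so every match stays on a single line.
same_line = re.findall(r"]:[^\S\n]+([^]\n]*)$", text, re.MULTILINE)

print(crossing)   # ['next line', 'x']
print(same_line)  # ['x']
```

With the data from the question both patterns return the same list, because every colon is followed by at least one space on the same line.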

sln · Accepted Answer · 2021-11-10 19:35:27Z

You could collect everything after the colon into a list from capture group 1:

re.findall(r'(?m):[ \t]+(.*?)[ \t]*$',s)

Then loop over the list, assigning a space to every empty element.

>>> import re
>>>
>>> # Written with explicit "\n" so the trailing space on the empty
>>> # message line stays visible; [ \t]+ needs that space to match.
>>> s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
...      "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
...      "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
...      "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
...      "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
...      "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
...      "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")
>>>
>>> talk = [re.sub('^$', ' ', w) for w in re.findall(r'(?m):[ \t]+(.*?)[ \t]*$', s)]
>>> print(talk)
['hi', 'TA FOR SHOWING', 'how are you bane', ' ', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
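
As a possible simplification (a sketch, not part of the answer above): an empty string is falsy in Python, so `w or ' '` can stand in for the `re.sub('^$', ' ', w)` call. The shortened sample string and the names `matches` and `talk` below are only for illustration.

```python
import re

# Shortened sample; the middle line has a single space after the colon.
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#")

# Same pattern as in the answer: capture the message, trimming trailing
# spaces and tabs.
matches = re.findall(r'(?m):[ \t]+(.*?)[ \t]*$', s)

# '' is falsy, so `w or ' '` yields ' ' for empty messages and w otherwise.
talk = [w or ' ' for w in matches]

print(talk)  # ['how are you bane', ' ', '#END_REMOTE#']
```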

hata · Accepted Answer · 2021-11-09 11:38:59Z

Is this what you want?

comments = re.findall(r']:\s(.*?)\n',s)

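If there is always exactly one space after the :, \s+ should be \s. \s+ means one or more whitespace characters, and because \s also matches a newline, \s+ consumes the line break after an empty message and the capture group then picks up the next line instead.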
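
The difference can be checked with a short sketch on a three-line excerpt of the question's data. The variable names `with_plus` and `with_single` are only for illustration, and the sample deliberately ends with a newline.

```python
import re

# Shortened sample from the question, ending with a newline on purpose.
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n")

# \s+ also consumes the "\n" after the empty message, so the capture
# group swallows the whole following line.
with_plus = re.findall(r']:\s+(.*?)\n', s)

# A single \s stops after the one space, leaving "\n" for the pattern's \n.
with_single = re.findall(r']:\s(.*?)\n', s)

print(with_plus)
# ['how are you bane ', '(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#']
print(with_single)
# ['how are you bane ', '', '#END_REMOTE#']
```

Both patterns still require a trailing \n, so a final line that does not end with a newline, as in the question's string, is not matched by either of them.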

RavinderSingh13 · Accepted Answer · 2021-11-10 19:53:08Z

With your shown samples, please try the following regex.

^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$

Explanation: a detailed breakdown of the regex above.

^\(\d{4}-\d{2}-\d{2}  ##Match from the start of the line: ( followed by 4 digits-2 digits-2 digits.
T(?:\d{2}:){2}        ##Match T followed by a non-capturing group (2 digits and a colon), repeated 2 times.
\d{2}\.\d{3}Z\)\s+    ##Match 2 digits, a dot, 3 digits, Z and ), followed by whitespace.
\[[^]]*\]:\s+         ##Match literal [ up to the first ], then ], a colon and whitespace.
([^)]*)$              ##1st capturing group: zero or more characters other than `)`, up to the end of the line.

With Python 3.x:

import re
regex = r"^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$"
varVal = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, varVal, re.MULTILINE))

Output will be as follows with samples shown by OP:

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
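
The line-by-line explanation above maps naturally onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments inside it. The following is a sketch of the same regex written that way, run on a shortened sample; the names `pattern` and `sample` are only for illustration.

```python
import re

# Same regex as above, split across lines with re.VERBOSE so each part
# can carry its own comment.
pattern = re.compile(r"""
    ^\(\d{4}-\d{2}-\d{2}   # opening parenthesis, then the date yyyy-mm-dd
    T(?:\d{2}:){2}         # T, then hours and minutes, each followed by a colon
    \d{2}\.\d{3}Z\)\s+     # seconds, dot, milliseconds, Z, closing parenthesis
    \[[^]]*\]:\s+          # the bracketed name, a colon, and whitespace
    ([^)]*)$               # group 1: the message, up to the end of the line
""", re.MULTILINE | re.VERBOSE)

sample = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
          "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
          "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(pattern.findall(sample))  # ['hi', '', '##ENDED AT 08:07 GMT##']
```

Because group 1 is `[^)]*`, a message that itself contains `)` would not be matched by this pattern.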
