
I have a text file from which I have extracted the paragraph blocks below.

Text Example:

EXONERAR, com validade a contar de 19 de agosto de 2020, DE- NILSON DE BRITO LIMA, ID FUNCIONAL Nº 2100423-4, do cargo em comissão de Coordenador, símbolo DAS-8, da Coordenadoria de Gestão Centralizada de Serviços, da Superintendência de Gestão Centralizada, da Subsecretaria de Logística, da Secretaria de Estado de Planejamento e Gestão. Processo nº SEI-120001/010643/2020

EXONERAR, a pedido, NADIA NAKAMURA VIEIRA, ID FUNCIONAL Nº 5099589-8, do cargo em comissão de Assessor Especial, símbolo DG, da Secretaria de Estado de Planejamento e Gestão. Processo nº SEI-150001/004627/2020

EXONERAR, com validade a contar de 26 de novembro de 2020, BRUNO RAFAEL ROCHA COSTA, ID FUNCIONAL Nº 5108093-1, do cargo em comissão de Assessor, símbolo DAS-7, da Assessoria de Planejamento e Gestão, da Presidência, da Superintendência de Des- portos do Estado do Rio de Janeiro - SUDERJ, da Secretaria de Es- tado de Esporte, Lazer e Juventude. Processo nº SEI- 3 0 0 0 0 2 / 0 0 0 4 11 / 2 0 2 0 .

EXONERAR, com validade a contar de 16 de novembro de 2020, LUIS HENRIQUE FERREIRA DE AQUINO, ID FUNCIONAL Nº 1914315-0, do cargo em comissão de Assistente II, símbolo DAI-6, da Secretaria de Estado de Planejamento e Gestão. Processo nº SEI120001/014825/2020:

From the text block above, I want to grab only the bold values from each paragraph, with each paragraph as an individual row.

What I have tried:

r"\b(?:(?:EXONERAR|d[ae]|por|símbolo)\s([^,]+?)(?: e Gestão)?,|\b(?!SEI\b)([A-Z\d]+-\s*\d+)|SEI-\s*([\d /]+)\b)"

My Current Output:

https://regex101.com/r/FCimoW/1

My current output is almost OK, but the pattern does not match all the required parts, e.g. the CAPITALIZED name.
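A minimal sketch, assuming the paragraphs in the file are separated by blank lines, of how the pattern above can be applied in Python so that each paragraph becomes one row. TEXT, PATTERN and extract_row are illustrative names, and only two of the sample paragraphs are included for brevity.

```python
import re

# The pattern from the question, split over several lines for readability.
# Adjacent raw strings are concatenated, so the regex itself is unchanged.
PATTERN = re.compile(
    r"\b(?:(?:EXONERAR|d[ae]|por|símbolo)\s([^,]+?)(?: e Gestão)?,"
    r"|\b(?!SEI\b)([A-Z\d]+-\s*\d+)"
    r"|SEI-\s*([\d /]+)\b)"
)

# Two of the sample paragraphs, separated by a blank line as in the file.
TEXT = (
    "EXONERAR, a pedido, NADIA NAKAMURA VIEIRA, ID FUNCIONAL Nº 5099589-8, "
    "do cargo em comissão de Assessor Especial, símbolo DG, da Secretaria de "
    "Estado de Planejamento e Gestão. Processo nº SEI-150001/004627/2020"
    "\n\n"
    "EXONERAR, com validade a contar de 16 de novembro de 2020, LUIS HENRIQUE "
    "FERREIRA DE AQUINO, ID FUNCIONAL Nº 1914315-0, do cargo em comissão de "
    "Assistente II, símbolo DAI-6, da Secretaria de Estado de Planejamento e "
    "Gestão. Processo nº SEI120001/014825/2020:"
)


def extract_row(paragraph):
    """Return the values captured in one paragraph, in order of appearance."""
    row = []
    for match in PATTERN.finditer(paragraph):
        # Only one alternative matches at a time, so exactly one group is set.
        row.append(next(g for g in match.groups() if g is not None))
    return row


# One row per blank-line-separated paragraph.
rows = [extract_row(p) for p in re.split(r"\n\s*\n", TEXT)]
for row in rows:
    print(row)
```

Neither row contains the uppercase name, which is the gap described in the question. Matching each paragraph on its own also matters here: [^,] matches newlines too, so on the whole file a lazy [^,]+? branch starting at "da Secretaria ..." could run past the end of its paragraph to the next comma.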


1 Answer


For the bold uppercase parts, you can add an alternation that matches two or more uppercase words, separated by whitespace characters and/or hyphens, that are followed by a comma.

\b([A-Z]+(?:[\s-]+[A-Z]+)+)(?=,)
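A quick check of the name pattern with Python's built-in re module, run over the four sample paragraphs. This sketch closes the capture group right before the (?=,) lookahead; NAME, PARAGRAPHS and names are illustrative names.

```python
import re

# Uppercase-name pattern: two or more runs of A-Z joined by whitespace
# and/or hyphens, immediately followed by a comma (checked by lookahead).
NAME = re.compile(r"\b([A-Z]+(?:[\s-]+[A-Z]+)+)(?=,)")

# The four sample paragraphs from the question.
PARAGRAPHS = [
    "EXONERAR, com validade a contar de 19 de agosto de 2020, DE- NILSON DE "
    "BRITO LIMA, ID FUNCIONAL Nº 2100423-4, do cargo em comissão de "
    "Coordenador, símbolo DAS-8, da Coordenadoria de Gestão Centralizada de "
    "Serviços, da Superintendência de Gestão Centralizada, da Subsecretaria de "
    "Logística, da Secretaria de Estado de Planejamento e Gestão. Processo nº "
    "SEI-120001/010643/2020",
    "EXONERAR, a pedido, NADIA NAKAMURA VIEIRA, ID FUNCIONAL Nº 5099589-8, do "
    "cargo em comissão de Assessor Especial, símbolo DG, da Secretaria de "
    "Estado de Planejamento e Gestão. Processo nº SEI-150001/004627/2020",
    "EXONERAR, com validade a contar de 26 de novembro de 2020, BRUNO RAFAEL "
    "ROCHA COSTA, ID FUNCIONAL Nº 5108093-1, do cargo em comissão de Assessor, "
    "símbolo DAS-7, da Assessoria de Planejamento e Gestão, da Presidência, da "
    "Superintendência de Des- portos do Estado do Rio de Janeiro - SUDERJ, da "
    "Secretaria de Es- tado de Esporte, Lazer e Juventude. Processo nº SEI- "
    "3 0 0 0 0 2 / 0 0 0 4 11 / 2 0 2 0 .",
    "EXONERAR, com validade a contar de 16 de novembro de 2020, LUIS HENRIQUE "
    "FERREIRA DE AQUINO, ID FUNCIONAL Nº 1914315-0, do cargo em comissão de "
    "Assistente II, símbolo DAI-6, da Secretaria de Estado de Planejamento e "
    "Gestão. Processo nº SEI120001/014825/2020:",
]

# One list of matches per paragraph.
names = [NAME.findall(p) for p in PARAGRAPHS]
print(names)
```

Requiring at least two words plus a trailing comma is what keeps single uppercase tokens such as EXONERAR, DG, II and SUDERJ out of the results, while [\s-]+ lets the hyphenated "DE- NILSON" through.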

Regex demo for the full pattern
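The linked demo for the full pattern is not reproduced in this thread, so this sketch assumes the name alternative is added as a fourth branch inside the question's outer (?:...) group. It applies the combined regex one paragraph at a time so that each paragraph yields one row. FULL, PARAGRAPHS and extract_row are hypothetical names, and only the first two sample paragraphs are used.

```python
import re

# Assumed combined pattern: the question's three branches plus the
# uppercase-name branch (group 4), all inside one (?:...) group.
FULL = re.compile(
    r"\b(?:(?:EXONERAR|d[ae]|por|símbolo)\s([^,]+?)(?: e Gestão)?,"
    r"|\b(?!SEI\b)([A-Z\d]+-\s*\d+)"
    r"|SEI-\s*([\d /]+)\b"
    r"|([A-Z]+(?:[\s-]+[A-Z]+)+)(?=,))"
)

PARAGRAPHS = [
    "EXONERAR, com validade a contar de 19 de agosto de 2020, DE- NILSON DE "
    "BRITO LIMA, ID FUNCIONAL Nº 2100423-4, do cargo em comissão de "
    "Coordenador, símbolo DAS-8, da Coordenadoria de Gestão Centralizada de "
    "Serviços, da Superintendência de Gestão Centralizada, da Subsecretaria de "
    "Logística, da Secretaria de Estado de Planejamento e Gestão. Processo nº "
    "SEI-120001/010643/2020",
    "EXONERAR, a pedido, NADIA NAKAMURA VIEIRA, ID FUNCIONAL Nº 5099589-8, do "
    "cargo em comissão de Assessor Especial, símbolo DG, da Secretaria de "
    "Estado de Planejamento e Gestão. Processo nº SEI-150001/004627/2020",
]


def extract_row(paragraph):
    """Collect the value captured by each match, in order of appearance."""
    # Exactly one branch matches each time, so exactly one group is not None.
    return [next(g for g in m.groups() if g is not None)
            for m in FULL.finditer(paragraph)]


rows = [extract_row(p) for p in PARAGRAPHS]
for row in rows:
    print(row)
```

Placing the name branch last means the earlier branches still win wherever they match first, e.g. "símbolo DG," is still captured as DG by the first branch.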


Comments

[A-Z]+ captures the CAPITALIZED name, but not names with international (accented) characters. See: regex101.com/r/wqAaSg/1
@AlwaysSunny Try it like this, using \p{Lu}: regex101.com/r/7iNy7o/1
Maybe it is not valid in Python; I'm getting the error sre_constants.error: bad escape \p at position 113
I added a \ before it, like \\p, but now it causes a capturing issue in Python.
Please don't be. I've installed that regex package and it is working now. Thanks for the link. I had seen that link earlier but wasn't sure whether it would work for me. When you suggested it, I used it and it works as per my requirements. Thanks a million :)
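Python's built-in re module does not support \p{...} property escapes, which is why the pattern failed with "bad escape \p"; in a raw string, doubling the backslash only makes the pattern look for a literal backslash followed by "p{Lu}". Below is a sketch of the approach the thread settled on, using the third-party regex package (pip install regex). The stdlib fallback is an assumption of this sketch, not something from the thread: an explicit class of A-Z plus the Latin-1 uppercase ranges À-Ö and Ø-Þ. The accented input name is hypothetical.

```python
import re

try:
    import regex  # third-party package from PyPI: pip install regex
except ImportError:
    regex = None

if regex is not None:
    # \p{Lu} matches any Unicode uppercase letter; `regex` supports it, `re` does not.
    NAME = regex.compile(r"\b(\p{Lu}+(?:[\s-]+\p{Lu}+)+)(?=,)")
else:
    # Assumed stdlib fallback: ASCII A-Z plus Latin-1 uppercase letters
    # (À-Ö, Ø-Þ), which include the accented capitals used in Portuguese.
    NAME = re.compile(r"\b([A-ZÀ-ÖØ-Þ]+(?:[\s-]+[A-ZÀ-ÖØ-Þ]+)+)(?=,)")

# Hypothetical input with accented uppercase letters (not from the question).
SAMPLE = "EXONERAR, a pedido, ANTÔNIO JOSÉ CONCEIÇÃO, do cargo em comissão de Assessor."

names = NAME.findall(SAMPLE)
print(names)
```

The fallback handles accented Latin capitals, but unlike \p{Lu} it does not cover uppercase letters from other scripts, so the regex package remains the more general option.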
