How to extract text part from file using Python & Regular Expressions

Question

Using Python I want to read a text file, search for a string and print all lines between this matching string and another one.

The text file looks like this:

Text=variables.Job_SalesDispatch.CaptionNew
    Tab=0
    TabAlign=0
    }
   }
  }
[UserVariables]
 User1=@StJid;IF(fields.Fieldtype="Artikel.Gerät"  , STR$(fields.id,0,0)  , @StJid)
[Parameters]
 [@Parameters]
  {
  [Parameters]
   {
   LL.ProjectDescription=? (default)
   LL.SortOrderID=
   }
  }
[PageLayouts]
 [@PageLayouts]
  {
  [PageLayouts]
   {
   [PageLayout]
    {
    DisplayName=
    Condition=Page() = 1
    SourceTray=0

Now I want to print all "UserVariables", so only the lines between [UserVariables] and the next line starting with a square bracket. In this example this would be [Parameters].

What I have done so far is:

with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:

  for line in file:
    uservars = re.findall('\b(\w*UserVariables\w*)\b', line)
    print (uservars)

which gives me only [].

My desired output in this example is User1=@StJid;IF(fields.Fieldtype="Artikel.Gerät" , STR$(fields.id,0,0) , @StJid). But there may also be more UserVariables, such as User2=@StJid;IF(fields.Fieldtype="Artikel.Referenzgerät" , STR$(fields.id,0,0) , @StJid).
– Gardinero, Commented Apr 24, 2019 at 13:39
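
One likely reason the attempt above prints only []: in a normal (non-raw) Python string literal, '\b' is the backspace character rather than the regex word-boundary token, so the pattern cannot match text that contains no backspaces. The loop also tests each line in isolation, so even a working pattern would find only the header line, not the lines that follow it. A minimal sketch of the raw-string difference, using the header line from the sample file (the non-raw pattern is shortened here for illustration):

```python
import re

line = "[UserVariables]\n"

# Non-raw string: '\b' is the backspace character (\x08), not a word boundary.
broken = re.findall('\b(UserVariables)\b', line)

# Raw string: r'\b' reaches the regex engine as a word-boundary token.
fixed = re.findall(r'\b(\w*UserVariables\w*)\b', line)

print(broken)  # []
print(fixed)   # ['UserVariables']
```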

Budagov Blues · Accepted Answer · 2019-04-24 10:58:19Z

2

If using regular expressions is not a mandatory requirement for you, you can go with something like this:

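with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:
  inside_uservars = False
  for line in file:
    if inside_uservars:
      # Any line starting with '[' begins the next section.
      if line.strip().startswith('['):
        inside_uservars = False
      else:
        # Drop the trailing newline so print() does not add a blank line.
        print(line.rstrip())
    # Checked after the block above so the header line itself is not printed.
    if line.strip() == '[UserVariables]':
      inside_uservars = True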
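
The same line-by-line scan can also be wrapped in a function that returns the lines instead of printing them. This is only a sketch: the names extract_section and header are hypothetical, the sample values are shortened, and unlike the loop above it stops after the first matching section. It accepts any iterable of lines, so an open file object works as well as the in-memory sample used here.

```python
import io

def extract_section(lines, header="[UserVariables]"):
    """Return the lines between `header` and the next line starting with '['."""
    collected = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if inside:
            if stripped.startswith("["):
                break  # next section reached; stop reading
            collected.append(stripped)
        elif stripped == header:
            inside = True
    return collected

# In-memory stand-in for the file; values shortened for illustration.
sample = io.StringIO(
    "   }\n"
    "[UserVariables]\n"
    " User1=@StJid\n"
    " User2=@StJid\n"
    "[Parameters]\n"
    " [@Parameters]\n"
)
print(extract_section(sample))  # ['User1=@StJid', 'User2=@StJid']
```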

answered Apr 24, 2019 at 10:58

1 Comment

Gardinero Over a year ago

Thank you. That one is working for me. I will go through the code to learn something.

Tim Biegeleisen · Accepted Answer · 2019-04-24 13:44:24Z

0

We can try using re.findall with the following regex pattern:

\[UserVariables\]\n((?:(?!\[.*?\]).)*)

This says to match a [UserVariables] tag, followed by a slightly complicated looking expression:

((?:(?!\[.*?\]).)*)

This expression is a tempered dot trick which matches any character, one at a time, so long as what lies immediately ahead is not another tag contained in square brackets.

matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', input, re.DOTALL)
print(matches)

[' User1=@StJid;IF(fields.Fieldtype="Artikel.Ger\xc3\xa4t"  , STR$(fields.id,0,0)  , @StJid)\n']
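
The escaped bytes \xc3\xa4 in the output above suggest it came from a Python 2 byte string; with a Python 3 str the ä should be shown as-is. A self-contained sketch that runs the same pattern on a shortened copy of the sample and splits the captured block into individual entries (the variable names text and user_vars and the shortened values are assumptions made for illustration):

```python
import re

text = (
    "   }\n"
    "[UserVariables]\n"
    ' User1=@StJid;IF(fields.Fieldtype="Artikel.Gerät", @StJid)\n'
    ' User2=@StJid;IF(fields.Fieldtype="Artikel.Referenzgerät", @StJid)\n'
    "[Parameters]\n"
    " [@Parameters]\n"
)

pattern = r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)'
matches = re.findall(pattern, text, re.DOTALL)

# Each match is one multi-line block; split it into individual entries.
user_vars = [ln.strip() for block in matches for ln in block.splitlines() if ln.strip()]
print(user_vars)
```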

Edit:

My answer assumes that the entire file content sits in memory, in a single Python string. You may read the entire file using:

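import re

# Read the whole file into one string, using the same encoding as in the question.
with open('Path/to/your/file.txt', 'r', encoding='utf8') as content_file:
    input = content_file.read()
matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', input, re.DOTALL)
print(matches)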
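
One limitation worth knowing: the lookahead stops at any [ that is later followed by a ], even in the middle of a value, so an entry such as User2=items[0] would be cut short. Since the question asks for everything up to the next line starting with a square bracket, a line-anchored variant may be closer to that requirement. The following is a sketch of an alternative, not the pattern from this answer; the sample values are made up for illustration.

```python
import re

text = (
    "[UserVariables]\n"
    " User1=@StJid\n"
    " User2=items[0]\n"
    "[Parameters]\n"
    " [@Parameters]\n"
)

# (?m) makes ^ match at every line start; each repetition consumes one whole
# line, provided that line does not begin (after optional spaces/tabs) with '['.
pattern = r'(?m)^\[UserVariables\][ \t]*\n((?:^(?![ \t]*\[).*\n?)*)'
match = re.search(pattern, text)
user_vars = [ln.strip() for ln in match.group(1).splitlines()] if match else []
print(user_vars)  # ['User1=@StJid', 'User2=items[0]']
```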

edited Apr 24, 2019 at 13:44

answered Apr 24, 2019 at 10:46

2 Comments

Gardinero Over a year ago

The regex part is very cool and exactly what I was looking for. Unfortunately I am too dumb to get it working in my code.

Tim Biegeleisen Over a year ago

@Gardinero See my update. My answer will only work if you read the entire file content into a single Python string. Assuming your memory requirements/limitations would allow this, my answer should work, and is basically a one-liner.
