
I have quite a big data file that is not in a good state for further processing. So I want to extract the relevant parts with a regex and then load the data into pandas for further analysis.

The Data-Information segment repeats itself within the file and contains the necessary information.

My regex so far extracts some of the header information. What I'm still missing are all three sections of data points; from each of them I only need the part from the Points header line down to the last data point. How can I capture these sections, either in multiple groups or in one group?

^(?:Data-Information.*)
(?:\nName:\t+)(?P<Name>.+)
(?:\nSample:\t+)(?P<Sample>.+)
((?:\r?\n.+)+)
(?:\nSystem:\t+)(?P<System>.+)
(?:\r?\n(?!Data-Information).*)*
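As a point of reference, the header pattern above can be run from Python. This is a sketch under two assumptions: the pattern is compiled with re.MULTILINE | re.VERBOSE (so the line breaks used to lay it out are ignored), and the shortened, hypothetical sample text with tab separators stands in for the real file. It returns Name, Sample and System per Data-Information part, but none of the data points.

```python
import re

# The header pattern from the question; re.VERBOSE lets it span several lines.
HEADER = re.compile(
    r"""
    ^(?:Data-Information.*)
    (?:\nName:\t+)(?P<Name>.+)
    (?:\nSample:\t+)(?P<Sample>.+)
    ((?:\r?\n.+)+)
    (?:\nSystem:\t+)(?P<System>.+)
    (?:\r?\n(?!Data-Information).*)*
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical, shortened sample; the real file is assumed to use tabs.
text = (
    "Data-Information\nName:\tPolymer A\nSample:\tSunday till Monday\n"
    "User:\tSUD\nSystem:\tCP25\n\nSection:\t1\n\n"
    "Data-Information\nName:\tPolymer B\nSample:\tMonday till Tuesday\n"
    "User:\tSUD\nSystem:\tCP25\n\nSection:\t1\n"
)

for m in HEADER.finditer(text):
    # One match per Data-Information part; only header fields are captured.
    print(m.group("Name"), "|", m.group("Sample"), "|", m.group("System"))
```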

Sample file

Data-Information
Name:           Polymer A
Sample:     Sunday till Monday
User:           SUD
Count Segments:         5
Application:            RHEOSTAR
Tool:           CP
Date/Time:          24.10.2021; 13:37
System:         CP25

Constants:
- Csr [min/s]:          2,5421
- Css [Pa/mNm]:         2,54679

Section:            1
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 30 s
Measurement profile:
  Temperature           T[-1] = 25 °C

Section:            2
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   62  10,93   100 1.090   4,45    TGC,Dy_
2   64  11,05   100 1.100   4,5 TGC,Dy_
3   66  11,07   100 1.110   4,51    TGC,Dy_
4   68  11,05   100 1.100   4,5 TGC,Dy_
5   70  10,99   100 1.100   4,47    TGC,Dy_
6   72  10,92   100 1.090   4,44    TGC,Dy_


Section:            3
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 60 s

Section:            4
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
*** 1 ***   242 -6,334E+6   -0,0000115  72,7    0,296   TGC,Dy_
2   244 63,94   10,3    661 2,69    TGC,Dy_
3   246 35,56   20,7    736 2,99    TGC,Dy_
4   248 25,25   31  784 3,19    TGC,Dy_
5   250 19,82   41,4    820 3,34    TGC,Dy_


Section:            5
Number measuring points:            300

Time limit:         300 measuring points
            Duration 1 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   301 4,142   300 1.240   5,06    TGC,Dy_
2   302 4,139   300 1.240   5,05    TGC,Dy_
3   303 4,138   300 1.240   5,05    TGC,Dy_
4   304 4,141   300 1.240   5,06    TGC,Dy_
5   305 4,156   300 1.250   5,07    TGC,Dy_
6   306 4,153   300 1.250   5,07    TGC,Dy_


Data-Information
Name:           Polymer B
Sample:     Monday till Tuesday
User:           SUD
Count Segments:         5
Application:            RHEOSTAR
Tool:           CP
Date/Time:          24.10.2021; 13:37
System:         CP25

Constants:
- Csr [min/s]:          2,5421
- Css [Pa/mNm]:         2,54679

Section:            1
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 30 s
Measurement profile:
  Temperature           T[-1] = 25 °C

Section:            2
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   62  10,93   100 1.090   4,45    TGC,Dy_
2   64  11,05   100 1.100   4,5 TGC,Dy_
3   66  11,07   100 1.110   4,51    TGC,Dy_
4   68  11,05   100 1.100   4,5 TGC,Dy_
5   70  10,99   100 1.100   4,47    TGC,Dy_
6   72  10,92   100 1.090   4,44    TGC,Dy_


Section:            3
Number measuring points:            0

Time limit:         2 measuring points, drop
            Duration 60 s

Section:            4
Number measuring points:            30

Time limit:         30 measuring points
            Duration 2 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
*** 1 ***   242 -6,334E+6   -0,0000115  72,7    0,296   TGC,Dy_
2   244 63,94   10,3    661 2,69    TGC,Dy_
3   246 35,56   20,7    736 2,99    TGC,Dy_
4   248 25,25   31  784 3,19    TGC,Dy_
5   250 19,82   41,4    820 3,34    TGC,Dy_


Section:            5
Number measuring points:            300

Time limit:         300 measuring points
            Duration 1 s

Points  Time    Viscosity   Shear rate  Shear stress    Momentum    Status
    [s] [Pa·s]  [1/s]   [Pa]    [mNm]   []
1   301 4,142   300 1.240   5,06    TGC,Dy_
2   302 4,139   300 1.240   5,05    TGC,Dy_
3   303 4,138   300 1.240   5,05    TGC,Dy_
4   304 4,141   300 1.240   5,06    TGC,Dy_
5   305 4,156   300 1.250   5,07    TGC,Dy_
6   306 4,153   300 1.250   5,07    TGC,Dy_
  • So you're only looking to capture all the Points...Status... blocks (including the data rows before an empty line)? Commented Oct 27, 2021 at 13:03
  • If you want the 3 items per Data-Information part, you can first match ^Data-Information(?:\n(?!Data-Information$).*)* regex101.com/r/v9a1dq/1 and then per match get the Points part using ^Points\b.*(?:\n(?!\n).*)* regex101.com/r/CtIMRA/1 Commented Oct 27, 2021 at 13:08

1 Answer


One option is to do it in 2 steps.

First, get all the Data-Information parts using a pattern that starts with Data-Information and then matches all following lines up to the next line that consists only of Data-Information.

^Data-Information(?:\n(?!Data-Information$).*)*

Regex demo for Data-Information
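A minimal Python sketch of this first step, assuming re.MULTILINE (so that ^ and $ apply per line) and a shortened, hypothetical sample with tab-separated columns:

```python
import re

# Step 1: one match per Data-Information part. Each match runs until the next
# line that consists of exactly "Data-Information" (or the end of the text).
PART = re.compile(r"^Data-Information(?:\n(?!Data-Information$).*)*", re.MULTILINE)

# Hypothetical, shortened sample; columns are assumed to be tab-separated.
text = """Data-Information
Name:\tPolymer A
System:\tCP25

Points\tTime\tViscosity
\t[s]\t[Pa·s]
1\t62\t10,93
2\t64\t11,05

Data-Information
Name:\tPolymer B
System:\tCP25
"""

parts = PART.findall(text)
print(len(parts))  # 2
```

Because the lookahead is (?!Data-Information$), a line only starts a new part if it consists of exactly Data-Information.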

Then, for every part, match the line that starts with Points, followed by all following lines that contain at least one character (i.e. up to the first empty line).

^Points\b.*(?:\n.+)+

Regex demo for Points
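Putting both steps together in Python could look like the sketch below. The sample text is again shortened and hypothetical, and the extra ^Name: lookup used to label each part is an addition for illustration, not part of the answer.

```python
import re

# Step 1: split the file into Data-Information parts.
PART = re.compile(r"^Data-Information(?:\n(?!Data-Information$).*)*", re.MULTILINE)
# Step 2: the "Points" header line plus every following non-empty line.
POINTS = re.compile(r"^Points\b.*(?:\n.+)+", re.MULTILINE)

# Hypothetical, shortened sample with tab-separated columns.
text = """Data-Information
Name:\tPolymer A

Section:\t2
Points\tTime\tViscosity
\t[s]\t[Pa·s]
1\t62\t10,93
2\t64\t11,05


Section:\t4
Points\tTime\tViscosity
\t[s]\t[Pa·s]
*** 1 ***\t242\t-6,334E+6
2\t244\t63,94

Data-Information
Name:\tPolymer B

Points\tTime\tViscosity
\t[s]\t[Pa·s]
1\t301\t4,142
"""

blocks_per_part = {}
for part in PART.findall(text):
    # Illustrative addition: label each part by its Name: line.
    name = re.search(r"^Name:\t+(.+)", part, re.MULTILINE).group(1)
    blocks_per_part[name] = POINTS.findall(part)

for name, blocks in blocks_per_part.items():
    print(name, len(blocks))  # prints "Polymer A 2", then "Polymer B 1"
```

Each Points block stops at the first empty line, so the Section lines and the blank lines between sections are not included.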


1 Comment

I was pretty much focused on a 'one step solution'. Anyway, that should work out I guess. Thanks :)
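For the pandas step mentioned in the question, each Points block still has to be turned into numbers. The sketch below is stdlib-only and rests on assumptions worth checking against the real file: columns are separated by single tabs, ',' is the decimal mark and '.' is a thousands separator. The sample supports the latter (10,93 Pa·s × 100 1/s ≈ 1.090 Pa, i.e. about 1090 Pa of shear stress). to_number and parse_points_block are hypothetical helpers.

```python
def to_number(value):
    """Parse a German-formatted number such as '1.090', '10,93' or '-6,334E+6'."""
    # Assumption: '.' is a thousands separator and ',' is the decimal mark.
    return float(value.replace(".", "").replace(",", "."))


def parse_points_block(block):
    """Turn one 'Points ... Status' block into a list of row dicts.

    Assumes the columns in the raw file are separated by single tab characters.
    """
    lines = block.splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[2:]:  # lines[1] is the units row ([s], [Pa·s], ...)
        row = dict(zip(header, line.split("\t")))
        # The first point of a section may be flagged as '*** 1 ***'.
        row["Points"] = int(row["Points"].strip("* "))
        # Every column between Points and Status is numeric.
        for column in header[1:-1]:
            row[column] = to_number(row[column])
        rows.append(row)
    return rows


block = (
    "Points\tTime\tViscosity\tShear rate\tShear stress\tMomentum\tStatus\n"
    "\t[s]\t[Pa·s]\t[1/s]\t[Pa]\t[mNm]\t[]\n"
    "*** 1 ***\t242\t-6,334E+6\t-0,0000115\t72,7\t0,296\tTGC,Dy_\n"
    "2\t244\t63,94\t10,3\t661\t2,69\tTGC,Dy_"
)
rows = parse_points_block(block)
print(rows[0]["Points"], rows[0]["Viscosity"], rows[1]["Shear rate"])  # 1 -6334000.0 10.3
```

The resulting list of dicts can be passed directly to pandas.DataFrame. Alternatively, pandas.read_csv with sep, decimal and thousands set accordingly is worth trying on each block, although the '*** 1 ***' marker and the units row would then need separate handling.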
