
I'm new to the regex world, and I've browsed many sites without finding what I'm looking for. I have a file from which I need to fetch an address. The address is aligned to the left of the page (there's other text on the same line, to the right).

Some information on multiple lines (6)
that I don't need and can't paste because
it contains some personal information. 
So imagine a lot of text here...
So imagine a lot of text here...
So imagine a lot of text here...

Sold To                                              Bill To
Some Cie                                             Some Other Cie
1111 chemin some-road                                2222 chemin some-other-road
City-Here QC J0Q 1W0                                 Other City-Here QC J0Q 1W0 
Canada                                               Canada

I need to fetch the text on the 'Sold To' side. I tried using \r, but it returns nothing. I don't know how to fetch the text from the start of the line up to the point where a run of spaces begins. E.g. Some Cie (if there is more than one space, go to the next line).

Then I tried Sold\sTo(?=\s{2,100}), but it doesn't work, while (?=\s{2, 100}) returns everything!

I saw this: ^((?:\S+\s+){2}\S+).*, which is very close to what I want, but I don't understand the whole thing. I would like to match from 2 to 5 words.
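A small Python sketch of what that pattern does (the sample line is copied from the example above). (?:\S+\s+){2} means "a word followed by whitespace", twice, and the final \S+ adds a third word. Because \s+ also matches the wide gap between the columns, the three words can run into the right-hand column. The second pattern is a hypothetical variant for 2 to 5 words, assuming the words in the left column are separated by exactly one space.

```python
import re

line = 'Sold To                                              Bill To'

# Pattern from the question: two "word + whitespace" pairs, then one more word.
# \s+ happily swallows the wide gap between the columns.
three_words = re.match(r'^((?:\S+\s+){2}\S+).*', line)
print(repr(three_words.group(1)))  # 'Sold To', the whole gap, then 'Bill'

# Hypothetical variant: 2 to 5 words separated by exactly one space.
# A single literal space cannot cross the multi-space column gap.
two_to_five = re.match(r'^\S+(?: \S+){1,4}', line)
print(repr(two_to_five.group()))  # 'Sold To'
```

Note that with {1,4} a one-word line such as Canada would not match; {0,4} would allow it.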

Then I tried this: ^([A-Za-z0-9-]*)(?=\s{2,100}), which I thought would match from the beginning of the line until there are two or more spaces. What am I getting wrong?
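A quick Python check of what that last attempt actually matches (the sample lines are copied from the example above). The character class [A-Za-z0-9-] contains no space, so a field with an inner space such as 'Sold To' can never be followed directly by two or more spaces, and only one-word fields match.

```python
import re

# The attempt from the question: the character class has no space in it,
# so the captured group can only ever be a single "word".
attempt = re.compile(r'^([A-Za-z0-9-]*)(?=\s{2,100})')

lines = [
    'Sold To                                              Bill To',
    'Some Cie                                             Some Other Cie',
    'Canada                                               Canada',
]

results = []
for line in lines:
    m = attempt.match(line)
    results.append(m.group(1) if m else None)

print(results)  # [None, None, 'Canada']
```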

I need to do this in pure Regex (no flags allowed).

I'm completely lost. Some guidance would be much appreciated.

1 Answer

You're pretty close on your last attempt. Here's what I came up with:

^.+?(?=[^\S\n]{2,})

Explanation:

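  • ^ - Start of the line
  • .+ - One or more characters (any character except a newline)
    • ? - Non-greedy (lazy), so the match stops at the first position where the next part can match, i.e. before the first run of spaces instead of swallowing them
  • (?=...) - Lookahead: the spaces must follow, but they are not included in the match
  • [^\S\n] - Any whitespace character except newline (this is like \s minus \n)
    • {2,} - Two or more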
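To illustrate why the ? matters, here is a small Python comparison of the lazy pattern with a greedy variant (the greedy one is only for illustration; the sample line is copied from the question). The greedy .+ first grabs the whole line and then backs off only until the lookahead succeeds, which happens near the end of the gap rather than at its start.

```python
import re

line = 'Sold To                                              Bill To'

# Lazy: .+? stops at the first position followed by 2+ spaces/tabs.
lazy = re.match(r'^.+?(?=[^\S\n]{2,})', line).group()

# Greedy: .+ backtracks from the end of the line and stops at the last
# position that is still followed by 2+ spaces, i.e. inside the gap.
greedy = re.match(r'^.+(?=[^\S\n]{2,})', line).group()

print(repr(lazy))    # 'Sold To'
print(repr(greedy))  # 'Sold To' plus most of the gap's spaces
```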

Matches from the example:

Sold To
Some Cie
1111 chemin some-road
City-Here QC J0Q 1W0
Canada

Try it on Regex101

Simple example in Python:

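import re

# Lazily match from the start of the line up to (not including)
# the first run of two or more non-newline whitespace characters.
pattern = re.compile(r'^.+?(?=[^\S\n]{2,})')

filename = 'input.txt'  # placeholder: path to your text file

with open(filename) as f:
    for line in f:
        m = pattern.match(line)  # match() is anchored at the start of the line
        if m:
            print(m.group())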
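If the whole text has to be searched at once instead of line by line (for example, when a tool only accepts a pattern string and no separate flags), a possible approach is to embed the multiline flag inline with (?m), so that ^ matches at the start of every line. This is a sketch assuming the tool passes the pattern to Python's re unchanged; the sample text is copied from the question.

```python
import re

# Sample text copied from the question (two columns separated by runs of spaces).
text = """Sold To                                              Bill To
Some Cie                                             Some Other Cie
1111 chemin some-road                                2222 chemin some-other-road
City-Here QC J0Q 1W0                                 Other City-Here QC J0Q 1W0 
Canada                                               Canada
"""

# (?m) turns on multiline mode inside the pattern itself, so no flag argument
# is needed. [^\S\n] keeps the lookahead from crossing into the next line.
pattern = r'(?m)^.+?(?=[^\S\n]{2,})'

left_column = re.findall(pattern, text)
print(left_column)
```

Since the pattern has no capturing groups, re.findall returns the full matched text for each line: the five left-column entries listed in the answer above.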

4 Comments

It works, but only if the 'global' and 'multiline' flags are enabled. Is there a way to make it work without specifying them? I'm using a tool that doesn't allow me to specify any flags.
I found out that (?m) enables multiline mode. Now I'm pretty sure the issue is coming from my tool. Thanks!
@mrinfo Global is not a real flag in Python. Instead, you use a different function, like re.search vs. re.findall; Regex101 just exposes it for convenience. Also, you don't need multiline mode if you iterate over the lines. I posted a simple example.
In fact, it's because I'm using a tool that runs Python: I can't write my own Python method, only a regex in a YAML file. To be honest, it's kind of weird... When I use my real file, it fetches way too many lines, so I think I need to add some delimiters.
