1

I am trying to extract data from this String:

Hello there. Blah blahblah blah Building 016814 - Door 01002 BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah

I want to loop through the String, pull out each instance of a Building and Door number, and print them to the console. However, since the two instances are formatted differently from one another, I know that I will need to use two different regex patterns.

Here is my code:

public static void main(String[] args) {
    String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"+
           " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";

    Pattern p = Pattern.compile("Building.+?(?:[Dd]).+?(\\d+).+?(\\d+)");
    Pattern p1 = Pattern.compile("Building.+?(\\d+).+?(?:[Dd]).+?(\\d+)");

    Matcher m = p.matcher(myStr);
    Matcher m1 = p1.matcher(myStr);

    while(m1.find() && m.find()) {
         System.out.print(" Building " + m1.group(1) + " " + "Door ");
         System.out.print(m1.group(2));
         System.out.print(" Building " + m.group(1)+" "+ "Door "+m.group(2));
    }
}

And here is my output:

Building 016814 Door 01002 Building 01002 Door 78787

I know it has something to do with my p regex pattern. It seems to be pulling any numbers in between. I am a newbie to regex so let me know if you need more info about this. Any help will be much appreciated.

2
  • In that last line of Building Dr 4647 8989, should it match anything? I took that to mean it should match Dr 4647. Commented Mar 15, 2011 at 17:03
  • If it is known that there are spaces between Building and its number, you can use Building[ ]+? instead of Building.+? to make sure you catch the correct building number. The same goes for doors. Run a separate regex for buildings and for doors. Commented Mar 15, 2011 at 17:04

2 Answers

1

I believe I've figured out the answer to my own question. Thank you all so much for your input; much appreciated.

I used:

Building[ ][Dd].+?(\\d+).+?(\\d+)

and my output was:

Building 016814 Door 01002 Building 4647 Door 8989
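
For anyone reading along, here is a minimal sketch of how this seems to fit together. It assumes the new pattern replaces p from the question while p1 stays unchanged; the answer doesn't spell that out, but that combination reproduces the output above. The class name SelfAnswerDemo and the helper extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SelfAnswerDemo {
    // Hypothetical helper: runs both patterns over the input and formats the result.
    static String extract(String input) {
        // p1 is unchanged from the question: "Building <number> ... D... <number>".
        Pattern p1 = Pattern.compile("Building.+?(\\d+).+?(?:[Dd]).+?(\\d+)");
        // Assumed replacement for p: "Building D..." followed by two numbers.
        Pattern p = Pattern.compile("Building[ ][Dd].+?(\\d+).+?(\\d+)");

        Matcher m1 = p1.matcher(input);
        Matcher m = p.matcher(input);

        StringBuilder out = new StringBuilder();
        // Same loop shape as the question: one match from each pattern per pass.
        while (m1.find() && m.find()) {
            out.append("Building ").append(m1.group(1))
               .append(" Door ").append(m1.group(2));
            out.append(" Building ").append(m.group(1))
               .append(" Door ").append(m.group(2));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        System.out.println(extract(myStr));
    }
}
```

Note that the loop only prints while both matchers keep finding something, so this works for the sample string but would silently drop results if one format appeared more often than the other.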


1 Comment

it's fine to answer your own question, but go ahead and mark your own answer as the "correct" answer by clicking on the checkmark to the left.
0

Your (.+?) parts are too broad. Try this:

"\\b((?:Building|Door|Dr)\\s\\d+)\\b"

Then just grab what's in the captures from group 1. Make sure you turn off case-sensitive matching if you don't want that.
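
A rough sketch of what that looks like in Java, assuming the sample string from the question and using Pattern.CASE_INSENSITIVE for the case-insensitive matching mentioned above. The class name LabelNumberDemo and the helper findAll are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelNumberDemo {
    // Hypothetical helper: collects every "<label> <number>" hit from group 1.
    static List<String> findAll(String input) {
        Pattern p = Pattern.compile(
                "\\b((?:Building|Door|Dr)\\s\\d+)\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);

        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        // prints [Building 016814, Door 01002, Dr 4647]
        System.out.println(findAll(myStr));
    }
}
```

On this input the trailing 8989 is not picked up, because no label sits directly in front of it, and the second "Building" is skipped because it is followed by "Dr" rather than digits.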

I'm guessing at the results you want here. You may actually be looking for this instead:

"\\b(Building\\s\\d+)\\s(Door\\s\\d+)\\b"

Edit: Based on your comments, the simplest way I can think of is this:

"\\bBuilding\\s(?:(\\d+)\\sDoor\\s(\\d+)|Dr\\s(\\d+)\\s(\\d+))\\b"

Removing the doubled backslashes for clarity:

/\bBuilding\s(?:(\d+)\sDoor\s(\d+)|Dr\s(\d+)\s(\d+))\b/
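
With an alternation like this, only one branch takes part in any given match, so in Java the groups from the other branch come back as null and have to be checked. Below is a minimal sketch of that idea. One caveat: the sample string has " - " between the building number and "Door", which the pattern above does not allow for, so the sketch uses a slightly adjusted variant (\\s*-\\s* between the two, and \\s+ instead of \\s). That adjustment is my assumption about the intended input, not part of the pattern as posted, and the names AlternationDemo and extract are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Assumed variant of the pattern: tolerates " - " before "Door".
    private static final Pattern P = Pattern.compile(
            "\\bBuilding\\s+(?:(\\d+)\\s*-\\s*Door\\s+(\\d+)|Dr\\s+(\\d+)\\s+(\\d+))\\b");

    // Hypothetical helper: returns one "Building X Door Y" entry per match.
    static List<String> extract(String input) {
        Matcher m = P.matcher(input);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            // Groups 1/2 belong to the "Door" branch, groups 3/4 to the "Dr" branch.
            // Whichever branch did not match yields null for its groups.
            String building = m.group(1) != null ? m.group(1) : m.group(3);
            String door = m.group(2) != null ? m.group(2) : m.group(4);
            results.add("Building " + building + " Door " + door);
        }
        return results;
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        // prints [Building 016814 Door 01002, Building 4647 Door 8989]
        System.out.println(extract(myStr));
    }
}
```

A single pattern with one find() loop also avoids having to keep two matchers in step with each other.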

6 Comments

Hi Justin, thanks, I want to be able to pull these two numbers: 4647 8989 (which represent building and Door (Dr)) consecutively.
@John - Check my edited version and see if that works for you.
It doesn't seem to be working. When I use your pattern, my output is: Building 016814 Door 01002 Building null Door 01002
Just to clarify; I would like my output to be: Building 016814 Door 01002 Building 4647 Door 8989. thanks
@John - How many capture groups are coming back? I think there should be 4. The building numbers will be in 1 and 3, and the door numbers will be in 2 and 4. Hope that makes sense as I've written it.