I have a huge text file that contains several house names and, for each house, a set of values specific to that house. Here is a representative part of the file:

getHouseName: house1
random useless text
price: 1000
squaremtr: 75
sellVal: 1000
random useless text
random useless text
random useless text
rentPrice: 150
getHouseName: house2
price: 1004
squaremtr: 85
sellVal: 950
random useless text
rentPrice: 150
getHouseName: house3
price: 1099
squaremtr: 90
random useless text
random useless text
sellVal: 1100
random useless text
rentPrice: 199

For every house, I would like to retrieve its values and store them in variables using regexes. Right now this is my code:

public void testHouse() {
    Scanner txt = new Scanner(new File("path//to//file"));

    String houseName ="";
    String price = "";
    String squaremtr = "";
    String sellVal = "";
    String rentPrice = "";
    
    Pattern houseNamePatt = Pattern.compile("getHouseName: ((_!getHouseName: \\s).)*", Pattern.DOTALL);

    while(txt.hasNextLine()) {
        String str = txt.nextLine();
        Matcher m = houseNamePatt.matcher(str);
        if(m.find()) {
            houseName=str.substring(m.end());
            System.out.println("houses: " + m.group());
        }
    }
}

But this only gives me a list of all the house names, not the lines between each name, so I can't assign the values of a specific house to my variables. Where am I going wrong? Thank you.

  • If you want the values in variables, you can use capturing groups and a pattern that matches the values line by line, e.g. getHouseName:\h+(.*)\Rprice:\h+(\d+) and so on. Commented Nov 20, 2020 at 12:52

2 Answers

You can get all values by matching each label followed by a capturing group. If there are lines with random text in between, you can skip every line that does not start with the next expected label by using a negative lookahead (?!...).

Then set each variable to the value of the corresponding capturing group.

^getHouseName:\h+(.+)(?:\R(?!price:).*)*\Rprice: (\d+)(?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+)(?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+)(?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+)

In parts:

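  • ^ Start of string (in Java, compile with Pattern.MULTILINE or prefix the pattern with (?m) so that ^ also matches at the start of every line; otherwise only a house at the very start of the input can match)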
  • getHouseName:\h+(.+) Match the value for getHouseName in group 1
  • (?:\R(?!price:).*)*\Rprice: (\d+) Match until the next line with price, capture 1+ digits in group 2
  • (?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+) Match until the next line with squaremtr, capture 1+ digits in group 3
  • (?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+) Match until the next line with sellVal, capture 1+ digits in group 4
  • (?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+) Match until the next line with rentPrice, capture 1+ digits in group 5
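
As a minimal sketch of how the pattern above could be used from Java: the class name HouseRegexSketch and the helper parse are made up for illustration, and the sketch assumes that all five labels are present for every house and appear in this order. Pattern.MULTILINE is added so that ^ matches at each getHouseName line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HouseRegexSketch {
    // The pattern from the answer, split by label for readability.
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    static final Pattern HOUSE = Pattern.compile(
        "^getHouseName:\\h+(.+)"
        + "(?:\\R(?!price:).*)*\\Rprice: (\\d+)"
        + "(?:\\R(?!squaremtr:).*)*\\Rsquaremtr:\\h+(\\d+)"
        + "(?:\\R(?!sellVal:).*)*\\RsellVal:\\h+(\\d+)"
        + "(?:\\R(?!rentPrice:).*)*\\RrentPrice:\\h+(\\d+)",
        Pattern.MULTILINE);

    // Hypothetical helper: returns one String[5] per house, in the order
    // name, price, squaremtr, sellVal, rentPrice.
    static List<String[]> parse(String text) {
        List<String[]> houses = new ArrayList<>();
        Matcher m = HOUSE.matcher(text);
        while (m.find()) {
            houses.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
            });
        }
        return houses;
    }

    public static void main(String[] args) {
        String text = "getHouseName: house1\n"
                    + "random useless text\n"
                    + "price: 1000\n"
                    + "squaremtr: 75\n"
                    + "sellVal: 1000\n"
                    + "random useless text\n"
                    + "rentPrice: 150\n"
                    + "getHouseName: house2\n"
                    + "price: 1004\n"
                    + "squaremtr: 85\n"
                    + "sellVal: 950\n"
                    + "random useless text\n"
                    + "rentPrice: 150";
        for (String[] h : parse(text)) {
            System.out.println(String.join(" ", h));
        }
    }
}
```

Note that if a label is missing for one house, the skipping groups can run on into the next house's lines, so this only fits input where every house has all five values.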

1 Comment

Thanks! I'll try

The following regex will do it:

(?m)^getHouseName: (.*)\Rprice: (.*)\Rsquaremtr: (.*)\RsellVal: (.*)\RrentPrice: (.*)

Test

String hugeText = "getHouseName: house1\n" + 
                  "price: 1000\n" + 
                  "squaremtr: 75\n" + 
                  "sellVal: 1000\n" + 
                  "rentPrice: 150\n" + 
                  "getHouseName: house2\n" + 
                  "price: 1004\n" + 
                  "squaremtr: 85\n" + 
                  "sellVal: 950\n" + 
                  "rentPrice: 150\n" + 
                  "getHouseName: house3\n" + 
                  "price: 1099\n" + 
                  "squaremtr: 90\n" + 
                  "sellVal: 1100\n" + 
                  "rentPrice: 199";

String regex = "(?m)^" +
               "getHouseName: (.*)\\R" + 
               "price: (.*)\\R" + 
               "squaremtr: (.*)\\R" + 
               "sellVal: (.*)\\R" + 
               "rentPrice: (.*)";
for (Matcher m = Pattern.compile(regex).matcher(hugeText); m.find(); ) {
    String houseName = m.group(1);
    String price     = m.group(2);
    String squaremtr = m.group(3);
    String sellVal   = m.group(4);
    String rentPrice = m.group(5);
    System.out.printf("%-8s %6s %4s %6s %5s%n",
                      houseName, price, squaremtr, sellVal, rentPrice);
}

Output

house1     1000   75   1000   150
house2     1004   85    950   150
house3     1099   90   1100   199

2 Comments

I modified my question, because I have some random text between the values. How can I modify that according to the complex text?
@thranduil90 Please be more accurate when asking. Are the 5 values always present, and are they always in the same order? Make sure the question has the full specification of what you're looking for.
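
If the five values are not guaranteed to be present, or may appear in a different order, a single fixed-order regex becomes fragile. One possible alternative, sketched here under those assumptions, is to read the text line by line (as the Scanner loop in the question does), match each line against a small "label: value" pattern, and start a new record whenever a getHouseName line is seen. The class name HouseLineParser, the method parse and the choice of a Map per house are illustrative only and do not come from either answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HouseLineParser {
    // One "label: value" line for any of the five known labels.
    // Lines that do not match (the random text) are ignored.
    static final Pattern FIELD = Pattern.compile(
        "(getHouseName|price|squaremtr|sellVal|rentPrice):\\h*(.*)");

    // Hypothetical helper: returns one map per house, keyed by label.
    static List<Map<String, String>> parse(Scanner in) {
        List<Map<String, String>> houses = new ArrayList<>();
        Map<String, String> current = null;
        while (in.hasNextLine()) {
            Matcher m = FIELD.matcher(in.nextLine());
            if (!m.matches()) {
                continue; // not one of the labelled lines
            }
            if (m.group(1).equals("getHouseName")) {
                // A new house starts here; later values belong to it.
                current = new LinkedHashMap<>();
                houses.add(current);
            }
            if (current != null) {
                current.put(m.group(1), m.group(2).trim());
            }
        }
        return houses;
    }

    public static void main(String[] args) {
        String sample = "getHouseName: house1\n"
                      + "random useless text\n"
                      + "price: 1000\n"
                      + "squaremtr: 75\n"
                      + "sellVal: 1000\n"
                      + "rentPrice: 150\n"
                      + "getHouseName: house2\n"
                      + "sellVal: 950\n"
                      + "random useless text\n"
                      + "price: 1004\n";
        for (Map<String, String> house : parse(new Scanner(sample))) {
            System.out.println(house);
        }
    }
}
```

A missing value simply shows up as an absent key in that house's map, so the caller can decide how to handle it. To read from a file instead, a Scanner built from a File can be passed in the same way.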
