I have a series of lines like the following (they can come in any order):

Distal latency   4.9 N/A N/A 4.0 N/A N/A N/A N/A 6.3 4.4 N/A

 % failed Chicago Classification  70 1 1 0 1 1 1 1 0 0 1

 % panesophageal pressurization  0 0 0 0 0 0 0 0 0 0 0

 % premature contraction  20 0 0 1 0 0 0 0 0 1 0

 % rapid contraction  10 0 0 1 0 0 0 0 0 0 0

 % large breaks  10 0 0 0 0 0 0 0 1 0 0

 % small breaks  10 0 0 1 0 0 0 0 0 0 0

Eventually I want to extract the line title and each value into a Hash as follows:

Distallatency=4.9,Distallatency=N/A etc.
failedChicagoClassification1=70,failedChicagoClassification1=1,failedChicagoClassification1=1,failedChicagoClassification1=0,failedChicagoClassification1=1 etc.

and so on

My strategy is:

1. Join the words of the title together by replacing the \s between them.
2. End the joined title with a character, e.g. :, so that I can then split each line into an array on \s.
3. Loop through the array, adding each value to a Hash under the line title.

Here is what I have done so far:

Pattern match_patternSwallow2 = Pattern.compile("(?:.*\\d+\\.\\d|N\\/A|\\d*){4,50}");
Matcher matchermatch_patternSwallow2 = match_patternSwallow2.matcher(s);

while (matchermatch_patternSwallow2.find()){
    String found = matchermatch_patternSwallow2.group(0).trim();
    System.out.println(found);

    //Join up the words so can then split by space
    found = found.replaceAll("([A-Za-z]+)\\s", "$1_").replaceAll("\\s", ":");
    List<String> myList = new ArrayList<String>(Arrays.asList(found.split(":")));

    for (int ff=1;ff<myList.size();ff++){
        mapSwallow.put(myList.get(0)+"MapSwallowsNum"+ff,myList.get(ff));
    }
}

The capture runs without errors, but the System.out.println line only prints an empty string.

What am I doing wrong?

  • Are you processing line by line? Commented Nov 6, 2016 at 21:39
  • I am taking the whole document as my string and then pattern matching on that. Doesn't 'while' just process all the matches therefore I shouldn't need to go line by line? Commented Nov 6, 2016 at 21:44
  • No idea, it is difficult to help without a reproducible example. Try "(?m)^\\W*([a-zA-Z].*?)\\s*((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*)$" regex. Then use the .group(1).replaceAll("\\s+","") as the key, and split .group(2) with .split("\\s+") to get the values. Commented Nov 6, 2016 at 22:39
  • Something like ideone.com/ZbOcLN Commented Nov 6, 2016 at 22:44
  • OK. Seems to work for most of the data. Please post as an answer and I will vote for it Commented Nov 6, 2016 at 23:12

1 Answer

I can suggest the following regex to get each line that meets your criteria:

"(?m)^\\W*([a-zA-Z].*?)\\s*((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*)$"

See the regex demo

Details:

  • (?m) - multiline mode on
  • ^ - start of a line
  • \\W* - 0+ non-word chars
  • ([a-zA-Z].*?) - (Group 1) a letter followed by any 0+ chars other than line break chars, as few as possible, up to the first position where the rest of the pattern matches
  • \\s* - zero or more whitespaces
  • ((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*) - (Group 2) 0+ sequences of digits (optionally followed by a dot and more digits) or N/A, each followed by 0+ whitespaces
  • $ - end of line.
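
To make the breakdown concrete, here is a minimal sketch that applies the same pattern to a single line from the question and returns what each group captures. The class name GroupDemo and the helper groups are made up for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as in the answer, written as a Java string literal
    static final Pattern LINE = Pattern.compile(
        "(?m)^\\W*([a-zA-Z].*?)\\s*((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*)$");

    // Hypothetical helper: returns {group 1, group 2} for the first match, or null if none
    static String[] groups(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups(" % failed Chicago Classification  70 1 1 0 1 1 1 1 0 0 1");
        // The leading " % " is consumed by \W* and is not part of either group
        System.out.println("Group 1: [" + g[0] + "]");
        System.out.println("Group 2: [" + g[1] + "]");
    }
}
```

Note that the leading % is skipped by \\W*, so it never ends up in the key.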

Once you find a match, use the .group(1).replaceAll("\\s+","") as the key, and split .group(2) with .split("\\s+") to get the values.

See the sample code:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "Distal latency   4.9 N/A N/A 4.0 N/A N/A N/A N/A 6.3 4.4 N/A\n\n % failed Chicago Classification  70 1 1 0 1 1 1 1 0 0 1\n\n % panesophageal pressurization  0 0 0 0 0 0 0 0 0 0 0\n\n % premature contraction  20 0 0 1 0 0 0 0 0 1 0\n\n % rapid contraction  10 0 0 1 0 0 0 0 0 0 0\n\n % large breaks  10 0 0 0 0 0 0 0 1 0 0\n\n % small breaks  10 0 0 1 0 0 0 0 0 0 0";
        Pattern match_patternSwallow2 = Pattern.compile("(?m)^\\W*([a-zA-Z].*?)\\s*((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*)$");
        Matcher matchermatch_patternSwallow2 = match_patternSwallow2.matcher(s);
        Map<String, String> mapSwallow = new HashMap<>();
        while (matchermatch_patternSwallow2.find()) {
            // Group 2 holds the values, separated by whitespace
            String[] myList = matchermatch_patternSwallow2.group(2).split("\\s+");
            // Group 1 is the line title; remove its whitespace to build the key prefix
            String p1 = matchermatch_patternSwallow2.group(1).replaceAll("\\s+", "");
            int line = 1;
            for (String p2s : myList) {
                // Append the value's 1-based position so every key is unique
                mapSwallow.put(p1 + line, p2s);
                line++;
            }
        }
        System.out.println(mapSwallow);
    }
}
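
The sample stores every value under its own numbered key (Distallatency1, Distallatency2, and so on). If you would rather keep the values of one line together, a possible variation is to map each title to a list. This is only a sketch: the class name SwallowParser and the method parse are invented for the example, and skipping titles that have no values is an assumption about what you want.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwallowParser {
    // Same pattern as in the answer
    private static final Pattern LINE = Pattern.compile(
        "(?m)^\\W*([a-zA-Z].*?)\\s*((?:(?:\\d+(?:\\.\\d+)?|N/A)\\s*)*)$");

    // Hypothetical helper: maps each whitespace-stripped title to its list of values
    static Map<String, List<String>> parse(String text) {
        // LinkedHashMap keeps the titles in the order they appear in the input
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            // Group 2 may end with a line break, so trim before splitting
            String values = m.group(2).trim();
            if (values.isEmpty()) {
                continue; // assumption: a title without values is not wanted
            }
            String key = m.group(1).replaceAll("\\s+", "");
            result.put(key, Arrays.asList(values.split("\\s+")));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Distal latency   4.9 N/A 6.3\n\n % large breaks  10 0 1";
        System.out.println(parse(s));
    }
}
```

With this layout, parse(s).get("Distallatency").get(0) gives the first value of that line, and no index has to be encoded in the key.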