
I am looking for a way to split text into a Map in Java. For example, I have the following text:

2.10 Add nodev Option to Removable Media Partitions (Scored) Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files.

So I wrote the following code using regular expressions:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SplitParagraphs {
        public static void main(String[] args) {
            String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)"
                        + "Profile  Description:Set nodev on removable media to prevent character and "
                        + "block special devices that are present"
                        + ", on the removable media from being treated as device files. ";
            Map<String, List<String>> maps = new HashMap<>();
            Pattern pattern = Pattern.compile("^((\\d+\\.)*?(\\d+)) .*$"); // To find out if there is, for example, 1.1.
            Pattern pattern2 = Pattern.compile("[0-9].*?.*[0-9].*$"); // To retrieve the title of the paragraph: 1.1. Add Nodev Option to Removable Media Scores
            List<String> paragraphe = new ArrayList<>();
            maps.put(null, paragraphe);

            for (String ligne : text.split("\n")) {
                Matcher matcher = pattern.matcher(ligne);
                Matcher matcher2 = pattern2.matcher(ligne);

                if (matcher.matches() && matcher2.matches()) {
                    // A line starting with a section number opens a new paragraph keyed by that line
                    // (the title line is also added to the value list)
                    paragraphe = new ArrayList<>();
                    maps.put(matcher2.group(0), paragraphe);
                    paragraphe.add(ligne);
                } else {
                    // Any other line is appended to the current paragraph
                    paragraphe.add(ligne);
                }
            }

            for (Entry<String, List<String>> key : maps.entrySet()) {
                for (String strings : key.getValue()) {
                    if (strings.contains("(Scored)")) {
                        System.out.println("Key : " + key.getKey() + " Value : " + key.getValue());
                    }
                }
            }
        }
    }

This code displays the following result:

    Key : 2.10 Add nodev Option to Removable Media Partitions (Scored)
    Value : [2.10 Add nodev Option to Removable Media Partitions (Scored)
    Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files. ]

But I want the key to contain only the title (2.10 Add nodev Option to Removable Media Partitions (Scored)) and the value to contain only its content (Profile Description:Set nodev on removable ......):

    Key : 2.10 Add nodev Option to Removable Media Partitions (Scored)
    Value : [ Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files. ]

Could someone help me get the right result? Thank you.

  • You split text with \n, but the text has no \n. Commented May 23, 2017 at 8:45
  • Since there isn't really a line delimiter, how do you plan to determine where the title ends and the paragraph starts? Commented May 23, 2017 at 8:51
  • I want to split the text without \n. For example, if the line contains the word " (Scored)", only the part of the line up to it should be taken as the title: 2.10 Add nodev Option to Removable Media (Scored) Commented May 23, 2017 at 8:52
  • Can you explain more precisely what determines the key and the description? Is it always the string "(Scored)" that marks the end of the key, or can a "\n" also mark it? Can the description span multiple lines (separated by "\n")? Does the description always start on a new line? Commented May 23, 2017 at 8:53
  • Each title ends with the keyword " (Scored)". The description of the paragraph begins on a new line. Commented May 23, 2017 at 8:56

3 Answers


I'd use a single regex that represents the three parts of a paragraph, each in its own capturing group:

    ((\d+(?:\.\d+)?)?.*\(Scored\))\n?(.*)

to be used with the DOTALL flag, so

    Pattern.compile("((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL)

The first group is the title, the second group is the number at the beginning of the title, and the third is the body of the paragraph.

I've added a \n? to remove the leading linefeed of the body.

You can try it on regex101 or on ideone.
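A minimal sketch of applying this pattern to build the map, assuming one paragraph per string with the body on the line after the title (as the asker describes in the comments). It uses Matcher.matches(), so the whole input must fit the pattern; group 1 becomes the key and group 3 the value. The class name ScoredParagraph, the method parse, and the trim() of the body are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScoredParagraph {
    // Group 1: title, group 2: leading number, group 3: body.
    // DOTALL lets "." match line breaks, so group 3 can span several lines.
    private static final Pattern PARAGRAPH =
            Pattern.compile("((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL);

    // Hypothetical helper: returns {title -> [body]} or an empty map if the text does not match
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> maps = new HashMap<>();
        Matcher m = PARAGRAPH.matcher(text);
        if (m.matches()) {
            List<String> body = new ArrayList<>();
            body.add(m.group(3).trim()); // drop the trailing space of the sample text
            maps.put(m.group(1), body);
        }
        return maps;
    }

    public static void main(String[] args) {
        String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)\n"
                + "Profile Description:Set nodev on removable media to prevent character and "
                + "block special devices that are present , on the removable media from being treated as device files. ";
        for (Map.Entry<String, List<String>> e : parse(text).entrySet()) {
            System.out.println("Key : " + e.getKey() + " Value : " + e.getValue());
        }
    }
}
```

Because the `.*` in group 1 is greedy, a string containing several "(Scored)" titles would put everything up to the last one into the key, so this version handles one paragraph at a time.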


1 Comment

Thanks for the solution @Aaron

If (Scored) is the last word in the title and the text after it is the paragraph, change your regex patterns to:

  • For Title:

    ^((\d+\.)*?(\d+)).*\(Scored\)
    

    Added \(Scored\) at the end to make sure the title ends with (Scored).

  • For the paragraph:

    (?<=\(Scored\) ).*$
    

    Added a positive lookbehind (?<=\(Scored\) ) that makes sure the match is preceded by (Scored).

Regex101 Demo for Title

Regex101 Demo for Paragraph
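A minimal sketch of using the two patterns from this answer together, assuming the title and paragraph are on one line as in the sample text at the top of the question, with a space after "(Scored)" (the lookbehind requires it). Since each pattern covers only part of the line, it calls Matcher.find() rather than matches(). The class TitleAndParagraph and the method parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleAndParagraph {
    // Title: section number at the start of the input, then text up to "(Scored)"
    private static final Pattern TITLE = Pattern.compile("^((\\d+\\.)*?(\\d+)).*\\(Scored\\)");
    // Paragraph: everything after "(Scored) " (positive lookbehind) to the end of the line
    private static final Pattern PARAGRAPH = Pattern.compile("(?<=\\(Scored\\) ).*$");

    static Map<String, List<String>> parse(String line) {
        Map<String, List<String>> maps = new HashMap<>();
        Matcher title = TITLE.matcher(line);
        Matcher paragraph = PARAGRAPH.matcher(line);
        // find() searches for a matching substring instead of requiring the whole line to match
        if (title.find() && paragraph.find()) {
            List<String> body = new ArrayList<>();
            body.add(paragraph.group().trim());
            maps.put(title.group(), body);
        }
        return maps;
    }

    public static void main(String[] args) {
        String line = "2.10 Add nodev Option to Removable Media Partitions (Scored) Profile Description:"
                + "Set nodev on removable media to prevent character and block special devices that are present , "
                + "on the removable media from being treated as device files.";
        parse(line).forEach((k, v) -> System.out.println("Key : " + k + " Value : " + v));
    }
}
```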

2 Comments

Thanks for the solution @degant
Both proposed solutions work, but I cannot accept two answers at the same time. Thanks.

The solution is to replace the following line:

    Pattern pattern = Pattern.compile("^((\\d+\\.)*?(\\d+)) .*$");

with

    Pattern pattern = Pattern.compile("((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL);
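If the text can hold several numbered paragraphs, each title on its own line ending in (Scored) and the description starting on the next line (as described in the comments), the line-by-line loop from the question can be adapted so the title becomes the key and is not added to the value. A minimal sketch under those assumptions: the title format in TITLE, the class ScoredSections, and the method parse are hypothetical, blank lines are skipped, and any text before the first title is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ScoredSections {
    // Assumed title format: a section number such as "2.10", whitespace, then text ending in "(Scored)"
    private static final Pattern TITLE = Pattern.compile("\\d+(?:\\.\\d+)*\\s.*\\(Scored\\)");

    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> maps = new LinkedHashMap<>(); // keeps sections in document order
        List<String> paragraphe = null; // body of the current section; null before the first title
        for (String ligne : text.split("\\R")) { // \R matches any line break (Java 8+)
            String trimmed = ligne.trim();
            if (TITLE.matcher(trimmed).matches()) {
                paragraphe = new ArrayList<>();
                maps.put(trimmed, paragraphe); // the title is only the key, not part of the value
            } else if (paragraphe != null && !trimmed.isEmpty()) {
                paragraphe.add(trimmed); // body lines go into the current value
            }
        }
        return maps;
    }

    public static void main(String[] args) {
        String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)\n"
                + "Profile Description:Set nodev on removable media to prevent character and "
                + "block special devices that are present , on the removable media from being treated as device files.";
        parse(text).forEach((k, v) -> System.out.println("Key : " + k + " Value : " + v));
    }
}
```

Using a LinkedHashMap instead of the question's HashMap is a small design choice so that the sections print in the order they appear in the text.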

