How to separate a text using regex in Java?

Question

I am looking for a way to split a text into a Map in Java. For example, I have the following text:

2.10 Add nodev Option to Removable Media Partitions (Scored) Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files.

So I made the following code using the regex:

    String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)"
            + "Profile  Description:Set nodev on removable media to prevent character and "
            + "block special devices that are present"
            + ", on the removable media from being treated as device files. ";
    Map<String, List<String>> maps = new HashMap<>();
    // To find out if there is, for example, 1.1.
    Pattern pattern = Pattern.compile("^((\\d+\\.)*?(\\d+)) .*$");
    // To retrieve the title of the paragraph: 1.1. Add Nodev Option to Removable Media Scores
    Pattern pattern2 = Pattern.compile("[0-9].*?.*[0-9].*$");
    List<String> paragraphe = new ArrayList<>();
    maps.put(null, paragraphe);

    for (String ligne : text.split("\n")) {
        Matcher matcher = pattern.matcher(ligne);
        Matcher matcher2 = pattern2.matcher(ligne);

        if (matcher.matches() && matcher2.matches()) {
            paragraphe = new ArrayList<>();
            maps.put(matcher2.group(0), paragraphe);
            paragraphe.add(ligne);
        } else {
            paragraphe.add(ligne);
        }
    }

    for (Entry<String, List<String>> key : maps.entrySet()) {
        for (String strings : key.getValue()) {
            if (strings.contains("(Scored)")) {
                System.out.println("Key : " + key.getKey() + " Value : " + key.getValue());
            }
        }
    }

This code displays the following result:

Key : 2.10 Add nodev Option to Removable Media Partitions (Scored)

Value : [2.10 Add nodev Option to Removable Media Partitions (Scored)

Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files. ]

But I want the following result instead: the key should contain the title (2.10 Add nodev Option to Removable Media Partitions (Scored)) and the value should contain its content (Profile Description:Set nodev on removable ......):

Key : 2.10 Add nodev Option to Removable Media Partitions (Scored)

Value : [ Profile Description:Set nodev on removable media to prevent character and block special devices that are present , on the removable media from being treated as device files. ]

Could someone help me get the right result? Thank you.

Since there isn't really a line delimiter, how do you plan to determine where the title ends and the paragraph starts?
– degant, Commented May 23, 2017 at 8:51
I want to separate the text without relying on \n. For example, if the line contains the word "(Scored)", we only take the contents of that line: 2.10 Add nodev Option to Removable Media (Scored)
– Michael1, Commented May 23, 2017 at 8:52
Can you explain more precisely what determines the key and the description? Does the string "(Scored)" always mark the end of the key, or can a "\n" also mark it? Can the description span multiple lines (separated by "\n")? Does the description always start on a new line?
– Sampisa, Commented May 23, 2017 at 8:53
Each title ends with the keyword "(Scored)". The description of the paragraph begins on a new line.
– Michael1, Commented May 23, 2017 at 8:56

Aaron · Accepted Answer · 2017-05-23 09:10:08Z

1

I'd use a single regex that represents the three parts of a paragraph, each in its own capturing group:

    ((\d+(?:\.\d+)?)?.*\(Scored\))\n?(.*)

It must be used with the DOTALL flag, so in Java:

    Pattern.compile("((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL)

The first group is the title, the second group the number at the beginning of the title and the third the body of the paragraph.

I've added a \n? to remove the leading linefeed of the body.

You can try it on regex101 or on ideone.
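
As a minimal sketch of how the three groups could be read back in Java (the class name `ScoredGroupsDemo`, the helper `split`, and the shortened sample text are hypothetical and only serve as illustration), assuming the whole text holds a single title followed by its body:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoredGroupsDemo {
    // Group 1: title, group 2: leading number, group 3: body of the paragraph.
    private static final Pattern PARAGRAPH = Pattern.compile(
            "((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL);

    // Returns {title, number, body}, or null when the text contains no "(Scored)" title.
    static String[] split(String text) {
        Matcher m = PARAGRAPH.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Shortened sample; the title and body are separated by a line feed.
        String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)\n"
                + "Profile Description:Set nodev on removable media.";
        String[] parts = split(text);
        System.out.println("Title : " + parts[0]);
        System.out.println("Number: " + parts[1]);
        System.out.println("Body  : " + parts[2]);
    }
}
```

Because `.*` is greedy and DOTALL lets it cross line breaks, a text containing several "(Scored)" titles would put everything up to the last one into group 1, so this sketch is meant for one paragraph at a time.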

edited May 23, 2017 at 9:10

answered May 23, 2017 at 9:03


1 Comment

Michael1 Over a year ago

Thanks for the solution @Aaron

degant · Accepted Answer · 2017-05-23 08:59:24Z

1

If (Scored) is the last word in the title, and the text after it is the paragraph, then change your regex patterns to:

For the title:
```
^((\d+\.)*?(\d+)).*\(Scored\)
```
Added \(Scored\) at the end to make sure the title ends with (Scored).

For the paragraph:
```
(?<=\(Scored\) ).*$
```
Added a positive lookbehind (?<=\(Scored\) ) that makes sure the match is preceded by (Scored).

Regex101 Demo for Title

Regex101 Demo for Paragraph
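
A rough sketch of how the two patterns might be applied in Java, assuming the title pattern ends in an escaped literal `\(Scored\)` and the paragraph pattern uses the lookbehind `(?<=\(Scored\) )`. The class name `TwoPatternDemo` and the helpers `title` and `body` are hypothetical. It also assumes the title and the paragraph sit on one line separated by a single space, as in the sample text of the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPatternDemo {
    // Title: starts with a section number and runs up to the last "(Scored)".
    private static final Pattern TITLE =
            Pattern.compile("^((\\d+\\.)*?(\\d+)).*\\(Scored\\)");
    // Paragraph: everything that follows "(Scored) " up to the end of the input.
    private static final Pattern BODY =
            Pattern.compile("(?<=\\(Scored\\) ).*$");

    // Returns the title part of the line, or null when there is none.
    static String title(String line) {
        Matcher m = TITLE.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Returns the text after "(Scored) ", or null when there is none.
    static String body(String line) {
        Matcher m = BODY.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "2.10 Add nodev Option to Removable Media Partitions (Scored) "
                + "Profile Description:Set nodev on removable media.";
        System.out.println("Key   : " + title(line));
        System.out.println("Value : " + body(line));
    }
}
```

Both helpers use `find()` rather than `matches()`, since each pattern covers only part of the line. If the paragraph starts on a new line instead of after a space, the lookbehind would need to be adapted accordingly.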

answered May 23, 2017 at 8:59


2 Comments

Michael1 Over a year ago

Thanks for the solution @degant

Michael1 Over a year ago

The two solutions you proposed work, but I cannot accept two solutions at the same time. Thanks.

Michael1 · Accepted Answer · 2017-05-23 09:38:13Z

0

The solution is to replace the following line:

    Pattern pattern = Pattern.compile("^((\\d+\\.)*?(\\d+)) .*$");

with

    Pattern pattern = Pattern.compile("((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL);
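
For illustration only, here is one way the map from the question could be filled with that pattern. This is a sketch under assumptions, not necessarily the exact code used: the class name `ScoredMapDemo` and the helper `toMap` are hypothetical, the per-line loop is replaced by a single match over the whole text, and the body is trimmed before being stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoredMapDemo {
    private static final Pattern PARAGRAPH = Pattern.compile(
            "((\\d+(?:\\.\\d+)?)?.*\\(Scored\\))\\n?(.*)", Pattern.DOTALL);

    // Maps the title (group 1) to a list holding the paragraph body (group 3).
    // Returns an empty map when the text has no "(Scored)" title.
    static Map<String, List<String>> toMap(String text) {
        Map<String, List<String>> maps = new HashMap<>();
        Matcher matcher = PARAGRAPH.matcher(text);
        if (matcher.matches()) {
            List<String> paragraphe = new ArrayList<>();
            paragraphe.add(matcher.group(3).trim()); // assumption: surrounding spaces are not wanted
            maps.put(matcher.group(1), paragraphe);
        }
        return maps;
    }

    public static void main(String[] args) {
        String text = "2.10 Add nodev Option to Removable Media Partitions (Scored)\n"
                + "Profile Description:Set nodev on removable media. ";
        for (Map.Entry<String, List<String>> entry : toMap(text).entrySet()) {
            System.out.println("Key : " + entry.getKey() + " Value : " + entry.getValue());
        }
    }
}
```

The title no longer appears in the value list because only group 3 is added to it, which is the difference from the original loop that added the whole line.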

answered May 23, 2017 at 9:38

