1

I have text that looks something like this:

1. Must have experience in Java 2. Team leader...

I want to render this in HTML as an ordered list. Now adding the </li> tag to the end is simple enough:

s = replace(s, ". ", "</li>");

But how do I go about replacing the 1., 2. etc with <li>?

I have the regular expression \d*\.$, which matches a number followed by a period, but the problem is that the numbered marker is only a substring of the text, so matching 1. Must have experience in Java 2. Team leader against \d*\.$ returns false.

@Reimeus the user isn't parsing HTML, they're trying to generate it. Commented Nov 16, 2017 at 15:07

3 Answers

3

Code

See regex in use here

\d+\.\s+(.*?)\s*(?=\d+\.\s+|$)

Replace

<li>$1</li>\n

Results

Input

  1. Must have experience in Java 2. Team leader...

Output

<li>Must have experience in Java</li>
<li>Team leader...</li>

Explanation

  • \d+ Match one or more digits
  • \. Match the dot character . literally
  • \s+ Match one or more whitespace characters
  • (.*?) Capture any character any number of times, but as few as possible, into capture group 1
  • \s* Match any number of whitespace characters
  • (?=\d+\.\s+|$) Positive lookahead ensuring that either of the following matches
    1. \d+\.\s+
      • \d+ Match one or more digits
      • \. Match the dot character . literally
      • \s+ Match one or more whitespace characters
    2. $ Assert position at the end of the line
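
As a minimal sketch of how the pattern and replacement above might be applied in Java (the question's snippets look like Java, which is an assumption here), the backslashes are doubled for the string literal. The class name ListItems and the helper toListItems are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class ListItems {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    private static final Pattern ITEM =
            Pattern.compile("\\d+\\.\\s+(.*?)\\s*(?=\\d+\\.\\s+|$)");

    // Hypothetical helper: turns "1. a 2. b" into one <li>...</li> line per item.
    static String toListItems(String s) {
        // $1 refers to capture group 1, the item text without its number.
        return ITEM.matcher(s).replaceAll("<li>$1</li>\n");
    }

    public static void main(String[] args) {
        String s = "1. Must have experience in Java 2. Team leader...";
        System.out.print(toListItems(s));
    }
}
```

One caveat with this approach: a number followed by a period and whitespace inside an item's text would be treated as the start of a new item.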

2

But how do I go about replacing the 1., 2. etc with <li>?

You can use String#replaceAll, which accepts a regex, instead of replace:

s = s.replaceAll("\\d+\\.\\s", "</li>");

Note

  • You don't need to use $ at the end of your regex.
  • You have to escape the dot . because an unescaped dot matches any character in regex.
  • You can use \s for exactly one whitespace character, \s* for zero or more, or \s+ for one or more.
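
To illustrate the notes above, here is a small sketch contrasting replace, which treats its first argument literally, with replaceAll, which treats it as a regex. It uses <li> as the replacement, since that is the tag the question asks for; OpeningTags and openItems are hypothetical names.

```java
class OpeningTags {
    // Hypothetical helper: swaps each "N. " marker for an opening <li> tag.
    static String openItems(String s) {
        // Escaped dot, no trailing $, and \s+ to absorb the whitespace after the marker.
        return s.replaceAll("\\d+\\.\\s+", "<li>");
    }

    public static void main(String[] args) {
        String s = "1. Must have experience in Java 2. Team leader...";
        // replace() looks for the literal characters of the pattern, so nothing changes.
        System.out.println(s.replace("\\d+\\.\\s+", "<li>"));
        // replaceAll() interprets the pattern as a regex and replaces both markers.
        System.out.println(openItems(s));
    }
}
```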

3 Comments

This works, (although in your answer your string replaces the regex match with </li> when it needs to be <li>) but @ctwheels answer is much more robust.
oh sorry, I misunderstood that comment. If you look at @ctwheels's answer under "output" it's exactly what I'm looking for. Though to be fair I was more interested in getting the opening <li> tag, the closing one was less complicated.
It's OK @mohammedkhan, it was just a comment, and I'm happy that ctwheels' answer helped you ;)
0

We want

<ol>
  <li>one</li>
  <li>two</li>
</ol>

This can be done as:

s = s.replaceAll("(?s)(\\d+\\.)\\s+(.*?\\.)\\s*", "<li>$2</li></ol>");
s = s.replaceFirst("<li>", "<ol><li>");
s = s.replaceAll("(?s)</li></ol><li>", "</li>\n<li>");

The trick is to first add </li></ol> with a spurious </ol> that should only remain after the last list item.

(?s) is the DOTALL notation, causing . to also match line breaks.

In case of more than one numbered list this will not do. Also it assumes one single sentence per list item.
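
A self-contained sketch of the three steps, under the assumptions just stated (a single list, one sentence per item, each item ending in a period). It uses a reluctant .*? so that each item stops at its first period; OrderedList and toOrderedList are hypothetical names.

```java
class OrderedList {
    // Hypothetical helper: assumes one list whose items each end in a period.
    static String toOrderedList(String s) {
        // Step 1: wrap every item and append a spurious </ol> after each one.
        s = s.replaceAll("(?s)\\d+\\.\\s+(.*?\\.)\\s*", "<li>$1</li></ol>");
        // Step 2: open the list before the first item.
        s = s.replaceFirst("<li>", "<ol><li>");
        // Step 3: drop every </ol> that is followed by another item.
        return s.replace("</li></ol><li>", "</li>\n<li>");
    }

    public static void main(String[] args) {
        System.out.println(toOrderedList("1. One. 2. Two."));
    }
}
```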
