I'm trying to remove markdown code blocks from the following string:

Problem with encoding a very large BigInteger or BigDecimal without fraction

We have an issue to encode a very large BigDecimal. For example, when I tried to encode _7.533938258014959827307132527342E+545_ I got the following error:

``` java
Error com.n1analytics.paillier.EncodeException: Input value cannot be encoded.
at com.n1analytics.paillier.StandardEncodingScheme.encode(StandardEncodingScheme.java:115)
at com.n1analytics.paillier.StandardEncodingScheme.encode(StandardEncodingScheme.java:239)
...
```

The reason why I got the error is as follows:

I've tried:

txt = re.sub('(```[a-z]*\n[\s\S]*?\n```)', '', txt) # https://regex101.com/r/aA5bI3/3
txt = re.sub('(```.+?```)', '', txt)  # https://regex101.com/r/aA5bI3/3

I found some of these regexes at https://coderwall.com/p/r6b4xg/regex-to-match-github-s-markdown-code-blocks and https://regex101.com/r/aA5bI3/3, but I had no success with them.

2 Answers

I believe you can simply use this regex

```.*?```

with the single-line option enabled (in Python, the re.DOTALL flag, so that the dot also matches newlines)

Example: https://regex101.com/r/fubH7e/1
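As a minimal sketch of how this could look in Python, assuming `re.DOTALL` plays the role of the single-line option: the names `FENCE`, `sample` and `strip_code_blocks` are made up for illustration, and the pattern spells the three backticks as a backtick followed by `{3}` (an equivalent form) only so the example stays readable inside a fenced block.

```python
import re

# Hypothetical sample text: a sentence, a fenced block, another sentence.
FENCE = "`" * 3
sample = f"Before\n\n{FENCE} java\nError: Input value cannot be encoded.\n...\n{FENCE}\n\nAfter"

# re.DOTALL makes "." match newlines too, so one match can span several lines.
# The lazy ".*?" stops at the first closing run of three backticks.
pattern = re.compile(r"`{3}.*?`{3}", flags=re.DOTALL)

def strip_code_blocks(text: str) -> str:
    """Remove everything between two runs of three backticks, fences included."""
    return pattern.sub("", text)

result = strip_code_blocks(sample)
print(repr(result))
```

Note that this pattern removes anything between two runs of three backticks, including ones that sit inside a single line of text.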

The first pattern almost works, but in the example data there is a space before java

If spaces and the characters a-z are the only acceptable characters after the backticks, and both are optional, you could match optional whitespace characters other than a newline with [^\S\r\n]*

The pattern could look like

```[^\S\r\n]*[a-z]*\n.*?\n```

Instead of using [\s\S]*? you can use the re.DOTALL flag to make the dot match a newline.

For example

txt = re.sub(r"```[^\S\r\n]*[a-z]*\n.*?\n```", '', txt, flags=re.DOTALL)
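To see why the space before java matters, here is a small comparison sketch. The sample text and the names `strict` and `lenient` are assumptions for illustration, and the three backticks are again written as a backtick followed by `{3}` so the code stays readable inside a fenced block.

```python
import re

FENCE = "`" * 3
# Hypothetical sample with a space between the opening fence and the language name.
sample = f"Intro\n{FENCE} java\nError line\n...\n{FENCE}\nOutro"

# First attempt from the question: nothing but a-z is allowed right after the fence.
strict = re.compile(r"`{3}[a-z]*\n[\s\S]*?\n`{3}")
# Variant that also accepts spaces or tabs (but no newline) before the language name.
lenient = re.compile(r"`{3}[^\S\r\n]*[a-z]*\n.*?\n`{3}", flags=re.DOTALL)

unchanged = strict.sub("", sample)   # no match, so the text comes back as-is
cleaned = lenient.sub("", sample)    # the whole fenced block is removed

print(unchanged == sample)
print(repr(cleaned))
```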

If the opening backticks always sit at the start of a line and the closing backticks are on a line of their own, you could make the pattern a bit more efficient: match every line in between that is not a bare ``` line, using a negative lookahead to prevent unnecessary backtracking.

As the pattern uses an anchor, you should use the re.MULTILINE flag.

^```[^\S\r\n]*[a-z]*(?:\n(?!```$).*)*\n```

Explanation

  • ^ Start of a line (the re.MULTILINE flag makes ^ match at the start of each line)
  • ``` Match 3 backticks
  • [^\S\r\n]*[a-z]* Match optional spaces without a newline and optional chars a-z
  • (?: Non capture group
    • \n(?!```$) Match a newline and assert that the next line does not consist of only 3 backticks
    • .* If that is the case, match the whole line
  • )* Close non capture group and repeat 0+ times to match all lines
  • \n``` Match a newline and 3 backticks

For example

txt = re.sub(r"^```[^\S\r\n]*[a-z]*(?:\n(?!```$).*)*\n```", '', txt, flags=re.MULTILINE)
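A short sketch of the anchored pattern applied to a text with two fenced blocks, one with a language name and one without. The sample text and variable names are hypothetical, and the three backticks are written as a backtick followed by `{3}`, which is equivalent.

```python
import re

FENCE = "`" * 3
# Hypothetical sample: two fenced blocks separated by ordinary lines.
sample = f"A\n{FENCE} java\nx\n{FENCE}\nB\n{FENCE}\ny\n{FENCE}\nC"

# re.MULTILINE lets ^ and $ match at every line boundary.
# Each repetition of the group consumes one full line that is not a bare closing fence.
pattern = re.compile(
    r"^`{3}[^\S\r\n]*[a-z]*(?:\n(?!`{3}$).*)*\n`{3}",
    flags=re.MULTILINE,
)

blocks = pattern.findall(sample)   # the fenced blocks that will be removed
cleaned = pattern.sub("", sample)

print(len(blocks))
print(repr(cleaned))
```

Because the dot never has to cross a newline here, re.DOTALL is not needed for this variant.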
