I'm working with Emergency Services data that follows the NEMSIS XSD. One field, E09_05, is constrained to 50 characters, and I need a Notepad++ regex search that finds the values exceeding that limit. I've searched this site extensively and tried many solutions, but Notepad++ rejects all of them, saying not found.

Here's an XML Sample:

<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is is okay</E09_05>
</E09>

I've tried solutions that name the E09_05 tag in different ways, writing the closing tag as <\/E09_05>, as I've seen in some examples, and as plain </E09_05>, as I've seen in others. Between the tags I've tried ^.{50,}$ and [a-zA-Z]{50,}$, both with and without wrapping () around them. I even tried just [\s\S]*? between the tags. The only thing Notepad++ finds is ^.{50,}$ by itself with no XML tags ... but then I wind up hitting all the E13_01 tags (which are EMS narratives, and always > 50 characters) -- making for painstaking and wrist-aching clicks.

I wanted to XSLT this, but each E09_05 field needs too much individual, hands-on tweaking to automate it. Perl is not an option in this environment (and not a tool I know at all anyway).

To be truly sublime, both E09_05 and E09_08 fields with string lengths >50 need to be what is selected on the search ... but no other elements of any kind or length.

Thanks in advance. I'm sure I'm just missing some subtle \, or () or [] somewhere ... hopefully ...

  • . will match < but you need not to match the start of the closing tag so you need to do something like <E09_05>[^<]{51,}</E09_05> Commented Feb 1, 2020 at 21:34
  • You could easily do this with XSLT-2.0 which supports RegEx'es with the fn:replace function (or even the <analyze-string> element). Commented Feb 1, 2020 at 21:42
  • Trust me, XSLT was my first go-to. I'm trying to get approval for a workaround. The problem is, the field is a subjective one, and the data in it must be preserved, and there is no way to automate that other than moving it to another field (which I'm trying to get an XSLT to do ... if the government data manager approves ... but as we all know, to ask permission is to seek ...) Commented Feb 1, 2020 at 22:43
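The first comment's point, that . also matches <, can be checked outside Notepad++. The following is a minimal Python sketch, not something from the thread: the single-line input is hypothetical, and Python's re module stands in for the Notepad++ engine on the assumption that both treat these simple patterns the same way.

```python
import re

# Same tags, two ways of matching the content between them.
DOT = re.compile(r"<E09_05>.{51,}</E09_05>")       # . may run across other tags
NO_LT = re.compile(r"<E09_05>[^<]{51,}</E09_05>")  # pattern from the comment

# Hypothetical input with everything on one line and only short E09_05 values.
line = (
    "<E09_05>short</E09_05>"
    + "<E09_04>-5</E09_04>" * 3
    + "<E09_05>also short</E09_05>"
)

# A genuinely long value (60 characters) for comparison.
long_line = "<E09_05>" + "x" * 60 + "</E09_05>"

print(bool(DOT.search(line)))         # . spans from the first tag to the last
print(bool(NO_LT.search(line)))       # [^<] stops at the first "<"
print(bool(NO_LT.search(long_line)))  # long values are still found
```

When every element sits on its own line, as in the question's sample, the two patterns should behave the same; the difference only shows up when several elements share a line.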

1 Answer

The following regex will find the text content of <E09_05> elements with more than 50 characters.

(?<=<E09_05>).{51,}?(?=</E09_05>)

Explanation

(?<=<E09_05>)     Start matching right after <E09_05>

.{51,}?           Match 51 or more characters (in a single line)
                  The ? makes it reluctant, so it stops at the first </E09_05> it reaches

(?=</E09_05>)     Stop matching right before </E09_05>
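To see the pattern at work outside Notepad++, here is a minimal Python sketch run against a trimmed copy of the question's sample. It assumes Python's re module handles this pattern like Notepad++ does with the ". matches newline" option unchecked; the variable names are illustrative only.

```python
import re

# Pattern from the answer: content of <E09_05> longer than 50 characters.
PATTERN = re.compile(r"(?<=<E09_05>).{51,}?(?=</E09_05>)")

# Trimmed sample from the question, one element per line.
SAMPLE = """<E09>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_04>-5</E09_04>
        <E09_05>this one is is okay</E09_05>
</E09>
"""

# The lookarounds keep the tags out of the match, so only the text is returned.
matches = PATTERN.findall(SAMPLE)
for text in matches:
    print(len(text), text)
```

Only the over-long value is reported. The short one is skipped because . does not cross the line break, so there are never 51 characters available before its closing tag.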

For truly sublime matching, i.e. both E09_05 and E09_08 fields with string lengths >50, use:

(?<=<(E09_0[58])>).{51,}?(?=</\1>)

Explanation

<(E09_0[58])>     Match <E09_05> or <E09_08>, and capture the name as group 1

</\1>             Use \1 backreference to match name inside </name>
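The effect of the capture group and the \1 backreference can be checked with a small Python sketch. The input lines below are made up for illustration (60-character filler values), and Python's re module is assumed to treat this pattern the way Notepad++ does.

```python
import re

# Pattern from the answer: group 1 captures the tag name, \1 reuses it.
PATTERN = re.compile(r"(?<=<(E09_0[58])>).{51,}?(?=</\1>)")

LONG = "x" * 60        # hypothetical 60-character value
SHORT = "short value"  # hypothetical value under the limit

SAMPLE = "\n".join([
    f"<E09_05>{LONG}</E09_05>",
    f"<E09_08>{LONG}</E09_08>",
    f"<E09_05>{SHORT}</E09_05>",
    f"<E13_01>{LONG}</E13_01>",  # narrative field: long, but must not match
])

# Collect (tag name, matched length) for every hit.
hits = [(m.group(1), len(m.group(0))) for m in PATTERN.finditer(SAMPLE)]
print(hits)  # [('E09_05', 60), ('E09_08', 60)]
```

The long E13_01 narrative is ignored because the lookbehind only accepts E09_05 or E09_08 as the opening tag.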

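If you want to shorten the text with an ellipsis at the end, e.g. Hello World with max length 8 becomes Hello..., use the following. It keeps the first 47 characters and appends three dots for a total of 50; the .{4,} ensures only values of 51 or more characters are touched: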

Find what: (?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)
Replace with: \2...
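The replacement can be tried on throwaway data before running it on real records. This is a minimal Python sketch under the same assumption as before (Python's re standing in for Notepad++); the truncate helper and the test lines are hypothetical.

```python
import re

# Find/replace pair from the answer. Group 2 holds the first 47 characters.
FIND = re.compile(r"(?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)")
REPLACE = r"\2..."

def truncate(xml: str) -> str:
    """Shorten over-long E09_05/E09_08 values to 47 characters plus '...'."""
    return FIND.sub(REPLACE, xml)

long_line = "<E09_05>" + "A" * 80 + "</E09_05>"   # 80 characters: too long
short_line = "<E09_05>" + "B" * 50 + "</E09_05>"  # exactly 50: allowed

print(truncate(long_line))   # value becomes 47 A's followed by ...
print(truncate(short_line))  # unchanged
```

A value of exactly 50 characters is left alone, since the pattern needs 47 characters plus at least 4 more before the closing tag. Note that truncation discards the tail of the text, so it only suits cases where losing it is acceptable.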

1 Comment

Thanks Andreas! This (?<=<(E09_0[58])>).{51,}?(?=</\1>) is the money! My wrists thank you too. I'm still gonna see if I can XSLT this, but meanwhile, this is gonna shave a lot of time--and mouse clicks/hard returns--off.
