13

Do you know of a function in Java that will validate that a string is a good XML element name?

From w3schools:

XML elements must follow these naming rules:

  1. Names can contain letters, numbers, and other characters
  2. Names cannot start with a number or punctuation character
  3. Names cannot start with the letters xml (or XML, or Xml, etc)
  4. Names cannot contain spaces

I found other questions that offered regex solutions. Isn't there a function that already does that?

4 Answers

14

If you are using Xerces XML parser, you can use the XMLChar (or XML11Char) class isValidName() method, like this:

org.apache.xerces.util.XMLChar.isValidName(String name)

There is also sample code available here for isValidName.

3 Comments

Nice, it looks like exactly what I am looking for, but do you know why XMLChar.isValidName("xml") returns true? (Question approved)
"xml", as measured case-insensitively, is valid -- but reserved. You may come across it in practice. If you are checking for input, you might want to add && !name.toLowerCase().startsWith("xml")
The required code for the name validity check actually consists of just a few lines and can be copied out to avoid another external dependency. See my answer for details.
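
The reserved-prefix check suggested in the comment above could be sketched as follows. The class name ReservedXmlNames and the syntaxCheck parameter are hypothetical; the idea is to pass in whichever syntactic validator you already use (for example XMLChar::isValidName if Xerces is on the classpath).

```java
import java.util.function.Predicate;

public final class ReservedXmlNames {

    private ReservedXmlNames() {}

    /** True if the name starts with "xml" in any letter case, which the XML spec reserves. */
    public static boolean startsWithReservedPrefix(String name) {
        // regionMatches with ignoreCase=true avoids a locale-sensitive toLowerCase() call;
        // it returns false when the name is shorter than three characters.
        return name.regionMatches(true, 0, "xml", 0, 3);
    }

    /** Combines a caller-supplied syntactic check with the reserved-prefix rule. */
    public static boolean isUsableName(String name, Predicate<String> syntaxCheck) {
        return syntaxCheck.test(name) && !startsWithReservedPrefix(name);
    }
}
```

Keeping the two checks separate lets you still accept reserved names where they are legitimate and only reject them for user-supplied input.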
4

The relevant production from the spec is http://www.w3.org/TR/xml/#NT-Name

Name ::= NameStartChar (NameChar)*

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

So a regex to match it (in Java 7 or later, which supports the \x{...} escapes for code points above U+FFFF) is

"^[:A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02ff\\u0370-\\u037d"
+ "\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\ud7ff"
+ "\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[:A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6"
+ "\\u00F8-\\u02ff\\u0370-\\u037d\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f"
+ "\\u2c00-\\u2fef\\u3001-\\ud7ff\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}\\-\\.0-9"
+ "\\u00b7\\u0300-\\u036f\\u203f-\\u2040]*\\z"
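
As a rough illustration of how a pattern for the Name production might be compiled once and reused, here is a minimal sketch. The class name XmlNamePattern and the helper constants are hypothetical. It assumes Java 7 or later and writes the ranges exactly as they appear in the production, including the supplementary range.

```java
import java.util.regex.Pattern;

public final class XmlNamePattern {

    private XmlNamePattern() {}

    // NameStartChar ranges from the production, written as a character-class body.
    private static final String NAME_START =
            ":A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D"
          + "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF\\u3001-\\uD7FF"
          + "\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\x{10000}-\\x{EFFFF}";

    // Characters that NameChar allows in addition to NameStartChar.
    private static final String NAME_EXTRA =
            "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern NAME = Pattern.compile(
            "[" + NAME_START + "][" + NAME_START + NAME_EXTRA + "]*");

    /** True if the whole string matches the Name production. */
    public static boolean isName(String s) {
        // matches() requires the entire input to match, so no anchors are needed.
        return NAME.matcher(s).matches();
    }
}
```

A call such as XmlNamePattern.isName("item") is then cheap, because the pattern is not recompiled on each check.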

If you want to deal with namespaced names, you need to make sure that there is at most one colon, so

"^[A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02ff\\u0370-\\u037d"
+ "\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\ud7ff"
+ "\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02ff\\u0370-\\u037d"
+ "\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\ud7ff"
+ "\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}\\-\\.0-9\\u00b7\\u0300-\\u036f\\u203f-\\u2040]*"
+ "(?::[A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02ff\\u0370-\\u037d"
+ "\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\ud7ff"
+ "\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}]"
+ "[A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02ff\\u0370-\\u037d"
+ "\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\ud7ff"
+ "\\uf900-\\ufdcf\\ufdf0-\\ufffd\\x{10000}-\\x{EFFFF}\\-\\.0-9\\u00b7\\u0300-\\u036f\\u203f-\\u2040]*)?\\z"

(missed another 03gf; changed both to 036f)
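
The at-most-one-colon rule can also be checked without a regex, by walking the string one code point at a time against the ranges in the production. The following is only a sketch under that assumption; the class name XmlQNames and its method names are hypothetical, and it follows the ranges quoted above rather than any particular parser's tables.

```java
public final class XmlQNames {

    private XmlQNames() {}

    // Inclusive code point ranges for NameStartChar, with ':' left out on purpose.
    private static final int[][] START = {
        {'A', 'Z'}, {'_', '_'}, {'a', 'z'}, {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD},
        {0x10000, 0xEFFFF}
    };

    // Ranges that NameChar allows in addition to NameStartChar.
    private static final int[][] EXTRA = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7}, {0x300, 0x36F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int[][] ranges, int cp) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** True if s is a non-empty name that contains no colon. */
    public static boolean isNCName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!inRanges(START, first)) {
            return false;
        }
        // Step by code point so characters above U+FFFF are read as one unit.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!inRanges(START, cp) && !inRanges(EXTRA, cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    /** True if s is either a colon-free name or prefix:local with exactly one colon. */
    public static boolean isQName(String s) {
        int colon = s.indexOf(':');
        if (colon < 0) {
            return isNCName(s);
        }
        // A second colon ends up in the local part, where isNCName rejects it.
        return isNCName(s.substring(0, colon)) && isNCName(s.substring(colon + 1));
    }
}
```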

4 Comments

Thanks. Does this mean that rule number 3 is not right? "3. Names cannot start with the letters xml (or XML, or Xml, etc)"
The answer is yes and no. "Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification." So it is a valid name, but it is reserved.
According to my understanding of the original production, these regular expressions contain several errors: in all character classes except for the first one,\\udfff is used instead of \\ud7ff and the range \\x10000-\\xEFFFF is missing. This range also needs to be \x{10000}-\x{EFFFF} in Java (missing {}). I unfortunately cannot propose an edit due to too many pending edits.
I think those style escapes may not have worked in the versions of the standard library at the time. Please edit if you've a better way.
2

Using the org.apache.xerces utilities is a good way to go; however, if you need to stick to Java code that's part of the standard Java API then the following code will do it:

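import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Throws an exception if xml is not a well-formed XML document.
public void parse(String xml) throws Exception {
    XMLReader parser = XMLReaderFactory.createXMLReader(); // deprecated since Java 9
    parser.setContentHandler(new DefaultHandler());
    InputSource source = new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    parser.parse(source);
}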
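
To turn the parsing approach into a name check, one option is to wrap the candidate name in an empty element and confirm that the parser reports exactly that name back. This is a sketch only, with assumptions: the class name ParserNameCheck is hypothetical, it uses SAXParserFactory instead of XMLReaderFactory, and it assumes the parser in use accepts the Xerces disallow-doctype-decl feature (the JDK's built-in parser does). If the feature is not supported, the method returns false for every input.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public final class ParserNameCheck {

    private ParserNameCheck() {}

    /** True if "<name/>" parses and the parser reports name as the element name. */
    public static boolean isValidElementName(String name) {
        final boolean[] sameName = {false};
        try {
            // The default factory is not namespace-aware, so qName is the raw name.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Refuse DOCTYPE declarations so crafted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            SAXParser parser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // Guards against input such as "a x='1'", which parses as
                    // element "a" with an attribute rather than as a single name.
                    sameName[0] = qName.equals(name);
                }
            };
            parser.parse(new InputSource(new StringReader("<" + name + "/>")), handler);
            return sameName[0];
        } catch (Exception e) {
            // Any parse or configuration failure counts as "not a valid name".
            return false;
        }
    }
}
```

Comparing the reported name with the input matters because well-formedness alone is not enough: a string containing extra markup can still yield a well-formed document.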

1 Comment

But be aware that the overhead of instantiating an XMLReader for this task is rather high, especially if it's done using the JAXP factory search. No problem if it's reused often enough, of course.
2

As a current addition to the accepted answer:

At least Oracle's JDK 1.8 (and probably older ones as well) uses the Xerces parser internally, in the non-public com.sun.* packages. You should never use classes from those packages directly, as they may change without notice in future versions of the JDK! However, the code required for the XML element name validity check is well encapsulated and can be copied into your own code. This way, you avoid a dependency on another external library.

This is the required code taken from the internal class com.sun.org.apache.xerces.internal.util.XMLChar:

public class XMLChar {

    /** Character flags. */
    private static final byte[] CHARS = new byte[1 << 16];

    /** Name start character mask. */
    public static final int MASK_NAME_START = 0x04;

    /** Name character mask. */
    public static final int MASK_NAME = 0x08;

    static {
        // Initializing the Character Flag Array
        // Code generated by: XMLCharGenerator.

        CHARS[9] = 35;
        CHARS[10] = 19;
        CHARS[13] = 19;

        // ...
        // the entire static block must be copied
    }

    /**
     * Check to see if a string is a valid Name according to [5]
     * in the XML 1.0 Recommendation
     *
     * @param name string to check
     * @return true if name is a valid Name
     */
    public static boolean isValidName(String name) {
        final int length = name.length();
        if (length == 0) {
            return false;
        }
        char ch = name.charAt(0);
        if (!isNameStart(ch)) {
            return false;
        }
        for (int i = 1; i < length; ++i) {
            ch = name.charAt(i);
            if (!isName(ch)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns true if the specified character is a valid name start
     * character as defined by production [5] in the XML 1.0
     * specification.
     *
     * @param c The character to check.
     */
    public static boolean isNameStart(int c) {
        return c < 0x10000 && (CHARS[c] & MASK_NAME_START) != 0;
    }

    /**
     * Returns true if the specified character is a valid name
     * character as defined by production [4] in the XML 1.0
     * specification.
     *
     * @param c The character to check.
     */
    public static boolean isName(int c) {
        return c < 0x10000 && (CHARS[c] & MASK_NAME) != 0;
    }
}

1 Comment

It will not compile with JDK 11: Error: [ERROR] (package com.sun.org.apache.xerces.internal.util is declared in module java.xml, which does not export it)
