
I have this string:

Miami&#44; Florida

I would like to find a regex to help detect whether this string contains an ASCII code.

I have tried these regexes: \\p{ASCII}, ^[\\u0000-\\u007F]*$, ^\p{ASCII}*$, \A\p{ASCII}*\z, and /^[\\x00-\\x7F]+$/, but none of them work for me.

Ideally, the regex would return (examples):

  • Miami&#44; Florida - true
  • Miami, Florida - false
  • Miami Florida - false
  • Miami,Florida - false

Once I have detected that the string contains an ASCII code, how can I convert it to Miami, Florida?

  • Do you mean non-ASCII characters? And do you mean HTML encoded special sequences? (Edit: or are you just specifying where the non-ASCII is by using an HTML encoded sequence?) Commented Aug 23, 2024 at 3:30
  • Sorry @markspace, I am not familiar with this terminology and am still learning. Commented Aug 23, 2024 at 3:39
  • HTML encoded sequences: mateam.net/html-escape-characters Commented Aug 23, 2024 at 3:41
  • All those strings are ASCII; you seem to want to detect HTML character entity escapes (which can encode characters not in ASCII!). Just use a library that provides support for unescaping them (e.g. Apache Commons Text). Commented Aug 23, 2024 at 8:05
  • Related, possibly duplicate: How can I unescape HTML character entities in Java? Commented Aug 23, 2024 at 8:08
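
The point made in the comments, that every sample string is already pure ASCII, can be checked with a short sketch. It assumes the first sample carried its comma as the numeric character reference &#44; and that only numeric references (decimal or hex) need to be detected. The class name, method names, and the NUMERIC_ENTITY pattern are hypothetical and not taken from the question.

```java
import java.util.regex.Pattern;

class EntityDetectionDemo {
    // One of the patterns tried in the question: true whenever every character is ASCII.
    static final Pattern ALL_ASCII = Pattern.compile("^\\p{ASCII}*$");
    // Hypothetical detector for decimal (&#44;) or hex (&#x2C;) numeric character references.
    static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(?:\\d+|[xX][0-9a-fA-F]+);");

    static boolean isAllAscii(String s) {
        return ALL_ASCII.matcher(s).matches();
    }

    static boolean containsNumericEntity(String s) {
        return NUMERIC_ENTITY.matcher(s).find();
    }

    public static void main(String[] args) {
        // The first sample is an assumed reconstruction of the question's input.
        String[] samples = {"Miami&#44; Florida", "Miami, Florida", "Miami Florida", "Miami,Florida"};
        for (String s : samples) {
            System.out.println(s + " -> allAscii=" + isAllAscii(s)
                    + ", hasEntity=" + containsNumericEntity(s));
        }
    }
}
```

Because &, #, digits, and ; are all ASCII characters, the \p{ASCII} check cannot tell the four samples apart; only the entity pattern separates the first sample from the other three.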

2 Answers

To detect and replace these ASCII code entities (decimal HTML numeric character references such as &#44;) in a string using Java, you can use the regular expression &#(\d+); to find them and then replace each one with the character for the captured code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ASCIICodeConverter {
    public static void main(String[] args) {
        String input = "Miami&#44; Florida";
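        // Matches decimal numeric character references such as &#44;
        String regex = "&#(\\d+);";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        boolean containsASCII = matcher.find();
        System.out.println("Contains ASCII codes: " + containsASCII);

        StringBuffer convertedString = new StringBuffer();
        matcher.reset(); // find() above consumed the first match, so start over
        while (matcher.find()) {
            int asciiCode = Integer.parseInt(matcher.group(1));
            // Character.toChars also handles code points above U+FFFF, which a (char) cast truncates.
            String decoded = new String(Character.toChars(asciiCode));
            // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference.
            matcher.appendReplacement(convertedString, Matcher.quoteReplacement(decoded));
        }
        matcher.appendTail(convertedString);
        System.out.println("Converted string: " + convertedString);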
    }
}
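
On Java 9 or later, the same find-and-replace loop can be written with Matcher.replaceAll and a replacement function. The following is a minimal sketch rather than a drop-in replacement: the class name NumericEntityDecoder and the method decode are hypothetical, it also accepts hex references such as &#x2C;, and it leaves out-of-range values and surrogate code points untouched instead of throwing. Named entities such as &amp; are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericEntityDecoder {
    // Group 1: hex digits (&#x2C;), group 2: decimal digits (&#44;).
    // The digit limits keep Integer.parseInt from overflowing.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        // Matcher.replaceAll(Function) requires Java 9 or later.
        return NUMERIC_ENTITY.matcher(input).replaceAll(match -> {
            int codePoint = match.group(1) != null
                    ? Integer.parseInt(match.group(1), 16)
                    : Integer.parseInt(match.group(2), 10);
            // Leave out-of-range values and surrogates exactly as they were written.
            boolean valid = Character.isValidCodePoint(codePoint)
                    && !(codePoint >= 0xD800 && codePoint <= 0xDFFF);
            String replacement = valid
                    ? new String(Character.toChars(codePoint))
                    : match.group();
            // The returned string is still scanned for '$' and '\', so quote it.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        System.out.println(decode("Miami&#44; Florida"));
        System.out.println(decode("Price: &#36;5"));
    }
}
```

Strings without any numeric reference pass through unchanged, so a separate detection step is only needed when the caller wants the true/false answer itself.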


After some digging I found this: https://stackoverflow.com/a/13975581/14342895 and https://stackoverflow.com/a/38031991

Combining these two I was able to fix my problem with this code:

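import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumption: HtmlUtils is Spring's org.springframework.web.util.HtmlUtils.
import org.springframework.web.util.HtmlUtils;

// Returns the unescaped text when a valid numeric escape is found, else the input unchanged.
public static String isEscapeCodePresent(String attribute) {
    return isValidHtmlEscapeCode(attribute)
            ? HtmlUtils.htmlUnescape(attribute)
            : attribute;
}

// Compiled once: group 1 = hex digits, group 2 = decimal digits, group 3 = entity name.
private static final Pattern ESCAPE_CODE =
        Pattern.compile("&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([0-9A-Za-z]+));");

public static boolean isValidHtmlEscapeCode(String string) {
    if (string == null) {
        return false;
    }
    Matcher m = ESCAPE_CODE.matcher(string);
    // Only the first escape in the string is inspected.
    if (!m.find()) {
        return false;
    }
    int codePoint = -1; // stays -1 for named entities (group 3), which are therefore rejected
    String entity;
    try {
        if ((entity = m.group(1)) != null) {
            if (entity.length() > 6) {
                return false;
            }
            codePoint = Integer.parseInt(entity, 16);
        } else if ((entity = m.group(2)) != null) {
            if (entity.length() > 7) {
                return false;
            }
            codePoint = Integer.parseInt(entity, 10);
        }
        // Valid only if within the Unicode range and not a surrogate.
        return 0x00 <= codePoint && codePoint < 0xd800
                || 0xdfff < codePoint && codePoint <= 0x10FFFF;
    } catch (NumberFormatException e) {
        return false;
    }
}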

Although @Sarath Molathoti's answer is correct and useful, the approach in this answer is closer to what I was looking for.
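
The range comparison at the end of isValidHtmlEscapeCode can be hard to read at a glance. As a rough sketch, the class below (CodePointRangeCheck, a hypothetical name) places that expression next to what should be an equivalent formulation built from java.lang.Character, so the two can be compared on boundary values.

```java
class CodePointRangeCheck {
    // Same test as the return expression in isValidHtmlEscapeCode.
    static boolean manualCheck(int codePoint) {
        return 0x00 <= codePoint && codePoint < 0xd800
                || 0xdfff < codePoint && codePoint <= 0x10FFFF;
    }

    // Alternative formulation using java.lang.Character: in range and not a surrogate.
    static boolean jdkCheck(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                && (codePoint < Character.MIN_SURROGATE || codePoint > Character.MAX_SURROGATE);
    }

    public static void main(String[] args) {
        // Boundary values around the surrogate block and the top of the Unicode range.
        int[] samples = {0x2C, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000};
        for (int cp : samples) {
            System.out.printf("U+%04X manual=%b jdk=%b%n", cp, manualCheck(cp), jdkCheck(cp));
        }
    }
}
```

Both versions reject the surrogate block U+D800 to U+DFFF and anything above U+10FFFF, which is why the default codePoint of -1 also fails the check.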
