How can I encode or decode an escape-sequence character such as '\x13' in Python into something that is valid in RSS or XML?

My use case: I am collecting data from arbitrary sources and building an RSS feed from it. The source data sometimes contains escape-sequence characters, which break my RSS feed.

How can I sanitize input data that contains such characters?

1 Answer

\x13 (ASCII 19, ‘DC3’) can't be escaped; it is invalid in XML 1.0, period. You can include one, encoded as &#19; or &#x13; in XML 1.1, but then you have to include the <?xml version="1.1"?> declaration and many tools won't like it.
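A quick way to confirm this is to hand such a character to a parser. The sketch below assumes Python 3 and the standard library's expat-based xml.etree.ElementTree, which implements XML 1.0; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML 1.0, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = is_well_formed("<title>bad\x13char</title>")    # literal DC3 character
ref = is_well_formed("<title>bad&#19;char</title>")   # numeric character reference
clean = is_well_formed("<title>badchar</title>")      # control code removed
print(raw, ref, clean)
```

Both the literal character and the numeric reference should be rejected, which is why escaping alone cannot fix the feed.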

I've no idea why that character would be included in your data, but the way forward is probably to completely remove control codes. For example:

import re
s = re.sub('[\x00-\x08\x0B-\x1F]', '', s)  # drops C0 control codes (CR included), keeps tab and LF
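If you want a reusable helper, something along these lines might do. It is only a sketch, assuming Python 3: the name sanitize_for_xml is hypothetical, and the character class follows the XML 1.0 rules for allowed characters rather than the exact range in the one-liner above, so it keeps carriage returns and also drops surrogates and U+FFFE/U+FFFF. It then escapes markup characters with xml.sax.saxutils.escape so the result can go straight into element text.

```python
import re
from xml.sax.saxutils import escape

# Characters XML 1.0 does not allow: C0 controls other than tab, LF and CR,
# lone surrogates, and the noncharacters U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)

def sanitize_for_xml(text):
    """Remove characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))

print(sanitize_for_xml("Tom & Jerry\x13 <live>\x00"))
# Tom &amp; Jerry &lt;live&gt;
```

If your feed library already escapes text for you, keep only the re.sub step to avoid double escaping.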

For some kinds of escape sequence (e.g. ANSI colour codes), stripping the control codes leaves stray non-control characters behind, in which case you'd probably want a custom parser for that particular format.
