I am using NSXMLParser to parse HTML from web sites. The test site is under my control, but in operation the sites will not be.
The problem arises when the parser encounters JavaScript that contains "bad" characters, for example if(screen.width<=521). The culprit is the < in the code. I can see the problem but am unsure whether there is any good way round it. NSXMLParser reports NSXMLParserErrorDomain error 68, and I can see why: it treats the <= as the start of a new tag, but = is not a valid tag-name character. But then what would I do with e.g. if(var<20)?
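To show the failure outside my app, here is a minimal sketch in Python. This is an assumption-laden stand-in: Python's xml.etree.ElementTree is used in place of NSXMLParser simply because it is also a strict XML parser, and the helper name parses and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Made-up samples modelled on the two cases described above.
samples = {
    "less-or-equal": "<script>if(screen.width<=521){}</script>",
    "less-than": "<script>if(var<20){}</script>",
    "no comparison": "<script>go();</script>",
}

results = {name: parses(markup) for name, markup in samples.items()}
```

Both comparison samples are rejected, so handling "<=" alone would not be enough for the second case.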
I am actually not interested in the specific content, so I could do things like a global replace or removal of e.g. "<=" and ">=" (etc.), but that seems a bit of a mess, as I was using NSXMLParser precisely to avoid having to mess around with the content. If substitution is the best way forward, I can envisage handling "<=" and ">=", but are there any other sequences I should include?
I am new to Cocoa so may easily have missed something obvious, in which case many apologies. I did see that others have run into similar problems, but I could not find a good way forward in those questions.
I am handling the error OK (in a tidy manner), but it prevents my app from doing what it is meant to do, i.e. I need to avoid the error rather than handle it.
Background: the application does a "before" and "after" comparison on the HTML and looks for changes. I could swap "<=" for something really weird, then swap it back when necessary. I could even check the data for the replacement content first to eliminate possible ambiguities (e.g. find a UID sequence that is not in the downloaded page, replace "<=" with the UID sequence, parse the page and, if need be, replace the UID with "<=" again; ditto for ">=").
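The substitution idea could be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python rather than Cocoa, xml.etree.ElementTree stands in for NSXMLParser, and the names make_token, protect and restore, as well as the sample page, are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET


def make_token(page: str) -> str:
    """Return a random token that does not occur anywhere in the page."""
    while True:
        token = "TOKEN" + uuid.uuid4().hex
        if token not in page:
            return token


def protect(page: str, sequences=("<=", ">=")):
    """Replace each troublesome sequence with its own unique token.

    Returns the rewritten page and a token -> original sequence mapping.
    """
    mapping = {}
    for seq in sequences:
        token = make_token(page)
        page = page.replace(seq, token)
        mapping[token] = seq
    return page, mapping


def restore(text: str, mapping) -> str:
    """Swap the tokens back to the original sequences."""
    for token, seq in mapping.items():
        text = text.replace(token, seq)
    return text


# Made-up page; the raw string is rejected by a strict XML parser.
page = "<html><script>if(screen.width<=521){go();}</script></html>"

safe_page, mapping = protect(page)
root = ET.fromstring(safe_page)  # parses once "<=" has been replaced
script_text = restore(root.find("script").text, mapping)
```

Note that this only covers the listed sequences; a bare < as in if(var<20) would still trip the parser.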
(I have looked at e.g. libtidy or libxml2 but cannot find easy documentation, and am wary of going down such a route if it will not solve the issue.)