I've got some strings that contain invisible characters, but they are in somewhat predictable places. Typically they surround the piece of text I want to extract, and then after the 2nd occurrence I want to keep the rest of the text.

I can't seem to figure out how to both key off of the invisible characters, and exclude them from my result. To match invisibles I've been using this regex: /\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F/ which does seem to work.

Here's an example: [invisibles]Keep as match 1[invisibles]Keep as match 2

Here's what I've been using so far without success:

/([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)/(.+)

I've got the capture groups in there, but it's been a while since I've had to use regexes in this way, so I know I'm missing something important. I was hoping to just make the invisible matches non-capturing groups, but it seems that JavaScript does not support this.

  • A non-capturing group would be (?:something) rather than (something). Could you also show a little bit of the JS that uses that regex? Why is the closing / not quite at the end of your regex? (See also MDN's regex page.) Commented Apr 25, 2013 at 3:01
  • "invisible characters" you mean like Sue from the fantastic four? No? How about defining that a little more clearly then? Commented Apr 25, 2013 at 3:02
  • How are you actually getting the data from the match? If you are reading only the two capture groups, then it should be fine. What is that second (.+) outside of your second regex though? Commented Apr 25, 2013 at 3:04
  • In your case, there's no reason to use a non-capturing group--regex characters are non-capturing by default. Just capture the ones you want. Commented Apr 25, 2013 at 3:07
  • @7stud No - invisible characters aren't a standard term. Does the questioner mean whitespace? If so he can just use '\s' Commented Apr 25, 2013 at 3:35
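
For reference, JavaScript does support non-capturing groups with the (?:...) syntax mentioned in the comments. The following is only a sketch: the sample string is made up, with \x0E standing in for the invisible characters, and the character class is copied from the question.

```javascript
// Hypothetical sample input: \x0E stands in for the "invisible" characters.
var sample = "\x0EKeep as match 1\x0EKeep as match 2";

// (?:...) groups without capturing, so only the two (.+) groups get numbers.
var nonCapturing = /(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)/;

var groups = sample.match(nonCapturing);

console.log(groups.length); // 3: the full match plus the two captures
console.log(groups[1]);     // "Keep as match 1"
console.log(groups[2]);     // "Keep as match 2"
```

Since a character class followed by + needs no grouping at all, the (?:...) wrappers here are optional; they only show that the syntax is available.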

1 Answer

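Something like this seems to be what you want. The second regex you have pretty much works, but the closing / is in the wrong place: it needs to come after the final (.+). I've also dropped the parentheses around the invisible-character classes so that only the two pieces you want are captured. Perhaps you weren't reading out the group data properly.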

var s = "\x0EKeep as match 1\x0EKeep as match 2";
var r = /[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)/;

var match = s.match(r);

var part1 = match[1];
var part2 = match[2];
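
One caveat with the regex above: the first (.+) is greedy and . also matches the invisible characters, so if a third invisible character shows up later in the string, the split happens at the last one rather than the second. A possible variation, sketched here under the assumption that the first piece never contains invisibles itself, is to capture the first piece with a negated character class and to guard against a failed match. The helper name splitOnInvisibles and the sample input are made up for illustration.

```javascript
// Same character class as in the question, kept in one place so the pieces stay in sync.
var INVISIBLE = "\\xA0\\x00-\\x09\\x0B\\x0C\\x0E-\\x1F\\x7F";

// splitOnInvisibles is a hypothetical helper name, not part of the original answer.
function splitOnInvisibles(text) {
  var pattern = new RegExp(
    "[" + INVISIBLE + "]+" +     // leading invisibles (not captured)
    "([^" + INVISIBLE + "]+)" +  // first piece: a run of non-invisible characters
    "[" + INVISIBLE + "]+" +     // separating invisibles (not captured)
    "(.+)"                       // everything after the second run
  );
  var m = text.match(pattern);
  // match() returns null when the pattern is not found, so guard before indexing.
  return m ? { first: m[1], rest: m[2] } : null;
}

// Made-up input with a third invisible (a tab) inside the remainder.
var result = splitOnInvisibles("\x0EKeep as match 1\x0EKeep as\tmatch 2");
console.log(result.first); // "Keep as match 1"
console.log(result.rest);  // "Keep as\tmatch 2"

console.log(splitOnInvisibles("no invisibles here")); // null
```

Whether this variation is preferable depends on the real data: if the first piece can legitimately contain one of the listed characters, the original greedy form may be the better fit.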

5 Comments

Just to be clear: there are invisible characters between all characters, right? So in the string "abc" there are 3 invisible characters, like this (I = special character): IaIbIc
@MuhammadUmer I'm not sure I 100% follow, are you asking if every string automatically has invisible characters between all the visible characters? If so, the answer is no, that is not the case. If you have a specific question, please create a new one.
@MuhammadUmer You can write a regex that can match the space between characters, but that does not mean there is an invisible character between each one.
Wow, that is confusing, but thanks... so if I get a match between a and b, what exactly am I matching?
@MuhammadUmer Please open a new question.
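
To illustrate the point made in the comments above, here is a small sketch showing that a regex can match a position between two characters without consuming anything. The string "abc" comes from the comment; the variable names are arbitrary.

```javascript
var word = "abc";

// A lookahead matches the position just before "b" without consuming any character.
var between = /(?=b)/.exec(word);
console.log(between.index);     // 1: the position between "a" and "b"
console.log(between[0].length); // 0: the matched text is the empty string

// An empty pattern matches at every position, including the start and the end.
var marked = word.replace(/(?:)/g, "|");
console.log(marked);      // "|a|b|c|"
console.log(word.length); // 3: the string itself holds no extra characters
```

So a match "between a and b" is a match of the empty string at index 1, not of a hidden character.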
