I'd like to split a string if:

  • It doesn't start with quatre or soixante AND
  • It doesn't end with dix or vingt

For example:

'deux-cent-quatre-vingt-trois'.split(/**/);
> ['deux', 'cent', 'quatre-vingt', 'trois' ]

I've made a few unsuccessful attempts, for example:

'deux-cent-quatre-vingt-dix-trois'
          .split(/^(?![quatre|soixante]-[dix|vingt])(\w*)-(\w*)/);
> [ '', 'deux', 'cent', '-quatre-vingt-trois' ]

or:

'deux-cent-quatre-vingt-dix-trois'.split(/(?!quatre|soixante)-(?!vingt|dix)/);
> [ 'deux', 'cent', 'quatre-vingt', 'trois' ]

which works, but this does not:

'cent-vingt'.split(/(?!quatre|soixante)-(?!vingt|dix)/);
> [ 'cent-vingt' ]

I know using a matcher or a find would be so easy, but it would be great to do it in a single split...

4 Comments
  • I assume you want && between your points.. Commented Jul 29, 2013 at 17:25
  • Yeah, sorry, I didn't understand what you meant at first. Commented Jul 29, 2013 at 17:30
  • Hmm... It seems JavaScript doesn't support lookbehinds. Are you trying to separate all the French numbers from zero to quatre-vingt-dix-neuf, plus cent? Commented Jul 29, 2013 at 17:37
  • Yes, more or less. It's for a module I've been working on, readint. It's a written number parser. I'm trying to keep it as simple as possible to make the translating process easier. I only do one split, to identify and mark number tokens. Commented Jul 29, 2013 at 17:40

1 Answer

You can do it like this:

var text = "deux-cent-quatre-vingt-trois";

console.log(text.split(/(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/));

The idea is to use a capturing group: when the separator regex contains one, split adds the captured text to the result array.

The capturing group lists the special cases first and the most general case last: [^-]+ matches any run of characters that are not a hyphen.
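To see what the capturing group does in practice, here is a minimal sketch. Because each token is consumed as part of the separator, split leaves an empty string between consecutive matches, so the raw array needs filtering. The helper name splitNumber and the filter step are assumptions for illustration, not part of the snippet above.

```javascript
// Hypothetical helper (not in the original answer): split a hyphenated
// French number into tokens with the same regex.
function splitNumber(text) {
  var pattern = /(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/;
  // Each match acts as a separator; the captured token is inserted into the
  // result. The text between separators is empty, so drop empty strings.
  return text.split(pattern).filter(Boolean);
}

// Raw result, before filtering:
var raw = "deux-cent-quatre-vingt-trois"
  .split(/(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/);
console.log(raw);
// [ '', 'deux', '', 'cent', '', 'quatre-vingt', '', 'trois', '' ]

console.log(splitNumber("deux-cent-quatre-vingt-trois"));
// [ 'deux', 'cent', 'quatre-vingt', 'trois' ]
console.log(splitNumber("cent-vingt"));
// [ 'cent', 'vingt' ]
```

The order of the alternatives matters: if [^-]+ came first, it would match quatre on its own and the special cases would never be tried.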

Note: since quatre-vingt is written with a trailing s when it is not followed by another number, I added s$ as an alternative.
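As a quick check of the s$ and -dix alternatives, the following sketch runs the same regex on a few inputs. The tokenize helper is a hypothetical name, and the filter(Boolean) step is an addition that drops the empty strings split produces between matches.

```javascript
// Hypothetical helper for illustration only.
function tokenize(text) {
  return text
    .split(/(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/)
    .filter(Boolean); // remove empty strings between consecutive matches
}

console.log(tokenize("quatre-vingts"));
// [ 'quatre-vingts' ]
console.log(tokenize("deux-cent-quatre-vingts"));
// [ 'deux', 'cent', 'quatre-vingts' ]
console.log(tokenize("quatre-vingt-dix-neuf"));
// [ 'quatre-vingt-dix', 'neuf' ]
console.log(tokenize("soixante-dix-sept"));
// [ 'soixante-dix', 'sept' ]
```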

2 Comments

Suppose one needs to prevent the '-' followed by 'quatre' alone from being removed in that string. What is the regex for that? Please let me know if possible; I am trying to understand the answer.
Either you capture quatre-vingt (or quatre-vingts, or quatre-vingt-dix), or you capture soixante-dix, or you capture whatever you find up to the next hyphen. That last case is what separates the elements, by capturing everything but the trailing hyphen.
