5

I want to split a large string by a series of words.

E.g.

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

Then the results would be:

$text[0]='This is';
$text[1]='string which needs';
$text[2]='be';
$text[3]='above';
$text[4]='.';

How can I do this? Is preg_split the best way, or is there a more efficient method? I'd like it to be as fast as possible, as I'll be splitting hundreds of MB of files.

  • Afternote: racar's answer is the fastest, if array_flip is performed on $splitby and then isset() is used instead of in_array(). preg_split does not work because there are hundreds of words in $splitby. Commented Nov 10, 2011 at 7:05

4 Answers

7

This should be reasonably efficient. However, you may want to test it on some of your files and report back on the performance.

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';
$pattern = '/\s?'.implode('\s?|\s?', $splitby).'\s?/';
$result = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

2 Comments

Exactly what I wanted. Thank you!
@Alasdair: Glad to help! Note codaddict's suggestion of \s*, which may be useful if your data can contain more than one space between words.
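
As a rough sketch of the \s* suggestion, the pattern could be built as below. Two things here are assumptions that neither answer uses: preg_quote() escapes each word in case it contains regex metacharacters, and \b restricts matches to whole words. It is untested on large inputs, and a very long word list may still exceed the pattern size limit.

```php
<?php
$splitby = array('these', 'are', 'the', 'words', 'to', 'split', 'by');
$text = 'This is the string which needs to be split by the above words.';

// Assumption: escape each word so regex metacharacters are matched literally.
$quoted = array();
foreach ($splitby as $word) {
    $quoted[] = preg_quote($word, '/');
}

// \s* absorbs any run of whitespace around a split word;
// \b (an addition, not in the answers) limits matches to whole words.
$pattern = '/\s*\b(?:' . implode('|', $quoted) . ')\b\s*/';
$pieces = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

var_dump($pieces);
```

For the sample text this yields the five pieces listed in the question, including the trailing '.'.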
5

preg_split can be used as:

$pieces = preg_split('/\s*(?:'.implode('|',$splitby).')\s*/',$text,-1,PREG_SPLIT_NO_EMPTY);


4

I don't think a PCRE regex is necessary if all you really need is to split on whole words.

You could do something like this and benchmark it to see whether it's faster or better:

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

$split = explode(' ', $text);
$result = array();
$temp = array();

foreach ($split as $s) {

    if (in_array($s, $splitby)) {
        if (sizeof($temp) > 0) {
            $result[] = implode(' ', $temp);
            $temp = array();
        }
    } else {
        $temp[] = $s;
    }
}

if (sizeof($temp) > 0) {
    $result[] = implode(' ', $temp);
}

var_dump($result);

/* output

array(4) {
  [0]=>
  string(7) "This is"
  [1]=>
  string(18) "string which needs"
  [2]=>
  string(2) "be"
  [3]=>
  string(12) "above words."
}
*/

The only difference from your expected output is the last element: "words." != "words", so it is not treated as a split word.

7 Comments

Thank you for your help. However, in_array() is very slow for large arrays; preg_split is much faster.
Maybe you're right, but you may get "Compilation failed: regular expression is too large at offset ******" if you use preg_split. I just tried it with an array of 5490 words and it failed.
Well, it turned out that preg_split was taking too long for my liking. See my solution below. Your solution is good, but the in_array() function has problems in PHP. A faster way to check whether a value exists in an array is to array_flip the array and then check for the key with isset(), which is about 1000x faster than using in_array().
array_flip + isset seems like a good idea, but the difference is "only" 30ms for an array of 200k elements.
In my experience the difference is seconds vs. hours, literally. I think there's a serious problem with in_array(). Anyway, neither preg_split nor the method I posted and then deleted has achieved what I want. I'm now testing your method modified to use isset().
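
For reference, a minimal sketch of the modification discussed in these comments: the loop from the answer above, with in_array() replaced by array_flip() plus isset(). The variable name $lookup is made up for illustration, and the speed difference is the commenters' claim, not something measured here.

```php
<?php
$splitby = array('these', 'are', 'the', 'words', 'to', 'split', 'by');
$text = 'This is the string which needs to be split by the above words.';

// array_flip() turns the words into array keys, so each membership
// check becomes a key lookup instead of a scan through the list.
// $lookup is a hypothetical name, not from the original answer.
$lookup = array_flip($splitby);

$result = array();
$temp = array();

foreach (explode(' ', $text) as $s) {
    if (isset($lookup[$s])) {
        // Split word found: flush the words collected so far, if any.
        if (count($temp) > 0) {
            $result[] = implode(' ', $temp);
            $temp = array();
        }
    } else {
        $temp[] = $s;
    }
}

// Flush whatever remains after the last split word.
if (count($temp) > 0) {
    $result[] = implode(' ', $temp);
}

var_dump($result);
```

As with the in_array() version, the last element is "above words." because "words." with the trailing period is not in $splitby.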
-1

Since the words in your $splitby array are not regular expressions, maybe you can use

str_split

2 Comments

str_split() cannot separate a string by a string. It merely splits a string into an array of chunks whose length is given by the last argument (which defaults to 1).
This answer doesn't make sense, considering he wants to split the string by the specific words, not split it into word-sized chunks.
