
I've got a large string that I want to split into an array, with each element holding 50 words. I thought about using str_split to cut it, but realised that won't take the words into consideration; it just splits when it reaches x characters.

I've read about str_word_count but can't work out how to put the two together.

What I've got at the moment is:

$outputArr = str_split($output, 250);

foreach ($outputArr as $arOut) {
    echo $arOut;
    echo "<br />";
}

But I want each item of the array to hold 50 words instead of 250 characters.

Any help will be much appreciated.

  • off-topic: your coming soon page is amazing =) Commented Aug 29, 2012 at 10:45
  • related stackoverflow.com/questions/790596/… Commented Aug 29, 2012 at 10:45

2 Answers


Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter and then use array_chunk to split the words into groups of 50:

$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);

You now have an array of arrays. To join each group of 50 words back together and turn it into an array of strings, you can use:

foreach ($chunks as &$chunk) { // important: iterate by reference!
    $chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference the loop leaves behind on the last element
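Put together, the approach above fits in a small function. This is a sketch rather than the answer's exact code: the name chunkWords is hypothetical, and it uses array_map with an arrow function (PHP 7.4+) instead of the by-reference loop, which sidesteps the dangling-reference issue entirely.

```php
<?php
// Hypothetical helper: str_word_count + array_chunk, as in the answer above.
function chunkWords(string $text, int $wordsPerChunk = 50): array
{
    // Format 1 returns an array of the words found in $text.
    $words = str_word_count($text, 1);

    // Group the words into arrays of at most $wordsPerChunk words each.
    $chunks = array_chunk($words, $wordsPerChunk);

    // Join each group back into one space-separated string.
    return array_map(fn(array $chunk): string => implode(' ', $chunk), $chunks);
}

// Small demo with 3 words per chunk so the grouping is easy to see.
foreach (chunkWords('one two three four five six seven', 3) as $piece) {
    echo $piece, "<br />\n";
}
```

With the default of 50, chunkWords($output) returns an array that can take the place of $outputArr in the question's foreach loop.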

¹ Most probably it is not. If you want what most humans would consider acceptable results when processing written language, you will have to use preg_split with a suitable regular expression instead.
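As one sketch of the preg_split route, assume (a simplification, not a definition given in the answer) that a word is any run of non-whitespace characters. Punctuation then stays attached to its word and survives the split and join. The function name chunkByWhitespace is hypothetical.

```php
<?php
// Assumption: a "word" is any run of non-whitespace characters,
// so "No," and "question." are treated as single words.
function chunkByWhitespace(string $text, int $wordsPerChunk = 50): array
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
    // strings that leading or trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    foreach (array_chunk($words, $wordsPerChunk) as $chunk) {
        $result[] = implode(' ', $chunk);
    }
    return $result;
}

print_r(chunkByWhitespace('No, that is another question.', 2));
```

Tightening the regex (for example, to split on hyphens or dashes too) changes what counts as a word; that choice depends on your text.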


5 Comments

What about words separated by something other than a space?
@raina77ow: You would have to provide an ironclad definition of "word" first. Then, preg_split.
No, that is another question. For example, the previous sentence would be reconstructed as "No that is another question", at least if the comma is not considered part of the word "No".
@raina77ow: That's because of how str_word_count processes text. I explicitly mention that it's unlikely to be sufficient for written language. Regexes are going to be ugly (e.g. see this, and even that may not be good enough).
If you want punctuation to be included, you can pass a string of punctuation characters as the third parameter. E.g. str_word_count($string, 1, ',!?.;:');
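A short sketch of the difference the third parameter makes, using the sentence from the comment above. Characters listed there become part of the words, so chunking and joining with implode(' ', ...) brings the comma back.

```php
<?php
$sentence = 'No, that is another question.';

// Default word definition: the comma and the period are dropped.
$plain = str_word_count($sentence, 1);
// $plain is ['No', 'that', 'is', 'another', 'question']

// ',.' passed as extra word characters: punctuation stays attached.
$withPunctuation = str_word_count($sentence, 1, ',.');
// $withPunctuation is ['No,', 'that', 'is', 'another', 'question.']

// Chunking and joining now reproduces the original punctuation.
$chunks = array_map(
    fn(array $chunk): string => implode(' ', $chunk),
    array_chunk($withPunctuation, 3)
);
print_r($chunks);
```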

There's another way:

<?php

$someBigString = <<<SAMPLE
  This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;

// change this to whatever you need to:     
$number_of_words = 7; 

$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i", 
  $someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);

$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;

Demo.

I consider preg_split a better tool than str_word_count here. Not because the latter is inflexible (it is not: its third parameter lets you define which extra characters can make up a word), but because preg_split, given a limit, essentially stops processing the string once it has found N items.

The trick, as is common with this function, is to capture the delimiters as well, then use them to reconstruct the string with the first N words (where N is given) AND the punctuation between them preserved.

(Of course, the regex used in my example does not strictly match str_word_count's locale-dependent behavior. But it still restricts words to letters (a-z, case-insensitive), ' and -, with the latter two allowed neither at the beginning nor at the end of a word.)
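The same trick can be packaged as a reusable helper. This sketch keeps the answer's regex; the function name firstWords is hypothetical. The comments spell out the layout of the array preg_split returns, which is why slicing the first 2 * N elements yields exactly N words.

```php
<?php
// Hypothetical helper: returns the text up to and including the $n-th word,
// keeping the original punctuation and spacing between the words.
function firstWords(string $text, int $n): string
{
    // Same word definition as the answer: letters, then letters, ' or -,
    // but not ending in ' or -.
    $pattern = "#([a-z]+[a-z'-]*(?<!['-]))#i";

    // A limit of $n + 1 allows at most $n splits. With DELIM_CAPTURE the
    // result is [gap0, word1, gap1, word2, ..., word$n, rest-of-string].
    $parts = preg_split($pattern, $text, $n + 1, PREG_SPLIT_DELIM_CAPTURE);

    // The first 2 * $n elements run from the leading gap through word $n.
    return implode('', array_slice($parts, 0, $n * 2));
}

echo firstWords('This, actually, is a nice string, as they said.', 3), "\n";
```

If the text has fewer than $n words, the slice simply takes everything and the whole string comes back unchanged.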

1 Comment

If I misunderstood your question and what you actually need is to split the whole string into 50-word pieces, this solution can be used too, but the main reason to use preg_split would be lost :) So use Jon's solution instead.
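For the whole-string case, one possible adaptation (a sketch, not the answerer's code; chunkPreservingPunctuation is a hypothetical name) drops the limit and walks the captured words together with the text that follows each one. As a simplification, the text after a word stays with that word, so an opening quotation mark before the first word of a chunk ends up at the end of the previous chunk.

```php
<?php
// Hypothetical helper: split the whole text into chunks of $perChunk words,
// keeping punctuation, using the answer's regex with no limit.
function chunkPreservingPunctuation(string $text, int $perChunk): array
{
    $pattern = "#([a-z]+[a-z'-]*(?<!['-]))#i";
    // Even indexes hold the text between words, odd indexes hold the words;
    // the array always ends with a (possibly empty) gap.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $chunks = [];
    $current = $parts[0]; // whatever precedes the first word
    $count = 0;
    for ($i = 1; $i < count($parts); $i += 2) {
        // Append the word plus the text after it, so trailing punctuation
        // such as "," or "." stays with its word.
        $current .= $parts[$i] . $parts[$i + 1];
        $count++;
        if ($count === $perChunk) {
            $chunks[] = trim($current);
            $current = '';
            $count = 0;
        }
    }
    // Keep a final, partially filled chunk if it contains any words.
    if ($count > 0) {
        $chunks[] = trim($current);
    }
    return $chunks;
}

print_r(chunkPreservingPunctuation('This, actually, is a nice string.', 2));
```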
