I have this code that ranks words from text files. It opens a file and outputs how many times each word in the file appears. This part works well. The second part is supposed to go through every text file in the given folder and output the total number of times each word appears across all the files. The issue is that the output is not the merged total: words are repeated, once per file. For instance, I get:

the -- 2
quick -- 1
brown -- 1
fox -- 1
jumped -- 1
over -- 1
lazy -- 1
dog -- 1
dog -- 2
a -- 2
lazy -- 1
fox -- 1
cannot -- 1
catch -- 1
fast -- 1
the -- 1
may -- 1
be -- 1

Instead of:

the -- 3
dog -- 3
fox -- 2
lazy -- 2
a -- 2
quick -- 1
brown -- 1
jumped -- 1
over -- 1
very -- 1
cannot -- 1
catch -- 1
fast -- 1
may -- 1
be -- 1

This is the entire code:

<?php
echo "<h3>Word Rank From One File</h3>";
$counted = strtolower(file_get_contents("docs/one.txt"));
$wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
$wordFrequencyArray = array_count_values($wordArray);

/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);

/* grab Top 10, huh sorted? */
$top10words = array_slice($wordFrequencyArray,0,10);

/* display them */
foreach ($top10words as $topWord => $frequency)
    echo "$topWord --  $frequency<br/>";

echo "<h3>Total From All Files</h3>";
$path = realpath('docs');
foreach(glob($path.'/*.*') as $file) {
    $counted = strtolower(file_get_contents($file));
    $wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
    $wordFrequencyArray = array_count_values($wordArray);
    $combine = array_merge($wordFrequencyArray);
    /* Sort array from higher to lower, keeping keys */
    arsort($wordFrequencyArray);

    /* grab Top 10, huh sorted? */
    $top10words = array_slice($wordFrequencyArray,0,10);

    /* display them */
    foreach ($top10words as $topWord => $frequency)
        echo "$topWord --  $frequency<br/>";
    }

?>

What am I doing wrong? The two sample text files contain:

The quick brown fox jumped over the lazy dog. The dog that the fox jumped ran so fast afterwards.

and

A lazy fox cannot catch a fast dog. The dog may be very quick.

I also noticed that some words have been skipped.

  • array_merge accepts one or more arrays for merging, what are you doing with the $combine = array_merge($wordFrequencyArray) ? Commented Jun 24, 2017 at 12:35
  • I'm trying to merge the output arrays from the text files that have been read. Like merging the output from one.txt and two.txt . So I'm assuming $wordFrequencyArray contains the two different outputs. Commented Jun 24, 2017 at 12:40
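
To make the point in the comments concrete, here is a minimal sketch of how array_merge treats count arrays. The arrays `$one` and `$two` are made-up stand-ins for the per-file results, not the actual contents of the files in the question.

```php
<?php
// Hypothetical per-file counts, standing in for array_count_values() results.
$one = ['the' => 2, 'fox' => 1];
$two = ['the' => 1, 'dog' => 2];

// With a single argument there is nothing to merge with:
// for string keys the result is the same array.
$single = array_merge($one);

// With two arrays, a later string key overwrites the earlier value.
// The counts are replaced, not added together.
$merged = array_merge($one, $two);

var_dump($single === $one);
print_r($merged);
```

So even with both arrays passed in, array_merge alone would not produce summed totals for words that occur in more than one file.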

1 Answer

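You must first aggregate the words from all of your files and only then count their frequencies. In your code, array_merge() is called with a single array, so it merges nothing, and the counting, sorting, and printing all happen inside the loop, once per file. That is why each file's words are listed separately.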

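$path = realpath('docs'); /* same folder as in the question */
$wordArrayTotal = [];
foreach (glob($path.'/*.*') as $file) {
    $counted = strtolower(file_get_contents($file));
    $wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
    /* Append this file's words to the running list; do not count yet */
    $wordArrayTotal = array_merge($wordArrayTotal, $wordArray);
}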

$wordFrequencyArray = array_count_values($wordArrayTotal);

/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);

/* grab Top 10, huh sorted? */
$top10words = array_slice($wordFrequencyArray, 0, 10);

/* display them */
foreach ($top10words as $topWord => $frequency) {
    echo "$topWord --  $frequency<br/>";
}
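
If you would rather keep the per-file counts (for example, to print them as well), an alternative is to add the counts together word by word. This is only a sketch: the helper name `addCounts` is made up for illustration, and the inline `$texts` strings are stand-ins for the contents you would read with file_get_contents().

```php
<?php
/**
 * Hypothetical helper: add the per-word counts in $counts to $totals.
 */
function addCounts(array $totals, array $counts): array
{
    foreach ($counts as $word => $n) {
        // Start from 0 the first time a word is seen.
        $totals[$word] = ($totals[$word] ?? 0) + $n;
    }
    return $totals;
}

// Assumed sample input, standing in for the contents of each file.
$texts = [
    "The quick brown fox. The dog.",
    "A lazy fox cannot catch a fast dog.",
];

$totals = [];
foreach ($texts as $text) {
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // Count this text on its own, then fold the counts into the totals.
    $totals = addCounts($totals, array_count_values($words));
}

/* Sort from higher to lower, keeping keys */
arsort($totals);

foreach ($totals as $word => $n) {
    echo "$word -- $n\n";
}
```

Merging the word lists first, as above, is shorter. Summing the counts avoids holding every word from every file in memory at once, which may matter for large folders.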