
Problem:

Extract information from a text file using PHP. The file is structured as follows:

  • Date (in the format YYYY-MM-DD)
  • Title
  • Text: value
  • Text: value
  • Text: value

Input:

2015-03-18
 Store A
Text 1: 5,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
 Store B
Text 1: 10,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
 Store C
Text 1: 15,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
2015-03-19
 Store D
Text 1: 20,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12

PHP Code (so far):

<?php
    // Creates array to store data from textfile
    $data       = array();

    // Opens text file
    $text_file  = fopen('data.txt', 'r');

    // Loops through each line
    while ($line = fgets($text_file))
    {
        // Checks whether line is a date
        if (preg_match("/^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1])$/", trim($line)))
        {
            $data[$line] = array();
        }
        else
        {
            $data[] = trim($line);
        }
    }

    // Removes first array key
    $data = array_slice($data, 1);

    // Prints out full array
    echo "<xmp>" . print_r($data, true) . "</xmp>";
 ?>

HTML Code:

<table border="1">
  <tr>
    <th>Date</th>
    <th>Store</th>
    <th>Text 1</th>
    <th>Text 2</th>
    <th>Text 3</th>
  </tr>
  <tr>
    <td>2015-03-18</td>
    <td>Store A</td>
    <td>5,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td></td>
    <td>Store B</td>
    <td>10,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td></td>
    <td>Store C</td>
    <td>15,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td>2015-03-19</td>
    <td>Store D</td>
    <td>20,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
</table>

Desired output:

[Image of the desired output, not preserved]

Questions:

  1. What is the appropriate way to extract and store the different values?
  2. What is the appropriate way to print out the information as in the output example?

Comments:

  • What is the question? Commented Mar 18, 2015 at 11:52
  • @kexxcream are you able to match the dates OK? If so, the output is not that hard. Can you share what you have tried? Commented Mar 18, 2015 at 12:21
  • @JayBlanchard The dates work fine. I have been able to add them to the array. What I am stuck on is how to detect what kind of information is in the different rows. I have updated the code. Commented Mar 18, 2015 at 12:26
  • You're welcome :) If you have any queries, feel free to post comments. I will answer them. Commented Mar 18, 2015 at 18:11
  • Drat, there is a failure if the last group in the file is the one skipped. No more edits to this answer. New, complete code can be found here: Pastebin - questions 29121286. Commented Mar 19, 2015 at 10:34

1 Answer

I am interested in the 'groups' of records in the source file:

  • Date group: indicated by a line with just a date on it
  • Store group: consists of
    • store name
    • price
    • a group of dates

Added requirement: print only the store groups dated today or later. I call this date the 'cutoff_date' in the code.
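A minimal sketch of why a plain string comparison is enough for the cutoff test, assuming every date is written in zero-padded YYYY-MM-DD form as in the sample input. The helper name `isBeforeCutoff` is hypothetical and is not part of the code in this answer.

```php
<?php
// Hypothetical helper (not part of the answer's code): true when a group
// date falls before the cutoff. Zero-padded YYYY-MM-DD strings sort in the
// same order as the dates they represent, so strcmp() is sufficient.
function isBeforeCutoff(string $groupDate, string $cutoffDate): bool
{
    return strcmp($groupDate, $cutoffDate) < 0;
}

$cutoff = '2015-03-19'; // fixed here for illustration; the answer defaults to today's date

var_dump(isBeforeCutoff('2015-03-18', $cutoff)); // bool(true):  group is skipped
var_dump(isBeforeCutoff('2015-03-19', $cutoff)); // bool(false): group is printed
var_dump(isBeforeCutoff('2015-04-01', $cutoff)); // bool(false): group is printed
```

If the dates could ever appear without zero padding (for example 2015-3-8), this shortcut would no longer hold and parsing with DateTime would be the safer route.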

I use a 'read-ahead' technique so that there is always a record to process.
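The read-ahead idea on its own can be sketched as follows. This is a simplified illustration, not the answer's code: the lines come from an in-memory array rather than a file, and the names `$lines`, `nextLine` and `$isDate` are made up for the example.

```php
<?php
// Illustrative only: the lines come from an array instead of a file.
$lines = ['2015-03-18', 'Store A', 'Store B', '2015-03-19', 'Store D'];

// Hypothetical reader: returns the next line, or '' once the input is exhausted.
function nextLine(array &$lines): string
{
    return count($lines) > 0 ? array_shift($lines) : '';
}

// True when the whole line is a YYYY-MM-DD date.
$isDate = function (string $line): bool {
    return preg_match('/^\d{4}-\d{2}-\d{2}$/', $line) === 1;
};

$groups  = [];
$current = nextLine($lines);            // read-ahead: one line is always in hand

while ($current !== '') {
    $date    = $current;                // a group starts with its date line
    $current = nextLine($lines);        // read ahead again

    // Consume lines until the line in hand starts the next group (or input ends).
    while ($current !== '' && !$isDate($current)) {
        $groups[$date][] = $current;
        $current = nextLine($lines);
    }
}

print_r($groups);
```

Because the next line is already in hand when a group ends, the outer loop never has to "un-read" a line to find out that a new date group has started.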

I supply functions to help 'identify things'. They make the controlling logic easier to see.
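As a rough illustration of what 'identifying things' means here, the sketch below labels a line using the same two tests the code relies on: where a date appears in the line, and whether the line starts with 'text'. The function `classifyLine` is hypothetical and simpler than the real control flow, which also depends on the order of the lines.

```php
<?php
// Hypothetical helper, not part of the answer's code: labels a trimmed line
// using the date offset and the 'text' prefix.
function classifyLine(string $line): string
{
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
    $found  = preg_match('/\d{4}-\d{2}-\d{2}/', $line, $m, PREG_OFFSET_CAPTURE);
    $offset = $found === 1 ? $m[0][1] : -1;

    if ($offset === 0) {
        return 'date';   // the line starts with a date: a new date group
    }
    if (stripos($line, 'text') === 0) {
        return 'text';   // a "Text N: value" line
    }
    return 'store';      // anything else is treated as a store name
}

foreach (['2015-03-18', 'Store A', 'Text 1: 5,00 USD', 'Text 2: 2015-03-18'] as $line) {
    echo str_pad($line, 20), ' => ', classifyLine($line), PHP_EOL;
}
```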

The code:

<?php // https://stackoverflow.com/questions/29121286/extract-information-from-text-file-using-php

/**
 * We only need to show store entries on or after a certain date;
 * I call this the 'cutoff_date'.
 *
 * It defaults to today's date.
 */
$now = new DateTime();
$CUTOFF_DATE = $now->format('Y-m-d');

// output stored in here
$outHtml = '<table border="1">
  <tr>
    <th>Date</th>
    <th>Store</th>
    <th>Text 1</th>
    <th>Text 2</th>
    <th>Text 3</th>
  </tr>';


// source - we use 'read-ahead' as it makes life easier
$sourceFile = fopen(__DIR__ . '/Q29121286.txt', 'rb');

$currentLine = readNextLine($sourceFile); // read-ahead

while (!empty($currentLine)) { // process until eof...

    // start of a date group...
    $currentGroupDate = $currentLine; // ignore this group if less than CUTOFF_DATE
    $currentLine = readNextLine($sourceFile); // read ahead

    while (!empty($currentGroupDate) && $currentGroupDate < $CUTOFF_DATE) { // find next date_group record
        while (!empty($currentLine) && datePosition($currentLine) !== 0) { // read to end of current group
            $currentLine = readNextLine($sourceFile);
        }
        $currentGroupDate = $currentLine;
        $currentLine = readNextLine($sourceFile); // read ahead
    }

    $htmlCurrentDate = $currentGroupDate; // only print the date once

    $html = '';
    // display all the rows for this 'date group' -- look for next 'date'
    while (!empty($currentLine) && datePosition($currentLine) !== 0) {

        $html = '<tr>';

        $html .= '<td>'. $htmlCurrentDate .'</td>';
        $htmlCurrentDate = ''; // only display the date once

        $html .= '<td>'. $currentLine .'</td>'; // store
        $currentLine = readNextLine($sourceFile);

        // process the price
        $lineParts = explode(':', $currentLine); // need the price...
        $html .= '<td>'. $lineParts[1] .'</td>';
        $currentLine = readNextLine($sourceFile);

        // now process the group of dates - look for a line
        // that starts with 'text' and must contain a date
        while (   !empty($currentLine)
                && isTextLine($currentLine)
                && datePosition($currentLine) >= 1) {

            $lineParts = explode(':', $currentLine); // need the date...
            $html .= '<td>'. $lineParts[1] .'</td>';
            $currentLine = readNextLine($sourceFile); // read next
        }

        // end of this group...
        $html .= '</tr>';

        $outHtml .= $html;

    } // end of 'dateGroup'
} // end of data file...

$outHtml .= '</table>';
fclose($sourceFile);


// display output
echo $outHtml;
exit;

/**
 * These routines hide the low-level processing;
 */

/**
 * Return the offset of the first date string in the line, or -1 if none is found
 * @param string $line
 * @return integer
 */
function datePosition($line)
{
    $result = preg_match("/\d{4}-\d{2}-\d{2}/", $line, $matches, PREG_OFFSET_CAPTURE);
    $pos = -1;
    if (!empty($matches)) {
        $match = current($matches);
        $pos = $match[1];
    }
    return $pos;
}

/**
 * Return whether the line is a text line (starts with 'text', case-insensitive)
 *
 * @param string $text
 * @return bool
 */
function isTextLine($text)
{
    return strpos(strtolower($text), 'text') === 0;
}

/**
 * Return the trimmed line, or an empty string at EOF.
 * Added a 'fudge' so we do not read past the EOF - ;-/
 * @param resource $handle
 * @return string
 */
function readNextLine($handle)
{
    static $isEOF = false;

    if ($isEOF) {
        return '';
    }

    $line = fgets($handle);
    if ($line !== false) {
        $line = trim($line);
    }
    else {
        $isEOF = true;
        $line = '';
    }
    return $line;
}

Original output from the supplied file:

| Date       | Store   | Text 1    | Text 2     | Text 3     |
|------------|---------|-----------|------------|------------|
| 2015-03-18 | Store A | 5,00 USD  | 2015-03-18 | 2015-03-12 |
|            | Store B | 10,00 USD | 2015-03-18 | 2015-03-12 |
|            | Store C | 15,00 USD | 2015-03-18 | 2015-03-12 |
| 2015-03-19 | Store D | 20,00 USD | 2015-03-18 | 2015-03-12 |
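The code above writes HTML directly and does not keep the values. For the first question (how to store them), one possible sketch collects everything into a nested array keyed by date and store. It assumes the layout from the question (a date line, then for each store a name line followed by "Label: value" lines); the function name `parseLines` and the array shape are illustrative choices, not part of the code above.

```php
<?php
// Hypothetical function, not part of the answer's code. Assumes the layout
// from the question: a date line, then for each store a name line followed
// by "Label: value" lines.
function parseLines(array $lines): array
{
    $data  = [];
    $date  = null;   // date group currently being filled
    $store = null;   // store currently being filled

    foreach ($lines as $raw) {
        $line = trim($raw);
        if ($line === '') {
            continue;                                   // skip blank lines
        }
        if (preg_match('/^\d{4}-\d{2}-\d{2}$/', $line) === 1) {
            $date  = $line;                             // start a new date group
            $store = null;
            $data[$date] = [];
        } elseif ($date !== null && $store !== null && strpos($line, ':') !== false) {
            [$label, $value] = explode(':', $line, 2);  // split at the first colon only
            $data[$date][$store][trim($label)] = trim($value);
        } elseif ($date !== null) {
            $store = $line;                             // any other line names a store
            $data[$date][$store] = [];
        }
    }
    return $data;
}

$data = parseLines([
    '2015-03-18',
    ' Store A',
    'Text 1: 5,00 USD',
    'Text 2: 2015-03-18',
    ' Store B',
    'Text 1: 10,00 USD',
]);

echo $data['2015-03-18']['Store A']['Text 1'], PHP_EOL; // prints "5,00 USD"
```

With the data in this shape, the table can be produced by two nested foreach loops, and filtering by a cutoff date becomes a simple check on the outer key.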

1 Comment

You nailed it, and saved the day. Thanks a lot! Now it's time for me to go through the code extensively and learn what you have done.
