2

I need to parse a text file into a PHP array. Here is my text file:

file: slide1.jpg    | title: Title here                     | description: Aenean eleifend ultrices leo at venenatis. Suspendisse luctus    | crop: top
file: slide2.jpg    | description: Phasellus ac tortor ut dolor blandit tincidunt   | title: Nullam cursus                                  | crop: bottom
file: slide3.jpg    | title: Hendrerit lacinia nisl         | description: Tortor ut dolor blandit tincidunt                                | crop: bottom
file: slide4.jpg    | title: Morbi hendrerit lacinia nisl   | description: Maecenas venenatis lectus vitae                                  | crop: left

I want to parse it into a structured array like this:

array(4) {
  "slide1.jpg" => array (
    "title"  => "Title here",
    "description"  => "Aenean eleifend ultrices leo at venenatis. Suspendisse luctus",
    "crop"  => "top"
  ),
  "slide2.jpg" => array (
    "title"  => "Nullam cursus",
    "description"  => "Phasellus ac tortor ut dolor blandit tincidunt",
    "crop"  => "bottom"
  ),
  "slide3.jpg" => array (
    "title"  => "Hendrerit lacinia nisl",
    "description"  => "Tortor ut dolor blandit tincidunt",
    "crop"  => "bottom"
  ),
  "slide4.jpg" => array (
    "title"  => "Morbi hendrerit lacinia nisl",
    "description"  => "Maecenas venenatis lectus vitae",
    "crop"  => "left"
  )
}

I tried using many repetitive foreach statements, but that was not efficient and the code got very lengthy. Does anybody know a simpler way to achieve this?

5 Comments
  • 1
    The content in the text file is not formatted as a CSV, and the "columns" do not line up. Are these text files generated somehow? Commented Mar 3, 2014 at 16:22
  • 3
    That's not a CSV file. At least, it's maybe a |SV. Commented Mar 3, 2014 at 16:22
  • 1
    A CSV file can be delimited by anything; it doesn't have to be a comma. I used | as the delimiter because a comma is a common text character. The file is manually edited and maintained, so I wanted an easy structure. A new line defines a new item, and pipes separate the properties. Commented Mar 3, 2014 at 16:26
  • 2
    If you're generating that yourself, you should use a format that's more amenable to parsing, e.g. XML, JSON, etc., especially since you're putting headers/metadata into each of those fields that would just need to be stripped out anyway. Commented Mar 3, 2014 at 16:34
  • @Cruising2hell Not true. The CSV format is well defined here: tools.ietf.org/html/rfc4180. It is true that there are variants "in the wild", but they all follow this spec fairly closely. Yours does not! Commented Mar 3, 2014 at 16:53

3 Answers

4

First of all: Be Careful!

This is a potentially hairy problem with many possible exceptions. The solution I provide does:

  • ... not use regexes, which should make the code more readable, maintainable, yada yada yada :)
  • ... not check if a value contains pipes |, which will trip up this thing. On the other hand, a value may safely contain colons.
  • ... not deal with multi-byte characters.
  • ... not care about performance.
  • ... assume the key "file" is always present.
  • ... not insert missing keys, which should be dealt with elsewhere in that case.

Take these notes into consideration before blindly copy/pasting! ;)

In addition, my solution contains the file-name in each element, which is redundant. But removing it would have made the solution messier without much gained value.

Here's a solution:

<?php

/**
* Parse a line of the file. Returns an associative array, using the part 
* before the colon as key, the following part as value.
*
* @param $line A line of text.
*/
function parse_line($line) {
  // split on each '|' character.
  $fields = explode('|', $line);
  $data = array();
  foreach($fields as $field) {
    // unpack key/value from each 'key: value' text. This will only split on 
    // the first ":", so the value may contain colons.
    list($key, $value) = explode(':', $field, 2);
    // remove surrounding white-space.
    $key = trim($key);
    $value = trim($value);
    $data[$key] = $value;
  }
  return $data;
}


/**
* Parses a file in the specified format.
*
* Returns an associative array, where the key is a filename, and the value is 
* an associative array of metadata.
*
* @param $fname The filename
*/
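function parse_file($fname) {
  $handle = fopen($fname, "r");
  $lines = array();
  if ($handle) {
    while (($line = fgets($handle)) !== false) {
      // skip blank lines, which contain no "key: value" pairs.
      if (trim($line) === '') {
        continue;
      }
      $data = parse_line($line);
      $lines[$data["file"]] = $data;
    }
    // release the file handle once all lines are read.
    fclose($handle);
  } else {
    // error opening the file.
  }
  return $lines;
}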

var_dump(parse_file("testdata.txt"));
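
If the redundant "file" entry or the missing keys turn out to matter, a small post-processing step could handle both. This is only a minimal sketch: the helper name normalize_rows() and the default values are assumptions made for illustration and are not part of the solution above.

```php
<?php
// Hypothetical helper (not part of the solution above): drops the redundant
// "file" entry from each row and fills in any keys that are missing.
function normalize_rows(array $rows, array $defaults) {
  $result = array();
  foreach ($rows as $file => $row) {
    unset($row['file']);
    // array_merge keeps the values from $row and falls back to $defaults.
    $result[$file] = array_merge($defaults, $row);
  }
  return $result;
}

// Sample input shaped like the return value of parse_file().
$rows = array(
  'slide1.jpg' => array('file' => 'slide1.jpg', 'title' => 'Title here', 'crop' => 'top'),
);
// Assumed defaults; adjust them to whatever the application expects.
$defaults = array('title' => '', 'description' => '', 'crop' => 'top');

$clean = normalize_rows($rows, $defaults);
var_dump($clean);
```

Keeping this step separate leaves parse_line() and parse_file() unchanged, so the parsing code stays as simple as it is.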

2 Comments

Thank you very much. This is closest to what I need.
I can only stress the note of @MarcB. If you have control over the process which generates that file, use a parsable format like JSON or XML. These formats deal with all strange corner-cases you may come across. Especially when the file contains unexpected input!
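
To illustrate the JSON suggestion, here is a rough sketch of how the same data might be stored and loaded. The JSON layout is an assumption chosen to mirror the array requested in the question; only json_decode() and json_last_error_msg() are standard PHP functions.

```php
<?php
// Assumed JSON layout for the slide data; the structure is illustrative only.
$json = <<<'JSON'
{
  "slide1.jpg": {"title": "Title here", "description": "Aenean eleifend ultrices leo at venenatis. Suspendisse luctus", "crop": "top"},
  "slide2.jpg": {"title": "Nullam cursus", "description": "Phasellus ac tortor ut dolor blandit tincidunt", "crop": "bottom"}
}
JSON;

// Passing true as the second argument returns associative arrays.
$slides = json_decode($json, true);
if ($slides === null) {
  throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

var_dump($slides['slide2.jpg']['crop']);
```

With this format, values may contain pipes or colons without any special handling, because the JSON parser takes care of quoting.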
1

The following should do the trick.

$rows = array();

foreach (preg_split('#\n#', file_get_contents('blah.txt')) as $line) {
  if (preg_match_all('#([^"|]+)\s*:\s*([^|]+)#', $line, $parts)) {
    $properties = array_map('trim', $parts[1]);
    $values = array_map('trim', $parts[2]);

    assert(count($properties) == count($values));

    $row = array();
    foreach ($properties as $index => $propertyName) {
      $row[$propertyName] = $values[$index];
    }
    $rows[] = $row;
  }
}

var_dump($rows);

3 Comments

Interesting, but it didn't run. Can you please define $parts? I think it might be running into an error.
$parts doesn't need to be defined, since preg_match_all() populates it when run. If needed, you can create it before the if as $parts = array().
A pattern like #\n# doesn't really require regex: it is a literal newline character. Just explode on it. If you want to accommodate the different newline sequences across all operating systems, use \R when splitting.
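
As a quick sketch of the \R suggestion: the sample string below is made up for illustration and mixes Windows and Unix line endings.

```php
<?php
// Sample content with mixed line endings (\r\n and \n), for illustration only.
$content = "file: a.jpg | crop: top\r\nfile: b.jpg | crop: left\n";

// \R matches any newline sequence; PREG_SPLIT_NO_EMPTY drops empty pieces,
// such as the one after the trailing newline.
$lines = preg_split('/\R/', $content, -1, PREG_SPLIT_NO_EMPTY);

var_dump($lines);
```

If the file is known to use plain \n line endings, explode("\n", $content) would do the same job without a regex.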
-1

Try:

$new_array = array();
while (($data = fgetcsv($csvfile, 1000, ";")) !== FALSE) {
    $new_array[$data[0]] = array('title' => $data[1], 'description' => $data[2], 'crop' => $data[3]);
}

var_dump($new_array);

1 Comment

Thanks, but this seems to be going into a forever loop.
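
A likely cause of the endless loop is that $csvfile is never opened in the snippet: on older PHP versions fgetcsv() returns null for an invalid handle, and null !== FALSE is always true. The following is only a sketch of how the approach might be adapted, assuming "|" as the delimiter and an in-memory stream standing in for the real file; each field still has to be split into key and value by hand.

```php
<?php
// In-memory stream standing in for the real file (an assumption for this sketch).
$csvfile = fopen('php://memory', 'w+');
fwrite($csvfile, "file: slide1.jpg | title: Title here | crop: top\n");
rewind($csvfile);

$new_array = array();
// Use "|" as the delimiter; enclosure and escape are passed explicitly.
while (($data = fgetcsv($csvfile, 0, '|', '"', '\\')) !== false) {
  $row = array();
  foreach ($data as $field) {
    // Skip fields without a "key: value" shape (e.g. from a blank line).
    if ($field === null || strpos($field, ':') === false) {
      continue;
    }
    // Split on the first colon only, so values may contain colons.
    list($key, $value) = explode(':', $field, 2);
    $row[trim($key)] = trim($value);
  }
  // Index the row by its file name and drop the redundant "file" entry.
  if (isset($row['file'])) {
    $file = $row['file'];
    unset($row['file']);
    $new_array[$file] = $row;
  }
}
fclose($csvfile);

var_dump($new_array);
```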
