How to extract data from a .list file with node.js

Question

I have a .list file containing information on movies. The file is formatted as follows:

New  Distribution  Votes  Rank  Title
      0000000125  1176527   9.2  The Shawshank Redemption (1994)
      0000000125  817264   9.2  The Godfather (1972)
      0000000124  538216   9.0  The Godfather: Part II (1974)
      0000000124  1142277   8.9  The Dark Knight (2008)
      0000000124  906356   8.9  Pulp Fiction (1994)

The code I have so far is as follows:

// modules I'll be using
var fs = require('fs');
var csv = require('csv'); // older csv module API: csv().from...to...

csv().from.path('files/info.txt', { delimiter: '  ' })
  .to.array(function (data) {
    console.log(data);
  });

But the values are separated by single spaces, double spaces and tabs, so there is no single delimiter I can use. How can I extract this information into an array?

Is this list file auto-generated, or did you create it manually? – Hüseyin BABAL, Commented Mar 20, 2014 at 13:02
Auto-generated; it's the IMDb one found at ftp.fu-berlin.de/pub/misc/movies/database – wazzaday, Commented Mar 20, 2014 at 13:02

Hüseyin BABAL · Accepted Answer · 2014-03-20 13:11:34Z

3

You can collapse each run of whitespace into a single tab, then read the result as a string and parse it with a tab delimiter, like this:

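var fs = require('fs');
var csv = require('csv'); // same older csv module API as in the question

fs.readFile('files/info.txt', 'utf8', function (err, csvdata) {
  if (err) {
    return console.log(err);
  }
  // Collapse each run of spaces/tabs into one tab. [ \t] is used instead of \s
  // so newlines are kept and every movie stays on its own row.
  var movies = csvdata.replace(/[ \t]+/g, '\t');

  csv().from.string(movies, { delimiter: '\t' })
    .to.array(function (data) {
      console.log(data);
    });
});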
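To see what the whitespace-collapsing step does to each row without installing the csv module, here is a minimal dependency-free sketch. It assumes the whole file fits in memory as one string; `splitOnWhitespaceRuns`, `sample` and `rows` are hypothetical names, and the sample lines are copied from the question. Each line is processed on its own so line breaks never become part of a delimiter.

```javascript
// Hypothetical helper: collapse every run of spaces/tabs on each line into a
// single tab, then split on that tab. Lines are handled one at a time so the
// row structure of the file survives.
function splitOnWhitespaceRuns(text) {
  return text
    .split(/\r?\n/)                                   // one entry per line
    .map(function (line) { return line.trim(); })     // drop leading indentation
    .filter(function (line) { return line.length > 0; }) // drop blank lines
    .map(function (line) {
      return line.replace(/[ \t]+/g, '\t').split('\t');
    });
}

var sample = [
  'New  Distribution  Votes  Rank  Title',
  '      0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '      0000000125  817264   9.2  The Godfather (1972)'
].join('\n');

var rows = splitOnWhitespaceRuns(sample).slice(1); // skip the header row
console.log(rows[0].length);
// rows[0] has 7 fields: the title is split into 'The', 'Shawshank', 'Redemption', '(1994)'
```

Because single spaces inside titles are treated as delimiters too, any multi-word title ends up spread across several fields.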

answered Mar 20, 2014 at 13:11

Hüseyin BABAL

3 Comments

José F. Romaniello Over a year ago

I think replacing only runs of two or more spaces with a tab would be better; otherwise "The Shawshank Redemption (1994)" will be parsed as four fields.

wazzaday Over a year ago

I decided to split on two or more spaces, replacing them with commas: data.replace(/\s{2,}/g, ","). Thanks for the response :)
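One caveat with \s{2,}: \s also matches newline characters, and because every data row in the sample begins with six spaces, a line break plus that indentation forms a run of whitespace that gets replaced by a comma, joining all rows into a single line. Below is a minimal sketch of the same two-or-more-spaces idea applied line by line instead. `parseRatings` is a hypothetical name, the object keys follow the header row, and it assumes no title contains two consecutive spaces.

```javascript
// Hypothetical helper: parse the ratings text line by line, treating a run of
// two or more spaces/tabs as the column separator. [ \t]{2,} is used instead of
// \s{2,} so newline characters are never swallowed into a delimiter.
// Assumes no title contains two consecutive spaces.
function parseRatings(text) {
  return text
    .split(/\r?\n/)                               // one entry per line
    .slice(1)                                     // skip the header row
    .filter(function (line) { return line.trim() !== ''; }) // drop blank lines
    .map(function (line) {
      var fields = line.trim().split(/[ \t]{2,}/);
      return {
        distribution: fields[0],                  // kept as a string (leading zeros)
        votes: Number(fields[1]),
        rank: Number(fields[2]),
        title: fields[3]
      };
    });
}

// Usage, with the file path from the question:
// var movies = parseRatings(require('fs').readFileSync('files/info.txt', 'utf8'));
```

Splitting each line separately keeps one movie per array entry, and the title stays intact because single spaces inside it are not treated as separators.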

Hüseyin BABAL Over a year ago

Good to hear that, an upvote would be appreciated :)

José F. Romaniello · Accepted Answer · 2014-03-20 13:07:31Z

0

It looks easy to parse with regex:

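var fs = require('fs');
var file = 'files/info.txt'; // path taken from the question

function parse(row) {
  var match = row.match(/\s{6}(\d*)\s{2}(\d*)\s{3}(\d*\.\d)/);
  if (!match) {
    return null; // row does not fit the pattern (e.g. a blank last line)
  }
  return {
    distribution: match[1],
    votes: match[2],
    rank: match[3]
  };
}

var movies = fs.readFileSync(file, 'utf8') // 'utf8' returns a string, not a Buffer
  .split('\n')
  .slice(1) // since we don't care about the first row
  .map(parse)
  .filter(Boolean); // drop rows that didn't match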

I'll leave it to you to build the rest of the regex. I use two tools for this: rubular.com and the Node.js REPL.

The pattern \s{6}(\d*)\s{2}(\d*) means: match 6 whitespace characters, then capture any number of digits, then match 2 whitespace characters, then capture another run of digits, and so on.
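One possible way to finish the regex is sketched below; it is not the answer's own pattern. It assumes one or more whitespace characters between columns rather than fixed counts, so small differences in padding do not break the match, and it assumes every title ends with a four-digit year in parentheses, as all the sample rows do. `ROW` and `parseRow` are hypothetical names.

```javascript
// Hypothetical completion of the row regex. Unlike the answer's pattern it
// allows one or more whitespace characters between columns, and it assumes each
// title ends with a four-digit year in parentheses, e.g. "(1994)".
var ROW = /^\s+(\d+)\s+(\d+)\s+(\d+\.\d)\s+(.+) \((\d{4})\)\s*$/;

function parseRow(row) {
  var match = row.match(ROW);
  if (!match) {
    return null; // header line, blank line, or a row in another shape
  }
  return {
    distribution: match[1],
    votes: Number(match[2]),
    rank: Number(match[3]),
    title: match[4],        // title without the trailing " (year)"
    year: Number(match[5])
  };
}

// Usage, reading the whole file as in the answer:
// var movies = fs.readFileSync(file, 'utf8').split('\n').map(parseRow).filter(Boolean);
```

Since the header line does not start with whitespace, it fails the match and is filtered out along with blank lines, so no separate `slice(1)` is needed.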

answered Mar 20, 2014 at 13:07

José F. Romaniello