19

I love how in Python I can do something like:

points = []
for line in open("data.txt"):
    a,b,c = map(float, line.split(','))
    points += [(a,b,c)]

Basically, it reads a list of lines where each one represents a point in 3D space; each point is given as three numbers separated by commas.

How can this be done in C++ without too much headache?

Performance is not very important; this parsing only happens once, so simplicity matters more.

P.S. I know it sounds like a newbie question, but believe me, I've written a lexer in D (pretty much like C++) that reads text char by char and recognizes tokens.
It's just that, coming back to C++ after a long period of Python, I don't want to waste my time on such things.

1
  • 19
    How about some of the examples from the following? They are somewhat Python-esque: codeproject.com/KB/recipes/Tokenizer.aspx. Furthermore, they are very efficient and somewhat elegant. Commented Nov 4, 2010 at 2:04

10 Answers

24

I'd do something like this:

ifstream f("data.txt");
string str;
while (getline(f, str)) {
    Point p;
    // sscanf returns the number of fields converted; skip malformed lines.
    if (sscanf(str.c_str(), "%f, %f, %f", &p.x, &p.y, &p.z) == 3)
        points.push_back(p);
}

x,y,z must be floats.

And include:

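#include <cstdio>    // sscanf
#include <fstream>   // ifstream
#include <string>    // string, getline
#include <vector>    // assuming points is a std::vector<Point>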

2 Comments

If you decide to change from using floats to using doubles, don't forget to change each %f to %lf. A solution using operator>>() instead of sscanf() doesn't need to be changed in this case.
I accepted this answer for brevity and straight-forwardness :)
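
To illustrate the %lf point from the comment above, here is a minimal sketch of the same sscanf() idea for a point that stores doubles. The names PointD and parse_point are made up for this example and do not come from the answer; the return value is checked so that a short or malformed line is reported instead of silently leaving fields unset.

```cpp
#include <cstdio>
#include <string>

// Hypothetical point type that stores doubles instead of floats.
struct PointD {
    double x, y, z;
};

// Parses "x,y,z" from one line. Returns true only if all three numbers
// were converted. The spaces in the format string match any amount of
// whitespace (including none) around the commas.
bool parse_point(const std::string& line, PointD& p) {
    // %lf is required for double*; %f would expect a float*.
    return std::sscanf(line.c_str(), "%lf , %lf , %lf", &p.x, &p.y, &p.z) == 3;
}
```

Inside the getline() loop this would be used as `if (parse_point(str, p)) points.push_back(p);`.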
19

The C++ String Toolkit Library (StrTk) has the following solution to your problem:

#include <string>
#include <deque>
#include "strtk.hpp"

struct point { double x,y,z; };

int main()
{
   std::deque<point> points;
   point p;
   strtk::for_each_line("data.txt",
                        [&points,&p](const std::string& str)
                        {
                           strtk::parse(str,",",p.x,p.y,p.z);
                           points.push_back(p);
                        });
   return 0;
}

More examples can be found Here

Comments

17

All these good examples aside, in C++ you would normally overload operator >> for your point type to achieve something like this:

point p;
while (file >> p)
    points.push_back(p);

or even:

copy(
    istream_iterator<point>(file),
    istream_iterator<point>(),
    back_inserter(points)
);

The relevant implementation of the operator could look very much like the code by j_random_hacker.

1 Comment

This is definitely the way to do it if you will input Point objects in several different places in your code.
14
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>     // For replace()

using namespace std;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) {
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    while (getline(f, str)) {
        replace(str.begin(), str.end(), ',', ' ');
        istringstream iss(str);
        Point p;
        iss >> p.a >> p.b >> p.c;
        points.push_back(p);
    }

    // Do something with points...

    return 0;
}

1 Comment

@Iraimbilanja: Although I traverse the string twice (first using replace(), then via iss), I suspect this is at least as fast in practice as the other solutions, with the possible exception of klew's sscanf()-based approach. CPUs are good at replace().
7

This answer is based on the previous answer by j_random_hacker and makes use of Boost Spirit.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/spirit.hpp>

using namespace std;
using namespace boost;
using namespace boost::spirit;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) 
{
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    Point p;
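    // The rule is used with a skip parser (space_p), so it needs the
    // phrase-level scanner type rather than the default rule<>.
    rule<phrase_scanner_t> point_p =
           double_p[assign_a(p.a)] >> ','
        >> double_p[assign_a(p.b)] >> ','
        >> double_p[assign_a(p.c)] ;

    while (getline(f, str))
    {
        // This parse() overload takes a null-terminated C string.
        parse( str.c_str(), point_p, space_p );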
        points.push_back(p);
    }

    // Do something with points...

    return 0;
}

2 Comments

Maybe because using boost::spirit to parse comma separated lists is an overkill? Boost::spirit significantly affects compile time.
Maybe because of the fact that you're instantiating the rule inside the loop, typically this would be a huge source of inefficiency, you'd be better off having it defined outside of the loop. - Spirit is overkill, adds huge amount of compilation time, and is nearly impossible to debug, compiler warnings and error messages are simply incomprehensible.
4

Fun with Boost.Tuples:

#include <boost/tuple/tuple_io.hpp>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace boost::tuples;
    typedef boost::tuple<float,float,float> PointT;

    std::ifstream f("input.txt");
    f >> set_open(' ') >> set_close(' ') >> set_delimiter(',');

    std::vector<PointT> v;

    std::copy(std::istream_iterator<PointT>(f), std::istream_iterator<PointT>(),
             std::back_inserter(v)
    );

    std::copy(v.begin(), v.end(), 
              std::ostream_iterator<PointT>(std::cout)
    );
    return 0;
}

Note that this is not strictly equivalent to the Python code in your question because the tuples don't have to be on separate lines. For example, this:

1,2,3 4,5,6

will give the same output as:

1,2,3
4,5,6

It's up to you to decide if that's a bug or a feature :)

Comments

3

You could read the file from a std::ifstream line by line, put each line into a std::string, and then use boost::tokenizer to split it. It won't be quite as elegant or short as the Python version, but it's a lot easier than reading things in a character at a time.
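
As a rough illustration of the split step, here is a sketch that uses only the standard library. It swaps boost::tokenizer for std::getline() with ',' as the delimiter, so it shows the same idea rather than the Boost API. The function name split_numbers is made up for this example, and std::stod requires C++11.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line on commas and converts each field to double.
// std::stod throws std::invalid_argument if a field is not a number
// (an empty field, for example).
std::vector<double> split_numbers(const std::string& line) {
    std::vector<double> values;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))  // ',' is the delimiter here
        values.push_back(std::stod(field));
    return values;
}
```

Calling this once per line read with getline(f, str) gives the three coordinates; the caller can check that the returned vector has exactly three elements.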

Comments

1

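It's nowhere near as terse, and of course I didn't compile this.

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost::algorithm;

typedef boost::iterator_range<string::iterator> Token;

// Convert one token to float (atof rather than atoi, so the fraction is kept).
float to_float( const Token & t ) { return static_cast<float>( atof( string( t.begin(), t.end() ).c_str() ) ); }

int main() {
  ifstream f("data.txt");
  string str;
  vector<vector<float> > data;
  while( getline( f, str ) ) {
    vector<float> v;
    split_iterator<string::iterator> e; // default-constructed: end marker
    transform(
       make_split_iterator( str, token_finder( is_any_of( "," ) ) ),
       e, back_inserter( v ), to_float ); // append, since v starts empty
    v.resize(3); // only grab the first 3
    data.push_back(v);
  }
}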

1 Comment

Fugly, you know. You're reading CSV and you make it look like some sort of rocket science. Keep it simple.
1

One of Sony Pictures Imageworks' open-source projects is Pystring, which should make for a mostly direct translation of the string-splitting parts:

Pystring is a collection of C++ functions which match the interface and behavior of python’s string class methods using std::string. Implemented in C++, it does not require or make use of a python interpreter. It provides convenience and familiarity for common string operations not included in the standard C++ library

There are a few examples, and some documentation

Comments

1

All these are good examples, yet they don't handle the following:

  1. a CSV file with different column numbers (some rows with more columns than others)
  2. or when some of the values have white space (ya yb,x1 x2,,x2,)

So for those who are still looking, this class: http://www.codeguru.com/cpp/tic/tic0226.shtml is pretty cool, though some changes might be needed.
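
If the linked class is more than you need, the two cases above can be sketched with plain std::string calls. This is my own minimal example, not the code from that link: split_fields is a made-up name, every comma starts a new field, empty fields are kept, whitespace inside a value is left untouched, and quoted fields are not handled.

```cpp
#include <string>
#include <vector>

// Splits on every comma and keeps empty fields, so rows may have any
// number of columns. An empty line yields a single empty field.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last (possibly empty) field
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;  // continue after the comma
    }
    return fields;
}
```

The caller can then look at fields.size() for each row and decide how to treat short rows or empty values.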

Comments
