C++ string parsing (python style)

Question

I love how in python I can do something like:

points = []
for line in open("data.txt"):
    a,b,c = map(float, line.split(','))
    points += [(a,b,c)]

Basically, it reads a file line by line. Each line represents a point in 3D space as three numbers separated by commas.

How can this be done in C++ without too much headache?

Performance is not very important: this parsing only happens once, so simplicity matters more.

P.S. I know it sounds like a newbie question, but believe me, I've written a lexer in D (which is pretty much like C++) that reads text char by char and recognizes tokens.
It's just that, coming back to C++ after a long period of Python, I don't want to waste my time on such things.

How about some of the examples from the following? They are somewhat Python-esque: codeproject.com/KB/recipes/Tokenizer.aspx Furthermore, they are very efficient and somewhat elegant.
– Matthieu N., Commented Nov 4, 2010 at 2:04

hasen · Accepted Answer · 2009-02-14 03:57:46Z

24

I'd do something like this:

ifstream f("data.txt");
string str;
while (getline(f, str)) {
    Point p;
    // sscanf() returns the number of fields converted; skip malformed lines
    if (sscanf(str.c_str(), "%f, %f, %f", &p.x, &p.y, &p.z) == 3)
        points.push_back(p);
}

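x, y and z must be floats (for example, struct Point { float x, y, z; };), because %f in sscanf() expects a float*. The snippet also assumes using namespace std; and a vector<Point> points;.

And include:

#include <cstdio>
#include <fstream>
#include <string>
#include <vector>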

edited Feb 14, 2009 at 3:57

hasen


answered Feb 11, 2009 at 10:33

klew


2 Comments

j_random_hacker Over a year ago

If you decide to change from using floats to using doubles, don't forget to change each %f to %lf. A solution using operator>>() instead of sscanf() doesn't need to be changed in this case.
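To illustrate the comment above: with double members, each %f in the format string becomes %lf. A minimal sketch, where PointD and parse_point_d are hypothetical names chosen for this example; the spaces around the commas in the format string let sscanf() accept optional whitespace there:

```cpp
#include <cstdio>
#include <string>

// Hypothetical point type with double members
struct PointD { double x, y, z; };

// Parses "x, y, z" into p; returns true only if all three fields converted.
// %lf is required because the targets are double*, not float*.
bool parse_point_d(const std::string& line, PointD& p) {
    return std::sscanf(line.c_str(), "%lf , %lf , %lf", &p.x, &p.y, &p.z) == 3;
}
```

Checking the return value also catches short lines such as "1, 2", for which sscanf() reports only two conversions.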

hasen Over a year ago

I accepted this answer for brevity and straight-forwardness :)

9 revs, 2 users 96% Matthieu N. · Accepted Answer · 2010-12-13 05:34:36Z

19

The C++ String Toolkit Library (StrTk) has the following solution to your problem:

#include <string>
#include <deque>
#include "strtk.hpp"

struct point { double x,y,z; };

int main()
{
   std::deque<point> points;
   point p;
   strtk::for_each_line("data.txt",
                        [&points,&p](const std::string& str)
                        {
                           strtk::parse(str,",",p.x,p.y,p.z);
                           points.push_back(p);
                        });
   return 0;
}

More examples can be found here.

edited Dec 13, 2010 at 5:34

community wiki

9 revs, 2 users 96%
Matthieu N.


Konrad Rudolph · Accepted Answer · 2009-02-11 11:45:35Z

17

All these good examples aside, in C++ you would normally overload operator>> for your point type to achieve something like this:

point p;
while (file >> p)
    points.push_back(p);

or even:

copy(
    istream_iterator<point>(file),
    istream_iterator<point>(),
    back_inserter(points)
);

The relevant implementation of the operator could look very much like the code by j_random_hacker.
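A minimal sketch of such an operator>>, assuming a point struct with double members x, y and z (as in the StrTk answer) and one point per line in the form "x, y, z". This variant reads a line and checks the commas explicitly instead of using replace(), so it is an adaptation rather than j_random_hacker's exact code:

```cpp
#include <algorithm>  // std::copy with istream_iterator
#include <istream>
#include <iterator>   // std::istream_iterator, std::back_inserter
#include <sstream>
#include <string>
#include <vector>

struct point { double x, y, z; };

// Reads one "x, y, z" line into p. Setting failbit on malformed input is
// what makes `while (file >> p)` and istream_iterator stop.
std::istream& operator>>(std::istream& in, point& p) {
    std::string line;
    if (!std::getline(in, line))
        return in;                       // EOF or earlier error: state already set
    std::istringstream iss(line);
    char c1 = 0, c2 = 0;
    point tmp;
    if (iss >> tmp.x >> c1 >> tmp.y >> c2 >> tmp.z && c1 == ',' && c2 == ',')
        p = tmp;                         // only overwrite p on success
    else
        in.setstate(std::ios::failbit);  // malformed line ends the loop
    return in;
}
```

With this overload, both the while (file >> p) loop and the istream_iterator copy above should work unchanged. Because it consumes a whole line per point, a blank or malformed line stops the loop; whether that is too strict is a design choice.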

answered Feb 11, 2009 at 11:45

Konrad Rudolph


1 Comment

j_random_hacker Over a year ago

This is definitely the way to do it if you will input Point objects in several different places in your code.

j_random_hacker · Accepted Answer · 2009-02-11 13:51:26Z

14

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>     // For replace()

using namespace std;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) {
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    while (getline(f, str)) {
        replace(str.begin(), str.end(), ',', ' ');
        istringstream iss(str);
        Point p;
        iss >> p.a >> p.b >> p.c;
        points.push_back(p);
    }

    // Do something with points...

    return 0;
}

edited Feb 11, 2009 at 13:51

answered Feb 11, 2009 at 9:59

j_random_hacker


1 Comment

j_random_hacker Over a year ago

@Iraimbilanja: Although I traverse the string twice (first using replace(), then via iss), I suspect this is at least as fast in practice as the other solutions, with the possible exception of klew's sscanf()-based approach. CPUs are good at replace().

Benoît · Accepted Answer · 2011-01-11 07:15:35Z

7

This answer is based on the previous answer by j_random_hacker and makes use of Boost Spirit.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/spirit/include/classic.hpp>

using namespace std;
using namespace boost::spirit::classic;

struct Point {
    double a, b, c;
};

int main(int argc, char **argv) 
{
    vector<Point> points;

    ifstream f("data.txt");

    string str;
    Point p;
    // A rule used with the space_p skipper needs the phrase-level scanner type
    rule<phrase_scanner_t> point_p = 
           double_p[assign_a(p.a)] >> ',' 
        >> double_p[assign_a(p.b)] >> ',' 
        >> double_p[assign_a(p.c)] ; 

    while (getline(f, str)) 
    {
        // Only keep the point if the whole line matched
        if (parse(str.c_str(), point_p, space_p).full)
            points.push_back(p);
    }

    // Do something with points...

    return 0;
}

edited Jan 11, 2011 at 7:15

answered Feb 11, 2009 at 10:19

Benoît


2 Comments

JBeurer Over a year ago

Maybe because using boost::spirit to parse comma-separated lists is overkill? Boost::spirit significantly increases compile time.

Matthieu N. Over a year ago

Maybe because you're instantiating the rule inside the loop; that would typically be a huge source of inefficiency, and you'd be better off defining it outside the loop. Also, Spirit is overkill: it adds a huge amount of compilation time and is nearly impossible to debug, since its compiler warnings and error messages are simply incomprehensible.

Éric Malenfant · Accepted Answer · 2009-02-11 16:31:40Z

Fun with Boost.Tuples:

#include <boost/tuple/tuple_io.hpp>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator>   // istream_iterator, ostream_iterator, back_inserter

int main() {
    using namespace boost::tuples;
    typedef boost::tuple<float,float,float> PointT;

    std::ifstream f("input.txt");
    f >> set_open(' ') >> set_close(' ') >> set_delimiter(',');

    std::vector<PointT> v;

    std::copy(std::istream_iterator<PointT>(f), std::istream_iterator<PointT>(),
             std::back_inserter(v)
    );

    std::copy(v.begin(), v.end(), 
              std::ostream_iterator<PointT>(std::cout)
    );
    return 0;
}

Note that this is not strictly equivalent to the Python code in your question because the tuples don't have to be on separate lines. For example, this:

1,2,3 4,5,6

will give the same output as:

1,2,3
4,5,6

It's up to you to decide if that's a bug or a feature :)

Timo Geusch · Accepted Answer · 2009-02-11 09:55:27Z

3

You could read the file line by line from a std::ifstream, put each line into a std::string and then use boost::tokenizer to split it. It won't be quite as elegant or short as the Python version, but it is a lot easier than reading things in a character at a time.
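A sketch of the same line-then-split idea using only the standard library. Note that std::getline with ',' as the delimiter is swapped in here for boost::tokenizer, and split_commas and parse_line are hypothetical helper names:

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits one line on ',' (a standard-library stand-in for boost::tokenizer).
std::vector<std::string> split_commas(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, ','))
        fields.push_back(field);
    return fields;
}

// Converts each field to double; std::strtod skips leading whitespace.
std::vector<double> parse_line(const std::string& line) {
    std::vector<double> values;
    for (const std::string& f : split_commas(line))
        values.push_back(std::strtod(f.c_str(), nullptr));
    return values;
}
```

Calling parse_line() on each line read with std::getline(file, line) gives roughly the Python line.split(',') plus map(float, ...) behaviour, though strtod() silently yields 0 for non-numeric fields.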

answered Feb 11, 2009 at 9:55

Timo Geusch


Sanjaya R · Accepted Answer · 2009-02-11 20:58:00Z

1

It's nowhere near as terse, and of course I didn't compile this.

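#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost::algorithm;

// Each split token is an iterator_range, so copy it into a string before converting
float atof_s(const boost::iterator_range<string::iterator>& r) {
    return static_cast<float>(atof(string(r.begin(), r.end()).c_str()));
}

int main() {
    ifstream f("data.txt");
    string str;
    vector<vector<float> > data;
    while (getline(f, str)) {
        vector<float> v;
        split_iterator<string::iterator> e; // default-constructed: end of the split
        transform(make_split_iterator(str, token_finder(is_any_of(","))),
                  e, back_inserter(v), atof_s);
        v.resize(3); // only grab the first 3
        data.push_back(v);
    }
}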

answered Feb 11, 2009 at 20:58

Sanjaya R


1 Comment

JBeurer Over a year ago

Fugly, you know. You're reading CSV and you make it look like some sort of rocket science. Keep it simple.

dbr · Accepted Answer · 2009-10-25 14:19:26Z

1

One of Sony Pictures Imageworks' open-source projects is Pystring, which should make for a mostly direct translation of the string-splitting parts:

Pystring is a collection of C++ functions which match the interface and behavior of python’s string class methods using std::string. Implemented in C++, it does not require or make use of a python interpreter. It provides convenience and familiarity for common string operations not included in the standard C++ library

There are a few examples, and some documentation

answered Oct 25, 2009 at 14:19

dbr


Lior · Accepted Answer · 2011-04-19 15:15:48Z

1

All of these are good examples, yet they don't address the following:

a CSV file with a varying number of columns (some rows have more columns than others)
or one where some of the values contain whitespace (ya yb,x1 x2,,x2,)

So for those who are still looking, this class is pretty cool, though some changes might be needed: http://www.codeguru.com/cpp/tic/tic0226.shtml
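For comparison, a minimal standard-library sketch that handles both cases: rows with different numbers of columns, and fields that contain spaces or are empty (including a trailing empty field after the last comma). split_csv_row and read_rows are hypothetical names, and quoted fields with embedded commas are deliberately not handled:

```cpp
#include <sstream>  // std::istream, std::istringstream
#include <string>
#include <vector>

// Splits on ',' keeping empty fields (including a trailing one) and any
// spaces inside a field. Quoted fields with embedded commas are NOT handled.
std::vector<std::string> split_csv_row(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, possibly empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Reads all rows; each row keeps however many columns it actually has.
std::vector<std::vector<std::string>> read_rows(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line))
        rows.push_back(split_csv_row(line));
    return rows;
}
```

Keeping the fields as strings defers any numeric conversion to the caller, which is what makes ragged rows and text values like "ya yb" workable.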

answered Apr 19, 2011 at 15:15

Lior
