3

I want to extract all the links from a page, but when I run this code I get:

Microsoft Visual C++ Debug Library

Debug Assertion Failed!

Program: C:\Users\Gandalf\Desktop\proxy\Debug\Proxy.exe File: C:\Program Files\Microsoft Visual Studio 10.0\VC\include\xstring Line: 78

Expression: string iterator not dereferencable

For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.

(Press Retry to debug the application)

Abort Retry Ignore

void Deltacore::Client::get_links() {
boost::smatch matches;
boost::match_flag_type flags = boost::match_default;
boost::regex URL_REGEX("^<a[^>]*(http://[^\"]*)[^>]*>([ 0-9a-zA-Z]+)</a>$");

if(!response.empty()) {

    std::string::const_iterator alfa = this->response.begin();
    std::string::const_iterator omega   = this->response.end();

    while (boost::regex_search(alfa, omega, matches, URL_REGEX))
    {
        std::cout << matches[0];
        //if(std::find(this->Links.begin(), this->Links.end(), matches[0]) != this->Links.end()) {
            this->Links.push_back(matches[0]);
        //}
        alfa = matches[0].second;
    }
}
}

Any ideas?

Added more code:

    Deltacore::Client client;
    client.get_url(target);
    client.get_links();

    boost::property_tree::ptree props;
    for(size_t i = 0; i < client.Links.size(); i++)
        props.push_back(std::make_pair(boost::lexical_cast<std::string>(i), client.Links.at(i)));

    std::stringstream ss;
    boost::property_tree::write_json(ss, props, false);

    boost::asio::async_write(socket_,
        boost::asio::buffer(ss.str(), ss.str().length()),
        boost::bind(&session::handle_write, this,
            boost::asio::placeholders::error));

Thanks in advance

12
  • Just try with the std::string::iterator instead of const_iterator. Commented Jul 26, 2012 at 21:52
  • @Wug It's in the C++ basic includes, I'm pretty sure the error is in my code. Commented Jul 26, 2012 at 21:56
  • @Mahesh boost::regex_search for some reason forces me to use std::string::const_iterator Commented Jul 26, 2012 at 21:57
  • Maybe it wants end() - 1 or something. That's an assertion right? (It says it is.) What is the value of the string before you get the iterators? Commented Jul 26, 2012 at 21:59
  • this->response is the full HTML output of a page (I get it using cURL). Commented Jul 26, 2012 at 22:00

2 Answers

4

The problem is on this line:

boost::asio::buffer(ss.str(), ss.str().length())

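str() returns a temporary std::string object, and the buffer only points at that string's memory without owning it. The temporary is destroyed at the end of the statement, so the buffer is already invalid while async_write may still be using it after the call returns. That is vanilla UB, as I commented. ;-]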

Token documentation citation:

The buffer is invalidated by any non-const operation called on the given string object.

Of course, destroying the string qualifies as a non-const operation.
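A minimal sketch of the usual way around this: keep the string alive for as long as the asynchronous operation needs it, for example by holding it in a std::shared_ptr that travels with the handler. To stay self-contained the sketch does not use Boost; BufferView, pending, fake_async_write, send_payload and run_pending are hypothetical stand-ins for boost::asio::buffer, the io_service queue and async_write, not real Asio names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a non-owning buffer: it records where the bytes live
// but does nothing to keep them alive.
struct BufferView {
    const char* data;
    std::size_t size;
};

// Stand-in for an event loop: queued operations run later, after the
// function that queued them has already returned.
std::vector<std::function<std::string()>> pending;

// Hypothetical async write: queues the work and returns immediately.
// keep_alive plays the role of state bound into the completion handler.
void fake_async_write(BufferView view, std::shared_ptr<std::string> keep_alive) {
    pending.push_back([view, keep_alive] {
        (void)keep_alive;  // held only to extend the string's lifetime
        return std::string(view.data, view.size);
    });
}

// Copies the payload into shared storage before creating the view, so the
// view never points into a temporary.
void send_payload(const std::string& json) {
    auto payload = std::make_shared<std::string>(json);
    fake_async_write(BufferView{payload->data(), payload->size()}, payload);
}   // the local payload handle is gone here; the queued lambda still owns the string

// Runs every queued operation and returns the bytes they "wrote".
std::string run_pending() {
    std::string out;
    for (auto& op : pending) out += op();
    pending.clear();
    return out;
}
```

In the code from the question, the equivalent would be to copy ss.str() into a member of the session (or into a shared_ptr bound into handle_write) and build the buffer from that object instead of from the temporary.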


1 Comment

That actually fixed it. Thank you. It's 2 AM and I'm writing bad code :/
1

Skipping the lecture on using regex to parse HTML (and how you really shouldn't...), your regex doesn't look like it will work like you intend. This is yours:

"^<a[^>]*(http://[^\"]*)[^>]*>([ 0-9a-zA-Z]+)</a>$"

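The first character class is greedy. The engine will still backtrack far enough for http:// to match, but if the tag contains more than one http://, the group captures the last one rather than the first. Add a question mark to make it non-greedy: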

"^<a[^>]*?(http://[^\"]*)[^>]*>([ 0-9a-zA-Z]+)</a>$"

This might or might not be related to the exception.
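As a rough illustration of the non-greedy version in a search loop, here is a sketch that uses std::regex as a stand-in for boost::regex. The helper name extract_links is hypothetical, and dropping the ^ and $ anchors and the link-text group is an assumption on my part, made because links normally sit in the middle of a page rather than filling a whole line.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the first http:// URL found inside each
// <a ...> opening tag of html. A sketch only; a real HTML parser is more robust.
std::vector<std::string> extract_links(const std::string& html) {
    // [^>]*? is lazy, so the capture group gets the first http:// in the tag.
    static const std::regex link_re("<a[^>]*?(http://[^\"]*)[^>]*>");

    std::vector<std::string> links;
    std::string::const_iterator first = html.begin();
    const std::string::const_iterator last = html.end();
    std::smatch m;

    while (std::regex_search(first, last, m, link_re)) {
        links.push_back(m[1].str());  // group 1 is the URL itself
        first = m[0].second;          // resume right after the matched tag
    }
    return links;
}
```

Storing m[1].str() copies the URL into its own std::string, so the results stay valid independently of the iterators into the searched text.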
