I'm learning C++ for one of my CS classes, and for our first project I need to parse some URLs using C strings (i.e. I can't use the C++ std::string class).

The only way I can think of approaching this is just iterating through (since it's a char[]) and using some switch statements. From someone who is more experienced in C++ - is there a better approach? Could you maybe point me to a good online resource? I haven't found one yet.

6 Answers

6

Weird that you're not allowed to use C++ standard library features such as std::string!

There are some C string functions available in the standard C library.

e.g.

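strdup - duplicate a string (defined by POSIX rather than guaranteed by the C++ standard library; the caller must free() the result)
strtok - break a string into tokens. Beware - this modifies the original string.
strcpy - copy a string
strstr - find a string within a string
strncpy - copy up to n bytes of a string. Beware - the result is not null-terminated if the source has n or more characters.
etc.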

There is a good online reference here with a full list of the available C string functions for searching and finding things.

http://www.cplusplus.com/reference/clibrary/cstring/
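To give a feel for how a couple of these fit together, here is a minimal sketch that uses strstr and strncpy to copy out the part of a URL before "://". The helper name copy_scheme and its parameters are made up for illustration; it is not a full solution to the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the scheme of `url` (the text before "://")
// into `out`, a buffer of `out_size` bytes. Returns false if "://" is
// missing or the scheme does not fit in the buffer.
bool copy_scheme(const char* url, char* out, std::size_t out_size) {
    assert(url != nullptr && out != nullptr);
    const char* sep = std::strstr(url, "://");  // pointer to the match, or nullptr
    if (sep == nullptr) return false;
    std::size_t len = static_cast<std::size_t>(sep - url);
    if (len + 1 > out_size) return false;       // need room for the '\0'
    std::strncpy(out, url, len);                // copies len chars, no terminator here
    out[len] = '\0';                            // so terminate the copy by hand
    return true;
}
```

The explicit out[len] = '\0' matters: strncpy only writes a terminator when the source is shorter than the count you pass it.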

You can walk through strings by accessing them like an array if you need to.

e.g.

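// Needs #include <cstring> and #include <iostream>
const char* url = "http://stackoverflow.com/questions/1370870/c-strings-in-c";
std::size_t len = std::strlen(url);  // length, not counting the terminating '\0'
for (std::size_t i = 0; i < len; ++i) {
  std::cout << url[i];  // print one character at a time
}
std::cout << std::endl;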

As for actually how to do the parsing, you'll have to work that out on your own. It is an assignment after all.


2 Comments

strdup is not in the standard library; it is defined by POSIX.
If he doesn't have strdup(), it would be a nice little part of the assignment to provide it. Bootstraps!
5

There are a number of C standard library functions that can help you.

First, look at the C standard library function strtok. This allows you to retrieve parts of a C string separated by certain delimiters. For example, you could tokenize with the delimiter / to get the protocol (with its trailing colon), the domain, and then each segment of the file path. You could tokenize the domain with delimiter . to get the subdomain(s), second level domain, and top level domain. Etc.

It's not nearly as powerful as a regular expression parser, which is what you would really want for parsing URLs, but it works on C strings, is part of the C standard library and is probably OK to use in your assignment.
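As a rough sketch of the tokenizing idea, the function below splits a writable buffer on / with strtok and records a pointer to each token. The name split_on_slash and the fixed-size parts array are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: splits `buf` in place on '/' and stores up to
// `max_parts` token pointers in `parts`. Returns the number stored.
// strtok overwrites each delimiter it finds with '\0', so `buf` must be
// a writable array, never a string literal.
std::size_t split_on_slash(char* buf, char* parts[], std::size_t max_parts) {
    assert(buf != nullptr && parts != nullptr);
    std::size_t n = 0;
    for (char* tok = std::strtok(buf, "/");     // first call takes the buffer
         tok != nullptr && n < max_parts;
         tok = std::strtok(nullptr, "/")) {     // later calls take nullptr
        parts[n++] = tok;
    }
    return n;
}
```

Called on a char array holding "http://stackoverflow.com/questions/1370870", this yields the tokens "http:", "stackoverflow.com", "questions" and "1370870". Note that strtok treats the run of slashes in "//" as a single delimiter and leaves the colon attached to the first token.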

Other C standard library functions that may help:

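  • strstr() Finds a substring within a string and returns a pointer to it, similar to std::string::find(). It does not extract anything; to copy a piece out you still need something like strncpy() or memcpy().
  • strchr() and strpbrk() Find a character, or any one of a set of characters, in a string, similar to std::string::find() and std::string::find_first_of(). strspn() returns the length of the leading run of characters drawn from a set, similar to std::string::find_first_not_of().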

Edit: A reminder that the proper way to use these functions in C++ is to include <cstring> and use them in the std:: namespace, e.g. std::strtok().
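Here is a small sketch of the search functions listed above, called through the std:: namespace. The three helper names are invented for the example and the character sets are simplifying assumptions, not a complete URL grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the leading run of lowercase letters,
// e.g. the "http" at the start of "http://...". Uses strspn.
std::size_t leading_letters(const char* s) {
    return std::strspn(s, "abcdefghijklmnopqrstuvwxyz");
}

// Hypothetical helper: given text that starts at the host name, returns a
// pointer to the first of ':', '/', '?' or '#', or to the terminating
// '\0' if none of them occurs. Uses strpbrk.
const char* host_end(const char* host) {
    const char* p = std::strpbrk(host, ":/?#");
    return p != nullptr ? p : host + std::strlen(host);
}

// Hypothetical helper: pointer to the text after the first '?', or
// nullptr if the URL has no query part. Uses strchr.
const char* query_start(const char* url) {
    const char* q = std::strchr(url, '?');
    return q != nullptr ? q + 1 : nullptr;
}
```

None of these modify the string, so they work directly on const char* input.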

2 Comments

strtok is pretty nasty since it modifies the string. I am a big fan of const so I'd recommend avoiding strtok.
IMO, strtok is quite useful and a lot less painful than hand-coding everything when it comes to parsing strings using only the C standard library. But yes, you do have to beware of its gotchas including the string modification and its non-reentrancy (although POSIX provides a re-entrant version called strtok_r)
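For anyone who wants to keep the input const, one possible alternative is to walk the string with strspn and strcspn and return a pointer plus a length instead of writing '\0' into it. This is only a sketch; nth_segment is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: finds the n-th (0-based) non-empty '/'-separated
// segment of `s` without modifying it. On success returns a pointer to
// the segment and stores its length in `*len`; the segment is NOT
// null-terminated. Returns nullptr if there are fewer than n+1 segments.
const char* nth_segment(const char* s, std::size_t n, std::size_t* len) {
    assert(s != nullptr && len != nullptr);
    while (true) {
        s += std::strspn(s, "/");                // skip any run of delimiters
        if (*s == '\0') return nullptr;          // ran out of segments
        std::size_t seg = std::strcspn(s, "/");  // length up to next '/' or end
        if (n == 0) {
            *len = seg;
            return s;
        }
        --n;
        s += seg;                                // move past this segment
    }
}
```

It is re-entrant as well, since all of its state lives in the arguments rather than in a hidden static pointer.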
1

You might want to refer to an open source library that can parse URLs (as a reference for how others have done it -- obviously don't copy and paste it!), such as curl or wget (links are directly to their url parsing files).

3 Comments

For some reason I doubt that's what his instructor is looking for.
@Michael: I thought the same as you until I realized he might mean for the questioner to use the source for ideas.
Fair enough... Now I wonder if someone who's unaware of C library basics will be able to keep his head from asploding reading through that code?
1

I don't know what the requirements are for parsing the URLs, but if this is CS level it would be appropriate to use (very simple) BNF and a (very simple) recursive descent parser.

This would make for a more robust solution than direct iteration, e.g. for malformed URLs.

Very few string functions from the standard C library would be needed.
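To make that concrete, here is a minimal sketch of a recursive descent parser for a deliberately tiny, made-up grammar (given in the comments). The grammar, the UrlParts struct and every function name are assumptions for illustration; real URLs allow far more than this.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Assumed toy grammar (not the real URL syntax):
//   url    = scheme "://" host [ ":" port ] [ path ]
//   scheme = ALPHA { ALPHA }
//   host   = hostch { hostch }        hostch = ALNUM | "." | "-"
//   port   = DIGIT { DIGIT }          (at most 65535)
//   path   = "/" { any character }
struct UrlParts {
    const char* scheme; std::size_t scheme_len;
    const char* host;   std::size_t host_len;
    int port;           // -1 when absent
    const char* path;   // "" when absent
};

// Each rule advances the cursor `p` past what it matched.
static bool parse_scheme(const char*& p, UrlParts& out) {
    const char* start = p;
    while (std::isalpha(static_cast<unsigned char>(*p))) ++p;
    out.scheme = start;
    out.scheme_len = static_cast<std::size_t>(p - start);
    return p != start;                 // at least one letter required
}

static bool expect(const char*& p, const char* literal) {
    std::size_t n = std::strlen(literal);
    if (std::strncmp(p, literal, n) != 0) return false;
    p += n;
    return true;
}

static bool parse_host(const char*& p, UrlParts& out) {
    const char* start = p;
    while (std::isalnum(static_cast<unsigned char>(*p)) || *p == '.' || *p == '-') ++p;
    out.host = start;
    out.host_len = static_cast<std::size_t>(p - start);
    return p != start;
}

static bool parse_port(const char*& p, UrlParts& out) {
    const char* start = p;
    int value = 0;
    while (std::isdigit(static_cast<unsigned char>(*p))) {
        value = value * 10 + (*p - '0');
        if (value > 65535) return false;   // out of range, also avoids overflow
        ++p;
    }
    out.port = value;
    return p != start;
}

// Top-level rule: returns true only if the whole string matches.
bool parse_url(const char* s, UrlParts& out) {
    assert(s != nullptr);
    const char* p = s;
    out.port = -1;
    out.path = "";
    if (!parse_scheme(p, out)) return false;
    if (!expect(p, "://")) return false;
    if (!parse_host(p, out)) return false;
    if (*p == ':') {                   // optional port
        ++p;
        if (!parse_port(p, out)) return false;
    }
    if (*p == '/') {                   // optional path: rest of the string
        out.path = p;
        p += std::strlen(p);
    }
    return *p == '\0';                 // reject trailing garbage
}
```

Each grammar rule becomes one small function, so a malformed input such as "http:/example.com" is rejected at the rule where it stops matching rather than being half-parsed.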


0

You can use C functions like strtok, strchr, strstr etc.


0

Many of the runtime library functions that have been mentioned work quite well, either in conjunction with or apart from the approach of iterating through the string that you mentioned (which I think is time honored).
