I'm writing an HTTP client that takes the URL of some file, downloads it, and saves it to disk, like curl does. I can use only C/C++ with std:: and libc. I have no problems downloading text files such as XML, CSV, or txt: they are saved correctly, and opening them in an editor shows the expected text. But when I download a tar or PDF file and try to open it, I'm told the file is corrupted.
Here are the 2 main methods of my HttpClient class. HttpClient::get sends an HTTP request to the host named in the URL and then calls the second method, HttpClient::receive, which determines whether the data is binary or text and writes the whole HTTP response body to a file in binary or text mode. I decided not to show the other methods, but I can if someone needs them.
HttpClient::get:
bool HttpClient::get() {
    std::string protocol = getProtocol();
    if (protocol != "http://") {
        std::cerr << "Don't support no HTTP protocol" << std::endl;
        return false;
    }
    std::string host_name = getHost();
    std::string request = "GET ";
    request += url + " HTTP/" + HTTP_VERSION + "\r\n";
    request += "Host: " + host_name + "\r\n";
    request += "Accept-Encoding: gzip\r\n";
    request += "Connection: close\r\n";
    request += "\r\n";
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        std::cerr << "Can't create socket" << std::endl;
        return false;
    }
    addr.sin_family = AF_INET;
    addr.sin_port = htons(HTTP_PORT);
    raw_host = gethostbyname(host_name.c_str());
    if (raw_host == NULL) {
        std::cerr << "No such host: " << host_name << std::endl;
        return false;
    }
    if (!this->connect()) {
        std::cerr << "Can't connect" << std::endl;
        return false;
    } else {
        std::cout << "Connection established" << std::endl;
    }
    if (!sendAll(request)) {
        std::cerr << "Error while sending HTTP request" << std::endl;
        return false;
    }
    if (!receive()) {
        std::cerr << "Error while receiving HTTP response" << std::endl;
        return false;
    }
    close(sock);
    return true;
}
HttpClient::receive:
bool HttpClient::receive() {
    char buf[BUF_SIZE];
    std::string response = "";
    std::ofstream file;
    FILE *fd = NULL;
    while (1) {
        size_t bytes_read = recv(sock, buf, BUF_SIZE - 1, 0);
        if (bytes_read < 0)
            return false;
        buf[bytes_read] = '\0';
        if (!file.is_open())
            std::cout << buf;
        if (!file.is_open()) {
            response += buf;
            std::string content = getHeader(response, "Content-Type");
            if (!content.empty()) {
                std::cout << "Content-Type: " << content << std::endl;
                if (content.find("text/") == std::string::npos) {
                    std::cout << "Binary mode" << std::endl;
                    file.open(filename, std::ios::binary);
                }
                else {
                    std::cout << "Text mode" << std::endl;
                    file.open(filename);
                }
                std::string::size_type start_file = response.find("\r\n\r\n");
                file << response.substr(start_file + 4);
            }
        }
        else
            file << buf;
        if (bytes_read == 0) {
            file.close();
            break;
        }
    }
    return true;
}
I can't find any help on this. I think the binary data is encoded in some way, but how do I decode it?
buf[bytes_read] = '\0'; Unless I'm mistaken about how you're reading the file: if the file is binary, why are you artificially sticking a null in the data? That would corrupt the binary data.

response += buf also won't work if there are nul characters in your binary data, which is very likely to be the case.

receive() is not properly parsing the HTTP response. It is just blindly reading arbitrary chunks of data until disconnected, trying to parse as it goes. You need to read the HTTP headers until you reach the terminating \r\n\r\n, THEN parse the headers to know the transmission format of the body, THEN read the body accordingly. See my answers to "Receiving only necessary data with C++ Socket" and "When is an HTTP response finished?" for pseudo code on reading an HTTP response properly.
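
To make the comments above concrete, here is a minimal sketch of a byte-safe receive loop. The names BodyExtractor and receiveBody are hypothetical and not part of the original HttpClient. It assumes a POSIX socket and a response that ends when the server closes the connection (as the Connection: close request implies). Both `file << buf` and `response += buf` treat buf as a C string and stop at the first zero byte, so the sketch passes explicit lengths everywhere instead. It also stores the recv() result in a signed ssize_t, because with size_t the `bytes_read < 0` check can never be true.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (not part of the original HttpClient): separates the
// header block from the body while treating everything as raw bytes.
class BodyExtractor {
public:
    explicit BodyExtractor(std::ostream& out) : out_(out) {}

    // Feed exactly the n bytes that recv() returned. No '\0' is appended and
    // no C-string function is used, so embedded zero bytes survive.
    void feed(const char* data, std::size_t n) {
        if (headers_done_) {
            out_.write(data, static_cast<std::streamsize>(n));
            return;
        }
        head_.append(data, n);  // length-aware append keeps '\0' bytes
        const std::string::size_type end = head_.find("\r\n\r\n");
        if (end == std::string::npos)
            return;  // header block is not complete yet
        headers_done_ = true;
        const std::size_t body_start = end + 4;
        // Bytes after the blank line already belong to the body.
        out_.write(head_.data() + body_start,
                   static_cast<std::streamsize>(head_.size() - body_start));
        head_.erase(body_start);  // keep only the headers for later parsing
    }

    bool headersDone() const { return headers_done_; }
    const std::string& headers() const { return head_; }

private:
    std::ostream& out_;
    std::string head_;
    bool headers_done_ = false;
};

// Reads until the peer closes the connection. Returns false on a socket
// error, a stream error, or if the header terminator never arrived.
bool receiveBody(int sock, std::ostream& out) {
    BodyExtractor extractor(out);
    char buf[4096];
    for (;;) {
        const ssize_t n = recv(sock, buf, sizeof buf, 0);  // signed: -1 is detectable
        if (n < 0)
            return false;
        if (n == 0)
            break;  // orderly shutdown by the peer
        extractor.feed(buf, static_cast<std::size_t>(n));
    }
    return extractor.headersDone() && out.good();
}
```

In receive(), this could be used by opening the file with std::ios::binary regardless of Content-Type and passing the stream to receiveBody(sock, file). The sketch deliberately ignores Content-Length and Transfer-Encoding: chunked, which a complete client has to parse from the headers before reading the body. Note also that the request sends Accept-Encoding: gzip, so a server is allowed to answer with a gzip-compressed body (Content-Encoding: gzip); without a decompressor available, it is simpler not to send that header.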