
I have a string vector that holds some values. These values are supposed to be hex bytes, but they are stored as strings inside the vector. The bytes were read from a text file that looks something like this:

(contents of the text file)

<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60
</jpeg1>

So far, my code starts reading at the line after the <jpeg1> tag, stops at the </jpeg1> tag, and uses the comma ',' as a delimiter to store the bytes in the string vector.

After splitting the string, the vector currently stores the values like this:

vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};

        and if I print this, I get the following:
            0: 0xFF
            1: 0xD8
            2: 0xFF
            3: 0xE0
            4: 0x00
            5: 0x10
            6: 0x4A
            7: 0x46
            8: 0x49
            9: 0x46

What I want is to store these bytes in an unsigned char array, so that each element is treated as a hex byte and not as a string.

Preferably something like this:

     unsigned char myHexArray[] = {0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60};

        if I print this, I get:
            0:  
            1: ╪
            2:  
            3: α
            4:
            5: 
            6: J
            7: F
            8: I
            9: F

Solved!
Thanks for your help, everyone. So far ranban282's solution has worked for me; I'll try the solutions provided by other users as well.

  • Do you need the vector of strings in the first place? Commented Apr 18, 2017 at 6:21
  • Asked like this, it's a duplicate of stackoverflow.com/questions/1070497/… . Commented Apr 18, 2017 at 6:21
  • you might even extract the textual source from between the tags and include it in the C++ source ... Commented Apr 18, 2017 at 6:22
  • @n.m. No, it's not necessary. I'm using vectors because the function I'm using to split the string (copied from Stack Overflow) uses vectors. Commented Apr 18, 2017 at 6:27
  • Ultimately, what I need is to read bytes from the text file and store them in an unsigned char array of some sort. :) Commented Apr 18, 2017 at 6:28

4 Answers

2

I wouldn't even go through the std::vector<std::string> stage: you don't need it, and it wastes a lot of allocations for no good reason. Just parse the string to bytes "online".

If you already have an istream for your data, you can parse straight from it, although I've had terrible experiences with its performance.

// is is some derived class of std::istream
std::vector<unsigned char> ret;
while(is) {
    int val = 0;
    is>>std::hex>>val;
    if(!is) {
        break; // failed conversion; remember to clean up the stream
               // if you need it later!
    }
    ret.push_back(val);
    if(is.get()!=',') break; // anything but a comma ends the list
}

If instead you have the data in a string, as often happens when extracting it from an XML file, you can either parse it with an istringstream and the code above (one extra string copy, and generally quite slow), or parse it straight from the string using e.g. sscanf with %i. Say that your string is in a const char *sz:

std::vector<unsigned char> ret;
for(; *sz; ++sz) {
    int read = 0;
    int val = 0;
    if(sscanf(sz, " %i %n", &val, &read)!=1) break; // format error (or EOF)
    ret.push_back(val);
    sz += read;
    if(*sz != ',') break; // end of string, or format error
} 
// now ret contains the decoded string

If you are sure that the values are always hexadecimal (with or without the 0x prefix) and that no whitespace is present, strtol is a bit more efficient and, IMO, nicer to use:

std::vector<unsigned char> ret;
for( ;*sz;++sz) {
    char *endp;
    long val = strtol(sz, &endp, 16);
    if(endp==sz) break; // format error
    sz = endp;
    ret.push_back(val);
    if(*sz != ',') break; // end of string, or format error
}

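If C++17 is available, you can use std::from_chars instead of strtol to cut out the locale handling, which can break your parsing function (although that's more typical for floating-point parsing) and slows it down for no good reason. Keep in mind that std::from_chars accepts neither a 0x prefix nor leading whitespace, so you have to skip those yourself.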

OTOH, if performance is critical but from_chars is not available (or it's available but you measured that it's slow), it may be advantageous to hand-roll the whole parser.

auto conv_digit = [](char c) -> int {
    if(c>='0' && c<='9') return c-'0';
    // notice: the standard only guarantees that '0'-'9' are contiguous;
    // 'A'-'F' and 'a'-'f' happen to be contiguous in ASCII and in EBCDIC
    if(c>='A' && c<='F') return c-'A'+10;
    if(c>='a' && c<='f') return c-'a'+10;
    return -1;
};
std::vector<unsigned char> ret;
for(; *sz; ++sz) {
    while(*sz == ' ') ++sz;
    if(sz[0]!='0' || (sz[1]!='x' && sz[1]!='X')) break; // format error
    sz+=2;
    int val = 0;
    int digit = -1;
    const char *sz_before = sz;
    while((digit = conv_digit(*sz)) >= 0) {
        val=val*16+digit; // or, if you prefer: val = val<<4 | digit;
        ++sz;
    }
    if(sz==sz_before) break; // format error
    ret.push_back(val);
    while(*sz == ' ') ++sz;
    if(*sz != ',') break; // end of string, or format error
}

5 Comments

@n.m.: what would be the idiomatic C++ way to handle a parsing problem? std::istringstream? boost::spirit? std::use_facet<whatever>::some_other_ridiculous_function_name_that_ultimately_calls_sscanf? Don't make me laugh... If the "C++ way" is a regression over the C way, let's keep the old one.
(the only better way I see to handle this task is actually strtol, although it lacks the "any base for free" benefit of %i, or, if speed is really important and we can cut out the locale overhead, a hand-rolled parser)
Why is ret of type std::string as opposed to std::vector?
std::istringstream would be the easy C++ way, and boost::spirit would probably get you more error checks for free; I have no idea why you bring facets and locales into the picture.
Anyway, the answer has much more code now, so the comment is not that relevant.
1

If you're using C++11, you can use the stoi function.

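#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};
    unsigned char* myHexArray = new unsigned char[myString.size()];
    for (unsigned i = 0; i < myString.size(); i++)
    {
        myHexArray[i] = stoi(myString[i], NULL, 0); // base 0: the "0x" prefix selects hex
    }
    for (unsigned i = 0; i < myString.size(); i++)
    {
        cout << myHexArray[i] << endl;
    }
    delete[] myHexArray; // release the array allocated with new[]
}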

The function stoi() was introduced in C++11. To compile with GCC, you should compile with the flag -std=c++11.

If you're using an older version of C++, you can use strtol instead of stoi. Note that strtol takes a C string, so you need to pass myString[i].c_str() rather than the std::string itself:

myHexArray[i]=strtol(myString[i].c_str(),NULL,0);

6 Comments

What's this unsigned char* nonsense? What's wrong with a vector of bytes?
@ranban, I'm using Code::Blocks with MinGW 4.9.2, and the compiler is already set to use C++11. I'm getting "'stoi' was not declared in this scope". Using std::stoi gives a similar error: "'stoi' is not a member of 'std'".
I read that stoi is not a member of the std namespace in MinGW, which Code::Blocks uses. How did you get this solution working? Did you use strtol?
Well, apparently the MinGW that ships with Code::Blocks has a bug. I searched online and someone suggested that I use the TDM-GCC MinGW compiler. I downloaded and installed it from here: sourceforge.net/projects/tdm-gcc and then used it as the compiler for Code::Blocks. It works now :)
Great, please upvote the answer if you found it helpful.
1

You can use std::stoul on each of your values and build your array using another std::vector like this:

std::vector<std::string> vs {"0xFF", "0xD8", "0xFF" ...};

std::vector<unsigned char> vc;
vc.reserve(vs.size());

for(auto const& s: vs)
    vc.push_back((unsigned char) std::stoul(s, 0, 0));

Now you can access your array with:

vc.data(); // <-- pointer to unsigned char array


0

Here's a complete solution including a test and a rudimentary parser (for simplicity, it assumes that the XML tags are on their own lines).

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

const char test_data[] =
R"__(<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60,
0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0
</jpeg1>)__";


struct Jpeg
{
    std::string name;
    std::vector<std::uint8_t> data;
};

std::ostream& operator<<(std::ostream& os, const Jpeg& j)
{
    os << j.name << " : ";
    const char* sep = " ";
    os << '[';
    for (auto b : j.data) {
        os << sep << std::hex << std::setfill('0') << std::setw(2) << std::uint32_t(b);
        sep = ", ";
    }
    return os << " ]";

}

template<class OutIter>
OutIter read_bytes(OutIter dest, std::istream& source)
{
    std::string buffer;
    while (std::getline(source, buffer, ','))
    {
        *dest++ = static_cast<std::uint8_t>(std::stoul(buffer, 0, 16));
    }
    return dest;
}

Jpeg read_jpeg(std::istream& is)
{
    auto result = Jpeg {};
    static const auto begin_tag = std::regex("<jpeg(.*)>");
    static const auto end_tag = std::regex("</jpeg(.*)>");
    std::string line, hex_buffer;
    if(not std::getline(is, line)) throw std::runtime_error("end of file");
    std::smatch match;
    if (not std::regex_match(line, match, begin_tag)) throw std::runtime_error("not a <jpeg_>");
    result.name = match[1];

    while (std::getline(is, line))
    {
        if (std::regex_match(line, match, end_tag)) { break; }
        std::istringstream hexes { line };
        read_bytes(std::back_inserter(result.data), hexes);
    }


    return result;
}

int main()
{
    std::istringstream input_stream(test_data);
    auto jpeg = read_jpeg(input_stream);

    std::cout << jpeg << std::endl;
}

Expected output:

1 : [ ff, d8, ff, e0, 00, 10, 4a, 46, 49, 46, 00, 01, 01, 01, 00, 60, 12, 34, 56, 78, 9a, bc, de, f0 ]
