Convert vector<string> to unsigned char array in C++

Question

I have a string vector that holds some values. These values are supposed to be hex bytes but are being stored as strings inside this vector. The bytes were read from inside a text file actually, something like this:

(contents of the text file)

<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60
</jpeg1>

So far, my code starts reading at the line after the <jpeg1> tag and continues until the </jpeg1> tag; then, using the comma ',' as a delimiter, it stores the bytes in the string vector.

After splitting the string, the vector currently holds the values like this:

vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};

        and if I print this I get the following:
            0: 0xFF
            1: 0xD8
            2: 0xFF
            3: 0xE0
            4: 0x00
            5: 0x10
            6: 0x4A
            7: 0x46
            8: 0x49
            9: 0x46

I'd like to store these bytes in an unsigned char array, so that each element is treated as a hex byte rather than a string value.

Preferably something like this:

     unsigned char myHexArray[] = {0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60};

        if I print this I get:
            0:  
            1: ╪
            2:  
            3: α
            4:
            5: 
            6: J
            7: F
            8: I
            9: F

Solved!
Thanks for your help, guys. So far ranban282's solution has worked for me; I'll try the solutions provided by other users as well.

Asked like this, it's a duplicate of stackoverflow.com/questions/1070497/… .
– atlaste, Commented Apr 18, 2017 at 6:21
You might even extract the textual source from between the tags and include it in the C++ source ...
– Hagen von Eitzen, Commented Apr 18, 2017 at 6:22
@n.m No, it's not necessary. I'm using vectors because the function I'm using (copied from Stack Overflow) to split the string uses vectors.
– erik.martin, Commented Apr 18, 2017 at 6:27
Eventually, what is required is to read bytes from the text file and store them in an unsigned char array of some sort. :)
– erik.martin, Commented Apr 18, 2017 at 6:28

Matteo Italia · Accepted Answer · 2017-04-18 07:56:22Z

2

I wouldn't even go through the std::vector<std::string> stage; you don't need it, and it wastes a lot of allocations for no good reason. Just parse the string to bytes "online", as you read it.

If you already have an istream for your data, you can parse straight from it, although in my experience stream extraction performs terribly.

// `is` is an instance of some class derived from std::istream
std::vector<unsigned char> ret;
while(is) {
    int val = 0;
    is>>std::hex>>val;
    if(!is) {
        break; // failed conversion; remember to clean up the stream
               // if you need it later!
    }
    ret.push_back(val);
    if(is.get()!=',') break; // consume the separator; stop at anything else
}

If instead you have it in a string (as often happens when extracting data from an XML file), you can parse it either using istringstream and the code above (one extra string copy, and generally quite slow), or straight from the string using e.g. sscanf with %i. Say that your string is in a const char *sz:

std::vector<unsigned char> ret;
for(; *sz; ++sz) {
    int read = 0;
    int val = 0;
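    // sscanf returns EOF (not 0) if the input ends before a number is found
    if(sscanf(sz, " %i %n", &val, &read)!=1) break; // format error
    ret.push_back(val);
    sz += read;
    if(*sz != ',') break; // end of string or format error; ++sz skips the ','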
} 
// now ret contains the decoded string

If you are sure that the strings are always hexadecimal, regardless of the 0x prefix, and that whitespace is not present, strtol is a bit more efficient and IMO nicer to use:

std::vector<unsigned char> ret;
for( ;*sz;++sz) {
    char *endp;
    long val = strtol(sz, &endp, 16);
    if(endp==sz) break; // format error
    sz = endp;
    ret.push_back(val);
    if(*sz!=',') break; // end of string or format error; ++sz skips the ','
}

If C++17 is available, you can use std::from_chars instead of strtol to cut out the locale bullshit, which can break your parsing function (although that's more typical for floating point parsing) and slow it down for no good reason.
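
A minimal sketch of the std::from_chars route, assuming C++17 and input without whitespace. The helper name parse_hex_list is made up for illustration. Note that from_chars does not accept a 0x prefix, so the sketch strips it by hand; it also rejects values above 0xFF rather than truncating them.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: parses "0xFF,0xD8,..." into bytes with std::from_chars.
// Returns false on a format error; `out` then holds the bytes parsed so far.
bool parse_hex_list(std::string_view text, std::vector<unsigned char>& out) {
    std::size_t pos = 0;
    while (pos < text.size()) {
        // from_chars does not recognise "0x"/"0X", so skip the prefix by hand.
        if (text.size() - pos >= 2 && text[pos] == '0' &&
            (text[pos + 1] == 'x' || text[pos + 1] == 'X')) {
            pos += 2;
        }
        const char* first = text.data() + pos;
        const char* last = text.data() + text.size();
        unsigned int val = 0;
        auto [ptr, ec] = std::from_chars(first, last, val, 16);
        if (ec != std::errc() || val > 0xFF) return false; // not hex, or not a byte
        assert(ptr > first); // success means at least one digit was consumed
        out.push_back(static_cast<unsigned char>(val));
        pos = static_cast<std::size_t>(ptr - text.data());
        if (pos == text.size()) return true; // consumed everything
        if (text[pos] != ',') return false;  // unexpected separator
        ++pos;                               // skip the comma
    }
    return true; // empty input, or input ending in a comma
}
```

With this sketch, parse_hex_list("0xFF,0xD8", out) should return true and leave the two bytes 0xFF and 0xD8 in out.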

OTOH, if the performance is critical but from_chars is not available (or if it's available but you measured that it's slow), it may be advantageous to hand roll the whole parser.

auto conv_digit = [](char c) -> int {
    if(c>='0' && c<='9') return c-'0';
    // notice: technically not guaranteed to work;
    // in practice it'll work on anything that doesn't use EBCDIC
    if(c>='A' && c<='F') return c-'A'+10;
    if(c>='a' && c<='f') return c-'a'+10;
    return -1;
};
std::vector<unsigned char> ret;
for(; *sz; ++sz) {
    while(*sz == ' ') ++sz;
    if(*sz!='0' || (sz[1]!='x' && sz[1]!='X')) break; // format error
    sz+=2;
    int val = 0;
    int digit = -1;
    const char *sz_before = sz;
    while((digit = conv_digit(*sz)) >= 0) {
        val=val*16+digit; // or, if you prefer: val = val<<4 | digit;
        ++sz;
    }
    if(sz==sz_before) break; // format error
    ret.push_back(val);
    while(*sz == ' ') ++sz;
    if(*sz!=',') break; // end of string or format error; ++sz skips the ','
}

edited Apr 18, 2017 at 7:56

answered Apr 18, 2017 at 6:28

Matteo Italia


5 Comments

Matteo Italia Over a year ago

@n.m.: what would be the C++ idiomatic way to handle a parsing problem? std::istringstream? boost::spirit? std::locale::use_facet<whatever>::some_other_ridiculous_function_name_that_ultimately_calls_sscanf? Don't make me laugh... If the "C++ way" is a regression over the C way, let's keep the old one.

Matteo Italia Over a year ago

(the only better way I see to handle this task is actually strtol, although it doesn't have the "whatever base for free" benefit as %i or, if speed is really important and we can cut on the locale bullshit, a hand-rolled parser)

Jonas Over a year ago

Why is ret of type std::string as opposed to using std::vector?

n. m. could be an AI Over a year ago

std::istringstream would be the easy C++ way, and boost::spirit would probably get you more error checks for free. I have no idea why you bring facets and locales into the picture.

n. m. could be an AI Over a year ago

Anyway the answer has much more code now, so the comment is not that relevant.

ranban282 · Accepted Answer · 2017-04-18 06:44:01Z

1

If you're using C++11, you can use the stoi function.

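vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};
// base 0 makes stoi detect the "0x" prefix and parse the value as hex
unsigned char* myHexArray = new unsigned char[myString.size()];
for (unsigned i = 0; i < myString.size(); i++)
{
    myHexArray[i] = stoi(myString[i], nullptr, 0);
}
for (unsigned i = 0; i < myString.size(); i++)
{
    cout << myHexArray[i] << endl; // prints each byte as a character
}
delete[] myHexArray; // release the array once it is no longer needed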

The function stoi() was introduced in C++11. To use it with gcc, compile with the flag -std=c++11.

If you're using an older version of C++, you can use strtol instead of stoi. Note that strtol takes a C string, so you need to pass it the result of c_str():

myHexArray[i]=strtol(myString[i].c_str(),NULL,0);
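
Neither call above checks its input. As a hedged sketch, the strtol variant could be wrapped with validation like this; the helper name to_bytes and the choice to throw std::invalid_argument are assumptions for illustration, not part of the answer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts tokens such as "0xFF" to bytes and
// throws std::invalid_argument if a token is not a value in 0..255.
std::vector<unsigned char> to_bytes(const std::vector<std::string>& tokens) {
    std::vector<unsigned char> bytes;
    bytes.reserve(tokens.size());
    for (const std::string& tok : tokens) {
        char* end = nullptr;
        errno = 0;
        // Base 0 lets strtol pick the base from the prefix ("0x" means hex).
        const long val = std::strtol(tok.c_str(), &end, 0);
        // The whole token must be consumed, and at least one character parsed.
        const bool consumed_all = (end != tok.c_str() && *end == '\0');
        if (!consumed_all || errno == ERANGE || val < 0 || val > 0xFF) {
            throw std::invalid_argument("not a byte value: " + tok);
        }
        bytes.push_back(static_cast<unsigned char>(val));
    }
    assert(bytes.size() == tokens.size());
    return bytes;
}
```

Returning a std::vector<unsigned char> instead of a raw new[] array also removes the need for a matching delete[].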

answered Apr 18, 2017 at 6:44

ranban282


6 Comments

Richard Hodges Over a year ago

What's this unsigned char* nonsense? What's wrong with a vector of bytes?

erik.martin Over a year ago

@ranban, I'm using Code::Blocks with MinGW 4.9.2, and the compiler is already set to use C++11. I'm getting "stoi" was not declared in the scope. Using std::stoi gives a similar error: "stoi is not a member of std".

ranban282 Over a year ago

I read that stoi is not a member of the std namespace in the MinGW build that Code::Blocks uses. How did you get this solution working? Did you use strtol?

erik.martin Over a year ago

Well, apparently Code::Blocks has a bug. I searched online and someone suggested that I use the TDM-GCC MinGW compiler. I downloaded and installed it from sourceforge.net/projects/tdm-gcc and then set it as the compiler for Code::Blocks. It works now :)

ranban282 Over a year ago

Great, please upvote the answer if you found it helpful.


Galik · Accepted Answer · 2017-04-18 06:53:13Z

1

You can use std::stoul on each of your values and build your array using another std::vector like this:

std::vector<std::string> vs {"0xFF", "0xD8", "0xFF" ...};

std::vector<unsigned char> vc;
vc.reserve(vs.size());

for(auto const& s: vs)
    vc.push_back((unsigned char) std::stoul(s, 0, 0));

Now you can access your array with:

vc.data(); // <-- pointer to unsigned char array
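
As an illustration of handing that pointer to code that expects a plain unsigned char array, here is a small sketch. Both function names are hypothetical; the check itself only tests for the two bytes FF D8 that open a JPEG stream, as in the question's data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style consumer: takes a raw pointer plus a length and
// checks for the JPEG start-of-image marker (FF D8).
bool starts_with_jpeg_soi(const unsigned char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0); // a null pointer is only valid for an empty range
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xD8;
}

// Hypothetical wrapper showing how the vector is passed to such a function.
bool looks_like_jpeg(const std::vector<unsigned char>& vc) {
    // data() points at contiguous storage; the pointer stays valid
    // until the vector reallocates or is destroyed.
    return starts_with_jpeg_soi(vc.data(), vc.size());
}
```

Passing vc.size() alongside vc.data() matters because the raw pointer carries no length information of its own.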

answered Apr 18, 2017 at 6:53

Galik


Richard Hodges · Accepted Answer · 2017-04-18 07:18:10Z

Here's a complete solution, including a test and a rudimentary parser (for simplicity, it assumes that the XML tags are on their own lines).

#include <cstdint>
#include <string>
#include <sstream>
#include <stdexcept>
#include <regex>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <vector>

const char test_data[] =
R"__(<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60,
0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0
</jpeg1>)__";


struct Jpeg
{
    std::string name;
    std::vector<std::uint8_t> data;
};

std::ostream& operator<<(std::ostream& os, const Jpeg& j)
{
    os << j.name << " : ";
    const char* sep = " ";
    os << '[';
    for (auto b : j.data) {
        os << sep << std::hex << std::setfill('0') << std::setw(2) << std::uint32_t(b);
        sep = ", ";
    }
    return os << " ]";

}

template<class OutIter>
OutIter read_bytes(OutIter dest, std::istream& source)
{
    std::string buffer;
    while (std::getline(source, buffer, ','))
    {
        *dest++ = static_cast<std::uint8_t>(std::stoul(buffer, 0, 16));
    }
    return dest;
}

Jpeg read_jpeg(std::istream& is)
{
    auto result = Jpeg {};
    static const auto begin_tag = std::regex("<jpeg(.*)>");
    static const auto end_tag = std::regex("</jpeg(.*)>");
    std::string line, hex_buffer;
    if(not std::getline(is, line)) throw std::runtime_error("end of file");
    std::smatch match;
    if (not std::regex_match(line, match, begin_tag)) throw std::runtime_error("not a <jpeg_>");
    result.name = match[1];

    while (std::getline(is, line))
    {
        if (std::regex_match(line, match, end_tag)) { break; }
        std::istringstream hexes { line };
        read_bytes(std::back_inserter(result.data), hexes);
    }


    return result;
}

int main()
{
    std::istringstream input_stream(test_data);
    auto jpeg = read_jpeg(input_stream);

    std::cout << jpeg << std::endl;
}

expected output:

1 : [ ff, d8, ff, e0, 00, 10, 4a, 46, 49, 46, 00, 01, 01, 01, 00, 60, 12, 34, 56, 78, 9a, bc, de, f0 ]
