
The problem is relatively simple: given a binary string, find the shortest binary string that does not occur in it as a substring. An example:

input: 10100011
output: 110

I have tried using BFS, but I don't think it is efficient enough. Maybe some sort of bitmap + sliding window solution would work?

#include <iostream>
#include <queue>
#include <string>
using namespace std;

// True if sub occurs in s as a contiguous substring.
bool is_substring(const string& s, const string& sub) {
    return s.find(sub) != string::npos;
}

string shortestNotSubstring(const string& s) {
    // BFS over candidates in order of length, then lexicographic order.
    // Candidates are kept as strings so that leading zeros are preserved.
    queue<string> q;
    q.push("0");
    q.push("1");
    while(!q.empty())
    {
        string str = q.front(); q.pop();
        if(!is_substring(s, str)) return str;
        q.push(str + '0');
        q.push(str + '1');
    }
    return "";
}

int main() {
    string N;
    cin >> N;
    cout << shortestNotSubstring(N) << endl;
    return 0;
}

1 Answer


You can do this pretty easily in O(N) time.

Let W = ceiling(log2(N+1)), where N is the length of the input string S.

There are 2^W possible strings of length W. S has at most N-W+1 substrings of length W, which is at most N and therefore less than 2^W, so at least one string of length W must not be present in S.
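To make the counting argument concrete, here is a minimal sketch that computes W and the number of length-W windows. The helper names `guaranteedMissingLength` and `windowCount` are made up for illustration and are not part of the implementation further down, which starts from a minimum W of 2.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Smallest w with 2^w >= n + 1, i.e. w = ceil(log2(n + 1)).
// Hypothetical helper, used only to illustrate the counting argument.
unsigned guaranteedMissingLength(std::size_t n) {
    assert(n >= 1);  // the counting argument needs a non-empty input
    const unsigned bits = std::numeric_limits<std::size_t>::digits;
    unsigned w = 0;
    // (n >> w) != 0 means 2^w <= n, so w is still too small.
    while (w < bits && (n >> w) != 0) {
        ++w;
    }
    return w;
}

// Number of length-w windows in a string of length n.
std::size_t windowCount(std::size_t n, unsigned w) {
    return n >= w ? n - w + 1 : 0;
}
```

For the example input, N = 8 gives W = 4: there are 16 candidate strings of length 4 but only 5 windows, so at least 11 candidates must be missing.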

W is also less than the number of bits in a size_t, and it only takes O(N) space to store a mask of all possible strings of length W. Initialize such a mask to 0s, and then iterate through S using the lowest W bits in a size_t as a sliding window of the substrings you encounter. Set the mask bit for each substring you encounter to 1.

When you're done, scan the mask to find the first 0, and that will be a string of length W that's missing.
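The marking and scanning steps can be sketched in isolation like this. The names `markWindows` and `firstMissing` are hypothetical, and the sketch assumes, like the full implementation, that any character other than '0' counts as a 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mark every length-w window of s in a table of 2^w flags.
std::vector<bool> markWindows(const std::string& s, unsigned w) {
    assert(w >= 1 && w < 32);
    const std::size_t windowMask = (std::size_t(1) << w) - 1;
    std::vector<bool> seen(windowMask + 1, false);
    std::size_t window = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift the new bit in and keep only the lowest w bits.
        window = ((window << 1) | (s[i] != '0' ? 1u : 0u)) & windowMask;
        if (i + 1 >= w) {
            seen[window] = true;  // a full window has been read
        }
    }
    return seen;
}

// Index of the first unseen value, or seen.size() if every value was seen.
std::size_t firstMissing(const std::vector<bool>& seen) {
    return static_cast<std::size_t>(
        std::find(seen.begin(), seen.end(), false) - seen.begin());
}
```

On the example input 10100011 with w = 3, the windows are 101, 010, 100, 000, 001 and 011, so the first unmarked index is 6, which is 110 in binary.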

There may also be shorter missing strings, though, so merge the mask bits in pairs to make a mask for the strings of length W-1, and then also set the mask bit for the last W-1 bits in S, since those might not be included in any W-length string. Then scan the mask for 0s to see if you can find a shorter missing string.

As long as you keep finding shorter strings, keep merging the mask for smaller strings until you get to length 1. Each such operation halves the mask size, so the total work is 2^W + 2^(W-1) + ... < 2^(W+1), which doesn't affect the overall O(N) time for the whole algorithm.
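Here is a rough sketch of a single merge step on its own. The name `shrinkMask` is made up, and `lastWindow` is assumed to hold the final bits of S as an integer. Dropping the last bit of every length-w window yields every substring of length w-1 except possibly the one at the very end of S, which is why that one is marked separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given presence flags for all strings of some length w (size 2^w),
// build the flags for length w-1. lastWindow holds the last bits of S.
std::vector<bool> shrinkMask(const std::vector<bool>& seen, std::size_t lastWindow) {
    // The table size must be a power of two and at least 2.
    assert(seen.size() >= 2 && (seen.size() & (seen.size() - 1)) == 0);
    const std::size_t half = seen.size() / 2;
    std::vector<bool> shorter(half);
    for (std::size_t i = 0; i < half; ++i) {
        // i followed by 0 or by 1: either one proves that i occurs in S.
        shorter[i] = seen[2 * i] || seen[2 * i + 1];
    }
    // The last w-1 bits of S are not a prefix of any length-w window.
    shorter[lastWindow & (half - 1)] = true;
    return shorter;
}
```

For 10100011 at length 3, the values 0 through 5 are marked and the input ends in 011. Merging alone would leave 11 unmarked, because 11 only occurs at the very end of the input; the extra assignment is what marks it.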

Here's an implementation in C++:

#include <string>
#include <vector>
#include <algorithm>

std::string shortestMissingBinaryString(const std::string& instr) {
    const size_t len = instr.size();
    if (len < 2) {
        if (!len || instr[0] != '0') {
            return std::string("0");
        }
        return std::string("1");
    }
    // Find a string size guaranteed to be missing
    size_t W_mask = 0x3;
    unsigned W = 2;
    while(W_mask < len) {
        W_mask |= W_mask<<1;
        W+=1;
    }

    // Make a mask of all the W-length substrings that are present
    std::vector<bool> mask(W_mask+1, false);
    size_t lastSubstr=0;
    for (size_t i=0; i<len; ++i) {
        lastSubstr = (lastSubstr<<1) & W_mask;
        if (instr[i] != '0') {
            lastSubstr |= 1;
        }
        if (i+1 >= W) {
            mask[lastSubstr] = true;
        }
    }

    //Find missing substring of length W
    size_t found = std::find(mask.begin(), mask.end(), false) - mask.begin();

    // try to find a shorter missing substring
    while(W > 1) {
        unsigned testW = W - 1;
        W_mask >>= 1;
        // calculate masks for length testW 
        for (size_t i=0; i<=W_mask; i++) {
            mask[i] = mask[i*2] || mask[i*2+1];
        }
        mask.resize(W_mask+1);
        // don't forget the missing substring at the end
        mask[lastSubstr & W_mask] = true;

        size_t newFound = std::find(mask.begin(), mask.end(), false) - mask.begin();
        if (newFound > W_mask) {
            // no shorter string
            break;
        }
        W = testW;
        found = newFound;
    }

    // build the output string
    std::string ret;
    for (size_t bit = ((size_t)1) << (W-1); bit; bit>>=1) {
        ret.push_back((found & bit) ? '1': '0');
    }
    return ret;
}

6 Comments

Hi, thank you for your answer. I am slightly confused regarding what size_t means.
In C++, size_t is an unsigned integer type that's big enough to hold the size of a string or vector. en.cppreference.com/w/cpp/types/size_t
To clarify a bit for an implementation: so the recursive process starts with size of window = len(input), and we create an array where each entry corresponds to all sequences where len (strictly?) is n. So like 000, 001, etc for n = 3. And we go mark all such sequences that exist by calculating the corresponding position (out of all binary sequences) while we move the window across the input and flip 0 to 1 accordingly. A scan will show us the minimum missing sequence of that length. How do we merge to find smaller sequences in practice? I'm not sure I got that step.
Start with window size log(len(input)), not len(input), but otherwise yes. Merging sequences is easy: if you have 010 OR you have 011 then you have 01. You just OR bits in the mask together in pairs.
Allow me to rephrase. I see that 01 is bitwise or with either the first or last digit removed. I understand the operation but I don't understand the justification. Isn't the set of all w-1 subsequences all w subsequences with their first digit removed + all w subsequences with their last digit removed?