I have a large set (100,000) of fixed-length binary strings (length k), such as "011100001111000010" and "111011011110000100". Some of the strings have leading zeros. I'd like to obtain a list L of length k such that L[i] is the number of binary strings that have a 1 in the i-th position. For example:

Input:

"1011"
"0111"
"0111"

Output:

[1,2,3,3]

Since the number of binary strings is very large (100,000+) and k is around 100, using nested for loops seems very inefficient. What would be the most efficient (or at least a more efficient) way to tackle this?

1 Answer

There can be no asymptotically faster way than looking at every character at least once, since you have to inspect each character of each string to know which counters to increment. The only exception would be if you had some additional a priori knowledge about the strings (e.g., if they were sorted according to some ordering).

So you need two loops: an outer loop over all strings, and an inner loop over all characters of the current string. Then just increment the i-th counter whenever the string has a 1 as its i-th character.
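
A minimal Python sketch of the two loops, assuming the strings are held in a list and all have the same length k (the function name `count_ones_per_position` is only illustrative):

```python
def count_ones_per_position(strings):
    """Return a list whose i-th entry counts the strings with '1' at index i."""
    if not strings:
        return []
    k = len(strings[0])  # assumption: every string has the same length k
    counts = [0] * k
    for s in strings:                # outer loop: N strings
        for i, ch in enumerate(s):   # inner loop: k characters
            if ch == "1":
                counts[i] += 1
    return counts


print(count_ones_per_position(["1011", "0111", "0111"]))  # [1, 2, 3, 3]
```

This does O(N·k) work, which matches the lower bound argued above.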

Edit: Note that the problem is embarrassingly parallel, so it is easy to parallelise across threads. This will not make it asymptotically faster, but you can probably speed it up by a factor of up to the number of concurrent threads your CPU supports. Just note that efficient multithreaded programming is by no means simple for those unfamiliar with it.
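
As a rough illustration of that split-and-merge structure, here is a sketch in Python using `concurrent.futures.ThreadPoolExecutor`. The helper names and the `workers=4` default are assumptions made for the example. Be aware that in standard CPython the global interpreter lock usually prevents pure-Python loops like this one from running truly in parallel, so treat this as a sketch of the partitioning, not a guaranteed speed-up; a process pool or a language without that restriction may be needed for real gains.

```python
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk, k):
    """Count ones per position for one slice of the input."""
    counts = [0] * k
    for s in chunk:
        for i, ch in enumerate(s):
            if ch == "1":
                counts[i] += 1
    return counts


def count_ones_parallel(strings, workers=4):
    """Split the strings into chunks, count each chunk, then add the partial results."""
    if not strings:
        return []
    k = len(strings[0])  # assumption: every string has the same length k
    size = -(-len(strings) // workers)  # ceiling division
    chunks = [strings[j:j + size] for j in range(0, len(strings), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: count_chunk(c, k), chunks))
    # Merge step: sum the partial counters column by column.
    return [sum(col) for col in zip(*partials)]
```

Each worker touches only its own counter list, so no locking is needed until the final merge.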

2 Comments

Thank you for your reply. I am aware that I need to somehow check all characters; however, I'm not sure whether using two simple for loops is optimal. Perhaps it would be better to, say, convert all strings into NumPy arrays and then simply add them all up.
@BGa Even if you did that, nothing would change about the fact that every character (or bit) of every string has to be accessed once. So you still have an asymptotic complexity of O(N·k), where N is the number of strings, and k the string length. Note that in complexity theory, O(n) is the same as O(1000n + 10000), as constant factors are simply ignored.
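
The NumPy idea raised in the comments could look roughly like the sketch below. It assumes the strings contain only the ASCII characters "0" and "1" and all have the same length; the function name `count_ones_numpy` is made up for the example. The amount of work is still O(N·k), but the per-character loop runs inside NumPy instead of the Python interpreter, which typically lowers the constant factor.

```python
import numpy as np


def count_ones_numpy(strings):
    """Count ones per position by viewing the strings as an N x k byte matrix."""
    if not strings:
        return []
    k = len(strings[0])  # assumption: every string has the same length k
    # Concatenate everything into one ASCII buffer and view it as bytes.
    buf = np.frombuffer("".join(strings).encode("ascii"), dtype=np.uint8)
    matrix = buf.reshape(len(strings), k)
    # Compare against the byte value of "1" and sum each column.
    return (matrix == ord("1")).sum(axis=0).tolist()


print(count_ones_numpy(["1011", "0111", "0111"]))  # [1, 2, 3, 3]
```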
