By reading your quesiton it seems you can take benefit of quad trees, create a quad tree and run simulation for each segment on a diffrent processing unit. This will cause checking only happen to objects near to each other. but you'll need to sync you threads every cycle. Which means to transfer some of those boids from one processing group to another. in general every cycle will consist of 3 steps :
- Move all boids by one unit. (which can easily be processed using multiple threads)
- Assigning each boid to a group*. This means using an algorithm of O(n) you have to select which boids are most likely to make a collision. This can also be handled using multiple threads.
- In the end you have to check if two boids in a same group made collision.
*To create groups you can use the pattern below:

note that some boids may be a part of more than one group, but this pattern gives you a more accurate results. you may also create as many groups as you want using this pattern it's just a number you have to find for how many boids and the screen which screen size, what's is the best number of groups you need to create.
--edit--
thisthere is aanother idea about segmentation that came to my head just now (maik suggested that's what waswhich is described in the paper @LarsViklund sugested any how I think mine is not the same as his or at least I didn't get anything near this from that paper), this way there is far fewer double checkings and there is no need to increase/decrease number of threads between steps:

note that some areas are still a part of two groups. and width of area both group cover is exactly 2*maximum speed. In your case if boids move one pixel per simulation step you only need to share 2 pixel width area between each 2 group. and there is a small area which is a part of 4 groups. but in general this method is easier to implement and by far faster if implemented correctly. and by the way there is no reverse move this way, if some object can move it can move no more checking is required.