This question may be a better fit for the comp sci SE or the math SE as it leans on some non-trivial theoretical CS / combinatorics. If you need truly unbiased behavior, you'll probably need to ask over there. That being said, I suspect that may not be what you want though. for instance, as stated, your problem description would allow things like a 4x5 area divided into 20 paths of size 1, or a 4x5 area with 19 gap tiles.
The solution presented here does not provided that level of unbiased behavior. Here's the overview of a solution:
- Select your gap tiles.
- Build a spanning tree of the remaining area(s). Since the gaps are random & it's possible that they might partition the initial area into two disjoint regions.
- For each spanning tree, examine every intersection & modify the connectivity such that it connects at most two adjacent cells.
- Finally, examine each resulting path & consider splitting it as needed. As an extreme example, it's possible that your initial spanning tree has no branches (that is, it's a long serpentine path that fills the whole area); if that's undesirable, use some max length & subdivide the paths such that no region exceeds your threshold. Ensuring a minimum path length is not entirely possible with this approach; I suspect adding such a constraint results in a much harder problem that scales very poorly.
Here's an example, where * indicates a cell & . indicates a gap. Randomly pick some gaps:
* * * * *
* . * . *
* * * * *
* * * . *
Next, build a spanning tree. The only two ways to produce unbiased spanning trees (& by extension, unbiased mazes) are Wilson's algorithm or the Aldous-Broder algorithm (or a cross-over version that starts with one & switches to the other in an attempt to speed it up). If you don't need truly unbiased techniques, Kruskal's algorithm is pretty good & much faster. Jamis Buck's Mazes for Programmers is a great resource that covers the biases of numerous maze algorithms.
To continue, here's a possible spanning tree with the intersections labelled for later use:
*--* *--*--*
| |
* . * . *
| | |
*--A--B--*--C
| | |
*--* * . *
Next break the intersections. There are a couple of different ways to do this, each with some tradeoffs. Probably the least biased way is to first select the count (1-3 for a 3-way intersection, 2-4 for a 4-way) & then select the directions. Obviously breaking N connections on an N-way intersection will orphan the connecting node to a path of size 1. Alternatively, you could lower the max, thus ensuring that the connecting node attaches to at least one other node. This can still orphan other nodes, but that can sometimes be fixed as a post processing step if desired.
Starting with A, there are connections going Left Right & Down. Randomly select how many to break (1 or 2) & then randomly select that many. Let's say we break L & D, the result is now:
*--* *--*--*
| |
* . * . *
| | |
* *--B--*--C
| |
*--* * . *
At B, I'll break U & D. At C I'll break L, resulting in:
*--* W--*--*
| |
* . Y . *
| |
* *--*--* *
|
*--X Z . *
Note that Y & Z are orphaned. In this case, it's possible to join them to W & X respectively, but in general you cannot guarantee you'll have such options. You could attempt to find a different spanning tree, but that may not help as it's possible that some nodes may start as orphans:
* . * * *
. . * . *
* * * * *
Ultimately, it depends on what you're making. For most games, any reasonably balanced (even if subtly biased) solution is probably going to be good enough. If not, request migration to the comp sci SE or the math SE.