Efficient connected components from a 2D array

Question

In an m x n matrix consisting of 0s and 1s, we are tasked with counting the 4-directionally connected islands of 1s. A python implementation is as follows:

    def numIslands(grid):
        def sink(i, j):
            if 0 <= i < len(grid) and 0 <= j < len(grid[i]) and grid[i][j] == '1':
                grid[i][j] = '0'
                list(map(sink, (i+1, i-1, i, i), (j, j, j+1, j-1)))
                return 1
            return 0
        return sum(sink(i, j) for i in range(len(grid)) for j in range(len(grid[i])))

I have tried and failed to solve this problem in Haskell under the following constraints:

O(mn) time complexity
using only pure, total functions
in ~48 tokens or fewer (the size of the above according to wc -w)

Things I've tried:

custom DFS function returning a forest
scc from Data.Graph
dfsForest from algebraic-graphs

The issue in all cases is the plumbing code to convert the matrix into any nice graph representation where we can appeal to a library function is always too long.

Here you can see a working implementation using Algebra.Graph.AdjacencyMap that does a lot of redundant work:

data Coord = C Int Int deriving (Eq, Ord, Show)
data Cell a = Cell Coord a deriving (Eq, Ord)
type A = Cell Char

coord :: Cell a -> Coord
coord (Cell c _) = c

mkcell :: Int -> Int -> a -> Cell a
mkcell i j = Cell (C i j)

cells :: [[a]] -> [[Cell a]]
cells = zipWith (map . uncurry . mkcell) [0..] . map (zip [0..])

is1 :: A -> Bool
is1 (Cell _ '1') = True
is1 (Cell _  _ ) = False

cardinals :: Coord -> [Coord]
cardinals c = map ($ c) [up, down, left, right]
  where
    up    (C i j) = C (i+1) j
    down  (C i j) = C (i-1) j
    left  (C i j) = C i (j-1)
    right (C i j) = C i (j+1)

coordMap :: [[a]] -> Map Coord [a]
coordMap board = M.fromAscList [(C i j, [z]) | (i,zs) <- zip [0..] board, (j,z) <- zip [0..] zs]

mkstars :: [[A]] -> [(A, [A])]
mkstars xss = map (id &&& nbs) $ ones =<< xss
  where
    ones  = filter is1
    value = flip (M.findWithDefault []) (coordMap xss)
    nbs   = ones . value <=< cardinals . coord

islands :: [[Char]] -> Int
islands = length . dfsForest . stars . mkstars . cells

And some simple test cases:

main :: IO ()
main = do
  hspecWith defaultConfig {configFailFast = True} $ do
    describe "islands" $ do

      it "handles test case 0" $ do
        islands [ ['1', '1', '1', '1', '0']
                , ['1', '1', '0', '1', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '0', '0', '0'] ] `shouldBe` 1

      it "handles test case 1" $ do
        islands [ ['1', '1', '0', '0', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '1', '0', '0']
                , ['0', '0', '0', '1', '1'] ] `shouldBe` 3

      it "handles test case 2" $ do
        islands [['1']] `shouldBe` 1

Notice that we perform a pair of zip [0..]-like operations in two places. I think this ought to be unnecessary. This clocks in at 220 tokens according to vim, and that doesn't even include the dfsForest logic or the stuff from Data.Map.Strict! It feels ridiculous that programming in Haskell should be so verbose and complex, so I'm hoping it's just a skill issue.

Can anyone point me in the right direction? I'd also be happy with an answer of the form, "this is not possible but [here] is the best we can do."

If it helps, ultimately, my goal is to come up with a standard method for very concisely marshalling 2D array problems into a format compatible with the graph algorithms implemented in either containers, algebraic-graphs, or fgl. Although right now the stuff in algebraic-graphs is looking the nicest to me.

I think the main issue is that Haskell's lists are not arrays, so first converting the problem into a 2D Array solves probably the problem. So while the syntax [1,4,2,5] works in both Python and Haskell, it generates two different types of data structures. — willeM_ Van Onsem
– willeM_ Van Onsem, Commented Dec 13, 2024 at 11:53
Could you elaborate on the approach you're suggesting? A lot of the bread-and-butter functions from Data.Array are partial, which is why I tend to avoid it. — Brendan Langfield
– Brendan Langfield, Commented Dec 13, 2024 at 14:37
The Python implementation you claim is good doesn't meet the spec you give! — Daniel Wagner
– Daniel Wagner, Commented Dec 13, 2024 at 22:29
It certainly does not! And I do not claim it is good. Merely short. — Brendan Langfield
– Brendan Langfield, Commented Dec 14, 2024 at 2:03
I felt bad that my answer might be too specific to the exact problem being solved in this question. But with a little thought, I realized that essentially the same technique can be used with more traditional/general-purpose graph libraries as well, so I've updated it with an example showing how to use Data.Graph+scc, one of the approaches you say didn't work for you, without blowing up the code size significantly. Thought you might want a ping about it. — Daniel Wagner
– Daniel Wagner, Commented Dec 14, 2024 at 23:39

Daniel Wagner · Accepted Answer · 2024-12-14 23:43:07Z

Here's a fairly concise implementation:

{-# Language BlockArguments #-}

import Data.Partition
import Data.Vector (Vector)
import qualified Data.Set as S
import qualified Data.Vector as V

islands :: Vector (Vector Bool) -> Int
islands = length . nontrivialSets . fromSets . V.toList . V.concat . V.toList . \grid ->
    flip V.imap grid \i row ->
    flip V.imap row \j val -> S.fromList
    [ (i',j',lmao)
    | val
    , (i',j') <- [(i,j), (i-1,j), (i+1,j), (i,j-1), (i,j+1)]
    , or (grid V.!? i' >>= (V.!? j'))
    , lmao <- "01"
    ]

This uses data-partition and incurs a logarithmic overhead, as is common when translating mutable algorithms to immutable analogs naively, so runtime is O(mn log(mn)).^* The lmao variable doubles the size of each island; nontrivialSets only reports sets with size strictly greater than one, and we want to report on islands with just one node in them.

See it run in ghci:

> islands . fmap (fmap ('1'==)) . read $ "[\"11000\", \"11000\", \"00100\", \"00011\"]"
3

A similar technique can be done with containers if you prefer.

{-# Language BlockArguments #-}
import Data.Graph
import Data.Vector (Vector)
import qualified Data.Vector as V

-- I hate myself for this name. I also love myself for this name
ifordl' :: Vector a -> b -> (b -> Int -> a -> b) -> b
ifordl' v z f = V.ifoldl' f z v

islands :: Vector (Vector Bool) -> Int
islands = length . scc . (\(g, _, _) -> g) . graphFromEdges . \grid ->
    ifordl' grid [] \nodes i row ->
    ifordl' row nodes \nodes' j val ->
    [((), (i, j), filter (\(i',j') -> or (grid V.!? i >>= (V.!? j)))
                         [(i+1,j), (i-1,j), (i,j+1), (i,j-1)])
    |val] ++ nodes'

As with the previous approach, this is O(mn log(mn)).

^* It is well-known that any mutable algorithm can be made pure with an O(log n) multiplicative penalty. I believe it is an open question whether there are any problems which provably require that penalty, so it's likely that a cleverer theoretician than I could improve from here!

Brendan Langfield · Accepted Answer · 2025-02-04 01:35:21Z

Adding a slightly simplified version of my original attempt for posterity.

import Control.Arrow hiding (loop, left, right)

import Data.Maybe
import Data.Map.Strict (Map, (!?))

import Algebra.Graph.AdjacencyMap hiding (transpose, tree)
import Algebra.Graph.AdjacencyMap.Algorithm

import qualified Data.Map.Strict as M

type Coord = (Int, Int)
data Cell a = Cell { coord :: Coord, val :: a } deriving (Eq, Ord, Show)

cells :: [[a]] -> [Cell a]
cells xss = [Cell (i,j) x | (i, xs) <- zip [0..] xss, (j, x) <- zip [0..] xs]

mkstars :: (a -> Bool) -> [Cell a] -> [(Cell a, [Cell a])]
mkstars p cs = [(c, filter nice (nbs c)) | c <- cs, nice c]
  where
    nbs  = mapMaybe (vals !?) . cardinals . coord
    vals = M.fromAscList [(coord c, c) | c <- cs]
    nice = p . val
    cardinals (i,j) = [(i,j+1),(i,j-1),(i+1,j),(i-1,j)]

islands :: [[Char]] -> Int
islands = length . dfsForest . stars . mkstars (== '1') . cells

This is a fair bit shorter at ~137 tokens with type signatures, ~108 without. Note that you can see the bones of a general approach to the flavor of matrix search problems that involve some kind of aggregation (rather than returning the entire matrix modified).

We have some predicate p which denotes which entries of the matrix are actually part of our graph, and the nbs function could be factored out into something that modifies state in the Cell type, so that we could do things like store a running path or the tail of a word we're search for.

However, it may be the case that it can be expressed even more simply using the Data.Vector API, since it gives us nice things like imap and ifoldl'. (Thanks Daniel Wagner for introducing these to me!)

Collectives™ on Stack Overflow

Efficient connected components from a 2D array

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related