Haskell data structure with optimized access methods

Question

I want to create data types for accessing and modifying a simple music library, consisting of albums consisting of tracks. For a basic idea, consider the following:

data MusicCollection = MC { albums :: Seq Album }
data Album = Album { albumTitle :: String, tracks :: Seq Track }
data Track = Track { trackTitle :: String, tempo  :: Tempo }

data Tempo = Unknown | BPM Int

In addition to tempo, there may be other attributes like style or rating.

The above solution gives me fast access to random albums. Additionally, I'd like to have fast access to all tracks faster than a specified tempo. And again, it would be nice to have fast random access to the returned tracks:

fasterThan :: Int -> MusicCollection -> SomeRandomAccessCollection Track

Updating a track in the collection shouldn't be too slow as well.

Question:

I don't know if adding a Map Tempo (Seq Track) to the MusicCollection is best or if it's possible to immitate relational data bases somehow. Perhaps there are completely different solutions?

I currently don't want to use a data base, but it would be interesting to know criteria when to use them in desktop applications.

This is an interesting question, but it leaves me with many questions about your choices. Why a Seq for the collection? Is order important? Why do you need random access to the result of fasterThan? Should the result be sorted by increasing tempo, or preserve the order of the collection, or does it even matter? Are you willing to take up more memory in order to provide faster access to these queries? — Dan Burton
– Dan Burton, Commented Dec 20, 2011 at 20:33
The order is not important. "Random" access is to be taken literally here: I want to choose random values. When I was looking at Set, I didn't see how to achieve it with that and lists obviously require O(n) for this. In the meantime, I decided to give HDBC a try. After ehird's answer, I experimented with IxSet which seems nice for my original problem. I still have to learn how to design a database application in a rather pure way and so that it can easily be tested. — Duschvorhang
– Duschvorhang, Commented Dec 21, 2011 at 17:30
you may be interested in reviewing answers to my old question about selecting a random element in a Set. I never did get a very satisfactory answer to that; you'd think a Set of all things would be exemplary in O(1) random selection. Alas, for the current implementation, it is not so. — Dan Burton
– Dan Burton, Commented Dec 21, 2011 at 21:49

dflemstr · Accepted Answer · 2011-12-20 19:57:47Z

6

If you don't want to use databases, you pretty much have to make a Map for every relation that you want to have fast access to, as you described yourself.

Note that Data.Map.Map in the containers package uses ordered keys, so you can for example use Data.Map.split to get reasonably fast partitions of a map, which is useful to find "tracks faster than a specified tempo" (provided that you derive Ord for Tempo). Consequently, if you don't need ordered keys for a relation, you should use Data.HashMap.HashMap from unordered-containers instead, whose implementation is faster.

It is also possible to use in-memory relational databases with no additional dependencies. You can for example use persistent and Sqlite as the backend. How to do this is described in the persistent tutorial.

edited Dec 20, 2011 at 19:57

answered Dec 20, 2011 at 19:45

dflemstr

26.3k5 gold badges74 silver badges107 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

ehird · Accepted Answer · 2011-12-20 19:45:53Z

3

Yes, Map Tempo (Seq Track) is a good choice here, especially since its Ord-based structure lets you make queries like "all tracks with tempo greater than n" efficiently; cf. Data.Map.split.

This basically is imitating relational databases by using an index.

You may also be interested in IxSet and HiggsSet (an intended successor to IxSet by a different author), which are intended to augment a set structure with various indexes, especially in tandem with a transactional serialisation system like acid-state. They support convenient comparison queries like greater than, less than, etc. If you only have one or two indices, though, it's probably best just to roll your own.

answered Dec 20, 2011 at 19:45

ehird

40.9k3 gold badges184 silver badges184 bronze badges

7 Comments

Duschvorhang Over a year ago

Thanks! What I meant with "imitating a relational database" was that MusicCollection might have the Songs directly within it and Albums and Songs might reference each other with keys. Would one ever do something like this in Haskell?

ehird Over a year ago

Well, you could, of course. But Haskell supports cyclic structures just fine thanks to laziness, and sharing ensures that they'll point to each other in memory. So unless you need the unique key for some other purpose, there's no real reason you'd ever do something like that.

Dan Burton Over a year ago

Keying the map by tempo seems odd, given the somewhat granular differentiation of the beats-per-minute measurement, although for the desired operations, I suppose it is probably quite fast.

ehird Over a year ago

@DanBurton: Well, if the operation to implement is fasterThan, then a Map keyed on the tempo is pretty optimal :)

Duschvorhang Over a year ago

@ehird I have another question: For simplicity, consider the original implementation. If I want to update a Track in a MusicCollection, I can do it fast if I still know the Album it came from. But without the album, it would be much slower. Is this a typical issue? Will I always have to keep the Album around?

|

Luis Casillas · Accepted Answer · 2011-12-20 23:37:26Z

3

It's good that you mention databases, because relational databases are designed precisely to solve problems like these. The classic example from database theory is the Invoice/Items problem, which your problem here mirrors almost exactly: an invoice has individual items, yet sometimes we want to write reports that consider items without connecting them to their invoices. A database or data structure that forces us to go through the invoices to get at the items is making us do too much work in these cases.

So one of your alternatives here would just be to use a relational database (there are lightweight embedded ones like SQLite that might be the most appropriate). Another alternative is to make a more relational-like design, and model the problem as a handful of relations that can be joined with each other in multiple ways—possibly assisted by an index that speeds up a join.

So you might start with something like this, where the tracks are available at the top level of the collection (and thus it's easy to just search through the tracks):

data MusicCollection = MC { tracks :: Set Track
                          , albums :: Set Album
                          , albumToTracks :: Album -> [Track] }
data Track = Track { album :: Album, trackTitle :: String, tempo  :: Maybe Int }
                 deriving (Eq, Ord)
data Album = Album { albumTitle :: String }
                 deriving (Eq, Ord)

When you constructed the MusicCollection value, you'd have to make sure that you construct the albumToTracks function that allows you to quickly get an album's tracks. (Or you could use a Map Album [Track] instead of a function; the function is a bit more general.)

Another alternative I haven't worked out, but that may very well be worth looking into: a circular data structure where the tracks and the albums both refer to each other. You'd have data declarations like this:

data MusicCollection = MC { tracks :: Set Track, albums :: Set Album }
data Track = Track { album :: Album, trackTitle :: String, tempo  :: Maybe Int }
                 deriving (Eq, Ord)
data Album = Album { albumTitle :: String, tracks :: Set Track }
                 deriving (Eq, Ord)

The question here is, of course, whether you can "tie the knot" when you're constructing this data structure so that it can be finitely represented. If you can pull it off, well, this would be great...

answered Dec 20, 2011 at 23:37

Luis Casillas

30.3k7 gold badges51 silver badges103 bronze badges

3 Comments

Duschvorhang Over a year ago

I have not yet understood tying the knot. But when looking at it, I had the impression that updates might become expensive with it. How would I update the function albumToTracks :: Album -> [Track]?

ehird Over a year ago

The albumToTracks, function may be a bit more general, but it will be a lot slower to use if you need to update it.

Luis Casillas Over a year ago

Well, there's a reason I singled out the albumToTracks function, because it's a major fork.

Collectives™ on Stack Overflow

Haskell data structure with optimized access methods

Question:

3 Answers 3

Comments

7 Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Question:

3 Answers 3

Comments

7 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related