I need to implement a simple graph database engine. What should I consider? First, I am unsure which data structure to use: a graph representation (such as an adjacency matrix or adjacency list) or the actual graph itself? It needs to be scalable. Next, how do I store the graph on disk as files? Once the graph data is stored in files, I also need a way to selectively load only certain files into the graph, since I cannot load everything into RAM at once. Sorry for being vague, but I need someone to point me in the right direction. Also, please suggest a language I can use; can I use Python for this project? Thank you.
- Why don't you just use an existing graph database like neo? – tddmonkey, Feb 27, 2016 at 19:06
- Because it defeats the purpose of the project? I'm talking about creating something like neo4j, but a much simpler version, not using it... – aditya sista, Feb 27, 2016 at 19:09
- I have already created such a database in Python, and there are several implementations. Have a look at pypi.python.org/pypi/ajgu and pypi.python.org/pypi/AjguDB, and also this post: hypermove.net/notes/do-it-yourself-a-graph-database-in-python – amirouche, Feb 28, 2016 at 10:14
- The basic idea is that you need to use a key/value store like leveldb (but it's slow) and build the graph data structure on top of it. You can go with a document store too, but they usually don't provide good ACID semantics. – amirouche, Feb 28, 2016 at 10:16
- @amirouche: I did, my reputation is too low for that to appear publicly... – aditya sista, May 17, 2016 at 9:26
1 Answer
Depending on your needs, you will implement a different interface to the database, i.e. an adjacency matrix or the graph itself.
Instead of using a file-based database, an important step forward is to use a key/value store like bsddb, leveldb or wiredtiger (preferred). This will take care of caching frequently accessed files, provide ACID semantics and, if you use wiredtiger, indices.
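To make the idea concrete, here is a minimal sketch of adjacency lists kept in an on-disk key/value file, so that only the neighbours of the vertex you ask for are read into memory. It uses Python's standard-library `dbm` module purely as a stand-in for bsddb, leveldb or wiredtiger (an assumption made so the example runs without extra packages; `dbm` does not give you ACID transactions). The class name `DiskAdjacency` and the JSON value encoding are hypothetical choices for illustration.

```python
import dbm
import json
import os
import tempfile


class DiskAdjacency:
    """Adjacency lists stored in an on-disk key/value file (hypothetical helper)."""

    def __init__(self, path):
        # "c" opens the database read/write and creates it if it is missing.
        self._db = dbm.open(path, "c")

    def add_edge(self, source, target):
        key = source.encode("utf-8")
        # Only this one vertex's neighbour list is read from disk.
        neighbours = json.loads(self._db[key]) if key in self._db else []
        if target not in neighbours:
            neighbours.append(target)
        self._db[key] = json.dumps(neighbours).encode("utf-8")

    def neighbours(self, source):
        key = source.encode("utf-8")
        return json.loads(self._db[key]) if key in self._db else []

    def close(self):
        self._db.close()


def demo():
    """Write a few edges, reopen the file, and read back one vertex."""
    path = os.path.join(tempfile.mkdtemp(), "graph")
    graph = DiskAdjacency(path)
    graph.add_edge("a", "b")
    graph.add_edge("a", "c")
    graph.add_edge("b", "c")
    graph.close()

    graph = DiskAdjacency(path)  # reopen: the data comes from disk
    result = graph.neighbours("a")
    graph.close()
    return result
```

Storing a whole neighbour list under one key is simple, but it means rewriting the list on every insert; a real key/value store with ordered keys lets you avoid that.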
The storage layer built on top of the key/value store can have several layouts. Which one to choose depends on the final interface you need.
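As an illustration of one possible layout (not the only one), the sketch below writes one key per edge direction, so that "outgoing edges of X" and "incoming edges of X" both become prefix scans over sorted keys. `OrderedKV` is a hypothetical in-memory stand-in for an ordered store such as leveldb or wiredtiger, and the key scheme (`out`/`in` prefixes, a NUL byte as separator) is an assumption made for this example.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered key/value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # keys kept sorted, as an ordered store would
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


SEP = b"\x00"  # assumed never to appear inside identifiers or labels


def add_edge(kv, src, label, dst):
    # One key per direction, so both lookups below are simple prefix scans.
    kv.put(SEP.join([b"out", src.encode(), label.encode(), dst.encode()]), b"")
    kv.put(SEP.join([b"in", dst.encode(), label.encode(), src.encode()]), b"")


def outgoing(kv, src):
    """Return (label, target) pairs for edges leaving src."""
    prefix = SEP.join([b"out", src.encode()]) + SEP
    return [tuple(part.decode() for part in key.split(SEP)[2:])
            for key, _ in kv.scan(prefix)]


def incoming(kv, dst):
    """Return (label, source) pairs for edges arriving at dst."""
    prefix = SEP.join([b"in", dst.encode()]) + SEP
    return [tuple(part.decode() for part in key.split(SEP)[2:])
            for key, _ in kv.scan(prefix)]
```

With this layout, adding an edge never rewrites existing data, and a traversal touches only the keys under the prefix of the vertex being visited. A different interface (for example, lookups by edge label first) would call for a different key order.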
To get started with developing custom databases on top of key/value stores, I recommend reading answered questions on SO, mostly those about leveldb and bsddb.
Like the following: