I’ve always been interested in how information is stored and retrieved. Inspired by a project of a friend (Levelgraph), I decided to research more about Hexastores and how they can be adapted to modern NoSQL databases such as LevelDB and Cassandra. Out of this fascination and curiosity I created Hexagon: a simple Python implementation of Hexastores backed by LevelDB.

The implementation of Hexagon is inspired by a talk of a friend and by this paper about sextuple indexing

Creating a new database and adding an entry

from hexagon import Hexagon as H
t = H(LevelDB('/tmp/friends/'))
t.insert(s='person:Daniel', p='likes', o='person:Francesco')

Will insert the following entries in LevelDB:

['ops::person::likes::person',
 'ops::person::likes::person:Daniel',
 'ops::person:Francesco::likes::person',
 'ops::person:Francesco::likes::person:Daniel',
 'osp::person::person::likes',
 'osp::person::person:Daniel::likes',
 'osp::person:Francesco::person::likes',
 'osp::person:Francesco::person:Daniel::likes',
 'pos::likes::person::person',
 'pos::likes::person::person:Daniel',
 'pos::likes::person:Francesco::person',
 'pos::likes::person:Francesco::person:Daniel',
 'pso::likes::person::person',
 'pso::likes::person::person:Francesco',
 'pso::likes::person:Daniel::person',
 'pso::likes::person:Daniel::person:Francesco',
 'sop::person::person::likes',
 'sop::person::person:Francesco::likes',
 'sop::person:Daniel::person::likes',
 'sop::person:Daniel::person:Francesco::likes',
 'spo::person::likes::person',
 'spo::person::likes::person:Francesco',
 'spo::person:Daniel::likes::person',
 'spo::person:Daniel::likes::person:Francesco']

How can this benefit our system in any way? LevelDB stores entries lexicographically sorted by keys and it provides very fast seeking capabilities. Let’s have a look at a query and see how we can take advantage of Hexastores. But first, lets insert some more data.

t.insert(s='person:Daniel', p='likes', o='person:Lorenzo')
t.insert(s='person:Daniel', p='likes', o='animal:Fido')

Now let’s make a simple query: give me everyone that Daniel likes

for triple in t.start(s='person:Daniel').traverse(p='likes'):
    print triple

('person:Daniel', 'likes', 'animal:Fido')
('person:Daniel', 'likes', 'person:Lorenzo')
('person:Daniel', 'likes', 'person:Francesco')

Hexagon generated the query pso::likes::person:Daniel that was used by LevelDB to seek immediately at that position in the map, and to iterate over every key matching this pattern. Precisely, Hexagon iterated only through 5 entries:

['pso::likes::person:Daniel::animal',
 'pso::likes::person:Daniel::animal:Fido',
 'pso::likes::person:Daniel::person',
 'pso::likes::person:Daniel::person:Francesco',
 'pso::likes::person:Daniel::person:Lorenzo']

What about another query? let’s find all the people that are animal lovers.

t.insert(s='person:Nicoletta', p='likes', o='animal:Rudy')

t_generator = t.start(o='animal').traverse(p='likes').traverse(s='person')
list(t_generator)

[('person:Nicoletta', 'likes', 'animal:Rudy'),
 ('person:Daniel', 'likes', 'animal:Fido')]

This will generate the seek key pso::likes::person::animal, that will iterate over only 3 entries. By the way, did I tell you that start() and traverse() are generators? you pay 0 cost for initialization, you can consume only what you need, and you can combine with all the other fantastic tools provider by itertools.