Generic indexed storage for TUNGSTEN (by )

For example, say you've got a database of points on a map and you want to find all points within a rectangle. If you make B-Tree indices on the X and Y coordinates, then you can use a range query on the X coordinate based on the range of X covered by the rectangle, but the result is a list of points in, North, or South of the rectangle. To eliminate the Norths and Souths, you need to walk through that list checking them out. The Y index can't help you search an arbitrary subset of points, only all the points in the database.

This is useful for storing large images, both vector and bitmapped, by storing smaller bitmaps indexed on their X and Y coordinates, or storing each vector object indexed against the region it covers; most higher-dimensional indexes allow ranges as keys to handle the fact that higher-dimensional objects are often not just points.

However, there are other types of indexed data store, such as R-Trees and GiSTs, that do work for multi-dimensional keys, A B-Tree is a degenerated case, with a 1-dimensional key.

It's also useful in situations where normally one might use several indexes. Why index purchases on customer ID, product ID, and date when you can consider them as points in a three-dimensional customer/product/date space? You can still pick out a given customer ID by querying for records lying on a particular plane of constant customer ID, while also being able to query on all purchases of a given product on a day, or by a given customer on a day. Whereas, as described above, with three separate indices, you wouldn't be able to query on the intersections of ranges.

So why don't I use an R-Tree? Well, the problem is with the fact that I store a hierarchy in the index, by concatenating my keys. This works with one-dimensional indices, since concatenated keys still sort lexicographically into order; all the keys "within" a key sort into a single range. But this doesn't work with higher dimensions.

So, I thought to myself, how about a new kind of key? One which was composed of an ordered list of points, and each point had an arbitrary dimensionality since it consists of an ordered list of coordinates. Each point would, in fact, implicitly have an infinite number of dimensions, but for those dimensions beyond those specified, all the coordinates are taken to have an implicit value of the empty string.

In effect, this key would represent a hierarchy where each level of the hierarchy was a point in an N-dimensional space. In fact, you could have several different dimensionalities of points coexisting at each level in the hierarchy; points in N dimensions would, from the viewpoint of points in N+M dimensions, just appear to be all on a single hyperplane.

This would give us a way of generating keys that would let us nest complex structures arbitrarily. It could still work with standard one-dimensional keys, coexisting at different levels of the hierarchy with higher-dimensional keys, allowing arbitrary data structures to be nested. Given the key description, we can define a set of procedures to specialise a GiST to index them and run away laughing.

Pages: 1 2 3

No Comments

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales