Data Retrieval Performance Improvement
Today's scientific applications are highly data intensive, and their data management requirements can no longer be met by special-purpose libraries for particular scientific data formats or by traditional database management systems. The Maitri project aims to create a data-format-independent, loosely coupled, application-tailorable set of libraries that together provide a holistic data management framework for scientists. A scientific programmer can write applications that access scientific data in its native format while using Maitri's APIs for buffering, indexing, metadata management, and concurrency control, supplying only a small amount of format-specific code.
Work so far has concentrated on the buffer management system, GODIVA, and on a bitmap-based indexing system for high-dimensional, high-cardinality data. GODIVA is a framework that defines a buffer management API and lets the user provide hints about buffer management strategies, while performing the buffering itself in a background thread. The indexing scheme is an extension of bitmap indexes. Traditional bitmap indexes have been shown to work well for high-dimensional data but perform poorly on high-cardinality attributes. Our extensions improve the performance of bitmap indexes over high-cardinality attributes and allow the indexes to be manipulated in parallel.
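To make the cardinality problem concrete, the following is a minimal Python sketch of a basic equality-encoded bitmap index, together with binning, a commonly used way of limiting the number of bitmaps. It is an illustration only and is not the extension developed in this project; the function names, the sample data, and the bin edges are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def build_bitmap_index(values):
    """Equality-encoded index: one bitmap (stored as a Python int) per distinct value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row  # set the bit for this row
    return dict(index)


def build_binned_index(values, edges):
    """Binned index: one bitmap per bin, where bin i covers edges[i] <= v < edges[i + 1]."""
    index = [0] * (len(edges) - 1)
    for row, value in enumerate(values):
        b = bisect_right(edges, value) - 1
        if 0 <= b < len(index):  # values outside the edges are not indexed
            index[b] |= 1 << row
    return index


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample data: two attributes over six rows.
temperature = [271.3, 284.9, 300.1, 284.9, 310.7, 295.0]
pressure = [1, 2, 2, 1, 3, 2]

t_index = build_bitmap_index(temperature)
p_index = build_bitmap_index(pressure)

# A multi-attribute query is a bitwise AND of one bitmap per attribute.
hits = rows_of(p_index[2] & t_index[284.9])

# Binning caps the number of bitmaps regardless of how many distinct values occur.
binned = build_binned_index(temperature, edges=[270, 290, 310, 330])
in_range = rows_of(binned[1])  # rows with 290 <= temperature < 310
```

The equality-encoded index needs one bitmap per distinct value, so a nearly continuous attribute such as temperature produces almost as many bitmaps as rows. Binning trades that space for precision: a query whose bounds do not coincide with bin edges must recheck the candidate rows against the raw data.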
The main aim of the Maitri system is to improve the performance of data retrieval for scientists during the visualization/analysis phase while still allowing them to store data in the format of their choice.
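The hint-driven background buffering described for GODIVA can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GODIVA's actual API: the class `PrefetchBuffer`, its methods `hint`, `get`, and `release`, and the reader `read_timestep` are hypothetical names, and the sketch assumes a single consumer thread.

```python
import queue
import threading


class PrefetchBuffer:
    """Hypothetical buffer that loads data units in a background thread."""

    def __init__(self, read_unit):
        self._read_unit = read_unit  # user-supplied, format-specific reader
        self._cache = {}             # unit id -> loaded data
        self._ready = {}             # unit id -> Event set once the unit is loaded
        self._requests = queue.Queue()
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def hint(self, unit_id):
        """Tell the buffer that unit_id will be needed soon (returns immediately)."""
        with self._lock:
            if unit_id in self._ready:
                return  # already requested or loaded
            self._ready[unit_id] = threading.Event()
        self._requests.put(unit_id)

    def get(self, unit_id):
        """Return the unit's data, blocking only if it has not been loaded yet."""
        self.hint(unit_id)
        with self._lock:
            event = self._ready[unit_id]
        event.wait()
        with self._lock:
            return self._cache[unit_id]

    def release(self, unit_id):
        """Tell the buffer that unit_id is no longer needed."""
        with self._lock:
            self._cache.pop(unit_id, None)
            self._ready.pop(unit_id, None)

    def _run(self):
        # Background loop: load requested units in the order they were hinted.
        while True:
            unit_id = self._requests.get()
            data = self._read_unit(unit_id)
            with self._lock:
                event = self._ready.get(unit_id)
                if event is not None:  # skip units released before loading finished
                    self._cache[unit_id] = data
                    event.set()


def read_timestep(t):
    """Stand-in for format-specific code that reads one time step from disk."""
    return [t * 10 + i for i in range(3)]


buf = PrefetchBuffer(read_timestep)
for t in range(3):
    buf.hint(t)  # announce upcoming accesses so loading overlaps with computation
frames = [buf.get(t) for t in range(3)]
```

The only format-specific code here is `read_timestep`; the buffering logic never inspects the data it holds, which is what lets an application keep its data in its native format.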