search

UMD     This Site





ISR intellectual property available to license

Inventors: Nicholas Roussopoulos, John Sismanis, Antonios Deligiannakis, Ioannis Kotidis

U.S. Patent: 7,133,876

Description
University of Maryland researchers have developed and implemented a super-compressed data structure called Dwarf for storing and querying cubes of multidimensional data. Dwarf can "slice and dice" data cubes on every possible projection. For example, click-data from a web ISP server can be stored very efficiently in data slice that group and aggregate data by click destination, or by destination, source, by time (minute/hour/day/week/month/year, etc.). Such groupings are then used in "data mining" over an extremely large set of log data.

Dwarf automatically identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.

The dwarf structure plays a double role as a storage and indexing mechanism for high dimension data. Roll-up and drill-down queries seem to benefit from the dwarf structure due to common paths that are exploited while caching. In terms of update speed, dwarf by far outperforms the closest competitor for storing the full data cube, while their performance is comparable when the competitor is reduced to storing only a partial cube of the same size as Dwarf. For the APB-1 Benchmark density 5, the fastest performance results are published by Oracle-Express and take 4.5 hours to generate a partial cube of about 30GB while the corresponding Dwarf cube is generated in 21 minutes and has a size of 4.3GB.

A number of Dwarf patents are pending.

For more information
If you would like to license this intellectual property, have questions, would like to contact the inventors, or need more information, contact ISR External Relations Director Jeff Coriale at coriale@umd.edu or 301.405.6604.

Find more ISR IP
You can go to our main IP search page to search by research category or faculty name. Or view the entire list of available IP on our complete IP listing page.

ISR-IP-Roussopoulos ISR-IP-multidimensional ISR-IP databases

June 19, 2007


«Previous Story  

 

 

For more information, contact ISR External Relations Director
Jeff Coriale at coriale@umd.edu or 301.405.6604.

Current Headlines

Meet the Clark Scholars Class of ’29

UMD Semiconductor Retreat Builds Strategic Momentum

UMD’s Team RoboScout Delivers Again

UMD - KETEP Research Collaboration Solidified

Tom Hedberg Named ASME Fellow for Engineering Leadership

Ph.D. Student Presents Neural Research at BMES 2025

Clean Energy critical for quantum/AI

Celebrating our Native and Indigenous Community

Future Engineers Tour Robotics Labs at Maryland

MRC Seminar Series Starts with Jellyfish-Inspired Robotics

 
 
Back to top  
Home Clark School Home UMD Home