Ιδρυματικό Αποθετήριο
Πολυτεχνείο Κρήτης
EN  |  EL

Αναζήτηση

Πλοήγηση

Ο Χώρος μου

Sharing aggregate computation for distributed queries

Huebsch Ryan, Garofalakis Minos, Hellerstein, Joseph, 1952-, Stoica, Ion, 1965-

Πλήρης Εγγραφή


URI: http://purl.tuc.gr/dl/dias/8167266C-005C-4C48-A9EB-DC0302CB7517
Έτος 2007
Τύπος Πλήρης Δημοσίευση σε Συνέδριο
Άδεια Χρήσης
Λεπτομέρειες
Βιβλιογραφική Αναφορά R. Huebsch, M. Garofalakis, J. M. Hellerstein and I. Stoica, "Sharing aggregate computation for distributed queries", in ACM SIGMOD International Conference on Management of Data, 2007. doi: 10.1145/1247480.1247535 https://doi.org/10.1145/1247480.1247535
Εμφανίζεται στις Συλλογές

Περίληψη

An emerging challenge in modern distributed querying is to effi-ciently process multiple continuous aggregation queries simultaneously.Processing each query independently may be infeasible,so multi-query optimizations are critical for sharing work acrossqueries. The challenge is to identify overlapping computations thatmay not be obvious in the queries themselves.In this paper, we reveal new opportunities for sharing work in thecontext of distributed aggregation queries that vary in their selectionpredicates. We identify settings in which a large set of q suchqueries can be answered by executing k ≪ q different queries.The k queries are revealed by analyzing a boolean matrix capturingthe connection between data and the queries that they satisfy,in a manner akin to familiar techniques like Gaussian elimination.Indeed, we identify a class of linear aggregate functions (includingSUM, COUNT and AVERAGE), and show that the sharing potentialfor such queries can be optimally recovered using standard matrixdecompositions from computational linear algebra. For someother typical aggregation functions (including MIN and MAX) wefind that optimal sharing maps to the NP-hard set basis problem.However, for those scenarios, we present a family of heuristic algorithmsand demonstrate that they perform well for moderate-sizedmatrices. We also present a dynamic distributed system architectureto exploit sharing opportunities, and experimentally evaluatethe benefits of our techniques via a novel, flexible random workloadgenerator we develop for this setting.

Υπηρεσίες

Στατιστικά