A Classification of Skew Effects in Parallel Database Systems

Märtens, Holger
A major performance barrier in parallel database systems (PDBS) are skew effects, characterized by an uneven distribution of data and/or workload across the system's resources. Despite numerous proposed load balancing strategies, this problem is far from solved, partly because there is no well-structured model of the different types of skew, their causes, consequences, and interdependencies, and the methods to combat them. This paper aims to help understand how to find the appropriate load balancing methods for different forms of skew. We present a classification of skew effects on the one hand and of load balancing approaches on the other, then match the two to find sensible pairs. This will allow us to state why some previous approaches are less successful than they should be and to propose some required capabilities of future algorithms. Our study is on a purely qualitative level and makes no architectural assumptions. Instead, we focus on the fundamental relationships of different types of skew, with each other and with the various load balancing techniques, to reach general conclusions independent of numerical parameters. We find that highly dynamic scheduling methods based on observed execution times are superior in both complexity and attainable load balance. We also suggest the tuning of database schemata as a new anti-skew measure.
Appeared / Erschienen in: 
Proceedings of Euro-Par 2001 Conference, Manchester, August 2001
Pubdate / Erscheinungsdatum: 
Promoter / Gefördert durch: 
Deutsche Forschungsgemeinschaft
Pages / Seitenanzahl: 
Notes / Bemerkungen: 
The conference proceedings containing this paper were published by Springer-Verlag in the <A HREF='http://www.springer.de/comp/lncs/index.html'>Lecture Notes in Computer Science</A> series. © Springer-Verlag
2001-20.pdf69.28 KB