CATALOGO DEI PRODOTTI DELLA RICERCA

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work. Instead of optimizing jobs independently, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub) expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Experiments on a prototype implementation of our system show significant benefits of worksharing for TPC-DS workloads.

In-memory caching for multi-query optimization of data-intensive scalable computing workloads

Michiardi Pietro;Carra Damiano;Migliorini Sara

2019-01-01

Abstract

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work. Instead of optimizing jobs independently, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub) expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Experiments on a prototype implementation of our system show significant benefits of worksharing for TPC-DS workloads.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Parole Chiave
	
				Combinatorial optimization
Cost-based optimization
Execution plans
Large-scale distributed system
Memory constraints
Multiple choice knapsack problem
Multiquery optimization
Scalable computing
Cache memory
			
	Appare nelle tipologie:
	
				04.01 Contributo in atti di convegno

File in questo prodotto:

File	Dimensione	Formato
DARLIAP_2.pdf accesso aperto Tipologia: Versione dell'editore Licenza: Creative commons Dimensione 1.28 MB Formato Adobe PDF Visualizza/Apri	1.28 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11562/994224

Citazioni

ND

6

ND

social impact