Geospatial data comprise around 60% of all the publicly available data. One of the essential and most complex operations that brings together multiple geospatial datasets is the spatial join operation. Due to its complexity, there is a lot of partitioning techniques and parallel algorithms for the spatial join problem. This leads to a complex query optimization problem: which algorithm to use for a given pair of input datasets that we want to join? With the rise of machine learning, there is a promise in addressing this problem with the use of various learned models. However, one of the concerns is the lack of a standard and publicly available data to train and test on, as well as the lack of accessible baseline models. This resource paper helps the research community to solve this problem by providing synthetic and real datasets for spatial join, source code for constructing more datasets, and several baseline solutions that researchers can further extend and compare to.
Towards a Learned Cost Model for Distributed Spatial Join: Data, Code & Models
Belussi, Alberto;Migliorini, Sara;
2022-01-01
Abstract
Geospatial data comprise around 60% of all the publicly available data. One of the essential and most complex operations that brings together multiple geospatial datasets is the spatial join operation. Due to its complexity, there is a lot of partitioning techniques and parallel algorithms for the spatial join problem. This leads to a complex query optimization problem: which algorithm to use for a given pair of input datasets that we want to join? With the rise of machine learning, there is a promise in addressing this problem with the use of various learned models. However, one of the concerns is the lack of a standard and publicly available data to train and test on, as well as the lack of accessible baseline models. This resource paper helps the research community to solve this problem by providing synthetic and real datasets for spatial join, source code for constructing more datasets, and several baseline solutions that researchers can further extend and compare to.File | Dimensione | Formato | |
---|---|---|---|
CIKM22_SJML_Resource_Short_Paper.pdf
accesso aperto
Tipologia:
Versione dell'editore
Licenza:
Creative commons
Dimensione
2.62 MB
Formato
Adobe PDF
|
2.62 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.