AIDA: A Spatial Data Augmentation Tool for Machine Learning Dataset Preparation
Migliorini, Sara; Belussi, Alberto
2025-01-01
Abstract
The use of machine learning and deep learning techniques on spatial data is steadily increasing as the volume of such data grows. At the same time, the quality of the results depends strictly on the quality of the training data. In regression and classification tasks, the training set must be balanced with respect to both the characteristics of the input data and the ground truth values so that the model correctly captures all relevant cases. However, as already pointed out in the literature, producing balanced training sets is not simple, even when they are synthetically generated. This demonstration presents a tool for producing balanced training sets for spatial operation estimation. The tool starts from the synthetic generation of spatial datasets that resemble real-world situations in their distribution and other spatial characteristics, then applies spatial queries to obtain an initial collection. Balancing analysis and spatial augmentation techniques are applied to this collection to obtain a final collection that is balanced with respect to specific metrics. The tool is a step towards generating good-quality training sets for different spatial query optimization and evaluation models.

| File | Size | Format |
|---|---|---|
| sigspatial_2025_demo.pdf (open access; type: publisher's version; license: Creative Commons) | 547.98 kB | Adobe PDF |
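
The notion of balancing a training set with respect to its ground truth values, as described in the abstract, can be illustrated with a minimal sketch. This is not the tool's implementation: the function names (`bin_index`, `balance_by_label`, `bin_counts`), the assumption that labels lie in [0, 1] (for example a query selectivity), and the equal-width binning are all hypothetical. The sketch also substitutes plain random oversampling by duplication for the spatial augmentation techniques the abstract mentions.

```python
import random
from collections import defaultdict


def bin_index(value, num_bins):
    """Map a label assumed to lie in [0, 1] to one of num_bins equal-width bins."""
    return min(int(value * num_bins), num_bins - 1)


def bin_counts(samples, num_bins=4):
    """Count how many (features, label) samples fall into each label bin."""
    counts = [0] * num_bins
    for _, label in samples:
        counts[bin_index(label, num_bins)] += 1
    return counts


def balance_by_label(samples, num_bins=4, seed=0):
    """Oversample so every non-empty label bin reaches the size of the largest bin.

    Duplication is a stand-in (an assumption) for real augmentation: it cannot
    create samples for bins that are empty in the input.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample in samples:
        bins[bin_index(sample[1], num_bins)].append(sample)
    target = max(len(group) for group in bins.values())
    balanced = []
    for group in bins.values():
        balanced.extend(group)
        # Duplicate random members until this bin reaches the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical skewed collection: most labels are close to zero.
skewed = [((i,), 0.05) for i in range(8)] + [((8,), 0.6), ((9,), 0.9)]
balanced = balance_by_label(skewed)
```

Comparing `bin_counts(skewed)` with `bin_counts(balanced)` shows the effect of the balancing step. Bins with no samples at all stay empty, which is the gap that generating or augmenting new data, rather than duplicating existing samples, is meant to close.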
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



