Towards massive spatial data validation with SpatialHadoop
MIGLIORINI, Sara; BELUSSI, Alberto
2016-01-01
Abstract
Spatial data usually encapsulate a semantic characterization of features, which carries important meaning and relations among objects, such as the containment between the extension of a region and those of its constituent parts. The GeoUML methodology bridges the gap between the definition of spatial integrity constraints at the conceptual level and the realization of validation procedures. In particular, it automatically generates SQL validation queries from a conceptual specification, using predefined SQL templates. These queries can be used to check data stored in spatial relational databases, such as PostGIS. However, quality requirements and the amount of available data are growing considerably, making the execution of these validation procedures infeasible. The map-reduce paradigm can be applied effectively in this context, since the same test can be performed in parallel on different data chunks and the partial results can then be combined to obtain the final set of violating objects. Pigeon is a data-flow language defined on top of SpatialHadoop that provides spatial data types and functions. The aim of this paper is to explore the possibility of extending the GeoUML methodology by automatically producing Pigeon validation procedures from a set of predefined Pigeon macros. These scripts can be used in a map-reduce environment to make the validation of large datasets feasible.
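The chunk-wise validation described in the abstract can be illustrated with a minimal sketch. It is written in plain Python, not in Pigeon, and it is not the paper's generated code. It assumes that geometries are simplified to axis-aligned rectangles, that the constraint under test is containment of each part in its region, and that the region lookup is small enough to be shared with every worker. All names (`Rect`, `validate_chunk`, `combine`) and the sample data are hypothetical.

```python
from functools import reduce
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a real geometry (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(outer: Rect, inner: Rect) -> bool:
    """True when `inner` lies entirely within `outer` (boundaries included)."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and inner.xmax <= outer.xmax and inner.ymax <= outer.ymax)


def validate_chunk(chunk, regions):
    """Map step: return the ids of the parts in this chunk that violate containment."""
    return {part_id for part_id, region_id, geom in chunk
            if not contains(regions[region_id], geom)}


def combine(left, right):
    """Reduce step: merge two partial sets of violating ids."""
    return left | right


# Hypothetical data: one region and four parts split into two chunks.
regions = {"r1": Rect(0, 0, 10, 10)}
chunks = [
    [("p1", "r1", Rect(1, 1, 2, 2)), ("p2", "r1", Rect(8, 8, 12, 9))],
    [("p3", "r1", Rect(-1, 0, 3, 3)), ("p4", "r1", Rect(4, 4, 6, 6))],
]

# Each chunk is checked independently, so this step could run on separate workers.
partials = [validate_chunk(chunk, regions) for chunk in chunks]
violations = reduce(combine, partials, set())
print(sorted(violations))  # ['p2', 'p3']
```

Because each call to `validate_chunk` depends only on its own chunk and the shared region lookup, the order in which partial results are combined does not affect the final set of violating objects.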