A foundation for spatio-textual-temporal cube analytics

Iqbal, Mohsin; Lissandrini, Matteo; Pedersen, Torben Bach

doi:10.1016/j.is.2022.102009

Large amounts of spatial, textual, and temporal (STT) data are produced daily. Such data combine an unstructured component (text), a spatial component (geographic position), and a temporal component (timestamp). There is therefore a need for a powerful and general way to analyze these components jointly. In this paper, we define and formalize the Spatio-Textual-Temporal Cube (STTCube) structure to enable effective and efficient combined analytical queries over STT data. Our novel data model over STT objects enables joint and integrated STT insights that are hard to obtain with existing methods. Furthermore, our proposed STTCube Incremental Maintenance (IMstt) method efficiently maintains an already constructed STTCube as new data arrives. We also introduce the concept of STT measures with associated novel STT-OLAP operators. To allow for efficient large-scale analytics, we present a pre-aggregation framework for exact and approximate computation of STT measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1–5 orders of magnitude compared to the No Materialization baseline and decrease storage cost by 97% to 99.9% compared to the Full Materialization baseline, while adding only negligible overhead to STTCube construction time. Approximate computation achieves an accuracy of 90% to 100% while reducing query response time by 3–5 orders of magnitude compared to No Materialization. Finally, IMstt improves maintenance time by an order of magnitude over the baseline maintenance method.