Towards automatic acquisition of high-level 3D models from images

Toldo, Roberto

In recent years there has been a surge of interest in automatic modeling from images. While the state of the art in three-dimensional reconstruction has focused on recovering dense and accurate representations of objects captured in pictures or video, the sustained interest in accessible modeling software is strong evidence of an untapped general need for compact, abstract representations of objects. This thesis discusses in detail the problem of producing high-level models from images. In the first part, an automatic uncalibrated Structure from Motion pipeline is presented. Starting from the output of this pipeline, two different approaches to generating high-level renditions are studied. The first approach employs a novel multi-view stereo algorithm to produce a dense and accurate point cloud. A retrieval system for meshes, based on segmentation and Bag of Words, is then introduced. In the second approach, the sparse Structure from Motion point cloud is fitted with planes and convex planar patches. Planar patches are a compact, intermediate representation of the scene. Both branches of the thesis aim to narrow the gap between scene acquisition and interpretation by defining high-level renditions produced through very different strategies.