Evaluation of BIC-based algorithms for audio segmentation
RIZZI, ROMEO
2005-01-01
Abstract
The Bayesian Information Criterion (BIC) is a widely adopted method for audio segmentation and has inspired a number of dominant algorithms for this application. At present, however, the literature lacks analytical and experimental studies of these algorithms. This paper aims to partially fill this gap. Typically, BIC is applied within a sliding, variable-size analysis window in which single changes in the nature of the audio are searched for locally. Three different implementations of the algorithm are described and compared: (i) the first maintains a pair of running sums, that of the input vectors and that of the squared input vectors, in order to save computation when estimating covariance matrices on partially shared data; (ii) the second, recently proposed in the literature, encodes the input signal with cumulative statistics for efficient estimation of covariance matrices; (iii) the third is a novel approach that encodes the input stream with the cumulative pair of sums of the first approach. Furthermore, a dynamic programming algorithm is presented that, within the BIC model, finds a globally optimal segmentation of the input audio stream. All algorithms are analyzed in detail with respect to computational cost, experimentally evaluated on appropriate tasks, and compared.
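As an illustration of the cumulative pair-of-sums idea mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes one-dimensional features, so the covariance matrix reduces to a variance and the log-determinant to a log-variance; in the multivariate case the squares would become outer products. It also assumes the commonly used single-Gaussian Delta-BIC formulation with a penalty weight `lam`. The function names, the `margin` parameter, and the variance floor are all hypothetical choices made for this example.

```python
import math
from itertools import accumulate


def cumulative_sums(xs):
    """Prefix sums of x and x*x, each with a leading 0 (assumed 1-D features)."""
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))
    return s1, s2


def log_var(s1, s2, a, b, floor=1e-12):
    """Log of the maximum-likelihood variance of xs[a:b], read from the prefix sums in O(1)."""
    n = b - a
    mean = (s1[b] - s1[a]) / n
    var = (s2[b] - s2[a]) / n - mean * mean
    # The floor is an illustrative guard against log(0) on constant segments.
    return math.log(max(var, floor))


def delta_bic(s1, s2, a, i, b, lam=1.0):
    """Delta-BIC for splitting xs[a:b] at index i; positive values favour a change."""
    n, n1, n2 = b - a, i - a, b - i
    gain = 0.5 * (n * log_var(s1, s2, a, b)
                  - n1 * log_var(s1, s2, a, i)
                  - n2 * log_var(s1, s2, i, b))
    # A 1-D Gaussian has 2 free parameters (mean and variance),
    # so the split adds 2 parameters: penalty = lam * 0.5 * 2 * log(n).
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty


def best_change(xs, margin=2, lam=1.0):
    """Return the candidate index with the highest Delta-BIC in one window, and its score."""
    s1, s2 = cumulative_sums(xs)
    n = len(xs)
    scores = {i: delta_bic(s1, s2, 0, i, n, lam)
              for i in range(margin, n - margin + 1)}
    i = max(scores, key=scores.get)
    return i, scores[i]
```

Because every segment statistic is a difference of two prefix-sum entries, scoring each candidate change point costs constant time after a single linear pass over the input. A caller might run `best_change(window)` on an analysis window and accept the returned index only when the score is positive.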