A computational framework for sound analysis with the Mellin and Scale transform

DE SENA, Antonio
2008-01-01

Abstract

Transforms are mathematical tools for moving from a representation of a signal in one domain to a representation in another. Different domains reveal different properties of a signal. In audio analysis the frequency domain is particularly useful: when the time domain is not sufficient, the Fourier transform can be used to switch to the frequency domain. When information about time and frequency is needed simultaneously, hybrid approaches can be used, such as windowed transforms (e.g. the Short-Time Fourier Transform) or bidimensional transforms such as Wigner or Choi-Williams. Besides time and frequency, other kinds of domains can be taken into account. This work considers a domain called scale, which is linked to the physical concept of scale; scale can be defined as a physical property of a signal. The concept of scale as a physical quantity was introduced in quantum optics and described mathematically as an operator defined in terms of the frequency and time operators. From that, Cohen used the operator method to develop the scale transform, the mathematical tool that allows switching to the scale domain. This and other ways of defining the scale transform are briefly introduced in this work. The scale transform is strongly linked to the Mellin transform, which is used for mathematical purposes (e.g. to find solutions of partial differential equations), but the two transforms differ: some properties and theorems that hold for the scale transform do not hold for the Mellin transform. In this work we present our studies on a family of transforms (which we call -Mellin) that are restrictions of the Mellin transform and generalizations of the scale transform, and for which the properties and theorems valid for the scale transform remain valid across the entire family. Generalized definitions, theorems and proofs are given.
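For orientation, the relation between the two transforms can be written out using the definitions commonly found in the literature on Cohen's scale transform. The symbols below (f for the signal, c for the scale variable, p for the complex Mellin variable, D_f and M_f for the two transforms) are notational assumptions made here and are not necessarily those of the thesis.

```latex
% Mellin transform of f, with complex argument p
\[ M_f(p) = \int_0^{\infty} f(t)\, t^{\,p-1}\, dt \]

% Scale transform of f, with real scale variable c
\[ D_f(c) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} f(t)\, t^{-jc-\frac{1}{2}}\, dt
          = \frac{1}{\sqrt{2\pi}}\, M_f\!\left(-jc + \tfrac{1}{2}\right) \]

% Scale property: an energy-preserving dilation only changes the phase
\[ g(t) = \sqrt{a}\, f(at),\; a > 0
   \quad\Longrightarrow\quad
   D_g(c) = a^{\,jc}\, D_f(c), \qquad |D_g(c)| = |D_f(c)| \]
```

In this form the scale transform is the Mellin transform restricted to the line p = -jc + 1/2, and the last line shows why its magnitude is unaffected by dilation of the signal.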
Of particular importance are the exponential sampling theorem and the relation between the scale transform and the Fourier transform. That relation led us to the idea of a fast algorithm (fmt) for computing the -Mellin transform (the scale transform belongs to the -Mellin family). This algorithm allowed us to run experiments quickly and on large amounts of data. The algorithm is explained in detail, and its complexity, quality and efficiency are analyzed and reported in this work. Since the most important part of the algorithm is based on exponential resampling, interpolation schemes have been analyzed in depth in order to obtain a good tradeoff between accuracy and computation speed. The fmt was continuously improved during this work, and every experiment in our studies was performed with it. The experiments were varied. The first concerned filtering in the scale domain. The results allowed us to better understand the concept of scale and to analyze the implications of this kind of filtering from the time-frequency point of view. The next step was to build digital audio effects based on scale filtering, in particular an effect called Mellin pizzicator. This application emulates the effect of a plucked string, reproducing on an arbitrary audio sample the decay in amplitude and in frequency typical of this kind of sound. Another experiment concerned vowel recognition. By applying the scale transform to the Fourier spectrum envelope of a spoken vowel, the proposed algorithm was able to perform vocal tract normalization and thus identify which vowel had been pronounced. Other experiments involved the scale phase. For example, for simple sound events it is possible to modify the perceived size of the event. Water drop sounds were modified in this way, and the results are very convincing.
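The idea of computing the scale transform through exponential resampling followed by a Fourier transform can be sketched as follows. Substituting t = exp(tau) in the usual definition D(c) = (1/sqrt(2*pi)) * integral of f(t) t^(-jc-1/2) dt turns it into the Fourier transform of f(exp(tau)) * exp(tau/2), which an FFT can approximate once the signal has been resampled on an exponential time grid. This is only a minimal sketch under stated assumptions, not the fmt algorithm of the thesis: the function name `fast_scale_transform`, the parameter `n_exp`, and the use of plain linear interpolation are all hypothetical choices made for illustration.

```python
import numpy as np


def fast_scale_transform(x, fs, n_exp=4096):
    """Approximate the scale transform of a uniformly sampled real signal.

    x     : samples x[n] = f(n / fs)
    fs    : sampling rate in Hz
    n_exp : number of exponentially spaced samples (illustrative parameter)
    Returns (c, D): the scale axis and the complex transform estimate.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    # The log-time axis cannot contain t = 0, so start at the first sample after it.
    tau = np.linspace(np.log(t[1]), np.log(t[-1]), n_exp)
    d_tau = tau[1] - tau[0]
    # Exponential resampling: evaluate f at t = exp(tau).
    # Linear interpolation is an assumption; other schemes trade speed for accuracy.
    f_exp = np.interp(np.exp(tau), t, x)
    # After weighting by exp(tau / 2), the integral over tau is a Fourier transform.
    g = f_exp * np.exp(tau / 2)
    c = 2 * np.pi * np.fft.fftfreq(n_exp, d=d_tau)
    # Riemann sum via FFT; the phase factor accounts for the grid starting at tau[0].
    D = d_tau / np.sqrt(2 * np.pi) * np.exp(-1j * c * tau[0]) * np.fft.fft(g)
    return c, D
```

With this sketch, a signal and an energy-normalized dilated copy of it should yield nearly the same magnitude `np.abs(D)`, provided both are well contained in the sampled time span. The number of exponential samples and the interpolation scheme control the tradeoff between accuracy and speed.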
sound analysis; Mellin and scale transform
Type: Doctoral thesis
License: Restricted access

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/337592