Momel (algorithm)
Momel (Modelling melody) is an algorithm developed by Daniel Hirst[1][2] and Robert Espesser at the CNRS Laboratoire Parole et Langage,[3] Aix-en-Provence, for the analysis and synthesis of intonation patterns.
Purpose
The analysis of raw fundamental frequency curves for the study of intonation needs to take into account the fact that speakers simultaneously produce an intonation pattern and a sequence of syllables made up of segmental phones. The raw fundamental frequency curves that can be analysed acoustically result from an interaction between these two components, which makes it difficult to compare intonation patterns produced with different segmental material. Compare, for example, the intonation patterns on the utterances "It's for papa" and "It's for mama".
Algorithm
The Momel algorithm attempts to solve this problem by factoring the raw curves into two components:
- a macromelodic component - modelled as a quadratic spline function. This is assumed to correspond to the global pitch contour of the utterance and to be independent of the nature of the constituent phonemes. The underlying hypothesis is that this macromelodic component is, unlike raw fundamental frequency curves, both continuous and smooth. It corresponds approximately to what we produce if we hum an utterance instead of speaking it.
- a micromelodic component consisting of deviations from the macromelodic curve - called a micromelodic profile. This residual curve is assumed to be determined entirely by the segmental constituents of the utterance and to be independent of the macromelodic component (a minimal sketch of this decomposition follows the list).
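As an informal illustration of this factorisation (not the published Momel estimation procedure), the micromelodic profile can be obtained by comparing the raw fundamental frequency values with the macromelodic curve sampled at the same frame times. The function name and the use of NaN for unvoiced frames are assumptions made for this Python sketch:

    import numpy as np

    def micromelodic_profile(raw_f0_hz, macro_f0_hz):
        """Deviations of the raw F0 curve from the smooth macromelodic
        curve, frame by frame (an illustrative decomposition, not the
        Momel estimation procedure itself).

        raw_f0_hz   : raw F0 values, one per analysis frame (NaN = unvoiced)
        macro_f0_hz : macromelodic curve sampled at the same frame times
        """
        raw = np.asarray(raw_f0_hz, dtype=float)
        macro = np.asarray(macro_f0_hz, dtype=float)
        # Deviations expressed in Hz; a ratio (or a difference on a
        # log/semitone scale) is an equivalent way to express the profile.
        return raw - macro

    # Reconstruction check: raw curve = macromelodic curve + micromelodic profile.
    raw = np.array([120.0, 118.0, np.nan, 131.0, 140.0])
    macro = np.array([121.0, 124.0, 127.0, 130.0, 133.0])
    micro = micromelodic_profile(raw, macro)
    assert np.allclose(macro + micro, raw, equal_nan=True)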
The quadratic spline function used to model the macromelodic component is defined by a sequence of target points (pairs <s, Hz>), each consecutive pair of which is linked by two monotonic parabolic curves, with the spline knot occurring (by default) at the midpoint between the two targets. The first derivative of the curve thus defined is zero at each target point, and the two parabolas have the same value and the same derivative at the spline knot. This in fact defines the simplest mathematical function for which the curves are both continuous and smooth.
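Under these constraints the two parabolas on each interval are fully determined by the flanking targets: for targets (t1, f1) and (t2, f2) with the knot at the midpoint, the left parabola is f1 + a(t - t1)^2 and the right parabola is f2 - a(t - t2)^2, with a = 2(f2 - f1) / (t2 - t1)^2, so that both halves are monotonic and meet at the mean of the two target values. The following Python sketch evaluates such a spline; the function name and the behaviour outside the first and last targets are assumptions made for the illustration, not part of the published algorithm.

    from bisect import bisect_right

    def quadratic_spline(targets):
        """Return a function f(t) evaluating the quadratic spline defined by
        'targets', a time-sorted list of (time_s, f0_hz) target points.

        Between consecutive targets (t1, f1) and (t2, f2) the curve consists
        of two parabolas joined at the midpoint tm = (t1 + t2) / 2, with zero
        slope at each target and matching value and slope at the knot.
        """
        times = [t for t, _ in targets]

        def f(t):
            # Outside the modelled span, hold the nearest target value
            # (an assumption made for this sketch).
            if t <= times[0]:
                return targets[0][1]
            if t >= times[-1]:
                return targets[-1][1]
            i = bisect_right(times, t) - 1
            (t1, f1), (t2, f2) = targets[i], targets[i + 1]
            a = 2.0 * (f2 - f1) / (t2 - t1) ** 2
            tm = 0.5 * (t1 + t2)
            if t <= tm:
                return f1 + a * (t - t1) ** 2   # left parabola, flat at t1
            return f2 - a * (t - t2) ** 2       # right parabola, flat at t2

        return f

    # Example with three hypothetical targets (time in s, F0 in Hz):
    macro = quadratic_spline([(0.10, 120.0), (0.55, 180.0), (1.20, 110.0)])
    print([round(macro(t), 1) for t in (0.10, 0.325, 0.55, 0.875, 1.20)])
    # -> [120.0, 150.0, 180.0, 145.0, 110.0] (each knot is the mean of its targets)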
Implications
On the one hand, two utterances "For Mama!" and "For Papa!" could thus be modelled with the same target points (hence the same macromelodic component), while "For Mama?" and "For Papa?" would also share target points, though these would probably differ from those of the first pair.
On the other hand, the utterances "For Mama!" and "For Mama?" could be modelled with the same micromelodic profile but with different target points, while "For Papa!" and "For Papa?" would also share a micromelodic profile, though one different from that of the first pair.
The Momel algorithm derives what its authors refer to as a phonetic representation of an intonation pattern, one which is neutral with respect to speech production and speech perception: while not explicitly derived from a model of either process, it contains sufficient information to be used as input to models of both. The relatively theory-neutral nature of the algorithm has allowed it to be used as a first step in deriving representations such as those of the Fujisaki model (Mixdorff 1999), ToBI (Maghbouleh 1999, Wightman et al. 2000) or INTSINT (Hirst & Espesser 1993, Hirst et al. 2000).
References
- Hirst, Daniel & Robert Espesser, 1993. Automatic modelling of fundamental frequency using a quadratic spline function. Travaux de l'Institut de Phonétique d'Aix 15, 71-85.
- Hirst, Daniel, Albert Di Cristo & Robert Espesser, 2000. Levels of representation and levels of analysis for intonation. In M. Horne (ed.), Prosody: Theory and Experiment. Kluwer Academic Publishers, Dordrecht. 51-87.
- Maghbouleh, A., 1998. ToBI accent type recognition. In: Proceedings ICSLP 98.
- Mixdorff, H., 1999. A novel approach to the fully automatic extraction of Fujisaki model parameters. In: Proceedings ICASSP 1999.
- Wightman, C. & Campbell, N., 1995. Improved labeling of prosodic structure. IEEE Trans. on Speech and Audio Processing.