Browsing by Author "Magrin-Chagnolleau, Ivan"

Detection of Target Speakers in Audio Databases (1999-01-15)
Magrin-Chagnolleau, Ivan; Rosenberg, Aaron; Parthasarathy, S.; Digital Signal Processing (http://dsp.rice.edu/)
The problem of speaker detection in audio databases is addressed in this paper. Gaussian mixture modeling is used to build target speaker and background models. A detection algorithm based on a likelihood ratio calculation is applied to estimate target speaker segments. Evaluation procedures are defined in detail for this task. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high quality speech segments, the segment miss rate is approximately 7%. For unrestricted data, the segment miss rate is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63% with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely associated with short speech segments in the two target speaker test data, which are undetectable in the current configuration of the detection algorithm.

Effect of Utterance Duration and Phonetic Content on Speaker Identification Using Second Order Statistical Methods (1995-01-01)
Magrin-Chagnolleau, Ivan; Bonastre, Jean-Francois; Bimbot, Frederic; Digital Signal Processing (http://dsp.rice.edu/)
Second-order statistical methods show very good results for automatic speaker identification in controlled recording conditions. These approaches are generally used on the entire speech material available. In this paper, we take a more analytical approach and study the influence of the content of the test speech material on the performance of such methods. The goal is to investigate what kind of information these methods use and where it is located in the speech signal.
Liquids and glides together, vowels, and more particularly nasal vowels and nasal consonants, are found to be particularly speaker specific: test utterances of 1 second composed mostly of acoustic material from one of these classes give better speaker identification results than phonetically balanced test utterances, even though the training is done, in both cases, with 15 seconds of phonetically balanced speech. Nevertheless, results with other phoneme classes are never dramatically poor. These results tend to show that the speaker-dependent information captured by long-term second-order statistics is consistently common to all phonetic classes, and that the homogeneity of the test material may improve the quality of the estimates.

Empirical Mode Decomposition Based Frequency Attributes (1999-11-01)
Magrin-Chagnolleau, Ivan; Baraniuk, Richard G.; Digital Signal Processing (http://dsp.rice.edu/)
This paper describes a new technique, called Empirical Mode Decomposition (EMD), which decomposes one-dimensional signals into intrinsic oscillatory modes. Each component, called an Intrinsic Mode Function (IMF), has properties that allow a meaningful instantaneous frequency to be calculated. Applied to a seismic trace, this technique makes it possible to study the different intrinsic oscillatory modes of the trace and the instantaneous frequency of each mode. Applied to a seismic section, it provides new frequency attributes.

A Further Investigation on AR-Vector Models for Text Independent Speaker Identification (1996-01-01)
Magrin-Chagnolleau, Ivan; Bimbot, Frederic; Digital Signal Processing (http://dsp.rice.edu/)
In this paper, we investigate the role of dynamic information in the performance of AR-vector models for speaker recognition. To this end, we design an experimental protocol that destroys the time structure of speech frame sequences, and compare it with a more conventional one that keeps the natural time order. These results are also compared with those obtained with a (single) Gaussian model. Several measures are systematically investigated in the three cases, and different ways of symmetrisation are tested. We observe that the destruction of the time order can be a factor of improvement for the AR-vector models, and that results obtained with the Gaussian model are almost always better. In most cases, symmetrisation is beneficial.

Multiscale Texture Segmentation of Dip-cube Slices using Wavelet-domain Hidden Markov Trees (1999-11-01)
Magrin-Chagnolleau, Ivan; Choi, Hyeokho; van Spaendonck, Rutger; Steeghs, Philippe; Baraniuk, Richard G.; Digital Signal Processing (http://dsp.rice.edu/)
Wavelet-domain Hidden Markov Models (HMMs) are powerful tools for modeling the statistical properties of wavelet coefficients. By characterizing the joint statistics of wavelet coefficients, HMMs efficiently capture the characteristics of many real-world signals. When applied to images, the model can characterize the joint statistics between pixels, providing a very good classifier for textures. The inherent tree structure of the wavelet-domain HMM makes it possible to classify textures at various scales, furnishing a natural tool for multiscale texture segmentation. In this paper, we introduce a new multiscale texture segmentation algorithm based on the wavelet-domain HMM. We develop a method that combines the multiscale classification results obtained from the wavelet-domain HMM to generate a reliable segmentation of the texture images.
We apply this new technique to the segmentation of dip-cube slices.

An Overview of the AT&T Spoken Document Retrieval System (1998-01-15)
Choi, John; Hindle, Don; Hirschberg, Julia; Magrin-Chagnolleau, Ivan; Nakatani, Christine; Pereira, Fernando; Singhal, Amit; Whittaker, Steve; Digital Signal Processing (http://dsp.rice.edu/)
We present an overview of a spoken document retrieval system developed at AT&T Labs-Research for the HUB4 Broadcast News corpus. This overview includes a description of the intonational phrase boundary detection, classification, speech recognition, information retrieval and user interface components of the system, along with updated system assessments based on the 49-query task defined for the TREC-6 SDR track. Results from a comparative ranking study, based on queries taken from AP Newswire headlines from the same time period in which the Broadcast News corpus was recorded, are presented. For the AP task, retrieval accuracy is assessed by comparing the documents retrieved from ASR-generated transcriptions with those from human-generated transcriptions.

SCAN - Speech Content Based Audio Navigator: A Systems Overview (1998-01-15)
Choi, John; Hindle, Don; Hirschberg, Julia; Magrin-Chagnolleau, Ivan; Nakatani, Christine; Pereira, Fernando; Singhal, Amit; Whittaker, Steve; Digital Signal Processing (http://dsp.rice.edu/)
SCAN (Speech Content based Audio Navigator) is a spoken document retrieval system integrating speaker-independent, large-vocabulary speech recognition with information retrieval to support query-based retrieval of information from speech archives. Initial development focused on the application of SCAN to the broadcast news domain.
This paper provides an overview of this system, including a description of its graphical user interface, which incorporates machine-generated speech transcripts to provide local contextual navigation and random access for browsing large speech databases.

Second-Order Statistical Measures for Text-Independent Speaker Identification (1995-08-20)
Bimbot, Frederic; Magrin-Chagnolleau, Ivan; Digital Signal Processing (http://dsp.rice.edu/)
This article presents an overview of several measures for speaker recognition. These measures relate to second-order statistical tests, and can be expressed under a common formalism. Alternate formulations of these measures are given and their mathematical properties are studied. In their basic form, these measures are asymmetric, but they can be symmetrized in various ways. All measures are tested in the framework of text-independent closed-set speaker identification, on 3 variants of the TIMIT database (630 speakers): TIMIT (high quality speech), FTIMIT (a restricted bandwidth version of TIMIT) and NTIMIT (telephone quality). Remarkable performance is obtained on TIMIT, but the results naturally deteriorate with FTIMIT and NTIMIT. Symmetrization appears to be a factor of improvement, especially when little speech material is available. Finally, the use of some of the proposed measures as a reference benchmark to evaluate the intrinsic complexity of a given database under a given protocol is suggested as a conclusion to this work.

Speaker Detection in Broadcast Speech Databases (1998-01-15)
Rosenberg, Aaron; Magrin-Chagnolleau, Ivan; Parthasarathy, S.; Digital Signal Processing (http://dsp.rice.edu/)
Experiments have been carried out to assess the feasibility of detecting target speaker segments in multi-speaker broadcast databases. The experimental database consists of NBC Nightly News broadcasts. The target speaker is the news anchor, Tom Brokaw.
Gaussian mixture models are constructed from labelled training data for the target speaker as well as background models for other speakers, commercials, and music. Four labelled 30-min. broadcasts are used for testing. Mel-frequency cepstral features, augmented by delta cepstral features, are calculated over 20 msec. windows shifted every 10 msec. through a broadcast. Likelihood ratio scores are calculated for each test frame and averaged over blocks of frames with a specified duration. The block scores are input to a detection routine which returns estimates of target segment boundaries. The range of best results obtained over the test broadcasts is 82% to 100% detection of target segments, with segment frame accuracy ranging from 86% to 95%. Between 0 and 2 false alarm segments are detected over each 30-min. broadcast.

Time Frequency Principal Components: Application to Speaker Identification (1999-01-01)
Magrin-Chagnolleau, Ivan; Durou, Geoffrey; Digital Signal Processing (http://dsp.rice.edu/)
In this paper, we propose a formalism, called vector filtering of spectral trajectories, which brings many speech parameterization approaches under a common framework. We then propose a new filtering in this framework, called time-frequency principal components (TFPC) of speech. We apply this new filtering in the framework of speaker identification, using a subset of the POLYCOST database. The results show an improvement of roughly 20% compared to the use of the classical cepstral coefficients augmented by their Delta-coefficients.