R-3 Repository :: Browsing by Author "Parthasarathy, S."

Detection of Target Speakers in Audio Databases
(1999-01-15) Magrin-Chagnolleau, Ivan; Rosenberg, Aaron; Parthasarathy, S.; Digital Signal Processing (http://dsp.rice.edu/)
The problem of speaker detection in audio databases is addressed in this paper. Gaussian mixture modeling is used to build target speaker and background models. A detection algorithm based on a likelihood ratio calculation is applied to estimate target speaker segments. Evaluation procedures are defined in detail for this task. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high-quality speech segments, the segment miss rate is approximately 7%. For unrestricted data, the segment miss rate is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63%, with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely associated with short speech segments in the two-target-speaker test data, which are undetectable in the current configuration of the detection algorithm.
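The likelihood ratio calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian mixtures and a per-frame log-likelihood ratio between one target model and one background model; the function names, the (weights, means, variances) model layout, and the diagonal-covariance choice are illustrative assumptions, not details taken from the paper.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    Hypothetical helper: the model layout (weights, means, variances) is an
    assumption made for this sketch.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log density of a one-dimensional Gaussian, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def log_likelihood_ratio(x, target, background):
    """Target-versus-background log-likelihood ratio for one frame."""
    return gmm_log_likelihood(x, *target) - gmm_log_likelihood(x, *background)


# Toy one-dimensional models: target centred at 0, background centred at 2.
target_model = ([1.0], [[0.0]], [[1.0]])
background_model = ([1.0], [[2.0]], [[1.0]])
llr = log_likelihood_ratio([0.0], target_model, background_model)
```

A positive ratio indicates that the frame is better explained by the target model than by the background model; how such scores are thresholded and grouped into segments is a separate design choice.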
Speaker Detection in Broadcast Speech Databases
(1998-01-15) Rosenberg, Aaron; Magrin-Chagnolleau, Ivan; Parthasarathy, S.; Digital Signal Processing (http://dsp.rice.edu/)
Experiments have been carried out to assess the feasibility of detecting target speaker segments in multi-speaker broadcast databases. The experimental database consists of NBC Nightly News broadcasts. The target speaker is the news anchor, Tom Brokaw. Gaussian mixture models are constructed from labelled training data for the target speaker, as well as background models for other speakers, commercials, and music. Four labelled 30-minute broadcasts are used for testing. Mel-frequency cepstral features, augmented by delta cepstral features, are calculated over 20 ms windows shifted every 10 ms through a broadcast. Likelihood ratio scores are calculated for each test frame and averaged over blocks of frames with a specified duration. The block scores are input to a detection routine which returns estimates of target segment boundaries. The range of best results obtained over the test broadcasts is 82% to 100% detection of target segments, with segment frame accuracy ranging from 86% to 95%. Between 0 and 2 false alarm segments are detected over each 30-minute broadcast.
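The block averaging and boundary estimation described in this abstract might be sketched as follows. The sketch assumes non-overlapping blocks and a simple fixed threshold on the block score; both are assumptions made for illustration, since the abstract does not specify how the detection routine works. The 10 ms frame shift comes from the abstract; the function names and the block length are hypothetical.

```python
def block_average(frame_scores, block_len):
    """Average frame scores over consecutive non-overlapping blocks."""
    blocks = []
    for i in range(0, len(frame_scores), block_len):
        chunk = frame_scores[i:i + block_len]
        blocks.append(sum(chunk) / len(chunk))
    return blocks


def detect_segments(frame_scores, block_len, threshold):
    """Return (start_frame, end_frame) pairs for runs of blocks above threshold.

    Assumed rule for this sketch: a block is labelled as target when its
    average score exceeds the threshold, and adjacent target blocks are
    merged into one segment. end_frame is exclusive.
    """
    n_frames = len(frame_scores)
    segments = []
    start = None
    for idx, score in enumerate(block_average(frame_scores, block_len)):
        if score > threshold and start is None:
            start = idx * block_len
        elif score <= threshold and start is not None:
            segments.append((start, idx * block_len))
            start = None
    if start is not None:
        # A segment still open at the end runs to the last frame.
        segments.append((start, n_frames))
    return segments


def frames_to_seconds(segment, frame_shift_s=0.010):
    """Convert a frame-index segment to seconds, using the 10 ms frame shift."""
    return (segment[0] * frame_shift_s, segment[1] * frame_shift_s)


# Toy frame-level likelihood ratio scores: target, non-target, target.
scores = [1.0] * 4 + [-1.0] * 4 + [1.0] * 2
segments = detect_segments(scores, block_len=2, threshold=0.0)
```

Longer blocks smooth out frame-level noise at the cost of coarser boundaries, so very short target segments can fall below the threshold once averaged with their neighbours.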
