Abstract: This preliminary study examines how useful rhythmic information is as a feature for forensic voice comparison in Japanese. The rhythmic features considered are the durations of phonemes and syllables and their average power. Two segmentation schemes are employed: manual segmentation, and automatic segmentation using the third-party tool “Speech Segmentation Toolkit” with the open-source speech recognition engine “Julius”. The speech samples (two recordings made in two different sessions), read out by 100 male Japanese speakers, were extracted from the database of the National Research Institute of Police Science, Japan (NRIPS). Likelihood ratios (LRs) were estimated with the Multivariate Kernel Density formula for 100 same-speaker comparisons and 9900 different-speaker comparisons for each phoneme and syllable, and the LRs for phonemes and for syllables were then each fused by logistic regression. The study examines which types of phonemes and syllables tend to carry more speaker-individuating information, reports the magnitude of the LRs obtained from Japanese rhythmic information, and compares the performance of automatic and manual segmentation.
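The comparison counts quoted in the abstract can be reproduced with a short sketch. It assumes one recording per speaker per session and that every comparison pairs a session-1 recording with a session-2 recording. This pairing scheme is an assumption that happens to match the stated counts, not a description of the study's actual protocol, and the function name `build_comparisons` is hypothetical.

```python
from itertools import product

N_SPEAKERS = 100  # number of speakers stated in the abstract


def build_comparisons(n_speakers=N_SPEAKERS):
    """Pair each speaker's session-1 recording with every session-2 recording.

    Assumed scheme (not confirmed by the source): one recording per speaker
    per session, and only cross-session pairs are compared.
    """
    same, different = [], []
    for a, b in product(range(n_speakers), repeat=2):
        pair = ((a, 1), (b, 2))  # (speaker index, session number)
        if a == b:
            same.append(pair)       # same speaker, different sessions
        else:
            different.append(pair)  # different speakers
    return same, different


same_speaker, different_speaker = build_comparisons()
print(len(same_speaker), len(different_speaker))
```

Under these assumptions there are 100 same-speaker pairs and 100 × 99 = 9900 different-speaker pairs, which agrees with the figures in the abstract.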
Location
Speakers
- Kimiya Akita
Contact
- Jane Simpson