A Measure of Phonetic Similarity to Quantify Pronunciation Variation by Using ASR Technology

Tianze Shi, Shun Kasahara, Teeraphon Pongkittiphan, Nobuaki Minematsu, Daisuke Saito, Keikichi Hirose

Abstract
How to define a quantitative measure of phonetic similarity between IPA transcripts of the same sentence read by two speakers has attracted researchers' interest. This problem can be divided into two parts: how to align the two transcripts and how to quantify the alignment gap. In this paper, we introduce a method of similarity calculation that uses phone-based or phoneme-based acoustic models trained with the algorithm used to develop Automatic Speech Recognition (ASR) systems. Using acoustic models introduces an issue of speaker dependency, because speech spectra always convey information about the training speakers' age and gender, which is irrelevant to phonetic similarity calculation. We examine how independent our method is of the training speakers and how close the calculated similarity is to the similarity subjectively rated in a listening test. We also compare our method with recent work and show that it yields a correlation with human-rated similarity that is higher by 4 points.
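
The two sub-problems named in the abstract, aligning two transcripts and quantifying the alignment gap, can be illustrated with a minimal sketch. This is not the method of the paper: it assumes a standard dynamic-programming alignment, and the function names `align_cost` and `toy_phone_distance`, the unit gap cost, and the 0/1 phone distance are all hypothetical placeholders. In the paper, the phone-to-phone distance would instead come from the trained acoustic models.

```python
from typing import Callable, Sequence


def toy_phone_distance(p: str, q: str) -> float:
    """Placeholder distance (assumption): 0 for identical phones, 1 otherwise.

    A distance derived from acoustic models would replace this function.
    """
    return 0.0 if p == q else 1.0


def align_cost(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = toy_phone_distance,
    gap_cost: float = 1.0,
) -> float:
    """Return the minimum total cost of aligning phone sequences a and b.

    Substituting one phone for another costs sub_cost(p, q); inserting or
    deleting a phone costs gap_cost (a hypothetical constant).
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest alignment of a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
                dp[i - 1][j] + gap_cost,  # delete a[i-1]
                dp[i][j - 1] + gap_cost,  # insert b[j-1]
            )
    return dp[n][m]
```

With these placeholder costs, a lower total means the two transcripts are more similar. Swapping in a graded phone distance changes only the `sub_cost` argument.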

@inproceedings{ShiKPMSH15,
    author    = {Tianze Shi and
                Shun Kasahara and
                Teeraphon Pongkittiphan and
                Nobuaki Minematsu and
                Daisuke Saito and
                Keikichi Hirose},
    title     = {A measure of phonetic similarity to quantify pronunciation variation
                by using {ASR} technology},
    booktitle = {18th International Congress of Phonetic Sciences (ICPhS 2015)},
    address   = {Glasgow, UK},
    year      = {2015},
    month     = {August},
    url       = {https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0432.pdf},
}