Abstract
The speech domain prevails in the spotlight for several natural language processing (NLP) tasks, while the singing domain remains less explored. The culmination of NLP is the speech-to-speech translation (S2ST) task, referring to translation and synthesis of human speech. A disparity between S2ST and its possible adaptation to the singing domain, which we describe as singing-voice to singing-voice translation (SVSVT), is becoming prominent as the former is progressing ever faster while the latter is at a standstill. Singing voice synthesis (SVS) and singing voice conversion (SVC) are related tasks, but limited attention has been paid to multilingual songwriting and song translation. This paper endeavors to determine what is required for successful SVSVT and proposes PolySinger (Polyglot Singer) as the first system for SVSVT, performing lyric translation from English to Japanese. A cascaded approach is proposed to establish a framework with a high degree of control, which can potentially diminish the disparity between SVSVT and S2ST. The performance of PolySinger is evaluated by a mean opinion score test with native Japanese speakers. Results and in-depth discussions with test subjects reveal that while the foundation for SVSVT has been laid, several shortcomings must be overcome, which are discussed for the future of SVSVT.
System Architecture
Key Technologies
Block 1: Whisper-Large-V3
Used for accurate lyrics transcription on the original English vocal performance.
Block 3: CMU Pronouncing Dictionary
Utilized for syllable definition and pronunciation guidance in the English lyrics.
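As a rough illustration of how a pronunciation dictionary can define syllables, the sketch below counts syllables by counting vowel phonemes, which in CMUdict notation carry a trailing stress digit (0, 1, or 2). The small dictionary excerpt is written by hand in CMUdict's line layout for illustration, and the function names are hypothetical; this is not necessarily how PolySinger itself derives syllables.

```python
# Hand-written excerpt in CMUdict's "WORD  PH1 PH2 ..." layout (illustrative only;
# a real run would load the full dictionary file instead).
CMUDICT_EXCERPT = """\
WISE  W AY1 Z
MEN  M EH1 N
SAY  S EY1
ONLY  OW1 N L IY0
FOOLS  F UW1 L Z
RUSH  R AH1 SH
IN  IH0 N
"""


def parse_cmudict(text):
    """Map each lowercase word to its list of ARPAbet phonemes."""
    entries = {}
    for line in text.splitlines():
        # Skip blank lines and the ";;;" comment lines used in the dictionary file.
        if not line.strip() or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        entries[word.lower()] = phones
    return entries


def count_syllables(phones):
    """Count vowel phonemes: only vowels end in a stress digit."""
    return sum(1 for phone in phones if phone[-1].isdigit())


def syllables_per_word(lyric, entries):
    """Return (word, syllable_count) pairs for every word found in the dictionary."""
    words = [w.strip(".,!?'\"").lower() for w in lyric.split()]
    return [(w, count_syllables(entries[w])) for w in words if w in entries]


PRONUNCIATIONS = parse_cmudict(CMUDICT_EXCERPT)
COUNTS = syllables_per_word("Wise men say, only fools rush in", PRONUNCIATIONS)
```

Words missing from the dictionary are silently skipped here; a fuller pipeline would need a fallback for out-of-vocabulary words.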
Block 4: Omnizart Vocal-Contour Transcription
Extracts the vocal-contour, capturing the melody and pitch information of the original performance.
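To make the link between a vocal contour and melody concrete, the following sketch converts frame-wise fundamental-frequency values in Hz to MIDI note numbers using the standard equal-temperament relation (A4 = 440 Hz = MIDI 69) and merges consecutive frames of the same pitch. It makes no assumption about Omnizart's actual output format: the input is just a hypothetical list of per-frame frequencies, with 0 or None marking unvoiced frames.

```python
import math

A4_HZ = 440.0   # reference tuning frequency
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def contour_to_notes(contour_hz):
    """Collapse a frame-wise contour into (midi_note, n_frames) runs.

    Unvoiced frames (0 or None) end the current run and are dropped
    from the result.
    """
    runs = []
    for f0 in contour_hz:
        pitch = round(hz_to_midi(f0)) if f0 and f0 > 0 else None
        if runs and runs[-1][0] == pitch:
            runs[-1][1] += 1          # same pitch as previous frame: extend the run
        else:
            runs.append([pitch, 1])   # pitch changed: start a new run
    return [(pitch, length) for pitch, length in runs if pitch is not None]
```

Rounding to the nearest semitone discards vibrato and pitch bends, so this is only suitable for obtaining a coarse note sequence.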
Block 5: NLLB-200-distilled-600M
Powers the translation from English to Japanese, used in both baseline and fine-tuned models.
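A minimal sketch of English-to-Japanese translation with this model family is given below, assuming the Hugging Face `transformers` library and the public `facebook/nllb-200-distilled-600M` checkpoint. The fine-tuned checkpoint is not named on this page, so only the public base model is referenced; the function names and the `max_length` value are illustrative choices, not the authors' settings.

```python
MODEL_NAME = "facebook/nllb-200-distilled-600M"  # public base checkpoint (assumption)
SRC_LANG = "eng_Latn"  # FLORES-200 code for English
TGT_LANG = "jpn_Jpan"  # FLORES-200 code for Japanese


def build_translator():
    """Create a translation pipeline; downloads the model weights on first use."""
    # Imported lazily because transformers is a heavy optional dependency.
    from transformers import pipeline

    return pipeline(
        "translation",
        model=MODEL_NAME,
        src_lang=SRC_LANG,
        tgt_lang=TGT_LANG,
    )


def translate_lines(lines, translator=None, max_length=128):
    """Translate a list of lyric lines and return the translated strings.

    `translator` can be any callable with the pipeline's interface, which
    makes it easy to substitute a different checkpoint or a stub.
    """
    translator = translator or build_translator()
    outputs = translator(list(lines), max_length=max_length)
    return [out["translation_text"] for out in outputs]
```

Calling `translate_lines(["Wise men say, only fools rush in"])` would load the model and return a one-element list with the Japanese text; the exact wording depends on the checkpoint and decoding settings.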
Block 6: pyKAKASI
Converts kanji to hiragana in the translated Japanese lyrics for easier pronunciation.
Block 6: Nagisa
Performs word separation on the Japanese lyrics to improve rhythm, timing and pronunciation.
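The two Block 6 steps can be sketched together as follows, assuming the `nagisa` and `pykakasi` Python packages. The order used here (segment into words first, then convert each word to hiragana) is an assumption, as is the function naming; the page does not state how the two tools are combined in PolySinger.

```python
def default_segmenter(text):
    """Split Japanese text into words with Nagisa."""
    import nagisa  # imported lazily: optional third-party dependency

    return nagisa.tagging(text).words


def default_hiragana(text):
    """Convert kanji and katakana in `text` to hiragana with pykakasi."""
    import pykakasi  # imported lazily: optional third-party dependency

    # A new converter is built per call for simplicity; reuse one in real code.
    converter = pykakasi.kakasi()
    return "".join(item["hira"] for item in converter.convert(text))


def prepare_lyrics(text, segment=default_segmenter, to_hiragana=default_hiragana):
    """Return the lyric as a list of hiragana words.

    Both steps are injectable so that either tool can be swapped out
    or replaced by a stub.
    """
    return [to_hiragana(word) for word in segment(text) if word.strip()]
```

Converting word by word may yield different readings than converting the whole sentence at once, since kanji readings can depend on context.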
Block 7: Synthesizer V
Generates the final singing voice synthesis, producing the Japanese vocal performance.
Evaluation Results
A mean opinion score (MOS) test was conducted with 6 native Japanese speakers. The participants were asked to evaluate the quality of the synthesized Japanese vocal performances on a 5-point scale. The same songs were tested with both a baseline and a fine-tuned translation model.
Metric | Baseline | Fine-tuned |
---|---|---|
Lyrics comprehensibility | 2.53 ± 0.49 | 2.17 ± 0.46 |
Japanese naturalness | 2.57 ± 0.48 | 2.30 ± 0.48 |
Meaning preservation | 2.47 ± 0.44 | 2.10 ± 0.44 |
Singability | 2.40 ± 0.41 | 2.23 ± 0.44 |
Lyrics-melody alignment | 2.50 ± 0.52 | 2.10 ± 0.40 |
Overall quality | 2.33 ± 0.45 | 2.13 ± 0.41 |
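For readers unfamiliar with how a "mean ± margin" entry can be derived from raw ratings, here is a small sketch. The page does not state what the ± values in the table denote, so this example assumes a 95% confidence interval under a normal approximation; the function name is hypothetical and no real study data is used.

```python
import math
from statistics import NormalDist, mean, stdev


def mos_summary(ratings, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    `ratings` is a flat list of scores on the 5-point scale for one metric.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mean(ratings), half_width
```

With few ratings per metric, a Student's t critical value would give a wider and more appropriate interval than the normal approximation used here.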
Audio Samples and Lyrics
Original English Vocal Performance
Lyrics
Wise men say, only fools rush in, but I can't help falling in love with you. Shall I stay? Would it be a sin? If I can't help falling in love with you
Japanese Translation by Baseline Model
Lyrics
知恵ある者は愚か者だけが急いで来ると言いますが僕は君に恋するのを止められない
Japanese Translation by Fine-Tuned Model
Lyrics
賢者は愚人だけ駆け寄るけど好きになっちゃうよ俺も残ろうか好きになっちゃうなら罪なのよ
Original English Vocal Performance
Lyrics
When you were here before, couldn't look you in the eye, you're just like an angel, your skin makes me cry. You float like a feather, in a beautiful world, I wish I was special, you're so fuckin' special. But I'm a creep, I'm a weirdo, what the hell am I doin' here, I don't belong here
Japanese Translation by Baseline Model
Lyrics
あなたがここにいたとき私はあなたの目を見ることができなかったあなたは天使のようにあなたの肌は私を泣かせて美しい世界の中で羽のように浮き沈
Japanese Translation by Fine-Tuned Model
Lyrics
前に居た君は天使みたいだ君の肌が僕を泣かせる綺麗な世界で僕は特別だ君は特別だ俺は変だ何やってんだここには居場所はない
Original English Vocal Performance
Lyrics
Don't try to make yourself remember, darling, don't look for me, I'm just a story you've been told. So let's pretend a little longer, cause when we're gone, everything goes on
Japanese Translation by Baseline Model
Lyrics
思い出させるなダーリン私を探さないで俺が君に話された話ださあ少しも
Japanese Translation by Fine-Tuned Model
Lyrics
思い出さないでダーリン探さないで君に話された物語だからもう少し気取って居なくなって
Original English Vocal Performance
Lyrics
I've heard there was a secret chord, that David played, and it pleased the Lord, but you don't really care for music, do ya? It goes like this, the fourth, the fifth, the minor fall, and the major lift, The baffled king composing Hallelujah
Japanese Translation by Baseline Model
Lyrics
ダビデが演奏した秘密の弦が神様に喜ばれると聞きましたがあなたは音楽を気にしていませんかそうですね
Japanese Translation by Fine-Tuned Model
Lyrics
ダビデが弾いた秘密の和音神様に甘えて音楽なんていうか四番五番マイナーとメジャーなぞ惑う王がハレルヤ
Original English Vocal Performance
Lyrics
And now, the end is near, and so I face the final curtain, my friends, I'll say it clear, I'll state my case of which I'm certain. I've lived a life that's full, I traveled each and every highway, And more, much more than this, I did it my way
Japanese Translation by Baseline Model
Lyrics
終わりが近づいてきたので最終的な幕を前にします友よ私ははっきりと言いましょう私は私のケースを述べます
Japanese Translation by Fine-Tuned Model
Lyrics
今終わりが近づくので最後の幕を向く友よはっきり言うよ充分生きたよ高速道路も全部越えてそれ以上も道を行くよ
BibTeX
@INPROCEEDINGS{PolySinger,
  AUTHOR = {Silas Antonisen and Iván López-Espejo},
  TITLE = {{PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese}},
  BOOKTITLE = {Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference},
  YEAR = {2024},
  ADDRESS = {San Francisco, CA, USA}
}