PolySinger: Singing-Voice to Singing-Voice Translation From English to Japanese

Abstract

The speech domain remains in the spotlight for several natural language processing (NLP) tasks, while the singing domain is less explored. The culmination of NLP is the speech-to-speech translation (S2ST) task, referring to translation and synthesis of human speech. A disparity between S2ST and its possible adaptation to the singing domain, which we describe as singing-voice to singing-voice translation (SVSVT), is becoming prominent as the former progresses ever faster while the latter is at a standstill. Singing voice synthesis (SVS) and singing voice conversion (SVC) are related tasks, but limited attention has been paid to multilingual songwriting and song translation. This paper endeavors to determine what is required for successful SVSVT and proposes PolySinger (Polyglot Singer) as the first system for SVSVT, performing lyric translation from English to Japanese. A cascaded approach is proposed to establish a framework with a high degree of control, which can potentially diminish the disparity between SVSVT and S2ST. The performance of PolySinger is evaluated by a mean opinion score test with native Japanese speakers. Results and in-depth discussions with test subjects reveal that while the foundation for SVSVT has been laid, several shortcomings must be overcome; these are discussed with a view to the future of SVSVT.

System Architecture

PolySinger System Architecture Diagram

Key Technologies

Block 1: Whisper-Large-V3

Used for accurate lyrics transcription on the original English vocal performance.

Block 2: Schufo lyrics-aligner

Phonetically aligns the transcribed lyrics to the audio.

Block 3: CMU Pronunciation Dictionary

Utilized for syllable definition and pronunciation guidance in the English lyrics.
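
As a rough illustration of how syllables can be derived from CMU-style pronunciations: in the dictionary's ARPAbet notation, each vowel phoneme carries a stress digit (0, 1, or 2), so counting the phonemes that end in a digit gives a syllable count. The sketch below uses a tiny hand-copied excerpt rather than the full dictionary, and the names `CMU_EXCERPT`, `count_syllables`, and `line_syllables` are hypothetical; it is not taken from the PolySinger code.

```python
# Hand-copied excerpt in CMU Pronouncing Dictionary style (assumption:
# a real pipeline would load the full dictionary instead).
CMU_EXCERPT = {
    "wise": ["W", "AY1", "Z"],
    "men": ["M", "EH1", "N"],
    "say": ["S", "EY1"],
    "only": ["OW1", "N", "L", "IY0"],
    "fools": ["F", "UW1", "L", "Z"],
}


def count_syllables(phonemes):
    """Count syllables as the number of vowel phonemes.

    In ARPAbet, only vowels end with a stress digit (0, 1 or 2).
    """
    return sum(1 for p in phonemes if p[-1].isdigit())


def line_syllables(words, lexicon=CMU_EXCERPT):
    """Sum the syllable counts of all words in a lyric line.

    Raises KeyError for words missing from the lexicon, so that
    out-of-vocabulary words are not silently ignored.
    """
    return sum(count_syllables(lexicon[w.lower()]) for w in words)


if __name__ == "__main__":
    words = ["Wise", "men", "say", "only", "fools"]
    print(line_syllables(words))
```

A syllable count per line is one simple way to relate the English lyrics to the notes of the melody; how PolySinger actually uses the dictionary may differ.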

Block 4: Omnizart Vocal-Contour Transcription

Extracts the vocal-contour, capturing the melody and pitch information of the original performance.
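
To make the idea of pitch information concrete, the following sketch converts fundamental-frequency values in Hz to MIDI note numbers with the standard formula 69 + 12 * log2(f / 440). It assumes the contour is available as a list of per-frame frequencies with 0 marking unvoiced frames; that input format and the function names are assumptions for illustration, not a description of Omnizart's actual output.

```python
import math


def hz_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def contour_to_midi(f0_frames):
    """Convert a per-frame F0 contour to MIDI notes.

    Frames with a non-positive frequency are treated as unvoiced
    and mapped to None (assumed convention).
    """
    return [hz_to_midi(f) if f > 0 else None for f in f0_frames]


if __name__ == "__main__":
    # Hypothetical contour: silence, A4, A4, C5, silence.
    frames = [0.0, 440.0, 441.5, 523.25, 0.0]
    print(contour_to_midi(frames))
```

Quantizing to the nearest semitone discards vibrato and pitch slides, which may or may not be desirable depending on how the synthesizer consumes the melody.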

Block 5: NLLB-200-distilled-600M

Powers the translation from English to Japanese, used in both baseline and fine-tuned models.

Block 6: pyKAKASI

Converts kanji to hiragana in the translated Japanese lyrics for easier pronunciation.
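
One practical property of hiragana text is that its length in morae (the rhythmic units of Japanese) can be estimated directly from the characters, which is not possible with kanji. The sketch below is an illustration of that idea only and is not part of PolySinger as described here: it counts each hiragana character and the long-vowel mark as one mora, except for small kana such as ゃ, ゅ, and ょ, which merge with the preceding character. The function names are hypothetical.

```python
# Small kana that attach to the preceding kana and do not add a mora.
# The small っ is deliberately absent: it counts as a mora of its own.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎ")
LONG_VOWEL_MARK = "ー"


def is_hiragana(ch):
    """True for characters in the main hiragana block (ぁ to ゖ)."""
    return "\u3041" <= ch <= "\u3096"


def count_morae(text):
    """Estimate the number of morae in a hiragana string.

    Characters that are not hiragana (kanji, punctuation, spaces)
    are ignored, which is why kanji must be converted first.
    """
    count = 0
    for ch in text:
        if ch in SMALL_KANA:
            continue
        if is_hiragana(ch) or ch == LONG_VOWEL_MARK:
            count += 1
    return count


if __name__ == "__main__":
    for word in ["てんし", "きょう", "がっこう"]:
        print(word, count_morae(word))
```

Comparing such a count with the number of notes in a phrase is one simple way to check whether a translated line could fit the melody.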

Block 6: Nagisa

Performs word separation on the Japanese lyrics to improve rhythm, timing and pronunciation.

Block 7: Synthesizer V

Generates the final singing voice synthesis, producing the Japanese vocal performance.

Evaluation Results

A Mean Opinion Score (MOS) test was conducted with 6 native Japanese speakers, who rated the quality of the synthesized Japanese vocal performances on a 5-point scale. The same songs were tested with both a baseline and a fine-tuned translation model.

Metric                      Baseline       Fine-tuned
Lyrics comprehensibility    2.53 ± 0.49    2.17 ± 0.46
Japanese naturalness        2.57 ± 0.48    2.30 ± 0.48
Meaning preservation        2.47 ± 0.44    2.10 ± 0.44
Singability                 2.40 ± 0.41    2.23 ± 0.44
Lyrics-melody alignment     2.50 ± 0.52    2.10 ± 0.40
Overall quality             2.33 ± 0.45    2.13 ± 0.41
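
For readers unfamiliar with MOS reporting, the sketch below shows one common way to obtain a "mean ± interval" figure from a set of ratings: the sample mean plus a normal-approximation 95% confidence half-width (1.96 times the standard error). The page does not state how the ± values above were computed, so this is an assumption, and the ratings in the example are hypothetical rather than the study's data.

```python
import math
import statistics


def mos_with_interval(ratings, z=1.96):
    """Return (mean, half_width) for a list of opinion scores.

    half_width is z times the standard error of the mean, using the
    sample standard deviation. z=1.96 gives a normal-approximation
    95% interval (an assumption; a t-based interval would be wider
    for small samples).
    """
    mean = statistics.mean(ratings)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * sem


if __name__ == "__main__":
    # Hypothetical ratings on a 5-point scale, not data from the study.
    hypothetical_ratings = [2, 3, 2, 3, 3, 2]
    mean, half_width = mos_with_interval(hypothetical_ratings)
    print(f"{mean:.2f} ± {half_width:.2f}")
```

With few listeners, intervals of this kind are wide, so small differences between two systems should be read with caution.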

Audio Samples and Lyrics

Original English Vocal Performance

Lyrics

Wise men say, only fools rush in, but I can't help falling in love with you. Shall I stay? Would it be a sin? If I can't help falling in love with you

Japanese Translation by Baseline Model

Lyrics

知恵ある者は愚か者だけが急いで来ると言いますが僕は君に恋するのを止められない

Japanese Translation by Fine-Tuned Model

Lyrics

賢者は愚人だけ駆け寄るけど好きになっちゃうよ俺も残ろうか好きになっちゃうなら罪なのよ

Original English Vocal Performance

Lyrics

When you were here before, couldn't look you in the eye, you're just like an angel, your skin makes me cry. You float like a feather, in a beautiful world, I wish I was special, you're so fuckin' special. But I'm a creep, I'm a weirdo, what the hell am I doin' here, I don't belong here

Japanese Translation by Baseline Model

Lyrics

あなたがここにいたとき私はあなたの目を見ることができなかったあなたは天使のようにあなたの肌は私を泣かせて美しい世界の中で羽のように浮き沈

Japanese Translation by Fine-Tuned Model

Lyrics

前に居た君は天使みたいだ君の肌が僕を泣かせる綺麗な世界で僕は特別だ君は特別だ俺は変だ何やってんだここには居場所はない

Original English Vocal Performance

Lyrics

Don't try to make yourself remember, darling, don't look for me, I'm just a story you've been told. So let's pretend a little longer, cause when we're gone, everything goes on

Japanese Translation by Baseline Model

Lyrics

思い出させるなダーリン私を探さないで俺が君に話された話ださあ少しも

Japanese Translation by Fine-Tuned Model

Lyrics

思い出さないでダーリン探さないで君に話された物語だからもう少し気取って居なくなって

Original English Vocal Performance

Lyrics

I've heard there was a secret chord, that David played, and it pleased the Lord, but you don't really care for music, do ya? It goes like this, the fourth, the fifth, the minor fall, and the major lift, The baffled king composing Hallelujah

Japanese Translation by Baseline Model

Lyrics

ダビデが演奏した秘密の弦が神様に喜ばれると聞きましたがあなたは音楽を気にしていませんかそうですね

Japanese Translation by Fine-Tuned Model

Lyrics

ダビデが弾いた秘密の和音神様に甘えて音楽なんていうか四番五番マイナーとメジャーなぞ惑う王がハレルヤ

Original English Vocal Performance

Lyrics

And now, the end is near, and so I face the final curtain, my friends, I'll say it clear, I'll state my case of which I'm certain. I've lived a life that's full, I traveled each and every highway, And more, much more than this, I did it my way

Japanese Translation by Baseline Model

Lyrics

終わりが近づいてきたので最終的な幕を前にします友よ私ははっきりと言いましょう私は私のケースを述べます

Japanese Translation by Fine-Tuned Model

Lyrics

今終わりが近づくので最後の幕を向く友よはっきり言うよ充分生きたよ高速道路も全部越えてそれ以上も道を行くよ

BibTeX

@INPROCEEDINGS{PolySinger,
    AUTHOR = {Silas Antonisen and Iván López-Espejo},
    TITLE = {{PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese}},
    BOOKTITLE = {Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference},
    YEAR = {2024},
    ADDRESS = {San Francisco, CA, USA}
    }