LatinX: Supplementary Data

Data Snapshots

These tables mirror headline numbers from the paper for a quick overview.

Evaluator Demographics (N=306)

Native language | Female | Male | Other | Total

WER — Summary

Language | YourTTS | XTTSv2 | LatinX FT | LatinX DPO

SMOS / MOS — Averages

Metric | Real | XTTSv2 | LatinX FT | LatinX DPO

Detailed Cross-Lingual Results

Detailed Word Error Rate (WER %)

Lower is better. Empty cells denote pairs not tested by a baseline model.

Src | Tgt | YourTTS | XTTSv2 | LatinX (Fine-tuned) | LatinX (DPO)
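
For readers unfamiliar with the metric, the sketch below shows how a WER percentage is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognized transcript, divided by the number of reference words. It is an illustration only. The function name `word_error_rate` is hypothetical, and the ASR system and text normalization behind the numbers on this page are not specified here, so plain whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution plus one deletion over four reference words.
print(word_error_rate("el gato come pescado", "el pato come"))  # 50.0
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.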

Detailed Speaker Similarity (SMOS)

Human evaluation scores (Mean ± 95% CI). Higher is better.

Src | Tgt | Baseline (Real) | XTTSv2 | LatinX (Fine-tuned) | LatinX (DPO)
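
As a reading aid for the "Mean ± 95% CI" format, here is a minimal sketch of one common way to obtain such an interval from a list of listener ratings. It assumes the normal approximation (half-width of 1.96 standard errors); the page does not state which method the paper used, and the function name `mean_with_ci95` and the sample ratings are hypothetical.

```python
import math
import statistics


def mean_with_ci95(scores):
    """Return (mean, half_width) of a 95% confidence interval.

    Assumption: normal approximation, half_width = 1.96 * standard error.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a CI")
    mean = statistics.fmean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return mean, 1.96 * standard_error


# Hypothetical ratings on a 1-5 scale, not taken from the paper.
mean, half_width = mean_with_ci95([4, 5, 3, 4, 4])
print(f"{mean:.2f} ± {half_width:.2f}")
```

With only a handful of ratings, a Student's t critical value would give a wider and more appropriate interval than the fixed 1.96 used in this sketch.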

Detailed Objective Similarity (Sim-O / Sim-E)

Cosine similarity scores. Higher is better. Best Sim-O per row is in bold. Sim-E values that surpass the best Sim-O are underlined.

Src | Tgt | YourTTS | XTTSv2 | LatinX (Fine-tuned) | LatinX (DPO)
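
The scores in this table are cosine similarities between speaker embeddings. The sketch below shows the computation itself on two plain vectors. The speaker-encoder model that produces the embeddings is not named on this page, so the vectors here are hypothetical placeholders and `cosine_similarity` is an illustrative helper, not the paper's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real speaker embeddings
# typically have hundreds of dimensions.
reference_embedding = [1.0, 2.0, 3.0]
generated_embedding = [2.0, 4.0, 6.0]
print(cosine_similarity(reference_embedding, generated_embedding))
```

The measure depends only on the direction of the vectors, so scaling an embedding leaves the score unchanged.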

Detailed Naturalness (MOS)

Human evaluation scores for naturalness (Mean ± 95% CI). Higher is better. Real audio scores are in bold. The best result among generated models is underlined.

Src | Tgt | Baseline (Real Audio) | XTTSv2 | LatinX (Fine-tuned) | LatinX (DPO)