the spatula

I’ve been working on a project which generates podcasts for learning specific grammar topics in Mandarin. There are plenty of Chinese language podcasts, but the majority are dedicated to beginner and intermediate learners, and few are devoted to grammar. What I wanted was something short and focused so I could use spaced repetition to hit a few topics at a time.

For things like this I’m more interested in the end product than the process, so I mostly relied on Claude, developing a flow which would do the following:

Take a set of existing grammar topics as input
Generate a script using SSML
Have a speech service create the audio
Create a video with synced subtitles

I found this set of grammar topics grouped by HSK level and then iterated on a plan from which the script would be derived:

Intro: Explain the grammar point in Chinese
Vocab: Vocabulary words we’ll use
Examples: Example sentences using the grammar point
Usage: Explained in Chinese and summarized in English
Story: Short story in Chinese, then in English, then repeated in Chinese
Outro: Simple goodbye
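As a rough sketch, the plan above can be written down as data that a script generator walks through in order. The names `Section`, `EPISODE_PLAN`, and `reading_order` are hypothetical, not taken from the actual project. The plan only states the language order for Intro, Usage, and Story; Chinese-only for the other sections is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    # Languages in the order the section is read aloud.
    languages: tuple


# Hypothetical encoding of the episode plan. Vocab, Examples, and Outro
# are marked "zh" only as an assumption; the plan does not specify them.
EPISODE_PLAN = (
    Section("intro", ("zh",)),
    Section("vocab", ("zh",)),       # assumption
    Section("examples", ("zh",)),    # assumption
    Section("usage", ("zh", "en")),
    Section("story", ("zh", "en", "zh")),
    Section("outro", ("zh",)),       # assumption
)


def reading_order(plan):
    """Expand the plan into a flat list of (section name, language) passes."""
    return [(section.name, lang) for section in plan for lang in section.languages]
```

Each pair from `reading_order(EPISODE_PLAN)` would then correspond to one block of the generated script.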

To create the podcast audio, I looked at a few options, including ElevenLabs and some open source models, before settling on Azure. Of Azure’s voices, I found that the older neural models allowed better control over reading speed and tone than the DragonHD and Dragon Omni models. The latter sounded more natural but spoke too quickly and produced the occasional odd artifact. ElevenLabs v3 was also nice, but the improvement wasn’t worth the extra cost for what I wanted.
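Reading speed in SSML is typically set with a `prosody` element. The following is a minimal sketch of wrapping a line of text that way, not the project’s actual script: the function name `build_ssml`, the voice `zh-CN-XiaoxiaoNeural`, and the `-15%` rate are all illustrative assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="zh-CN-XiaoxiaoNeural", lang="zh-CN", rate="-15%"):
    """Wrap text in SSML that asks the voice to read at a relative rate.

    The voice name and rate defaults are placeholders for illustration.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # escape() keeps characters like & and < from breaking the XML.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


ssml = build_ssml("今天我们学习“把”字句。")
```

The resulting string is what would be handed to the speech service; slowing down only the Chinese passages would just mean passing a different `rate` per block.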

Creating the video was handled by another script, which was easy given that Azure provides timing information for the synthesized speech. You can find the episodes here as a podcast or here:
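To illustrate how timing data can become synced subtitles, here is a small sketch that turns timed text into SRT. It assumes the timings have already been collected as `(offset, duration, text)` tuples measured in 100-nanosecond ticks, the unit the Azure Speech SDK uses for audio offsets. The function names and the tuple layout are hypothetical, and the real script may use a different subtitle format.

```python
TICKS_PER_SECOND = 10_000_000  # 100-nanosecond ticks


def format_timestamp(ticks):
    """Convert a tick count to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks * 1000 // TICKS_PER_SECOND
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def to_srt(cues):
    """Render (offset_ticks, duration_ticks, text) tuples as SRT text."""
    blocks = []
    for index, (offset, duration, text) in enumerate(cues, start=1):
        start = format_timestamp(offset)
        end = format_timestamp(offset + duration)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


srt = to_srt([
    (0, 15_000_000, "你好"),
    (15_000_000, 10_000_000, "再见"),
])
```

A tool such as ffmpeg can then burn the resulting file into the video or attach it as a subtitle track.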

Going forward I’d like to work on a few things:

Improve the quality of the Chinese voice, as the current one is a bit airy. I might try to redo some of it with ElevenLabs, though the cost may be a bit much for the moderate improvement it would give.
Add different types of podcasts, such as sentence progressions, where you start with an English sentence and slowly build up to a complete Chinese sentence.