I’ve been working on a project which generates podcasts for learning specific grammar topics in Mandarin. There are plenty of Chinese language podcasts, but the majority are dedicated to beginner and intermediate learners, and few are devoted to grammar. What I wanted was something short and focused so I could use spaced repetition to hit a few topics at a time.
For things like this I’m more interested in the end product than the process, so I mostly relied on Claude, developing a flow which would do the following:
I found this set of grammar topics grouped by HSK level and then iterated on a plan from which the script would be derived:
To create the podcast, I looked at a few options, including ElevenLabs and some open-source models, before settling on Azure. Of Azure’s voices, I found that the older neural models allowed better control over reading speed and tone than the DragonHD and Dragon Omni models. The latter sounded more natural but spoke too quickly and produced the occasional odd artifact. ElevenLabs v3 was nice, but the improvement didn’t justify the cost for what I wanted.
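As a rough illustration of the speed control described above, here is a minimal sketch that wraps text in SSML with a `prosody` rate before it is sent to a neural voice. The helper name `build_ssml`, the voice `zh-CN-XiaoxiaoNeural`, and the `-15%` rate are assumptions for the example, not necessarily what this project uses.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="zh-CN-XiaoxiaoNeural", rate="-15%", lang="zh-CN"):
    """Wrap text in SSML that slows the reading speed.

    The voice name and rate defaults are illustrative assumptions.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # A negative percentage slows speech relative to the voice default.
        f"<prosody rate={quoteattr(rate)}>"
        f"{escape(text)}"  # escape &, <, > so the SSML stays well-formed
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("把字句用来强调对宾语的处置。"))
```

The resulting string would then be passed to whatever SSML-based synthesis call the speech SDK provides. Keeping the SSML builder separate from the network call makes it easy to tweak the rate per segment, for example slower for example sentences than for the English explanation.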
Creating a video was another script, which was easy given that Azure provides the timing for the speech synthesis. You can find them here as a podcast or here:
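On the timing data mentioned above: one way it can drive a video is by turning word timings into subtitle cues. The sketch below assumes the timings have already been collected as `(text, offset_ticks, duration_ticks)` tuples, with offsets in 100-nanosecond ticks. The function names, the tuple layout, and the choice of SRT output are all assumptions for illustration, not a description of the actual script.

```python
TICKS_PER_MS = 10_000  # assumes offsets are given in 100-nanosecond ticks


def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(words, words_per_cue=4):
    """Group (text, offset_ticks, duration_ticks) tuples into SRT cues."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start_ms = chunk[0][1] // TICKS_PER_MS
        # A cue ends when its last word finishes.
        end_ms = (chunk[-1][1] + chunk[-1][2]) // TICKS_PER_MS
        # Mandarin words are joined without spaces.
        text = "".join(word[0] for word in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [("你好", 500_000, 4_000_000), ("世界", 5_000_000, 5_000_000)]
    print(to_srt(sample))
```

A fixed number of words per cue is the simplest grouping. Splitting on punctuation instead would likely give more natural subtitle breaks for full sentences.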
Going forward I’d like to work on a few things: