How to programmatically extract YouTube captions as plain text

YouTube automatically generates subtitles for many of the videos uploaded to it.

So instead of paying to have your videos transcribed, you can upload them to YouTube and programmatically download the subtitles.

But it took me a while to figure out how to get those subtitles programmatically.

I started with youtube-dl, but I could only get the subtitles in WebVTT format, which is cumbersome to post-process. I posted an issue in their GitHub repo.
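To give a sense of what that post-processing involves, here is a rough sketch that reduces WebVTT text to plain text. It assumes the common cue layout (a timing line containing `-->` followed by text lines), strips inline tags such as `<c>`, and drops a line when it repeats the previous one, which rolling auto-captions tend to do. The function name `vttToPlainText` is made up for illustration, and HTML entities are not decoded.

```javascript
// Convert WebVTT caption text to plain text (illustrative sketch, not a full parser).
function vttToPlainText(vtt) {
  const lines = [];
  // Normalise line endings, then split into blocks separated by blank lines.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const blockLines = block.split("\n");
    // A cue has a timing line such as "00:00:00.000 --> 00:00:02.000".
    const timingIndex = blockLines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header, NOTE or STYLE block
    for (const raw of blockLines.slice(timingIndex + 1)) {
      // Remove inline tags like <c> or <00:00:01.000> and trim whitespace.
      const text = raw.replace(/<[^>]+>/g, "").trim();
      // Skip empty lines and immediate repeats of the previous line.
      if (text && text !== lines[lines.length - 1]) lines.push(text);
    }
  }
  return lines.join("\n");
}
```

Reading a downloaded `.vtt` file with `fs.readFileSync(path, "utf8")` and passing the result to this function would be one way to use it.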

I finally found a little npm library someone had built that made this task fairly straightforward. Here is a code sample that will take a YouTube URL as a command-line argument, and then log the subtitles as plain text.
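A minimal sketch of that flow, not the original sample: it extracts the video ID from the URL and joins caption segments into plain text. Since the library is not named here, the fetching step is left to a hypothetical `fetchCaptions(videoId)` function that you would supply, and the segment shape (`{ text }`) is likewise an assumption. All function names below are made up for illustration.

```javascript
// Pull the video ID out of common YouTube URL shapes:
// watch?v=ID, youtu.be/ID, /embed/ID and /shorts/ID.
function videoIdFromUrl(input) {
  const url = new URL(input);
  if (url.hostname === "youtu.be") return url.pathname.slice(1) || null;
  const v = url.searchParams.get("v");
  if (v) return v;
  const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)/);
  return match ? match[1] : null;
}

// Join caption segments into one plain-text string.
// The { text } segment shape is an assumption, not a known library format.
function captionsToPlainText(segments) {
  return segments
    .map((segment) => segment.text.trim())
    .filter(Boolean)
    .join(" ");
}

// fetchCaptions is hypothetical: an async function that takes a video ID
// and resolves to an array of { text } segments.
async function main(fetchCaptions) {
  const videoId = videoIdFromUrl(process.argv[2]);
  const segments = await fetchCaptions(videoId);
  console.log(captionsToPlainText(segments));
}
```

With a real fetcher plugged in, you would call `main(yourFetcher)` at the bottom of the script and run it as `node script.js <YouTube URL>`.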

Published August 14th, 2018

About the Author:

Andrew Golightly is the lead web developer here at Golightly+. He is a passionate full-stack JavaScript developer who also creates native apps using React Native. To balance his love for coding, he works as a counsellor as well.
