How to programmatically extract YouTube captions as plain text

YouTube automatically creates subtitles for many of the videos uploaded to it.

So instead of paying to have your videos transcribed, you can upload your video to YouTube and programmatically download the subtitles.

But it took me a while to figure out how to get those subtitles programmatically.

I started with youtube-dl, but I could only get the captions in WebVTT format, which is cumbersome to post-process. I posted an issue in their GitHub repo.
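
To give a sense of what that post-processing involves, here is a minimal sketch of reducing a WebVTT file to plain text in Node.js. It is not something youtube-dl provides; the function name vttToPlainText is made up for illustration, and the sketch assumes well-formed input where cues are separated by blank lines.

```javascript
// Minimal sketch: reduce a WebVTT caption file to plain text.
// vttToPlainText is a hypothetical helper name, not part of any library.
function vttToPlainText(vtt) {
  const lines = [];
  // Normalize line endings, then split into blocks separated by blank lines.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const blockLines = block.split("\n");
    // A cue has a timing line such as "00:00:01.000 --> 00:00:03.000".
    // Blocks without one (the WEBVTT header, NOTE, STYLE) are skipped.
    const timingIndex = blockLines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue;
    for (const raw of blockLines.slice(timingIndex + 1)) {
      // Strip inline tags such as <c>, <v Speaker>, or <00:00:01.500>.
      const text = raw.replace(/<[^>]*>/g, "").trim();
      // Captions can repeat the previous line across cues,
      // so drop consecutive duplicates.
      if (text && text !== lines[lines.length - 1]) lines.push(text);
    }
  }
  return lines.join("\n");
}
```

Even this short version has to deal with headers, timing lines, inline tags, and repeated lines, which is why a source of plain-text captions is more convenient.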

I finally found a little npm library someone had built that made this task fairly straightforward. Here is a code sample that takes a YouTube URL as a command-line argument and then logs the subtitles as plain text.

View this gist on GitHub
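
The gist itself is not reproduced here, so what follows is only a sketch of the first step it describes: reading a YouTube URL from the command line and pulling out the video ID. The helper name extractVideoId and the set of URL shapes it handles are assumptions for illustration, and the call that actually fetches the captions is left out because it depends on the library you use.

```javascript
// Hypothetical helper: pull the video ID out of common YouTube URL shapes.
// Returns null when the input is not a recognizable YouTube URL.
function extractVideoId(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return null; // not a valid URL at all
  }
  // Treat www.youtube.com and m.youtube.com the same as youtube.com.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    return url.pathname.slice(1) || null;
  }
  if (host === "youtube.com") {
    // Standard links: https://www.youtube.com/watch?v=<id>
    if (url.pathname === "/watch") return url.searchParams.get("v");
    // Embed and Shorts links: /embed/<id> or /shorts/<id>
    const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)/);
    if (match) return match[1];
  }
  return null;
}

// Read the URL from the first command-line argument, if one was given.
const arg = process.argv[2];
if (arg) {
  const videoId = extractVideoId(arg);
  console.log(videoId ?? "Could not find a video ID in that URL");
}
```

Saved as, say, captions.js, this would be run with node captions.js followed by the quoted video URL. The resulting ID is what you would then hand to the caption library of your choice.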

2 responses to “How to programmatically extract YouTube captions as plain text”

  1. Hi Andrew.

    Thank you very much for sharing this info, because I’ve also been trying to get YouTube subtitles and this is exactly what I’ve been looking for.

    However, I’m facing one problem here: the request was blocked by CORS policy. How did you get around this problem?

    • Hey David. Thanks for the feedback. If you’re dealing with CORS issues, are you trying to run this code from the browser? I just checked, and this code runs fine on a server or on my local machine.

      Let me know if that doesn’t make sense or if you have any other questions.
