How to programmatically extract YouTube captions as plain text

YouTube automatically creates subtitles for many of the videos uploaded to it.

So instead of paying to have your videos transcribed, you can upload them to YouTube and programmatically download the subtitles.

But it took me a while to figure out how to get those subtitles programmatically.

I started with youtube-dl, but I could only get the subtitles in WebVTT format, which is cumbersome to post-process. I posted an issue in their GitHub repo.
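To give a sense of what that post-processing involves, here is a minimal sketch that reduces a WebVTT document to plain text. The function name vttToPlainText is my own, and the sketch assumes a simple file: everything before the first timing line is treated as header, inline tags are stripped, and consecutive duplicate lines (common in rolling auto-generated captions) are collapsed. Real files may need more handling than this.

```javascript
// Sketch only: turn WebVTT text into a single plain-text string.
// Assumes cues have no numeric identifiers and no NOTE/STYLE blocks after the first cue.
function vttToPlainText(vtt) {
  const out = [];
  let seenCue = false;
  for (const raw of vtt.split(/\r?\n/)) {
    if (raw.includes('-->')) {
      // Cue timing line, e.g. "00:00:00.000 --> 00:00:02.000"
      seenCue = true;
      continue;
    }
    if (!seenCue) continue; // header lines before the first cue
    // Drop inline tags such as <c>...</c> or <00:00:00.500>
    const line = raw.replace(/<[^>]*>/g, '').trim();
    if (line === '') continue; // blank separators between cues
    if (out[out.length - 1] === line) continue; // consecutive duplicate line
    out.push(line);
  }
  return out.join(' ');
}
```

Even in this reduced form there are several format details to account for, which is why a source that returns caption text directly is more convenient.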

I finally found a little npm library someone had built that made this task fairly straightforward. Here is a code sample that will take a YouTube URL as a command line argument, and then log the subtitles as plain text.
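As a rough sketch of the pieces around the library call (this is not the original sample), the two helpers below pull the video ID out of a YouTube URL and join caption entries into plain text. The names extractVideoId and captionsToPlainText are hypothetical, and the { text } shape of a caption entry is an assumption about what a captions library might return, so check it against the library you actually use.

```javascript
// Sketch only: helper names and the caption entry shape are assumptions.

// Pull the video ID out of two common YouTube URL shapes:
//   https://www.youtube.com/watch?v=ID  and  https://youtu.be/ID
function extractVideoId(url) {
  const parsed = new URL(url);
  if (parsed.hostname === 'youtu.be') {
    return parsed.pathname.slice(1); // drop the leading "/"
  }
  return parsed.searchParams.get('v'); // null if there is no "v" parameter
}

// Join caption entries into one plain-text string.
// Assumes each entry looks like { text: '...' }.
function captionsToPlainText(captions) {
  return captions
    .map((caption) => caption.text.trim())
    .filter(Boolean) // skip empty entries
    .join(' ');
}
```

In a command line script you would pass process.argv[2] to extractVideoId, hand the resulting ID to your captions library, and log the result of captionsToPlainText.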

August 14th, 2018 | gist | 2 Comments

About the Author:

Andrew Golightly is the lead web developer here at Golightly+. He is a passionate fullstack JavaScript developer. He also runs an Empath Community.

2 Comments

  1. David September 9, 2019 at 1:41 am

    Hi Andrew.

    Thank you very much for sharing this info cos I’ve been also trying to get youtube subtitles now and this is the one I’ve been trying to get.

    However, I’m facing one problem here. It was blocked by CORS policy. How did you get away with this problem??

    • Andrew September 9, 2019 at 11:48 am

      Hey David. Thanks for the feedback. If you’re dealing with CORS issues, are you trying to run this code from the browser? I just checked and this code runs fine on a server or my local machine.

      Let me know if that doesn’t make sense or if you have any other questions.
