I needed to scrape some data from a dynamically generated webpage, one where the content is rendered by JavaScript after the initial HTML has loaded. Open this page in a browser and you’ll see what I mean.

I tried a lot of options, and the one that ended up working for me was Headless Chrome. It lets you launch Chrome without a visible window and control it from within your code. Installing Chrome Canary is recommended, and you can get it from here.

In pseudocode, my script does the following:

  1. launch Chrome in headless mode (no visible window)
  2. load a page and wait until the page is loaded
  3. then wait a little longer until all the JS on the page has completed
  4. then click on something using a querySelector to change the language to English
  5. then grab all the source code from the page
  6. then load that source code into cheerio and perform queries as needed

The full source code for this is below. You’ll also need to install the three packages it uses:

> yarn add chrome-remote-interface chrome-launcher cheerio
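
For reference, here is a minimal sketch of the six steps above, not the exact script. It assumes CommonJS `require`, and the function name `scrape`, the helper names, the wait times, and the URL and selector you pass in are all placeholders I've made up for illustration.

```javascript
// Sketch only: names, timings, and arguments are illustrative assumptions.

// Resolve after `ms` milliseconds; used to give the page's JS time to finish.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Build the expression that is evaluated inside the page to click an element.
const clickExpression = (selector) =>
  `document.querySelector(${JSON.stringify(selector)}).click()`;

async function scrape(url, languageSelector) {
  // Required inside the function so the helpers above load on their own.
  const chromeLauncher = require('chrome-launcher');
  const CDP = require('chrome-remote-interface');
  const cheerio = require('cheerio');

  // 1. launch Chrome in headless mode (no visible window)
  const chrome = await chromeLauncher.launch({
    chromeFlags: ['--headless', '--disable-gpu'],
  });
  let client;
  try {
    client = await CDP({ port: chrome.port });
    const { Page, Runtime } = client;
    await Promise.all([Page.enable(), Runtime.enable()]);

    // 2. load the page and wait for its load event
    await Page.navigate({ url });
    await Page.loadEventFired();

    // 3. wait a little longer for the page's JS (2 seconds is a guess)
    await delay(2000);

    // 4. click the language switch, then let the page re-render
    await Runtime.evaluate({ expression: clickExpression(languageSelector) });
    await delay(1000);

    // 5. grab the rendered source of the page
    const { result } = await Runtime.evaluate({
      expression: 'document.documentElement.outerHTML',
    });

    // 6. load the source into cheerio for querying
    return cheerio.load(result.value);
  } finally {
    // Always close the connection and shut Chrome down.
    if (client) await client.close();
    await chrome.kill();
  }
}
```

You would call it with something like `scrape(someUrl, '#lang-en').then(($) => console.log($('title').text()))`, where `#lang-en` stands in for whatever selector matches the language switch on your page. The fixed delays are the crude part: if the page exposes an element that only appears once rendering is done, polling for it would be more reliable than guessing a timeout.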

I hope that helps some people! It took me ages to get to the point where I could scrape a dynamically generated webpage from Node.js.