Dumping the HTML of a page

The shot-scraper html command dumps out the final HTML of a page after all JavaScript has run.

shot-scraper html https://datasette.io/

Use -o filename.html to write the output to a file instead of printing it to standard output.

shot-scraper html https://datasette.io/ -o index.html
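In a script, the same -o option can be wrapped in a small helper. The following is a minimal sketch only: the function name save_html and the SHOT_SCRAPER variable (used to point at a different executable) are assumptions made for this illustration and are not part of shot-scraper.

```shell
# Hypothetical helper, not part of shot-scraper.
# SHOT_SCRAPER optionally overrides the executable; it defaults to shot-scraper.
save_html() {
  url="$1"
  out="$2"
  # -o writes the final HTML to the named file instead of standard output
  "${SHOT_SCRAPER:-shot-scraper}" html "$url" -o "$out"
}

# Example call:
#   save_html https://datasette.io/ index.html
```

Keeping the URL and the filename as arguments makes it straightforward to call the helper once per page from a loop.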

Add --javascript SCRIPT to execute custom JavaScript before taking the HTML snapshot.

shot-scraper html https://datasette.io/ \
  --javascript "document.querySelector('h1').innerText = 'Hello, world!'"
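Because the script runs in the page before the HTML is captured, it can also remove content rather than change it. As a hedged sketch, the helper below strips every script element before saving the snapshot. The function name html_without_scripts and the SHOT_SCRAPER variable are hypothetical names chosen for this example, not part of shot-scraper.

```shell
# Hypothetical helper, not part of shot-scraper.
# SHOT_SCRAPER optionally overrides the executable; it defaults to shot-scraper.
html_without_scripts() {
  url="$1"
  out="$2"
  # The JavaScript runs in the page first, so the removed <script>
  # elements should not appear in the saved HTML.
  "${SHOT_SCRAPER:-shot-scraper}" html "$url" \
    --javascript "document.querySelectorAll('script').forEach(el => el.remove())" \
    -o "$out"
}

# Example call:
#   html_without_scripts https://datasette.io/ clean.html
```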

Retrieving the HTML for a specific element

You can use the -s SELECTOR option to capture just the HTML for one specific element on the page, identified using a CSS selector:

shot-scraper html https://datasette.io/ -s h1

This outputs:

<h1>
  <img class="datasette-logo" src="/static/datasette-logo.svg" alt="Datasette">
</h1>
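Since the element's HTML is written to standard output, it can be piped into other tools. The sketch below wraps the -s option in a hypothetical helper, element_html, and shows one possible use: counting the lines of the captured element that contain an img tag. Both the helper name and the SHOT_SCRAPER variable are assumptions for illustration, not part of shot-scraper.

```shell
# Hypothetical helper, not part of shot-scraper.
# SHOT_SCRAPER optionally overrides the executable; it defaults to shot-scraper.
element_html() {
  url="$1"
  selector="$2"
  # -s returns the outerHTML of the first element matching the CSS selector
  "${SHOT_SCRAPER:-shot-scraper}" html "$url" -s "$selector"
}

# Example call: count the lines in the first <h1> that contain an <img> tag
#   element_html https://datasette.io/ h1 | grep -c '<img'
```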

shot-scraper html --help

Full --help for this command:

Usage: shot-scraper html [OPTIONS] URL

  Output the final HTML of the specified page

  Usage:

      shot-scraper html https://datasette.io/

  Use -o to specify a filename:

      shot-scraper html https://datasette.io/ -o index.html

Options:
  -a, --auth FILENAME             Path to JSON authentication context file
  -o, --output FILE
  -j, --javascript TEXT           Execute this JS prior to saving the HTML
  -s, --selector TEXT             Return outerHTML of first element matching
                                  this CSS selector
  --wait INTEGER                  Wait this many milliseconds before taking the
                                  snapshot
  --log-console                   Write console.log() to stderr
  -b, --browser [chromium|firefox|webkit|chrome|chrome-beta]
                                  Which browser to use
  --browser-arg TEXT              Additional arguments to pass to the browser
  --user-agent TEXT               User-Agent header to use
  --fail                          Fail with an error code if a page returns an
                                  HTTP error
  --skip                          Skip pages that return HTTP errors
  --bypass-csp                    Bypass Content-Security-Policy
  --silent                        Do not output any messages
  --auth-password TEXT            Password for HTTP Basic authentication
  --auth-username TEXT            Username for HTTP Basic authentication
  --help                          Show this message and exit.