Wget
Copying Sites
To download a page, do::
$ wget -O output-file.html <url>
To download a page with all its dependencies (images etc.), do::
$ wget -p <url>
Note that -p (--page-requisites) saves the page and its dependencies into a directory tree, so no output file is named.
To copy a whole site recursively, do::
$ wget -p -r <website>
To copy a whole site recursively over HTTPS (SSL), ignoring certificate errors, do::
$ wget -p -r --no-check-certificate <website>
To download a file through HTTP authentication, use::
$ wget --http-user=<user> --http-password=<pass> <url>
Combine the HTTPS and authentication options as necessary, as in the example below.
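For example, a recursive copy of a password-protected HTTPS site with a self-signed certificate might look like this (the host and credentials here are hypothetical)::
$ wget -p -r --no-check-certificate --http-user=<user> --http-password=<pass> https://example.com/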
Some sites require a Referer header, which means you can't reach a page without appearing to have followed a link to it. For those, use the --referer=www.foo.com option.
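For example, to fetch a page while claiming to arrive from the site's own front page (both URLs are hypothetical), you might run::
$ wget --referer=http://www.foo.com/ http://www.foo.com/members/page.html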
To ignore robots.txt (it sometimes blocks access to an important directory, such as /img), use the -e robots=off option.
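For example, to mirror a site including paths its robots.txt disallows (the URL placeholder is as above), you might run::
$ wget -e robots=off -p -r <website>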