Wget
Copying Sites
To download a page, do::
$ wget -O output-file.html <url>
To download a page with all its dependencies (images etc.), do::
$ wget -p <url>
Note that -p (--page-requisites) saves the page and its dependencies into a directory tree, so no output file is named.
To copy a whole site recursively, do::
$ wget -p -r <website>
To copy a whole site recursively over HTTPS (SSL), ignoring certificate errors, do::
$ wget -p -r --no-check-certificate <website>
To download a file through HTTP authentication, use::
$ wget --http-user=<user> --http-password=<pass> <url>
Combine the HTTPS and authentication options as necessary, as in the example below.
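For example, a recursive copy of a password-protected HTTPS site with a self-signed certificate might look like this (the host and credentials here are hypothetical)::
$ wget -p -r --no-check-certificate --http-user=<user> --http-password=<pass> https://example.com/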
Some sites require a Referer header, which means you can't reach a page without appearing to have followed a link to it. For those, use the --referer=www.foo.com option.
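For example, to fetch a page while claiming to arrive from the site's own front page (both URLs are hypothetical), you might run::
$ wget --referer=http://www.foo.com/ http://www.foo.com/members/page.html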
To ignore robots.txt (it sometimes blocks access to an important directory, such as /img), use the -e robots=off option.
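For example, to mirror a site including paths its robots.txt disallows (the URL placeholder is as above), you might run::
$ wget -e robots=off -p -r <website>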