To download a page, do:
$ wget -O output-file.html <url>
To download a page with all its dependencies (images etc.), do:
$ wget -p <url>
To copy a whole site recursively, do:
$ wget -p -r <website>
To copy a whole site recursively over HTTPS (SSL), ignoring certificate errors, do:
$ wget -p -r --no-check-certificate <website>
To download a file through HTTP authentication, use:
$ wget --http-user=<user> --http-password=<password> <url>
Combine the HTTPS and HTTP-authentication options as necessary.
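For example, a combined invocation (all values are placeholders) mirroring an HTTPS site behind HTTP authentication, ignoring certificate errors:
$ wget -p -r --no-check-certificate --http-user=<user> --http-password=<password> <website>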
Some sites require a Referer header, which means you can't fetch a page without appearing to have clicked a link to it. In that case, use the --referer option, e.g. --referer=www.foo.com.
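For example (www.foo.com stands in for the page that links to the file you want):
$ wget --referer=www.foo.com <url>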
To ignore robots.txt (sites sometimes use it to block access to an important directory, such as /img), use the -e robots=off option.
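For example, when mirroring a site recursively:
$ wget -r -e robots=off <website>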