
30 Jan 2011, 10:56 a.m.
Henk-Jan van Tuyl wrote:
I am trying to fetch the listed pages with commands like:

    wget -m http://oldhaskell.cs.yale.edu/hmake/

but I only get the index.html files in the directories; what am I doing wrong?
The robots.txt file on the site is telling wget that downloading the site in an automated way is not allowed, so wget stops.

Normally you should never tell wget to ignore robots.txt, as you could be damaging someone's web site. However, in this case, since you are actually trying to rescue that very web site, you can do so as follows:

    wget -e robots=off -m ...

Hope this helps,
Yitz
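For reference, a sketch of what that command looks like when filled in with the URL from the original question (assuming the same -m mirror invocation Henk-Jan used):

    wget -e robots=off -m http://oldhaskell.cs.yale.edu/hmake/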