Major Leaguer
Join Date: Apr 2002
Posts: 463
wget 1.8.1 "-D" or "--no-parent" does not always work...
Hello,
The way I understand it, the -D and --no-parent options are used to limit downloading to a specified host domain or directory tree. Well, I tried wget 1.8.1, and despite using these options, wget downloaded some files from areas it was not supposed to go. For example:

  wget -D www.justthisdomain.com
  or
  wget -D www.justthisdomain.com --no-parent http://www.justthisdomain.com/notthisdirectory/onlythisdirectorytree.html

(other options omitted in the examples above)

I believe wget was "fooled" because during recursion it encountered a URL like this:

  http://www.justthisdomain.com/notthisdirectory/okfoolyouareinthisdirectorytree/cgi-bin/nowyoumustdownloadsomethingfromanotherdomain.html?http://www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html

Well, you get the idea... Of course, despite the -D restriction, www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html was downloaded. Blech!

I came up with a workaround: exclude any directory named cgi-bin from the download. But it's not really a satisfactory solution, because you can't predict this scenario in advance. I only found the problem after the download had started.

So, shouldn't wget have prevented downloading outside of www.justthisdomain.com regardless? Is this a wget bug? If so, how do I report it? Or am I doing something incorrectly? What should I do?

Thanks,
I am: NOT REALLY PEEVED RIGHT NOW, JUST VERY IMPRESSED WITH HOW COOL wget IS, AND A LITTLE CONFUSED ABOUT USING IT
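To see why the cgi-bin URL is tricky, here is a minimal Python sketch of a scope filter that roughly models what -D and --no-parent are meant to do. It is NOT wget's actual code. The function name `in_scope` and its suffix-matching rule are hypothetical simplifications. It shows that the cgi-bin URL itself is legitimately in scope: everything after the "?" is just the query string, so the host is still www.justthisdomain.com and the path is still under /notthisdirectory/. The other domain's file would fail the check if it were tested directly. So it may have been reached some other way. One possibility, an assumption not confirmed in this thread, is that the script answered with an HTTP redirect that wget followed without re-checking the domain.

```python
from urllib.parse import urlsplit
import posixpath


def in_scope(url, start_url, domains, no_parent=True):
    """Hypothetical filter approximating the intent of -D and --no-parent.

    A simplified model for illustration only, not wget's implementation.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # -D style check (assumed rule): host equals an accepted domain
    # or is a subdomain of one.
    accepted = [d.lower() for d in domains]
    if not any(host == d or host.endswith("." + d) for d in accepted):
        return False

    if no_parent:
        # --no-parent style check: stay at or below the start URL's directory.
        start_dir = posixpath.dirname(urlsplit(start_url).path)
        if not start_dir.endswith("/"):
            start_dir += "/"
        if not parts.path.startswith(start_dir):
            return False

    return True


START = "http://www.justthisdomain.com/notthisdirectory/onlythisdirectorytree.html"
TRICKY = ("http://www.justthisdomain.com/notthisdirectory/okfoolyouareinthisdirectorytree"
          "/cgi-bin/nowyoumustdownloadsomethingfromanotherdomain.html"
          "?http://www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html")
OTHER = "http://www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html"
DOMAINS = ["www.justthisdomain.com"]

if __name__ == "__main__":
    # The embedded URL lives only in the query string, not in the host.
    print(urlsplit(TRICKY).hostname)
    print(urlsplit(TRICKY).query)
    print(in_scope(TRICKY, START, DOMAINS))  # True: same host, below start dir
    print(in_scope(OTHER, START, DOMAINS))   # False: different host
```

Under this model, a filter that checks only the links it finds in pages cannot be "fooled" by the query string alone. A robust crawler would also need to re-apply the same check to every redirect target.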