


Old 01-03-2003, 02:19 AM   #1
A Little Peaved!
wget 1.8.1 "-D" or "--no-parent" does not always work...

Hello,

As I understand it, the -D option limits downloading to the specified host domain(s), and the --no-parent option limits it to the starting directory tree.

Well, I tried wget 1.8.1 and, despite using these options, wget downloaded some files from places it was not supposed to go.

For example:

wget -D www.justthisdomain.com

or

wget -D www.justthisdomain.com --no-parent http://www.justthisdomain.com/notthisdirectory/onlythisdirectorytree.html

(Other options omitted from the examples above.)


wget was "fooled", I believe, because during recursion it encountered this kind of thing:


http://www.justthisdomain.com/notthisdirectory/okfoolyouareinthisdirectorytree/cgi-bin/nowyoumustdownloadsomethingfromanotherdomain.html?http://www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html

Well, you get the idea...

Of course, despite the -D restriction, www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html was downloaded. Blech!
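
To show what I mean by "fooled", here is a rough shell sketch. It is only my illustration, not how wget works internally, and host_of is a made-up helper name. It shows that the link itself sits on www.justthisdomain.com, while the other domain appears only after the "?". So a check that looks only at the link's host would presumably let it through. How the cgi-bin page then hands off to the second domain is a guess on my part.

```shell
#!/bin/sh
# host_of is a hypothetical helper (not part of wget):
# it prints the host part of a URL.
host_of() {
    rest=${1#*://}                # drop the scheme, e.g. "http://"
    printf '%s\n' "${rest%%/*}"   # keep everything before the first "/"
}

# The link from my example above, unchanged.
link='http://www.justthisdomain.com/notthisdirectory/okfoolyouareinthisdirectorytree/cgi-bin/nowyoumustdownloadsomethingfromanotherdomain.html?http://www.hahafooledyousucker-nowdownloadme-domain.com/gotcha.html'

# Everything after the first "?" is the URL embedded in the query string.
embedded=${link#*\?}

echo "host of the link itself:  $(host_of "$link")"
echo "host of the embedded URL: $(host_of "$embedded")"
```

The first host is the allowed one, the second is not, which matches what I saw: the restriction looked satisfied right up to the cgi-bin link.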

I came up with a workaround: exclude any directory named cgi-bin from the download...

But it's not really a satisfactory solution, because you can't predict this scenario in advance. I only found the problem after the download had started.

So, shouldn't wget have prevented downloading from outside www.justthisdomain.com regardless?

Is this a wget bug?

If so, how do I report it?

Or am I doing something incorrectly?

What should I do?



thanks,

I am:

NOT REALLY PEAVED RIGHT NOW, JUST VERY IMPRESSED WITH HOW COOL wget IS, AND A LITTLE CONFUSED ABOUT USING IT
