Webscrape for Astronomy Pic of the Day

Posted: Fri Oct 29, 2004 6:01 pm
by kyodylee
Astronomy Pic of the Day ... http://antwrp.gsfc.nasa.gov/apod/ ... is just about my favorite RSS feed, and now it looks like the feed is no longer being maintained by its third-party source (the site itself does not have a feed) ... http://services.perceive.net/xml/apod_rss10.xml. :(

I tried to use Webscrape to scrape my own feed, but I'm not a programmer and I don't know how to write a script for the page. And since MyRSS no longer works either, well ... no more Pic of the Day. :cry:

I know how busy everyone is, but if there is someone out there who also likes this site and would be willing to write a script for it to use in Webscrape, I would be eternally happy and grateful. :please:

Thanks all. :D

Re: Webscrape for Astronomy Pic of the Day

Posted: Sat Oct 30, 2004 5:06 am
by support
kyodylee wrote: Astronomy Pic of the Day ... http://antwrp.gsfc.nasa.gov/apod/ ... is just about my favorite RSS feed, and now it looks like the feed is no longer being maintained by its third-party source


The WebScrape plugin was written by Allan Wilson and if the thought of these pics as a feed doesn't get him drooling, I don't know what will :-)

It's a shame about myrss.com but I guess they're right, there really is a business opportunity in providing scraped feeds :shock:

Posted: Sat Oct 30, 2004 6:00 pm
by abwilson
You know me too well! :)

As a matter of fact, once I saw kyodylee's post yesterday, I had a look and fortunately it was easy. However, the page happens to contain a relative URL that spans multiple lines -- which WebScrape as published didn't handle. So, I have updated WebScrape.zip and Taka should be able to make it available to everyone on awasu.com.

I also included my AstronomyPoD.ini in the new WebScrape.zip, but for immediate info, here it is:

[ChannelParameters]
URL=http://antwrp.gsfc.nasa.gov/apod
BaseURL=http://antwrp.gsfc.nasa.gov/apod/
Title=Astronomy Picture of the Day
Description=Photograph retrieved from Website
MaxItems=1
Shorthand=
SectionPattern=
ItemPattern-1=(?P<D><center>\s+<h1>.*?
ItemPattern-2=<center>\s*<b> (?P<T>.*?) </b> <br>.*?
ItemPattern-3=(?P<L>)<p> <hr>)
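For anyone puzzling over the three ItemPattern lines: they only balance when read as one expression, since the (?P<D>...) group opens on the first line and closes on the third. The Python sketch below illustrates that reading. It assumes WebScrape joins the lines in order and lets "." match across newlines, and that D, T and L stand for description, title and link; none of that is confirmed in this thread, and the sample HTML is made up to resemble the page rather than copied from it.

```python
import re

# The three ItemPattern-N values, concatenated in order.
# Assumption: WebScrape joins them into a single regular expression.
item_pattern = (
    r"(?P<D><center>\s+<h1>.*?"
    r"<center>\s*<b> (?P<T>.*?) </b> <br>.*?"
    r"(?P<L>)<p> <hr>)"
)

# Hypothetical markup shaped like the page; not the real page source.
sample_html = """<html><body>
<center>
<h1> Astronomy Picture of the Day </h1>
<a href="image/example.jpg"><img src="image/example_small.jpg"></a>
</center>
<center>
<b> An Example Title </b> <br>
<b> Credit: </b> Someone
</center>
<p> Explanation text goes here.
<p> <hr>
<center> footer </center>
</body></html>"""

# re.DOTALL lets ".*?" run across line breaks (assumed to mirror WebScrape).
match = re.search(item_pattern, sample_html, re.DOTALL)

title = match.group("T")        # text between <b> and </b>
description = match.group("D")  # everything from the first <center> to <p> <hr>
link = match.group("L")         # empty group: this pattern captures no link

print("Title:", title)
print("Description length:", len(description))
print("Link is empty:", link == "")
```

The empty (?P<L>) group appears to be a way of declaring a link field without capturing anything from the page, which fits a feed that only ever has a single item.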


Let me know how you like it!

Posted: Sun Oct 31, 2004 12:27 am
by Guest
Allan - YIPPEE!!!! :bowdown: Thank You! Thank You! Thank You! :D

The feed is working! It actually provides more information than the original feed. However, the picture itself isn't downloading, just a placeholder for the picture. I can click on the picture name, though, and it takes me to the website for the picture. Will the new webscrape.zip correct this? Or is my 'puter not doing what it's supposed to do?

Thanks again for such a speedy reply. I think MyRSS has it right. I would definitely pay for a program that did this automatically since I don't know how to write these scripts!

p.s. Allan, check your tip jar! :)

arrg - I wasn't signed in, but it's me kyodylee. :oops:

Posted: Sun Oct 31, 2004 12:55 am
by abwilson
Great -- glad you like it!

Yes, the updated WebScrape.exe (in the new WebScrape.zip) will indeed fix the image display problem you're having. By the way, thanks for letting me know about a site that happened to reveal a limitation in the way I was handling relative URLs; the new version fixes the problem (and allows your new "feed" to display properly).
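As a rough illustration of the kind of fix involved (a sketch only, not WebScrape's actual code), the Python below resolves relative href/src values against a base URL and tolerates a line break inside the attribute value. The function name absolutize_links and the sample markup are hypothetical; urljoin is the standard-library call.

```python
import re
from urllib.parse import urljoin

# Same value as BaseURL in the .ini; note the trailing slash.
BASE_URL = "http://antwrp.gsfc.nasa.gov/apod/"

def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite href/src attribute values as absolute URLs.

    Hypothetical helper: line breaks and tabs inside the value are
    removed before the URL is resolved against base_url.
    """
    def fix(m: re.Match) -> str:
        attr, quote, value = m.group(1), m.group(2), m.group(3)
        cleaned = re.sub(r"[\t\r\n]+", "", value).strip()
        return f"{attr}={quote}{urljoin(base_url, cleaned)}{quote}"

    # re.DOTALL allows the quoted value to span more than one line.
    return re.sub(
        r'(href|src)\s*=\s*(["\'])(.*?)\2',
        fix,
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )

# Made-up markup with a relative URL broken across two lines.
sample = '<a href="image/0410/\nexample.jpg">'
print(absolutize_links(sample, BASE_URL))
```

One detail worth knowing: urljoin treats a base without a trailing slash as a file name and replaces its last segment, so joining against "http://antwrp.gsfc.nasa.gov/apod" would drop "apod" from the result. That may be why the .ini carries a separate BaseURL ending in "/".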

Also, as Taka expected, I think it's a great page to WebScrape. :D

Allan

Posted: Sun Oct 31, 2004 3:20 am
by support
Um, I was just going to post to let you guys know the new WebScrape is up in the downloads section, but it looks like you're already sorted :-)

Posted: Sun Oct 31, 2004 5:34 am
by Guest
Thanks again both Allan and Taka. The updated version works like a charm! Picture perfect! I really can't thank you enough. :D

And of course I had no way of knowing that a little selfish request on my part would actually help reveal a limitation of the program, so it's also very nice to know that some greater good was also accomplished!

Cheers mates! ;)

Posted: Sun Oct 31, 2004 5:47 am
by kyodylee
I know it's Halloween (and slightly OT) but ... apparently somehow www.awasu.com got put into IE's restricted site list ... I have no idea how, I didn't do it! ... and that is what was causing me not to be able to post under my login name. Then when I fixed the restricted site list, I couldn't edit my own post above! Ghosts and goblins at play with my computer! :twisted: :wink:

Posted: Mon Nov 01, 2004 2:06 am
by abwilson
I'm also pleased everything seems to be working well with the Astronomy Picture of the Day scraping. You have your "feed" back and we nailed a bug in the process.

Thanks

Allan

Posted: Tue Nov 02, 2004 4:55 am
by kyodylee
Well, everything was going just fine until today when the Astronomy PoD feed needed to update for the first time and failed to do so. :(

When I tried to reinstall the feed, thinking that might fix the problem, webscrapesettings.exe is giving me the following error message: download failed:403 forbidden. I get the error message on any of the .ini files included with the plug-in that I try to install.

:help:

Posted: Tue Nov 02, 2004 5:04 am
by abwilson
While it won't make you feel any better, it's still working fine for me. Didn't you mention some strange goings-on with your system and restricted sites?

403 (Forbidden) means the request was understood but refused; basically, you are not being allowed access.
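For reference, a small Python sketch that turns a status code into a readable summary using the standard library's HTTPStatus table. The describe_status helper and its wording are made up for illustration and have nothing to do with how WebScrape reports errors.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a short human-readable summary of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not in the standard table.
        phrase = "Unknown"
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "request refused or invalid"
    elif 500 <= code < 600:
        kind = "server-side failure"
    else:
        kind = "other"
    return f"{code} {phrase}: {kind}"

print(describe_status(403))
```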

Taka, any ideas or things to try?

Allan

Posted: Tue Nov 02, 2004 8:25 am
by kyodylee
Ok, I got the feed working again. :D

The problem was with Zone Alarm not letting webscrape.exe have access to the internet. But I had allowed webscrape.exe access when I initially installed the feed, and the feed was working, so I'm not sure why or how the setting got changed. :?

Well, now I know to check ZA first if I have another problem. :)

Posted: Tue Nov 02, 2004 4:26 pm
by abwilson
Well, congratulations on getting things working again.