Webscrape syntax help

Posted: Tue Feb 14, 2006 3:47 am
by ibnewbie

I'm new to the world of RSS feeders/readers, and I love Awasu and its capabilities. I liked it even more when I found out that I can get information from non-RSS-enabled websites via webscraper!

Now I'm relatively new to WebScrape and am trying to learn the syntax, but I'm having some trouble.

Say for example that I want to grab the information from a job listing website (the company's website) ... ... ategory=26

My problem occurs when I use the Webscraper Settings GUI to determine the various "Section" and "Item" patterns.

Is there an easy way to learn the syntax or how to approach this?


Posted: Wed Feb 15, 2006 2:55 am
by abwilson
Sorry, but I don't know of any easy way. You should make sure to read the WebScrape Release Notes document, which introduces regular expressions. You can also search the Web for documents about writing regular expressions; O'Reilly has an entire book devoted to the topic. Certainly try to understand how the sample .ini files delivered with WebScrape work.

Finally, pay attention to special characters and what they mean, and realize that "normal" characters represent themselves. The key point is that if a web page contains a double-quote character (") that you need to match, you must include it explicitly; a single-quote character (') will not do just because in other contexts the two can mean roughly the same thing.
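
To make the point about special versus "normal" characters concrete, here is a minimal sketch. It uses Python's re module purely as an illustration, on the assumption that the basic rules (literal characters match themselves, a dot matches any character unless escaped) carry over; WebScrape's own pattern syntax may differ in its details. The HTML fragment and the names in it are made up for the example and are not taken from the site discussed above.

```python
import re

# Hypothetical page fragment: one link uses double quotes, the other single quotes.
html = (
    '<a class="job" href="/jobs/101">Analyst</a> '
    "<a class='job' href='/jobs/102'>Clerk</a>"
)

# The double quotes in this pattern are ordinary characters, so they match
# only double quotes in the page. The parentheses, dot, star, and question
# mark are special: (.*?) captures the shortest run of any characters.
double_quoted = re.findall(r'<a class="job" href="(.*?)">(.*?)</a>', html)

# An unescaped dot matches any character; an escaped dot matches only a dot.
any_char = re.findall(r'v1.0', 'v1.0 v1x0')
literal_dot = re.findall(r'v1\.0', 'v1.0 v1x0')

print(double_quoted)  # only the double-quoted link is found
print(any_char)       # both strings match
print(literal_dot)    # only the string with a real dot matches
```

The single-quoted link is skipped entirely, which is the kind of silent mismatch that makes a pattern look "broken" when the quote characters in the pattern do not match those in the page source.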

Taka may have some words of wisdom about WebScrapeSettings.exe.

Posted: Wed Feb 15, 2006 3:49 am
by abwilson
Looking into WebScrape (as opposed to MonitorURLs) to give you the information you want, I've struck out again: the same HTTPS problem keeps WebScrape from properly retrieving the page. So, neither WebScrape nor MonitorURLs will do what you want.

Sorry I couldn't help with this one.