Webscrape Problems

Posted: Wed Oct 08, 2008 11:28 am
by herkimer
URL: http://www.pressargus.com/news/
__________________________________________
MY .ini:

[ChannelParameters]
URL=http://www.pressargus.com/news/
BaseURL=
Title=Van Buren Press Argus-Courier
Description=News from the Van Buren Press Argus-Courier
MaxItems=20
Shorthand=
SectionPattern=<p>News</p>
ItemPattern-1=<a>
ItemPattern-2=<span>(?P<T>)</a>
ItemPattern-3=<span>(?P<D>)</span>
__________________________________________________
Scrape Result:
Error: Can't determine the feed type.

<HEAD>
<META>
</HEAD>
<HTML><BODY>
<P>The script caused an error:
<PRE>
Traceback (most recent call last):
File "WebScrape.py", line 249, in ?
File "WebScrape.py", line 241, in main
File "WebScrape.py", line 194, in getItems
IndexError: no such group

</PRE>
</BODY></HTML>
_______________________________________________________

I've tried several different ini configurations, always with the same result. Any help would be appreciated!

Re: Webscrape Problems

Posted: Wed Oct 08, 2008 12:12 pm
by support
What version of WebScrape are you using? I don't get this error using the 1.30a release on the wiki.

I've also noticed there's a problem with the WebScrapeSettings utility provided with the 1.30a release, so if you're using it, send me an email and I'll get a fixed-up copy out to you.
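
For reference, "IndexError: no such group" is the message Python's re module raises when code asks a match object for a group that the pattern doesn't define. The snippet below is only a sketch of that behaviour in Python 3; the pattern and the sample HTML are made up for illustration and are not taken from WebScrape.py, so whether this is what happens at line 194 of getItems is an assumption.

```python
import re

# Hypothetical pattern that defines only the named group "T" (title).
# Neither the pattern nor the HTML comes from WebScrape.py or the site.
pattern = re.compile(r"<a[^>]*>(?P<T>.*?)</a>")
match = pattern.search('<a href="/news/1">Council meets</a>')

# Works: "T" is defined in the pattern.
title = match.group("T")

# Fails: "D" is not defined in the pattern, so re raises IndexError.
try:
    match.group("D")
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(title)
print(error_message)
```

If the script works along these lines, an item pattern that is missing a named group the script expects would produce the traceback above, but that is a guess about the script's internals.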

Still no worky

Posted: Wed Oct 08, 2008 1:40 pm
by herkimer
The .ini should be in UTF-8 form, correct?

Here's what I get now:

Traceback (most recent call last):
File "WebScrape.py", line 227, in ?
File "WebScrape.py", line 137, in main
File "WebScrape.py", line 32, in getConfigParser
File "ConfigParser.pyo", line 286, in readfp
File "ConfigParser.pyo", line 462, in _read
ConfigParser.MissingSectionHeaderError: File contains no section headers.
file: C:\Program Files\Awasu\ChannelPlugins\argus.ini, line: 1
'\xef\xbb\xbf[ChannelParameters]\n'

Re: Still no worky

Posted: Wed Oct 08, 2008 1:49 pm
by support
herkimer wrote: The .ini should be in UTF-8 form, correct?

Plain ASCII is probably better, but the byte-order mark at the beginning of the file (the '\xef\xbb\xbf' shown in your traceback) is probably what's causing the problem.
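
One way to check this outside of WebScrape is to strip the byte-order mark before handing the file to the config parser. The following is a rough sketch in Python 3 (the traceback above is from Python 2's ConfigParser, so this is not the script's own code); the helper name read_ini_without_bom is made up for illustration, and the sample bytes just mimic the first line reported in the error.

```python
import codecs
import configparser


def read_ini_without_bom(raw_bytes):
    """Parse INI data given as bytes, dropping a leading UTF-8 BOM if present.

    Hypothetical helper for illustration; not part of WebScrape.
    """
    if raw_bytes.startswith(codecs.BOM_UTF8):
        raw_bytes = raw_bytes[len(codecs.BOM_UTF8):]
    # Disable interpolation so '%' characters in URLs or patterns are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names such as "URL" and "MaxItems" case-sensitive.
    parser.optionxform = str
    parser.read_string(raw_bytes.decode("utf-8"))
    return parser


# Sample data starting with the same three BOM bytes seen in the traceback.
sample = (
    b"\xef\xbb\xbf[ChannelParameters]\n"
    b"URL=http://www.pressargus.com/news/\n"
    b"MaxItems=20\n"
)
config = read_ini_without_bom(sample)
print(config.sections())
print(config.get("ChannelParameters", "MaxItems"))
```

In practice the simpler fix is probably to re-save the .ini from the editor as plain ASCII (or UTF-8 without a BOM) so the first line starts directly with [ChannelParameters].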