XML Cleaner Plugin Channel

Contributed by Tomi Junnila

This plugin uses uTidyLib to correct errors in a malformed XML feed.

Installation

  • Install Python, PyWin32, and uTidyLib.
    uTidyLib bundles an older version of the ctypes Python module. The bundled version works with Python 2.3; for Python 2.4, update ctypes to a newer version.
  • Create the two files listed below, preferably in the ChannelPlugins directory under Awasu's installation directory.
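
Before creating the channel, it can help to confirm that the modules from the first step are importable. The following is a minimal sketch, not part of the plugin: the helper name missing_modules is hypothetical, and the module names are taken from the imports in XmlCleaner.py. It assumes that a missing Tidy shared library may surface as an OSError when tidy is imported, so that case is reported as well.

```python
def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except (ImportError, OSError):
            # ImportError: the module is not installed.
            # OSError (assumption): a library the module loads on import
            # could not be found.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names used by XmlCleaner.py, plus ctypes (needed by uTidyLib).
    required = ["win32api", "ctypes", "tidy"]
    problems = missing_modules(required)
    if problems:
        print("Missing modules: " + ", ".join(problems))
    else:
        print("All required modules can be imported.")
```

Run it with the same Python interpreter that Awasu will use for the plugin; a different interpreter may have a different set of installed modules.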

XmlCleaner.plugin

[Config]
AuthorName=Tomi Junnila
AuthorEmailAddress=notlisted
PluginNotes=This plugin uses uTidylib to clean up an erroneous XML feed

' --------------------------------------------------------------------------------

[ChannelParameterDefinition-1]
Name=DownloadUrl
Type=string
DefaultValue=
Description=URL to read erroneous feed from.

XmlCleaner.py

# -*- coding: iso-8859-1 -*-

import sys
import win32api
import tidy

# Set options for uTidyLib
tidyopts = dict(output_xml=1, input_xml=1, add_xml_decl=1, indent=1, tidy_mark=0, output_encoding='utf8')

# Awasu will already have downloaded the DownloadUrl and stored it into a
# temporary file pointed to by DownloadUrlFile. Get the file name:
filename = win32api.GetProfileVal("System","DownloadUrlFile","",sys.argv[1])

# Then read the whole file in one call:
page = open(filename, 'rb')
try:
    xml = page.read()
finally:
    page.close()

# Clean up the XML and print the result:
xml = tidy.parseString(xml, **tidyopts)
print xml
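
The script looks up the DownloadUrlFile entry in the [System] section of the file named by its first command-line argument, using win32api.GetProfileVal. For testing the lookup without PyWin32, the same entry can be read with the standard library's INI parser. This is only a sketch under the assumption that the parameter file is a plain INI file that the standard parser accepts; the function name read_download_file_name is hypothetical and is not part of the plugin.

```python
import sys

try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2


def read_download_file_name(ini_path):
    """Return the [System] DownloadUrlFile entry, or '' if it is absent.

    Mirrors GetProfileVal("System", "DownloadUrlFile", "", ini_path),
    assuming ini_path is a plain INI file.
    """
    parser = RawConfigParser()  # no value interpolation, so paths stay intact
    parser.read(ini_path)
    if parser.has_option("System", "DownloadUrlFile"):
        return parser.get("System", "DownloadUrlFile")
    return ""


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the name of the temporary file that holds the downloaded feed.
    print(read_download_file_name(sys.argv[1]))
```

RawConfigParser is used rather than ConfigParser so that a percent sign in the file name is not treated as interpolation syntax.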

Usage

To use the XmlCleaner plugin, select File -> New channel, then "Generated by a channel plugin", and browse to the XmlCleaner.py file. Enter the feed URL in the plugin's DownloadUrl parameter, and you're done.