kevotheclone wrote: These appear no matter what "Feed Content" setting I use, even "Plain text".
It works correctly for me if you use "Full content". Maybe you had some leftover encode=... settings that were breaking things?
This is what's happening (are you sitting comfortably?)...
Awasu tags every piece of text it records as PLAIN TEXT, HTML or UNKNOWN. The feed you gave as an example uses Atom, so each piece of text can be definitively identified as TEXT or HTML (yay!). UNKNOWN is used for RSS content.
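To make the tagging idea concrete, here's a rough sketch of how a reader could classify a piece of feed text. This is not Awasu's code. `DataType` and `classify` are hypothetical names, and treating Atom's "xhtml" type as UNKNOWN is purely a simplification for this example. The one detail taken from the Atom spec (RFC 4287) is that a missing `type` attribute means "text".

```python
from enum import Enum
from typing import Optional


class DataType(Enum):
    TEXT = "TEXT"
    HTML = "HTML"
    UNKNOWN = "UNKNOWN"


def classify(feed_format: str, atom_type: Optional[str] = None) -> DataType:
    """Hypothetical classifier: tag a piece of feed text by its declared type."""
    if feed_format == "rss":
        # RSS content is tagged UNKNOWN, as described above.
        return DataType.UNKNOWN
    if feed_format == "atom":
        # RFC 4287: if no type attribute is given, behave as if it were "text".
        declared = (atom_type or "text").lower()
        if declared == "text":
            return DataType.TEXT
        if declared == "html":
            return DataType.HTML
    # Anything else (e.g. Atom "xhtml") is left UNKNOWN in this sketch.
    return DataType.UNKNOWN
```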
In the article about the Dutchman, the feed XML looks something like this:
<content type="html"> <![CDATA[ ... They’re ... ]]> </content>
so the content is recorded as HTML and keeps the 7 characters that make up the encoded character, i.e. it is not decoded down to a single character. This is the correct behavior.
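You can see the same behavior with any XML parser. Text inside a CDATA section is never entity-decoded, so the entity reaches the application as a literal 7-character sequence. The sketch below uses Python's standard library. It assumes the entity in the real feed was `&#8217;` (`&rsquo;` is also 7 characters and would behave the same way).

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical Atom fragment; assumes the original entity was &#8217;
snippet = '<content type="html"><![CDATA[ ... They&#8217;re ... ]]></content>'

elem = ET.fromstring(snippet)
content_type = elem.get("type")  # "html"
stored = elem.text               # CDATA contents, entity NOT decoded by the parser

# Only an HTML-aware step (such as a browser rendering the page) turns the
# 7-character entity into the single right-single-quote character.
decoded = html.unescape(stored)
```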
When inserting the content into a report, Awasu must consider the type of data being inserted (TEXT/HTML/UNKNOWN) and the output format (HTML/XML/etc.), and encode it accordingly. If we are using Awasu's default report, set to use "Full content", Awasu will see the data type as HTML, and since the output format is also HTML, no encoding is necessary and the content is inserted verbatim. The browser then converts the 7-character sequence into the correct character when it renders the page.
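A minimal sketch of that decision, assuming HTML output only. `encode_for_output` is a hypothetical name, not an Awasu API. Inserting UNKNOWN content verbatim is also an assumption made for this example; the post doesn't say how Awasu handles it.

```python
import html
from enum import Enum


class DataType(Enum):
    TEXT = "TEXT"
    HTML = "HTML"
    UNKNOWN = "UNKNOWN"


def encode_for_output(content: str, data_type: DataType, output_format: str) -> str:
    """Hypothetical: encode content for insertion into a report."""
    if output_format != "html":
        raise NotImplementedError("only HTML output is sketched here")
    if data_type is DataType.TEXT:
        # Plain text must be escaped so that &, < and > display literally.
        return html.escape(content, quote=False)
    # HTML (and, in this sketch, UNKNOWN) goes in verbatim; the browser
    # decodes entities such as &#8217; when it renders the page.
    return content
```

With "Full content" the data is HTML, so `They&#8217;re` passes through untouched and the browser shows the apostrophe.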
However, if you are using "Excerpt", Awasu has to create an excerpt of the content, and a side-effect of this process is that the content is always set to type TEXT (there are good reasons for doing this). So, when it comes time to insert the content into the report, data of type TEXT must be encoded so that it renders correctly in an HTML page. The & at the start of the 7-character sequence gets escaped to &amp;, so the browser displays the sequence literally instead of converting it, which is why you're seeing the encoded string.
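Here's a rough sketch of how excerpting leads to the visible entity. It is not Awasu's implementation. `make_excerpt` is a hypothetical name, and the regex-based tag stripping and 200-character limit are simplifications chosen for the example.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher, good enough for a sketch


def make_excerpt(content_html: str, max_chars: int = 200) -> tuple:
    """Hypothetical excerpting step: strip tags, truncate, retag as TEXT."""
    text = TAG_RE.sub("", content_html)
    text = " ".join(text.split())  # collapse whitespace left by removed tags
    return text[:max_chars], "TEXT"


excerpt, data_type = make_excerpt("<p>They&#8217;re <b>here</b>.</p>")
# Tag stripping leaves the entity alone, so the "plain text" still contains
# &#8217;, and escaping it as TEXT turns & into &amp;.
rendered = html.escape(excerpt, quote=False) if data_type == "TEXT" else excerpt
```

The browser renders `They&amp;#8217;re here.` as `They&#8217;re here.`, which matches the encoded string you're seeing.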
So... it seems that the underlying problem is that content gets set to type TEXT when it is excerpted: Awasu strips out HTML tags as part of this process but doesn't decode SGML entities. "Fixing" this is hairy, and I'm not sure it wouldn't cause more problems than it solves...
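For illustration, here's what decoding entities during excerpting might look like. This is a hypothetical sketch, not a proposed patch to Awasu, and it doesn't capture whatever makes the real fix hairy. It does show one pitfall such a change has to respect: entities must be decoded *after* tags are stripped. Otherwise literal text like `&lt;b&gt;` would become a tag and be stripped away.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher, as in a simple excerpter


def make_excerpt_decoded(content_html: str, max_chars: int = 200) -> str:
    """Hypothetical variant: strip tags, then decode entities, then truncate."""
    text = TAG_RE.sub("", content_html)
    # Decode only after stripping: &lt;b&gt; in the source is literal text,
    # and decoding it first would turn it into a tag that then gets removed.
    text = html.unescape(text)
    text = " ".join(text.split())
    # Truncate last so an entity is never cut in half.
    return text[:max_chars]
```

Because the result is genuine plain text, escaping it as TEXT for an HTML report then produces the right output: the apostrophe displays correctly and `<b>` displays literally.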