Why Awasu duplicates articles?

Posted: Fri Sep 01, 2017 9:28 am
by awasu.user
When I generate channels, I often get articles with the same title, URL and publishing date. How is
this possible in the same channel? For example, I have an article like this:

"New King speech", http:\\, published: 12:08.
Sometimes I get the same article (i.e. the same URL), but with a different date. Why does this happen?

Re: Why Awasu duplicates articles?

Posted: Fri Sep 01, 2017 9:44 am
by support
The item title, URL and published time are all provided by the feed publisher, and are therefore of variable quality :| I've seen feeds where every item has the same URL, or where no item has either a title or a URL! So it's possible that the channel really is publishing items with the same title and/or URL and/or timestamp.

However, what's probably happening here is "item revisions". These happen when Awasu receives an item, the publisher then changes something (e.g. fixes a typo in the content), and Awasu receives the new version. By default, Awasu only shows the most recent revision, but you can configure it to show all received versions ("Track revised items" in the Advanced tab of the channel's Properties dialog). Awasu does delete these old revisions eventually, but there is a window of time during which an item has multiple revisions stored in the archive database, which might be what you're seeing here.
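To make the revision behaviour concrete, here is a hypothetical sketch of how a feed reader *could* store revisions per item, show only the latest one by default, optionally show all of them, and purge old ones later. It is not Awasu's actual code: the `Item` and `Channel` classes, the `track_revisions` flag and the `max_age` purge rule are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical, minimal item; real feed items carry many more fields.
    item_id: str       # whatever identifies the item (e.g. its GUID)
    title: str
    content: str
    received_at: int   # when this version was received, in seconds


@dataclass
class Channel:
    # Stored revisions for each item ID, oldest first.
    revisions: dict[str, list[Item]] = field(default_factory=dict)

    def receive(self, item: Item) -> None:
        history = self.revisions.setdefault(item.item_id, [])
        # Only store a new revision if something actually changed.
        if history and (history[-1].title, history[-1].content) == (item.title, item.content):
            return
        history.append(item)

    def visible_items(self, track_revisions: bool = False) -> list[Item]:
        if track_revisions:
            # Like "Track revised items": show every stored revision.
            return [rev for history in self.revisions.values() for rev in history]
        # Default: show only the most recent revision of each item.
        return [history[-1] for history in self.revisions.values()]

    def purge_old_revisions(self, now: int, max_age: int) -> None:
        # Drop superseded revisions older than max_age; always keep the latest.
        for history in self.revisions.values():
            latest = history[-1]
            kept = [rev for rev in history[:-1] if now - rev.received_at <= max_age]
            history[:] = kept + [latest]
```

Until `purge_old_revisions` runs, an item that was edited once has two stored revisions, so a view that shows all revisions lists it twice, with the same URL but different timestamps.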

Re: Why Awasu duplicates articles?

Posted: Fri Sep 01, 2017 9:55 am
by support
Also, items can have a GUID or ID that is supposed to uniquely identify them, which makes detecting revisions easier. But again, the publishing software and/or the author don't always get this right, e.g. an item is slightly modified but issued a new ID, which makes Awasu think that it's a brand new item.

Or, it's common for blogging software to create a new URL when the author changes a blog post. If the RSS feed doesn't give each item a unique ID, Awasu uses the item's URL to identify it instead, and so it thinks that it's received a new item.
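The identification rule described above (use the GUID/ID if there is one, otherwise the URL) can be sketched roughly as follows. This is a hypothetical illustration, not Awasu's implementation: the dict-based items, the `identity_key` and `is_new_item` names, and the title-based last resort are assumptions.

```python
def identity_key(item: dict[str, str]) -> str:
    """Decide which field identifies an item (hypothetical policy)."""
    guid = item.get("guid", "").strip()
    if guid:
        # Preferred: the publisher's own unique ID.
        return "guid:" + guid
    url = item.get("url", "").strip()
    if url:
        # Fallback: the item's URL, which may change when a post is edited.
        return "url:" + url
    # Hypothetical last resort when the feed gives neither.
    return "title:" + item.get("title", "").strip()


def is_new_item(seen: set[str], item: dict[str, str]) -> bool:
    """Return True (and remember the key) if this item hasn't been seen before."""
    key = identity_key(item)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With no GUID, an edited blog post whose URL changed from `.../my-post` to `.../my-post-updated` yields a different key, so it is treated as a brand new item. With a stable GUID, the same post at a new URL is recognised as the same item, while a GUID that gets reissued on edit produces a "new" item again.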

Note that sometimes things you can't see cause an item to be flagged as revised. For example, some feeds include the number of comments each item has (which isn't shown in the UI by default), so every time this number changes, Awasu decides that it has received a new revision of the item (because something's changed), but you can't see anything different when you open the channel.
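One way to see why an invisible field can trigger a revision: if change detection compares a fingerprint of *all* the fields the feed supplied, a change in a hidden field (here a hypothetical `comments` count) alters the fingerprint even though the visible fields are identical. This is an illustrative sketch under that assumption; how Awasu actually compares items isn't described in this thread, and the function names and field list are made up.

```python
import hashlib
import json


def fingerprint(item: dict[str, object]) -> str:
    # Hash every field the feed supplied, including ones the UI never shows.
    canonical = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def visible_fingerprint(
    item: dict[str, object],
    visible_fields: tuple[str, ...] = ("title", "url", "content"),
) -> str:
    # Hash only the fields a reader would actually see (hypothetical list).
    return fingerprint({name: item.get(name) for name in visible_fields})
```

If `before` has `"comments": 3` and `after` is identical except for `"comments": 4`, then `fingerprint(before) != fingerprint(after)` (so a new revision is recorded), while `visible_fingerprint(before) == visible_fingerprint(after)` (so nothing looks different when you open the channel).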