We currently use "Check&Get" by ActiveURLs to monitor websites for changes. Specifically, we watch US Congressional committee websites for additions, changes, and cancellations to hearing and meeting schedules, including changes to time, date, title, location, witnesses, etc. We do the same monitoring for many "think tank" event schedules in the DC area. The nice thing about this program is that it automatically emails us the changes, with the page embedded in the email and all changes highlighted in yellow.
The problem now is that the pages are so dynamic that whenever an embedded Twitter feed updates, the program sends us a "page updated" email, when all we want are the updates to the schedules/calendars. Other times the embedded page in the email is just gobbledygook.
We also tried creating RSS feeds for some of these pages using Feedity, since it lets you visually select the section you want updates on. The problem there is that it just sends us a link to the full webpage (no highlights), so we don't know what changed without comparing the page ourselves, which is time-consuming.
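A minimal sketch of the behavior being asked for, in Python with only the standard library: compare just one section of the page (so a Twitter widget elsewhere on the page is ignored) and report the differences with additions highlighted in yellow and removals struck through. It assumes, hypothetically, that the schedule sits inside an element with a known `id` (here `"hearings"`, a made-up name), that the HTML is reasonably well-formed, and that the schedule is present in the raw HTML rather than filled in later by JavaScript. Fetching the pages and sending the email are left out.

```python
import difflib
import html
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Collect the text inside the first element whose id equals target_id.

    Assumes well-formed HTML: unclosed tags inside the section would keep
    the depth counter from returning to zero.
    """

    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.done = False   # set once the target element has closed
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.lines.append(text)


def section_lines(page_html, target_id):
    parser = SectionText(target_id)
    parser.feed(page_html)
    parser.close()
    return parser.lines


def highlighted_changes(old_html, new_html, target_id):
    """Return an HTML snippet of the section with added lines highlighted
    yellow and removed lines struck through, or None if it did not change."""
    old = section_lines(old_html, target_id)
    new = section_lines(new_html, target_id)
    if old == new:
        return None  # changes elsewhere on the page (e.g. tweets) are ignored
    out = []
    for line in difflib.ndiff(old, new):
        tag, text = line[:2], html.escape(line[2:])
        if tag == "+ ":
            out.append(f'<p style="background:yellow">{text}</p>')
        elif tag == "- ":
            out.append(f"<p><del>{text}</del></p>")
        elif tag == "  ":
            out.append(f"<p>{text}</p>")
        # "? " hint lines produced by ndiff are skipped
    return "\n".join(out)
```

Calling `highlighted_changes(old_page, new_page, "hearings")` on two snapshots that differ only in the tweet widget returns `None`, so no email would be triggered; a changed hearing date comes back as the old line struck through followed by the new line highlighted. Scoping the comparison to one element is the same idea as Feedity's visual section picker, while the diff supplies the highlighting that Check&Get provides.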
Here are just two examples of sites that are no longer working in Check&Get:
Senate Armed Services Committee schedule -
New America Foundation schedule -
Since our non-techies will need to be able to add websites on the fly, we need something user-friendly for non-digit-heads; regular expressions, for example, can be difficult for them.