awasu.user wrote: ↑
Sun Jan 12, 2020 9:43 pm
I check out old Awasu API (Python 2) based library and I think about create this async based. Awasu limiting performance when using is API. Pending can block get data from Awasu. It is not desing to use API and update channels simultinously. I have to walk around.
Async IO isn't some magic pixie dust you sprinkle on things and everything suddenly runs faster.
You have to understand the nature of the problem, and yes, sometimes async IO will make things faster, but sometimes it will make things worse, and unfortunately, in this case, I think it will make things worse.
The underlying problem is that your computer is overloaded. If Awasu is running slow when you are issuing one API request at a time, changing your program to issue 100 requests simultaneously is not going to make things better, it's going to make things worse. You're asking an already-overloaded server to do even more work.
Awasu updating channels and handling API requests at the same time is fine, but you have so many channels, so much content coming in, it's a lot of work to handle it all, which means less time to do other stuff, which is why you get slow response times to API requests. If you disconnect from the internet, Awasu will notice that and stop updating channels - how does your webapp perform then (i.e. when it's only processing API requests)? Or set up a clean installation, as I described earlier, and only have 5 or 10 channels? Things are probably OK, right? So, it's only an issue when Awasu is working hard on other stuff. The code is pretty efficient, so it will use all your CPU, all your network bandwidth, all your disk bandwidth during updates, which means that there's not much left for anything else.
When I build large systems around Awasu for clients, I generally have one (or more) machines running Awasu, which save the incoming data in a database running on another machine, the search engine is on another machine, and then a web server for the front end is on yet another machine, i.e. the load is spread over more hardware. But this kind of architecture is not suitable for you, so we need to find ways to make things run better, even with everything on a single machine.
The first thing to do is make better use of the API. Making a single API call to get the configuration for 500 channels is going to be significantly faster than making 500 API calls to get each channel, one at a time. Check the documentation, and you'll see that most of the API allows you to do this kind of thing.
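To give you a rough idea, here's a minimal sketch of batching. Note that call_api is a stand-in for whatever function your library uses to send a request to Awasu, and the "channels/get" endpoint name, the comma-separated "id" parameter, and the dict-shaped response are placeholders I've made up for illustration, so check the documentation for the real names and formats.

```python
def get_configs_batched(call_api, channel_ids, batch_size=100):
    """Fetch channel configurations in a few large requests.

    call_api(endpoint, params) is a hypothetical wrapper around the API.
    It is assumed to accept a comma-separated list of channel IDs and
    to return a dict that maps each channel ID to its configuration.
    """
    configs = {}
    for start in range(0, len(channel_ids), batch_size):
        batch = channel_ids[start:start + batch_size]
        ids = ",".join(str(cid) for cid in batch)
        # One round trip for the whole batch instead of one per channel.
        configs.update(call_api("channels/get", {"id": ids}))
    return configs
```

With 500 channels and a batch size of 100, that's 5 requests instead of 500. The batch size is just a knob to stop any single request from getting too big.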
Another important technique for speeding programs up is to cache things. For example, if you issue a search request, then 5 minutes later issue the same search request, if Awasu hasn't updated its search index in the meantime, you will get exactly the same results back. IOW, you can just re-use the previous search results, because they're going to be the same. There isn't a way in the API to check if Awasu has updated its search index, but you can look for an "Updating the search index: completed OK" message in the Activity Log (NOTE: This will only work if the user has configured their search engine to "restrict search index updates when channels are not updating", which is the default, but to cater for users who have changed this setting, you could also set a hard limit of, say, 5 minutes, after which you issue a new search request regardless).
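Something like this would do it (just a sketch, not tested against a real Awasu). run_search is a hypothetical function that sends the query to Awasu and returns the results, and it's up to you to call index_updated() when you spot that message in the Activity Log. The 300-second limit is the hard cut-off mentioned above.

```python
import time


class SearchCache:
    """Re-use search results until the index changes or they get too old."""

    def __init__(self, run_search, max_age=300, clock=time.monotonic):
        self._run_search = run_search  # hypothetical: query -> results
        self._max_age = max_age        # hard limit, in seconds
        self._clock = clock            # injectable, which makes testing easier
        self._entries = {}             # query -> (time_fetched, results)

    def index_updated(self):
        """Call this when the search index has been rebuilt."""
        self._entries.clear()

    def search(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # cache hit, no API call needed
        results = self._run_search(query)
        self._entries[query] = (now, results)
        return results
```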
Or if you want to get the latest content for a channel, check its "last updated" or "last new item" time. If it hasn't updated, or received a new item, since the last time you checked, you don't need to issue an API request to get its content, since it can't possibly be different to what you got last time.
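A sketch of that idea is below. Both get_last_updated and get_content are hypothetical wrappers around your API library (I'm not claiming these are real API calls), and the assumption is that getting the timestamp is much cheaper than getting the content, ideally because you fetch the timestamps for all channels in one request.

```python
class ChannelContentCache:
    """Only re-fetch a channel's content if its timestamp has changed."""

    def __init__(self, get_last_updated, get_content):
        # Hypothetical callables supplied by the caller:
        #   get_last_updated(channel_id) -> timestamp (cheap)
        #   get_content(channel_id)      -> content   (expensive)
        self._get_last_updated = get_last_updated
        self._get_content = get_content
        self._cache = {}  # channel_id -> (timestamp, content)

    def content(self, channel_id):
        stamp = self._get_last_updated(channel_id)
        cached = self._cache.get(channel_id)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # channel hasn't changed, re-use what we have
        content = self._get_content(channel_id)
        self._cache[channel_id] = (stamp, content)
        return content
```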
Another thing you can do is to recognize that over 99% of the time, your webapp isn't doing anything. The user clicks a button, the webapp takes a few seconds to talk to Awasu and show the results, then the user spends the next 5 minutes looking through those results, and your webapp is just sitting there, not doing anything. If you can predict what the user might need in the near future, you can start issuing requests in the background, and in this case, it doesn't matter if they run a bit slowly, because the user is not sitting there waiting for the response.
So how can you predict what the user will need? Here's one example: if you know that you will be issuing requests to get the latest content for all the channels, you could have a background thread that slowly loops through all the channels, getting the content for each channel and saving it in a cache. Then, when the webapp wants to get the content for a channel, it just gets it from the cache. Yes, what's in the cache will be a little old, but is that a good trade-off for being able to instantly get the latest content for a channel? Only you can decide that, but I would say "probably". And remember, it's only an issue if the channel has updated, and received new content, since you last refreshed your cache; if not, it doesn't matter if the cached content is hours old, it's still going to be the same as what Awasu returns to you if you made a new API call. Or, you might decide to do this for only "important" channels e.g. search channels.
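The background thread could look something like this. It's a minimal sketch using Python's standard threading module; get_content is again a hypothetical function that fetches one channel's content from Awasu, and the pause between requests is there so that the thread doesn't add much load to a machine that's already busy.

```python
import itertools
import threading


class BackgroundRefresher:
    """Slowly loop over the channels, keeping a cache of their content."""

    def __init__(self, get_content, channel_ids, pause=1.0):
        self._get_content = get_content  # hypothetical: channel_id -> content
        self._channel_ids = list(channel_ids)
        self._pause = pause              # seconds to wait between requests
        self._cache = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        for channel_id in itertools.cycle(self._channel_ids):
            if self._stop.is_set():
                break
            try:
                content = self._get_content(channel_id)
            except Exception:
                pass  # request failed, keep whatever is already cached
            else:
                with self._lock:
                    self._cache[channel_id] = content
            # Sleep between requests, but wake up early if stop() is called.
            self._stop.wait(self._pause)

    def get(self, channel_id):
        """Return the cached content, or None if it hasn't been fetched yet."""
        with self._lock:
            return self._cache.get(channel_id)
```

The webapp calls get() and gets an answer immediately; if it comes back as None, you can fall back to a normal API call. For "important" channels only, just pass in a shorter list of channel IDs.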
Similarly, if you're issuing a search request, it's pointless to do that if Awasu hasn't updated the search index since the last time, since the results must be the same as last time.
So, in summary:
- make better use of the API
- don't do work unless you have to
- if you have to do work, think about how you can cache the results so you don't have to do it the next time
It's about making the code smarter (e.g. realizing that if a channel hasn't updated since the last time I checked, it can't possibly have received any new content, so the content must be the same as the last time I checked), and making trade-offs (e.g. accepting slightly less-fresh results in exchange for instant responses to queries).
However, note that you shouldn't go crazy implementing all the things I talked about here straight away. Get your new version working correctly with small data sets first, then try it on your full Awasu, and only then optimize the bits that need optimizing. Some of the techniques I talked about are a bit advanced, but it should give you an idea of the kind of things you can do to speed things up.