
Re: Any Python programmer want collaboration on Awasu?

Posted: Mon Jan 06, 2020 3:31 am
by support
awasu.user wrote:
Sun Jan 05, 2020 10:26 pm
I'm wondering whether, at this stage, it would be better to rewrite the whole project from scratch or to update the existing structure. What do you think?
There is an idea in software development that you "build one to throw away", that is, you build a quick prototype to understand the problem, explore some ideas, then you throw it away and write it "properly".

You've built this to help with your job, so if it works, then it doesn't make any sense from a business perspective to spend time re-writing it. It does what you need it to, so it doesn't matter what it looks like inside.

However, this is also a learning project for you, to learn Python and all that other stuff, and because you're learning new things so fast and getting better so quickly, it can get very frustrating working with code that you wrote even a month ago, because you now know how to do it better :-) So, for the purpose of learning and getting better, it can be useful to write a new, "proper" version. You just need to be careful not to spend too much time continually throwing things out and rewriting everything, instead of adding new features and fixing bugs.

Short answer: up to you :-)
awasu.user wrote:
Sun Jan 05, 2020 10:26 pm
Could you explain the crash in more detail?
The best thing to do would be to set up a clean installation of Awasu, and then you can test your code against that (i.e. what a new user will have).
* Copy your Awasu installation directory (e.g. C:\Program Files\Awasu) somewhere (e.g. c:\awasu-portable).
* In the new directory, create a file called _PORTABLE.
* Run awasu.exe (close your real Awasu first; only one copy of Awasu can run at a time).
* You now have a separate copy of Awasu running. Its data files are stored in c:\awasu-portable\users\YOUR-NAME; delete this directory to reset everything. The only things shared between the two copies of Awasu are the settings you configure in the Tools|Customize dialog.

Testing against this special Awasu will also make things easier, since you will have fewer channels, and so things will run much faster. It will also make your code better, since you will find bugs caused by files not being in their usual location.

NOTE: You don't actually need to make a copy of the Awasu installation directory. This page explains how to have multiple users, but what I described above is a bit easier.
awasu.user wrote:
Sat Jan 04, 2020 9:43 pm
Now I'm thinking about creating an API class to make calling the API easier.
There is actually already a Python module that does this:
https://github.com/awasu/awasu_api/blob ... api/api.py
It's Python 2 and a bit hard-core, but it should give you an idea of how to structure things. You could replace AwasuApi.call_api with the code I posted earlier, and then implement all the API functions using it.
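For illustration, here is a minimal Python 3 sketch of that structure: one `call_api()` method that builds the request URL and decodes the JSON, with the individual API functions as thin wrappers on top. It assumes Awasu is listening on `http://localhost:2604` (the address used later in this thread) and that endpoints accept `f=json`; the class and method names are hypothetical and not taken from the awasu_api module.

```python
import json
import urllib.parse
import urllib.request


class AwasuApi:
    """Hypothetical Python 3 wrapper; names are illustrative only."""

    def __init__(self, base_url="http://localhost:2604", timeout=30):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def build_url(self, endpoint, **params):
        # Always ask for JSON, then add any caller-supplied parameters.
        query = {"f": "json"}
        query.update(params)
        return "{}/{}?{}".format(
            self.base_url, endpoint.strip("/"), urllib.parse.urlencode(query))

    def call_api(self, endpoint, **params):
        # The single place that talks HTTP and decodes the response.
        url = self.build_url(endpoint, **params)
        with urllib.request.urlopen(url, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # Individual API functions are then thin wrappers around call_api().
    def list_channels(self, verbose=True):
        params = {"v": 1} if verbose else {}
        return self.call_api("channels/list", **params)

    def folder_tree(self):
        return self.call_api("channels/folders/tree")
```

Keeping all the HTTP details in `call_api()` means that swapping in a different client library later only touches one method.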

Re: Any Python programmer want collaboration on Awasu?

Posted: Sun Jan 12, 2020 9:43 pm
by awasu.user
Currently I'm planning a new design. I want to add some functionality, such as a skeleton for doing analytics with Awasu. In summary, I'm thinking about creating an async news-browser service based on Quart, backed by SQLAlchemy so that external databases can be supported if they're needed. For analysis I've chosen spaCy and Gensim, and for now I'm limiting languages to English. I'm waiting to see how spaCy evolves, as they plan to add extra features. For the views I plan to use Bootstrap 4 to make the UI responsive. It will take a while before the basic skeleton is ready.

I checked out the old Awasu API library (Python 2), and I'm thinking about making it async-based. Awasu limits performance when its API is in use: pending requests can block getting data from Awasu. It doesn't seem to be designed for using the API and updating channels simultaneously, so I have to work around it. I plan to run some tests to find out how many API calls Awasu can handle before it crashes, so I can set safe limits.

I would appreciate any suggestions.

The current roadmap is to replace the old design with the new one, adding handling for edge cases such as missing configuration data, and making the settings more flexible.

Re: Any Python programmer want collaboration on Awasu?

Posted: Mon Jan 13, 2020 2:32 pm
by support
awasu.user wrote:
Sun Jan 12, 2020 9:43 pm
I checked out the old Awasu API library (Python 2), and I'm thinking about making it async-based. Awasu limits performance when its API is in use: pending requests can block getting data from Awasu. It doesn't seem to be designed for using the API and updating channels simultaneously, so I have to work around it.
Async IO isn't some magic pixie dust you sprinkle on things and everything suddenly runs faster :-) You have to understand the nature of the problem, and yes, sometimes async IO will make things faster, but sometimes it will make things worse, and unfortunately, in this case, I think it will make things worse.

The underlying problem is that your computer is overloaded. If Awasu is running slow when you are issuing one API request at a time, changing your program to issue 100 requests simultaneously is not going to make things better, it's going to make things worse. You're asking an already-overloaded server to do even more work.

Awasu updating channels and handling API requests at the same time is fine, but you have so many channels, so much content coming in, it's a lot of work to handle it all, which means less time to do other stuff, which is why you get slow response times to API requests. If you disconnect from the internet, Awasu will notice that and stop updating channels - how does your webapp perform then (i.e. when it's only processing API requests)? Or set up a clean installation, as I described earlier, and only have 5 or 10 channels? Things are probably OK, right? So, it's only an issue when Awasu is working hard on other stuff. The code is pretty efficient, so it will use all your CPU, all your network bandwidth, all your disk bandwidth during updates, which means that there's not much left for anything else.

When I build large systems around Awasu for clients, I generally have one (or more) machines running Awasu, which saves the incoming data in a database running on another machine, the search engine is on another machine, and then a web server for the front end on another machine again i.e. the load is spread over more hardware. But this kind of architecture is not suitable for you, so we need to find ways to make things run better, even with everything on a single machine.

The first thing to do is make better use of the API. Making a single API call to get the configuration for 500 channels is going to be significantly faster than making 500 API calls to get each channel, one at a time. Check the documentation, and you'll see that most of the API allows you to do this kind of thing.
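A rough sketch of the difference, assuming a hypothetical `call_api(endpoint, **params)` function like the one discussed earlier, and assuming (this is not confirmed here) that `channels/list` with `v=1` returns a list of channel records that each carry an `id` field: make one request and index the results locally, instead of one round trip per channel.

```python
def load_all_channels(call_api):
    """One request for every channel, indexed by id for fast local lookups.

    Assumes (hypothetically) the response is a list of dicts with an "id" key.
    """
    channels = call_api("channels/list", v=1)
    return {str(ch["id"]): ch for ch in channels}


def load_channels_one_by_one(call_api, channel_ids):
    """The slow pattern, for comparison: one API round trip per channel."""
    return {str(cid): call_api("channels/list", id=cid, v=1)[0]
            for cid in channel_ids}
```

Both return the same data; the difference is that the first makes one request, while the second makes one request per channel.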

Another important technique for speeding programs up is to cache things. For example, if you issue a search request, then issue the same search request 5 minutes later, you will get exactly the same results back if Awasu hasn't updated its search index in the meantime. IOW, you can just re-use the previous search results, because they're going to be the same. There isn't a way in the API to check if Awasu has updated its search index, but you can look for an "Updating the search index: completed OK" message in the Activity Log (NOTE: This will only work if the user has configured their search engine to "restrict search index updates when channels are not updating", which is the default, but to cater for users who have changed this setting, you could also set a hard limit of, say, 5 minutes, after which you issue a new search request regardless).
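A minimal sketch of that logic, assuming you already have some way of getting the time of the last "Updating the search index: completed OK" entry (reading the Activity Log is left out). `run_search` is a hypothetical callable that issues the real search request, and the 5-minute hard limit is the example figure from above.

```python
import time


class SearchCache:
    """Reuse search results until the index is rebuilt or they get too old."""

    def __init__(self, run_search, max_age=300, clock=time.time):
        self.run_search = run_search  # hypothetical: issues the real API request
        self.max_age = max_age        # hard limit (seconds) for users who changed the index setting
        self.clock = clock
        self._cache = {}              # query -> (fetched_at, results)

    def search(self, query, last_index_update=None):
        now = self.clock()
        entry = self._cache.get(query)
        if entry is not None:
            fetched_at, results = entry
            index_changed = (last_index_update is not None
                             and last_index_update > fetched_at)
            too_old = now - fetched_at > self.max_age
            if not index_changed and not too_old:
                # The index hasn't changed, so neither have the results.
                return results
        results = self.run_search(query)
        self._cache[query] = (now, results)
        return results
```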

Or if you want to get the latest content for a channel, check its "last updated" or "last new item" time. If it hasn't updated, or received a new item, since the last time you checked, you don't need to issue an API request to get its content, since it can't possibly be different to what you got last time.
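The same idea as a small sketch, assuming you already have each channel's "last new item" time (e.g. from the channel list; the exact field is an assumption) and that `fetch_content` is a hypothetical function that makes the real API call for one channel:

```python
class ChannelContentCache:
    """Only re-fetch a channel's content when it has received something new."""

    def __init__(self, fetch_content):
        self.fetch_content = fetch_content  # hypothetical: API call for one channel's content
        self._cache = {}                    # channel_id -> (last_new_item, content)

    def get(self, channel_id, last_new_item):
        entry = self._cache.get(channel_id)
        if entry is not None and entry[0] == last_new_item:
            # Nothing new since we last asked, so the content must be the same.
            return entry[1]
        content = self.fetch_content(channel_id)
        self._cache[channel_id] = (last_new_item, content)
        return content
```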

Another thing you can do is to recognize that over 99% of the time, your webapp isn't doing anything. The user clicks a button, the webapp takes a few seconds to talk to Awasu and show the results, then the user spends the next 5 minutes looking through those results, and your webapp is just sitting there, not doing anything. If you can predict what the user might need in the near future, you can start issuing requests in the background, and in this case, it doesn't matter if they run a bit slowly, because the user is not sitting there waiting for the response.

So how can you predict what the user will need? Here's one example: if you know that you will be issuing requests to get the latest content for all the channels, you could have a background thread that slowly loops through all the channels, getting the content for each channel and saving it in a cache. Then, when the webapp wants to get the content for a channel, it just gets it from the cache. Yes, what's in the cache will be a little old, but is that a good trade-off for being able to instantly get the latest content for a channel? Only you can decide that, but I would say "probably". And remember, it's only an issue if the channel has updated, and received new content, since you last refreshed your cache; if not, it doesn't matter if the cached content is hours old, it's still going to be the same as what Awasu returns to you if you made a new API call. Or, you might decide to do this for only "important" channels e.g. search channels.
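A minimal threading sketch of such a background loop. `fetch_content` is a hypothetical callable that makes the real API call, and the `delay` between requests is there so the loop runs "slowly" and doesn't add much load. In an async webapp, a background task could play the same role.

```python
import threading


class BackgroundPrefetcher:
    """Slowly walk through the channels in a background thread, keeping a cache warm."""

    def __init__(self, channel_ids, fetch_content, delay=1.0):
        self.channel_ids = list(channel_ids)
        self.fetch_content = fetch_content  # hypothetical: one API call for one channel's content
        self.delay = delay                  # pause between requests, so Awasu isn't hammered
        self._cache = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def run_once(self):
        """One pass over every channel (stops early if stop() has been called)."""
        for cid in self.channel_ids:
            if self._stop.is_set():
                return
            content = self.fetch_content(cid)
            with self._lock:
                self._cache[cid] = content
            self._stop.wait(self.delay)  # sleeps, but wakes immediately on stop()

    def _loop(self):
        while not self._stop.is_set():
            self.run_once()
            self._stop.wait(self.delay)

    def start(self):
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def get(self, cid):
        """Instant answer from the cache; None if the channel hasn't been fetched yet."""
        with self._lock:
            return self._cache.get(cid)
```

The webapp only ever calls `get()`, so its response time no longer depends on how busy Awasu is.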

Similarly, if you're issuing a search request, it's pointless to do that if Awasu hasn't updated the search index since the last time, since the results must be the same as last time.

So, in summary:
- make better use of the API
- don't do work unless you have to
- if you have to do work, think about how you can cache the results so you don't have to do it the next time

It's about making the code smarter (e.g. realizing that if a channel hasn't updated since last time I checked, it can't have possibly received any new content, so the content must be the same as the last time I checked), and making trade-offs (e.g. accepting slightly less-fresh results in exchange for instant responses to queries).

However, note that you shouldn't go crazy implementing all the things I talked about here straight away. Get your new version working correctly with small data sets first, then try it on your full Awasu, and only then optimize the bits that need optimizing. Some of the techniques I talked about are a bit advanced, but it should give you an idea of the kind of things you can do to speed things up.

Re: Any Python programmer want collaboration on Awasu?

Posted: Mon Jan 13, 2020 2:42 pm
by support
To add a bit more (because I didn't want to bog down the already-long post above), you absolutely do want to make the front-end Javascript async i.e. use Ajax.

The old-style way of building a webapp is that when the user clicks on a link, the browser sends a request to the backend, which generates an HTML page, which the browser then shows. The drawback is that the browser will hang while waiting for the HTML page to come back, which is a problem if it takes some time.

The new style way that all the cool kids are doing is to use Ajax i.e. when the user clicks on a link, the browser sends a request to the backend, which sends back the result data (typically as JSON), and the frontend then uses Javascript to dynamically load those results into the page. The advantage is that the browser is still responsive while the request is in progress - you just make the HTTP request to the backend, and tell the browser: "call this Javascript function with the results, when they arrive", and in the meantime, the page continues to work.
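On the backend, this just means the handler returns data instead of a rendered page. Here is a stdlib-only WSGI sketch (in Flask or Quart the equivalent is a route that returns JSON). The endpoint, the `id` parameter and the payload shape are all hypothetical. The browser side would request this URL with Ajax (e.g. JavaScript's `fetch()`) and insert the results into the page when they arrive.

```python
import json
from urllib.parse import parse_qs


def channel_api(environ, start_response):
    """Minimal WSGI handler: return data as JSON for the page's JavaScript to render."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    channel_id = params.get("id", [""])[0]
    # Hypothetical lookup; a real app would read its cache or call Awasu here.
    data = {"id": channel_id, "items": []}
    body = json.dumps(data).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```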

Compare this with getting the backend Flask app to talk to Awasu using async IO. The problem is that when the browser calls your Flask handler, it cannot return until it has got the information it needs from Awasu. IOW, it doesn't matter if the Flask handler uses async IO to talk to Awasu, it still has to block waiting for the response data, since it needs that response data in order to return the data back to the front-end. So, you might as well just call Awasu synchronously.

Re: Any Python programmer want collaboration on Awasu?

Posted: Mon Jan 13, 2020 2:46 pm
by support
awasu.user wrote:
Sun Jan 12, 2020 9:43 pm
I plan to run some tests to find out how many API calls Awasu can handle before it crashes, so I can set safe limits.
Awasu should, of course, never crash :roll:, so if you're having problems, send the crash logs through and I'll see what I can do.

Any crashes will be because of what's happening at the time, not because of the number of API calls in progress. Indirectly yes, the more API calls you have, the more stuff will be happening, and so crashes might be more likely, but restricting the number of API calls is not really a reliable way to avoid crashes.

Re: Any Python programmer want collaboration on Awasu?

Posted: Mon Jan 13, 2020 9:45 pm
by awasu.user
One bottleneck is calling $/channel/list at startup. I'm looking for a smart way to avoid it and speed up starting and showing data. Async looks good here. As I wrote, I'm designing things to be better, not just to work. I just don't know whether there is a limit in the internal design when handling API calls. All I can say is that I observe that updating slows down getting information, and beyond a certain number of channels the idle periods get shorter, which is a real problem. In the worst case, the delay is a few minutes.

Re: Any Python programmer want collaboration on Awasu?

Posted: Tue Jan 14, 2020 1:31 am
by support
awasu.user wrote:
Mon Jan 13, 2020 9:45 pm
One bottleneck is calling $/channel/list at startup. I'm looking for a smart way to avoid it and speed up starting and showing data.
I had a quick look through the code and it looks like you're loading the folder structure from the user's config file (in get_nodes()), then loading channels by folder (in match_by_position()). It also seems to be getting channel names, again by folder, in get_channels_names().

First, you can get the full list of folders by calling $/channels/folders/tree, already organized as a hierarchical tree.

Second, you can call $/channels/list once at startup, with a verbose=1 parameter, which will return information about all channels, including which folders they're in.

As a quick test, I did this and it took about 4 seconds to run (I have ~650 channels). Calling $/channels/list 650 times, once for each channel, took about 20 minutes :-| (NOTE: These response times are too slow, I'll take a look at it).

So, you can see that making an API call is quite expensive, and you really want to minimize them, as much as possible. If you change the current startup code to make just these 2 API calls, you can reduce the startup time to a few seconds.
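A sketch of that two-call startup, again assuming a hypothetical `call_api(endpoint, **params)` function. The name of the field that says which folder a channel is in is an assumption (`folder_key`); check what the verbose channel list actually returns.

```python
from collections import defaultdict


def load_startup_data(call_api, folder_key="folder"):
    """Load everything needed at startup with exactly two API calls."""
    folders = call_api("channels/folders/tree")  # the whole folder hierarchy
    channels = call_api("channels/list", v=1)    # every channel, verbose
    by_folder = defaultdict(list)
    for ch in channels:
        # folder_key is an assumed field name, not confirmed by the API docs.
        by_folder[ch.get(folder_key)].append(ch)
    return folders, dict(by_folder)
```

This replaces the per-folder lookups in get_nodes(), match_by_position() and get_channels_names() with local grouping.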

You could make things even faster by saving what the API returns in files, then reading those files the next time you start up (which would be instantaneous). After the program has finished starting up and things have settled down, you then issue new API calls and overwrite the files, ready for next time. Again, this trades slightly less-fresh results for speed, and it's an optimization for later (if ever).
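A minimal sketch of such a file cache, assuming the API responses are JSON-serializable and that `fetch` is a hypothetical zero-argument function that makes the real API call. The write goes to a temporary file first and is then swapped into place with `os.replace()`, so a crash mid-write can't leave a half-written cache behind.

```python
import json
import os
import tempfile


def load_cached(path, fetch):
    """Return data from `path` if it exists (instant), otherwise fetch and save it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return refresh_cache(path, fetch)


def refresh_cache(path, fetch):
    """Call the API and overwrite the cache file, ready for next time."""
    data = fetch()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
    return data
```

At startup you would call `load_cached()`, and later, once things have settled down, call `refresh_cache()` in the background.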

Re: Any Python programmer want collaboration on Awasu?

Posted: Tue Jan 14, 2020 5:05 am
by support
support wrote:
Tue Jan 14, 2020 1:31 am
NOTE: These response times are too slow, I'll take a look at it.
On a freshly-restarted copy of Awasu with 646 channels, using Python to get the following URL (i.e. get all channels):
http://localhost:2604/channels/list?f=json&v=1
took 2.74s, of which 0.641s was spent inside Awasu.

Well under a second is not too bad, considering that the response is generated from a template and the resulting JSON file is almost 1MB.

To get a single channel using the following URL:
http://localhost:2604/channels/list?f=json&id=1915&v=1
took 2.03s, of which 0.004s (!) was spent inside Awasu.

Changing the Python code to use urllib.request instead of requests gave similar numbers.

Disabling parsing the received JSON didn't make much difference.

If you load the first URL into a browser, it clearly takes well under a second to load (most of which is probably rendering the result, anyway).

curl takes about 1.0s.

It looks like there's an overhead of over 1s per call in Python. I tried disconnecting from the internet and disabling my virus checker, but it made no difference.
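For anyone who wants to reproduce these numbers, here is a small timing helper (a sketch; `time_requests` is a hypothetical name, and it uses only `urllib.request` by default). One idea worth testing, offered as an assumption rather than something established in this thread: if `localhost` resolves to an IPv6 address first and the server only listens on IPv4, each new connection may pay a fallback delay. Timing `localhost` against `127.0.0.1` would rule name resolution in or out.

```python
import statistics
import time
import urllib.request


def time_requests(url, repeats=5, fetch=None):
    """Time `repeats` fetches of `url`; returns (fastest, median) in seconds."""
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as resp:
                return resp.read()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


# Example (needs Awasu running): compare the hostname with the raw IPv4 address.
# print(time_requests("http://localhost:2604/channels/list?f=json&id=1915&v=1"))
# print(time_requests("http://127.0.0.1:2604/channels/list?f=json&id=1915&v=1"))
```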

I tried downloading the awasu.com home page using Python and curl, and while Python was a bit slower, it wasn't as bad as this.

Something's wrong. Sigh... :wall:

Re: Any Python programmer want collaboration on Awasu?

Posted: Tue Jan 14, 2020 10:27 pm
by awasu.user
support wrote:
Tue Jan 14, 2020 5:05 am
On a freshly-restarted copy of Awasu with 646 channels, using Python to get the following URL (i.e. get all channels):
http://localhost:2604/channels/list?f=json&v=1
took 2.74s, of which 0.641s was spent inside Awasu.
This API call simplifies things and returns some useful data. The only downside is that it takes around one minute to get the data. On my machine (~5500 channels), this is more related to hard-drive speed. Awasu is very responsive while the database file size is up to about 3 times the drive's maximum read/write speed; above that, it starts slowing down by a factor of 2-3.
support wrote:
Tue Jan 14, 2020 5:05 am
To get a single channel using the following URL:
http://localhost:2604/channels/list?f=json&id=1915&v=1
took 2.03s, of which 0.004s (!) was spent inside Awasu.
For me, response time is the bottleneck. From localhost I get slower responses than when accessing a database on a remote machine. Latency on localhost is 1ms, so there is room for improvement. I think the API should be very responsive locally; it doesn't make sense to use it if you have to wait a few seconds to get data. Direct access and dirty hacks into Awasu's data could resolve the issue, but that's not the point. Maybe something like threads on a multicore machine? For a decade, most computers have had at least two cores as standard. A multicore approach could make the app more responsive and stop it from blocking while it's updating.

Re: Any Python programmer want collaboration on Awasu?

Posted: Tue Jan 14, 2020 10:38 pm
by awasu.user
Is it possible to get the timestamp from the template? I tried:

Code:

"published": "{%ITEM-METADATA% timestamp noCaption}",
This gives me a formatted date. I want a consistent pattern for the date.

Re: Any Python programmer want collaboration on Awasu?

Posted: Wed Jan 15, 2020 4:55 am
by support
awasu.user wrote:
Tue Jan 14, 2020 10:27 pm
Maybe something like threads on a multicore machine?
Check the thread count in Task Manager. Awasu is highly multi-threaded and you won't find an RSS reader that updates channels faster. But if it's working really hard on this stuff, there are fewer resources available for API requests. You can throttle Awasu back, but you're not going to want that, either.

5500 channels is a lot, but the database size has nothing to do with returning the channel list. The first thing I would do is look at memory usage. If your machine is swapping, that would kill performance.

How long does it take to get the channel list in a browser, or using curl/wget? Does restarting Awasu make a difference?

If you look at the numbers I was getting (0.004s spent inside Awasu, out of a total response time of 2.03s), the problem is not necessarily in Awasu.

Re: Any Python programmer want collaboration on Awasu?

Posted: Wed Jan 15, 2020 4:56 am
by support
awasu.user wrote:
Tue Jan 14, 2020 10:38 pm
I want a consistent pattern for the date.

Code:

"published": "{%ITEM-METADATA% timestamp format="..." noCaption}"
where "..." is a standard strftime-format string e.g. "%Y-%m-%d %H:%M:%S"
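On the Python side, using the same pattern with `datetime.strptime()` turns the value back into a `datetime`. This is a sketch that assumes the template uses exactly `%Y-%m-%d %H:%M:%S`; the `published` key matches the template above, and the sample value is made up. Note that this format carries no timezone, so the result is a naive `datetime`.

```python
import json
from datetime import datetime

# Must match the format="..." string used in the Awasu template.
AWASU_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_published(item):
    """Convert an item's "published" string into a datetime."""
    return datetime.strptime(item["published"], AWASU_TIMESTAMP_FORMAT)


item = json.loads('{"published": "2020-01-15 04:56:00"}')
print(parse_published(item))  # 2020-01-15 04:56:00
```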

Re: Any Python programmer want collaboration on Awasu?

Posted: Thu Jan 16, 2020 7:30 am
by awasu.user
support wrote:
Wed Jan 15, 2020 4:56 am
Code:

"published": "{%ITEM-METADATA% timestamp format="..." noCaption}"
where "..." is a standard strftime-format string e.g. "%Y-%m-%d %H:%M:%S"
Thanks a lot. That simplifies a lot of things.
support wrote:
Wed Jan 15, 2020 4:55 am
5500 channels is a lot, but the database size has nothing to do with returning the channel list
I think size is related to performance, since it involves hard-drive I/O. I see a difference between a fresh start and after a few days of use. I have a lot more RAM than Awasu can use as a 32-bit application.
support wrote:
Tue Jan 14, 2020 5:05 am
Changing the Python code to use urllib.request instead of requests gave similar numbers.
It's a response-time issue, and sometimes it's related to Awasu being busy. Even without timing it, I can see the difference between when Awasu is updating and when it isn't. I've started to resolve this issue with background tasks and async connections to the Awasu server.
support wrote:
Mon Jan 13, 2020 2:42 pm
The new style way that all the cool kids are doing is to use Ajax i.e.
My first design worked around the problem: load all the data first, wait for it, and then browse it without delay. The problem is the JS: the browser can't handle it, and it slows things down. It has to be more server-oriented, with less JS in the frontend presentation layer.
support wrote:
Mon Jan 13, 2020 2:32 pm
Async IO isn't some magic pixie dust you sprinkle on things and everything suddenly runs faster
That's not quite what I mean. I'm not thinking of taking the old design and just adding await to the calls and async to the function definitions. I'm thinking of making more calls at once, and using asyncio/multiprocessing to gather the results. That way, the worst case is the slowest single response, not the sum of all the responses. Once you have the data, you can use it without delay. I'm writing the new API this way. I've been thinking about how to support both async and sync in the same design, and I think I've found a solution.
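A minimal sketch of the difference, using `asyncio.sleep()` as a stand-in for real requests (`fake_call` and the delays are hypothetical). The speed-up only happens if the server actually processes the requests in parallel; if it handles them one at a time, the concurrent version takes about as long as the sequential one.

```python
import asyncio
import time


async def fake_call(name, delay):
    # Stand-in for an API request; real code would use an async HTTP client.
    await asyncio.sleep(delay)
    return name


async def fetch_sequential(calls):
    # Total time is roughly the SUM of all the delays.
    return [await fake_call(name, delay) for name, delay in calls]


async def fetch_concurrent(calls):
    # Total time is roughly the SLOWEST delay, if the server works in parallel.
    return await asyncio.gather(*(fake_call(name, delay) for name, delay in calls))


def fetch_concurrent_sync(calls):
    # Synchronous entry point for callers that aren't async themselves.
    return asyncio.run(fetch_concurrent(calls))
```

`asyncio.run()` is one way to expose a synchronous entry point over an async core, but it can't be called from inside an event loop that is already running.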
support wrote:
Mon Jan 13, 2020 2:32 pm
so we need to find ways to make things run better, even with everything on a single machine
My goal is a different one: to run at lightning speed on a laptop that's a few years old.
support wrote:
Mon Jan 13, 2020 2:32 pm

Another important technique for speeding programs up is to cache things
That's very good advice. I've been thinking about it, and about how to deal with the SSD: minimize disk reads/writes and use more RAM to gain speed, but at the same time not so much RAM that there's no room left for analysing the data. NLP is a very intensive task.
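One way to get an in-memory cache that can't grow without limit is `functools.lru_cache` with a `maxsize`. It discards the least-recently-used results once it's full. This assumes the expensive work is a pure function of its (hashable) input; `analyse_text` is a hypothetical placeholder for the real analysis step.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # caps how many results are kept in RAM
def analyse_text(text):
    # Hypothetical stand-in for an expensive NLP step (e.g. a spaCy pipeline).
    # Returns a tuple so cached results can't be mutated by callers.
    return tuple(sorted(set(text.lower().split())))
```

Note that `maxsize` limits the number of entries, not bytes, so choose it based on the typical size of a result. `analyse_text.cache_info()` reports hits and misses, which helps when tuning it.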
support wrote:
Mon Jan 13, 2020 2:32 pm
However, note that you shouldn't go crazy implementing all the things I talked about here straight away
You know I'm going to go even crazier than you suggest :D It's the only way to get better and improve my skills. I'll spend some time on it now, but once I finish the design it will save time in the future, and it's a good target to aim for.

Re: Any Python programmer want collaboration on Awasu?

Posted: Sat Jan 18, 2020 9:14 am
by awasu.user
I've started testing the new API for calls. I tried async calling and I'm not impressed: it looks like I'm stuck in sync mode. Is it possible to call Awasu multiple times simultaneously or not? There may be something wrong in my design too, but I want to be sure that what I'm trying to achieve is possible. My average rate for folder data is around 2.5 calls/sec.

Re: Any Python programmer want collaboration on Awasu?

Posted: Sat Jan 18, 2020 11:50 am
by support
awasu.user wrote:
Sat Jan 18, 2020 9:14 am
Is it possible to call Awasu multiple times simultaneously or not?
You can, but this is what I was talking about before. If things are slow when processing a single request, issuing ten simultaneous requests is not going to make things faster. There are certain situations where the overall experience might be better, but this is not one of them.

Awasu has a lot of internal tables that need to be accessed in a thread-safe way, and so if you have multiple requests in progress, they might start interfering with each other. Certainly, if Awasu is busy, then it becomes much worse (and Awasu is often doing work, even if it doesn't look like it's doing anything). And having a UI also makes things much worse - Awasu Server is much more responsive to API requests. But for the desktop version, the UI will always be given priority over API calls - you don't want the UI hanging because somebody is calling the API.
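If you do issue requests concurrently, one way to avoid piling even more work onto a busy Awasu is to cap how many are in flight at once. Here is a sketch using `asyncio.Semaphore`. `call` is a hypothetical async API function, and the default limit of 2 is an arbitrary example, not a measured safe value.

```python
import asyncio


async def fetch_all_bounded(call, args_list, max_in_flight=2):
    """Run `call(arg)` for every arg, but never more than `max_in_flight` at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def limited(arg):
        async with semaphore:  # waits here while max_in_flight calls are running
            return await call(arg)

    # Results come back in the same order as args_list.
    return await asyncio.gather(*(limited(arg) for arg in args_list))
```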
awasu.user wrote:
Sat Jan 18, 2020 9:14 am
My average rate for folder data is around 2.5 calls/sec.
Keep in mind that the tests I did before suggest that there is some problem with calling the API using Python. How many calls per second do you get if you use wget/curl? What version of Windows are you using? (I tested on W10, but I don't see this slowdown on my main dev box, which is still W7.)