The basic problem of PyDS is that it has to do several things, some of them quite compute intensive, some of them very light. The light part should be the web interface: this is just the GUI, and nothing is worse than a slow and sluggish GUI. So we need a fast webserver, and we leave any sluggishness to the webbrowser.

PyDS has a very simple internal structure: there is the webserver (built on medusa), and then there are the tools. Every tool is in reality just a URL handler for medusa: if the first part of the URL is the tool's name, that tool handles the URL. URLs are mapped directly to methods by name, so if you see an index_html URL, you can be sure there is a corresponding method in the code. Forms post to _redir methods, which themselves return 302 redirects pointing to _html methods. So the external methods are almost always _html methods if you can see them in your browser, and _redir methods if you POST to them.

The redirection is there so that users can't accidentally resubmit a form just by pressing the reload button in their browser, which prevents accidental double transactions. The same problem arises with proxies: they sometimes send the same request more than once because the first request timed out. So the GUI is built mostly from _html methods, and the actual processing is triggered by _redir methods.

Of course those _redir methods don't do the work themselves. Instead they hand tasks over to other methods. This is where the XMLRPC and SOAP interfaces and the template language come into play: all methods that do stuff are divided into two groups, internally callable and externally callable methods. All internally callable methods start with _ in their name. All methods without a leading _ can be called from macro code or via RPC.
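
The dispatch rules above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not PyDS code: it does not use medusa, and the Tool and NoteTool classes, the dispatch() and rpc_call() helpers and the (status, headers, body) response tuples are all hypothetical. It shows the first URL segment selecting the tool, the second selecting a method by name, POSTs going only to _redir methods that answer with a 302, and underscore-prefixed methods staying unreachable from outside.

```python
# Hypothetical sketch of name-based URL dispatch as described above.
# Not medusa and not PyDS code: Tool, NoteTool, dispatch() and rpc_call()
# are invented names, and responses are plain (status, headers, body) tuples.


class Tool:
    """A tool handles every URL whose first path segment is its name."""

    name = "tool"

    def handle(self, method, parts, form):
        # parts[0] is the tool name, parts[1] the method name, e.g. "index_html".
        action = parts[1] if len(parts) > 1 else "index_html"
        # Only _html and _redir methods are reachable by URL; names with a
        # leading underscore are internal and never reachable from outside.
        if action.startswith("_") or not action.endswith(("_html", "_redir")):
            return 404, {}, "not found"
        # Forms POST to _redir methods; everything else is a plain GET.
        if action.endswith("_redir") != (method == "POST"):
            return 405, {}, "POST goes to _redir methods, GET to _html methods"
        handler = getattr(self, action, None)
        if handler is None:
            return 404, {}, "not found"
        return handler(**form)


class NoteTool(Tool):
    name = "notes"

    def __init__(self):
        self.notes = []

    def index_html(self):
        # GUI page: renders only, never changes state.
        items = "".join(f"<li>{note}</li>" for note in self.notes)
        return 200, {"Content-Type": "text/html"}, f"<ul>{items}</ul>"

    def add_redir(self, text=""):
        # Form target: delegate the work, then redirect with a 302 so that a
        # browser reload repeats the harmless GET instead of the POST.
        self.add_note(text)
        return 302, {"Location": f"/{self.name}/index_html"}, ""

    def add_note(self, text):
        # Public API method (no leading underscore): callable from macros or RPC.
        self._store(text)

    def _store(self, text):
        # Internal helper: leading underscore, so neither URL nor RPC reaches it.
        self.notes.append(text)


def dispatch(tools, method, path, form=None):
    """Pick the tool from the first path segment and let it handle the rest."""
    parts = [p for p in path.split("/") if p]
    tool = tools.get(parts[0]) if parts else None
    if tool is None:
        return 404, {}, "no such tool"
    return tool.handle(method, parts, form or {})


def rpc_call(tool, name, *args):
    """XML-RPC/SOAP/macro entry point: only names without a leading underscore."""
    if name.startswith("_"):
        raise PermissionError(f"{name} is internal")
    return getattr(tool, name)(*args)
```

For example, dispatch(tools, "POST", "/notes/add_redir", {"text": "hello"}) stores the note and answers with a 302 pointing at /notes/index_html, so pressing reload afterwards only repeats the harmless GET.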
This gives tools a very nice structure, and it makes building new tools very simple: code your HTML interface as _html methods, hook up forms to _redir methods, put the real code into normal methods of the tool, and call those from the interface methods. As a side effect you get a callable API for your tool that other people can use in template code or in other tools.

But this has one problem: most tools need to do very compute intensive stuff. PyDS is about rendering content, and rendering content involves reStructuredText conversion, Cheetah template interpretation and the like. Sometimes picture management is needed, or files need to be fetched from the Internet. You don't want all of this to happen in the GUI thread: medusa is a single-threaded server, so if you do compute intensive stuff there, each piece of computation has to finish very quickly. The reason is that medusa can't answer requests while you compute!

Of course there is a nice solution to this, and Twisted, for example, works like this (had I known about Twisted when I started PyDS, I might even have used it!): break your computations up into smaller chunks and trigger them from the main event loop. You create abstract work queues that are filled with small snippets of work and pull these through the main event dispatcher, which ensures that you can still serve requests while you compute. The problem is that you must be able to break your code up into small enough snippets. That is usually possible if all the code is your own. But if you use external modules, and some of them are as compute heavy as Cheetah or reST, you get into trouble: the smallest chunks of computation you are able to define might still be too large.

This is an area where I think threads are actually very useful. Threads have a tendency to be overused: most people see concurrency in some form and just pull out threads as the tool of choice.
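
The chunked model can be illustrated with Python generators, where each yield marks the end of one chunk. This is a hypothetical sketch, not how Twisted or medusa implement it: requests are simulated with a deque instead of sockets, and render_pages() and event_loop() are invented names. The loop alternates between serving one pending request and advancing one background task by one chunk.

```python
# Hypothetical sketch of cooperative, chunked background work driven by a
# single-threaded event loop. Requests are simulated with a deque instead of
# sockets; render_pages() and event_loop() are invented names.
from collections import deque


def render_pages(pages, log):
    """A background job as a generator: each yield ends one chunk of work."""
    for page in pages:
        log.append(f"rendered {page}")  # stands in for one rendering step
        yield  # hand control back to the event loop


def event_loop(requests, tasks, log):
    """Alternate between serving one request and running one work chunk."""
    requests = deque(requests)
    tasks = deque(tasks)
    while requests or tasks:
        # Serve a pending request first, so the GUI stays responsive.
        if requests:
            log.append(f"served {requests.popleft()}")
        # Then advance one background task by exactly one chunk.
        if tasks:
            task = tasks.popleft()
            try:
                next(task)
            except StopIteration:
                continue  # the task is finished; drop it
            tasks.append(task)  # round-robin: requeue the unfinished task
```

With requests ["/a", "/b"] and a single render_pages(["p1", "p2"], log) task, the log interleaves as served /a, rendered p1, served /b, rendered p2. The catch described above shows up here as well: a request can never wait less than the longest single chunk, so if one chunk is a whole reST or Cheetah run that cannot be split, the GUI stalls for that long.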
And later on they crash full speed into thread problems like threads stomping over each other's data structures, or computation stalling because locks are overused. PyDS actually does the second one: it overuses locks. More on that later.

Threads give you a nice way to push stuff into the background. In particular, they don't use many system resources: Python threads are mapped to the platform's native threads, for example under Linux or Darwin, and those are cheap in terms of OS resources. Threads can communicate with each other via data structures: since all threads share the same memory, you don't need complicated interprocess communication. This is very nice when the shared data is complex. Of course you can do things like this with processes too (by putting data structures into shared memory, for example), but it's always much more work than with threads.

So the decision for PyDS was simple: each tool should have its own thread, and compute intensive stuff should be pushed into the background using this thread. I chose one thread per tool for one simple reason: some tools produce a big fat arseload of background events. Those would fill up a shared queue and delay other tools' background tasks, just because they arrive in big chunks. So I would have to add priorities to the command queues to let some work chunks propagate faster. The thread-per-tool policy sidesteps this: each tool has its own thread that works through its stuff as fast as possible. Since threads don't take up many resources, this doesn't hurt performance much: idle threads just sit there and sleep.

The main thread (the GUI, based on medusa) communicates with these threads via command queues. There are currently two kinds: transient and persistent queues. Transient queues exist only in memory and are lost when the running instance crashes. Persistent ones are stored on disc.
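
A tool's background thread and its transient command queue might look roughly like this. It is a sketch built on the standard threading and queue modules, with hypothetical names (ToolThread, submit(), stop(), the tool names in workers); PyDS's real classes and its persistent, on-disc queues are not shown.

```python
# Hypothetical sketch of a per-tool worker thread fed by a transient
# (in-memory) command queue. ToolThread, submit() and stop() are invented
# names; PyDS's persistent on-disc queues are not modelled here.
import queue
import threading
import traceback


class ToolThread(threading.Thread):
    """Background worker owned by exactly one tool."""

    def __init__(self, tool_name):
        super().__init__(name=f"{tool_name}-worker", daemon=True)
        self.commands = queue.Queue()  # lost if the process crashes

    def submit(self, func, *args):
        """Called from the GUI thread: enqueue the work and return at once."""
        self.commands.put((func, args))

    def stop(self):
        """Ask the worker to exit after the commands already queued."""
        self.commands.put(None)

    def run(self):
        while True:
            item = self.commands.get()  # an idle worker just blocks here
            try:
                if item is None:
                    return
                func, args = item
                try:
                    func(*args)
                except Exception:
                    # Keep the worker alive after a failing command.
                    traceback.print_exc()
            finally:
                self.commands.task_done()


# One worker per tool, so a tool that floods its queue only delays itself.
workers = {name: ToolThread(name) for name in ("render", "upstream", "event")}
for worker in workers.values():
    worker.start()
```

The GUI thread only calls submit() and returns immediately; the worker blocks in commands.get() while idle, which is why sleeping threads cost so little. A persistent variant would additionally have to write each command to disc in a serializable form (for example a method name plus arguments rather than a Python callable) and remove it only after it has run.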
Some tools share a common thread; in that case one tool works on several incoming queues. This is the RenderTool, a tool for rendering simple content. Since almost every tool has its own thread, there can be concurrent calls into one tool's methods. The threads are tool local, so they usually work with their own tool's methods: typically several internal methods are pushed onto the command queue and called from the tool's thread. But those methods call other tools' methods as well, so in reality several tools might all be calling into some central tool. Examples of such central tools are the UpstreamTool for upstreaming content, the EventTool for event logging and the PreferencesTool for user preferences.

Now there is one other speciality of PyDS: it uses metakit as its database. Metakit is a very nice, fast and stable embedded database. The only data crashes I have ever heard of happened with other people's data, not mine (and not even with PyDS data).

Now comes the time where I have to confess: I overused locks. Yep, I did it: I attached a lock to each and every database. Actually there are some more locks, but those protect shared data structures (like the command queues). The central problem, though, is those pesky tool database locks. I protected every internal method that uses the database with those locks. This leads to situations where the locks block other functionality, and it's quite bad when you hit central tools like the EventTool. A tool renders content in the background, so the GUI isn't slowed down. But this rendering triggers some events to be logged, some preferences to be read and some databases to be accessed. The GUI might want to access the same areas at the same time: maybe you try to read the event log, or your _html method needs some preference. This is where PyDS locks up the GUI for some time.
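
The lock problem can be reproduced in miniature. In this hypothetical sketch a dict stands in for a metakit database and time.sleep() stands in for reST or Cheetah rendering; PageTool, the locked decorator and gui_wait() are invented for illustration. The coarse method keeps the database lock while it renders, so a GUI read has to wait; the narrow method renders first and holds the lock only for the write.

```python
# Hypothetical sketch of the coarse "one lock per tool database" pattern and
# a narrower alternative. A dict stands in for a metakit database and
# time.sleep() for reST/Cheetah rendering; PageTool, locked() and gui_wait()
# are invented names.
import functools
import threading
import time


def locked(method):
    """Decorator: run the whole method while holding the tool's database lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self.db_lock:
            return method(self, *args, **kwargs)
    return wrapper


class PageTool:
    def __init__(self):
        self.db_lock = threading.Lock()  # one lock for the whole database
        self.db = {}                     # stand-in for a metakit storage

    @staticmethod
    def _render(text):
        time.sleep(0.2)  # pretend this is an expensive rendering run
        return f"<p>{text}</p>"

    @locked
    def _render_and_store_coarse(self, key, text):
        # Coarse: rendering happens with the lock held, so every other
        # locked method, including the ones the GUI calls, has to wait.
        self.db[key] = self._render(text)

    def _render_and_store_narrow(self, key, text):
        # Narrow: render without the lock, hold it only for the write.
        html = self._render(text)
        with self.db_lock:
            self.db[key] = html

    @locked
    def get_page(self, key):
        # What an _html method in the GUI thread would call.
        return self.db.get(key)


def gui_wait(tool, store):
    """Start a background store, then time how long a GUI read must wait."""
    worker = threading.Thread(target=store, args=("front", "hello"))
    worker.start()
    time.sleep(0.05)  # let the background job get going first
    start = time.monotonic()
    tool.get_page("front")
    waited = time.monotonic() - start
    worker.join()
    return waited
```

With the 0.2 second fake rendering, the coarse variant typically makes the GUI read wait for well over a tenth of a second, while the narrow variant returns almost immediately. Narrowing only works when the expensive part doesn't need the database: if it does, read under the lock, compute without it, then write under the lock again.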
It's annoying, as you get no direct feedback on what is happening: you wrote a posting a few seconds ago, so you know that's what triggers the sluggishness, but it's still annoying. This is an area that will be attacked in 0.7. The first thing to do is to go through the lock-protected areas and find out which ones create problems and need to be resolved. Actually it's much like the locking in older Linux kernels that prevented good SMP performance.

Another area that will be addressed in 0.7 is the very crude class hierarchy. For example, all tools inherit from StandardTool and modify its behaviour. But some behaviours are the same in many different tools; I plan to move those into mixin classes, so that tools can inherit from several base classes and start from a better tailored class.

So would PyDS gain from an asynchronous model like the one Twisted has? I don't think so. PyDS has far too much compute intensive work to do. Remember the rendering: there is one area where content is rendered online, and that's the wiki. Currently WikiTool doesn't cache rendered pages, so every time you look at a wiki page on your desktop server, it is rendered again. Do that with large texts and you will understand: it can easily take 10 seconds, even on faster machines! Reworking all tools into an event-driven structure would require breaking those work chunks down into smaller chunks that can be called from the event dispatcher. This is possible, of course, but it would require very different tool structures.

last change 2003-11-10 15:26:08
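
The mixin plan for 0.7, together with the missing wiki render cache, could look roughly like this. Everything here is hypothetical: StandardTool's contents, RenderCacheMixin, EventLogMixin and the WikiTool methods are invented names, not the PyDS classes. Each mixin brings one piece of shared behaviour, and the cooperative super().__init__() calls let them be combined freely.

```python
# Hypothetical sketch of tools assembled from mixins, including a render
# cache for wiki pages. StandardTool's contents, the mixins and WikiTool's
# methods are invented for illustration.
import threading


class StandardTool:
    """Common base: every tool has a name and a database lock."""

    def __init__(self, name):
        self.name = name
        self.db_lock = threading.Lock()


class RenderCacheMixin:
    """Shared behaviour: keep the last rendering of each page."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._render_cache = {}  # page -> (source, rendered html)

    def _cached_render(self, page, source):
        cached = self._render_cache.get(page)
        if cached is None or cached[0] != source:
            # Unknown page or edited source: do the expensive render once.
            cached = (source, self._render(source))
            self._render_cache[page] = cached
        return cached[1]


class EventLogMixin:
    """Shared behaviour: a simple in-memory event log."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._events = []

    def log_event(self, message):
        self._events.append(f"{self.name}: {message}")


class WikiTool(RenderCacheMixin, EventLogMixin, StandardTool):
    def __init__(self):
        super().__init__("wiki")
        self.render_count = 0

    def _render(self, source):
        self.render_count += 1  # counts real renders, for demonstration
        return f"<div>{source}</div>"

    def page_html(self, page, source):
        self.log_event(f"viewed {page}")
        return self._cached_render(page, source)
```

WikiTool's method resolution order runs WikiTool, RenderCacheMixin, EventLogMixin, StandardTool, object, so each class's __init__ runs exactly once. The cache keeps one entry per page and re-renders only when the source text changes; it is not locked, so two threads racing on the same page would at worst render it twice.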
This text tells a bit about the inner workings of PyDS and why they are structured the way they are.