About the threading model in PyDS


note
If you read this text some time ago, you might remember it being a bit different. This is because the hierarchical threading scheme didn't work out as expected. The result was a very unstable PyDS, due to problems with tool initialization and deinitialization. As a solution I threw out most of the complicated threading stuff and am currently rewriting it to a much simpler model.
hint
There is another article - Toolserver - with an implementation of Medusa and multithreading. That one implements many of the ideas given here.

PyDS uses Medusa as its core web server. Medusa is a very fast single-threaded server that uses the asyncore module and a channel model to serve web pages. This is very responsive, as there is not much overhead: files can be served directly from the filesystem to the browser, and many requests can be handled simultaneously without the need for multiple threads. This is really great for static stuff.

It has its problems with computed pages, though. There are solutions: change your computations to a coroutine or generator model where only part of the computation is done per invocation, and build some pseudo channel that can be used in Medusa. This is quite complicated, especially if you don't have full control over the computations invoked. For example, you might rescale images on the fly to their declared size. This might take far too much time to be done online, and it can't be broken into smaller chunks, as you use PIL and can't look inside the C functions. The problem with long-running computations in the core Medusa framework is that they block the server from responding to new requests. So you need a way to push them off to free up the incoming requester again!
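To make the generator model a bit more concrete, here is a minimal sketch. It is not PyDS or Medusa code: `checksum_in_steps` and `run_interleaved` are hypothetical names, and the round-robin loop only stands in for what a single-threaded event loop would do between serving other requests.

```python
def checksum_in_steps(data, chunk_size=4):
    """Hypothetical long computation, cut into small steps.

    Each next() call processes one chunk and then yields, so a
    single-threaded loop gets control back between chunks.
    """
    total = 0
    for start in range(0, len(data), chunk_size):
        total += sum(data[start:start + chunk_size])
        yield  # hand control back to the caller
    return total  # delivered via StopIteration.value


def run_interleaved(tasks):
    """Advance every generator by one step per round until all are done.

    `tasks` maps a name to a generator; the result maps the same
    names to the values the generators returned.
    """
    results = {}
    pending = dict(tasks)
    while pending:
        for name in list(pending):
            try:
                next(pending[name])
            except StopIteration as done:
                results[name] = done.value
                del pending[name]
    return results
```

This only works when you can cut the computation into chunks yourself. A single call into a C extension, like the image rescaling mentioned above, has no place to yield, which is why the rest of this article turns to threads.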

The solution to this problem is threading. You need worker threads that deliver the output of your computation. You need dispatching methods to decide which worker should do the work. You might need additional threads for background activities like pulling down large data files from the web, reorganizing databases, doing backups or stuff like that. You might need different execution contexts for threads, so multiple personalities of the same application can be hosted in one running server instance.

The Python Desktop Server has all of these problems, so I can present solutions to all of them here. Isn't that fun? :-)

There is an additional complication: PyDS uses Metakit as its database. Metakit doesn't allow multiple open file handles to the same database with concurrent write access, so you need to make sure that only one thread accesses any one database at any given time!
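One possible way to guarantee that only a single thread ever touches a database is to give the database an owner thread and send it work through a queue. The sketch below is an assumption about how this could look, not the PyDS implementation: `DatabaseOwner` is a hypothetical name, and a plain dict stands in for the open Metakit handle.

```python
import queue
import threading


class DatabaseOwner:
    """Hypothetical wrapper: every operation runs on one dedicated thread."""

    def __init__(self, db):
        self._db = db  # stand-in for an open database handle
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Only this thread ever reads or writes self._db.
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown marker
                break
            func, reply = job
            try:
                reply.put((True, func(self._db)))
            except Exception as exc:
                reply.put((False, exc))

    def call(self, func):
        """Run func(db) on the owner thread and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._jobs.put(None)
        self._thread.join()
```

Callers from any thread use `owner.call(lambda db: ...)`. The operations are serialized by the queue, so no explicit lock around the database is needed.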

PyDS itself has a very modular design. Functionality is distributed over a load of core modules that set up a runtime environment. These core modules mostly do server initialisation, handling of XML-RPC and SOAP, authentication management, and integration of external libraries (like docutils for reStructuredText). Built on top of those core modules are a large number of tools that implement the whole functionality of the Python Desktop Server. To get an overview of the available modules and their function, read the file OVERVIEW in the source distribution of PyDS.

One requirement of several tools is background work: they need to have something happen in the background. This might be downloading of data or rendering of output, or it might be something completely different - its only qualification is that it takes a long time to compute and so can't be done online, and that its result isn't needed immediately.

Last but not least, every tool manages its own database for the relevant context. No direct access to a tool's database is allowed; all access goes through exported methods of the tool instance.
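As an illustration of that rule, here is a small sketch of a tool that keeps its storage private and only hands out results through exported methods. `BookmarkTool` and its methods are hypothetical names made up for this example, and a dict stands in for the tool's database.

```python
import threading


class BookmarkTool:
    """Hypothetical tool; nobody outside touches self._db directly."""

    def __init__(self):
        self._db = {}  # stand-in for the tool's private database
        self._lock = threading.Lock()

    # --- exported methods: the only way in or out ---

    def add(self, name, url):
        with self._lock:
            self._db[name] = url

    def lookup(self, name):
        with self._lock:
            return self._db.get(name)

    def names(self):
        # Return a fresh list, never a reference into the storage.
        with self._lock:
            return sorted(self._db)
```

Because callers only ever get copies or single values back, the tool stays free to change how and where it stores its data.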

A last speciality is RPC calls into the system. Although they enter the system via the main requester, they need to be moved to the background, too: they can take as long as interactive work takes. They are just the same internal methods, only called from the outside, so the same restrictions apply.

So now we have the following situation:

  • Medusa needs a main thread for its running servers
  • users need threads where each request is served
  • tools need background threads to do long-running jobs that don't need interaction
  • RPC calls need to be pushed into the background
  • to reduce thread load, inactive threads should be taken down

The way this is solved in PyDS is as follows:

  • there is a central request dispatcher that pushes requests to the background for execution
  • background request handlers use a select trigger (a pipe that is hooked into the asyncore.loop) to trigger their output
  • static stuff is handled directly in the foreground Medusa thread, as it doesn't take up many resources
  • the request queue is managed with Condition variables - threads are in a waiting state and are fed by notify calls from the main server context
  • tools that are fired up start their own background threads and communicate with them via their own queues (some tools use permanent queues whose contents survive PyDS shutdowns)
  • request handlers that are inactive for some time are shut down
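The list above can be sketched roughly as follows. This is a simplified illustration and not the PyDS code: `Dispatcher` and its methods are hypothetical names, a `socket.socketpair()` stands in for the select trigger pipe, and the standard `selectors` module stands in for the asyncore loop.

```python
import queue
import selectors
import socket
import threading
from collections import deque


class Dispatcher:
    """Hypothetical dispatcher: Condition-fed workers with idle shutdown."""

    def __init__(self, idle_timeout=0.5):
        self.idle_timeout = idle_timeout
        self.cond = threading.Condition()
        self.requests = deque()
        self.idle_workers = 0
        self.workers = 0
        self.done = queue.SimpleQueue()
        # Trigger: workers write one byte, the main loop wakes up on it.
        self.wake_recv, self.wake_send = socket.socketpair()
        self.wake_recv.setblocking(False)

    def submit(self, func, *args):
        """Called from the main thread; pushes work to the background."""
        with self.cond:
            self.requests.append((func, args))
            if self.idle_workers == 0:
                self.workers += 1
                threading.Thread(target=self._worker, daemon=True).start()
            else:
                self.cond.notify()

    def _worker(self):
        while True:
            with self.cond:
                while not self.requests:
                    self.idle_workers += 1
                    notified = self.cond.wait(self.idle_timeout)
                    self.idle_workers -= 1
                    if not notified and not self.requests:
                        self.workers -= 1  # idle for too long: shut down
                        return
                func, args = self.requests.popleft()
            self.done.put(func(*args))  # the slow part runs without the lock
            self.wake_send.send(b"x")   # poke the main loop

    def collect(self, expected, timeout=5.0):
        """Main-loop side: sleep in select until a worker triggers us."""
        results = []
        with selectors.DefaultSelector() as sel:
            sel.register(self.wake_recv, selectors.EVENT_READ)
            while len(results) < expected:
                if not sel.select(timeout):
                    break  # nothing arrived in time
                self.wake_recv.recv(4096)  # drain the trigger bytes
                while True:
                    try:
                        results.append(self.done.get_nowait())
                    except queue.Empty:
                        break
        return results
```

A worker always checks the request queue before it goes to sleep, so a request that arrives while no thread is waiting is still picked up. Error handling is left out: a request function that raises would kill its worker here, which real code would have to catch.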

All this comes true with 0.7.1 - older versions used a different model without background request handling, so interactive stuff that needed too much time blocked the server. This was unacceptable for multi-user variants, and so I changed it.

Oh, and it was a weird kind of fun to rewrite all the inner parts of PyDS to accomplish this, as multithreaded programming with large thread numbers, dynamically loaded and unloaded threads and a single-threaded web server thrown in can be really demanding! Even in the worst of all possible outcomes - that I have to throw all of this away - I should at least understand multithreading much better now ... :-)

last change 2004-09-28 11:42:24

This is the Python Desktop Server weblog.


PyDS is OSI Certified Open Source Software.


© 2007, Georg Bauer