Python Desktop Server Weblog 1.11.2003

2003-11-01

The global interpreter lock in Python

One example of why I think it isn't helpful (I don't think it's catastrophic, or even annoying, but it is noticeable): I have written a distributed synchronizing filesystem in Python (don't ask). It's a P2P model - all machines are peers of the same class. Any machine on which a file has changed automatically distributes that file to all other associated machines. Machine links can be defined in hierarchical or network topologies. It's quite a nice example that Python definitely is usable for systems programming, even though many people think I am crazy like shit ;-)

In this case machines do lots of synchronization calls - for example when somebody pushes in large data chunks via rsync or ftp uploads.

My machines at work are all multiprocessor machines - I don't buy single-processor machines for our production environment any more. They should be able to handle the load quite nicely - but due to the GIL they don't handle it as well as they could. Synchronization is something that should run as fast as possible, since every second it takes longer widens the window in which new changes hit the machine while it is still distributing earlier ones - and this sometimes adds up to an arseload of work for the machines.

While the synchronization is running, several threads are active that push stuff to other machines, pull stuff from other machines and react to local change notifications. But it doesn't fully use the processors - it's not much faster than on a single-CPU system. And when this happens, it's mostly the synchronization that's running. OK, the other CPU does useful work, as there are additional processes on that machine (and that's why I say it's not catastrophic, just noticeable).
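
To make the effect concrete, here is a minimal sketch (not code from suckfs). `checksum` is a made-up stand-in for CPU-bound per-file work, and `run_serial` and `run_threaded` are hypothetical names. On a standard CPython build with the GIL, the threaded run typically takes about as long as the serial one, no matter how many CPUs the machine has. Threads that mostly wait on network or disk I/O release the GIL while they wait, so they suffer less than this pure-computation case.

```python
import threading
import time


def checksum(data: bytes) -> int:
    """CPU-bound stand-in for per-file synchronization work (hypothetical)."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 1_000_003
    return total


def run_serial(chunks):
    """Process all chunks one after another in the calling thread."""
    return [checksum(chunk) for chunk in chunks]


def run_threaded(chunks):
    """Process each chunk in its own thread; results keep the input order."""
    results = [None] * len(chunks)

    def work(index):
        results[index] = checksum(chunks[index])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(len(chunks))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    chunks = [bytes(range(256)) * 500 for _ in range(4)]
    for runner in (run_serial, run_threaded):
        start = time.perf_counter()
        runner(chunks)
        # Compare the two timings on a multiprocessor machine.
        print(runner.__name__, round(time.perf_counter() - start, 3), "seconds")
```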

And in this case it could actually be rewritten to use multiple processes - except that this doesn't work on systems without fork (and that the IPC currently works with shared data structures). I currently don't support non-Unix machines, so I could change it, but it's still problematic if you need to support, for example, Windows. There are other problems with my P2P system (it's called suckfs for a very good reason ...), so I don't think it's the best showcase for GIL problems, but it does show a scenario where those problems surface.
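
A rough sketch of what such a fork-based rewrite could look like, assuming a Unix system (`os.fork` does not exist on Windows). Again `checksum` is a hypothetical stand-in for the real work and `run_forked` is a made-up name. It also shows the IPC cost mentioned above: the child can no longer put its result into a shared data structure, so the result has to be serialized and sent back through a pipe.

```python
import os
import pickle


def checksum(data: bytes) -> int:
    """CPU-bound stand-in for per-file synchronization work (hypothetical)."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 1_000_003
    return total


def run_forked(chunks):
    """Compute checksum(chunk) in one forked child per chunk (Unix only)."""
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: has its own interpreter and its own GIL.
            os.close(read_fd)
            try:
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(checksum(chunk), out)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        # Parent: keep only the read end and go on forking.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as src:
            results.append(pickle.load(src))
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return results
```

Each child is a separate process, so the operating system is free to schedule them on different CPUs. The price is the fork itself plus copying results between processes.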

There are other areas where I noticed that Python doesn't scale as well as it should - but I don't care that much there, as there are better ways to solve that problem. For example, large Zope servers that need to handle many concurrent hits (many meaning in the range of 1500-2500 hits per minute). Of course this is solved with multiple machines, multiple process groups on a single machine and a central ZEOD installation, as that is the best way to do this anyway :-)

But I don't think we should forget about that GIL. It was shown with the Linux kernel that one big fat global lock isn't really that good an idea - more locks at finer-grained levels are a much better one. I think that Python could really gain from reworking this, as there are situations where you get into trouble, or at least don't get as much out of your MP machine as you would like.

This post references topics: python
posted at 10:37:04    #
 

RSS feeds with doctype

Ed Taekema found a feed that can't be parsed by PyDS' rss parser. This feed has one peculiarity: it includes a doctype definition for rdf:rss (the top tag). I am not sure whether this is correct: shouldn't it be a doctype declaration only for rss? The problem is, PyDS uses sgmllib to build a parser that is a bit tolerant of broken or invalid feeds (it's not the ultraliberal feed parser - when I started with PyDS, Mark's parser was under the GPL and so couldn't be integrated into the MIT/X licensed PyDS). But sgmllib barfs on the : in the doctype, and so PyDS refuses to parse that feed.

Any hints on what to do? Throw out any doctype definitions, since they aren't used for parsing and so don't carry needed data? Change the parser? Change the feed?
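
The first option could be sketched roughly like this. This is not PyDS code: `strip_doctype` and `DOCTYPE_RE` are hypothetical names, and the regular expression is only a pragmatic approximation of a doctype declaration with an optional internal subset in square brackets. It is written for current Python, where sgmllib no longer exists, and would have to run on the feed text before it is handed to the parser.

```python
import re

# Matches <!DOCTYPE ...> including an optional [ ... ] internal subset.
# An approximation only: it does not try to handle every legal DTD construct.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE[^>\[]*(?:\[.*?\]\s*)?>",
    re.DOTALL | re.IGNORECASE,
)


def strip_doctype(feed_text: str) -> str:
    """Return feed_text with the first doctype declaration removed."""
    return DOCTYPE_RE.sub("", feed_text, count=1)


if __name__ == "__main__":
    feed = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE rdf:RDF [\n"
        '<!ENTITY nbsp "&#160;">\n'
        "]>\n"
        "<rdf:RDF><channel/></rdf:RDF>"
    )
    print(strip_doctype(feed))
```

One caveat: if the feed body actually uses entities that are declared only in the stripped doctype, those references become undefined, so in that case the doctype does carry needed data.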

The feed validator doesn't like the entity definition, so the feed might be broken. But is it broken enough that I can reject parsing it?

This post references topics: python software
posted at 10:07:12    #

© 2003-2007, Georg Bauer