“Continuing” 3rd Space With Prime Mover

I’ve been having a great time working on my MMOG framework, 3rd-Space. Since December, I’ve been focussing on the actual event driven simulation framework, as this framework is key to the system’s performance and scalability. The work is based on the simulation framework described by Rimon Barr in his thesis, An Efficient, Unifying Approach to Simulation Using Virtual Machines – something he calls “JiST” (Java in Simulation Time). It’s a very cool event driven simulation framework that uses Java, itself, as the simulation scripting language. He turns the Java classes into a simulation by transforming the Java classes using a byte code rewriting framework. The result is a very easy to use, completely type safe simulation framework that completely blows the doors off the closest competing C++ event simulation frameworks in terms of raw performance and scalability. The performance is so good because the framework uses the Java virtual machine as the engine for running the simulation.

I spent a while looking around for simulation frameworks in Java and doing more than a little bit of research on the state of the art, and it’s pretty clear that this framework beats the rest of the crowd hands down. From just the point of usability, it’s a real treat to simply use standard Java messages as the events between simulation entities. A simulation entity is simply a Java class which implements a marker interface – in JiST, this is the “Entity” interface. Classes which correspond to the simulation entity have certain restrictions enforced by the framework – e.g. the class must have no public fields and no static fields, and all of its public methods must return void.
These restrictions allow the simulation framework to effectively encapsulate the object graph associated with the entity – something extremely important as any state shared between entities would allow them to communicate outside of the event framework, which is strictly forbidden (i.e. it completely screws the simulation). The public void methods of the entity form the signatures of the events the entity understands – i.e. events in the framework are simply void methods which are sent between entities. This means that all events are type safe and checked at compile time in the framework.
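To make those restrictions concrete, here is a rough sketch of what a well-formed and a malformed entity might look like, along with a small reflective check of the three rules. The `Entity` interface here is a stand-in I declared for the example, and `Door`, `BadDoor` and `EntityRules` are hypothetical names; the real framework enforces these rules during byte code rewriting, not with reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker interface, standing in for JiST's "Entity".
interface Entity {}

// A well-formed entity: private state, and public void methods acting as events.
class Door implements Entity {
    private boolean open;

    public void open()  { open = true; }
    public void close() { open = false; }
}

// A malformed entity that breaks all three restrictions.
class BadDoor implements Entity {
    public int slams;                          // public field
    static int instances;                      // static field
    public boolean isOpen() { return false; }  // public method that returns a value
}

// Hypothetical checker, for illustration only.
final class EntityRules {
    static List<String> violations(Class<? extends Entity> type) {
        List<String> problems = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) continue; // ignore compiler or tool generated fields
            int mods = field.getModifiers();
            if (Modifier.isPublic(mods)) problems.add("public field: " + field.getName());
            if (Modifier.isStatic(mods)) problems.add("static field: " + field.getName());
        }
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic()) continue;
            boolean isPublic = Modifier.isPublic(method.getModifiers());
            if (isPublic && method.getReturnType() != void.class) {
                problems.add("non-void public method: " + method.getName());
            }
        }
        return problems;
    }
}
```

Calling `EntityRules.violations(Door.class)` yields an empty list, while `BadDoor` is reported once for each rule it breaks.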
By using void methods as the events, it’s very easy for the framework to use the void return result as an asynchronous method send. In the transformations done by the framework, the message send site which corresponds to the event is turned into the equivalent of a proxy call in which the actual message send – arguments and all – is bundled up and put on the “event queue” for the simulation. The method “sending” the event continues execution, perhaps emitting more events or sleeping (i.e. advancing the simulation time) before emitting these events.
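The “equivalent of a proxy call” idea can be approximated in plain Java with `java.lang.reflect.Proxy`. This is only a sketch of the concept: the framework does this with byte code rewriting rather than dynamic proxies, and `Greeter`, `RecordingGreeter`, `Event` and `Simulation` are names I made up for the example. Each call on the proxy is bundled up with its arguments and the current simulation time, queued, and only executed when the simulation loop gets to it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical event interface: every method is void, so a call can be deferred.
interface Greeter {
    void greet(String name);
}

// Hypothetical entity implementation that records the events it processes.
class RecordingGreeter implements Greeter {
    final List<String> log = new ArrayList<>();

    @Override
    public void greet(String name) { log.add("hello " + name); }
}

// One pending event: target, method, arguments and the simulation time to run at.
final class Event implements Comparable<Event> {
    final long time;
    final long seq;
    final Object target;
    final Method method;
    final Object[] args;

    Event(long time, long seq, Object target, Method method, Object[] args) {
        this.time = time;
        this.seq = seq;
        this.target = target;
        this.method = method;
        this.args = args;
    }

    @Override
    public int compareTo(Event other) {
        int byTime = Long.compare(time, other.time);
        return byTime != 0 ? byTime : Long.compare(seq, other.seq); // FIFO for equal times
    }
}

final class Simulation {
    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private long now;      // current simulation time
    private long nextSeq;  // tie-breaker for events scheduled at the same time

    long now() { return now; }

    // "Sleeping" only advances simulation time; no thread actually waits.
    void sleep(long duration) { now += duration; }

    // Wrap an entity so that calls are queued instead of executed immediately.
    <T> T proxyFor(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // toString, hashCode, equals run directly
            }
            queue.add(new Event(now, nextSeq++, target, method, args));
            return null; // void event: nothing to hand back to the sender
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    // Drain the queue in time order, delivering each event to its real target.
    void run() {
        Event event;
        while ((event = queue.poll()) != null) {
            now = event.time;
            try {
                event.method.invoke(event.target, event.args);
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException(ex);
            }
        }
    }
}
```

With a proxy obtained from `sim.proxyFor(Greeter.class, new RecordingGreeter())`, calling `greet` returns immediately and the real entity sees nothing until `sim.run()` drains the queue.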
So, very cool so far. Just the system described so far provides a very powerful event processing framework. The events are completely decoupled from the sending stack, which allows the system to schedule the events for future execution. But one issue with exclusively using events as your communication mechanism is that it forces you to rewrite any blocking activity in your simulation as state machines. For example, let’s say you’re modeling a TCP communication channel which blocks the sender until the bytes supplied have been processed by the end point. This “blocking” behavior is typical, for example, of the pre-NIO stream model in Java.
Now, having to rewrite your simulation to use a state machine to model this blocking behavior kind of bites. It’s not that it’s terribly difficult, but it is a transformation which a lot of people seem to have trouble with. A real world example of this is the kind of transformations that one has to perform on old style threaded servers which used blocking I/O when they are converted to use the post JDK 1.4 NIO, event driven I/O framework. It’s something that still causes programmers problems – sometimes a lot of problems. Basically, I find that your average programmer has enough problems thinking about single threaded systems and the simplicity of blocking metaphors makes things a whole lot easier to comprehend and is a natural way to model.
To make a potentially long story short, Rimon Barr’s JiST framework allows the modeler to annotate events as “blocking”, which means that the sending method “blocks” until the entity receiving the event is finished processing the event (note that the sending entity can still receive other events and process them – the sending entity, itself, is not blocked). The only fly in the ointment here is that when you block waiting for a method to finish executing, you have a Java thread stack which is waiting for the method to return. This is unacceptable for a number of reasons. For example, this makes it difficult to schedule the event – the actual blocking event may have to be scheduled into the future, for example. But the biggest problem with having a Java stack around is that doing so simply kills scalability.
Again, a good analogy is to consider the reasons why high performance network servers use event driven, NIO architectures rather than blocking threads. Threads are expensive resources and if you’re wasting these resources doing nothing more than waiting then you’re making extremely poor use of these very expensive resources and your dreams of scalability have just been blown to smithereens.
So, what Rimon did was transform (with the simulation framework’s byte code transformations) these blocking calls into what are commonly known as continuations. When a method encounters a blocking event, the framework transforms this to save the method’s current state – i.e. local variables and stack – into a “frame” object. Once the state is saved, the method exits at this point with a “dummy” return value. When the blocking event is processed, the simulation framework re-enters the method, re-establishes the state of the method at the time of the call and “continues” with the execution of the method.
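Here is a hand-written approximation of what such a transformed method might look like. It’s a sketch of the idea only: the real transformation happens at the byte code level and also has to deal with the operand stack and nested calls, and `Frame`, `SumTask` and `Driver` are names invented for this example. The method being modeled simply sums three values, each obtained from a blocking read.

```java
// Hypothetical saved state of one suspended method invocation: its local variables.
final class Frame {
    final int i;
    final int total;

    Frame(int i, int total) {
        this.i = i;
        this.total = total;
    }
}

final class SumTask {
    static final int DUMMY = 0; // placeholder returned when the method suspends

    private Frame frame;        // non-null while the method is suspended
    int suspensions;            // how many times the method has exited early

    boolean suspended() { return frame != null; }

    // Hand-transformed version of the blocking-style method:
    //
    //   int total = 0;
    //   for (int i = 0; i < 3; i++) total += blockingRead();
    //   return total;
    //
    // resumedValue is the result of the blocking read; it is ignored on a fresh call.
    int sum(int resumedValue) {
        int i;
        int total;
        if (frame == null) {            // fresh call: locals start from scratch
            i = 0;
            total = 0;
        } else {                        // re-entry: restore the locals from the frame
            i = frame.i;
            total = frame.total;
            frame = null;
            total += resumedValue;      // finish the statement that was interrupted
            i++;
        }
        if (i < 3) {                    // about to hit the blocking call again
            frame = new Frame(i, total);
            suspensions++;
            return DUMMY;               // exit early; no thread stack is left waiting
        }
        return total;
    }
}

// Stand-in for the simulation framework: it "processes" each blocking event
// by taking the next input and re-entering the suspended method with it.
final class Driver {
    static int drive(SumTask task, int[] inputs) {
        int value = task.sum(SumTask.DUMMY);
        int next = 0;
        while (task.suspended()) {
            value = task.sum(inputs[next++]);
        }
        return value;
    }
}
```

Running `Driver.drive(new SumTask(), new int[] {10, 20, 30})` suspends and resumes the method three times before it finally returns the sum. Between a suspension and the matching re-entry, the only thing kept alive is the small `Frame` object.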
Very cool.
Naturally, this means that any method calling a method which can be “continued” will be required to itself be “continued”. This means that there’s a non trivial amount of analysis of the event interaction (i.e. method call flow analysis) that makes this all possible. Read Rimon’s thesis for all the juicy details – you’ll love it.
In any event, I’ve got my own implementation of this simulation framework working and working with a fair amount of functionality. I’ve named my embodiment of the system Rimon described in his thesis “Prime Mover”, for what I hope are obvious reasons. Currently all the transformations required for the asynchronous event processing are done and initially tested. I have just finished the first pass at the continuation transformations and am starting to fully flesh these out.
The effort has been pretty interesting and quite challenging. I’m using ASM for the byte code rewriting and analysis. This has forced me to become quite familiar with ASM and I must say that I’m quite pleasantly surprised with the framework. Other than the fact that they only provide a highly optimized version of their binaries which makes it impossible to debug through their code when things go wrong – and considering all the freedom to screw up that you have when operating at the byte code level for transformations, that’s pretty often – ASM is a dream to use.
As I mentioned at the beginning, the Prime Mover event driven simulation framework is going to be key to the scalability and performance – not to mention the ease of use – of the 3rd Space MMOG platform. The use of continuations will allow the platform to simulate many thousands of blocking processes with a few threads in much the same way as non blocking NIO network server architectures can handle many thousands of “simultaneous” clients using a single thread. The raw speed of the event processing itself will provide a robust basis for the low latency and high throughput that will be required to support hundreds of thousands of online users – simultaneously – in the same virtual space. The fact that the framework is simply Java means that highly sophisticated tools already exist to help the virtual world modeler design, debug and implement the simulations which comprise the online game.
All in all, a very cool start for my hobby.
