So I finally have what I consider to be the minimum – bare minimum, mind you – bar of the 3rd Space gaming framework passed. What I have working is a Predator/Prey simulation using the Boid rules for modeling flocking behavior. Boids were invented by Craig Reynolds to model the flocking behavior we see in birds, fish and other herd animals. The Boid rules are actually butt simple and entirely local – i.e. the flocking behavior emerges purely from each individual’s actions, with no “higher authority” guiding the individuals and keeping them in formation.
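For reference, the three classic rules – separation, alignment, cohesion – fit in a few lines of Java. This is a minimal sketch, not the 3rd Space implementation; the class name, rule weights, and the separation-radius cutoff are all made up for the example:

```java
import java.util.List;

// Minimal 2D boid sketch: separation, alignment, cohesion.
// Purely local: each boid looks only at the neighbors it is handed.
final class Boid {
    double x, y, vx, vy;

    Boid(double x, double y, double vx, double vy) {
        this.x = x; this.y = y; this.vx = vx; this.vy = vy;
    }

    void step(List<Boid> neighbors, double dt) {
        if (neighbors.isEmpty()) { x += vx * dt; y += vy * dt; return; }
        double cx = 0, cy = 0;   // cohesion: steer toward average position
        double ax = 0, ay = 0;   // alignment: match average velocity
        double sx = 0, sy = 0;   // separation: push away from close neighbors
        for (Boid b : neighbors) {
            cx += b.x; cy += b.y;
            ax += b.vx; ay += b.vy;
            double dx = x - b.x, dy = y - b.y;
            double d2 = dx * dx + dy * dy;
            if (d2 > 0 && d2 < 1.0) { sx += dx / d2; sy += dy / d2; }
        }
        int n = neighbors.size();
        // Nudge velocity toward each rule's goal; the weights are arbitrary.
        vx += 0.01 * (cx / n - x) + 0.125 * (ax / n - vx) + 0.05 * sx;
        vy += 0.01 * (cy / n - y) + 0.125 * (ay / n - vy) + 0.05 * sy;
        x += vx * dt; y += vy * dt;
    }
}
```

Note that `step` takes its neighbor list as a parameter – which is exactly where the AOI question sneaks in: somebody has to decide who the neighbors are.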
In any event, if you look at the Java code out there which implements Boids, the seemingly universal approach is to keep a global list of the boid positions – i.e. an implicit “global” which really shouldn’t be present. The reason this is so is, of course, that Area Of Interest management (AOI, for short) is a tough thing to do, even if you are simply running the simulation in a single process.
Predator/Prey Boids, using Voronoi-based AOI management in an event-driven simulation
I really needed a non-trivial simulation to use as the driver test case for the 3rd Space gaming framework I’ve been working on. Flocking behavior is, as one expects, quite common in games and is something that human players exhibit quite frequently as well – they don’t call humans sheep for nothing. As it turns out, managing the area of interest for simulated entities and humans’ avatars is rather difficult precisely because of this tendency to flock. Moving, flocking simulated entities are therefore the de facto test case one should be using as a model when developing the basics of a large scale, distributed simulation and gaming infrastructure – i.e. if you can’t handle this, you have no hope of handling anything more complicated.
Same as above, revealing the Voronoi overlay used to maintain AOI.
So, it’s kind of cool. I now have a non-trivial simulation which runs under the event-driven framework and makes use of the Voronoi-based AOI management. The next step is to distribute the simulation using Coherence so I can see how well my theories on massively scaling this framework hold up.
When you’re trying to build a massively multiplayer online gaming (MMOG) platform, probably the most important property of the system is scalability. After all, if it doesn’t scale, it’s simply a multiplayer online gaming platform – without the “massive”. While it almost seems embarrassing to point this out, it’s extremely interesting to note that there has recently been a lot of discussion about the scalability of online systems – in particular, Web 2.0 applications. I won’t point to these discussions, but suffice it to say that I find it terribly amusing to hear the various forms of the argument that you can worry about scalability later – i.e. that it’s not something that has to be designed in from the start. (Arguments of the form “don’t worry about scalability because no one is going to use your application anyway” are perfectly fine, however.) As the history of MMOG has shown, the application’s architecture has a huge impact on the ability to scale. As many gaming platforms have discovered, scalability isn’t something you can simply “add on” after you “get things right”. Anyone who thinks this doesn’t apply to other network application architectures amuses me to no end – if they ever actually produce something of value, it will fall over when it hits the natural scalability limit of their crappy architecture.
In any event, there are a couple of basic problems with MMOG that limit scalability. The first has to do with what is known as “Area Of Interest”. The idea here is familiar enough to anyone who has done any distributed communication: the gaming platform doesn’t want to find itself in an N² connection topology. In MMOG, the entities (gamer avatars, NPCs, etc.) have to communicate with other entities in the game. If you can’t find a way to limit the communication to the entities in the area of interest – i.e. the other entities that the entity in question is limited to communicating with – then you have a huge scalability issue due to sending messages to entities that simply don’t care about the communication because it can’t possibly affect them. This not only wastes bandwidth and precious OS network resources but causes a host of other issues having to do with the time ordering of distributed events and filtering out events that aren’t relevant. It’s a mess.
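To make the problem concrete, here is the naive version of an AOI query, sketched in Java with invented names: given every entity in the world, find the ones within a given radius of interest. Note that this costs O(N) per entity – O(N²) across the whole world per tick – which is exactly the cost that a real AOI structure (a spatial hash, a Voronoi overlay, etc.) exists to avoid:

```java
import java.util.ArrayList;
import java.util.List;

// Naive Area Of Interest query: scan every entity in the world.
// Illustrative only -- the whole point is that you can't afford this at scale.
final class Entity {
    final double x, y;
    Entity(double x, double y) { this.x = x; this.y = y; }
}

final class NaiveAoi {
    // Return every other entity within `radius` of `self`.
    static List<Entity> interestedIn(Entity self, List<Entity> all, double radius) {
        List<Entity> out = new ArrayList<>();
        double r2 = radius * radius;
        for (Entity e : all) {           // O(N) scan, per entity, per tick
            if (e == self) continue;
            double dx = e.x - self.x, dy = e.y - self.y;
            if (dx * dx + dy * dy <= r2) out.add(e);
        }
        return out;
    }
}
```

Messages then go only to the returned set instead of to everyone – the hard part is computing that set without the global scan.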
Well, that was fun. I just spent a decent chunk of my weekend swapping out the fundamental mechanism in the Prime Mover event driven simulation framework. Between taking care of the little one and the massive allergy attack I was suffering (thanks, Mom Nature!), I think I should get an award or something. Wait! Here’s a gold star I can put on my laptop case.
In any event, the changes to the underlying framework are actually quite cool. What had happened was that Prime Mover was finally getting used in anger – well, gently played with, really – and this exposed some serious shortcomings in the data flow analysis I was using to perform the bytecode rewriting which makes the magic happen under the hood. Over a few beers at Bucks on Friday afternoon, we mulled over a few different strategies for fixing the issue – none of which were particularly appetizing.
So, let me step back a bit and lay out what happened to the framework and why.
Most everyone who works with server infrastructure has figured out how to use NIO – in one form or another – to transform the standard blocking threaded server model into a far more scalable beast. Without going into boring details (see here, for example, if you need to investigate), the executive summary of the transformation is that previously servers had to pretty much dedicate a thread per inbound socket which would basically block on the socket between requests, waiting in desperate silence for some bytes to show up from the client. The reason why this had scalability issues is that the server had to dedicate a thread per client and threads are really frickin’ expensive resources – especially if most of the time they are blocked waiting for the client to do something.
What NIO allowed server architects to do was – at the simplest level – dedicate only a single thread to waiting for client sockets to become active. Once a client socket is found which needs attention, that socket can be handed off to a thread pool where a thread is picked up to handle the incoming request. This has a quite amazing effect on the scalability of the server, as most of the time clients aren’t doing anything at all – something captured by “think time” in simple benchmarks. And while a thread is blocked doing nothing, that valuable thread resource can’t be used to actually accomplish work for some client which does need attention.
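The selector-plus-handoff pattern described above looks roughly like this. It’s a bare-bones sketch of the mechanism, not any particular server’s code – the class name and handler shape are invented for the example (a real server would pass the ready key to a thread pool rather than handle it inline):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// One selector thread watches many channels; only ready channels get handed off.
final class SelectLoop {
    final Selector selector;

    SelectLoop() throws IOException { selector = Selector.open(); }

    // Channels must be non-blocking to register with a selector.
    void register(SelectableChannel ch, int ops) throws IOException {
        ch.configureBlocking(false);
        ch.register(selector, ops);
    }

    // Wait up to timeoutMillis for readiness; invoke the handler per ready key.
    int pollOnce(long timeoutMillis, java.util.function.Consumer<SelectionKey> handler)
            throws IOException {
        int n = selector.select(timeoutMillis);
        var it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();            // selected keys must be cleared by hand
            handler.accept(key);    // in a real server: submit to a thread pool
        }
        return n;
    }
}
```

The payoff is exactly what the paragraph above describes: the thousands of idle sockets cost you one parked thread, not thousands.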
Now, this is actually pretty good stuff and has enabled infrastructure to scale far beyond where the old, blocking architecture could ever touch. However, what’s pretty much universal in these server architectures – and perhaps a well known dirty secret – is that once you get into the actual request processing, the system is back in the same old blocking regime. You can see this trivially, for example, in the Servlet API. When you – the application code – try to read some bytes from the client using the ServletInputStream, you’ll find that if the client hasn’t sent the bits, or the network is slow in getting them, or any number of other reasons, your code will block waiting for those bits to show up. Likewise, if you try to write some output bytes to the client using your ServletOutputStream, you’ll find that your code will be blocked until the OS can buffer those bytes for later output to the network. If those buffers are full, you’ll simply block until they become available, or the client disconnects.
This isn’t any big revelation, but it is surprising to listen to people wonder why we can’t use NIO to take care of these issues so we can have even more scalability in the server. I mean, if the API is blocking – as all java.io.Stream APIs are – it’s kind of baffling to think that someone believes that we can magically make that blocking action go away and somehow eliminate the thread which is holding the execution state of the application code (and server invocation code) up until the point where the blocking read or write occurs.
Well, I’m pleased to say that I believe I’ve discovered a way to magically make the blocking reads and writes “disappear” without having to change any of the JEE Servlet APIs. Well, “discovered” really isn’t the right word, seeing as how the technique isn’t really any new revelation to people “in the know”. However, doing this magic for pure and simple Java, without redefining the existing APIs and without requiring the application code to be rewritten in any way, is something that I think is pretty magical and certainly at least deserves the characterization of a “discovery”. Further, this technique can be applied to JDBC or any other high value blocking API to transparently transform it into a threadless, non-blocking API.
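To give a feel for the shape of that transformation – hand-written here, rather than generated by bytecode rewriting as in the real thing – here is a toy sketch: at the point where the code would block, the method captures its frame in a continuation object and simply returns, freeing the thread; when the I/O completes, the method is re-entered and jumps straight back to where it left off. The `Continuation` and `Echo` classes are purely illustrative and are not Prime Mover’s actual API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A continuation captures what the thread's stack frame used to hold.
final class Continuation {
    int state;        // where to resume in the method
    Object[] locals;  // captured local variables
}

final class Echo {
    // The programmer writes the "blocking" style:
    //   String line = in.readLine();   // would park a thread
    //   out.write("echo: " + line);
    // The rewritten, re-entrant form becomes a little state machine:
    static final Deque<Continuation> parked = new ArrayDeque<>();
    static String result;

    static void run(Continuation k, String arrived) {
        int state = (k == null) ? 0 : k.state;
        switch (state) {
            case 0: {
                // Reached the "blocking read": park a continuation and return.
                Continuation c = new Continuation();
                c.state = 1;                // resume point after the read
                c.locals = new Object[0];   // nothing live across the call here
                parked.push(c);
                return;                     // the thread is now free for other work
            }
            case 1:
                // Resumed by the I/O layer once the bytes actually showed up.
                result = "echo: " + arrived;
        }
    }
}
```

The bytecode rewriting does this mechanically for every method on the call path to the blocking operation, which is why the application code never has to change.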
I’ve been having a great time working on my MMOG framework, 3rd-Space. Since December, I’ve been focusing on the event-driven simulation framework itself, as it is key to the system’s performance and scalability. The work is based on the simulation framework described by Rimon Barr in his thesis, An Efficient, Unifying Approach to Simulation Using Virtual Machines – something he calls “JiST” (Java in Simulation Time). It’s a very cool event-driven simulation framework that uses Java, itself, as the simulation scripting language. It turns Java classes into a simulation by transforming them with a bytecode rewriting framework. The result is a very easy to use, completely type safe simulation framework that blows the doors off the closest competing C++ event simulation frameworks in terms of raw performance and scalability. The performance is so good because the framework uses the Java virtual machine itself as the engine for running the simulation.
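The core idea – method invocations on simulation entities become events ordered by simulation time, and the clock jumps from event to event instead of ticking in real time – can be shown with a toy kernel. This is just the scheduling skeleton with names invented for the example; JiST does all of this transparently via bytecode rewriting rather than explicit `schedule` calls:

```java
import java.util.PriorityQueue;

// Toy "simulation time" kernel in the JiST spirit: pending method calls
// are events in a priority queue ordered by simulation time.
final class SimKernel {
    record Event(long time, Runnable action) {}

    private final PriorityQueue<Event> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a.time(), b.time()));
    long now = 0;  // current simulation time

    // Invoke `action` `delay` simulation-time units from now.
    void schedule(long delay, Runnable action) {
        queue.add(new Event(now + delay, action));
    }

    // Drain events in time order; the clock jumps, it never ticks.
    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time();   // time advances only when an event fires
            e.action().run(); // events may schedule further events
        }
    }
}
```

Because no wall-clock time is spent between events, the whole simulation runs as fast as the JVM can dispatch the event actions – which is where the raw performance comes from.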
So, as many of you may already know (and if you don’t, then you’re way out of date, dude), Oracle acquired the Tangosol company and their Coherence product some time ago. Over the last half year, I’ve been working on some very cool stuff related to the whole data grid aspect of Coherence which I must say is pretty much the most bitchin’ stuff I’ve done in quite a while.
Well, that was until I started my hobby project.