Programming is fun. That’s why I decided it would be a good career for me. I love solving puzzles, and I love the concept of making computers, these mysterious boxes of my childhood, do things. However, when you finally get to working as a professional programmer, the types of projects you get to work on are often not as interesting as you’d like. This is why I am interested in AI. AI presents new and unique challenges, and techniques of approaching these challenges. I said in a previous post that AI was more commonplace than you’d think, but didn’t really follow it up with suggestions of where you can work on AI related programs.
I’ve recently stumbled on an interesting AI competition called the Mario AI Competition 2009 (before it got big with the video – I’ve been writing this post for about a month). It’s based on an open source Java project called Infinite Mario. The competition is to create an AI agent that can navigate randomly generated Mario levels, with the agents pitted against each other. The first round of this competition has already closed; however, there is another round that closes September 3.
I won’t be attending the conferences nor will I be submitting agents to the competition. I will however work on a few agents, as this is probably the most fun application of AI that I’ve seen in a while, and it’s a small enough environment to enable me to get very familiar with it very quickly, and hence get into making solutions very quickly.
On the surface, this seems a trivial, niche problem. An agent that can navigate a video game level doesn’t really have a lot of diverse applications. However, it’s a great way to begin practicing some fundamental concepts of AI. The parameters of the competition don’t limit the techniques you can use to create the agent. In the video below, Robin Baumgarten is using A* path finding techniques.
However, A* is just the start, and is a fairly basic approach (though it works pretty well by the demos), and there are many more options, including machine learning and neural networks. Unfortunately, I won’t be covering any of that yet. The goal of this post is to get myself, and anyone else interested, up to speed on the environment, and the basics of creating an agent. I will be creating a very simple reactive agent that has no planning, and is quite flawed. However, you should be able to walk away with a firm grasp of how the environment works and what tools you have at your disposal.
Before I go on, a disclaimer is necessary: I haven’t worked with Java since my university days, so about 6 years or so. I will post some code, and that code will probably suck. You have been warned.
The Game Environment
The details on the code available from the competition website are fairly vague, so I’m posting this to perhaps help some people who are interested in getting into it, and also for my reference to help while working on it. Firstly, obviously, you’re going to have to download the code available on the competition website. The code is relatively stable at this point, however it was constantly being updated with fixes for bugs posted on the Google Group for the competition. These changes would be posted on the group, so I’d check it regularly, or grab it from Google Code with svn, and just update your repo regularly. Once you’ve downloaded it, open it in your IDE/editor of choice, and roll up your sleeves.
Firstly, I’m going to reference classes by their fully qualified class names, e.g. ch.idsia.scenarios.Play, which references Play.java located in the src/ch/idsia/scenarios folder. This way you can play along at home.
According to the online documentation, you can fire up a run by typing
java ch.idsia.scenarios.Play
from within the “classes” directory. This will start a copy of the Infinite Mario game with a human player. To start with a pre-written agent instead, pass the agent’s fully qualified class name as an argument to the same command; one of the bundled agents, for example, will run forward and jump as soon as it can.
The ch.idsia.scenarios.Play class is a simple class that sets up the agent, whether it’s human controlled or automated, and then it sets some play environment variables and runs the application. If an argument is supplied, it will attempt to instantiate an instance of the class and pass it through to the play environment.
I won’t be covering the details of the simulation code, as it’s all pretty logically separated and you don’t really need to know how it works. Information is passed to the agent through an implementation of the interface ch.idsia.mario.environments.Environment, which is the argument to the agent’s getAction function. It provides the agent with a serialised view of the terrain and the location of enemies, along with information about Mario. Basically, any information you can get by looking at the scene.
This information is provided in the form of a byte matrix, through several functions. observation.getLevelSceneObservation() will return a matrix with details of the terrain. It will also tell you what the type of the terrain is, such as a pipe or a one way platform. This detail is found in ch.idsia.mario.engine.LevelScene. You can see the number codes for the different types of terrain. It’s broken down into gaps, hard borders, half borders and flower pots.
The observation.getEnemiesObservation() is very similar to observation.getLevelSceneObservation(), except it addresses enemy locations. Again, you’ll also get the type of the enemy from the matrix in the form of an integer code, which you can find in ch.idsia.mario.engine.sprites.Sprite.
I’ve written a debugging function that will output both the enemy and terrain matrices to give you a code view of what’s going on. It’s interesting and enlightening to watch the code view of the simulation next to the simulation itself, and it will help us write code to fit our design. The function is as follows:
byte[][] levelState = observation.getLevelSceneObservation();
byte[][] enemyState = observation.getEnemiesObservation();
//Debug information
for (int y = 0; y < 22; y++)
{
for (int x = 0; x < 22; x++)
{
//Check for enemies first
if (enemyState[y][x] > 0)
{
String enemyValue = Integer.toString(enemyState[y][x]);
int padding = 3 - enemyValue.length();
if (enemyState[y][x] == 1)
System.out.print("mmm");
else if (padding == 2)
System.out.print(" " + enemyValue + " ");
else if (padding == 1)
System.out.print(" " + enemyValue);
else
System.out.print(enemyValue);
}
else
{
//Check for terrain next
String terrainValue = Integer.toString(levelState[y][x]);
int padding = 3 - terrainValue.length();
if (padding == 2)
System.out.print(" " + terrainValue + " ");
else if (padding == 1)
System.out.print(" " + terrainValue);
else
System.out.print(terrainValue);
}
System.out.print("|");
}
System.out.print("\n");
}
System.out.print("---------------\n");
I put the above code into a function in the agent itself, and called the function from the getAction function of the agent, purely because I know this is going to get called once per tick. You can put it wherever you want to put it.
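If the padding logic above feels fiddly, the same fixed-width dump can be sketched as a standalone helper. This is only an illustration: the names ObservationDump, cell and render are made up for this sketch, it right-aligns values with String.format rather than centring them, and it works on plain byte matrices so it can run without the competition code.

```java
final class ObservationDump {
    // Formats one cell as a fixed-width, 3-character string.
    // An enemy code greater than 0 takes priority over the terrain code,
    // mirroring the order of checks in the debug code above.
    static String cell(byte enemy, byte terrain) {
        if (enemy == 1) {
            return "mmm"; // same marker the debug code above prints for code 1
        }
        int value = enemy > 0 ? enemy : terrain;
        return String.format("%3d", value);
    }

    // Renders the whole view, one text row per matrix row, cells separated by '|'.
    static String render(byte[][] level, byte[][] enemies) {
        StringBuilder out = new StringBuilder();
        for (int y = 0; y < level.length; y++) {
            for (int x = 0; x < level[y].length; x++) {
                out.append(cell(enemies[y][x], level[y][x])).append('|');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Inside an agent you would pass it the two matrices returned by observation.getLevelSceneObservation() and observation.getEnemiesObservation(), and print the result once per tick.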
Creating a Dumb Agent
As I mentioned at the start of this post, I’m just going to start with creating a very simple, dumb agent that will attempt to traverse the level, and even then not get it right all the time for reasons I will cover. On the surface, Mario’s tasks for navigating a stage without enemies is pretty straightforward. All he needs to do is to get himself around the level and jump over pits.
However, sometimes Mario can find himself falling into a gap, or over-jumping his mark and jumping into a hole. So basically, we need to check if there’s something in Mario’s path that needs to be jumped over, or if there’s a gap that we need to jump over. But, we can’t just jump – we also need to check if jumping is going to put us into a gap and kill Mario, and if Mario runs off of wherever he is, he might fall into a hole.
This would be fairly straight forward if we had a robust method of predicting where Mario’s jumps will take him, but unfortunately we don’t officially get this information.
There is a movement algorithm in ch.idsia.mario.engine.sprites.Mario, starting at the move method. However, using it is not in the spirit of the competition and, more importantly, it’s very hard to figure out, for me at least. I’ve not been able to reliably adapt the movement code into a predictor of Mario’s path given how long each key is held.
Hence, at this stage, we’ll just be making Mario go right and try and jump over obstacles. Seems fairly simple.
You’ll see in the provided agents that the function getAction is the heart of the agent. It takes an instance of the Environment interface discussed before, and it is the workhorse of your agent, as its return value tells the simulation what the agent is going to do.
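To make the shape of getAction concrete, here is a minimal sketch of an agent that runs right and jumps whenever it can. It is not the competition’s code: Observation is a hypothetical, cut-down stand-in for the Environment interface, ForwardJumper is a made-up name, and the key indices are assumptions for illustration (in the real project you would use Mario.KEY_RIGHT and Mario.KEY_JUMP).

```java
// Hypothetical stand-in for the one Environment method this sketch needs.
interface Observation {
    boolean mayMarioJump();
}

final class ForwardJumper {
    // Assumed key layout for this sketch only; use the Mario.KEY_* constants in a real agent.
    static final int KEY_RIGHT = 1;
    static final int KEY_JUMP = 3;
    static final int KEY_COUNT = 5;

    // Called once per tick; the returned array says which keys are held down.
    boolean[] getAction(Observation observation) {
        boolean[] action = new boolean[KEY_COUNT];
        action[KEY_RIGHT] = true;                      // always head right
        action[KEY_JUMP] = observation.mayMarioJump(); // press jump only when a jump can start
        return action;
    }
}
```

The agent is purely reactive: it keeps no state between ticks and never looks at the terrain, which is roughly the starting point the rest of this post builds on.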
Although the observation.getLevelSceneObservation() and observation.getEnemiesObservation() methods return the full view of the level, all the features are jumbled together. Later on, we’ll need to know what’s a cannon and what’s a pipe, and what’s raised ground, and maybe even what’s a block we can punch. We’ll need to know this because cannons shoot Bullet Bills and enemy flowers live in pipes, and these can appear at any time, and if Mario is near by, he’s going to get hurt.
I’ve created the following function to categorize the environment into pipes, cannons and pits:
private boolean[] pits;
private Vector cannons;
private Vector pipes;
private void processTerrain(byte[][] state)
{
//Init features
pits = new boolean[22];
cannons = new Vector();
pipes = new Vector();
//Init pits
for (int i = 0; i < 22; i++)
{
pits[i] = true;
}
for (int y = 21; y >= 0; y--)
{
for (int x = 0; x < 22; x++)
{
//Look for gaps
pits[x] = pits[x] && state[y][x] == 0;
//Look for cannons and pipes
if (state[y][x] == 20)
{
boolean pipeFound = false;
//Check if x,y is in pipes
int pipeLength = pipes.size();
for (int pi = 0; pi < pipeLength; pi++)
{
int[] coords = (int[])pipes.elementAt(pi);
if (coords != null && coords[0] == x && coords[1] == y)
{
//Pipe found, skip it
pipeFound = true;
}
}
if (!pipeFound)
{
//Peek ahead to determine if it's a pipe or a cannon
if(x < 21 && state[y][x + 1] == 20)
{
//It's a pipe: record the right-hand tile so it is skipped on the next iteration
pipes.add(new int[]{x + 1, y});
}
else
{
//It's a cannon
cannons.add(new int[]{x, y});
}
}
}
}
}
}
The three variables are defined as private members of the agent, and calling this function will populate them with tile locations in the case of pipes and cannons, and tile columns in the case of pits. You don’t have to do this with such a simple agent – you can simply use similar techniques to parse over the whole level state.
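The pit detection on its own is small enough to sketch as a standalone method. This is an illustration under the same assumption the function above makes, namely that a terrain code of 0 means an empty tile; PitScan and findPitColumns are names invented for the sketch, and it sizes itself from the matrix instead of hard-coding 22.

```java
final class PitScan {
    // A column counts as a pit only if every tile in it is empty (code 0).
    static boolean[] findPitColumns(byte[][] state) {
        int width = state[0].length;
        boolean[] pits = new boolean[width];
        java.util.Arrays.fill(pits, true); // assume a pit until a solid tile is seen
        for (byte[] row : state) {
            for (int x = 0; x < width; x++) {
                pits[x] = pits[x] && row[x] == 0;
            }
        }
        return pits;
    }
}
```

Because a single non-zero tile anywhere in the column clears the flag, a floating platform above a hole also hides the pit, which is worth keeping in mind when reading the agent’s behaviour.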
I use this information to determine how far away obstacles are. At first I was going to make the distance between obstacles affect the jumping distance and height, however without an accurate way to determine where jumps are going to land, this can actually cause more harm than good. So I ignore it, and just use a fairly blunt approach.
Firstly, another helper function:
private int setDeltaForObstacles(Vector tiles)
{
int length = tiles.size();
int buffer = 0;
int localDeltaX = 0;
//Loop through every tile
for (int counter = 0; counter < length; counter++)
{
//Get coords of tile
int[] coords = (int[])tiles.elementAt(counter);
//Calculate the delta, then check to see if it's good enough
localDeltaX = coords[0] - 11;
if (action[Mario.KEY_RIGHT] && coords[0] > 11)
{
//If the tile is after mario, and we're heading right, delta is good
break;
}
else if (action[Mario.KEY_LEFT] && coords[0] < 11)
{
//If the tile is before mario and we're heading left, add to buffer
if (coords[0] > buffer)
buffer = coords[0];
}
else if (action[Mario.KEY_LEFT])
{
//If the tile is after mario, but we're heading left, get the last tile loc from the buffer
localDeltaX = buffer - 11;
break;
}
}
return localDeltaX;
}
This function takes a Vector and will return the distance to the closest tile to Mario, depending on his direction. It’s a very coarse way to determine the closest obstacle to Mario. Bear in mind that it doesn’t recognise when obstacles are put close together, and thus doesn’t optimise Mario’s jumping.
The goal of the agent is to jump over the closest obstacle. That’s it. Considering our obstacles are now in three different data structures (cannons in one, pipes in another, and pits and raised terrain in the original byte array), we need to store the delta (distance between Mario and the closest obstacle) and update it only when we’ve found a smaller delta. Also, I want to keep the delta of any pits separate, as I’m going to have different rules for pits than other obstacles (because pits are the only thing that can kill Mario at this stage).
To this end, I create two integer variables at the top of the getAction method. They are initialized to 11 because this is where Mario is:
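For comparison, here is a sketch of the same idea written as a plain minimum search. It is an alternative under stated assumptions, not a drop-in replacement: NearestObstacle and distanceAhead are invented names, the direction is passed in as a boolean instead of being read from the action array, and it does not rely on the tiles being stored in any particular order.

```java
final class NearestObstacle {
    static final int MARIO_X = 11; // the column this post treats as Mario's position

    // Returns the number of columns between Mario and the nearest tile in the
    // direction of travel, or -1 if no tile lies in that direction.
    // Each tile is an {x, y} pair, as in the pipes and cannons collections.
    static int distanceAhead(java.util.List<int[]> tiles, boolean headingRight) {
        int best = -1;
        for (int[] tile : tiles) {
            int distance = headingRight ? tile[0] - MARIO_X : MARIO_X - tile[0];
            if (distance > 0 && (best == -1 || distance < best)) {
                best = distance;
            }
        }
        return best;
    }
}
```

Returning -1 for "nothing ahead" keeps that case distinct from a real distance, so the caller does not need a separate size check on the collection.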
int deltaX = 11;
int deltaJump = 11;
Here is the code to extract the delta of the nearest obstacle, and the delta of the nearest pit. Note that levelState is the results from a call to observation.getLevelSceneObservation:
//Analyse pipes
int pipeDelta = setDeltaForObstacles(pipes);
if (pipeDelta < deltaX && pipeDelta > 0)
deltaX = pipeDelta;
//Analyse cannons
int cannonDelta = setDeltaForObstacles(cannons);
if (cannonDelta < deltaX && cannonDelta > 0)
deltaX = cannonDelta;
//Analyse the gaps
boolean gap = false;
for(int y = 0; y < 22; y++)
{
if (pits[y])
{
if (action[Mario.KEY_RIGHT] && y > 11)
{
int localDelta = y - 11;
if (localDelta < deltaJump)
deltaJump = localDelta;
}
else if (action[Mario.KEY_LEFT] && y < 11)
{
int localDelta = 11 - y;
if (localDelta < deltaJump)
deltaJump = localDelta;
}
gap = true;
}
}
//Analyse ground above Mario
boolean raisedGround = false;
if (action[Mario.KEY_RIGHT])
{
for (int x = 12; x < 22; x++)
{
for (int y = 12; y >= 0; y--)
{
if (levelState[y][x] == -10)
{
int localDelta = x - 11;
if (localDelta < deltaX)
deltaX = localDelta;
raisedGround = true;
break;
}
}
if (raisedGround)
break;
}
}
Once we have the deltas, we can use this information to decide what to do. I mentioned above treating obstacle deltas differently from jump deltas. This is an attempt to make Mario take smaller jumps when he can see a pit is after an obstacle. This is controlled by two instance members that keep track of jump height and jump length, called jumpHeight and jumpLength respectively. This is very hit and miss. Notice that Mario does not respond to anything more than 4 tiles away, or to tiles that are behind him. This is to prevent him from going jump crazy:
jumpHeight = 7;
jumpLength = 14;// * (deltaX / 11);
if (((pipes.size() > 0 || cannons.size() > 0 || raisedGround) && deltaX >= 0 && deltaX < 5))
{
if (observation.mayMarioJump() && !action[Mario.KEY_JUMP])
action[Mario.KEY_JUMP] = true;
}
//If mario is jumping, there must be an obstacle in the way
if (action[Mario.KEY_JUMP])
{
if (gap && deltaJump >= 3)
{
jumpHeight = 7;
jumpLength = 1;
}
}
//No obstacles, but there is a gap
else if (gap && deltaJump >= 0 && deltaJump <= 3)
{
if (observation.mayMarioJump() && !action[Mario.KEY_JUMP])
action[Mario.KEY_JUMP] = true;
}
maintainJump();
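Stripped of the bookkeeping, the decision about when to start a jump can be sketched as a pure function. The thresholds (obstacles within 4 tiles, gaps within 3) are taken from the code above; JumpRule, shouldStartJump and the parameter names are made up for this sketch, and it leaves out the jumpHeight and jumpLength adjustments.

```java
final class JumpRule {
    // Decides whether to press jump this tick, given the deltas computed earlier.
    // mayJump corresponds to observation.mayMarioJump().
    static boolean shouldStartJump(boolean obstacleAhead, int deltaX,
                                   boolean gapAhead, int deltaJump,
                                   boolean mayJump) {
        if (!mayJump) {
            return false; // Mario is not able to start a jump right now
        }
        boolean obstacleClose = obstacleAhead && deltaX >= 0 && deltaX < 5;
        boolean gapClose = gapAhead && deltaJump >= 0 && deltaJump <= 3;
        return obstacleClose || gapClose;
    }
}
```

Writing the rule this way makes it easy to check edge cases by hand, such as an obstacle exactly 5 tiles away, without running the simulation.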
Finally, the maintainJump method is to ensure we’re holding down the jump key for the optimal amount of time. It is also coded to provide rudimentary control over jump height and length, however this isn’t being taken advantage of.
private void maintainJump()
{
if(action[Mario.KEY_JUMP] && (jumpCount > jumpHeight))
{
action[Mario.KEY_JUMP] = false;
jumpCount = 0;
}
// otherwise you're in the middle of jump, increment jumpCount
else if (action[Mario.KEY_JUMP])
{
jumpCount++;
}
if (jumping)
{
if (action[Mario.KEY_RIGHT] && (rightCount > jumpLength))
{
action[Mario.KEY_RIGHT] = false;
rightCount = 0;
}
else if (action[Mario.KEY_RIGHT])
{
rightCount++;
}
}
}
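The hold-then-release idea behind the jump counter can be sketched in isolation as a small class. This is an illustration, not the agent’s code: JumpTimer, tick and maxHoldTicks are invented names, and the sketch only covers the jump key, not the matching counter for the right key.

```java
final class JumpTimer {
    private final int maxHoldTicks;
    private int heldTicks = 0;

    JumpTimer(int maxHoldTicks) {
        this.maxHoldTicks = maxHoldTicks;
    }

    // Call once per tick with whether the agent wants jump held.
    // Returns the state that should actually be sent for the jump key.
    boolean tick(boolean jumpHeld) {
        if (!jumpHeld) {
            return false;
        }
        if (heldTicks > maxHoldTicks) {
            heldTicks = 0; // release the key and start counting afresh
            return false;
        }
        heldTicks++;
        return true;
    }
}
```

Because the comparison is strictly greater than, the key stays down for maxHoldTicks + 1 ticks before it is released for one tick, the same off-by-one that the counter-based approach has.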
The complete agent is at the bottom of this post for you to download and include in your project. Simply put it in the path src/timgittos, configure the ch.idsia.scenarios.Play class to have no enemies, with options.setPauseWorld(true), and run any level you want (passable result with difficulty 3). Then build the project and run it with java ch.idsia.scenarios.Play timgittos.BlogAgent.
As you will observe, the agent will make Mario go towards the right as fast as possible, jumping over obstacles as it meets them. For the most part, it works well enough. However, there are quite a few fatal flaws, and I will explain why it fails in this way.
Most obviously, the agent frequently falls down gaps. It doesn’t fall down all gaps, which indicates that it’s not a failure in the gap detection, but rather something else. In fact, what’s going on is that the gaps that Mario tends to fall down are those that enter his field of consideration (4 tiles ahead of him) while he’s in the air. The agent makes no effort to check whether its jump will land Mario in a pit (since it can’t predict where Mario will land), and my previous efforts at trying to make the agent force Mario to the left when it detects he’s going to go down a pit have failed.
Mario will also fall down pits if there are multiple pits in a row. Again, this is because the agent doesn’t exert much control over the height and length of Mario’s jumps, and so can’t pinpoint that it wants to land between the two pits, so it tends to overshoot the jump and fall down the second pit.
Most annoyingly, the agent will sometimes get stuck. If it reaches an obstacle, such as a cannon or a bit of wall or a pipe and jumps at the wrong angle, it will enter a weird wall jump loop, where it jumps, gets stuck on the obstacle, jumps off, jumps again, and gets stuck again. This will loop until there is no time. This is a side effect of the environment allowing wall jumps, which is cool when you can get the agent to make Mario jump out of a pit he’s falling into, but annoying when he gets stuck.
In any case, this agent is far from complete, and just serves to demonstrate how the simulation environment works, and how to interact with it. Once you’re clear how to work with the simulation, you can concentrate on building a useful agent, either by starting from scratch (like you would if you were going to use a machine learning approach), or by building on top of this base (for a deterministic approach like A*).
Firstly, a disclaimer: I don’t have a lot of experience working with AJAX enabled WCF services, and from reading some of Rick Strahl’s posts on using JQuery with ASP.NET, I’m doing things in a really hackish and terrible manner. Hopefully my mistakes won’t impact the usefulness of this short tip.
When working with JQuery and AJAX enabled WCF services, I recently encountered the following error, through Firebug and Fiddler:
The maximum string content length quota (8192) has been exceeded while reading XML data.
This quota may be increased by changing the
MaxStringContentLength property on the XmlDictionaryReaderQuotas
object used when creating the XML reader.
This is caused by the WCF webHttpBinding defaults, which place fairly draconian limits on XML message length and depth. This thread on MSDN helped diagnose the problem, however it didn’t do much to help me solve my particular problem. Ultimately, I stumbled upon the solution quite accidentally.
Firstly, as per the MSDN thread, you need to add the custom binding configuration underneath the configuration section:
At first I thought this was a custom binding, and tried to replace the service endpoint binding with the custom binding defined above, however this was causing a “System.ServiceModel.ServiceActivationException” exception.
The above binding declaration is actually a configuration for the webHttpBinding. As such, you need to add the BindingConfiguration property onto your service endpoint, along side your binding declaration. This is as follows:
With Snow Leopard being released at the end of September this year, Tiger is definitely starting to show its age. I’ve been putting off upgrading to Leopard for quite some time, not being able to justify the down time associated with upgrading an OS. This has had the unpleasant side effect of not being able to do things quite as easily as I want, and finding information is hard.
Due to the recommendations in the comments of my Ruby CMS article, I have decided to start playing with the Camping Ruby microframework backed by CouchDB. Getting Camping up and running was easy enough, and I managed to get some trivial functionality going. Camping uses SQLite for its database, which is fine if I wanted to massage my data into a relational form. However, I have models with optional attributes, and rather than making nullable database types, I want to leverage CouchDB’s schema-less, RESTful document storage. Each persisted object in my application will be available on a unique url, so CouchDB is a nice fit for what I want to do.
The Great Installation
I found this great article about getting Camping talking to CouchDB, and it seemed easy enough, so I decided to try it. Since I already had Camping working, all I needed was to get CouchDB installed on my Macbook. As much as I’m not scared of compiling applications from source, I’ve had issues in the past with OS X and compiling, so I decide to find a simple package or disk image to install. I found CouchDBX, however checking the requirements showed that it supports Leopard only.
With a little more research, I found out I had to use Macports to install CouchDB. Macports relies on XCode Tools which come available on the DVDs that came with OS X. If you haven’t previously installed XCode Tools, then you should install them before you install Macports. Something that is buried in a support ticket, however, is that Macports targets XCode tools 2.5. My XCode tools was something like 2.4.8. While I did manage to get Macports installed, the installation for CouchDB fails when it tries to build tk. If you, like me, have an older XCode Tools than 2.5, you’ll need to download 2.5, which you can find deep inside the Apple website. This does require an Apple Developer Connection account to get, so if you don’t have one, you’ll need to create one.
Installing CouchDB is as simple as issuing
sudo port install couchdb
and Macports will take care of the rest. CouchDB lists its dependencies as Spidermonkey, Erlang and a few others, however each of those has dependencies, and each of those dependencies has dependencies, and it ends up taking quite a long time and downloading a fair few packages. The upside is that if you didn’t have Erlang previously installed, you do now. Erlang is something I’ve been wanting to take a look at.
Start The Server Before You Leap
While the article I linked to above about getting Camping and CouchDB talking is good, it’s not exactly the most verbose explanation especially if you haven’t read the documentation for CouchDB and prefer to just dive in. For those who are still a little unsure like I was about how exactly to do this, I’ll outline my approach below.
Before you get started writing any code at all, you need to start your CouchDB server. This is something that escaped me for the longest time, and when I finally twigged, it gave me a bit of grief. To start CouchDB, issue the following command:
sudo couchdb
If you try this without escalating privilege, it will return an error “{“init terminating in do_boot”,{{badmatch,{error,shutdown}},[{couch_server_sup,start_server,1},{erl_eval,do_apply,5},{erl_eval,exprs,5}, {init,start_it,1},{init,start_em,1}]}}”, which searching for will bring you to the CouchDB wiki page for error messages. This page will tell you the problem is an unavailable port. For me, at least, this is not true. CouchDB runs on port 5984 which as far as I know is not used for any system services or other software. The reason I was getting this error was that I wasn’t running it with enough privilege.
Once you get CouchDB started, you’ll notice it blocks the terminal. I don’t like to have too many windows open at once, so I’d prefer to have it run as a background process. Fortunately CouchDB will let you do that with a command line switch option, along with changing the location of the pid file. I’ll leave figuring out how you want to start it up to you.
Once you have CouchDB started, you’ll have access to a web-based administration panel called Futon. Futon is available to you at http://localhost:5984/_utils/, assuming you’re running CouchDB on your local host. The Futon utility will allow you to create databases and insert documents in preparation to test your connectivity with Camping.
Camping Time
There were two Ruby gems I had considered when deciding how to connect Ruby to CouchDB. I should point out that due to the RESTful nature of CouchDB, you strictly don’t need any Ruby gems and could roll your own fairly easily, if you wanted to. I didn’t want that level of control, personally. The two gems I considered were CouchRest and RelaxDB. CouchRest is a simple Ruby wrapper around the CouchDB REST API, whereas RelaxDB is more abstracted, bringing ActiveRecord-like functionality to CouchDB. While before I would have chosen RelaxDB, I’m intentionally trying to work a little closer to the raw APIs here, so I chose CouchRest.
You can install CouchRest through RubyGems:
sudo gem install couchrest
however I found that the gem version wouldn’t work and kept throwing errors when I started my Camping application. To overcome this, I just cloned it from the GitHub repository and placed it in my Camping app directory and referenced it that way.
The first thing to do would be to create a normal Camping sample application, if you haven’t done already. For the purposes of this post, I’m going to use the “skeletal Camping blog” application used in the documentation, and edit that to work with CouchDB. I figure this will give everyone a relatively common base from which to start. Import CouchRest just below where the application imports Camping:
require 'camping'
require 'couchrest' # or 'couchrest/couchrest' or similar if you've cloned from Github
Given that the purpose of this article is to demonstrate connecting to CouchDB, and not the design of a framework around it, we can safely ignore defining a model for now, and rely on CouchDB as our model. So if you’re following along from the example application, you can safely delete the classes from Blog::Models module. We’re not going to delete the whole module, because of the create method.
The Camping create method is run when the server starts up your Camping app. From the documentation, “This is a good place to check for database tables and create those tables to save users of your application from needing to manually set them up.” Instead, we’re going to use this method to set up CouchRest in our application:
module Blog::Models
def Blog.create
db_url = 'http://localhost:5984/'
storage = CouchRest.database("#{db_url}blog")
Blog::Controllers::Index.set_storage(storage)
end
end
Next, because we’re now calling a new method on the controller class, we need to modify that. Change the Blog::Controllers::Index class to the following:
module Blog::Controllers
class Index < R '/(\w+)'
def Index.set_storage(storage)
@@storage = storage
end
def get(id)
@post = @@storage.get('posts/' + id)
render :index
end
end
end
What I’ve done here is given the controller class a static reference to my CouchDB database (from the create method). Then the controller was changed to respond to a regex, which is passed into the get method. We use the id passed in to retrieve a document from CouchDB, which we assign as an instance variable.
Lastly, we need to modify the Index function view to pull data from the CouchDB database:
def index
h1 @post[:title]
end
This will output the title of the test document, which we create in Futon in the next step.
If you haven’t already, start up Futon and create a database for the purposes of this demo. Call the database “blog”, and fill it with a test document with the id of “posts/test”, and at least a “title” property with the value “Test Post”, and any other properties you wish.
Start the Camping application with camping blog.rb, and point a browser to http://localhost:3301/test, and you should see “Test Post” in a header tag, rendered out to the browser, which means that Camping has successfully communicated with CouchDB.
I didn’t cover putting data into CouchDB, however you would do it in a similar vein, writing a post method inside your routes to create data and save it to CouchDB. From here, personally, I’m going to create some wrapper classes for requests and models and build up something which is a little DRYer, however as far as a demo, this is good enough.
I’ve got a few posts planned about AI and HCI, however these require a lot of time to research and write code for, time that I don’t have given I’m flying to the US on the 21st of April. So I apologise to people who saw my last AI article on Hacker News and Reddit and subscribed to my feed expecting more of the same. There will be more, I promise. I just need more time. For now, just some musings.
I’ve been thinking a lot lately about solving problems. I read a number of programming aggregators and startup blogs, and the term “solve a problem” keeps nagging at me. To be successful, at programming and business, you need to solve a problem. I’d love to eventually run a development company, but I don’t know what problem to solve. So, I consider the problem my current employer is solving.
I currently work for a web development company that is doing fairly well. We have a suite of products that we sell to our clients, and sell services related to those products. They’re doing well, so obviously, they must be solving a problem. So, from the client’s point of view, the problem we solve is that of letting people who don’t know anything about websites build and run complex websites.
We have a CMS product that lets non-web-geeks manage websites. This includes writing new pages and editing content on existing pages, publishing news items, distributing these to an email list, managing uploaded files and inserting modules of dynamic content to their heart’s desire. The modules we offer include contact forms, random image rotators, resume upload controls, auto-generating bread crumbs and menus, and so on. These modules aim to reduce the complexity of managing dynamic sites, so that ordinary people can do it.
On top of that, we do custom applications built on top of that CMS site. Clients have a site they run themselves with the CMS, but some also have dedicated applications with a separate interface. These are separate because they’re custom jobs. They provide functionality that is not available in the CMS (yet), or functionality that would be too complex to maintain using the CMS. So again, we’re reducing the complexity of running a custom application.
So, at first glance, we build websites and applications that non-geeks can use. But looking further, we manage complexity. We reduce the complexity for the client, so that the client can do difficult things with ease, and doesn’t have to know how it’s done, only that they can do it.
On the other hand, good programming is about managing complexity. That is, trying to keep your code as free from complexity as you can get away with while still doing the job. Complex code is buggy and unmaintainable.
Complexity exists, and it is immutable. Once you’ve simplified a task to the point that the only steps involved are those that accomplish that task, and that task cannot be accomplished without those steps, you have your complexity, and it isn’t going anywhere. The task cannot be made more simple, because to do that we’d have to leave out steps, which would not accomplish the task.
What we do from then on is delegate complexity, shift it around and make it other people’s problem. Take for example, the task of creating and publishing a blog post. Our software has tagging, categories, post titles and post content. Creating a post involves writing a post, giving it a title, adding tags, selecting categories and publishing it.
Breaking it down, we need to create a body, create a title, create all categories, create all tags, link the tags to the post, link the post to the categories and publish it. Each tag has a display name and a URL slug, and each category has a display name and a URL slug. URL slugs cannot contain certain characters, however their display names can.
Now, we can delegate this complexity to the user, which will result in simple code. The user will be responsible for creating tags and categories, and giving them URL slugs that fit our rules, while the application will just take the data and store it, and retrieve it when asked. The user will be responsible for assigning tags to the post and the categories, and the application just stores it. This is obviously an unacceptable level of complexity for the user.
We could also delegate all the complexity to the application, making the user experience simple. The user just types in the tags they want, picks the categories, and the application will link the tags if they exist, create them if they don’t, automatically strip unwanted characters from URL slugs and hide a lot of the complexity. In more complicated cases, this will result in very complex code that will be bug prone.
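To make that second option concrete, here is a minimal sketch of the application taking on the tag complexity. It is not code from our CMS: the names `slugify` and `resolveTags`, the slug rule (lowercase letters, digits and hyphens only) and the in-memory `Map` standing in for a database table are all assumptions made for illustration.

```javascript
// Turn a display name into a URL slug: lowercase it, then collapse every
// run of characters other than a-z and 0-9 into a single hyphen.
function slugify(displayName) {
  return displayName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
}

// Link typed tag names to existing tags, creating any that are missing.
// `existingTags` is a hypothetical Map of slug -> tag record.
function resolveTags(typedNames, existingTags) {
  return typedNames
    .map(name => name.trim())
    .filter(name => name.length > 0)
    .map(name => {
      const slug = slugify(name);
      if (!existingTags.has(slug)) {
        existingTags.set(slug, { name, slug });
      }
      return existingTags.get(slug);
    });
}

const tags = new Map([['ai', { name: 'AI', slug: 'ai' }]]);
const linked = resolveTags(['AI', ' C# & .NET ', ''], tags);
console.log(linked.map(t => t.slug)); // [ 'ai', 'c-net' ]
```

Even this small version has edge cases it ignores, such as a name made up entirely of punctuation producing an empty slug. That is exactly the kind of complexity that piles up in the code once the user is no longer carrying it.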
So the question is, where do we draw the line? What balance of code complexity and user interface complexity is successful? I guess you could say “make it 50/50” and think that’s good enough, but I don’t agree. I think the balance is very contextual, and often leans in favour of code complexity. In simple examples like the one above, taking on all the complexity of the task in the code is a trivial thing and easily done. For nuclear reactor control software, things might not be so straightforward.
This is the key to a successful software product/business, I think: finding the right balance of complexity. If you put too much complexity on the user, users won’t be able to use it, and your software won’t sell, and the whole thing will bomb. If you put too much complexity on the software, it will be buggy and difficult to maintain, and will eat up development time and money, and possibly even end up costing more than it earns, and then it will bomb. That’s not even including the more personal aspects of programmer job satisfaction and sanity.
Success is finding the sweet spot when balancing complexity.
Updated on May 28, 2009, due to partial fix. The fix should now be a complete fix.
–
Apologies for the somewhat confusing title.
I’ve been doing a lot of jQuery work, replacing all the AJAX.NET crap in my project at work to speed the interface up. I’ve started using the awesome jQuery Context Menu plugin. However, I found it was also messing around with jsTree which I am using on the same page.
The problem is that the jQuery Context Menu overzealously unbinds all click events from the document. This means that it’s going to break all your other JavaScript that relies on clicking. Checking the documentation for jQuery’s unbind function, you can see that it accepts a function as a second parameter. When you pass a function in, only that function is unbound, and every other handler is left alone.
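To see why that second parameter matters, here is a small, jQuery-free sketch. `TinyEvents` is a hypothetical stand-in for an event registry with bind/unbind semantics like the ones described above; it is not jQuery’s implementation, and `treeClick` and `menuClick` are made-up handlers playing the parts of jsTree and the context menu.

```javascript
// Hypothetical miniature event registry, for illustration only.
class TinyEvents {
  constructor() { this.handlers = {}; }
  bind(type, fn) {
    (this.handlers[type] = this.handlers[type] || []).push(fn);
    return this;
  }
  // With no fn, drop every handler of that type; with fn, drop only that one.
  unbind(type, fn) {
    if (!fn) { delete this.handlers[type]; return this; }
    this.handlers[type] = (this.handlers[type] || []).filter(h => h !== fn);
    return this;
  }
  trigger(type) {
    (this.handlers[type] || []).slice().forEach(h => h());
    return this;
  }
}

function demo(unbindEverything) {
  const doc = new TinyEvents();
  const log = [];
  function treeClick() { log.push('tree'); }
  function menuClick() {
    log.push('menu');
    if (unbindEverything) doc.unbind('click');   // the overzealous version
    else doc.unbind('click', menuClick);         // removes only itself
  }
  doc.bind('click', treeClick).bind('click', menuClick);
  doc.trigger('click'); // both handlers run
  doc.trigger('click'); // who is still listening?
  return log;
}

console.log(demo(true));  // [ 'tree', 'menu' ]
console.log(demo(false)); // [ 'tree', 'menu', 'tree' ]
```

In the first run the tree’s handler is gone after one click, which is the breakage I was seeing. In the second run only the menu’s own handler is removed. Note that this only works with a named function: an anonymous handler leaves you with no reference to pass to unbind.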
So, I fixed the jQuery Context Menu by changing the following:
In the main contextMenu function, near the end, just above the return statement, I added the following:
// External click event for document
function onDocumentClick(e) {
    var menu = $('#' + o.menu);
    $(document).unbind('click', onDocumentClick).unbind('keypress');
    $(menu).fadeOut(o.outSpeed);
    return false;
}
Next, I change the function that assigns the click listener to the document to assign the defined function, instead of an anonymous function. So, change:
// Hide bindings
setTimeout( function() { // Delay for Mozilla
    $(document).click( function() {
        $(document).unbind('click').unbind('keypress');
        $(menu).fadeOut(o.outSpeed);
        return false;
    });
}, 0);
to
// Hide bindings
setTimeout(function() { // Delay for Mozilla
    $(document).click(onDocumentClick);
}, 0);
Lastly, I remove all instances of:
$(document).unbind('click');
That’s it. That fixed the problem for me. You’ll notice I don’t bother with the keypress bindings. That’s because I’m not using them, so I’m not bothered. I’m assuming the fix for that will be a similar strategy, for those who are bothered.
I’ve uploaded my version for those who can’t get it working using the tips in this post, but remember the copyright remains with Cory S.N. LaViska over at A Beautiful Site. jquery.contextMenu.js
For the longest time, I have been interested in artificial intelligence. The idea of computers that could make decisions and think independently fascinated me. Of course, I knew that AI wasn’t quite that advanced, but the idea still captivated me. However, the barrier to learning AI was too high. It was too complex and too academic. Just looking at the mathematical notation involved made my head spin.
I eventually took the plunge and convinced my family to buy me Artificial Intelligence: A Modern Approach for Christmas, and 12 months later, I actually started reading it. I’m still reading it, and will be the first to admit I don’t understand everything that is discussed, however there are a few things that I’m realising about AI.
It’s All About Searching
Searching is a constant in AI techniques. Problems are defined and solution spaces are created. Problems can be represented in a number of ways (graphs, trees, logic knowledge bases), however in the end, it always comes back to searching.
This is important because searching is easy. Searching is something that a lot of programmers already know, even if it’s only at a most basic, brute force level. We do searching all the time. We loop through arrays searching for values, we use regular expressions to match string patterns, we retrieve records from databases. We search.
At its simplest, you can search using brute force, iterating over all combinations and permutations of solutions looking for one that satisfies the problem. Beyond that, you can involve tricky heuristics to make better decisions about how to search. You can have local searches such as hill climbing, which pick a starting point in the solution space and climb to the nearest local maximum, and searches such as simulated annealing, which accept the occasional worse step in an attempt to escape local optima and reach the global one. But it’s still search. It’s still something you can do.
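To show how little code a local search needs, here is a minimal hill climbing sketch over the integers. The function names and the toy `twoPeaks` landscape are my own inventions for illustration, not something lifted from the textbook.

```javascript
// Greedy hill climbing: keep moving to the better neighbour
// until neither neighbour improves on the current position.
function hillClimb(score, start, maxSteps = 1000) {
  let current = start;
  for (let i = 0; i < maxSteps; i++) {
    const best = [current - 1, current + 1]
      .reduce((a, b) => (score(b) > score(a) ? b : a));
    if (score(best) <= score(current)) return current; // local maximum
    current = best;
  }
  return current;
}

// Toy landscape with two peaks: a small one at x = 2 (height 3)
// and a taller one at x = 10 (height 8).
const twoPeaks = x => Math.max(3 - Math.abs(x - 2), 8 - Math.abs(x - 10));

console.log(hillClimb(twoPeaks, 0)); // 2  (stuck on the small peak)
console.log(hillClimb(twoPeaks, 7)); // 10 (happened to start near the tall one)
```

Starting from 0, the search stops on the small peak and never sees the taller one, which is the weakness that techniques like simulated annealing try to work around.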
It’s More Common Than You Think
AI is a hell of a lot more prevalent than most people realise. I didn’t think AI had much commercial application before I started learning about it, but luckily that didn’t dim my interest. For those who are interested, but holding back because they don’t see how it would benefit them, here’s some good news. AI techniques are used everywhere.
That international 4 city flight you booked used AI techniques. A constraint satisfaction problem solver took all the constraints about needing to be at this city by that time, flying on this airline for that much, and created a plan for you. When Amazon recommends products you might be interested in, it calculates this with statistical techniques such as Bayesian networks and classifiers, which link a set of variables probabilistically. Circuit design, product manufacturing, supply chain optimisation, all of these things use techniques that AI uses. They’re not the sole domain of AI, but learning AI will cause you to learn these too.
You Can Use It Today
Whatever you’re working on, you can probably use AI techniques in it. Even some of the more exotic sounding techniques like neural networks can be of use to you. Self healing databases? Hell yeah. Even Bayesian classifiers to catalog and categorise products, heuristic searches to mine data in databases, hierarchical task planners to plan that holiday or manage that Gantt chart.
Can you use it anywhere? No. Your simple CRUD app probably won’t benefit from a whizz-bang heuristic search. But if you’re doing anything that involves large amounts of data, interacting with people, predicting trends and recognising patterns, you can use it.
It’s In Demand
You may have heard of the Netflix Prize. Guess what? That’s AI. Google is the biggest search engine around, indexes billions upon billions of pages on the internet, and can get you relevant results to a question in a matter of seconds. That’s one hell of a big knowledge base, and one smart search algorithm. Amazon sells products all over the world, and aggressively upsells and cross-sells. I get emails about related products I might like based on my wishlist and purchase history, and they’re actually pretty accurate.
Also, computer games. Enough said.
AI skills are in demand. Not huge demand, but probably more than you would have guessed. These skills are hugely profitable in the right hands, and big companies want to extract every single little morsel of useful information about your browsing, shopping, eating, travelling, viewing and reading habits in order to market to you more effectively. Now that sounds a little creepy to me, but if that doesn’t bug you, more power to you.
There’s a Lot of Information Out There
AI isn’t some weirdo niche science topic. There’s actually quite a lot of information out there, once you start going down the rabbit hole. “AI: A Modern Approach” cites hundreds of papers and books. There are thousands of websites out there on the subject. There are many academic papers that are made available for free online. There are communities, like AIGameDev, dedicated to spreading that delicious knowledge.
I think the hardest part about finding information is getting the terminology. It’s pretty dense when you first get into it, especially when people are talking about directed acyclic graphs and your idea of a graph is what mine was about a year ago, namely a few bars on a 2D Cartesian axis. But once you’ve got a foot in the door of the lingo, it can become pretty accessible, and information starts becoming more bountiful. That foot in the door can be either a good, basic website, or in my case, a university level textbook designed to introduce people to AI.
AI is a big field, full of fascinating and interesting concepts and techniques, and it’s a young field that’s still full of potential. It’s not as complex or confusing as film and television would have you think. That’s not to say it’s a walk in the park. As I stated earlier, I don’t understand everything I’m reading (I’m probably running at about 70%), but I’m managing. And if I can manage, so can you. So if you’re interested, there’s no better time to start than now.
For the last couple of months I have been writing a series of posts describing a personal Ruby on Rails based CMS. I have been writing tutorial style posts outlining what I was doing, and why I was doing it. Don’t bother trying to look for those posts, because I’ve archived them.
This decision was pretty easy after I slowed down and reviewed the situation objectively for a few minutes. That clarity, coupled with a few truths that hit home while reading the source of a few other Rails applications pretty much sealed the deal.
The next post in the series was going to be an overview of another Ruby CMS project, BrowserCMS. I saw a video from RailsConf about this project, and recognised a lot of their goals as coinciding with mine. So I was going to poke through their code, see how they’ve done things, compare it to how I did the same things, or was planning to do them.
While I was reading through their source, it occurred to me just how much I don’t get Rails. I had a bit of a poke around with scaffolding, got familiar enough with the generators, and decided that I was ready to tell people how to create a CMS in Ruby, because I’d nearly finished creating a CMS in ASP.NET.
It’s clear to me, now, how silly that was.
Rails isn’t just a framework for Ruby, it’s a whole change in paradigm. Intellectually, I knew Rails was “opinionated software”, however I don’t think I understood what that really means. I tried to make my code as flexible and configurable as possible, and I was struggling with all but the most basic CRUD tasks.
So, first I realised I wasn’t ready to tell anyone how to do anything in Rails.
But I have a series of about 5 posts doing just that. What do I do with those? Do I continue on, plugging away at learning while trying to instruct? I decided the answer to that depended on what I had achieved with my posts. What was the value of them?
Then I realised that I had covered very little other than simple CRUD. Sure, I had some unusual object relations, like trees and self referential comments, and I implemented some business logic like compiling pages into templates, however the total sum of the non-CRUD related content could have fit into 1 post. So I spent 4 posts blathering away at how to achieve the same task as running script/generate scaffold.
What’s worse is that I realised I was back in the rut of creating a CMS. How on earth did I do that again, after stating loudly and proudly that I don’t even like programming websites? Well, I told myself, you are planning on using this on your own sites, your many, many online business ideas. Sometimes you have to do the same old thing to earn money. Which would be true if I had done anything of worth, but I hadn’t. I was a long-winded scaffold script. I got carried away in the joy of learning a new language/framework (which isn’t a bad thing), and fell into the familiar territory of doing what I always do (which is a bad thing).
So, now I realise that I’m not only trying to teach people something I don’t understand, I’m trying to teach them how to do something I don’t even like doing.
That’s stupid.
I’ve archived the posts. I may revisit, as I still have grand ideas of what a CMS should be, but for now I’m shelving the whole thing. Luckily I have such a small readership (read: none) at this stage, nobody will be affected. For that I’m grateful.
And yet, through all this stupidity, I have learnt a few things of value.
I’ve learnt that Ruby on Rails is more complex than I thought, and will hit the books again to pick up some more advanced techniques.
I’ve learnt that I’m scared to push myself in programming. This revelation is a pointy one. I love programming, it’s my job, it’s my hobby and it’s my passion. The fact that I’m scared to push myself to innovate, hiding behind the excuse that I’m not smart enough or I’m not creative enough is double edged. On one hand, it’s sad that I’m not as ambitious as I thought. On the other, it’s great that I know now, so that I can get stuck into remedying that.
I’ve also learnt a lot about blogging. Those posts of mine weren’t really providing much value. I don’t have many readers because I’m not saying anything new, and I’m not showing anybody cool things they can’t find in a hundred other blogs. I also didn’t have much of a personal voice, and was writing them like I would a textbook, which is missing the point of a blog. It’s supposed to be informal, and I’m supposed to show my personality. So, I’ve found my voice, while realising that I’m not providing value. Another pointy idea, but ultimately good, because that too is something I can work on.
So, inevitably, I have to question the value of this post. What could this post provide to someone who stumbles across this website? Well, a few things.
Firstly, self awareness is a great thing to possess as a programmer. If you know where your weaknesses are, you can train in them, and get better. But that’s not enough. You also need to know what you’re afraid of, because fear can stop you from recognising a weakness in the first place. That fear hides the existence of the things you’re not good at, and that’s a major roadblock to improving.
And lastly, if you don’t have anything to add to the conversation, don’t say anything. The whole internet is a conversation, and I’ve just been babbling to myself in a corner the whole time. If you want to contribute, add your own thoughts, your own interpretation on subjects that are well known. Go into new levels of detail on old ideas and technologies. Introduce new ideas or technology, or modify existing ones. Contribute, don’t just talk for the sake of talking.
Do you spend hours wrangling with your “Interactive” Development Environment, trying to prevent it from deleting your boiler plate .designer file for your LINQ DBML? Do you get baffled watching Visual Studio delete said file from your software version control system, and then struggle to get the two back into sync?
Then you might be encountering this bug right here. The reason for this post is that information on this bug seems very hard to find in Google. So hopefully I can raise some awareness, and at least have it pop up on a Google search.
The typical symptom is the designer file for the DBML being deleted when the DBML is updated. If you’re also using source control, Visual Studio will attempt to delete it from your source control at the same time. Renaming the DBML file will cause it to regenerate, however it can cause major headaches when combined with source control: files being removed and re-added, and pending delete actions hidden in a project you thought you had not checked out.
The solution is the suitably moronic action of moving all your using statements inside your namespace declaration. A co-worker found this solution in this MSDN forum thread. Once you move your usings, Visual Studio happily stops deleting your designer file, and you can finally get back to work programming instead of fighting your tools.
Not just any jerk, but the jerk who knows his stuff. The technically strong, socially weak programmer who does not play well with others.
During a recent employee review, I was told that I need to work on a few areas, namely the way I communicate with my co-workers, and my tendencies to shoot down ideas without supplying alternatives. I was also told that my technical proficiency was more than sufficient.
The technically capable but socially awkward programmer is a cliché in everyday society, and the rogue programmer is a cliché in development teams. Seth Godin’s post, and another one that cropped up in Hacker News reminded me of this, and made me think. I realised that, to some extent, I am that jerk.
The point of this isn’t to toot my own horn, rather to try to offer an explanation on behalf of all well-meaning jerks in software development teams. Now, I’m not talking about those jerk jerks, who code maze-like code to secure themselves a job, those who are needlessly rude and abrasive, and those that ignore the other members of the team to the detriment of the whole project.
I’m talking about the jerks who are otherwise good employees. We’re not trying to sabotage the project, to exert our will over the project, to make anyone feel bad or insufficient.
We just care about the code.
At least, I do. I care about what I’m doing. I care about the project. I want my code to be the best code I’ve ever written, I want this project to be the most successful project the company has. I own my code, and take pride in it.
When somebody suggests something that I know won’t work, that will negatively affect the quality of the code, I freak out, somewhat. The first thing on my mind is to shoot down this idea, because I don’t want it reducing the quality of the project.
If my manager is planning on putting another programmer on the project with me, and I don’t believe they’re quite up to standard, I’m going to try to limit the damage they do to the application. I’m going to try and push them to work in a relatively isolated corner of the application. I know this is a bad attitude for team work, however I can’t help it.
Now, I’m not defending jerks. Being a jerk is a bad thing, I know this. I agree with what my boss said, and agree that I need to work on these areas. However, instantly firing the jerk, if they’re a well meaning jerk, is not a good idea, because you’ll lose someone who is passionate about what they do, and constantly striving to improve at it.
Instead, just try to work things through, and soften the edges of the jerk.