Psychochild's Blog

A developer's musings on game development and writing.

11 February, 2007

Weekend Design Challenge: Applying data mining

Last week we looked at information about data mining. There are some good resources there, but more are always appreciated.

So, now we’re going to talk about applying this knowledge. Let’s talk about what you can do with data mining.

Paul Barnett posted a comment to discuss this topic a bit. He wrote:

Odd thing data mining. In my ten year holiday from computer games (94-2004) I went off to be a creative consultant with bricks and industry. There is a wealth of data mining out there and most of it is, frankly, worthless.

A great example is the loyalty cards that superstores have in the UK. Every single transaction recorded. I mean that amount of data must be worthwhile, right? I mean after you pay to make the cards, record them, adjust your cash tills to scan the cards in, send new cards to people, change them into keyfob swipable ones, set up direct mailing, figure out how to produce vouchers, advertise the heck out of the fact you have a loyalty card and then spend a ton of money gleaning all the information.

Turns out people shop about every few days, they buy milk a lot and now and then buy washing powder.

Of the people I ended up working with, almost all of the big and clever companies just knew their market. I mean they actually just knew it. They could tell you what would and wouldn’t work, they basically understood what they had to focus on. And when I ran into a company that didn’t, the only thing I used to find with any regularity was this:

They had just lost faith in their gut
They had just lost the core people that ‘knew’ the market

So when people talk about data mining I am sort of preprogrammed to raise an eyebrow. Is there really data out there that we don’t know? I mean really, is there? And if so what bleeding use is it going to be for us?

I replied:

I think you’re right in that most of the good designers know what a typical player does in a typical play session. If not, time to pick a new profession. I also think anyone with an IQ over room temperature knows the whole grocery “loyalty” card thing is just dumb because it’s not going to reveal much of interest.

But, when we collect metrics on an online game, we’re actually not particularly interested in “normal” activity. No, what we really want to see is abnormal activity. This gives us more insight into what’s going on in our game.

For example, say that the average number of experience points (or in-game currency) earned per player per hour jumps suddenly one day. This probably indicates either a location that is rewarding too many experience points is being farmed hard-core, or an exploit was found. In either case this can be a good reason to dig deeper into the data to find out what the cause of this is. Continuing the example, if use of a particular zone also increased, you know where to start looking for your problems. Observing individual players (or reviewing individual logs if you record them) can help pinpoint a problem instead of hoping for a lucky break.

The goal of data mining is to allow you to see these problems with a glance at a summary instead of poring over individual logs to figure out if someone’s cheating or not. Having good data mining helps you catch issues sooner rather than later.
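
As a rough illustration of the experience-point example above, here is a minimal Python sketch of flagging a sudden jump in a daily summary. The function name `flag_spikes`, the 7-day window, the threshold of 3 standard deviations, and the sample numbers are all assumptions made up for this sketch, not anything taken from a real game.

```python
from statistics import mean, stdev

def flag_spikes(daily_values, window=7, threshold=3.0, min_sigma=1.0):
    """Return indices of days that sit far above the trailing average.

    daily_values: one number per day, e.g. average XP earned per player-hour.
    window, threshold and min_sigma are illustrative defaults, not tuned values.
    window must be at least 2 so a standard deviation can be computed.
    """
    flagged = []
    for i in range(window, len(daily_values)):
        history = daily_values[i - window:i]
        mu = mean(history)
        # Floor the spread so a perfectly flat history does not divide by zero.
        sigma = max(stdev(history), min_sigma)
        if (daily_values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily averages; day 7 is the kind of jump worth investigating.
xp_per_player_hour = [1000, 1020, 990, 1010, 1005, 995, 1015, 1600, 1010]
print(flag_spikes(xp_per_player_hour))  # [7]
```

A flag like this only says "look here". Running the same check per zone would be one way to narrow down where to start digging.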

Paul followed up with:

So the data mining we are talking about is actually how to catch stuff that is throwing the game balance out of whack?

Might be interesting to ask how people think we are best off doing that, perhaps that will generate the data mining answers that will be helpful?

From what I have seen people just get better at finding ways to game the system, should we even be bothered that it happens?

If you set the general time to get to level X at Z hours, how do you adjust when people have hint books, web sites, guild members, item flow down, buffing and all manner of other stuff?

Wouldn’t the limits become worthless after a while? Or are these metrics just for early play balance and are there to be discarded as the game matures?

Do you really need metrics to know where the current population gravity is within WOW?

So, let’s discuss this issue a bit more. Participation required! :) I’m interested in hearing everyone’s thoughts about this. Is data mining useful, or is it just modern snake oil used to sell middleware? Can we rely on designer instinct as we have before, or has the time come to actually have supporting data for things like balance?


  1. A simple idea to kickstart things: Store data about overall kills between classes in pvp. i.e. is a rogue always winning? This might suggest an imbalance between classes in games where the idea is that in a 1v1 either class should be able to win.

    Personally, I think data mining is only half the puzzle, the other half is a human with some common sense and a strong ‘gut’. If the two are put together, that’s when you start finding the gems.

    Also – visualisation! I do not think piles and piles of data are very useful if you don’t design tools to visualise it.

    Comment by Jpoku — 12 February, 2007 @ 2:00 AM
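
Jpoku's suggestion of storing kills between classes could be sketched roughly as follows. This is only an illustration: the `(winner_class, loser_class)` log format, the function names, the 0.65 cutoff, and the sample duels are assumptions invented for the example.

```python
from collections import Counter

def win_rates(duels):
    """Compute per-matchup win rates from 1v1 results.

    duels: iterable of (winner_class, loser_class) tuples (assumed log format).
    Returns {(a, b): fraction of a-vs-b duels won by a} for every pairing
    in which a won at least once.
    """
    wins = Counter(duels)
    rates = {}
    for (a, b), won in wins.items():
        total = won + wins.get((b, a), 0)
        rates[(a, b)] = won / total
    return rates

def lopsided_matchups(rates, cutoff=0.65):
    """List matchups where one class wins more often than the cutoff."""
    return sorted(pair for pair, rate in rates.items() if rate > cutoff)

# Hypothetical duel log: rogues beat mages 8 times out of 10.
duels = (
    [("rogue", "mage")] * 8 + [("mage", "rogue")] * 2
    + [("warrior", "rogue")] * 5 + [("rogue", "warrior")] * 5
)
print(lopsided_matchups(win_rates(duels)))  # [('rogue', 'mage')]
```

In practice a minimum number of duels per matchup would be needed before trusting any rate, and the table still says nothing about why one class is winning.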

  2. As far as I can understand, data mining generally has a bunch of standard uses for figuring out how an MMO is making money and where things are balanced poorly. The perspective that should naturally be added is that of emergence, and here I’ll have to use Chris Crawford’s definition, which reads:

    “Emergence is a property of a complex system that strikes when the designer of the system writes code that operates at a higher level of abstraction than the designer understands.”

    This becomes interesting as generally “emergent gameplay” is considered as bonus content, rather than exploits and problems, and perhaps the word emergence needs to be split into two different meanings, emergence_good and emergence_bad. Most of the MMORPGs I have played have had their developers a few years behind the players in seeing the patterns of emergent gameplay within their games.

    To counter this, focus your data mining on something that looks like this:

    “Get familiar with the parts of the game which operate at a higher level of abstraction than the designer understands.”

    Comment by Wolfe — 12 February, 2007 @ 3:31 AM

  3. Jpoku wrote:
    Personally, I think data mining is only half the puzzle, the other half is a human with some common sense and a strong ‘gut’.

    The second half of the puzzle is understanding. All the data in the world is useless if you don’t understand the context. My fear is that some people will use data mining as a replacement for understanding. Or, as you and Paul point out, as a replacement for “gut”.

    Using your example about the rogue winning significantly more PvP battles, the designer still needs to understand why the rogue is winning so many battles. Is it insane burst damage? Opportunistic use of stealth? Bullshit dodge factor? All of the above? You need to know what to nerf (or, better, what counters to put in) if you really want to balance out that scenario.

    Further thoughts?

    Comment by Psychochild — 12 February, 2007 @ 3:39 AM

  4. Data mining is/can be manna from heaven.

    I don’t have much ‘internal/industry’ knowledge on the data that is mined yet, but when I worked for Alliteration Electronics, they were huge on data mining, and they understood it. Every single person who walks in the store is counted (as an anonymous number). The number of ‘people through the door’ is measured against sales transactions to tell them how many people actually buy something in the store. The stuff each person buys is looked at to get an idea of each store’s average items in a bag, price per bag, and then some other stuff that is useful to the retailer (profit per customer, etc.).

    At the end of the day the company can tell the store manager exactly how many customer opportunities they had, how well they took those opportunities on, how much the customers were spending, and how profitable that customer was.

    Retail stores are measured on hitting their budgets and profitability, so this information is HUGE in the hands of a good store manager. If your customer count dips or the number of buying customers dips it’s an indication that your employees may be ignoring/not helping people (or over helping). You can work on that to ensure that people feel welcome in the store and relaxed.

    If people don’t buy a lot of stuff, or don’t buy high profit items, you know that to meet your budget and profit goals you are going to have to work hard to get as many sales in at the register each day as possible (going for volume). If your customers are big spenders (every store is different), and you sell lots of TVs and high end stuff, you can focus on making sure those departments have amazing sales people who are well trained to maximize that.

    It’s feeling like babble, so I’ll get to my point: As Psychochild noted, if you can understand the data, it’s amazingly useful. Knowing player trends (where most tend to go, how long they spend, what they do, etc.) you can really get some amazing insight on your game. Especially when you use the knowledge and then look at the reversed info (where they don’t go, where they skim through for 30 seconds, what they don’t do, etc.)

    It teaches you what grabs people in a game, and what does not grab them. Then you can work to maximize the positive hits, and revisit the areas that aren’t successful (by areas I mean zones, classes, creatures, whatever) and tweak them based on the knowledge gained from the areas that are working.

    Comment by Grimwell — 13 February, 2007 @ 9:39 AM
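
Grimwell's point about looking at where players linger and where they skim through could be summarized along these lines. This is a sketch under stated assumptions: the `(zone, seconds_spent)` visit records, the function names, the zone names, and the 30-second cutoff (borrowed from the comment's own example) are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def zone_dwell_summary(visits):
    """Summarize visits per zone.

    visits: iterable of (zone, seconds_spent) records (assumed log format).
    Returns {zone: (visit_count, median_seconds_spent)}.
    """
    by_zone = defaultdict(list)
    for zone, seconds in visits:
        by_zone[zone].append(seconds)
    return {zone: (len(times), median(times)) for zone, times in by_zone.items()}

def skimmed_zones(summary, skim_seconds=30):
    """Zones where the typical visit lasts no longer than skim_seconds."""
    return sorted(z for z, (_, med) in summary.items() if med <= skim_seconds)

# Hypothetical visit log: players linger in the harbor and rush the swamp.
visits = [
    ("harbor", 1200), ("harbor", 900), ("harbor", 1500),
    ("swamp", 20), ("swamp", 35), ("swamp", 25),
]
summary = zone_dwell_summary(visits)
print(summary["harbor"])       # (3, 1200)
print(skimmed_zones(summary))  # ['swamp']
```

The median is used here instead of the mean so that a few players idling for hours do not hide the fact that most people pass straight through.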

  5. This is my very first comment on your blog, so please be nice :-)

    I see two problems in the whole data mining thing:

    1) What should I log?
    2) How can I take advantage of that data?

    I want to focus my thoughts on the second point (for now). It has already been said that the right tools have to be designed and developed to visualize the data you have mined. In my eyes that point is as important as point one. The art lies in putting the data into a context that is much more informative than the pure data itself, or a plain representation of it (e.g. by just applying clever statistical functions).
    An example:

    Logs say that a player went from coordinate A to coordinate B in time T on the map internally numbered M. So far, so good. So you could try to detect if the player has used a Speedhack-like cheat to run faster than normal. You could simply compute the distance from A to B and by setting a threshold you could say “Hey, that player went too far in too short an amount of time”. Well, this would work, more or less. What about lag? It could mess up the movement commands of that player (depending on the quality of your network code).
    Something else: Could you tell if the player has teleported from A to B? With just that data, you couldn’t (at least not to 100%).
    Now let’s do the following: From another table you reference the map number M to get the actual name of the map to raise the amount of information. Now get a graphic of the minimap of that map, and plot the way the player went onto that graphic. “Huh, what’s that? He went from one side to the other side of a wall/door that he had to open first!”. Then you may have a log that says if that door or wall has been opened at this time, and there you have caught your first cheater :-)

    The trick is to connect the information you have (logs, for example) with other data (map names, minimap graphics) to create a context in which your data makes sense. That all seems pretty obvious, sure. But the way you do it converts your plain, more or less useless data into VERY useful information.

    Comment by Elendil — 13 February, 2007 @ 5:52 PM
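
Elendil's two checks (a speed threshold with some slack for lag, and a path that crosses a door logged as closed) might look roughly like this in Python. Everything here is an assumption for illustration: 2D coordinates, a door modelled as a line segment, the 1.5x lag tolerance, and all function names.

```python
import math

def is_too_fast(a, b, seconds, max_speed, lag_tolerance=1.5):
    """True if moving from a to b in `seconds` exceeds max_speed with slack.

    lag_tolerance is a made-up fudge factor so that laggy but honest
    movement is less likely to be flagged.
    """
    if seconds <= 0:
        return True  # no elapsed time at all is suspicious in itself
    return math.dist(a, b) / seconds > max_speed * lag_tolerance

def _orient(p, q, r):
    """Sign of the turn p -> q -> r (2D cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def crosses(a, b, wall_start, wall_end):
    """True if the path a-b properly crosses the wall segment.

    Touching an endpoint or running along the wall is not counted.
    """
    d1 = _orient(wall_start, wall_end, a)
    d2 = _orient(wall_start, wall_end, b)
    d3 = _orient(a, b, wall_start)
    d4 = _orient(a, b, wall_end)
    return d1 * d2 < 0 and d3 * d4 < 0

def passed_closed_door(a, b, door, door_was_open):
    """Combine the movement log with a (hypothetical) door-state log."""
    return crosses(a, b, *door) and not door_was_open

door = ((5, -5), (5, 5))
print(is_too_fast((0, 0), (30, 40), 5, max_speed=7))   # False: 10 units/s
print(is_too_fast((0, 0), (30, 40), 2, max_speed=7))   # True: 25 units/s
print(passed_closed_door((0, 0), (10, 0), door, door_was_open=False))  # True
```

Neither check proves cheating on its own. As the comment says, the value comes from joining the movement log with other data, such as the door log or the map itself.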

  6. I think data mining is probably capable of the following;

    Help red flag any game breaking developments so the Dev detective can look into it.

    Confirm ‘gut feel’ by proving the point with targeted data gathering

    Waste a lot of time if done badly

    Become an obsession and time consuming in and of itself

    Hide the conclusion in a forest of data

    Helpful for rolling back a game world state once you have tracked a problem

    Convince people that there really is an economy in a game

    Prove that you did in fact waste your time and effort with 80% of your game world/careers/challenges.

    Prove that route one gaming is in fact the be all and end all of most players.

    Will be disputed by your player base unless you publish the data for them to read

    Will be pulled apart by some players and used against you if you actually do release it to them

    Will be worth a fortune as long as you keep it hidden from people

    Be worth next to nothing once you try to realise the fortune.

    Is almost always only capable of generating hindsight

    When it is useful it then shows you more data you should have captured

    Comment by Paul Barnett — 14 February, 2007 @ 8:43 PM

  7. Now the topic of how to catch game breakers is something I am interested in. Of course it then leads to the tricky question of how much change you should bring to your game design once it is live.

    But is checking for game breakers the same as data mining? I am not so sure. I mean I want game breakers found straight away, and I want any fault flashing on my Dev dashboard. I want to know what area I need to look in so my Dev Detectives can track it down.

    Is that data mining or just overwatch?

    If I do data mining after the fact don’t I run the risk of closing a loop hole long after it was useful to close?

    And while I take the point about games being very complicated and I understand that it can seem a little daunting, I like to think about my car. It’s very complicated yet my mechanic often knows the problem right away. They may use a computer to check on the car’s brain, but that’s not data mining, that’s just a systems check. Actually the bloke who fixes my computer can also do this, he also deals with a very complicated thing. Same for my Doctor, I mean when you hear hoof beats think horses not Zebras (well, unless you’re in Africa). My recent electrics in my house went up the spout and the spark who came around just knew where to look.

    More I think on it the more sure I am that data mining is at best useful to confirm a ‘gut opinion’ and that’s about all.

    I would add it to a few of my pet peeves. People obsessed with economy, real weather, real physics, rmt’s, instancing, trying to make mmo’s art, anyone who laments MMO’s are failing to deliver, girl based MMO’s, raid end game, second life being treated as special, bashing WOW, getting caught up in next gen thinking of anything and any other such stuff.

    Coo, I sort of drifted off into cloud cuckoo land, sorry about that.

    Comment by Paul Barnett — 14 February, 2007 @ 8:55 PM

  8. One opinion which may or may not be popular is that data mining is above all a diagnostic tool. Logging is a preparative tool. One logs, that one may mine.

    However, it is not useful to ask “what is data mining for?” and ask for specific examples without first having a question which can be answered by mining data.

    In order to understand the Answer to Life, the Universe and Everything, one must first understand the question.

    Comment by Cael — 15 February, 2007 @ 4:41 AM

  9. Conferences and Metrics

    [...] had an interesting discussion about metrics and whether they’re worth anything at all on his blog a few months back. [...]

    Pingback by — 11 September, 2007 @ 11:01 AM

  10. Hello! May I summarize yet? :-)

    It sounds like the uses of data mining are clustering around the following:

    1. Balance a game in development by comparing changes to assets of target actors.
    2. Keep an operational game in balance by watching for outliers that may signal exploits.
    3. Provide feedback to players of in-game accomplishments and statuses.

    Those sound pretty useful to me. Did I miss anything obvious?

    OTOH, I remember what Raph said in his “Tired of hearing about the NGE” blog entry:

    … regardless of how I feel about the NGE itself, I think that the real lessons of it are mostly stuff that isn’t even visible to the public. (For example, IMHO the real lesson is about data mining, and not about any of the stuff that gets talked about).

    Well, that’s pretty scary. Where does that cautionary note lead us when talking about data mining?

    What about the perils of data mining? What steps can/should developers take to guard against misusing this capability to see inside the machine?


    Comment by Bart Stewart — 17 January, 2008 @ 1:44 PM

Posts Copyright Brian Green, aka Psychochild. Comments belong to their authors.