Psychochild's Blog

A developer's musings on game development and writing.

13 November, 2015

Measuring raid performance

As I’ve posted before, I play FFXIV as one of my primary MMOs. I’ve been playing the new “Heavensward” expansion recently, and the first major patch lands this week. With the patch comes the first of the new raids.

As I might dip my toe into some of these raids, I was thinking about how to measure raid performance. How do you identify a good raider from a poor one?

Raids in FFXIV

FFXIV does a pretty good job of making raids accessible. You can join a raid via the “duty finder” interface and have a decent chance of success. There is usually a progression: you generally have to do earlier raids to open the quests that unlock the later raids. Plus, you have gear level requirements (institutionalized gear score) before you can queue for the raid. These work fairly well to make sure that people have a minimum amount of competence. But, while you can keep the unqualified away from a raid, you can’t keep the unmotivated away; some people just want to be carried.

FFXIV also limits gear progression on the far edge; the most recent raids only allow you to get a reward once per week. At least, until the new raids come out, then the restrictions are usually lifted to allow people to power through intermediate content to get to the far edge. This creates a very interesting situation where you can expect to invest a lot of time in acquiring gear in the most recent raids. This can be especially wearing if you want to gear up multiple classes (since a character can gain levels in every unlocked class.)

I’m not a hard-core raider in FFXIV; I mostly play because a friend of mine does. I dip my toe into the raids I can do via the duty finder, and that’s it. I might go back and do some of the Binding Coils of Bahamut unsynced now. So, I’ll hardly claim to be a raiding expert. But, I am a game designer, and sometimes questions lodge in my mind.

So, let’s say you want to be a good raider; how do you measure performance?

Why measure?

Let me do a bit of philosophical pondering here: why measure raid performance?

There are good reasons and bad. The negative reasons for measurement include exclusion and bragging. Gear score was used early on to measure if someone was going to make a PUG easy or not; people only wanted others with a minimum gear score in order to weed out people being “carried”. This ignores the actual performance of a person: a slacker with a higher gear score might do worse than an expert with a more modest score. People will do what gets them their rewards the fastest and easiest, though. And the stories of “meter maids”, people who crow about their DPS meter position, are well known.

Of course, there are also positive reasons. If you don’t measure, you can’t improve. To know if one rotation is better than another, you need to measure the results. This can be hard to do in the heat of battle, and target dummies don’t really give you good feedback on realistic combat situations. So, a meter can be a useful tool for improving your ability to perform well in a raid situation.

So there are many motivations for measuring raid performance. Let’s assume it’s not just for assholes to exclude others or brag about their e-peen, even if this has been one of the more visible aspects in the past.

So, what are we going to measure? Since I started talking about FFXIV, a game that uses the modern trinity of roles, let’s look at those roles.

Measuring DPS

Measuring DPS is the easiest, because it’s already pretty common. DPS meters were an early UI addition to WoW. Figuring out how to maximize damage done (and often brag about it) has long been a goal of the DPS classes. This makes sense; the faster you can down an enemy, the less time there is for something to go horribly wrong. And, some bosses have an enrage timer, so insufficient DPS will cause failure for the raid.

Of course, there are other measurements that don’t get so much focus. For example, time alive is pretty important. A character who has 10% more DPS than anyone else for the first minute of the raid but who gets killed from not watching the environment is not as useful as someone who doesn’t stand in the fire. Measuring the amount of damage taken is another interesting statistic, as a DPS who takes more damage runs more risk of dying or requiring more attention from the healers. But, a character who takes a lot of damage without diminishing the healers’ ability to keep the tank up isn’t necessarily harming the raid as a whole. Changing mechanics so that a character does reduced damage if they lose a lot of hit points might be an interesting mechanic for raiding. So, some measure of survivability might be important as well.
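To make the time-alive point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `FightRecord` fields, the `summarize` helper, and the sample numbers are hypothetical and do not reflect how any real parser or FFXIV itself records combat data.

```python
from dataclasses import dataclass


@dataclass
class FightRecord:
    """Hypothetical per-character summary of one encounter (not a real parser format)."""
    name: str
    damage_done: float    # total damage dealt over the fight
    damage_taken: float   # total damage received
    seconds_alive: float  # time before death, or the full fight length if the character survived


def summarize(record: FightRecord, fight_seconds: float) -> dict:
    """Return a few simple per-character numbers for one encounter."""
    alive = record.seconds_alive
    return {
        "name": record.name,
        # Damage per second while alive: the number a typical meter shows off.
        "active_dps": record.damage_done / alive if alive > 0 else 0.0,
        # Damage per second over the whole fight: what the raid actually got.
        "effective_dps": record.damage_done / fight_seconds,
        # Fraction of the fight the character was alive for.
        "alive_fraction": alive / fight_seconds,
        # Incoming damage per second alive: a rough proxy for healer attention.
        "damage_taken_per_second": record.damage_taken / alive if alive > 0 else 0.0,
    }


fight_seconds = 300.0
# Made-up numbers: 10% more DPS while alive, but dead after the first minute.
glass_cannon = FightRecord("glass_cannon", damage_done=66_000, damage_taken=30_000, seconds_alive=60)
steady = FightRecord("steady", damage_done=300_000, damage_taken=20_000, seconds_alive=300)

for record in (glass_cannon, steady):
    print(summarize(record, fight_seconds))
```

With these made-up numbers, the character who dies early wins on damage per second while alive but contributes far less over the whole fight, which is the gap a plain DPS meter can hide.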

Measuring tanking

Measuring tanking is quite a bit harder. There’s an obvious fail condition for the tank: dying. But, in most raiding games this is not the tank’s responsibility alone; the healer is expected to help the tank stay alive. Understanding who is responsible for the tank’s death is a good way to understand how well the tank is doing. A tank who is able to use his or her abilities to avoid lethal damage is better than one who relies solely on healing.

The secondary failure mode is for the tank to lose control of the raid boss and for it to kill other party members, particularly the healer. As such, a tank needs to know how to maintain threat from the boss. A boss that requires a tank swap, where another tank takes over so the first tank can recover from some debilitation, requires more mastery of threat mechanics. But, once again this isn’t necessarily a tank-specific mechanic; a good DPS needs to know when to lay on the damage and when to pull back and let the tank maintain threat, particularly with mechanics like tank swaps.

In FFXIV, the current tank meta for raiding also includes another aspect: damage. A recent /r/ffxiv/ thread on Reddit discusses a lot of interesting ideas about tanking, including how important tank DPS is to the current raid setup. Measuring tank damage might also be another way to measure a tank’s ability in some situations.

So, what should you measure? Damage taken is one measurement, although that can be inaccurate since damage sometimes depends on the random number generator for things like critical hits. Measuring damage done to others as a negative measure is potentially interesting, but could penalize a tank with sloppy or lazy members who stand in damage fields or a raid boss’s cleave attack. Lots of options, and I think a lot of them really depend on the encounter.

Measuring healing

What about healers? Again, they have a pretty obvious failure condition: the tank dying. If the tank dies it’s often the healer that gets blamed, although in some cases the tank might also be to blame. To a lesser extent there’s the failure of letting other raid members die, but unless you’re the tank, defense and not taking excessive damage is a personal responsibility. The healer drawing too much aggro from healing threat is also a danger, but this is usually caused by a tank taking too much damage and not generating enough threat, or by DPS taking too much damage and requiring the use of high-yield, high-threat healing abilities.

How do you measure the quality of a healer? It’s really hard, because more healing done usually indicates a poor group rather than a good healer. Measuring something like mana efficiency is possible; the healer who knows when to use the right heal is better than one who spams a single heal re

As with tanks, the current FFXIV meta also emphasizes healers being able to do some damage. This is an interesting mechanic because a healer must choose between doing damage and healing; using a global cooldown on an attack spell might delay that healing spell enough to cause problems. Of course, for the healer to contribute sufficient damage, the tank needs to use abilities to reduce damage and the other raid members need to avoid taking unnecessary damage. So, in a way, the DPS of a healer can be a reflection of the quality of the performance of the raid group as a whole if the raid is successful.
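The healer measurements in this section can be sketched as a handful of ratios. This is only an illustration: the `healer_summary` function, its inputs, and the sample numbers are hypothetical, and the split into "effective" healing versus overhealing (healing that lands on a target already at full health) is an assumption about what a log would provide.

```python
def healer_summary(effective_healing: float, overhealing: float,
                   mana_spent: float, damage_done: float,
                   fight_seconds: float) -> dict:
    """Summarize one healer's fight with a few simple ratios (hypothetical inputs)."""
    raw_healing = effective_healing + overhealing
    return {
        # Useful healing bought per point of mana: a rough mana-efficiency measure.
        "healing_per_mana": effective_healing / mana_spent if mana_spent > 0 else 0.0,
        # Share of all healing cast that was wasted on targets already at full health.
        "overheal_fraction": overhealing / raw_healing if raw_healing > 0 else 0.0,
        # Damage the healer managed to contribute alongside healing.
        "healer_dps": damage_done / fight_seconds,
    }


# Made-up numbers: one healer spams a single big heal, the other picks heals to fit.
spammer = healer_summary(effective_healing=200_000, overhealing=100_000,
                         mana_spent=60_000, damage_done=0, fight_seconds=300)
careful = healer_summary(effective_healing=200_000, overhealing=20_000,
                         mana_spent=40_000, damage_done=45_000, fight_seconds=300)

print(spammer)
print(careful)
```

Both healers here deliver the same effective healing, so a raw "healing done" meter would actually favor the spammer; the ratios are what separate them. None of these numbers says anything about whether the group made the healer's job easy or hard, which is the deeper problem described above.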

Measuring hybrids

This isn’t necessarily important to FFXIV, but still something that interests me; how do you measure someone who doesn’t fit within just one role? When I raided in WoW, I played a Feral Druid that could just as easily fill in as DPS as an off tank. If my DPS was less than optimal because I had some off-tanking options instead of concentrating entirely on DPS abilities, what was an acceptable tradeoff? 80%? 90%? 95%? In a large guild, would I get benched if they didn’t need my off-tanking ability?

What about some utility abilities? Druids could do an in-combat rez, allowing them to get a fallen raider back in during a fight. From a balance point of view, an in-combat rezzer has to be less effective (do less damage, be a lesser tank, do less healing, etc.) otherwise the combat-rezzing class is going to always be preferred to any other class. But, that margin becomes a hindrance if the raid doesn’t need any combat rezzes. So, what is the proper tradeoff? Can a designer make a class with some unique utility, like an in-combat rez, more demanding to play in order to make up for that unique utility?

Measuring within a discipline

During the TBC era of WoW raiding, each class within a role usually had a specialization. Warrior tanks were the masters of single-target threat and taking large hits. Paladin tanks, on the other hand, were exceptional at tanking groups of enemies with their AoE threat. Druid tanks had more hit points and armor value, but were more susceptible to spike damage from crits. Sometimes people would pick certain tanks for certain encounters based on what the encounters required. Other times, more limited groups had to make do with what they had.

How do you measure performance when different characters in the same roles have very different strengths? How good does an AoE tank have to be against a single target to be useful in general? Is a group on the cutting edge only ever going to want the specialized tank? It gets tough because people will get sidelined if their particular strength isn’t required for a raid or for a particular encounter.

This is complicated by the fact the grass is always greener on the other side. When I did healing as a Druid, my friend who played a Priest was jealous of my instant cast HOTs and in-combat rez. I wanted his mana-efficient quick heals and shields. Even though we had our individual strengths, there was always the suspicion that some other specialization in the same role had it “better”.

Measuring the group

There are tons of individual measurements for each role, but what about measuring the group as a whole? The ultimate measurement is “did the group kill the raid boss?” But, what about comparing performance between groups? Whereas currently groups compete on “world first” or “server first” completions, what if there were a way to measure how well a group did overall and rank them?

You could measure time taken, although this emphasizes DPS over the other roles. Unless the tanks and healers can provide significant damage as well, as in the current FFXIV raid meta. But, this feels like optimizing for one specific stat, and as that Reddit post I linked above states, some tanks want to feel more like they’re protecting everyone else rather than that they’re just another person who the healer prioritizes in healing.

You could measure damage taken, with less damage taken being better. This, on the other hand, emphasizes preparation and stats. By this measure, a group with a strong tank scores better than one with a moderate tank and DPS who focus on damage.

The right answer is probably some balance: a carefully designed equation that balances all the measurements of the different roles and overall performance. The specific equation is left as an exercise for the reader, or for me when someone is willing to pay me to do that bit of hard work properly. :)
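To show the shape such an equation could take, and nothing more, here is a minimal weighted-sum sketch. The weights, the "par" targets, and the `group_score` function are all hypothetical placeholders; choosing real values is exactly the hard part that is not attempted here.

```python
# Hypothetical weights that sum to 1.0; real tuning is the hard part.
WEIGHTS = {"kill": 0.5, "speed": 0.2, "damage_avoided": 0.2, "deaths_avoided": 0.1}


def clamp01(value: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return max(0.0, min(1.0, value))


def group_score(killed: bool, fight_seconds: float, par_seconds: float,
                damage_taken: float, par_damage_taken: float,
                deaths: int, group_size: int) -> float:
    """Combine several group measurements into one score between 0 and 1.

    par_seconds and par_damage_taken are assumed designer-chosen targets
    for the encounter; meeting or beating a target earns full credit.
    """
    parts = {
        "kill": 1.0 if killed else 0.0,
        "speed": clamp01(par_seconds / fight_seconds),
        "damage_avoided": clamp01(par_damage_taken / damage_taken) if damage_taken > 0 else 1.0,
        "deaths_avoided": clamp01(1.0 - deaths / group_size),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


# Made-up example: a kill that ran a minute over par, with one death in a party of eight.
score = group_score(killed=True, fight_seconds=360, par_seconds=300,
                    damage_taken=500_000, par_damage_taken=400_000,
                    deaths=1, group_size=8)
print(round(score, 3))
```

A plain weighted sum is the simplest possible choice, and it makes the tradeoffs above visible: raise the "speed" weight and the score rewards DPS-heavy groups, raise "damage_avoided" and it rewards preparation and stats.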

The last thing to consider is how special group mechanics could be used as measurement. For example, a group able to pull off LotRO’s fellowship manoeuvre, particularly the right one to match the situation, is better than a group unable to pull this off. Could the execution of other, similar mechanics be an indication of group performance?

What do you think?

So, how do you measure raid performance? What do you think is a good way to measure individual performance? What about group performance?

  1. I’ll admit, I’m a bit surprised there’s no interest in discussing this topic. Perhaps I should go a bit more basic in the topic.

    Syl wrote on Google+:
    Laidback as I have become about these matters, I feel there’s no reason to measure any performance as long as it’s good enough and people are enjoying themselves. There’s no need to fix what ain’t broken and no measuring for measure’s sake – why do we always need to improve? I mean, that is a serious and philosophical question. We’re driven towards this mindset IRL because the economy wants us to perform more and more and more…but in games? No thanks. :)

    Of course anyone that enjoys this in MMOs is free to do so, as long as they don’t make others miserable. Personally, the only time where I feel measuring has a purpose in MMOs is when you’re unable to accomplish your goals – in which case you can have a look at your performance on individual or group level. In FFXIV I am super uninterested to do so because all I ever have is successful groups, anyway. That’s why the game is so bloody great. ;)

    Well, as you well know, not everyone plays for the same reason. There’s a reasonably active raiding segment in FFXIV, and I think FFXIV does raiding fairly well. I’ve just never gotten very deep into it because my FC is pretty small.

    The question assumes that there are people out there who want to improve their raiding capability. For DPS, it’s pretty obvious: install some sort of parser and see how big your numbers can get. But, what about tanks and healers? Is there an objective way to tell if one tank is better than another? Or one healer superior to another? Besides the obvious case of “the raid wiped”, which is rarely the fault of a single person.

    Out of curiosity, did you do the Binding Coils of Bahamut before the Echo?

    Comment by Psychochild — 17 November, 2015 @ 6:45 PM

  2. As I said, if it’s for raiding purposes it makes sense. And when I raided back in WoW, the healers did in fact measure their healing too – we used wowstats (or whatever it was called) to parse our raids and for me personally as a healing officer, stuff like healing focus / spread was a way of seeing if healers followed my assignments. There used to be a time when overhealing was an issue too but that changed a lot later.
    I have played catch-up for most of the old FFXIV content and only got a few WoD and Coil runs in before the expansion. I did do Coil 1+2 mostly, just one run for 4 but it was difficult to coordinate in a PuG. Same for a few other fights such as Ultima and Titan EX, those are tough to find a PuG willing to learn together.

    Comment by Syl — 18 November, 2015 @ 1:02 PM

  3. Honestly, I’m not too surprised there isn’t any discussion. Let me give you my reasons, and maybe in a roundabout way end up leaving my opinion. Though I’m not sure that will help foster any more discussions.

    1) When I read this piece, everything seemed quite obvious; I thought “I could’ve written _that_, too!”. Now the point is: First, like art… yes, I maybe could’ve, but I didn’t, and that’s the point, right? Just writing down the observation is important. And second, one of my favorite quotes of all times is from Frank Herbert, hidden somewhere within the 1000 pages of Dune: “Science is made up of so many things that appear obvious after they are explained”. Everything you say in the post is immediately clear. Which brings me to point 2…

    2) You presented your thoughts very well. It’s a hard topic with no easy answers: there are many factors that depend on each other (is it easy for a healer to appear capable if the group performs badly? Are tanks best when they make the healer die from boredom? Is it preferable to have everybody perform their specific roles, leading to high control over encounters, or should tanks and healers focus more on DPS, trading time-to-death-of-boss for safety margin and potential loss of control?). In fact, there may not only be no easy answers; there might be no right answer at all. And while one might think that this fosters open discussion…

    3) … it might not foster _passionate_ discussion. You presented the topic in a very analytical way that I enjoyed reading. However, it is much easier to write a comment if you strongly agree or disagree with a specific position of the author. You do not have a strong opinion on this topic though, so the urge to “show him how wrong he is” isn’t there to entice people to discuss.

    In fact, you may have noticed that, in the end, I didn’t say much about the topic of the post myself. This is not because it’s not interesting; it very much is. It is just that I have a hard time thinking about additional contributions: there is nothing wrong that I want to point out. There is also no additional thought that I can propose, because you seem to have touched on all the important points.

    Again, while some of what I’ve written above might sound critical, I very much liked the post; I just couldn’t think of anything to contribute to its content in the form of a comment.

    Comment by flosch — 19 November, 2015 @ 11:44 AM

  4. Yeah, that’s fair. I was hoping someone might have some insight that I missed though.

    One tactic I’ve used in older posts is to put a glaring logical error in there. That would get people to comment. Perhaps I need to start doing that again. :)

    Comment by Psychochild — 27 November, 2015 @ 8:52 AM

  5. I think you didn’t attract controversy because you wrote a comprehensive look at a somewhat academic topic. The majority of raiders are DPS and can get close enough to measuring their performance with a DPS meter, some tips from the forums, and maybe even asking for advice. Tanking and healing positions are more limited, and in general your counterpart will be able to tell qualitatively if you’re doing poorly (if not necessarily why).

    But here’s a nitpick boiled down from your section on “why measure”. Perhaps I’m reading into what you wrote, but I think it is a misunderstanding to brand people who want to exclude others as “assholes” without considering why they are doing what they are doing (and how that colors the discussion of raid performance).

    Automated group finders like the one WoW added late in Lich King depend on bribing players with inappropriate rewards to continue running content they no longer want or enjoy. Given how strict the vertical progression is in most MMOs today, there’s just no way to assemble enough players to fill groups in an acceptable amount of time if you are depending entirely on people who actually still need the content.

    Thus, Blizzard’s design choice to offer second and third tier raid tokens for trivial pre-raid 5-man content, in order to make sure it remained possible to run that content and gear up for entry level raids. Excluding “weak” players – agreed that gearscore was a crude way to implement this, albeit one of few that could be observed immediately before investing time in a group – was not the desired outcome, but it is players doing exactly what the incentives told them to do.

    If the only reason you are doing the content at all is to get the reward, every extra minute the group takes (e.g. because it includes people who actually still need the content) delays the time when you are released to go do something you actually want to be doing. If the group fails altogether, that means you put in the hours, had a frustrating time, didn’t get what you wanted, and now potentially will have to do it another time with no guarantee of a better outcome to try and get your tokens for the week.

    So getting back to your question about why you should be measuring player performance. One answer is to tune content more precisely, but bear in mind that the majority of your customers actively and intentionally seek out ways to defeat your tuning (e.g. immediately finding the easiest dungeon and only farming that one dungeon, demanding that people show up multiple tiers overgeared for the content, cramming in more players than the instance was tuned for, etc) because they don’t want to be challenged. Blizzard tried adjusting difficulty back down to the “correct” level in Cataclysm and the entire system ground to a halt because tanks and healers boycotted and formed their own exclusionary groups rather than accept the possibility of failure through the group finder.

    Or in theory you could go the other direction and use what you learned from monitoring player performance to try and change behavior in the hopes of improving the experience for your players. The challenge there is that incentives are almost always far more successful at changing player behavior than underlying player preference, as described above. For example, say you went ahead and explicitly stated that X slots in the automated raid groups are there to carry people who need the gear, and will receive additional rewards if they meet X and Y metrics. The problem there is that you will have people standing in the fire to keep their DPS at the required level, or not doing enough DPS because they are focused on cashing in the “don’t take fire damage” bounty, or if you actually completely prevent people from phoning it in they might just decide to call your bluff and not show up.

    Really, us players are an unruly bunch with a bad habit of doing what you said and not what you meant, complaining about you all the while. Sorry about that?

    Comment by Green Armadillo — 2 December, 2015 @ 5:32 PM

  6. The easier solution may be to remove individual performance from the equation entirely. I would refer you to skill rating systems as used in competitive games such as Chess, Halo, or League of Legends. These systems (Elo, TrueSkill, Glicko, …) eschew measuring individual performance and instead reduce one’s skill to a heuristic value derived from wins and losses (and draws) over time; in short, personal skill only matters inasmuch as it contributes to outcomes. In this context, we can say that one’s skill as a raider is wholly determined by one’s ability to complete raids.

    Themepark MMOs and the raids contained therein could be summarized as follows: Raids are static content meant to challenge a party of players at a prescribed statistical level (further referred to as iLvl, a la FFXIV). Obviously, this summary is somewhat vague, and further analysis of the topic would require strict definitions for many of the key terms (what does it mean to ‘challenge’ someone? how much do statistics matter?). I make an assumption here that an individual player’s potential is wholly or primarily derived from gear benefits, as is the case in FFXIV.

    Given the expected iLvl of players in the party, the actual iLvl of players in the party, and the outcome of any given raid, we have everything we’d need to rate a given player’s performance over time; this could be further differentiated by role, a trivial thing in FFXIV, though not in all MMOs. A player would see large gains to their rating for successfully completing raids above their iLvl and conversely, see little to no gain for successfully completing raids below their iLvl. Over time, we derive a single value that represents, in abstract, a player’s raiding ability.

    Obviously, many of the strengths and weaknesses of using such rating systems in competitive play would exist in cooperative play. Measuring over time reduces or eliminates outlier biases, providing a more accurate result, but eliminating the details of individual performance could be reductive and less representative of actual skill. However, given we accept these systems as valid in competitive play, so too do I think we would accept these systems as valid in cooperative play. In fact, this is not unprecedented: although I don’t know the details, Puzzle Pirates utilized a heuristic rating system to rate a player’s skill at its puzzles.

    This is not to say, of course, that measuring individual performance isn’t a desirable exercise, nor is it to suggest that it’s an impossible one. League of Legends recently(-ish) implemented their Mastery system, which purports to measure an individual’s performance with a given character and role. I’m replying merely to further the discussion by providing another point of view. The assumption made by the popular heuristic rating systems is that only outcomes matter in the long run; perhaps that’s the case here as well.

    Comment by Etomyutikos — 2 December, 2015 @ 6:55 PM

  7. Quick note that this got covered on Massively OP:

    I’ll respond to the new comments here sometime later.

    Comment by Psychochild — 3 December, 2015 @ 11:08 PM

  8. Sorry for the tardy responses here. The holiday season has kept me busy.

    Green Armadillo wrote:
    Perhaps I’m reading into what you wrote, but I think it is a misunderstanding to brand people who want to exclude others as “assholes” without considering why they are doing what they are doing (and how that colors the discussion of raid performance).

    Yeah, tone doesn’t carry over text so well. As I said on Massively, that was a bit of snark to try to direct the conversation. My goal was to avoid getting mired in a conversation about how measuring performance leads to the assholes excluding others. Obviously this didn’t work perfectly. But, I wanted to focus on more positive reasons for measuring performance.

    Really, us players are an unruly bunch with a bad habit of doing what you said and not what you meant, complaining about you all the while. Sorry about that?

    Part of the reason for writing this post was to bring some interesting things to light.

    Etomyutikos wrote:
    (Elo, TrueSkill, Glicko, …)

    When I talk about “individual measurement” I mean more an individual person. The problem with most of these systems is that they would measure a team as a whole. While I’m sure at the higher end you’d have static groups which could benefit from such systems, a person in a PUG may not want their overall measurement negatively affected because some random person in the party was not playing well enough to win.

    What I’d really like is some way to say “this group didn’t finish the encounter, but the tank performed very well overall.” Looking at sports, the best way to do that is probably with a whole battery of statistics so that you can say “the tank did X well, even if the team lost” just as you can say “the quarterback did X well, even if the team lost.”

    Comment by Psychochild — 29 December, 2015 @ 6:26 PM
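For reference, the outcome-only rating idea raised in comment 6 can be sketched as an Elo-style update in which the raid itself plays the role of the opponent. The expected-score and update formulas are the standard Elo ones; treating the raid as a rated opponent, shifting its difficulty by the gap between expected and actual party iLvl, and the constants `K_FACTOR` and `POINTS_PER_ILVL` are all assumptions made for illustration, not anything FFXIV or the commenter specified.

```python
K_FACTOR = 32.0         # assumed step size for each rating update
POINTS_PER_ILVL = 10.0  # assumed rating points per item level of gear gap


def expected_clear(player_rating: float, raid_rating: float,
                   expected_ilvl: float, party_ilvl: float) -> float:
    """Elo expected score for the player against a raid treated as an opponent.

    The raid counts as harder when the party is below the expected iLvl
    and easier when the party is above it (an assumed adjustment).
    """
    difficulty = raid_rating + POINTS_PER_ILVL * (expected_ilvl - party_ilvl)
    return 1.0 / (1.0 + 10 ** ((difficulty - player_rating) / 400.0))


def update_rating(player_rating: float, raid_rating: float,
                  expected_ilvl: float, party_ilvl: float, cleared: bool) -> float:
    """Return the player's new rating after one raid attempt."""
    outcome = 1.0 if cleared else 0.0
    expected = expected_clear(player_rating, raid_rating, expected_ilvl, party_ilvl)
    return player_rating + K_FACTOR * (outcome - expected)


# Made-up numbers: the same clear is worth more when the party is undergeared.
undergeared_clear = update_rating(1500, 1500, expected_ilvl=190, party_ilvl=180, cleared=True)
overgeared_clear = update_rating(1500, 1500, expected_ilvl=190, party_ilvl=210, cleared=True)
print(undergeared_clear - 1500, overgeared_clear - 1500)
```

This reproduces the behavior described in the comment: clearing content above your gear level moves the rating a lot, and clearing overgeared content moves it very little. It also shows the objection from the last reply, since every member of the party receives the same outcome no matter how well any one of them played.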

Posts Copyright Brian Green, aka Psychochild. Comments belong to their authors.
