Mikołaj Koziarkiewicz
Figure 1. Source: SD 2.1+own work.

Spring has sprung in full force in the Northern hemisphere, so it’s time to dust off the old "submit publication" button! In this entry, we’ll look into ways of extracting training data from computer-generated data sources without breaking too much of a sweat. Specifically, we’ll introduce the data source we’ll work on, and define a problem statement. This will form a basis for later blog entries that will tackle various aspects of said problem. Read on!


One of the interesting aspects of our times is the increasing importance of data derived from "human-made" sources. Early automation was almost always grounded primarily in physical reality. As time went by, systems built upon data delivered by other systems became more worthwhile: analysis of social media trends, electronic sales projections, and so on. And, well, there’s also lots of stuff built on top of data originating from video games. Anti-cheat systems, usage analytics – used both for objectively valuable insights, like improving accessibility, and less noble ones, like "optimizing" microtransactions. With the normalization of gaming as a pastime for all ages, there’s a gold mine of opportunity here. And we’ll take some advantage of it; but first, a bit more about our subject.

The game in question is MechWarrior Online, subsequently referred to as MWO. As the name suggests, it is a multiplayer-only affair set in the Battletech/MechWarrior universe. It offers several game modes, virtually all of them centered around two teams fighting each other to destruction, while also trying to balance auxiliary objectives.

Figure 2. Promotional screenshot of MechWarrior Online. An Ebon Jaguar heavy mech engaging a target (possibly an armless Bushwacker) with support weapons. Source: Piranha Games.

Everyone pilots a Battlemech (no combined arms here): a customizable, heavily armed, ludicrously armored, single-seat, bipedal[1] combat vehicle. While it sounds like a First-Person Shooter with some sci-fi bling thrown on, that couldn’t be further from the truth. Mechs have multiple, independently destroyable components, each potentially housing multiple armaments and pieces of equipment. Players need to manage ammunition, heat generated by their own (or hostile[2]) weapon fire, damage distribution, terrain use, team coordination, and so on.

The general vibe feels less like a squad-based FPS, and more like something between naval combat and simultaneously controlling an armored platoon. Mad aiming skillz are much less important here than situational awareness, forethought, planning, and team coordination. Not only that, but the mechs themselves are – as previously stated – highly customizable, to the point that a significant amount of a player’s time is spent tweaking and trying out different weapon/equipment configurations. All this coalesces into a unique experience, and so a unique data landscape to work on.

Possible premises and our choice

Having explained what we have on our hands, let’s see what we can do with the source material. Since many players occasionally record their games for later analysis (and some for streaming), the original idea was to create a "virtual coach" that would, based on said recordings, call out possible improvements in the player’s style.

Positioning (w.r.t. friendlies and likely enemy placement) is probably the most important learnable skill in MWO – but the full data required for it (known friendly and enemy positions at a given point in time) is difficult to extract from existing footage alone. On top of that, developing the model itself would be decidedly non-trivial. In other words, assisting positioning in MWO is an intriguing challenge, but complex enough to warrant a whole separate series.

So – at least for this blog series – let’s try something more manageable: situational awareness. With all the aspects of managing a mech occupying the player brain’s processing power, slipping up and being completely oblivious to a hostile mech[3] running through the field of vision is surprisingly common. Such mistakes can easily prove fatal, as the overlooked opponent can get behind the player’s mech and start tearing it apart. Moreover, the initial situation often constitutes an error on the opponent’s part, and would be a prime opportunity to engage.

Having contextualized our circumstances, we now need a problem statement with specific requirements and goal conditions. Here it is:

Develop a model that detects in-view enemies that could have been targeted, but weren’t, and marks them on the footage. Bonus goal: mark situations where the player was not actively engaging any target, but could.

OK, looks good and relatively manageable. From our problem statement, we can work out that we need some way of determining:

  1. the positions of mechs in a given frame of the footage;

  2. whether a given mech is friendly or hostile;

  3. whether a given hostile mech is "non-targeted";

  4. whether the player is in a "non-engaging" state.

We can also declare a couple of non-functional requirements:

  • our solution does not have to be especially performant – we’re operating on pre-recorded footage;

  • our solution should weight recall over precision – we’re fine if, e.g., a friendly mech is falsely marked as a "non-targeted" hostile mech, as the mistake should be apparent from the context of the footage, and the recommendation simply discarded.

Now, we’ll take a brief look at the game’s UI to determine what, exactly, we want to train.

Identifying screen UI elements relevant for machine learning model training

Let’s examine some screenshots, enhanced with context markings. Examples are demonstrated in the slideshow below:

  1. Targeting reticles: yes, mechs have multiple ones, corresponding to different weapon types and their hardpoint locations.

  2. Unmarked hostile mech.

  3. Target designator on a marked, hostile mech: the player can mark at most one hostile mech at a time.

  4. Targeted mech’s scan result, showing weapon and component status[4].

This is only some of the information the screen presents to the player, but it is pretty much all we need for our purposes.

In the end, we have two types of data to extract from the video frames:

  • UI-derived information, such as weapon firing state,

  • detectable objects.

While the former is extractable using simple image manipulation and "classic" computer vision techniques, this is not so with detectable objects, i.e. the mechs. For that, we need to train some sort of object detection model.
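
To give a taste of the "classic" side, here’s a minimal sketch of such a UI check, roughly in the vein of what we’ll be doing later in the series. Everything in it is illustrative: the region coordinates, the threshold, and the reference image are stand-ins, not values taken from the actual MWO HUD.

import cv2

# Illustrative sketch: detecting a binary UI state (e.g. "weapons firing")
# by template-matching a fixed HUD region against a reference crop.
# The filename, coordinates, and threshold are placeholders.
TEMPLATE = cv2.imread("firing_indicator_template.png")

def indicator_active(frame, threshold=0.9):
    region = frame[950:1000, 1700:1850]  # assumed location of the indicator
    score = cv2.matchTemplate(region, TEMPLATE, cv2.TM_CCOEFF_NORMED).max()
    return score >= threshold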

We could go over each recording and meticulously mark each and every mech. But who has the time (or money to hire annotators) for that?

We might consider "traditional" motion detection techniques, used widely in consumer IP cameras (and explained in a myriad of online tutorials), but that option also falls flat. Why? Because both the objects and the camera are moving – sometimes quite vigorously. So that’s one possible free lunch out of reach. We will, however, consider the potential to exploit research into movement detection on mobile cameras, but that’ll come later on.
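
To illustrate why, consider the classic frame-differencing approach (a deliberately naive sketch; the frame filenames are placeholders):

import cv2

# Naive motion detection: difference two consecutive frames and threshold.
# With a static camera, the resulting blobs are the moving objects; with
# MWO footage, camera motion makes nearly every pixel "move", so the mask
# lights up everywhere and the mechs drown in the noise.
prev_frame = cv2.cvtColor(cv2.imread("frame_0001.png"), cv2.COLOR_BGR2GRAY)
curr_frame = cv2.cvtColor(cv2.imread("frame_0002.png"), cv2.COLOR_BGR2GRAY)

diff = cv2.absdiff(curr_frame, prev_frame)
_, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
print(f'share of "moving" pixels: {cv2.countNonZero(motion_mask) / motion_mask.size:.0%}')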

Now, take another look at the screenshot: see how the hostile mech is nicely marked[5]? And how about that nice bracketing of the actively targeted mech? Almost like a bounding box, right?

Well, it looks like we have a way out – we’ll try to automatically extract detection boxes by annotating targeted hostile mechs as objects to be detected. We can use that data as inputs for subsequent training of our "primary" detection models.
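
Here’s a minimal sketch of what that extraction could look like, assuming the designator brackets can be isolated by color. Note that the HSV range below is a made-up placeholder – the real values would have to be sampled from actual footage.

import cv2
import numpy as np

# Sketch: isolate the target designator by color, then take the bounding
# rectangle of all matching pixels as the training annotation.
DESIGNATOR_LO = np.array([0, 150, 150])   # assumed HSV lower bound
DESIGNATOR_HI = np.array([10, 255, 255])  # assumed HSV upper bound

def designator_bbox(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, DESIGNATOR_LO, DESIGNATOR_HI)
    if cv2.countNonZero(mask) == 0:
        return None  # no target designated in this frame
    ys, xs = np.nonzero(mask)
    # bounding box as (x, y, width, height)
    return xs.min(), ys.min(), xs.max() - xs.min() + 1, ys.max() - ys.min() + 1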

Figure 3. For completeness, a screenshot that’s more representative of situations in the game – an evidently more complex scene. The view is unzoomed, meaning the full interface, including the quasi-diegetic cockpit displays, is visible. The player and two teammates are engaging an (unlucky) hostile mech that just rounded a corner. The opponent’s readout is showing damage to the torso components, including actively receiving fire to the central portion. PII has been edited out.


In this entry:

  • we’ve introduced the use case we’re going to handle in the blog series initiated by this entry: extracting training data from video games, and putting it to use.

  • We’ve chosen MechWarrior Online (MWO) as our exemplary data source.

  • We’ve also examined the automation problem landscape in MWO:

    • we considered several potential premises, such as assisting in player positioning,

    • but, for the immediate future, settled on a more manageable problem: helping with situational awareness.

  • Finally, we also identified the screen elements that are relevant for our model training.

In the next several entries of the series, we’ll explore how to obtain training data for the defined task, by way of identifying and extracting the relevant UI elements. We’re going to use, and eventually compare, several different methods to accomplish this. And yes, that means we’ll actually start writing some code. Stay tuned!

1. Battletech fans will be quick to note that the last two points aren’t always the case in the setting, but it is in MWO, so you can safely ignore them (unless they’re in your rear arc).
2. …​or sometimes friendly…​
3. Especially a small and fast one that is equipped for stealth.
4. This one in particular is at full health.
5. That’s the weight class marker. The cross-hatched diamond signifies the "assault" class, the heaviest one in MWO.
Mikołaj Koziarkiewicz


Probably one of Python’s most "wow, this is great!" features for newcomers is destructuring in assignments, whether of tuples, lists, or – indeed – any iterables:

a, b = ("a", "b")
[zero, one, two] = range(3)

For the longest time, this destructuring, along with comprehensions, was, sadly, all that Python had to offer in terms of "structurally-oriented" syntactic sugar. In contrast, Scala’s pattern matching, while arriving a bit later in the game, has always been more feature-packed and functional.

Fortunately for Python, it has eventually been extended with structural pattern matching, which brings it almost up to the expressive power of Scala’s pattern matching. "Almost" for several reasons, the largest being that structural pattern matches are statements, not expressions. Why this was done is actually addressed in the PEP; personally, the argument – possibly paraphrased as "it wouldn’t be Pythonic" – is as lost on me as the opportunity that was available here.

In any case… this post is not about structural pattern matching, but about closing another expressive-power gap between Python’s and Scala’s destructuring semantics – something that became possible relatively recently, with Python 3.8.

By the way: this is definitely a shorter entry, written simply because I stumbled upon the relevant problem (and solution) while working on a considerably longer series of blog posts. Hopefully, those will start appearing this quarter – but until then, let me show you something…

Schrödinger’s destructuring

We’ll start with an example in Scala[1]. Let’s say we have a three-element list, and we want to assign said elements to separate variables. We end up with something like this:

val List(one, two, three) = List(1,2,3)

The code above maps easily to Python, with the val keyword (denoting final, i.e. non-reassignable, variables) and the syntax for defining a list being the only two differences. Indeed, the corresponding Python code looks like this:

[one, two, three] = [1, 2, 3]

OK, let’s say we now want to store both the individual elements and the entire list. In Scala, this is done easily enough with the addition of an @ binding:

val result@List(one, two, three) = List(1,2,3)

// In the console, this prints out:
// result: List[Int] = List(1, 2, 3)
// one: Int = 1
// two: Int = 2
// three: Int = 3

So what about Python? Until a couple of years ago, all one could do was something like this:

result = [1, 2, 3]
[one, two, three] = result

Not exactly the end of the world, but a bit annoying.

Things have changed with the introduction of assignment expressions, which you may be familiar with by way of :=, i.e. the walrus operator[2]. As the name suggests, these are simply assignments (=) that are also expressions – meaning they evaluate to some value, unlike the standard assignment statement.
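
A quick illustration of the "evaluate to some value" part:

import random

# The assignment produces a value, so it can be used directly in a condition:
while (roll := random.randint(1, 6)) != 6:
    print(f"rolled a {roll}, trying again...")
print("finally, a 6!")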

Let’s have a first go at trying to use an assignment expression in our test case:

result = ([one, two, three] := [1, 2, 3])

The snippet emulates Scala’s ordering of assignments, and looks like it could work, except the interpreter prints out:

  File "<stdin>", line 1
    result = ([one, two, three] := [1, 2, 3])
SyntaxError: cannot use assignment expressions with list

The same (or similar) gets printed out for tuples, sets, and so on. Using destructuring in an assignment expression is simply not supported. Not all is lost, however. All we have to do is reverse the sequence of assignments:

[one, two, three] = (result := [1, 2, 3])

And this works! Here’s the corresponding, verifying console output:

>>> one
1
>>> two
2
>>> three
3
>>> result
[1, 2, 3]

Collapsing the wave function

To summarize – for a given value, what we wanted is to assign:

  • that full value, and

  • a destructured version of the value,

  • in one step.

Our solution boils down to using an assignment expression together with good-old destructuring, represented by the general pattern below:

<destructuring_vars> = (<whole_value_var> := <value_to_destructure>)

Again, this is possible in any version of Python starting from 3.8, inclusive.

One important remark is that, as in the "old" two-liner version, we merely reference the original value within the assignment expression – result ends up pointing to the very same object. That identity has consequences when the value is mutable, such as the list in the example below:

org_value = [1, 2, 3]
[one, two, three] = (result := org_value)

org_value.append(4)  # mutating the original...
print(result)        # ...is visible through `result`
# prints:
# [1, 2, 3, 4]

This might be a disadvantage when rewriting code where, originally, the "whole result" is reconstructed from the destructured, constituent parts, like so:

[one, two, three] = [1, 2, 3]
result = [one, two, three]

creating a brand-new list. In other words, be wary of the quirk when rewriting code to this pattern.

Now, to get to a final talking point, one that’s probably on a number of readers’ minds: should this kind of "trick"/pattern be used at all?

For starters, I am definitely not a "pure" Python programmer (nor, TBH, do I aim to be one), and so am no authority on whether something is "Pythonic" or not.

Otherwise, it can be argued that this pattern sacrifices readability for the sake of terseness – but the same could be said about any use of the assignment expression. The syntax in question is indeed a very sharp tool in the developer’s drawer, able to muddle up the codebase if abused. However, it still exists for a reason, and that reason is that sometimes these "shortcuts" do improve readability, by keeping the assignment close to the other, relevant parts of the code.

For me specifically, the need arose when I was transforming OpenCV’s bounding boxes. The BBs are encoded as 4-tuples[3]; I had a scenario where I needed to process both the meaningfully-named, individual components of a BB, and the entire BB tuple.
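
Roughly, the code had the following shape (with made-up data and names – this is a reconstruction for illustration, not the original function):

import cv2
import numpy as np

# A stand-in contour; in the real code, it came from earlier processing.
contour = np.array([[10, 20], [50, 20], [50, 80], [10, 80]], dtype=np.int32)

# Both the meaningfully-named components and the whole 4-tuple, in one step:
(x, y, w, h) = (bbox := cv2.boundingRect(contour))

aspect_ratio = w / h       # the named parts are handy for geometry...
print(aspect_ratio, bbox)  # ...while the intact tuple can be passed along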

The function in question contained fewer than 10 LoCs, so the call was, in my mind, simple. And this is probably the answer to the question posed above: when following general software engineering practices (including considering your team’s style and abilities), it’s OK to use the pattern whenever necessary and practical.

Hope this window into the possibilities offered by Python’s expanding syntax will prove useful to at least some readers. Happy coding!

PS. To be completely clear: I categorically do not claim "discovery" of this pattern, and I frankly strongly doubt I was the first to describe it. I simply haven’t seen – or don’t remember seeing – it being discussed anywhere, hence this blog entry :).

1. Don’t worry, the examples in Scala will be simple and completely explained.
2. To finalize my rant from the intro, note how this PEP – introducing a brand-new way of "converting" what was always a statement in Python to an expression – was defined and implemented much earlier than the structural pattern matching one.
3. X, Y, width, height