Locality
Accelerate memory access by arranging data to take advantage of CPU caching.
While Moore’s law has been making processors faster, memory access has plateaued in speed: we can process data faster with each passing year, but we can’t get it much faster.
With today’s hardware, it can take hundreds of cycles to fetch a byte of data from RAM.
As memory access speeds stalled, hardware engineers found CPU caching to make memory access faster. Modern computers have memory inside the chip. It’s extremely small (sometimes called SRAM, for Static RAM, or L1 cache). It might hold 128 bytes. When the CPU seeks for data in RAM, it pulls the data you want along with its adjacent 128 bytes of data. If the next instruction operates on data in the cache, it’s called a cache hit – it already has the data it needs in its cache and is ready to move on. If it doesn’t, it has to go back to RAM, which can take a few hundred cycles:
If we get a cache hit, our code looks like this:
for (const auto& item: items) {
.doStuff();
item}
If we get a cache miss, it’s more like this:
for (const auto& item: items) {
();
sleepFor500Cycles.doStuff();
item}
It goes without saying that we want to get rid of that
sleepFor500Cycles
call.
The author wrote the same programs, but with one missing the cache and the other hitting the cache. The result? The programs that hit the cache were 50x faster.
Data is performance.
This is why people prefer arrays to lists – the next item of a list might be in a random place in memory, which causes a cache miss while iterating. A vector contiguously lays out its items, resulting in fewer cache misses.
Modern CPUs have caches to speed up memory access adjacent to recently accessed memory. Keep the data in contiguous memory in the order that you process it.
You want to use it to speed up memory accesses, but profile before you employ this technique. Cachegrind is a good profiling tool to check for cache usage.
In C++ and higher level languages, abstractions generally cost pointer accesses. A pointer access is almost always a cache miss. Thus, you’ll have to sacrifice some inheritance, interfaces, and other niceties to use this pattern. It’s a tradeoff of simplicity for performance.
Instead of doing something like this:
class GameEntity { /* Implementation here */ }
class AIComponent {}
class PhysicsComponent {}
class RenderComponent {}
while (!gameOver) {
// Process AI.
for (int i = 0; i < numEntities; i++) {
[i]->ai()->update();
entities}
// Update physics.
for (int i = 0; i < numEntities; i++) {
[i]->physics()->update();
entities}
// Draw to screen.
for (int i = 0; i < numEntities; i++) {
[i]->render()->render();
entities}
// Other game loop machinery for timing...
}
Which thrashes the cache, we’ll put everything in an array of
AiComponents
, PhysicsComponents
and
RenderComponents
.
* aiComponents = new AIComponent[MAX_ENTITIES];
AIComponent* physicsComponents = new PhysicsComponent[MAX_ENTITIES];
PhysicsComponent* renderComponents = new RenderComponent[MAX_ENTITIES]; RenderComponent
Instead of using pointers, we’ll directly update each.
while (!gameOver) {
// Process AI.
for (int i = 0; i < numEntities; i++) {
[i].render();
aiComponents}
// Update physics.
for (int i = 0; i < numEntities; i++) {
[i].render();
physicsComponents}
// Draw to screen.
for (int i = 0; i < numEntities; i++) {
[i].render();
renderComponents}
// Other game loop machinery for timing...
}
We can do the same with our Particles too:
class Particle {}
Instead of adding a flag if its active and checking on that flag:
for (int i = 0; i < numParticles_; i++) {
if (particles_[i].isActive()) {
particles_[i].update();
}
}
We sort the active particles to the beginning of the buffer (or we have two sets, the active and inactive)
Then, we have no branching at all:
for (int i = 0; i < numActive_; i++) {
[i].update();
particles}
What happens if you have a game entity, where it has some commonly used fields and some uncommonly used fields?
class AIComponent {
public:
void update() {}
private:
// commonly used
* animation_;
Animationdouble energy_;
goalPos_;
Vector // uncommonly used
drop_;
LootType int minDrops_;
int maxDrops_;
double chanceOfDrop_;
};
Split it into two separate pieces, and point to the cold part with a pointer:
class AIComponent {
public:
// Methods...
private:
* animation_;
Animationdouble energy_;
goalPos_;
Vector
* loot_;
LootDrop};
class LootDrop {
friend class AIComponent;
drop_;
LootType int minDrops_;
int maxDrops_;
double chanceOfDrop_;
};
This can be helpful, but for less clear cut cases, less useful.
This pattern is about a mindset, and applying it to code that’s slow.
Don’t
Use Separate arrays for each type
Use a collection of pointers
If game entities are classes with pointers to their components
If game entities are classes with IDs for their components
If the game entity itself is just an ID
Prev: service-locator Next: dirty-flag