11
Sep

# Performance Optimization, SIMD and Cache

Hello! My name is Sergiy Migdalskiy. I’m a programmer at Valve. I’m going to talk about Performance Optimization Strategies today This is a 10’000 foot overview of the little things that have big impact on performance. I will never forget my first computer. Like the Soviet reality behind my window, it was very austere. It didn’t even have a real keyboard. It felt very fast.
can pipeline well with computations, some gather-scatter activity can hide behind the math The 1 op per byte rule is handy here. If you limit how much you load, you also limit how much you shuffle But it’s still best to avoid it as much as you can. The inputs and outputs of SIMDized algorithm are good candidate for gather-scatter Because it’s usually unreasonable to build an API with pre-interleaved structures. But when I have full control over design of data structure, and API is not an issue, [and perf is] I interleave the data as much as I can. After I do this, I replace all operations with their SIMD equivalents. This is the part where I actually start using SIMD operations. Sometimes I templatize code and data by type float. This function computes an approximate Givens rotation If you pass float4 as template argument, it will compute 4 Givens rotations. I wouldn’t recommend it if you are not very familiar with SIMD. Also, there’s a tool that does it for you, it’s called ISPC I take the scalar data structures, and I replace all types and operations with their SIMD equivalents. Float becomes 4 floats, int becomes 4 ints and so on. And it just works! This is an example of a scalar-path vs SIMD-path struct in a cloth solver. Of course, when you compute 4 of something in parallel, there better be no dependencies between those 4 things. It takes some forethought. Some problems are notoriously hard, for example some matrix factorizations. And some are easy, like this one Here, one quad is solved per lane as a single block. Same with rods. I just make sure the quads and rods don’t touch, that’s all. Works like a charm. So, to SIMDize something, you need 4 or 8 items that need similar processing, and they must be independent of each other There’s a myth I’ve heard: that SIMD is always a premature optimizatio You shouldn’t write SIMD right away but when designing data structures and algorithm without SIMD in mind it’s hard to separate data into independent homogenous lanes cleanly afterwards. If you don’t do that cleanly, you have to waste time shuffling and preparing data which destroys the benefits of SIMDizing. There’s no easy recipe for SIMDizing 4 things with differing computations. But here are a couple of recipes of how to handle branches. First is the most commonly used and that’s what happens on GPUs. If you have a condition, you execute both the “if” and “else” branches for all elements, and then select one of them. Everything that’s not selected is wasted computation but it can easily be faster than running the scalar path with branches Here’s a another branch trick. You just sort elements by conditional. One list for “if”, one list for “else”. You can even SIMD sort it 8 at a time, without branches. Then you run “if” and “else” code only on elements that need it. No wasted computations. Obviously, the same caveats about ordering and dependent elements apply. This is harder to pull off, but it’s a much more powerful technique. You’ll often have almost 4x perf with SIMD Or 8 times in case of AVX. That is, if your problem is separable And if you lay out your data right. And if you plan for SIMD ahead of time, both code and data SIMDization is very problem dependent. Compilers are very bad at auto-vectorizing nontrivial code. It’s very hard to SIMDize something that wasn’t meant for it. Or if the problem is not separable by itself. But 2/3 of CS:GO players have CPUs with AVX support. So the future is bright for SIMD techniques. I’d like to extend the Special Thanks to my art team. Which is :3D Art by Anna Bibikova And 2D Art by Heather Campbell Thank you very much!

• iestyn Bleasdale-Shepherd says:

Bravo! Best animations for a technical talk that I have seen 🙂

• Paulo Zaffari says:

Really good presentation.
All who want to work as engine programmers should see it.
It wouldn't be bad if game programmers saw it too.

• Andy Robbins says:

Great job!

• Sigi says:

Wonderful easy to understand video with nice visualisation. Now when do you do a follow up talk on the hard parts? 🙂

• Alexandru Ene says:

A really nice and informative presentation. Thanks for taking the time and recording this :).

• Matthew Booth says:

Really interesting talk by a valve developer on the importance of optimising for cache. Python must make CPU developers cry.

• Wesley Robb says:

Great presentation. Thank you for taking the time to share these valuable insights and your experience with us! Some 5 min talks on specific topics would be awesome.

• Trogzul says:

Must say, great video. Especially for me as someone who is studying game programming, this was very enlightening, and much easier to follow than a lot of the other stuff I have read/watched. So, thank you very much 🙂

• András Gajdács says:

Someone got something working @31:56 😀

• Gordon Hanzmann-Johnson says:

31:50 "So we talked about how CPU works and then about a weird little trick that can help you lose 32 bytes in a day or less." haha!

Thanks for the video, I learned a lot.

• Gordon Hanzmann-Johnson says:

31:50 "So we talked about how CPU works and then about a weird little trick that can help you lose 32 bytes in a day or less." haha!

Thanks for the video, I learned a lot.

• Валентин says:

Сергей, наверное круто работать в valve..

• Tamas Ionut says:

@Sergiy Migdalskiy @19:30 How can you check if a cache line is being used by another core? Is there a way to check for the MESI/MOESI/MESIF bits?
PS: Great presentation, btw.

• Sandeep Chandappillai says:

Highly Enlightening !
Thank you .

• Popescu Alexandru-Cristian says:

Great talk. I especially liked the relative pointers and the Big O sections.

• Dieter Plaetinck says:

Wow, thanks for sharing, Sergiy and Valve. I'm a big fan of valve games.

• Олег says:

Thanks, really useful. Is any chance to see more such kind of videos in future?

• YouTube SEO Agency says:

Fun vids, you make my day.
Keep creating vids, need more like you

• Hydroper Hy says:

Thanks 😀 This will help me understand SIMD.

• SAS1122334455 says:

классное видео
pls do more like this!

• Jasper Law says:

Thank you so much! This is not only invaluable, but also really, really interesting.

• Bestmann3n says:

Do you have any recommended in-depth resources for the topics you covered? Great talk by the way, thank you!

• Mike Garcia says:

Good talk!

• satellite964 says:

teach us master!