High-performance computing patterns for C++20 including cache-friendly data structures, SIMD vectorization, memory management, thread parallelism, lock-free data structures, and NUMA-aware allocation.
View on GitHub: ysyecust/everything-claude-code
skills/hpc-patterns/SKILL.md
January 25, 2026
npx add-skill https://github.com/ysyecust/everything-claude-code/blob/main/skills/hpc-patterns/SKILL.md -a claude-code --skill hpc-patterns
Installation paths:
.claude/skills/hpc-patterns/
# HPC Patterns for C++20
Domain knowledge for building high-performance computing applications with optimal hardware utilization.
## Cache-Friendly Data Structures
### Structure of Arrays (SoA) vs Array of Structures (AoS)
```cpp
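#include <cstddef>
#include <cstdint>
#include <vector>

// BAD: Array of Structures (AoS). A loop that reads only positions still
// pulls velocity, force, mass, type, and active into cache with every element.
struct ParticleAoS {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity
    double fx, fy, fz;  // force
    double mass;
    int type;
    bool active;
};
std::vector<ParticleAoS> particles(N);  // N = particle count; stride = sizeof(ParticleAoS)

// GOOD: Structure of Arrays (SoA). Each field is contiguous, so a loop
// touches only the arrays it needs.
struct ParticlesSoA {
    std::vector<double> x, y, z;
    std::vector<double> vx, vy, vz;
    std::vector<double> fx, fy, fz;
    std::vector<double> mass;
    std::vector<int> type;
    // std::vector<bool> is bit-packed; one byte per flag keeps each element
    // individually addressable (safe for concurrent writes, friendlier to SIMD).
    std::vector<std::uint8_t> active;

    explicit ParticlesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n),
          fx(n), fy(n), fz(n), mass(n), type(n), active(n) {}
};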
```
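To make the benefit concrete, here is a minimal sketch of a kernel that touches only positions and velocities. On a typical 64-bit ABI `sizeof(ParticleAoS)` is 88 bytes, so a position-only pass over the AoS layout uses 24 of every 88 bytes it loads, while the SoA layout streams fully used cache lines. The names `PositionsVelocitiesSoA` and `AdvancePositions` are hypothetical and not part of the original skill; the struct is a trimmed-down stand-in for `ParticlesSoA`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down SoA container holding only the fields
// this kernel reads or writes.
struct PositionsVelocitiesSoA {
    std::vector<double> x, y, z;
    std::vector<double> vx, vy, vz;

    explicit PositionsVelocitiesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n) {}

    std::size_t size() const { return x.size(); }
};

// Advance positions by one explicit Euler step: pos += vel * dt.
// Each loop walks two contiguous arrays with unit stride, which is the
// access pattern compilers can usually auto-vectorize.
void AdvancePositions(PositionsVelocitiesSoA& p, double dt) {
    const std::size_t n = p.size();
    // All field arrays must have the same length.
    assert(p.y.size() == n && p.z.size() == n);
    assert(p.vx.size() == n && p.vy.size() == n && p.vz.size() == n);

    for (std::size_t i = 0; i < n; ++i) p.x[i] += p.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) p.y[i] += p.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) p.z[i] += p.vz[i] * dt;
}
```

Splitting the update into one loop per axis keeps each loop to two memory streams. Whether that beats a single fused loop depends on the target and compiler, so it is worth measuring.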
### Cache Line Alignment
```cpp
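#include <array>
#include <atomic>
#include <cstdint>

// Align to cache line boundaries (64 bytes is typical on x86-64; verify for your target)
struct alignas(64) CacheAlignedBlock {
    std::array<double, 8> data;  // 8 * 8 bytes = 64 bytes = 1 cache line
};

// Padding to avoid false sharing in multithreaded code
struct alignas(64) ThreadLocalCounter {
    std::atomic<std::int64_t> count{0};
    // alignas(64) already rounds sizeof up to 64; the explicit padding documents the intent.
    char padding[64 - sizeof(std::atomic<std::int64_t>)];  // Fill cache line
};
static_assert(sizeof(ThreadLocalCounter) == 64);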
```
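The padded-counter idea above can be sketched in use as follows: each thread increments only its own cache-line-sized slot, so no two threads write to the same line, and the slots are summed after all threads have joined. `PaddedCounter` and `CountInParallel` are hypothetical names introduced for illustration, and the 64-byte line size is an assumption that holds on most x86-64 parts but not on every architecture.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per cache line (64 bytes assumed). alignas pads sizeof to 64.
struct alignas(64) PaddedCounter {
    std::atomic<std::int64_t> count{0};
};
static_assert(sizeof(PaddedCounter) == 64);
static_assert(alignof(PaddedCounter) == 64);

// Spawn num_threads workers; each performs increments_per_thread increments
// on its own slot. Returns the sum of all slots.
std::int64_t CountInParallel(std::size_t num_threads,
                             std::int64_t increments_per_thread) {
    assert(increments_per_thread >= 0);

    // std::vector uses aligned allocation for over-aligned types since C++17.
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (std::int64_t i = 0; i < increments_per_thread; ++i) {
                // Relaxed is enough: only this thread writes slot t.
                counters[t].count.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() orders the writes before the reads below

    std::int64_t total = 0;
    for (const auto& c : counters) {
        total += c.count.load(std::memory_order_relaxed);
    }
    return total;
}
```

Removing `alignas(64)` leaves the result unchanged but packs eight counters into one line, which is the false-sharing case the padding is meant to prevent. The difference shows up only in timing, so compare both variants with a benchmark on the target machine.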
### Tiling for Cache Reuse
```cpp
// Cache-aware blocked (tiled) matrix multiply; tune block_size to the target cache
void MatMulBlocked(std::span<const double> A, std::span<const double> B,
std::span<double> C, int N, int block_size = 64) {
for (int ii = 0; ii < N; ii += block_size) {
for (int jj = 0; jj < N; jj += block_size) {
for (int kk = 0; kk < N; kk += block_size) {
int i_end = std::min(ii + block_size, N);
int j_end = std::min(jj + block_size, N);
int k_end = std::min(kk + block_size, N);
for (int i = ii; i < i_end; ++i) {
for (int k = kk; k < k_end; ++k) {
double a_ik = A[i * N + k];