Description
Key Skills: GPU Architecture, Performance Modeling, C++, Python, Hardware Simulation, Silicon Characterization, GPU Optimization, Performance Analysis, Pre-silicon Modeling, Post-silicon Debug
Good to Have Skills: Experience with AI/LLM workloads, graphics pipelines, compute shaders, ray tracing, rasterization, Transformer models, large language models, vision models, PyTorch, Triton, Linux tracing frameworks, performance counters, register dumps, PCIe, SIMD/SIMT models, cache hierarchy management, memory technologies, high-bandwidth interconnects, parallel training/inference strategies, kernel execution graphs
Roles & Responsibilities:
- Use and maintain modular, high-fidelity cycle-accurate or cycle-approximate simulators for key GPU subsystems, including Shader Engines, Cache Hierarchies, Memory Subsystems, and Interconnects.
- Define and execute rigorous simulation experiments to evaluate proposed GPU configurations, scaling limits, and trade-offs, and provide data-driven recommendations backed by thorough sensitivity analyses.
- Trace, analyze, and profile complex workloads to extract their execution footprints, and use these insights to pinpoint microarchitectural bottlenecks.
- Profile and optimize performance for advanced generative AI and LLM workloads, identifying bottlenecks across the compute engine, local memory hierarchy, and SoC fabrics.
- Analyze execution efficiency across graphics shaders and compute-heavy pipelines to maximize execution unit utilization and minimize latency.
- Partner with compiler, runtime, and software framework teams to recommend and implement performance optimizations.
- Lead efforts to execute workloads on early silicon, capture performance telemetry, and systematically correlate results back to pre-silicon performance models.
- Root-cause hardware-software execution mismatches and unexpected performance drops on physical silicon using low-level performance counters and debugging tools.
- Act as the technical bridge between hardware design and software stacks, translating high-level workload requirements into clear hardware architectural constraints.
Experience Required: 5 to 10 years of industry experience in silicon performance engineering, GPU modeling, or microarchitecture design.
Education: B.E. / B.Tech. / M.Tech. in a relevant engineering discipline