Description
Key Skills: C, C++, Python, Shell Scripting, AI Model Inference, AI Compiler, TVM, TensorRT, XLA, Glow, OpenVINO, Quantization, AI Deployment, Embedded Systems, ONNX, PyTorch, TensorFlow, FPGA, Runtime Optimization, Performance Profiling, Embedded Boards
Roles & Responsibilities:
- Develop and optimize AI model inference pipelines for embedded platforms
- Work on AI model compilation, runtime execution, and embedded software stack integration
- Optimize performance, latency, and resource utilization for AI workloads on embedded devices
- Integrate and work with AI compilers such as TVM, TensorRT, XLA, Glow, or OpenVINO
- Implement quantization techniques and evaluate their accuracy-performance tradeoffs
- Debug and resolve performance bottlenecks across compiler, runtime, and model layers
- Support AI deployment stacks and optimize inference execution workflows
- Work with ONNX, PyTorch, and TensorFlow model export and inference pipelines
- Run, profile, and benchmark AI models on embedded boards and FPGA platforms
- Analyze profiling and performance data to improve runtime efficiency and model behavior
- Collaborate with cross-functional engineering teams to develop embedded AI solutions
- Maintain technical documentation and support continuous optimization initiatives
Experience Required:
- 4+ years of experience in Embedded AI, AI Compiler, or AI Runtime Engineering
- Strong proficiency in C, C++, Python, and Shell Scripting
- Deep understanding of the AI inference flow, from model compilation to runtime execution
- Hands-on experience with AI compilers such as TVM, TensorRT, XLA, Glow, or OpenVINO
- Experience with quantization and inference optimization techniques
- Knowledge of ONNX, PyTorch, and TensorFlow model export and inference pipelines
- Hands-on experience with embedded boards or FPGA platforms
- Strong analytical, debugging, and performance optimization skills
Education: Bachelor's degree in any discipline