A new 1,000-page O'Reilly book, "AI Systems Performance Engineering," has reached #1 in Amazon's "computer hardware & architecture" category. It is also the longest book O'Reilly has ever published, offering a comprehensive look at how hardware and software work together in real-world AI systems and tackling performance pain points that are rarely discussed yet have far-reaching impact. Key topics covered include:
1. Introduction and AI System Overview
2. AI System Hardware Overview
3. OS, Docker, and Kubernetes Tuning
4. Tuning Distributed Networking Communication
5. GPU-based Storage I/O Optimizations
6. GPU Architecture, CUDA Programming, and Maximizing Occupancy
7. Profiling and Tuning GPU Memory Access Patterns
8. Occupancy Tuning, Warp Efficiency, and Instruction-Level Parallelism
9. Increasing CUDA Kernel Efficiency and Arithmetic Intensity
10. Intra-Kernel Pipelining and Cooperative Thread Block Clusters
11. Inter-Kernel Pipelining and CUDA Streams
12. Dynamic and Device-Side Kernel Orchestration
13. Profiling, Tuning, and Scaling PyTorch
14. PyTorch Compiler, XLA, and OpenAI Triton Backends
15. Multi-Node Inference Parallelism and Routing
16. Profiling, Debugging, and Tuning Inference at Scale
17. Scaling Disaggregated Prefill and Decode
18. Advanced Prefill-Decode and KV Cache Tuning
19. Dynamic and Adaptive Inference Engine Optimizations
20. AI-Assisted Performance Optimizations
The author previously held engineering leadership roles at AWS and Databricks, and the book has drawn strong praise from industry experts. From hardware tuning, operating systems, and container orchestration to GPU programming and PyTorch optimization, it examines every layer of modern AI system performance, offering practical guidance for organizations pursuing technical breakthroughs and cost savings. Pre-orders are now open, the print edition ships next month, and O'Reilly subscribers can already read the e-book.
Original tweet: x.com/Hesamation/status/1980758359226593518
The book not only answers nearly every question about modern AI system performance, but also offers fresh perspective for readers following the shifting economics of AI.
