DeepSeek V4 Flash deployment guide for Ascend:
docs.vllm.ai/projects/ascend/en/v0.13.0/tutorials/DeepSeek-V4.html
Launch parameters:
- max-model-len=8192
- max-num-batched-tokens=8192
- max-num-seqs=16
- tp=8
- expert-parallel=on
- quantization=ascend
- speculative decoding: deepseek_mtp, num_speculative_tokens=1
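
As a rough sketch, the parameters above could map onto a `vllm serve` command roughly like this. The flag mapping is my assumption (tp to `--tensor-parallel-size`, expert-parallel to `--enable-expert-parallel`, the speculative settings to a JSON `--speculative-config`), and the model path is a hypothetical placeholder since the post does not give one. Check the linked guide for the exact command before running anything.

```python
import json
import shlex

# Hypothetical placeholder: the post does not give a model path or name.
MODEL_PATH = "/path/to/model"


def build_serve_command(model: str) -> list[str]:
    """Build a `vllm serve` argv from the post's shorthand (assumed flag mapping)."""
    # Speculative decoding settings, passed to vLLM as a single JSON string.
    speculative = {"method": "deepseek_mtp", "num_speculative_tokens": 1}
    return [
        "vllm", "serve", model,
        "--max-model-len", "8192",
        "--max-num-batched-tokens", "8192",
        "--max-num-seqs", "16",
        "--tensor-parallel-size", "8",   # tp=8
        "--enable-expert-parallel",      # expert-parallel=on
        "--quantization", "ascend",      # quantization=ascend
        "--speculative-config", json.dumps(speculative),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(build_serve_command(MODEL_PATH)))
```

This only prints the command, it does not start a server. Multi-node settings are left out because the post does not list them.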
Someone has already deployed it successfully on 910B2 with 2x8 TP.
