Supported Models

Generative Models

Model

Support

INT8(W8A8)

AWQ(W4A16)

GPTQ(WNA16)

LoRA

Tensor Parallel

Expert Parallel

Data Parallel

Kunlun Graph

Qwen2

✅

✅

✅

✅

✅

✅

✅

✅

Qwen2.5

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3-MoE

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3-Next

✅

✅

✅

✅

✅

✅

✅

✅

MiMo-V2-Flash

✅

✅

✅

✅

✅

✅

✅

Llama2

✅

✅

✅

✅

✅

✅

✅

✅

Llama3

✅

✅

✅

✅

✅

✅

✅

✅

Llama3.1

✅

✅

✅

✅

✅

✅

✅

gpt-oss

✅

✅

✅

✅

✅

GLM4.5

✅

✅

✅

✅

✅

✅

✅

GLM4.5Air

✅

✅

✅

✅

✅

✅

✅

GLM4.7

✅

✅

✅

✅

✅

✅

✅

GLM5

✅

✅

✅

✅

✅

✅

✅

Kimi-K2

✅

-

✅

-

✅

✅

✅

DeepSeek-R1

✅

✅

✅

✅

✅

✅

✅

DeepSeek-V3

✅

✅

✅

✅

✅

✅

✅

DeepSeek-V3.2

✅

✅

✅

✅

✅

✅

✅

Multimodal Language Models

Model

Support

INT8(W8A8)

AWQ(W4A16)

GPTQ(WNA16)

LoRA

Tensor Parallel

Expert Parallel

Data Parallel

Kunlun Graph

Qwen2-VL

✅

✅

✅

✅

✅

✅

✅

✅

Qwen2.5-VL

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3-VL

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3-VL-MoE

✅

✅

✅

✅

✅

✅

✅

✅

Qwen3-Omni-MoE

✅

✅

✅

✅

✅

✅

✅

✅

InternVL-2.5

✅

✅

✅

✅

✅

✅

✅

InternVL-3.5

✅

✅

✅

✅

✅

✅

✅

InternS1

✅

✅

✅

✅

✅

✅

✅