Environment Variables
vllm-kunlun uses the following environment variables to configure the system:
| Environment Variables | *Recommended value* | *Function description* |
|---|---|---|
| | | *Unsets* |
| | | *Specify visible XPU Devices*. Here, 8 devices (0 to 7) are specified for inference tasks. This is required for multi-card or distributed inference. |
| | | Enables the MoE model *Sort Optimization*. Setting to |
| | | Enables the *Fast SwiGLU Ops*. SwiGLU is a common activation function, and enabling this accelerates model inference. |
| | | Enables the *Fast SwiGLU Ops*. Similar to |
| | | Enables XMLIR (an intermediate representation/compiler) to use the *cuDNN compatible/optimized path* (which may map to corresponding XPU optimized libraries in the KunlunCore environment). |
| | | Sets the XPU to use the default context. Typically used to simplify environment configuration and ensure runtime consistency. |
| | | *Forces the enablement of XPU Graph mode*. This can capture and optimize the model execution graph, significantly boosting inference performance. |
| | | *Sets the host IP address for the vLLM service*. This uses a shell command to dynamically get the current host’s internal IP. It’s used for inter-node communication in a distributed environment. |
| | | *Disable Mock Torch Compile Function*. Set to |
| | | *Control whether to use the Fused QK-Norm and RoPE implementation*. Default is |
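
As a rough illustration of the device-visibility and host-IP entries above, the following sketch exports both settings before launching the service. The variable names `XPU_VISIBLE_DEVICES` and `VLLM_HOST_IP`, the `hostname -I` lookup, and the loopback fallback are assumptions made for this example and are not taken from the table. Check them against the vllm-kunlun release you are running.

```shell
# Assumed variable names; verify them against your vllm-kunlun release.

# Expose eight XPU devices (0 to 7) to the inference process.
export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Take the first address the host reports; fall back to loopback if the lookup fails.
host_ip="$(hostname -I 2>/dev/null | awk '{print $1}')"
export VLLM_HOST_IP="${host_ip:-127.0.0.1}"

echo "devices=${XPU_VISIBLE_DEVICES} host_ip=${VLLM_HOST_IP}"
```

On a multi-homed node the first reported address may not be the interface used for inter-node traffic, so setting the address explicitly may be the safer choice there.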