llama.cpp
llama.cpp is an open-source LLM inference engine. It can download and run large language models (LLMs) locally on your machine.
Install
sudo pacman -Syu base-devel cmake gcc python rocm-hip-sdk rocm-opencl-sdk rocm-opencl-runtime rocm-ml-libraries rocm-device-libs
sudo usermod -a -G render,video $(whoami)
sudo reboot
rocminfo
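rocminfo prints an agent block per device whose Name field carries the gfx architecture string (e.g. gfx1030), which is what the overrides below key on. A minimal sketch of extracting it; the sample line is hard-coded here so the extraction can run without a GPU, while the real command is shown in the comment:

```shell
# Real check (requires a working ROCm install):
#   rocminfo | grep -oE 'gfx[0-9a-f]+' | head -n1
# Same extraction against a hypothetical sample line from rocminfo output:
sample='  Name:                    gfx1030'
gfx=$(printf '%s\n' "$sample" | grep -oE 'gfx[0-9a-f]+')
echo "detected architecture: $gfx"
```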
pamac install llama.cpp-bin           # default build
# pamac install llama.cpp-hip         # AMD GPU build (ROCm/HIP)
# pamac install llama.cpp-vulkan-bin  # Vulkan build
Old GPUs
# If you are using RDNA or RDNA 2 architecture like AMD Radeon RX 6500 XT, you may need to follow this step:
nano ~/.zshrc
# Add the following lines:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export ROC_ENABLE_PRE_VEGA=1
export ROCM_PATH=/opt/rocm
export VLLM_USE_TRITON_FLASH_ATTN=0
export TORCH_USE_HIP_DSA=1
export HIP_VISIBLE_DEVICES=0
export PYTORCH_ROCM_ARCH=gfx1030
source ~/.zshrc
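Instead of pasting the exports into nano by hand, they can be appended in one step with a heredoc. A minimal sketch, written against a temporary file so it is safe to run anywhere; point RC at "$HOME/.zshrc" for real use:

```shell
RC=$(mktemp)                 # stand-in for "$HOME/.zshrc"
cat >> "$RC" <<'EOF'
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export ROCM_PATH=/opt/rocm
EOF
n=$(grep -c '^export' "$RC")
echo "$n exports appended"
```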
Run
llama-server -hf unsloth/GLM-4.7-Flash-GGUF:Q2_K_XL
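Once llama-server is running it listens on http://127.0.0.1:8080 by default and exposes an OpenAI-compatible API, so it can be queried with curl. The prompt and max_tokens values below are arbitrary examples; the curl call itself is commented out since it needs a live server, and the last line only sanity-checks the request body:

```shell
# Request body for the chat endpoint:
payload='{"messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'
# Real call (needs the server running):
#   curl http://127.0.0.1:8080/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$payload"
# Sanity-check that the payload is valid JSON:
echo "$payload" | python3 -c 'import json,sys; json.load(sys.stdin); print("ok")'
```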