短笔记2026-06-151 分钟

魔塔

记录一个主题的核心概念、示例和延伸问题。

阅读与写作学习, 笔记

魔塔部署Gemma4

1、注册魔塔

魔塔

2、注册amd开发者

amd

3、魔塔配置和启动云环境

兑换amd算力

4、部署&运行 Gemma4 大模型

确认云环境和模型目录
下载 Gemma4 模型
启动 vLLM 服务
打开新终端进行对话测试

确认云环境和模型目录

进入终端Terminal
粘贴运行指令，检查当前 GPU 是否可用 amd-smi
复制 Python 命令，粘贴命令运行，确认 PyTorch 能识别 AMD GPU

text

python -c "import torch; print('PyTorch:', torch.__version__); print('ROCm available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"

下载 Gemma4 模型、

为了提升国内环境下的依赖下载速度，先把 pip 源切换到腾讯云镜像

text

pip config set global.index-url https://mirrors.cloud.tencent.com/pypi/simple/

安装魔搭 ModelScope

text

pip install modelscope

复制命令，粘贴命令运行，下载 Gemma4 模型到当前目录

text

modelscope download --model google/gemma-4-E4B-it --cache_dir "./models"

复制命令，粘贴命令运行，确认 Gemma4 模型模型文件完整下载成功

text

ls -lh ./models/google/gemma-4-E4B-it/

启动 vLLM 服务

text

uv pip uninstall --system torchvision torchaudio

uv pip install --system 'vllm==0.23.0+rocm723' torchvision torchaudio 'fastapi[standard]==0.136.0' \
  --no-cache \
  --index-url https://mirrors.aliyun.com/pypi/simple/ \
  --extra-index-url https://wheels.vllm.ai/rocm/ \
  -U

text

vllm serve ./models/google/gemma-4-E4B-it/ --served-model-name gemma-4-E4B-it

打开新终端进行对话测试

text

vllm chat --url http://localhost:8000/v1 --model gemma-4-E4B-it