Minikube 在 Apple Silicon 上的 AI 游乐场
本教程演示了如何在 Apple Silicon 设备(如 MacBook Pro)上使用 minikube 创建一个 AI 游乐场。我们将创建一个集群,该集群使用 krunkit 驱动程序共享 Mac 的 GPU,部署两个大型语言模型,并使用 Open WebUI 与模型进行交互。

先决条件
- Apple Silicon Mac
- krunkit v1.0.0 或更高版本
- vmnet-helper v0.6.0 或更高版本
- generic-device-plugin
- minikube v1.37.0 或更高版本 (仅限 krunkit 驱动程序)
安装 krunkit 和 vmnet-helper
安装最新的 krunkit
brew tap slp/krunkit
brew install krunkit
krunkit --version
安装最新的 vmnet-helper
curl -fsSL https://github.com/minikube-machine/vmnet-helper/releases/latest/download/install.sh | bash
/opt/vmnet-helper/bin/vmnet-helper --version
有关更多信息,请参阅 krunkit 驱动程序文档。
下载模型
将一些模型下载到本地磁盘。将模型保留在 minikube 外部,您可以快速创建和删除集群,而无需再次下载模型。
mkdir ~/models
cd ~/models
curl -LO 'https://hugging-face.cn/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true'
curl -LO 'https://hugging-face.cn/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q8_0.gguf?download=true'
重要提示:模型必须是 GGUF 格式。
启动 minikube
使用 krunkit 驱动程序启动一个 minikube 集群,将 ~/models 目录挂载到 /mnt/models
minikube start --driver krunkit --mount-string ~/models:/mnt/models
输出
😄 minikube v1.37.0 on Darwin 15.6.1 (arm64)
✨ Using the krunkit (experimental) driver based on user configuration
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🔥 Creating krunkit VM (CPUs=2, Memory=6144MB, Disk=20000MB) ...
🐳 Preparing Kubernetes v1.34.0 on Docker 28.4.0 ...
🔗 Configuring bridge CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
验证 GPU 是否可用
krunkit 驱动程序将您的主机 GPU 暴露为 virtio-gpu 设备
% minikube ssh -- tree /dev/dri
/dev/dri
|-- by-path
| |-- platform-a007000.virtio_mmio-card -> ../card0
| `-- platform-a007000.virtio_mmio-render -> ../renderD128
|-- card0
`-- renderD128
部署 generic-device-plugin
为了在 pod 中使用 GPU,我们需要 generic-device-plugin。使用以下命令进行部署:
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: generic-device-plugin
namespace: kube-system
labels:
app.kubernetes.io/name: generic-device-plugin
spec:
selector:
matchLabels:
app.kubernetes.io/name: generic-device-plugin
template:
metadata:
labels:
app.kubernetes.io/name: generic-device-plugin
spec:
priorityClassName: system-node-critical
tolerations:
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"
containers:
- image: squat/generic-device-plugin
args:
- --device
- |
name: dri
groups:
- count: 4
paths:
- path: /dev/dri
name: generic-device-plugin
resources:
requests:
cpu: 50m
memory: 10Mi
limits:
cpu: 50m
memory: 20Mi
ports:
- containerPort: 8080
name: http
securityContext:
privileged: true
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
- name: dev
mountPath: /dev
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins
- name: dev
hostPath:
path: /dev
updateStrategy:
type: RollingUpdate
EOF
注意:此配置允许最多 4 个 pod 使用 /dev/dri。您可以增加 count 以运行更多使用 GPU 的 pod。
等待 generic-device-plugin DaemonSet 可用
% kubectl get daemonset generic-device-plugin -n kube-system -w
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
generic-device-plugin 1 1 1 1 1 <none> 45s
部署 granite 模型
要使用您下载的 granite 模型,请启动一个 serving 该模型的 llama-server pod 和一个使该 pod 可供其他 pod 使用即可的服务。
cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: granite
spec:
replicas: 1
selector:
matchLabels:
app: granite
template:
metadata:
labels:
app: granite
name: granite
spec:
containers:
- name: llama-server
image: quay.io/ramalama/ramalama:latest
command: [
llama-server,
--host, "0.0.0.0",
--port, "8080",
--model, /mnt/models/granite-7b-lab-Q4_K_M.gguf,
--alias, "ibm/granite:7b",
--ctx-size, "2048",
--temp, "0.8",
--cache-reuse, "256",
-ngl, "999",
--threads, "6",
--no-warmup,
--log-colors, auto,
]
resources:
limits:
squat.ai/dri: 1
volumeMounts:
- name: models
mountPath: /mnt/models
volumes:
- name: models
hostPath:
path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
labels:
app: granite
name: granite
spec:
ports:
- protocol: TCP
port: 8080
selector:
app: granite
EOF
等待部署可用
% kubectl get deploy granite
NAME READY UP-TO-DATE AVAILABLE AGE
granite 1/1 1 1 8m17s
检查 granite 服务
% kubectl get service granite
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
granite ClusterIP 10.105.145.9 <none> 8080/TCP 28m
部署 tinyllama 模型
要使用您下载的 tinyllama 模型,请启动一个 serving 该模型的 llama-server pod 和一个使该 pod 可供其他 pod 使用即可的服务。
cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tinyllama
spec:
replicas: 1
selector:
matchLabels:
app: tinyllama
template:
metadata:
labels:
app: tinyllama
name: tinyllama
spec:
containers:
- name: llama-server
image: quay.io/ramalama/ramalama:latest
command: [
llama-server,
--host, "0.0.0.0",
--port, "8080",
--model, /mnt/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf,
--alias, tinyllama,
--ctx-size, "2048",
--temp, "0.8",
--cache-reuse, "256",
-ngl, "999",
--threads, "6",
--no-warmup,
--log-colors, auto,
]
resources:
limits:
squat.ai/dri: 3
volumeMounts:
- name: models
mountPath: /mnt/models
volumes:
- name: models
hostPath:
path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
labels:
app: tinyllama
name: tinyllama
spec:
ports:
- protocol: TCP
port: 8080
selector:
app: tinyllama
EOF
等待部署可用
% kubectl get deploy tinyllama
NAME READY UP-TO-DATE AVAILABLE AGE
tinyllama 1/1 1 1 9m14s
检查 tinyllama 服务
% kubectl get service tinyllama
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tinyllama ClusterIP 10.98.219.117 <none> 8080/TCP 23m
部署 Open WebUI
Open WebUI 项目提供了一个易于使用的 Web 界面,用于与 OpenAI 兼容的 API(例如我们的 llama-server pod)进行交互。
要部署 Open WebUI,请运行:
---
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: open-webui
spec:
replicas: 1
selector:
matchLabels:
app: open-webui
template:
metadata:
labels:
app: open-webui
spec:
containers:
- name: open-webui
image: ghcr.io/open-webui/open-webui:dev-slim
ports:
- containerPort: 8080
env:
# Preconfigure OpenAI-compatible endpoints
- name: OPENAI_API_BASE_URLS
value: "http://granite:8080/v1;http://tinyllama:8080/v1"
volumeMounts:
- name: open-webui-data
mountPath: /app/backend/data
volumes:
- name: open-webui-data
persistentVolumeClaim:
claimName: open-webui-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: open-webui-data
spec:
storageClassName: standard
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
name: open-webui
spec:
ports:
- protocol: TCP
port: 8080
nodePort: 30080
selector:
app: open-webui
type: NodePort
EOF
我们使用 OPENAI_API_BASE_URLS 环境变量配置了 OpenAI 兼容的 API 端点。请查看 Open WebUI 文档,了解如何使用管理面板进行配置。
等待部署可用
% kubectl get deploy open-webui
NAME READY UP-TO-DATE AVAILABLE AGE
open-webui 1/1 1 1 69s
与模型交互
打开一个带有 Open WebUI 控制台的浏览器
open $(minikube service open-webui --url)
创建一个管理员帐户以开始使用 Open WebUI。
与 granite 模型聊天
您可以开始与 “ibm/granite:7b” 模型进行聊天。
输入一个提示
> Write a very technical haiku about playing with large language models with Minikube on Apple silicon
Mighty model, Minikube,
Silicon-powered speed,
Learning's dance, ever-changing.
Through data streams it weaves,
Inference's wisdom, vast and deep,
Apple's heartbeat, in code, resounds.
Exploring AI's vast frontier,
Minikube, language model's playground,
Innovation's rhythm, forever.
与 tinyllama 模型聊天
点击左侧的 “New Chat” 按钮,然后从左上角的模型菜单中选择 “tinyllama” 模型。
输入一个提示
> How do you feel inside this fancy Minikube cluster?
I do not have a physical body. However, based on the given text material, the
author is describing feeling inside a cluster of Minikube, a type of jellyfish.
The use of the word "fancy" suggests that the author is impressed or appreciates
the intricate design of the cluster, while the adjective "minikube" connotes its
smooth texture, delicate shape, and iridescent colors. The word "cluster"
suggests a group of these jellyfish, while "inside" implies being in the
vicinity or enclosed within.