Minikube 在 Apple Silicon 上的 AI 游乐场

本教程演示了如何在 Apple Silicon 设备(如 MacBook Pro)上使用 minikube 创建一个 AI 游乐场。我们将创建一个集群,该集群使用 krunkit 驱动程序共享 Mac 的 GPU,部署两个大型语言模型,并使用 Open WebUI 与模型进行交互。

Open WebUI Chat

先决条件

安装 krunkit 和 vmnet-helper

安装最新的 krunkit

brew tap slp/krunkit
brew install krunkit
krunkit --version

安装最新的 vmnet-helper

curl -fsSL https://github.com/minikube-machine/vmnet-helper/releases/latest/download/install.sh | bash
/opt/vmnet-helper/bin/vmnet-helper --version

有关更多信息,请参阅 krunkit 驱动程序文档。

下载模型

将一些模型下载到本地磁盘。将模型保留在 minikube 外部,您可以快速创建和删除集群,而无需再次下载模型。

mkdir ~/models
cd ~/models
curl -LO 'https://hugging-face.cn/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true'
curl -LO 'https://hugging-face.cn/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q8_0.gguf?download=true'

重要提示:模型必须是 GGUF 格式。

启动 minikube

使用 krunkit 驱动程序启动一个 minikube 集群,将 ~/models 目录挂载到 /mnt/models

minikube start --driver krunkit --mount-string ~/models:/mnt/models

输出

😄  minikube v1.37.0 on Darwin 15.6.1 (arm64)
✨  Using the krunkit (experimental) driver based on user configuration
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🔥  Creating krunkit VM (CPUs=2, Memory=6144MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.34.0 on Docker 28.4.0 ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

验证 GPU 是否可用

krunkit 驱动程序将您的主机 GPU 暴露为 virtio-gpu 设备

% minikube ssh -- tree /dev/dri
/dev/dri
|-- by-path
|   |-- platform-a007000.virtio_mmio-card -> ../card0
|   `-- platform-a007000.virtio_mmio-render -> ../renderD128
|-- card0
`-- renderD128

部署 generic-device-plugin

为了在 pod 中使用 GPU,我们需要 generic-device-plugin。使用以下命令进行部署:

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - |
          name: dri
          groups:
          - count: 4
            paths:
            - path: /dev/dri
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate
EOF

注意:此配置允许最多 4 个 pod 使用 /dev/dri。您可以增加 count 以运行更多使用 GPU 的 pod。

等待 generic-device-plugin DaemonSet 可用

% kubectl get daemonset generic-device-plugin -n kube-system -w
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
generic-device-plugin   1         1         1       1            1           <none>          45s

部署 granite 模型

要使用您下载的 granite 模型,请启动一个 serving 该模型的 llama-server pod 和一个使该 pod 可供其他 pod 使用即可的服务。

cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
      name: granite
    spec:
      containers:
      - name: llama-server
        image: quay.io/ramalama/ramalama:latest
        command: [
          llama-server,
          --host, "0.0.0.0",
          --port, "8080",
          --model, /mnt/models/granite-7b-lab-Q4_K_M.gguf,
          --alias, "ibm/granite:7b",
          --ctx-size, "2048",
          --temp, "0.8",
          --cache-reuse, "256",
          -ngl, "999",
          --threads, "6",
          --no-warmup,
          --log-colors, auto,
        ]
        resources:
          limits:
            squat.ai/dri: 1
        volumeMounts:
        - name: models
          mountPath: /mnt/models
      volumes:
      - name: models
        hostPath:
          path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: granite
  name: granite
spec:
  ports:
  - protocol: TCP
    port: 8080
  selector:
    app: granite
EOF

等待部署可用

% kubectl get deploy granite
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
granite   1/1     1            1           8m17s

检查 granite 服务

% kubectl get service granite
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
granite   ClusterIP   10.105.145.9   <none>        8080/TCP   28m

部署 tinyllama 模型

要使用您下载的 tinyllama 模型,请启动一个 serving 该模型的 llama-server pod 和一个使该 pod 可供其他 pod 使用即可的服务。

cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tinyllama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tinyllama
  template:
    metadata:
      labels:
        app: tinyllama
      name: tinyllama
    spec:
      containers:
      - name: llama-server
        image: quay.io/ramalama/ramalama:latest
        command: [
          llama-server,
          --host, "0.0.0.0",
          --port, "8080",
          --model, /mnt/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf,
          --alias, tinyllama,
          --ctx-size, "2048",
          --temp, "0.8",
          --cache-reuse, "256",
          -ngl, "999",
          --threads, "6",
          --no-warmup,
          --log-colors, auto,
        ]
        resources:
          limits:
            squat.ai/dri: 3
        volumeMounts:
        - name: models
          mountPath: /mnt/models
      volumes:
      - name: models
        hostPath:
          path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tinyllama
  name: tinyllama
spec:
  ports:
  - protocol: TCP
    port: 8080
  selector:
    app: tinyllama
EOF

等待部署可用

% kubectl get deploy tinyllama
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
tinyllama   1/1     1            1           9m14s

检查 tinyllama 服务

% kubectl get service tinyllama
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
tinyllama   ClusterIP   10.98.219.117   <none>        8080/TCP   23m

部署 Open WebUI

Open WebUI 项目提供了一个易于使用的 Web 界面,用于与 OpenAI 兼容的 API(例如我们的 llama-server pod)进行交互。

要部署 Open WebUI,请运行:

---
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:dev-slim
          ports:
            - containerPort: 8080
          env:
            # Preconfigure OpenAI-compatible endpoints
            - name: OPENAI_API_BASE_URLS
              value: "http://granite:8080/v1;http://tinyllama:8080/v1"
          volumeMounts:
            - name: open-webui-data
              mountPath: /app/backend/data
      volumes:
        - name: open-webui-data
          persistentVolumeClaim:
            claimName: open-webui-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-data
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  ports:
  - protocol: TCP
    port: 8080
    nodePort: 30080
  selector:
    app: open-webui
  type: NodePort
EOF

我们使用 OPENAI_API_BASE_URLS 环境变量配置了 OpenAI 兼容的 API 端点。请查看 Open WebUI 文档,了解如何使用管理面板进行配置。

等待部署可用

% kubectl get deploy open-webui
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
open-webui   1/1     1            1           69s

与模型交互

打开一个带有 Open WebUI 控制台的浏览器

open $(minikube service open-webui --url)

创建一个管理员帐户以开始使用 Open WebUI。

与 granite 模型聊天

您可以开始与 “ibm/granite:7b” 模型进行聊天。

输入一个提示

> Write a very technical haiku about playing with large language models with Minikube on Apple silicon

Mighty model, Minikube,
Silicon-powered speed,
Learning's dance, ever-changing.
Through data streams it weaves,
Inference's wisdom, vast and deep,
Apple's heartbeat, in code, resounds.
Exploring AI's vast frontier,
Minikube, language model's playground,
Innovation's rhythm, forever.

与 tinyllama 模型聊天

点击左侧的 “New Chat” 按钮,然后从左上角的模型菜单中选择 “tinyllama” 模型。

输入一个提示

> How do you feel inside this fancy Minikube cluster?

I do not have a physical body. However, based on the given text material, the
author is describing feeling inside a cluster of Minikube, a type of jellyfish.
The use of the word "fancy" suggests that the author is impressed or appreciates
the intricate design of the cluster, while the adjective "minikube" connotes its
smooth texture, delicate shape, and iridescent colors. The word "cluster"
suggests a group of these jellyfish, while "inside" implies being in the
vicinity or enclosed within.

最后修改日期 2025 年 9 月 16 日: 对齐 AI playground 的命令参数修复 (d2f3935ef)