[Cloudflare Workers: The Road to Full-Stack Architect 09] Intelligence: Running Llama 3 and RAG Vector Search at the Edge

01. Introduction: Why Run AI on Workers?

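In the past, deploying an AI model usually meant:

Renting an expensive GPU server (for example, AWS EC2 p3.2xlarge).
Wrangling CUDA drivers, a Python environment, and Docker containers.
Handling model-loading cold starts and wrapping the model in an API.

For a full-stack engineer, that is a high barrier to entry.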

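Cloudflare Workers AI changes the rules. It hides this complexity behind a Worker binding (a REST API is also available): you call env.AI.run() much as you would query a database, and inference runs on GPUs in Cloudflare's global network. It supports popular open models such as Llama 3 and Stable Diffusion, and it comes with a generous free allocation.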
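To make the "call it like a database" idea concrete, here is a minimal sketch of a helper that sends one prompt through the binding. The AiBinding interface and the askModel helper are hypothetical names introduced only for illustration; the sketch assumes the text-generation model accepts a prompt field and returns an object with a response string.

```typescript
// Minimal structural type for the Workers AI binding.
// This is an assumption for illustration; a real project would normally use
// the binding type shipped with @cloudflare/workers-types.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// Hypothetical helper: send a single prompt to a text-generation model
// and return the generated text (empty string if none came back).
async function askModel(env: Env, prompt: string): Promise<string> {
  const result = (await env.AI.run('@cf/meta/llama-3-8b-instruct', {
    prompt,
  })) as { response?: string };
  return result.response ?? '';
}
```

Inside a Hono handler this would be called as askModel(c.env, 'some question'). There is no server, driver, or container to manage in the calling code.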

02. Architecture Comparison: Workers AI vs. OpenAI API

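Feature	Cloudflare Workers AI	OpenAI API (GPT-4)	AWS SageMaker / Bedrock
Model selection	Open models (Llama, Mistral, Gemma)	Closed models (GPT-3.5/4)	Mixed (in-house Titan + open models)
Privacy	High (data stays within Cloudflare's network)	Data must be sent to OpenAI	High (inside your VPC)
Billing model	Compute measured in Neurons, with a generous free allocation	Tokens (input/output)	Instance time + inference fees
Latency	Depends on distance to the GPU node	Depends on OpenAI load	Depends on deployment region
Customization	Fine-tuning (LoRA) and RAG	Fine-tuning / Assistants API only	Very high (you can train your own models)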

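Architect's note: The strength of Workers AI is inference at the edge. If your application needs low-latency speech recognition (Whisper) or real-time translation, running the computation near the user can be much faster than a round trip to a US region.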

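03. Environment Setup: AI and Vectorize

We are going to build a document Q&A bot (RAG bot). It can chat, and it can also answer questions based on a knowledge base we provide.

Step 1: Create a Vectorize index

Vectorize is Cloudflare's vector database. It stores the semantic vectors (embeddings) of text.

# Create the index with 768 dimensions (matching the bge-base-en-v1.5 model)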
npx wrangler vectorize create my-knowledge-base --dimensions=768 --metric=cosine
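The --metric=cosine flag tells the index to rank vectors by cosine similarity. As a rough illustration of what that metric measures (not Vectorize's internal implementation), here is a small sketch; cosineSimilarity is a hypothetical helper written for this article.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1 means same direction, 0 means orthogonal, -1 means opposite direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the metric depends only on direction, [1, 0] and [2, 0] count as identical. That is why it suits embeddings, where the direction carries the meaning.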

Step 2: Configure wrangler.toml

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE"
index_name = "my-knowledge-base"

04. Implementation Part 1: Embedding the Data

First, we need an API that converts text into vectors and stores them in the database.

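import { Hono } from 'hono'
// Note: @cloudflare/ai is the older wrapper package. With a current Workers
// runtime the binding can also be called directly as c.env.AI.run(...).
import { Ai } from '@cloudflare/ai'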

type Bindings = {
  AI: any;
  VECTORIZE: VectorizeIndex;
}

const app = new Hono<{ Bindings: Bindings }>()

app.post('/knowledge/add', async (c) => {
  const ai = new Ai(c.env.AI);
  const { text, id } = await c.req.json();

  // 1. Convert the text into a vector (text embedding)
  // using the @cf/baai/bge-base-en-v1.5 model
  const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [text]
  });
  const values = data[0];

  // 2. Store it in Vectorize
  await c.env.VECTORIZE.insert([{
    id: id || crypto.randomUUID(),
    values: values,
    metadata: { text: text } // Store the original text too, so it can be retrieved later
  }]);

  return c.json({ message: "Indexed successfully" });
})
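One easy mistake at this step is inserting a vector whose length does not match the index. The following sketch validates a record before it goes to VECTORIZE.insert(). The names INDEX_DIMENSIONS, VectorRecord, and toVectorRecord are hypothetical and exist only for this illustration; the value 768 is taken from the index created in step 1.

```typescript
// Must match the --dimensions value used when the index was created.
const INDEX_DIMENSIONS = 768;

// Shape of the record passed to insert() in the handler above.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { text: string };
}

// Hypothetical helper: build a record, rejecting obviously bad input
// before it reaches the vector database.
function toVectorRecord(id: string, text: string, values: number[]): VectorRecord {
  if (text.trim() === '') {
    throw new Error('Refusing to index empty text');
  }
  if (values.length !== INDEX_DIMENSIONS) {
    throw new Error(`Expected ${INDEX_DIMENSIONS} dimensions, got ${values.length}`);
  }
  return { id, values, metadata: { text } };
}
```

In the handler, the object literal passed to insert() could be replaced with toVectorRecord(id || crypto.randomUUID(), text, values).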

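05. Implementation Part 2: RAG Inference (Retrieval-Augmented Generation)

Now, when a user asks a question, we run these steps in order:

Convert the question into a vector.
Search Vectorize for the most similar entries (the context).
Pass the context plus the question to Llama 3 to generate an answer.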

app.post('/chat', async (c) => {
  const ai = new Ai(c.env.AI);
  const { question } = await c.req.json();

  // 1. Convert the user's question into a vector
  const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [question]
  });
  const questionVector = data[0];

  // 2. Search for related knowledge (top 3)
  const vectorQuery = await c.env.VECTORIZE.query(questionVector, {
    topK: 3,
    returnMetadata: true
  });

  // Assemble the retrieved knowledge snippets
  const context = vectorQuery.matches
    .map(match => match.metadata?.text)
    .join("\n---\n");

  console.log("Found context:", context);

  // 3. Build the prompt and call Llama 3
  const systemPrompt = `You are a helpful assistant. Use the following context to answer the user's question. If you don't know, say so.
  
  Context:
  ${context}`;

  const response = await ai.run('@cf/meta/llama-3-8b-instruct', {
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: question }
    ],
    stream: true // Enable streaming output for a more ChatGPT-like experience
  });

  // Return a Server-Sent Events (SSE) stream
  return new Response(response, {
    headers: { 'content-type': 'text/event-stream' }
  });
})

export default app
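The context-building step in the handler joins whatever the search returns, including weak matches and entries without stored text. A slightly more defensive version might look like the sketch below. The Match interface, the buildContext helper, and the 0.5 cutoff are all assumptions made for illustration; a sensible threshold depends on your data and should be tuned.

```typescript
// Minimal shape of a search match, mirroring the fields used in the handler.
interface Match {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: keep only sufficiently similar matches that carry
// usable text, then join them with the same separator as the handler.
// With the cosine metric, a higher score means a closer match.
function buildContext(matches: Match[], minScore = 0.5): string {
  return matches
    .filter((m) => m.score >= minScore)
    .map((m) => m.metadata?.text)
    .filter((t): t is string => typeof t === 'string' && t.length > 0)
    .join('\n---\n');
}
```

If buildContext returns an empty string, the handler could skip the model call and reply that nothing relevant was found, which avoids asking Llama 3 to answer from an empty context.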

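06. Testing and Verification

Feed in some data:

curl -X POST http://localhost:8787/knowledge/add -H 'Content-Type: application/json' -d '{"text": "Cloudflare Workers is an edge computing platform built on V8 Isolates."}'
curl -X POST http://localhost:8787/knowledge/add -H 'Content-Type: application/json' -d '{"text": "Durable Objects provide strongly consistent state management."}'

Ask a question:

curl -X POST http://localhost:8787/chat -H 'Content-Type: application/json' -d '{"question": "What are Durable Objects?"}'

Llama 3 should answer with the definition you just fed in instead of making something up. That is the power of RAG.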
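Because /chat streams its answer, the curl output arrives as SSE lines rather than one JSON body. The sketch below shows one way a client could pull the generated text out of that stream. It assumes each event looks like data: {"response":"..."} and that the stream ends with data: [DONE]; extractTokens is a hypothetical helper, and a real client would also need to buffer lines that are split across network chunks.

```typescript
// Hypothetical helper: extract generated text fragments from a block of
// SSE text. Stops at the [DONE] sentinel and ignores non-data lines.
function extractTokens(sseText: string): string[] {
  const tokens: string[] = [];
  for (const line of sseText.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') break;
    try {
      const parsed = JSON.parse(payload) as { response?: string };
      if (typeof parsed.response === 'string') tokens.push(parsed.response);
    } catch {
      // Incomplete or malformed JSON: skip this line.
    }
  }
  return tokens;
}
```

Joining the returned fragments with an empty string reconstructs the answer text received so far.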

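07. Advanced Use: Multimodal AI

Workers AI is not limited to text. You can also do:

Speech-to-text (ASR): use @cf/openai/whisper to process uploaded audio recordings.
Text-to-image: use @cf/stabilityai/stable-diffusion-xl-base-1.0 to generate images and store them in R2.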

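// A simple image-generation API
app.get('/imagine', async (c) => {
  const ai = new Ai(c.env.AI);
  const prompt = c.req.query('prompt');

  // Reject requests without a prompt instead of sending undefined to the model
  if (!prompt) {
    return c.json({ error: 'Missing "prompt" query parameter' }, 400);
  }

  const inputs = { prompt: prompt };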
  const response = await ai.run('@cf/stabilityai/stable-diffusion-xl-base-1.0', inputs);

  return new Response(response, {
    headers: { 'content-type': 'image/png' }
  });
})
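The speech-to-text item from the list above has no code in this article, so here is a minimal sketch of it. It assumes the Whisper model takes the audio as an array of byte values and returns an object with a text field. AiBinding and transcribe are hypothetical names used only for illustration.

```typescript
// Minimal structural type for the Workers AI binding (an assumption for
// illustration, not the official type definition).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical helper: transcribe raw audio bytes with Whisper and
// return the recognized text (empty string if none came back).
async function transcribe(ai: AiBinding, audio: ArrayBuffer): Promise<string> {
  const result = (await ai.run('@cf/openai/whisper', {
    // The model input is a plain array of byte values (0-255).
    audio: Array.from(new Uint8Array(audio)),
  })) as { text?: string };
  return result.text ?? '';
}
```

In a Hono route, the audio could come from await c.req.arrayBuffer() on an uploaded recording, with the result returned as JSON.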

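08. Summary and Next Steps

We now have what we need to build "intelligent" applications. With Workers AI and Vectorize, you can build systems with semantic search and content generation without any Python background.

At this point, we have covered all the building blocks:

Core: Workers, Hono
Data: KV, D1, R2, Vectorize
State: Durable Objects
Async: Queues
Intelligence: Workers AI

It is time to assemble these blocks into a castle.

In the next article, Part 10: Hands-On (Finale), we will do the final integration. We will adopt a microservices architecture and use Service Bindings to connect Workers with different responsibilities, such as Auth, Order, and the AI Chatbot, into an enterprise-grade global e-commerce platform.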