Running Local LLMs with Ollama and Calling Them from the JS Library, with Streaming Output

Published 2024-06-26, updated 2025-05-26

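Ollama gets large language models (LLMs) up and running on your own machine, and supports Llama 3, Phi 3, Mistral, Gemma, and other models. Calling a model locally can lower usage costs and helps keep data secure, because prompts and replies never leave your machine.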

Download Ollama

https://ollama.com/download

Download the version for your operating system. After installing, use it from the terminal.

Running a model

After installing, start a model from the command line. The list of supported models is at https://ollama.com/library/.

For example, mistral is Mistral 7B, the open-source model from Mistral AI. Start it with:

ollama run mistral

If the model has not been downloaded yet, this command downloads it first.
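To check from code whether a model is already available locally, one option is Ollama's local REST API, which lists local models at GET /api/tags. The sketch below assumes the default address http://127.0.0.1:11434 and Node.js 18+ for the global fetch. The names OLLAMA_URL, hasModel and isModelPulled are hypothetical.

```javascript
// Sketch: check whether a model has already been pulled, using Ollama's
// local REST API. Assumes the default address and Node.js 18+ (global fetch).
// OLLAMA_URL, hasModel and isModelPulled are illustrative names.
const OLLAMA_URL = 'http://127.0.0.1:11434';

// Pure helper: `models` is the `models` array returned by GET /api/tags.
// Local names carry a tag (e.g. "mistral:latest"), so a bare name is
// treated as "<name>:latest". A model pulled as "mistral:7b" will only
// match if you ask for "mistral:7b" explicitly.
function hasModel(models, name) {
  const wanted = name.includes(':') ? name : `${name}:latest`;
  return models.some((m) => m.name === wanted);
}

// Fetch the local model list and look for `name` in it.
async function isModelPulled(name) {
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const { models } = await res.json();
  return hasModel(models, name);
}
```

Usage (requires Ollama to be running): isModelPulled('mistral').then((ok) => console.log(ok ? 'ready' : 'run: ollama pull mistral')). Keeping the matching logic in a pure function makes it easy to test without a server.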

Calling it from the JS library

Ollama provides a JavaScript library, https://www.npmjs.com/package/ollama (install it with: npm install ollama), which makes it easy to use from Node.js.

model: the model to use
messages: the messages to send, in the same format as Mistral AI's own JS library
format: the response format; can be set to json

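// Uses ES modules and top-level await: save as .mjs or set "type": "module" in package.json
import ollama from 'ollama'

const response = await ollama.chat({
    model: 'mistral',
    messages: [
        {
            // The system role gives the model instructions or context
            role: 'system',
            content: `
You are a professional English teacher for Chinese-speaking students. Please answer their questions.

Return a JSON object with the following fields:
1. message: a string containing the answer itself, with supplementary explanations in Chinese; double quotes \`"\` must be escaped as \`\\"\`;
2. words: an array of objects listing the words involved, with their phonetic transcriptions and Chinese meanings; the fields are word (the word itself), phonetic (the phonetic transcription), and description (a detailed explanation);
`
        },
        {
            // Message sent from the user's side
            role: 'user',
            content: 'How do you say "我今天开始学英语" in English?'
        }
    ],
    format: 'json'
})
// With format: 'json', message.content is a JSON string; use JSON.parse to get an object
console.log(response.message.content)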

This can then be wrapped in a service such as Express, so the model is called locally instead of through an AI provider's online service.
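Setting format: 'json' makes response.message.content a JSON string, but, as far as I know, it does not force the model to use exactly the fields the system prompt asks for. Before handing the reply to a front end it is worth parsing and checking it. The sketch below assumes the message/words schema from the prompt above; parseLessonReply is a hypothetical helper.

```javascript
// Sketch: turn response.message.content (a JSON string when format: 'json'
// is set) into a checked object. Field names follow the system prompt:
// message, words[].word, words[].phonetic, words[].description.
function parseLessonReply(content) {
  let data;
  try {
    data = JSON.parse(content);
  } catch (err) {
    throw new Error(`Model did not return valid JSON: ${err.message}`);
  }
  if (typeof data.message !== 'string') {
    throw new Error('Missing string field "message"');
  }
  // Models do not always follow the schema exactly, so default to [] and
  // keep only entries that at least have a word.
  const words = Array.isArray(data.words) ? data.words : [];
  return {
    message: data.message,
    words: words
      .filter((w) => w && typeof w.word === 'string')
      .map((w) => ({
        word: w.word,
        phonetic: String(w.phonetic ?? ''),
        description: String(w.description ?? ''),
      })),
  };
}
```

Usage with the earlier example: const lesson = parseLessonReply(response.message.content). Throwing on invalid JSON lets the calling service decide whether to retry the request or return an error.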

Streaming responses

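Streaming is supported, which improves the user experience of chat-style applications: the reply appears as it is generated, and the stream can be passed on to the front end.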

import ollama from 'ollama'

const response = await ollama.chat({
    model: 'mistral',
    messages: [
        {
            // Message sent from the user's side
            role: 'user',
            content: 'You are a senior front-end engineer. How should I learn canvas?'
        }
    ],
    stream: true
})

// Print the reply to the console as it streams in
for await (const part of response) {
    process.stdout.write(part.message.content)
}
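When the reply is streamed, the full answer is never returned in one piece. For a multi-turn chat, a common pattern is to concatenate the chunks and append the finished answer to the message history, since each chat call only sees the messages it is given. The sketch below follows that pattern; streamReply is a hypothetical helper, and the chat function is passed in (in real use, (req) => ollama.chat(req)) so the logic can be tried without a running model.

```javascript
// Sketch: stream one reply while keeping multi-turn history.
// `chat` is injected; in real use pass (req) => ollama.chat(req).
// The whole history is resent on every call, so the model sees earlier turns.
async function streamReply(chat, model, history, userText, onChunk) {
  history.push({ role: 'user', content: userText });
  const stream = await chat({ model, messages: history, stream: true });
  let full = '';
  for await (const part of stream) {
    const text = part.message.content;
    full += text;
    onChunk(text); // e.g. process.stdout.write or res.write
  }
  // Store the complete answer so the next question has context.
  history.push({ role: 'assistant', content: full });
  return full;
}
```

Usage: keep one history array per conversation, e.g. await streamReply((req) => ollama.chat(req), 'mistral', history, 'How should I learn canvas?', (t) => process.stdout.write(t)). Long histories eventually exceed the model's context window, so a real app may need to trim old turns.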

Streaming through Express

The code above can be wrapped in an Express endpoint that receives a message and streams the reply back:

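import express from 'express';
import ollama from 'ollama';

const app = express();

async function askAi(res, message) {
    const response = await ollama.chat({
        model: 'mistral',
        messages: [
            {
                // Message sent from the user's side
                role: 'user',
                content: message
            }
        ],
        stream: true
    });

    for await (const part of response) {
        res.write(part.message.content); // send each chunk as soon as it arrives
    }

    res.end(); // finish the HTTP response
}

app.get('/ai/ask', (req, res) => {
    const message = req.query.message;
    if (typeof message !== 'string' || message === '') {
        res.status(400).send('Missing "message" query parameter');
        return;
    }
    // charset=utf-8 so Chinese text is decoded correctly by the client
    res.setHeader('Content-Type', 'text/plain; charset=utf-8');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    // askAi is async: catch errors so a failed model call does not leave the
    // request hanging or crash the process with an unhandled rejection
    askAi(res, message).catch((err) => {
        console.error(err);
        if (!res.headersSent) res.status(500);
        res.end();
    });
});

app.listen(3000); // port chosen for illustration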
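On the front end, the streamed text can be read chunk by chunk with fetch and a ReadableStream reader. The sketch below is one way to do it in browsers or Node.js 18+; readTextStream and ask are hypothetical names, and the URL assumes the Express app is listening on localhost:3000 (adjust to your setup). Decoding with { stream: true } matters for Chinese text, because a multi-byte UTF-8 character can be split across two chunks.

```javascript
// Sketch: read a streamed plain-text HTTP body chunk by chunk.
// Works with fetch's res.body in browsers and Node.js 18+.
async function readTextStream(body, onText) {
  const reader = body.getReader();
  const decoder = new TextDecoder('utf-8');
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true buffers an incomplete multi-byte character
    // until the next chunk arrives instead of emitting U+FFFD.
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onText(text);
    }
  }
  const rest = decoder.decode(); // flush anything still buffered
  if (rest) {
    full += rest;
    onText(rest);
  }
  return full;
}

// Hypothetical client call; assumes the server runs on localhost:3000.
async function ask(message, onText) {
  const url = `http://localhost:3000/ai/ask?message=${encodeURIComponent(message)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return readTextStream(res.body, onText);
}
```

In a browser, onText would typically append each piece to the chat bubble being rendered, e.g. ask('How should I learn canvas?', (t) => { output.textContent += t; }), where output is a hypothetical DOM element.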