Run High-Flyer's DeepSeek-LLM-7B-Chat Model Locally Across Devices

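DeepSeek-LLM-7B-Chat is an advanced large language model with 7 billion parameters. It was trained by DeepSeek (深度求索), a company founded by the quantitative investment firm High-Flyer, on 2 trillion tokens of English and Chinese text. DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat have been open-sourced, so you can download them and try them out.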

In this article, we will cover:

How to run DeepSeek-LLM-7B-Chat on your own device
How to create an OpenAI-compatible API service for DeepSeek-LLM-7B-Chat

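We will use the Rust + Wasm tech stack to develop and deploy applications for this model. There is no need to install complex Python packages or C++ toolchains! Learn why we chose this tech stack.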

Run the DeepSeek-LLM-7B-Chat model on your own device

Step 1: Install WasmEdge, together with the wasi_nn-ggml plugin, using the following command.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the DeepSeek-LLM-7B-Chat model as a GGUF file (the Q5_K_M quantized version). The model is several GB in size, so the download may take a while.

curl -LO https://huggingface.co/second-state/Deepseek-LLM-7B-Chat-GGUF/resolve/main/deepseek-llm-7b-chat.Q5_K_M.gguf
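A multi-GB download can be cut short, or `curl` may save an error page under the model's file name. Before loading the file, it is worth checking the header. The Python sketch below does this, assuming the GGUF layout of a 4-byte `GGUF` magic followed by a little-endian 32-bit format version. The helper names `gguf_version` and `check_model_file` are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"  # a GGUF file starts with these 4 bytes


def gguf_version(header):
    """Return the GGUF format version from the first 8 header bytes, or None if the magic is wrong."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Bytes 4..7 hold the format version as a little-endian uint32
    return struct.unpack("<I", header[4:8])[0]


def check_model_file(path):
    """Raise ValueError if the file at path does not start with a GGUF header (hypothetical helper)."""
    with open(path, "rb") as f:
        version = gguf_version(f.read(8))
    if version is None:
        raise ValueError(f"{path} is not a GGUF file (incomplete download or error page?)")
    return version
```

Calling `check_model_file("deepseek-llm-7b-chat.Q5_K_M.gguf")` returns the format version if the header looks right. It only inspects the first 8 bytes. A truncated download with an intact header would still pass, so comparing the file size with the listing on Hugging Face is a useful extra check.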

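Step 3: Download the cross-platform, portable Wasm file for the chat app. The app lets you chat with the model on the command line. The Rust source code for the app is available here.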

curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm

That's it. You can now chat with the model in your terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:deepseek-llm-7b-chat.Q5_K_M.gguf llama-chat.wasm -p deepseek-chat --stream-stdout
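To make the flags in this command easier to follow, here is a minimal Python sketch that assembles the same argument list. The comments reflect our reading of the WasmEdge CLI and llama-utils. `--dir .:.` maps the current host directory into the Wasm sandbox. `--nn-preload` takes `alias:backend:target:model-path`. `-p` selects the prompt template. The helper names `wasmedge_command` and `as_shell` are hypothetical.

```python
import shlex


def wasmedge_command(wasm_app, model_file, prompt_template, extra_args=()):
    """Assemble the argv for running a llama-utils Wasm app under WasmEdge (hypothetical helper)."""
    return [
        "wasmedge",
        # Map the current host directory (.) to the same path inside the sandbox
        "--dir", ".:.",
        # alias:backend:target:path; AUTO lets the runtime choose the device (our reading)
        "--nn-preload", f"default:GGML:AUTO:{model_file}",
        wasm_app,
        # Prompt template the app uses to format the conversation for this model
        "-p", prompt_template,
        *extra_args,
    ]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

Passing `wasmedge_command("llama-chat.wasm", "deepseek-llm-7b-chat.Q5_K_M.gguf", "deepseek-chat", ["--stream-stdout"])` to `subprocess.run` starts the same chat session. Swapping in `llama-api-server.wasm` and dropping `--stream-stdout` gives the API server command used later in this article.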

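The portable Wasm app automatically makes use of the hardware accelerators (such as the GPU) on my device.

On my Mac with an M2 chip and 16 GB of RAM, it generates about 14 tokens per second.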
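At a steady rate of about 14 tokens per second, reply time grows linearly with reply length. The sketch below turns that into a rough estimate. It assumes a constant decode rate and ignores model loading and prompt processing, so real timings will be somewhat longer. The function name `generation_seconds` is hypothetical.

```python
def generation_seconds(num_tokens, tokens_per_second=14.0):
    """Rough time in seconds to generate num_tokens at a constant decode rate.

    Assumes a steady rate; model loading and prompt processing are not included.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second
```

For example, a 500-token answer works out to 500 / 14, or roughly 35.7 seconds.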

[You]: How can I download a Youtube video to my local device?

There are several ways to download a YouTube video to your local device, including using a browser extension, a dedicated YouTube downloader app, or a third-party software. Here are some steps to download a YouTube video using a browser extension:


1. Download and install a browser extension that allows you to download YouTube videos, such as Video DownloadHelper or SaveFrom.net.
2. Open the YouTube video you want to download in your web browser.
3. Install the browser extension and enable it.
4. Right-click on the video and select "Save Video As" or "Download" from the context menu.
5. Choose a location on your local device to save the video and select a format.
6. Wait for the download to complete.


Note that downloading YouTube videos may violate the site's terms of service, and some videos may be protected by copyright. Always make sure you have the necessary permissions and rights to download and use a video before doing so.

Create an OpenAI-compatible API service for the DeepSeek-LLM-7B-Chat model

An OpenAI-compatible web API lets the model work with a large number of LLM tools and agent frameworks, such as flows.network, LangChain and LlamaIndex.

Download the API server app. It is also a cross-platform, portable Wasm app that runs on a wide range of CPU and GPU devices.

curl -LO https://github.com/second-state/llama-utils/raw/main/api-server/llama-api-server.wasm

Then download the chatbot web UI so you can interact with the model through a chatbot interface.

curl -LO https://github.com/second-state/chatbot-ui/releases/download/v0.1.0/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, start the model's API server with the following command. Then open http://localhost:8080 in your browser and start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:deepseek-llm-7b-chat.Q5_K_M.gguf llama-api-server.wasm -p deepseek-chat

(The model performs well in Chinese, so we also asked it a question in Chinese to test it.)
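Loading a multi-GB model takes time, so the server may not accept connections the moment the command starts. The Python sketch below polls the port before you open the browser. It assumes the server listens on port 8080, as in the URL above. The helper name `wait_for_port` and its 120-second default timeout are our own choices. An open port only shows that something is accepting TCP connections, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=120.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass (hypothetical helper).

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A typical use is `if wait_for_port(): webbrowser.open("http://localhost:8080")`, using the standard `webbrowser` module.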

You can also interact with the API server from another terminal using curl.

  curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:application/json' \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"Deepseek-LLM-7B"}'
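The same request can be sent from a program. The sketch below uses only Python's standard library and mirrors the curl call: the same URL, headers and JSON body. It assumes the server's response follows the OpenAI chat completion schema (`choices[0].message.content`), which is what "OpenAI-compatible" suggests but is not confirmed here. The helper names `build_chat_request`, `extract_reply` and `ask` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(question, model="Deepseek-LLM-7B",
                       system="You are a helpful assistant.", url=API_URL):
    """Build a POST request with the same headers and JSON body as the curl example."""
    payload = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        "model": model,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Pull the assistant's text out of an OpenAI-style chat completion response (assumed schema)."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


def ask(question, timeout=300):
    """Send one question to the local API server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's answer. Separating request building and response parsing from the network call makes both easy to check without a live server.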

That's it. WasmEdge is the easiest, fastest and safest way to run LLM applications. Give it a try!

Join the WasmEdge Discord to ask questions and share insights. If you run into any problems running this model, open an issue at second-state/llama-utils, or book a demo.