Introduction
Llama Stack is a set of standardized and opinionated interfaces for how canonical toolchain components (fine-tuning, synthetic data generation) and agentic applications should be built. We hope these interfaces are adopted across the ecosystem, which should make interoperability much easier.
Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market. These blocks span the entire development lifecycle: from model training and fine-tuning, through product evaluation, to invoking AI agents in production. Beyond definition, we are developing open-source versions and partnering with cloud providers, ensuring developers can assemble AI solutions using consistent, interlocking components across platforms. The ultimate goal is to accelerate innovation in the AI space.
Installation
You can install this repository as a package with the following command:
pip install llama-stack
If you want to install from source:
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
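If you want a quick sanity check that the install worked, a minimal Python sketch like the one below (standard library only) imports the package and prints its version; the distribution name llama-stack is assumed to match what pip installed above.
# quick sanity check that the llama-stack install succeeded
import importlib.metadata as md
import llama_stack  # raises ImportError if the install did not succeed
try:
    # "llama-stack" is assumed to be the installed distribution name (as used with pip above)
    print("llama-stack version:", md.version("llama-stack"))
except md.PackageNotFoundError:
    print("llama_stack imported, but its package metadata was not found")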
Getting Started
The llama CLI tool helps you set up and use the Llama toolchain and agentic systems. It should be available on your path after installing the llama-stack package.
This guide will help you quickly build and run a Llama Stack server in less than 5 minutes!
You may also check out this notebook to try out the demo scripts.
Quick Cheatsheet
- Quickly build and start a LlamaStack server using our Meta Reference implementation for all API endpoints, with conda as the build type.
llama stack build
- You will be prompted to enter the build information interactively.
llama stack build
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack
> Enter the image type you want your distribution to be built with (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
> (Optional) Enter a short description for your Llama Stack distribution:
Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml
You can now run `llama stack configure my-local-stack`
llama stack configure
- Run llama stack configure <name> with the name you previously defined in the build step.
- You will be prompted to enter configurations for your Llama Stack.
$ llama stack configure my-local-stack
Could not find my-local-stack. Trying conda build name instead...
Configuring API `inference`...
=== Configuring provider `meta-reference` for API inference...
Enter value for model (default: Llama3.1-8B-Instruct) (required):
Do you want to configure quantization? (y/n): n
Enter value for torch_seed (optional):
Enter value for max_seq_len (default: 4096) (required):
Enter value for max_batch_size (default: 1) (required):
Configuring API `safety`...
=== Configuring provider `meta-reference` for API safety...
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n
Configuring API `agents`...
=== Configuring provider `meta-reference` for API agents...
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):
Configuring SqliteKVStoreConfig:
Enter value for namespace (optional):
Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required):
Configuring API `memory`...
=== Configuring provider `meta-reference` for API memory...
> Please enter the supported memory bank type your provider has for memory: vector
Configuring API `telemetry`...
=== Configuring provider `meta-reference` for API telemetry...
> YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml.
You can now run `llama stack run my-local-stack --port PORT`
llama stack run
- Run llama stack run <name> with the name you previously defined.
llama stack run my-local-stack
...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
...
Finished model load YES READY
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /inference/embeddings
Serving POST /memory_banks/create
Serving DELETE /memory_bank/documents/delete
Serving DELETE /memory_banks/drop
Serving GET /memory_bank/documents/get
Serving GET /memory_banks/get
Serving POST /memory_bank/insert
Serving GET /memory_banks/list
Serving POST /memory_bank/query
Serving POST /memory_bank/update
Serving POST /safety/run_shield
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Serving GET /telemetry/get_trace
Serving POST /telemetry/log_event
Listening on :::5000
INFO: Started server process [587053]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
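Once the server log shows it is listening, a quick way to confirm the port is reachable before wiring up a client is a plain TCP check from Python; the host and port below assume the default localhost:5000 shown in the log above.
# confirm the Llama Stack server socket is accepting connections
# assumes the default host/port from the log above (localhost:5000)
import socket
HOST, PORT = "localhost", 5000
try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"Llama Stack server is reachable at {HOST}:{PORT}")
except OSError as err:
    print(f"Could not reach {HOST}:{PORT}: {err}")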
Step 1. Build
In the following steps, imagine we'll be working with a Meta-Llama3.1-8B-Instruct model. We will name our build 8b-instruct to help us remember the config. We will start building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify:
- name: the name for our distribution (e.g. 8b-instruct)
- image_type: our build image type (conda | docker)
- distribution_spec: our distribution spec for specifying API providers
  - description: a short description of the configurations for the distribution
  - providers: specifies the underlying implementation serving each API endpoint
  - image_type: conda | docker, to specify whether to build the distribution as a Docker image or a Conda environment
At the end of the build command, a <name>-build.yaml file storing the build configuration will be generated. Once this step is complete, the <name>-build.yaml file will be saved at the output file path specified at the end of the command.
Building from scratch
- For new users, we can start by running llama stack build, which launches an interactive wizard that prompts you for the build configuration.
llama stack build
Running the command above will let you fill in the configuration to build your Llama Stack distribution; you will see output like the following.
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct
> Enter the image type you want your distribution to be built with (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
> (Optional) Enter a short description for your Llama Stack distribution:
Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml
Ollama (optional)
If you plan to use Ollama for inference, you will need to install the server by following these instructions.
Building from templates
- To build from alternative API providers, we provide distribution templates so you can get started with a distribution backed by different providers.
The following command lets you see the available templates and their corresponding providers.
llama stack build --list-templates
You may then pick a template to build your distribution with providers fitted to your liking.
llama stack build --template local-tgi --name my-tgi-stack
$ llama stack build --template local-tgi --name my-tgi-stack
...
...
Build spec configuration saved at ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml
You may now run `llama stack configure my-tgi-stack` or `llama stack configure ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml`
Building from a config file
- In addition to templates, you may customize the build to your liking by editing a config file and building from it with the command below.
- The config file will have contents similar to those in llama_stack/distributions/templates/.
$ cat llama_stack/distribution/templates/local-ollama-build.yaml
name: local-ollama
distribution_spec:
description: Like local, but use ollama for running LLM inference
providers:
inference: remote::ollama
memory: meta-reference
safety: meta-reference
agents: meta-reference
telemetry: meta-reference
image_type: conda
llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml
How to build a distribution as a Docker image
To build a Docker image, you may start from a template and use the --image-type docker flag to specify docker as the build image type.
llama stack build --template local --image-type docker --name docker-0
Alternatively, you may use a config file and set image_type to docker in your <name>-build.yaml file, then run llama stack build <name>-build.yaml. The <name>-build.yaml will have contents like the following:
name: local-docker-example
distribution_spec:
description: Use code from `llama_stack` itself to serve all llama stack APIs
docker_image: null
providers:
inference: meta-reference
memory: meta-reference-faiss
safety: meta-reference
agentic_system: meta-reference
telemetry: console
image_type: docker
The following command allows you to build a Docker image with the name <name>.
llama stack build --config <name>-build.yaml
Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
...
...
You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
Step 2. Configure
After our distribution is built (either as a Docker image or a conda environment), we will run the following command to configure it:
llama stack configure [ <name> | <docker-image-name> | <path/to/name.build.yaml>]
- For conda environments: <path/to/name.build.yaml> will be the generated build spec saved from Step 1.
- For docker images downloaded from Dockerhub, you may also use <docker-image-name> as the argument.
  - Run docker images to check the list of available images on your machine.
$ llama stack configure 8b-instruct
Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-8B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):
Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
After this step is successful, you should be able to find a run configuration spec in ~/.llama/builds/conda/8b-instruct-run.yaml with the following contents. You may edit this file to change the settings.
As you can see, we did basic configuration above and configured:
- inference running on the Meta-Llama3.1-8B-Instruct model (obtained from llama model list)
- the Llama Guard safety shield with the Llama-Guard-3-8B model
- the Prompt Guard safety shield with the Prompt-Guard-86M model
To see how these configurations are stored as YAML, check the file printed at the end of the configure step.
Note that all configurations as well as models are stored in ~/.llama.
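If you would rather inspect the generated run configuration programmatically than open the file by hand, a small sketch like the one below loads and pretty-prints it; it assumes PyYAML is available in your environment and reuses the path printed at the end of the configure step.
# print the run configuration generated by `llama stack configure`
# assumes PyYAML is installed; the path is the one printed at the end of the configure step
from pathlib import Path
import json
import yaml
run_config_path = Path("~/.llama/builds/conda/8b-instruct-run.yaml").expanduser()
with run_config_path.open() as f:
    run_config = yaml.safe_load(f)
# dump as JSON purely for readable, consistent formatting
print(json.dumps(run_config, indent=2, default=str))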
Step 3. Run
Now, let's start the Llama Stack distribution server. You will need the YAML configuration file that was written out at the end of the llama stack configure step.
llama stack run 8b-instruct
You should see the Llama Stack server start and print the APIs that it supports:
$ llama stack run 8b-instruct
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 19.28 seconds
NCCL version 2.20.5+cuda12.4
Finished model load YES READY
Serving POST /inference/batch_chat_completion
Serving POST /inference/batch_completion
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /safety/run_shield
Serving POST /agentic_system/memory_bank/attach
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/memory_bank/detach
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Listening on :::5000
INFO: Started server process [453333]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
The configuration is in ~/.llama/builds/local/conda/8b-instruct-run.yaml. Feel free to increase max_seq_len.
The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
You might need to use the --disable-ipv6 flag to disable IPv6 support.
This server is running a Llama model locally.
Step 4. Test with Client
Once the server is set up, we can test it with a client to see example outputs.
cd /path/to/llama-stack
conda activate <env> # any environment containing the llama-stack pip package will work
python -m llama_stack.apis.inference.client localhost 5000
This will run the chat completion client and query the distribution's /inference/chat_completion API.
Here is an example output:
User>hello world, write me a 2 sentence poem about the moon
Assistant> Here's a 2-sentence poem about the moon:
The moon glows softly in the midnight sky,
A beacon of wonder, as it passes by.
Similarly, you can test safety (if you configured llama-guard and/or prompt-guard shields) with:
python -m llama_stack.apis.safety.client localhost 5000
You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.
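If you prefer to call the REST endpoints directly instead of the bundled client modules, here is a rough sketch that POSTs to the /inference/chat_completion route listed in the server log. The request body fields (model, messages, stream) and the response shape are assumptions that may differ across Llama Stack versions, and the requests package is assumed to be installed.
# call the /inference/chat_completion endpoint directly over HTTP
# NOTE: the request fields below are assumptions and may vary by version;
# the endpoint path itself comes from the server's startup log.
import requests
URL = "http://localhost:5000/inference/chat_completion"
payload = {
    "model": "Meta-Llama3.1-8B-Instruct",  # the model configured in Step 2
    "messages": [
        {"role": "user", "content": "Write me a 2 sentence poem about the moon"}
    ],
    "stream": False,  # assumed flag for a non-streaming reply
}
resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())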