The TEI framework currently only supports running on the x86-64 and aarch64 architectures.
```shell
# Using bge-large-zh as an example; adjust the model name and path in the sample code as needed.
# Note: the compiled .pt file must be saved under the first-level subdirectory of the model
# weights (/home/data/embedding_models/bge-large-zh-v1.5).
python compile.py bge-large-zh
```
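The repository's `compile.py` is not reproduced here. Purely as an illustration of the kind of work such a script performs, the sketch below traces a BERT-style embedding model to TorchScript and writes the `.pt` file into the weight directory; the output file name, sequence length, and the use of plain `torch.jit.trace` are assumptions, not details taken from the original guide.

```python
# Hypothetical sketch only -- not the repository's compile.py.
import os
import torch
from transformers import AutoModel, AutoTokenizer

model_dir = "/home/data/embedding_models/bge-large-zh-v1.5"  # path from the guide
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir, torchscript=True).eval()

# Build dummy inputs and trace the model to TorchScript.
dummy = tokenizer("What is Deep Learning?", return_tensors="pt",
                  padding="max_length", max_length=128, truncation=True)
with torch.no_grad():
    traced = torch.jit.trace(model, (dummy["input_ids"], dummy["attention_mask"]))

# Save the .pt file into the first-level subdirectory of the model weights,
# as the guide requires (the file name here is illustrative).
traced.save(os.path.join(model_dir, "compiled_model.pt"))
```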
```shell
# For 64-bit ARM CPUs the architecture is aarch64; for 64-bit x86 CPUs,
# replace aarch64 with x86_64 in the commands below.
wget https://static.rust-lang.org/dist/rust-1.81.0-aarch64-unknown-linux-gnu.tar.gz --no-check-certificate
tar -xvf rust-1.81.0-aarch64-unknown-linux-gnu.tar.gz
cd rust-1.81.0-aarch64-unknown-linux-gnu
bash install.sh
sudo apt update
sudo apt install pkg-config
```
First, run Python from the command line and use the path in `torch.__file__` to locate the directory that contains protoc. Taking Python 3.10.2 as an example:
```python
Python 3.10.2 (main, Sep 23 2024, 10:52:24) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__file__
'/usr/local/python3.10.2/lib/python3.10/site-packages/torch/__init__.py'
```
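The directory exported below can also be derived programmatically. As a small convenience sketch (not part of the original guide), the following snippet computes the `torch/bin` directory from `torch.__file__` and checks that a protoc binary is present there:

```python
# Derive the torch/bin directory (which, per this guide, ships protoc) from torch.__file__.
import os
import torch

torch_bin = os.path.join(os.path.dirname(torch.__file__), "bin")
protoc_path = os.path.join(torch_bin, "protoc")

print("Add to PATH:", torch_bin)
print("protoc found:", os.path.isfile(protoc_path))
```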
```shell
# Directory of executables built by Cargo
export PATH=$PATH:~/.cargo/bin/
# Directory containing protoc
export PATH=/usr/local/python3.10.2/lib/python3.10/site-packages/torch/bin:$PATH
```
```shell
cd ./text-embeddings-inference
cargo install --path router -F python -F http --no-default-features
cd ./backends/python/server
make install
```
```shell
# Device number of the card TEI runs on
export TEI_NPU_DEVICE=0
# Model weight path, or the model's location in the Hugging Face hub
model_path_embedding=/home/data/models/bge-large-zh-v1.5
model_path_reranker=/home/data/models/bge-reranker-large
# The launch parameters below are identical to native TEI
# Embedding model
text-embeddings-router --model-id $model_path_embedding --dtype float16 --pooling cls --max-concurrent-requests 2048 --max-batch-requests 2048 --max-batch-tokens 1100000 --max-client-batch-size 256 --port 12347
# Reranker model
text-embeddings-router --model-id $model_path_reranker --dtype float16 --max-client-batch-size 192 --max-concurrent-requests 2048 --max-batch-tokens 163840 --max-batch-requests 128 --port 8080
```
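Before sending requests, it can help to wait until the router has finished loading the model. A minimal polling sketch, assuming the upstream TEI `/health` route is available in this build (if it is not, polling `/embed` with a small payload works the same way):

```python
# Minimal readiness poll for the embedding router started above.
# Assumes upstream TEI's /health route; adjust the port or path if your build differs.
import time
import urllib.error
import urllib.request

url = "http://127.0.0.1:12347/health"
for _ in range(60):
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            if resp.status == 200:
                print("router is ready")
                break
    except (urllib.error.URLError, OSError):
        pass
    time.sleep(1)
else:
    print("router did not become ready in time")
```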
```shell
# Embed endpoint
curl 127.0.0.1:12347/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
# Embed_all endpoint
curl 127.0.0.1:12347/embed_all \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
# Rerank endpoint (served by the reranker model started on port 8080)
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is a sub-field of Machine Learning.", "Deep learning is a country."]}' \
    -H 'Content-Type: application/json'
```
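For integration into Python services, the same requests can be issued with the `requests` library. A short sketch, assuming the ports from the launch commands above and the upstream TEI response layouts (`/embed` returns a list of vectors, `/rerank` returns per-text scores):

```python
# Python equivalents of the curl calls above.
# Ports match the launch commands in this guide: 12347 for embeddings, 8080 for the reranker.
import requests

# /embed returns one embedding vector per input string.
emb = requests.post("http://127.0.0.1:12347/embed",
                    json={"inputs": "What is Deep Learning?"},
                    timeout=30)
print("embedding dimensions:", len(emb.json()[0]))

# /rerank scores each candidate text against the query.
rr = requests.post("http://127.0.0.1:8080/rerank",
                   json={"query": "What is Deep Learning?",
                         "texts": ["Deep Learning is a sub-field of Machine Learning.",
                                   "Deep learning is a country."]},
                   timeout=30)
print(rr.json())
```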