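chore: add virtual environment to the repository

- Add the backend_service/venv virtual environment
- Include all Python dependency packages
- Note: the virtual environment is about 393 MB and contains 12655 files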
2025-12-03 10:19:25 +08:00
parent a6c2027caa
commit c4f851d387
12655 changed files with 3009376 additions and 0 deletions

@@ -0,0 +1,56 @@
# Embedding Function Schemas
This directory contains JSON schemas for all embedding functions in Chroma. The purpose of having these schemas is to support cross-language compatibility and to validate that changes in one client library do not accidentally diverge from others.
## Schema Structure
Each schema follows the JSON Schema Draft-07 specification and includes:
- `version`: The version of the schema
- `title`: The title of the schema
- `description`: A description of the schema
- `properties`: The properties that can be configured for the embedding function
- `required`: The properties that are required for the embedding function
- `additionalProperties`: Whether additional properties are allowed (always set to `false` to ensure strict validation)
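To make the effect of `required` and `additionalProperties: false` concrete, here is a minimal sketch that checks a configuration against a trimmed copy of the Cohere schema from this directory. The `check_config` helper, the environment variable name, and the model name are all hypothetical placeholders. The helper covers only `required`, unknown keys, and `string`-typed properties; it is not Chroma's validator and not a full Draft-07 implementation.

```python
import json
from typing import List

# Trimmed copy of the Cohere schema from this directory (descriptions omitted).
COHERE_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "model_name": {"type": "string"},
    "api_key_env_var": {"type": "string"}
  },
  "required": ["api_key_env_var", "model_name"],
  "additionalProperties": false
}
""")


def check_config(config: dict, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the config passed.

    Hypothetical helper: handles only `required`, unknown keys, and
    properties declared with type "string".
    """
    problems = []
    properties = schema.get("properties", {})
    # Every name listed under `required` must be present.
    for key in schema.get("required", []):
        if key not in config:
            problems.append(f"missing required property: {key}")
    # With additionalProperties set to false, unknown keys are rejected.
    if schema.get("additionalProperties") is False:
        for key in config:
            if key not in properties:
                problems.append(f"unexpected property: {key}")
    # Minimal type check for string-typed properties.
    for key, value in config.items():
        spec = properties.get(key, {})
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"property {key} must be a string")
    return problems


# Placeholder values, not real defaults.
good = {"api_key_env_var": "EXAMPLE_API_KEY", "model_name": "example-model"}
bad = {"model_name": "example-model", "modle": "typo"}

print(check_config(good, COHERE_SCHEMA))
print(check_config(bad, COHERE_SCHEMA))
```

The misspelled key in `bad` is reported rather than silently ignored, which is the practical benefit of setting `additionalProperties` to `false`.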
## Usage
These schemas are used by both the Python and JavaScript clients to validate embedding function configurations.
### Python
```python
from chromadb.utils.embedding_functions.schemas import validate_config
# Validate a configuration
config = {
    "api_key_env_var": "CHROMA_OPENAI_API_KEY",
    "model_name": "text-embedding-ada-002"
}
validate_config(config, "openai")
```
### JavaScript
```typescript
import { validateConfig } from '@chromadb/core';
// Validate a configuration
const config = {
  api_key_env_var: "CHROMA_OPENAI_API_KEY",
  model_name: "text-embedding-ada-002"
};
validateConfig(config, "openai");
```
## Adding New Schemas
To add a new schema:
1. Create a new JSON file in this directory with the name of the embedding function (e.g., `new_function.json`)
2. Define the schema following the JSON Schema Draft-07 specification
3. Update the embedding function implementations in both Python and JavaScript to use the schema for validation
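The first two steps can be sketched roughly as follows. The `make_schema_skeleton` helper and the single `model_name` property are illustrative assumptions modeled on the other schemas in this directory, and the file is written to a temporary directory rather than to the real schema directory.

```python
import json
import tempfile
from pathlib import Path


def make_schema_skeleton(display_name: str) -> dict:
    """Build a Draft-07 skeleton with the fields the existing schemas share.

    Hypothetical helper; adjust `properties` and `required` for the real
    embedding function.
    """
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": f"{display_name} Embedding Function Schema",
        "description": f"Schema for the {display_name} embedding function configuration",
        "version": "1.0.0",
        "type": "object",
        "properties": {
            "model_name": {
                "type": "string",
                "description": "The name of the model to use for embeddings",
            },
        },
        "required": ["model_name"],
        "additionalProperties": False,
    }


schema = make_schema_skeleton("New Function")

# Write to a temporary directory so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "new_function.json"
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    # Reload to confirm the file is valid JSON and round-trips unchanged.
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded["title"])
```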
## Schema Versioning
Each schema includes a version number to support future changes to embedding function configurations. When making changes to a schema, increment the version number to ensure backward compatibility.
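The schemas in this directory use three-part version strings such as `1.0.0`. The sketch below shows one way to increment such a string. The `bump_version` helper is hypothetical, and the choice of which part to bump for which kind of change is an assumption borrowed from semantic versioning, not something this README specifies.

```python
def bump_version(version: str, part: str = "minor") -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version string.

    Hypothetical helper: resets the lower parts to zero, as in
    semantic versioning.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


schema = {"version": "1.0.0"}
schema["version"] = bump_version(schema["version"], "minor")
print(schema["version"])
```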

@@ -0,0 +1,27 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Amazon Bedrock Embedding Function Schema",
"description": "Schema for the Amazon Bedrock embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"session_args": {
"type": "object",
"description": "The arguments to pass to the boto3 session"
},
"model_name": {
"type": "string",
"description": "The name of the model to use for embeddings"
},
"kwargs": {
"type": "object",
"description": "Additional arguments to pass to the Amazon Bedrock client"
}
},
"required": [
"session_args",
"model_name",
"kwargs"
],
"additionalProperties": false
}

@@ -0,0 +1,26 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Embedding Function Base Schema",
"description": "Base schema for all embedding functions in Chroma",
"type": "object",
"properties": {
"version": {
"type": "string",
"description": "Schema version for the embedding function"
},
"name": {
"type": "string",
"description": "Name of the embedding function"
},
"config": {
"type": "object",
"description": "Configuration parameters for the embedding function"
}
},
"required": [
"version",
"name",
"config"
],
"additionalProperties": false
}

@@ -0,0 +1,52 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Chroma Cloud Qwen Embedding Function Schema",
"description": "Schema for the Chroma Cloud Qwen embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model": {
"type": "string",
"enum": ["Qwen/Qwen3-Embedding-0.6B"],
"description": "The specific Qwen model to use for embeddings"
},
"task": {
"type": "string",
"enum": ["nl_to_code"],
"description": "The task for which embeddings are being generated"
},
"instructions": {
"type": "object",
"description": "A mapping of tasks to instructions for targets (documents/queries)",
"properties": {
"nl_to_code": {
"type": "object",
"properties": {
"documents": {
"type": "string",
"description": "Instructions for embedding documents"
},
"query": {
"type": "string",
"description": "Instructions for embedding queries"
}
},
"required": ["documents", "query"],
"additionalProperties": false
}
},
"required": ["nl_to_code"],
"additionalProperties": false
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Chroma Embedding API",
"default": "CHROMA_API_KEY"
}
},
"required": [
"model",
"task"
],
"additionalProperties": false
}

@@ -0,0 +1,26 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Chroma Cloud Splade Embedding Function Schema",
"description": "Schema for the Chroma Cloud Splade sparse embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model": {
"type": "string",
"enum": [
"prithivida/Splade_PP_en_v1"
],
"description": "The specific Splade model to use for sparse embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Chroma Embedding API",
"default": "CHROMA_API_KEY"
}
},
"required": [
"api_key_env_var",
"model"
],
"additionalProperties": false
}

@@ -0,0 +1,33 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Chroma BM25 Embedding Function Schema",
"description": "Schema for the Chroma BM25 sparse embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"k": {
"type": "number",
"description": "BM25 saturation parameter controlling term frequency scaling"
},
"b": {
"type": "number",
"description": "BM25 length normalization parameter"
},
"avg_doc_length": {
"type": "number",
"description": "Average document length in tokens used for normalization"
},
"token_max_length": {
"type": "number",
"description": "Maximum token length allowed before filtering"
},
"stopwords": {
"type": "array",
"description": "Optional custom stopword list (in lowercase) to override the defaults",
"items": {
"type": "string"
}
}
},
"additionalProperties": false
}

@@ -0,0 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Langchain Embedding Function Schema",
"description": "Schema for the langchain embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"embedding_function": {
"type": "string",
"description": "Parameter embedding_function for the langchain embedding function"
}
},
"required": [
"embedding_function"
],
"additionalProperties": false
}

@@ -0,0 +1,31 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Cloudflare Workers AI Embedding Function Schema",
"description": "Schema for the Cloudflare Workers AI embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"account_id": {
"type": "string",
"description": "The account ID for the Cloudflare Workers AI API"
},
"api_key_env_var": {
"type": "string",
"description": "The environment variable name that contains your API key for the Cloudflare Workers AI API"
},
"gateway_id": {
"type": "string",
"description": "The ID of the Cloudflare AI Gateway to use for a more customized solution"
}
},
"required": [
"api_key_env_var",
"model_name",
"account_id"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Cohere Embedding Function Schema",
"description": "Schema for the Cohere embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Cohere API"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,10 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Default Embedding Function Schema",
"description": "Schema for the default embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {},
"required": [],
"additionalProperties": false
}

@@ -0,0 +1,27 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Google Generative AI Embedding Function Schema",
"description": "Schema for the Google Generative AI embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"task_type": {
"type": "string",
"description": "The task type for the embeddings (e.g., RETRIEVAL_DOCUMENT)"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Google Generative AI API"
}
},
"required": [
"api_key_env_var",
"model_name",
"task_type"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Google PaLM Embedding Function Schema",
"description": "Schema for the Google PaLM embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Google PaLM API"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,32 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Google Vertex Embedding Function Schema",
"description": "Schema for the Google Vertex embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"project_id": {
"type": "string",
"description": "The Google Cloud project ID"
},
"region": {
"type": "string",
"description": "The Google Cloud region"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Google Vertex API"
}
},
"required": [
"api_key_env_var",
"model_name",
"project_id",
"region"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "HuggingFace Embedding Function Schema",
"description": "Schema for the HuggingFace embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the HuggingFace API"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,21 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "HuggingFace Embedding Server Schema",
"description": "Schema for the HuggingFace embedding server configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The URL of the HuggingFace Embedding Server"
},
"api_key_env_var": {
"type": "string",
"description": "The environment variable name that contains your API key for the HuggingFace API"
}
},
"required": [
"url"
],
"additionalProperties": false
}

@@ -0,0 +1,26 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Instructor Embedding Function Schema",
"description": "Schema for the instructor embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Parameter model_name for the instructor embedding function"
},
"device": {
"type": "string",
"description": "Parameter device for the instructor embedding function"
},
"instruction": {
"type": "string",
"description": "Parameter instruction for the instructor embedding function"
}
},
"required": [
"model_name",
"device"
],
"additionalProperties": false
}

@@ -0,0 +1,46 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Jina Embedding Function Schema",
"description": "Schema for the jina embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Parameter model_name for the jina embedding function"
},
"api_key_env_var": {
"type": "string",
"description": "Parameter api_key_env_var for the jina embedding function"
},
"task": {
"type": "string",
"description": "Parameter task for the jina embedding function"
},
"late_chunking": {
"type": "boolean",
"description": "Parameter late_chunking for the jina embedding function"
},
"truncate": {
"type": "boolean",
"description": "Parameter truncate for the jina embedding function"
},
"dimensions": {
"type": "integer",
"description": "Parameter dimensions for the jina embedding function"
},
"embedding_type": {
"type": "string",
"description": "Parameter embedding_type for the jina embedding function"
},
"normalized": {
"type": "boolean",
"description": "Parameter normalized for the jina embedding function"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Mistral Embedding Function Schema",
"description": "Schema for the Mistral embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model": {
"type": "string",
"description": "Parameter model for the Mistral embedding function"
},
"api_key_env_var": {
"type": "string",
"description": "Parameter api_key_env_var for the Mistral embedding function"
}
},
"required": [
"api_key_env_var",
"model"
],
"additionalProperties": false
}

@@ -0,0 +1,36 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Morph Embedding Function Schema",
"description": "Schema for the Morph embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the Morph API"
},
"api_base": {
"type": [
"string",
"null"
],
"description": "The base URL for the Morph API"
},
"encoding_format": {
"type": [
"string",
"null"
],
"description": "The format for embeddings (float or base64)"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,26 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Ollama Embedding Function Schema",
"description": "Schema for the Ollama embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The URL of the Ollama server"
},
"model_name": {
"type": "string",
"description": "The name of the model to use for embeddings"
},
"timeout": {
"type": "integer",
"description": "Timeout in seconds for the API request"
}
},
"required": [
"url",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,18 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Onnx_mini_lm_l6_v2 Embedding Function Schema",
"description": "Schema for the onnx_mini_lm_l6_v2 embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"preferred_providers": {
"type": "array",
"items": {
"type": "string"
},
"description": "Parameter preferred_providers for the onnx_mini_lm_l6_v2 embedding function"
}
},
"required": [],
"additionalProperties": false
}

@@ -0,0 +1,27 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Open_clip Embedding Function Schema",
"description": "Schema for the open_clip embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Parameter model_name for the open_clip embedding function"
},
"checkpoint": {
"type": "string",
"description": "Parameter checkpoint for the open_clip embedding function"
},
"device": {
"type": "string",
"description": "Parameter device for the open_clip embedding function"
}
},
"required": [
"model_name",
"checkpoint",
"device"
],
"additionalProperties": false
}

@@ -0,0 +1,71 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "OpenAI Embedding Function Schema",
"description": "Schema for the OpenAI embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"organization_id": {
"type": [
"string",
"null"
],
"description": "The OpenAI organization ID if applicable"
},
"api_base": {
"type": [
"string",
"null"
],
"description": "The base path for the API"
},
"api_type": {
"type": [
"string",
"null"
],
"description": "The type of the API deployment"
},
"api_version": {
"type": [
"string",
"null"
],
"description": "The api version for the API"
},
"deployment_id": {
"type": [
"string",
"null"
],
"description": "Deployment ID for Azure OpenAI"
},
"default_headers": {
"type": [
"object",
"null"
],
"description": "A mapping of default headers to be sent with each API request"
},
"dimensions": {
"type": [
"integer",
"null"
],
"description": "The number of dimensions for the embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "Environment variable name that contains your API key for the OpenAI API"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Roboflow Embedding Function Schema",
"description": "Schema for the roboflow embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"api_url": {
"type": "string",
"description": "Parameter api_url for the roboflow embedding function"
},
"api_key_env_var": {
"type": "string",
"description": "Parameter api_key_env_var for the roboflow embedding function"
}
},
"required": [
"api_key_env_var",
"api_url"
],
"additionalProperties": false
}

@@ -0,0 +1,41 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "SentenceTransformer Embedding Function Schema",
"description": "Schema for the SentenceTransformer embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Identifier of the SentenceTransformer model"
},
"device": {
"type": "string",
"description": "Device used for computation"
},
"normalize_embeddings": {
"type": "boolean",
"description": "Whether to normalize returned vectors"
},
"kwargs": {
"type": "object",
"description": "Additional arguments to pass to the SentenceTransformer model",
"additionalProperties": {
"type": [
"string",
"integer",
"number",
"boolean",
"array",
"object"
]
}
}
},
"required": [
"model_name",
"device",
"normalize_embeddings"
],
"additionalProperties": false
}

@@ -0,0 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Text2vec Embedding Function Schema",
"description": "Schema for the text2vec embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Parameter model_name for the text2vec embedding function"
}
},
"required": [
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,22 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Together AI Embedding Function Schema",
"description": "Schema for the Together AI embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "The name of the model to use for text embeddings"
},
"api_key_env_var": {
"type": "string",
"description": "The environment variable name that contains your API key for the Together AI API"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}

@@ -0,0 +1,27 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Transformers Embedding Function Schema",
"description": "Schema for the Transformers embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model": {
"type": "string",
"description": "Identifier of the SentenceTransformer model"
},
"revision": {
"type": "string",
"description": "Specific model version to use (can be a branch, tag name, or commit id)"
},
"quantized": {
"type": "boolean",
"description": "Whether to load the 8-bit quantized version of the model"
}
},
"required": [
"model",
"revision",
"quantized"
],
"additionalProperties": false
}

@@ -0,0 +1,30 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Voyageai Embedding Function Schema",
"description": "Schema for the voyageai embedding function configuration",
"version": "1.0.0",
"type": "object",
"properties": {
"model_name": {
"type": "string",
"description": "Parameter model_name for the voyageai embedding function"
},
"api_key_env_var": {
"type": "string",
"description": "Parameter api_key_env_var for the voyageai embedding function"
},
"input_type": {
"type": "string",
"description": "Parameter input_type for the voyageai embedding function"
},
"truncation": {
"type": "boolean",
"description": "Parameter truncation for the voyageai embedding function"
}
},
"required": [
"api_key_env_var",
"model_name"
],
"additionalProperties": false
}