feat: improve batch visualization test validation logic (business code only)

2025-12-09 13:20:39 +08:00
129 changed files with 21380 additions and 1984 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,19 @@
+# Images / large binary files
+*.png
+*.jpg
+*.jpeg
+
+# Database files
+*.sqlite3
+*.db
+
+# Logs / caches
+logs/
+*.log
+__pycache__/
+*.pyc
+*.pid
+
+# Temporary test files
+tools/api_test*.log
+tools/test_validate/validation_*/
--- a/backend_service/generated_visualizations/py_tree.png
+++ b/backend_service/generated_visualizations/py_tree.png
--- a/backend_service/src/pycache/init.cpython-310.pyc
+++ b/backend_service/src/pycache/init.cpython-310.pyc
--- a/backend_service/src/pycache/main.cpython-310.pyc
+++ b/backend_service/src/pycache/main.cpython-310.pyc
--- a/backend_service/src/pycache/models.cpython-310.pyc
+++ b/backend_service/src/pycache/models.cpython-310.pyc
--- a/backend_service/src/pycache/py_tree_generator.cpython-310.pyc
+++ b/backend_service/src/pycache/py_tree_generator.cpython-310.pyc
--- a/backend_service/src/pycache/websocket_manager.cpython-310.pyc
+++ b/backend_service/src/pycache/websocket_manager.cpython-310.pyc
--- a/logs/embedding_model.log
+++ b/logs/embedding_model.log
@@ -1,18 +1,19 @@
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
-build: 6097 (9515c613) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
-system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
+  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
+main: setting n_parallel = 4 and kv_unified = true (add -kvu to disable this)
+build: 7212 (ff90508d6) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
+system info: n_threads = 8, n_threads_batch = 8, total_threads = 32

-system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
+system_info: n_threads = 8 (n_threads_batch = 8) / 32 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

-main: binding port with default address family
-main: HTTP server is listening, hostname: 0.0.0.0, port: 8090, http threads: 15
+init: using 31 threads for HTTP server
+start: binding port with default address family
 main: loading model
-srv    load_model: loading model '/home/huangfukk/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf'
-llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15225 MiB free
-llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /home/huangfukk/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf (version GGUF V3 (latest))
+srv    load_model: loading model '/home/iscas/models/Qwen3-Embedding-4B-Q5_K_M.gguf'
+llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) (0000:01:00.0) - 29667 MiB free
+llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /home/iscas/models/Qwen3-Embedding-4B-Q5_K_M.gguf (version GGUF V3 (latest))
 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
 llama_model_loader: - kv   0:                       general.architecture str              = qwen3
 llama_model_loader: - kv   1:                               general.type str              = model
@@ -49,13 +50,13 @@ llama_model_loader: - kv  31:               tokenizer.ggml.add_eos_token bool
 llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
 llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
 llama_model_loader: - kv  34:               general.quantization_version u32              = 2
-llama_model_loader: - kv  35:                          general.file_type u32              = 15
+llama_model_loader: - kv  35:                          general.file_type u32              = 17
 llama_model_loader: - type  f32:  145 tensors
-llama_model_loader: - type q4_K:  216 tensors
+llama_model_loader: - type q5_K:  216 tensors
 llama_model_loader: - type q6_K:   37 tensors
 print_info: file format = GGUF V3 (latest)
-print_info: file type   = Q4_K - Medium
-print_info: file size   = 2.32 GiB (4.95 BPW) 
+print_info: file type   = Q5_K - Medium
+print_info: file size   = 2.68 GiB (5.73 BPW) 
 load: printing all EOG tokens:
 load:   - 151643 ('<|endoftext|>')
 load:   - 151645 ('<|im_end|>')
@@ -68,6 +69,7 @@ print_info: arch             = qwen3
 print_info: vocab_only       = 0
 print_info: n_ctx_train      = 40960
 print_info: n_embd           = 2560
+print_info: n_embd_inp       = 2560
 print_info: n_layer          = 36
 print_info: n_head           = 32
 print_info: n_head_kv        = 8
@@ -88,6 +90,8 @@ print_info: f_attn_scale     = 0.0e+00
 print_info: n_ff             = 9728
 print_info: n_expert         = 0
 print_info: n_expert_used    = 0
+print_info: n_expert_groups  = 0
+print_info: n_group_used     = 0
 print_info: causal attn      = 1
 print_info: pooling type     = 3
 print_info: rope type        = 2
@@ -122,27 +126,28 @@ print_info: max token length = 256
 load_tensors: loading model tensors, this can take a while... (mmap = true)
 load_tensors: offloading 36 repeating layers to GPU
 load_tensors: offloaded 36/37 layers to GPU
-load_tensors:        CUDA0 model buffer size =  2071.62 MiB
 load_tensors:   CPU_Mapped model buffer size =   303.75 MiB
-.........................................................................................
+load_tensors:        CUDA0 model buffer size =  2445.68 MiB
+..........................................................................................
 llama_context: constructing llama_context
-llama_context: n_seq_max     = 1
+llama_context: n_seq_max     = 4
 llama_context: n_ctx         = 4096
-llama_context: n_ctx_per_seq = 4096
+llama_context: n_ctx_seq     = 4096
 llama_context: n_batch       = 2048
 llama_context: n_ubatch      = 512
 llama_context: causal_attn   = 1
-llama_context: flash_attn    = 0
-llama_context: kv_unified    = false
+llama_context: flash_attn    = auto
+llama_context: kv_unified    = true
 llama_context: freq_base     = 1000000.0
 llama_context: freq_scale    = 1
-llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
-llama_context:        CPU  output buffer size =     0.59 MiB
-llama_kv_cache_unified:      CUDA0 KV buffer size =   576.00 MiB
-llama_kv_cache_unified: size =  576.00 MiB (  4096 cells,  36 layers,  1/1 seqs), K (f16):  288.00 MiB, V (f16):  288.00 MiB
+llama_context: n_ctx_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
+llama_context:        CPU  output buffer size =     2.35 MiB
+llama_kv_cache:      CUDA0 KV buffer size =   576.00 MiB
+llama_kv_cache: size =  576.00 MiB (  4096 cells,  36 layers,  4/1 seqs), K (f16):  288.00 MiB, V (f16):  288.00 MiB
+llama_context: Flash Attention was auto, set to enabled
 llama_context:      CUDA0 compute buffer size =   604.96 MiB
-llama_context:  CUDA_Host compute buffer size =    17.01 MiB
-llama_context: graph nodes  = 1411
+llama_context:  CUDA_Host compute buffer size =    13.01 MiB
+llama_context: graph nodes  = 1268
 llama_context: graph splits = 4 (with bs=512), 3 (with bs=1)
 common_init_from_params: added <|endoftext|> logit bias = -inf
 common_init_from_params: added <|im_end|> logit bias = -inf
@@ -151,10 +156,16 @@ common_init_from_params: added <|repo_name|> logit bias = -inf
 common_init_from_params: added <|file_sep|> logit bias = -inf
 common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
 common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
-srv          init: initializing slots, n_slots = 1
-slot         init: id  0 | task -1 | new slot n_ctx_slot = 4096
-main: model loaded
-main: chat template, chat_template: {%- if tools %}
+srv          init: initializing slots, n_slots = 4
+slot         init: id  0 | task -1 | new slot, n_ctx = 4096
+slot         init: id  1 | task -1 | new slot, n_ctx = 4096
+slot         init: id  2 | task -1 | new slot, n_ctx = 4096
+slot         init: id  3 | task -1 | new slot, n_ctx = 4096
+srv          init: prompt cache is enabled, size limit: 8192 MiB
+srv          init: use `--cache-ram 0` to disable the prompt cache
+srv          init: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
+srv          init: thinking = 0
+init: chat template, chat_template: {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
@@ -218,142 +229,148 @@ Hi there<|im_end|>
 How are you?<|im_end|>
 <|im_start|>assistant
 '
-main: server is listening on http://0.0.0.0:8090 - starting the main loop
+main: model loaded
+main: server is listening on http://0.0.0.0:8090
+main: starting the main loop...
 srv  update_slots: all slots are idle
 srv  log_server_r: request: GET /health 127.0.0.1 200
-slot launch_slot_: id  0 | task 0 | processing task
-slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
-slot update_slots: id  0 | task 0 | kv cache rm [0, end)
-slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 2, n_tokens = 2, progress = 1.000000
-slot update_slots: id  0 | task 0 | prompt done, n_past = 2, n_tokens = 2
-slot      release: id  0 | task 0 | stop processing: n_past = 2, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 0 | processing task
+slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 2
+slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
+slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 2, batch.n_tokens = 2, progress = 1.000000
+slot update_slots: id  3 | task 0 | prompt done, n_tokens = 2, batch.n_tokens = 2
+slot      release: id  3 | task 0 | stop processing: n_tokens = 2, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 2 | processing task
-slot update_slots: id  0 | task 2 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
-slot update_slots: id  0 | task 2 | kv cache rm [1, end)
-slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
-slot update_slots: id  0 | task 2 | prompt done, n_past = 9, n_tokens = 8
-slot      release: id  0 | task 2 | stop processing: n_past = 9, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.111 (> 0.100 thold), f_keep = 0.500
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 2 | processing task
+slot update_slots: id  3 | task 2 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 9
+slot update_slots: id  3 | task 2 | n_tokens = 1, memory_seq_rm [1, end)
+slot update_slots: id  3 | task 2 | prompt processing progress, n_tokens = 9, batch.n_tokens = 8, progress = 1.000000
+slot update_slots: id  3 | task 2 | prompt done, n_tokens = 9, batch.n_tokens = 8
+slot      release: id  3 | task 2 | stop processing: n_tokens = 9, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 4 | processing task
-slot update_slots: id  0 | task 4 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
-slot update_slots: id  0 | task 4 | kv cache rm [7, end)
-slot update_slots: id  0 | task 4 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
-slot update_slots: id  0 | task 4 | prompt done, n_past = 12, n_tokens = 5
-slot      release: id  0 | task 4 | stop processing: n_past = 12, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.778
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 4 | processing task
+slot update_slots: id  3 | task 4 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
+slot update_slots: id  3 | task 4 | n_tokens = 7, memory_seq_rm [7, end)
+slot update_slots: id  3 | task 4 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
+slot update_slots: id  3 | task 4 | prompt done, n_tokens = 12, batch.n_tokens = 5
+slot      release: id  3 | task 4 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 6 | processing task
-slot update_slots: id  0 | task 6 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
-slot update_slots: id  0 | task 6 | kv cache rm [1, end)
-slot update_slots: id  0 | task 6 | prompt processing progress, n_past = 2, n_tokens = 1, progress = 0.500000
-slot update_slots: id  0 | task 6 | prompt done, n_past = 2, n_tokens = 1
-slot      release: id  0 | task 6 | stop processing: n_past = 2, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.583
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 6 | processing task
+slot update_slots: id  3 | task 6 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
+slot update_slots: id  3 | task 6 | n_tokens = 7, memory_seq_rm [7, end)
+slot update_slots: id  3 | task 6 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
+slot update_slots: id  3 | task 6 | prompt done, n_tokens = 12, batch.n_tokens = 5
+slot      release: id  3 | task 6 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 8 | processing task
-slot update_slots: id  0 | task 8 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
-slot update_slots: id  0 | task 8 | kv cache rm [1, end)
-slot update_slots: id  0 | task 8 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
-slot update_slots: id  0 | task 8 | prompt done, n_past = 9, n_tokens = 8
-slot      release: id  0 | task 8 | stop processing: n_past = 9, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.583
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 8 | processing task
+slot update_slots: id  3 | task 8 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
+slot update_slots: id  3 | task 8 | n_tokens = 7, memory_seq_rm [7, end)
+slot update_slots: id  3 | task 8 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
+slot update_slots: id  3 | task 8 | prompt done, n_tokens = 12, batch.n_tokens = 5
+slot      release: id  3 | task 8 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 10 | processing task
-slot update_slots: id  0 | task 10 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
-slot update_slots: id  0 | task 10 | kv cache rm [1, end)
-slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 2, n_tokens = 1, progress = 0.500000
-slot update_slots: id  0 | task 10 | prompt done, n_past = 2, n_tokens = 1
-slot      release: id  0 | task 10 | stop processing: n_past = 2, truncated = 0
+slot get_availabl: id  2 | task -1 | selected slot by LRU, t_last = -1
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  2 | task 10 | processing task
+slot update_slots: id  2 | task 10 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 10
+slot update_slots: id  2 | task 10 | n_tokens = 0, memory_seq_rm [0, end)
+slot update_slots: id  2 | task 10 | prompt processing progress, n_tokens = 10, batch.n_tokens = 10, progress = 1.000000
+slot update_slots: id  2 | task 10 | prompt done, n_tokens = 10, batch.n_tokens = 10
+slot      release: id  2 | task 10 | stop processing: n_tokens = 10, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 12 | processing task
-slot update_slots: id  0 | task 12 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
-slot update_slots: id  0 | task 12 | kv cache rm [1, end)
-slot update_slots: id  0 | task 12 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
-slot update_slots: id  0 | task 12 | prompt done, n_past = 9, n_tokens = 8
-slot      release: id  0 | task 12 | stop processing: n_past = 9, truncated = 0
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.700
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  2 | task 12 | processing task
+slot update_slots: id  2 | task 12 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
+slot update_slots: id  2 | task 12 | n_tokens = 7, memory_seq_rm [7, end)
+slot update_slots: id  2 | task 12 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
+slot update_slots: id  2 | task 12 | prompt done, n_tokens = 12, batch.n_tokens = 5
+slot      release: id  2 | task 12 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 14 | processing task
-slot update_slots: id  0 | task 14 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
-slot update_slots: id  0 | task 14 | kv cache rm [7, end)
-slot update_slots: id  0 | task 14 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
-slot update_slots: id  0 | task 14 | prompt done, n_past = 12, n_tokens = 5
-slot      release: id  0 | task 14 | stop processing: n_past = 12, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.500 (> 0.100 thold), f_keep = 0.583
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 14 | processing task
+slot update_slots: id  3 | task 14 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 14
+slot update_slots: id  3 | task 14 | n_tokens = 7, memory_seq_rm [7, end)
+slot update_slots: id  3 | task 14 | prompt processing progress, n_tokens = 14, batch.n_tokens = 7, progress = 1.000000
+slot update_slots: id  3 | task 14 | prompt done, n_tokens = 14, batch.n_tokens = 7
+slot      release: id  3 | task 14 | stop processing: n_tokens = 14, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 16 | processing task
-slot update_slots: id  0 | task 16 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
-slot update_slots: id  0 | task 16 | kv cache rm [7, end)
-slot update_slots: id  0 | task 16 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
-slot update_slots: id  0 | task 16 | prompt done, n_past = 12, n_tokens = 5
-slot      release: id  0 | task 16 | stop processing: n_past = 12, truncated = 0
+slot get_availabl: id  1 | task -1 | selected slot by LRU, t_last = -1
+slot launch_slot_: id  1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  1 | task 16 | processing task
+slot update_slots: id  1 | task 16 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 17
+slot update_slots: id  1 | task 16 | n_tokens = 0, memory_seq_rm [0, end)
+slot update_slots: id  1 | task 16 | prompt processing progress, n_tokens = 17, batch.n_tokens = 17, progress = 1.000000
+slot update_slots: id  1 | task 16 | prompt done, n_tokens = 17, batch.n_tokens = 17
+slot      release: id  1 | task 16 | stop processing: n_tokens = 17, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 18 | processing task
-slot update_slots: id  0 | task 18 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
-slot update_slots: id  0 | task 18 | kv cache rm [7, end)
-slot update_slots: id  0 | task 18 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
-slot update_slots: id  0 | task 18 | prompt done, n_past = 12, n_tokens = 5
-slot      release: id  0 | task 18 | stop processing: n_past = 12, truncated = 0
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.333 (> 0.100 thold), f_keep = 0.417
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  2 | task 18 | processing task
+slot update_slots: id  2 | task 18 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 15
+slot update_slots: id  2 | task 18 | n_tokens = 5, memory_seq_rm [5, end)
+slot update_slots: id  2 | task 18 | prompt processing progress, n_tokens = 15, batch.n_tokens = 10, progress = 1.000000
+slot update_slots: id  2 | task 18 | prompt done, n_tokens = 15, batch.n_tokens = 10
+slot      release: id  2 | task 18 | stop processing: n_tokens = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 20 | processing task
-slot update_slots: id  0 | task 20 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 10
-slot update_slots: id  0 | task 20 | kv cache rm [0, end)
-slot update_slots: id  0 | task 20 | prompt processing progress, n_past = 10, n_tokens = 10, progress = 1.000000
-slot update_slots: id  0 | task 20 | prompt done, n_past = 10, n_tokens = 10
-slot      release: id  0 | task 20 | stop processing: n_past = 10, truncated = 0
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.333 (> 0.100 thold), f_keep = 0.400
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  2 | task 20 | processing task
+slot update_slots: id  2 | task 20 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 18
+slot update_slots: id  2 | task 20 | n_tokens = 6, memory_seq_rm [6, end)
+slot update_slots: id  2 | task 20 | prompt processing progress, n_tokens = 18, batch.n_tokens = 12, progress = 1.000000
+slot update_slots: id  2 | task 20 | prompt done, n_tokens = 18, batch.n_tokens = 12
+slot      release: id  2 | task 20 | stop processing: n_tokens = 18, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 22 | processing task
-slot update_slots: id  0 | task 22 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
-slot update_slots: id  0 | task 22 | kv cache rm [7, end)
-slot update_slots: id  0 | task 22 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
-slot update_slots: id  0 | task 22 | prompt done, n_past = 12, n_tokens = 5
-slot      release: id  0 | task 22 | stop processing: n_past = 12, truncated = 0
+slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.267 (> 0.100 thold), f_keep = 0.235
+slot launch_slot_: id  1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  1 | task 22 | processing task
+slot update_slots: id  1 | task 22 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 15
+slot update_slots: id  1 | task 22 | n_tokens = 4, memory_seq_rm [4, end)
+slot update_slots: id  1 | task 22 | prompt processing progress, n_tokens = 15, batch.n_tokens = 11, progress = 1.000000
+slot update_slots: id  1 | task 22 | prompt done, n_tokens = 15, batch.n_tokens = 11
+slot      release: id  1 | task 22 | stop processing: n_tokens = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
+slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1
+slot launch_slot_: id  0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
 slot launch_slot_: id  0 | task 24 | processing task
-slot update_slots: id  0 | task 24 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 14
-slot update_slots: id  0 | task 24 | kv cache rm [0, end)
-slot update_slots: id  0 | task 24 | prompt processing progress, n_past = 14, n_tokens = 14, progress = 1.000000
-slot update_slots: id  0 | task 24 | prompt done, n_past = 14, n_tokens = 14
-slot      release: id  0 | task 24 | stop processing: n_past = 14, truncated = 0
+slot update_slots: id  0 | task 24 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 34
+slot update_slots: id  0 | task 24 | n_tokens = 0, memory_seq_rm [0, end)
+slot update_slots: id  0 | task 24 | prompt processing progress, n_tokens = 34, batch.n_tokens = 34, progress = 1.000000
+slot update_slots: id  0 | task 24 | prompt done, n_tokens = 34, batch.n_tokens = 34
+slot      release: id  0 | task 24 | stop processing: n_tokens = 34, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 26 | processing task
-slot update_slots: id  0 | task 26 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 17
-slot update_slots: id  0 | task 26 | kv cache rm [1, end)
-slot update_slots: id  0 | task 26 | prompt processing progress, n_past = 17, n_tokens = 16, progress = 0.941176
-slot update_slots: id  0 | task 26 | prompt done, n_past = 17, n_tokens = 16
-slot      release: id  0 | task 26 | stop processing: n_past = 17, truncated = 0
-srv  update_slots: all slots are idle
-srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 28 | processing task
-slot update_slots: id  0 | task 28 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 15
-slot update_slots: id  0 | task 28 | kv cache rm [0, end)
-slot update_slots: id  0 | task 28 | prompt processing progress, n_past = 15, n_tokens = 15, progress = 1.000000
-slot update_slots: id  0 | task 28 | prompt done, n_past = 15, n_tokens = 15
-slot      release: id  0 | task 28 | stop processing: n_past = 15, truncated = 0
-srv  update_slots: all slots are idle
-srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 30 | processing task
-slot update_slots: id  0 | task 30 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 18
-slot update_slots: id  0 | task 30 | kv cache rm [6, end)
-slot update_slots: id  0 | task 30 | prompt processing progress, n_past = 18, n_tokens = 12, progress = 0.666667
-slot update_slots: id  0 | task 30 | prompt done, n_past = 18, n_tokens = 12
-slot      release: id  0 | task 30 | stop processing: n_past = 18, truncated = 0
-srv  update_slots: all slots are idle
-srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 32 | processing task
-slot update_slots: id  0 | task 32 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 15
-slot update_slots: id  0 | task 32 | kv cache rm [0, end)
-slot update_slots: id  0 | task 32 | prompt processing progress, n_past = 15, n_tokens = 15, progress = 1.000000
-slot update_slots: id  0 | task 32 | prompt done, n_past = 15, n_tokens = 15
-slot      release: id  0 | task 32 | stop processing: n_past = 15, truncated = 0
+slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = 284111945431
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
+slot launch_slot_: id  3 | task 26 | processing task
+slot update_slots: id  3 | task 26 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 37
+slot update_slots: id  3 | task 26 | n_tokens = 0, memory_seq_rm [0, end)
+slot update_slots: id  3 | task 26 | prompt processing progress, n_tokens = 37, batch.n_tokens = 37, progress = 1.000000
+slot update_slots: id  3 | task 26 | prompt done, n_tokens = 37, batch.n_tokens = 37
+slot      release: id  3 | task 26 | stop processing: n_tokens = 37, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
--- a/logs/fastapi.log
+++ b/logs/fastapi.log
--- a/logs/inference_model.log
+++ b/logs/inference_model.log
--- a/logs/services.pid
+++ b/logs/services.pid
@@ -1,3 +1,3 @@
-19618
-19619
-19713
+4171746
+4171747
+4171892
--- a/start_all.sh
+++ b/start_all.sh
@@ -19,8 +19,8 @@ NC='\033[0m' # No Color
 # Default configuration (can be overridden via environment variables)
 PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 LLAMA_SERVER_DIR="${LLAMA_SERVER_DIR:-~/llama.cpp/build/bin}"
-INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/gguf/Qwen/Qwen3-4B/Qwen3-4B-Q5_K_M.gguf}"
-EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf}"
+INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/Qwen2-VL-2B-Instruct-GGUF/Qwen2-VL-2B-Instruct-Q8_0.gguf}"
+EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/Qwen3-Embedding-4B-Q5_K_M.gguf}"
 VENV_PATH="${VENV_PATH:-${PROJECT_ROOT}/backend_service/venv}"
 LOG_DIR="${PROJECT_ROOT}/logs"
 PID_FILE="${LOG_DIR}/services.pid"
--- a/start_all_src.sh
+++ b/start_all_src.sh
@@ -0,0 +1,406 @@
+#!/bin/bash
+
+# ==============================================================================
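+# UAV natural-language control project - one-click startup script
+# ==============================================================================
+# Purpose: start all required services (llama-server inference model, embedding model, FastAPI backend)
+# Usage: ./start_all.sh [options]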
+# ==============================================================================
+
+set -e  # Exit immediately on error
+
+# Color definitions
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Default configuration (can be overridden via environment variables)
+PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+LLAMA_SERVER_DIR="${LLAMA_SERVER_DIR:-~/llama.cpp/build/bin}"
+INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/Qwen3-4B-Q5_K_M.gguf}"
+EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/Qwen3-Embedding-4B-Q5_K_M.gguf}"
+VENV_PATH="${VENV_PATH:-${PROJECT_ROOT}/backend_service/venv}"
+LOG_DIR="${PROJECT_ROOT}/logs"
+PID_FILE="${LOG_DIR}/services.pid"
+
+# Port configuration
+INFERENCE_PORT=8081
+EMBEDDING_PORT=8090
+API_PORT=8000
+
+# Create the log directory
+mkdir -p "${LOG_DIR}"
+
+# ==============================================================================
+# Helper functions
+# ==============================================================================
+
+print_info() {
+    echo -e "${BLUE}[INFO]${NC} $1"
+}
+
+print_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+print_warning() {
+    echo -e "${YELLOW}[WARNING]${NC} $1"
+}
+
+print_error() {
+    echo -e "${RED}[ERROR]${NC} $1"
+}
+
+# Check whether a command exists
+check_command() {
+    if ! command -v "$1" &> /dev/null; then
+        print_error "Command $1 not found; please install it first"
+        return 1
+    fi
+    return 0
+}
+
+# Check whether a port is in use
+check_port() {
+    local port=$1
+    if lsof -Pi :${port} -sTCP:LISTEN -t >/dev/null 2>&1 ; then
+        return 0  # Port is in use
+    else
+        return 1  # Port is free
+    fi
+}
+
+# Wait for a service to become ready
+wait_for_service() {
+    local url=$1
+    local service_name=$2
+    local max_attempts=30
+    local attempt=0
+    
+    print_info "Waiting for ${service_name} to start..."
+    while [ $attempt -lt $max_attempts ]; do
+        if curl -sf "${url}" > /dev/null 2>&1; then  # -f: treat HTTP error responses (e.g. 503) as not ready
+            print_success "${service_name} is ready"
+            return 0
+        fi
+        attempt=$((attempt + 1))
+        sleep 1
+    done
+    
+    print_error "${service_name} startup timed out"
+    return 1
+}
+
+# Stop all services
+stop_services() {
+    print_info "Stopping all services..."
+    
+    if [ -f "${PID_FILE}" ]; then
+        while read -r pid; do
+            if ps -p $pid > /dev/null 2>&1; then
+                print_info "Stopping process PID: $pid"
+                kill $pid 2>/dev/null || true
+            fi
+        done < "${PID_FILE}"
+        rm -f "${PID_FILE}"
+    fi
+    
+    # Also try to stop services by port
+    for port in ${INFERENCE_PORT} ${EMBEDDING_PORT} ${API_PORT}; do
+        if check_port ${port}; then
+            local pid=$(lsof -ti:${port})
+            if [ ! -z "$pid" ]; then
+                print_info "Stopping process on port ${port} (PID: $pid)"
+                kill $pid 2>/dev/null || true
+            fi
+        fi
+    done
+    
+    print_success "All services stopped"
+}
+
+# Cleanup function (called when the script exits)
+cleanup() {
+    if [ "$?" -ne 0 ]; then
+        print_error "An error occurred during startup; cleaning up..."
+    fi
+    # Note: services are not stopped automatically here; the user controls that manually
+}
+
+trap cleanup EXIT
+
+# ==============================================================================
+# Main function
+# ==============================================================================
+
+start_services() {
+    print_info "=========================================="
+    print_info "  Drone natural-language control project - service startup"
+    print_info "=========================================="
+    echo ""
+    
+    # Check required commands
+    print_info "Checking required commands..."
+    check_command "python3" || exit 1
+    check_command "curl" || exit 1
+    check_command "lsof" || print_warning "lsof is not installed; port usage cannot be checked"
+    echo ""
+    
+    # Check port usage
+    print_info "Checking port usage..."
+    if check_port ${INFERENCE_PORT}; then
+        print_warning "Port ${INFERENCE_PORT} is already in use; the inference model may already be running"
+    fi
+    if check_port ${EMBEDDING_PORT}; then
+        print_warning "Port ${EMBEDDING_PORT} is already in use; the embedding model may already be running"
+    fi
+    if check_port ${API_PORT}; then
+        print_error "Port ${API_PORT} is already in use; stop the service using it first"
+        exit 1
+    fi
+    echo ""
+    
+    # Locate llama-server (expand a leading ~ in the path without eval)
+    local llama_server_dir_expanded="${LLAMA_SERVER_DIR/#\~/$HOME}"
+    local llama_server="${llama_server_dir_expanded}/llama-server"
+    if [ ! -f "${llama_server}" ]; then
+        print_error "llama-server not found: ${llama_server}"
+        print_info "Set the LLAMA_SERVER_DIR environment variable to the correct path"
+        print_info "Configured path: ${LLAMA_SERVER_DIR}"
+        print_info "Expanded path: ${llama_server_dir_expanded}"
+        exit 1
+    fi
+    print_success "Found llama-server: ${llama_server}"
+    echo ""
+    
+    # Check model files (expand a leading ~ without eval)
+    local inference_model_expanded="${INFERENCE_MODEL/#\~/$HOME}"
+    local embedding_model_expanded="${EMBEDDING_MODEL/#\~/$HOME}"
+    
+    if [ ! -f "${inference_model_expanded}" ]; then
+        print_error "Inference model file not found: ${inference_model_expanded}"
+        print_info "Set the INFERENCE_MODEL environment variable to the correct model path"
+        exit 1
+    fi
+    print_success "Found inference model: ${inference_model_expanded}"
+    
+    if [ ! -f "${embedding_model_expanded}" ]; then
+        print_error "Embedding model file not found: ${embedding_model_expanded}"
+        print_info "Set the EMBEDDING_MODEL environment variable to the correct model path"
+        exit 1
+    fi
+    print_success "Found embedding model: ${embedding_model_expanded}"
+    echo ""
+    
+    # Check the ROS2 environment
+    local ros2_setup="${PROJECT_ROOT}/install/setup.bash"
+    if [ ! -f "${ros2_setup}" ]; then
+        print_warning "ROS2 setup file not found: ${ros2_setup}"
+        print_warning "If the project has been decoupled from ROS2, this warning can be ignored"
+    else
+        print_success "Found ROS2 setup file: ${ros2_setup}"
+    fi
+    echo ""
+    
+    # Check the venv virtual environment (expand a leading ~ without eval)
+    local venv_path_expanded="${VENV_PATH/#\~/$HOME}"
+    print_info "Checking venv virtual environment: ${venv_path_expanded}"
+    if [ ! -d "${venv_path_expanded}" ]; then
+        print_error "venv directory does not exist: ${venv_path_expanded}"
+        print_info "Create the venv first: python3 -m venv ${venv_path_expanded}"
+        print_info "Then install dependencies: ${venv_path_expanded}/bin/pip install -r backend_service/requirements.txt"
+        exit 1
+    fi
+    if [ ! -f "${venv_path_expanded}/bin/activate" ]; then
+        print_error "venv activation script does not exist: ${venv_path_expanded}/bin/activate"
+        print_error "This does not look like a valid venv"
+        exit 1
+    fi
+    print_success "venv virtual environment exists: ${venv_path_expanded}"
+    echo ""
+    
+    # Initialize the PID file
+    > "${PID_FILE}"
+    
+    # ==========================================================================
+    # Start the inference model service
+    # ==========================================================================
+    print_info "Starting inference model service (port ${INFERENCE_PORT})..."
+    cd "${llama_server_dir_expanded}"
+    nohup ./llama-server \
+        -m "${inference_model_expanded}" \
+        --port ${INFERENCE_PORT} \
+        --gpu-layers 36 \
+        --host 0.0.0.0 \
+        -c 8192 \
+        > "${LOG_DIR}/inference_model.log" 2>&1 &
+    local inference_pid=$!
+    echo $inference_pid >> "${PID_FILE}"
+    print_success "Inference model service started (PID: $inference_pid)"
+    print_info "Log file: ${LOG_DIR}/inference_model.log"
+    echo ""
+    
+    # ==========================================================================
+    # Start the embedding model service
+    # ==========================================================================
+    print_info "Starting embedding model service (port ${EMBEDDING_PORT})..."
+    nohup ./llama-server \
+        -m "${embedding_model_expanded}" \
+        --gpu-layers 36 \
+        --port ${EMBEDDING_PORT} \
+        --embeddings \
+        --pooling last \
+        --host 0.0.0.0 \
+        > "${LOG_DIR}/embedding_model.log" 2>&1 &
+    local embedding_pid=$!
+    echo $embedding_pid >> "${PID_FILE}"
+    print_success "Embedding model service started (PID: $embedding_pid)"
+    print_info "Log file: ${LOG_DIR}/embedding_model.log"
+    echo ""
+    
+    # ==========================================================================
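+    # Wait for the model services to become ready
+    # ==========================================================================
+    print_info "Waiting for model services to become ready..."
+    sleep 3  # Give the services some time to start
+    
+    # Wait for the inference model service
+    if ! wait_for_service "http://localhost:${INFERENCE_PORT}/health" "inference model service"; then
+        # If the health endpoint is unavailable, fall back to /v1/models
+        if ! wait_for_service "http://localhost:${INFERENCE_PORT}/v1/models" "inference model service"; then
+            print_warning "Inference model service may not be fully ready; continuing startup anyway"
+        fi
+    fi
+    
+    # Wait for the embedding model service
+    if ! wait_for_service "http://localhost:${EMBEDDING_PORT}/health" "embedding model service"; then
+        if ! wait_for_service "http://localhost:${EMBEDDING_PORT}/v1/models" "embedding model service"; then
+            print_warning "Embedding model service may not be fully ready; continuing startup anyway"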
+        fi
+    fi
+    echo ""
+    
+    # ==========================================================================
+    # Start the FastAPI backend service
+    # ==========================================================================
+    print_info "Starting FastAPI backend service (port ${API_PORT})..."
+    cd "${PROJECT_ROOT}"
+    
+    # Activate the venv and start the FastAPI service
+    # bash -c runs the activation in a fresh shell
+    bash -c "
+        # Source the ROS2 environment (if present)
+        if [ -f '${ros2_setup}' ]; then
+            source '${ros2_setup}'
+        fi
+        # Activate the venv, then exec uvicorn so the recorded PID is the server itself
+        source '${venv_path_expanded}/bin/activate' && \
+        cd '${PROJECT_ROOT}/backend_service' && \
+        exec uvicorn src.main:app --host 0.0.0.0 --port ${API_PORT}
+    " > "${LOG_DIR}/fastapi.log" 2>&1 &
+    local api_pid=$!
+    echo $api_pid >> "${PID_FILE}"
+    print_success "FastAPI service started (PID: $api_pid)"
+    print_info "Log file: ${LOG_DIR}/fastapi.log"
+    echo ""
+    
+    # Wait for the FastAPI service to become ready
+    sleep 3
+    if wait_for_service "http://localhost:${API_PORT}/docs" "FastAPI service"; then
+        print_success "All services started successfully!"
+    else
+        print_warning "FastAPI service may not be fully ready; check the log: ${LOG_DIR}/fastapi.log"
+    fi
+    echo ""
+    
+    # Show service access information
+    print_info "=========================================="
+    print_info "  Service startup complete!"
+    print_info "=========================================="
+    print_info "Inference model API: http://localhost:${INFERENCE_PORT}/v1"
+    print_info "Embedding model API: http://localhost:${EMBEDDING_PORT}/v1"
+    print_info "FastAPI backend: http://localhost:${API_PORT}"
+    print_info "API docs: http://localhost:${API_PORT}/docs"
+    print_info ""
+    print_info "Log file locations:"
+    print_info "  - Inference model: ${LOG_DIR}/inference_model.log"
+    print_info "  - Embedding model: ${LOG_DIR}/embedding_model.log"
+    print_info "  - FastAPI service: ${LOG_DIR}/fastapi.log"
+    print_info ""
+    print_info "Press Ctrl+C to stop all services"
+    print_info "=========================================="
+    echo ""
+    
+    # Install signal handlers so that Ctrl+C stops the services
+    trap 'print_info "\nStopping services..."; stop_services; exit 0' INT TERM
+    
+    # Keep the script running while the background processes are alive
+    print_info "All services are running; to view the logs use:"
+    print_info "  tail -f ${LOG_DIR}/*.log"
+    echo ""
+    
+    # Wait for all background processes
+    wait
+}
+
+# ==============================================================================
+# Script entry point
+# ==============================================================================
+
+case "${1:-start}" in
+    start)
+        start_services
+        ;;
+    stop)
+        stop_services
+        ;;
+    restart)
+        stop_services
+        sleep 2
+        start_services
+        ;;
+    status)
+        print_info "Checking service status..."
+        if [ -f "${PID_FILE}" ]; then
+            print_info "Recorded service processes:"
+            while read -r pid; do
+                if ps -p $pid > /dev/null 2>&1; then
+                    print_success "PID $pid: running"
+                else
+                    print_warning "PID $pid: stopped"
+                fi
+            done < "${PID_FILE}"
+        else
+            print_info "PID file not found; the services may not have been started"
+        fi
+        echo ""
+        print_info "Port usage:"
+        for port in ${INFERENCE_PORT} ${EMBEDDING_PORT} ${API_PORT}; do
+            if check_port ${port}; then
+                pid=$(lsof -ti:${port})  # no 'local' here: this code runs outside any function
+                print_success "Port ${port}: in use (PID: $pid)"
+            else
+                print_warning "Port ${port}: free"
+            fi
+        done
+        ;;
+    *)
+        echo "Usage: $0 {start|stop|restart|status}"
+        echo ""
+        echo "Commands:"
+        echo "  start   - start all services (default)"
+        echo "  stop    - stop all services"
+        echo "  restart - restart all services"
+        echo "  status  - show service status"
+        echo ""
+        echo "Environment variables:"
+        echo "  LLAMA_SERVER_DIR  - directory containing llama-server (default: ~/llama.cpp/build/bin)"
+        echo "  INFERENCE_MODEL   - inference model path (default: ~/models/Qwen3-4B-Q5_K_M.gguf)"
+        echo "  EMBEDDING_MODEL   - embedding model path (default: ~/models/Qwen3-Embedding-4B-Q5_K_M.gguf)"
+        echo "  VENV_PATH         - venv virtual environment path (default: \${PROJECT_ROOT}/backend_service/venv)"
+        exit 1
+        ;;
+esac
+
--- a/tools/api_test_qwen2.5_vl_3b.log
+++ b/tools/api_test_qwen2.5_vl_3b.log
--- a/tools/api_test_qwen2_vl_2b.log
+++ b/tools/api_test_qwen2_vl_2b.log
--- a/tools/api_test_qwen3_4b.log
+++ b/tools/api_test_qwen3_4b.log
--- a/tools/api_test_qwen3_vl_2b.log
+++ b/tools/api_test_qwen3_vl_2b.log
--- a/tools/api_test_qwen3_vl_4b.log
+++ b/tools/api_test_qwen3_vl_4b.log
--- a/tools/test_api.py
+++ b/tools/test_api.py
@@ -14,7 +14,8 @@ BASE_URL = "http://127.0.0.1:8000"
 ENDPOINT = "/generate_plan"

 # The user prompt we will send for the test
-TEST_PROMPT = "起飞"
+#TEST_PROMPT = "无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作"
+TEST_PROMPT = "已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒"

 # Log file path (will be created in the same directory as this script)
 LOG_FILE = os.path.join(os.path.dirname(__file__), "api_test.log")
--- a/tools/test_validate/api_test_log_qwen2.5_vl_3b.txt
+++ b/tools/test_validate/api_test_log_qwen2.5_vl_3b.txt
--- a/tools/test_validate/api_test_log_qwen2_vl_2b.txt
+++ b/tools/test_validate/api_test_log_qwen2_vl_2b.txt
--- a/tools/test_validate/api_test_log_qwen3_vl_2b.txt
+++ b/tools/test_validate/api_test_log_qwen3_vl_2b.txt
--- a/tools/test_validate/api_test_log_qwen3_vl_4b.txt
+++ b/tools/test_validate/api_test_log_qwen3_vl_4b.txt
--- a/tools/test_validate/batch_visualize.py
+++ b/tools/test_validate/batch_visualize.py
@@ -0,0 +1,269 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Extract JSON responses from the API test log and visualize them in batch
+"""
+import json
+import os
+import re
+import logging
+import platform
+import random
+import html
+from typing import Dict, List, Tuple
+from collections import defaultdict
+
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(levelname)s - %(message)s'
+)
+
+def sanitize_filename(text: str) -> str:
+    """Convert text into a safe file name"""
+    # Replace unsafe characters
+    text = re.sub(r'[<>:"/\\|?*]', '_', text)
+    # Limit the length
+    if len(text) > 100:
+        text = text[:100]
+    return text
+
+def _pick_zh_font():
+    """Pick a suitable Chinese font for the current platform"""
+    sys = platform.system()
+    if sys == "Windows":
+        return "Microsoft YaHei"
+    elif sys == "Darwin":
+        return "PingFang SC"
+    else:
+        return "Noto Sans CJK SC"
+
+def _add_nodes_and_edges(node: dict, dot, parent_id: str | None = None) -> str:
+    """Recursive helper that adds nodes and edges."""
+    try:
+        from graphviz import Digraph
+    except ImportError:
+        logging.critical("Error: the graphviz library is not installed. Run: pip install graphviz")
+        return ""
+    
+    current_id = f"{id(node)}_{random.randint(1000, 9999)}"
+    
+    # Build the node label (HTML-like, with proper line breaks and escaping)
+    name = html.escape(str(node.get('name', '')))
+    ntype = html.escape(str(node.get('type', '')))
+    label_parts = [f"<B>{name}</B> <FONT POINT-SIZE='10'><I>({ntype})</I></FONT>"]
+    
+    # Format the parameters for display
+    params = node.get('params') or {}
+    if params:
+        params_lines = []
+        for key, value in params.items():
+            k = html.escape(str(key))
+            if isinstance(value, float):
+                value_str = f"{value:.2f}".rstrip('0').rstrip('.')
+            else:
+                value_str = str(value)
+            v = html.escape(value_str)
+            params_lines.append(f"{k}: {v}")
+        params_text = "<BR ALIGN='LEFT'/>".join(params_lines)
+        label_parts.append(f"<FONT POINT-SIZE='9' COLOR='#555555'>{params_text}</FONT>")
+    
+    node_label = f"<{'<BR/>'.join(label_parts)}>"
+    
+    # Set the node style and color by type
+    node_type = (node.get('type') or '').lower()
+    shape = 'ellipse'
+    style = 'filled'
+    fillcolor = '#e6e6e6'   # Default gray fill
+    border_color = '#666666' # Default border color
+    
+    if node_type == 'action':
+        shape = 'box'
+        style = 'rounded,filled'
+        fillcolor = "#cde4ff"  # Light blue
+    elif node_type == 'condition':
+        shape = 'diamond'
+        style = 'filled'
+        fillcolor = "#fff2cc"  # Light yellow
+    elif node_type == 'sequence':
+        shape = 'ellipse'
+        style = 'filled'
+        fillcolor = '#d5e8d4'  # Green
+    elif node_type == 'selector':
+        shape = 'ellipse'
+        style = 'filled'
+        fillcolor = '#ffe6cc'  # Orange
+    elif node_type == 'parallel':
+        shape = 'ellipse'
+        style = 'filled'
+        fillcolor = '#e1d5e7'  # Purple
+    
+    # Highlight safety-related nodes
+    if node.get('name') in ['battery_above', 'gps_status', 'SafetyMonitor']:
+        border_color = '#ff0000'  # Red border to make safety nodes stand out
+        style = 'filled,bold'  # Bold outline
+    
+    dot.node(current_id, label=node_label, shape=shape, style=style, fillcolor=fillcolor, color=border_color)
+    
+    # Connect to the parent node
+    if parent_id:
+        dot.edge(parent_id, current_id)
+    
+    # Recurse into child nodes
+    children = node.get("children", [])
+    if not children:
+        return current_id
+    
+    # Collect the IDs of all child nodes
+    child_ids = []
+    
+    # Each recursive call connects the child to the current node
+    for child in children:
+        child_id = _add_nodes_and_edges(child, dot, current_id)
+        child_ids.append(child_id)
+    
+    # Put sibling nodes on the same rank (a horizontal layout shows that they share a level)
+    if len(child_ids) > 1:
+        with dot.subgraph(name=f"rank_{current_id}") as s:
+            s.attr(rank='same')
+            for cid in child_ids:
+                s.node(cid)
+    
+    return current_id
+
+def _visualize_pytree(node: Dict, file_path: str):
+    """
+    Visualize a Pytree dict with Graphviz and save it to the given path.
+    """
+    try:
+        from graphviz import Digraph
+    except ImportError:
+        logging.critical("Error: the graphviz library is not installed. Run: pip install graphviz")
+        return
+
+    fontname = _pick_zh_font()
+    
+    dot = Digraph('Pytree', comment='Drone Mission Plan')
+    dot.attr(rankdir='TB', label='Drone Mission Plan', fontsize='20', fontname=fontname)
+    dot.attr('node', shape='box', style='rounded,filled', fontname=fontname)
+    dot.attr('edge', fontname=fontname)
+    
+    _add_nodes_and_edges(node, dot)
+    
+    try:
+        # Ensure the output directory exists and avoid producing .png.png
+        base_path, ext = os.path.splitext(file_path)
+        render_path = base_path if ext.lower() == '.png' else file_path
+
+        out_dir = os.path.dirname(render_path)
+        if out_dir and not os.path.exists(out_dir):
+            os.makedirs(out_dir, exist_ok=True)
+
+        # Save as a .png file and delete the intermediate Graphviz source file
+        output_path = dot.render(render_path, format='png', cleanup=True, view=False)
+        logging.info(f"✅ Visualization succeeded: {output_path}")
+    except Exception as e:
+        logging.error(f"❌ Failed to generate the visualization: {e}")
+
+def parse_log_file(log_file_path: str) -> Dict[str, List[Dict]]:
+    """
+    Parse the log file and extract each original instruction with its full API response JSON
+    Returns: {original instruction: [list of JSON responses]}
+    """
+    with open(log_file_path, 'r', encoding='utf-8') as f:
+        content = f.read()
+    
+    # Split the log into entries on the separator lines
+    entries = re.split(r'={80,}', content)
+    
+    results = defaultdict(list)
+    
+    for entry in entries:
+        if not entry.strip():
+            continue
+        
+        # Extract the original instruction (the log labels it '原始指令:')
+        instruction_match = re.search(r'原始指令:\s*(.+)', entry)
+        if not instruction_match:
+            continue
+        
+        original_instruction = instruction_match.group(1).strip()
+        
+        # Extract the full API response JSON (the log labels it '完整API响应:')
+        json_match = re.search(r'完整API响应:\s*\n(\{.*\})', entry, re.DOTALL)
+        if not json_match:
+            logging.warning(f"No JSON response found for instruction '{original_instruction}'")
+            continue
+        
+        json_str = json_match.group(1).strip()
+        
+        try:
+            json_obj = json.loads(json_str)
+            results[original_instruction].append(json_obj)
+            logging.info(f"Extracted the JSON response for instruction '{original_instruction}'")
+        except json.JSONDecodeError as e:
+            logging.error(f"Failed to parse the JSON for instruction '{original_instruction}': {e}")
+            continue
+    
+    return results
+
+def process_and_visualize(log_file_path: str, output_dir: str):
+    """
+    Process the log file and visualize the responses in batch
+    """
+    # Create the output directory
+    os.makedirs(output_dir, exist_ok=True)
+    
+    # Parse the log file
+    logging.info(f"Parsing log file: {log_file_path}")
+    instruction_responses = parse_log_file(log_file_path)
+    
+    logging.info(f"Found {len(instruction_responses)} distinct original instructions")
+    
+    # Process every response for each instruction
+    for instruction, responses in instruction_responses.items():
+        logging.info(f"\nProcessing instruction: {instruction} ({len(responses)} responses)")
+        
+        # Create the instruction directory (using a safe file name)
+        safe_instruction_name = sanitize_filename(instruction)
+        instruction_dir = os.path.join(output_dir, safe_instruction_name)
+        os.makedirs(instruction_dir, exist_ok=True)
+        
+        # Process each response
+        for idx, response in enumerate(responses, 1):
+            try:
+                # Extract the root node
+                root_node = response.get('root')
+                if not root_node:
+                    logging.warning(f"Response #{idx} has no root node; skipping")
+                    continue
+                
+                # Build the file names
+                json_filename = f"response_{idx}.json"
+                png_filename = f"response_{idx}.png"
+                
+                json_path = os.path.join(instruction_dir, json_filename)
+                png_path = os.path.join(instruction_dir, png_filename)
+                
+                # Save the JSON file
+                with open(json_path, 'w', encoding='utf-8') as f:
+                    json.dump(response, f, ensure_ascii=False, indent=2)
+                
+                logging.info(f"  Saved JSON: {json_filename}")
+                
+                # Generate the visualization
+                _visualize_pytree(root_node, png_path)
+                logging.info(f"  Visualization requested: {png_filename}")
+                
+            except Exception as e:
+                logging.error(f"Error while processing response #{idx}: {e}")
+                continue
+    
+    logging.info(f"\n✅ All processing finished! Results saved in: {output_dir}")
+
+if __name__ == "__main__":
+    log_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), "api_test_log.txt")
+    output_directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "validation")
+    
+    process_and_visualize(log_file, output_directory)
+
--- a/tools/test_validate/instructions.txt
+++ b/tools/test_validate/instructions.txt
@@ -10,5 +10,6 @@
 飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆
 飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆
 起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资
-
+无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作
+已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒

--- a/tools/test_validate/test_results.csv
+++ b/tools/test_validate/test_results.csv
@@ -1,13 +1,15 @@
 instruction_index,instruction,run_number,success,attempts,response_time,plan_id,error,timestamp
-1,起飞,1,True,1,2.4630444049835205,42903026-b02b-4089-859d-aec5cfa2435e,,2025-12-03 17:09:32
-2,起飞后移动到学生宿舍上方降落,1,True,1,10.017558574676514,86238ad2-e275-4d50-905c-175bd2f26fd0,,2025-12-03 17:09:43
-3,起飞后移动到学生宿舍上方查找蓝色的车,1,True,1,12.420023202896118,d8345bc3-b70f-41d7-b9fc-3e4898d7409e,,2025-12-03 17:09:56
-4,起飞后移动到学生宿舍上方寻找蓝色的车,1,True,1,12.864884614944458,29b5ee20-c809-4511-af08-80a85240c729,,2025-12-03 17:10:10
-5,起飞后移动到学生宿舍上方检测蓝色的车,1,True,1,10.438142538070679,5e7eb8c7-287a-469a-b6c0-a4102c1b0dac,,2025-12-03 17:10:21
-6,飞到学生宿舍上方查找蓝色的车,1,True,1,11.751057386398315,ef3d1981-1d51-433d-b2f4-2e92838075fd,,2025-12-03 17:10:34
-7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,True,1,32.890604972839355,d8fc4658-08af-4910-89c4-b029c9a5daa0,,2025-12-03 17:11:08
-8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,False,1,33.2862343788147,,,2025-12-03 17:11:42
-9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,True,1,12.312166213989258,7fbf0091-f7d3-4c3a-a6b7-4c0bfd4df66e,,2025-12-03 17:11:56
-10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,True,1,12.204660892486572,3ae0b258-b7e4-460c-9cfe-4b224266edc4,,2025-12-03 17:12:09
-11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,True,1,12.808414936065674,2acb84cf-c89e-460d-a4d9-8d1edb4ee69a,,2025-12-03 17:12:23
-12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,True,1,11.071707487106323,c05d46c9-1b1b-4c8d-b64b-86b76d0c4099,,2025-12-03 17:12:35
+1,起飞,1,True,1,0.3463160991668701,0ffa333d-574d-453d-99cd-f8852411b7be,,2025-12-08 16:08:41
+2,起飞后移动到学生宿舍上方降落,1,True,1,0.1823880672454834,46c5741f-1e51-4cbe-bd5a-1e099d0d53f5,,2025-12-08 16:08:42
+3,起飞后移动到学生宿舍上方查找蓝色的车,1,True,1,0.24654889106750488,636bfbb8-c3be-42b6-93cf-caf87ac6424c,,2025-12-08 16:08:43
+4,起飞后移动到学生宿舍上方寻找蓝色的车,1,True,1,0.23946380615234375,744d6e87-5067-4f91-9f73-7f65432e1b83,,2025-12-08 16:08:45
+5,起飞后移动到学生宿舍上方检测蓝色的车,1,True,1,3.5440704822540283,1bf6820d-0c04-4961-b624-49e9d919ac56,,2025-12-08 16:08:49
+6,飞到学生宿舍上方查找蓝色的车,1,False,1,3.451496124267578,,,2025-12-08 16:08:54
+7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,False,1,3.321821689605713,,,2025-12-08 16:08:58
+8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,False,1,18.552793502807617,,,2025-12-08 16:09:17
+9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,True,1,1.5930235385894775,1ae38dbd-4e25-4a51-ac4f-8ed851fe8b1f,,2025-12-08 16:09:20
+10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,False,1,17.402809381484985,,,2025-12-08 16:09:38
+11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,True,1,1.269315481185913,24b02e2d-291d-4213-9e0f-acdd1165a1f1,,2025-12-08 16:09:41
+12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,True,1,3.885636329650879,685b1d6d-8a82-463c-ab68-051348403c89,,2025-12-08 16:09:46
+13,无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作,1,False,1,16.88854742050171,,,2025-12-08 16:10:04
+14,已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒,1,False,1,3.594463586807251,,,2025-12-08 16:10:08
--- a/tools/test_validate/test_summary.csv
+++ b/tools/test_validate/test_summary.csv
@@ -1,13 +1,15 @@
 instruction_index,instruction,total_runs,successful_runs,success_rate,avg_response_time,min_response_time,max_response_time,total_response_time
-1,起飞,1,1,100.00%,2.46s,2.46s,2.46s,2.46s
-2,起飞后移动到学生宿舍上方降落,1,1,100.00%,10.02s,10.02s,10.02s,10.02s
-3,起飞后移动到学生宿舍上方查找蓝色的车,1,1,100.00%,12.42s,12.42s,12.42s,12.42s
-4,起飞后移动到学生宿舍上方寻找蓝色的车,1,1,100.00%,12.86s,12.86s,12.86s,12.86s
-5,起飞后移动到学生宿舍上方检测蓝色的车,1,1,100.00%,10.44s,10.44s,10.44s,10.44s
-6,飞到学生宿舍上方查找蓝色的车,1,1,100.00%,11.75s,11.75s,11.75s,11.75s
-7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,1,100.00%,32.89s,32.89s,32.89s,32.89s
+1,起飞,1,1,100.00%,0.35s,0.35s,0.35s,0.35s
+2,起飞后移动到学生宿舍上方降落,1,1,100.00%,0.18s,0.18s,0.18s,0.18s
+3,起飞后移动到学生宿舍上方查找蓝色的车,1,1,100.00%,0.25s,0.25s,0.25s,0.25s
+4,起飞后移动到学生宿舍上方寻找蓝色的车,1,1,100.00%,0.24s,0.24s,0.24s,0.24s
+5,起飞后移动到学生宿舍上方检测蓝色的车,1,1,100.00%,3.54s,3.54s,3.54s,3.54s
+6,飞到学生宿舍上方查找蓝色的车,1,0,0.00%,N/A,N/A,N/A,0.00s
+7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,0,0.00%,N/A,N/A,N/A,0.00s
 8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,0,0.00%,N/A,N/A,N/A,0.00s
-9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,1,100.00%,12.31s,12.31s,12.31s,12.31s
-10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,1,100.00%,12.20s,12.20s,12.20s,12.20s
-11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,1,100.00%,12.81s,12.81s,12.81s,12.81s
-12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,1,100.00%,11.07s,11.07s,11.07s,11.07s
+9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,1,100.00%,1.59s,1.59s,1.59s,1.59s
+10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,0,0.00%,N/A,N/A,N/A,0.00s
+11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,1,100.00%,1.27s,1.27s,1.27s,1.27s
+12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,1,100.00%,3.89s,3.89s,3.89s,3.89s
+13,无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作,1,0,0.00%,N/A,N/A,N/A,0.00s
+14,已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒,1,0,0.00%,N/A,N/A,N/A,0.00s
--- a/tools/test_validate/test_validity.py
+++ b/tools/test_validate/test_validity.py
@@ -505,4 +505,4 @@ def main():
    print(f"统计摘要: {SUMMARY_CSV}")

 if __name__ == "__main__":
-    main()
+    main()
--- a/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png