feat: improve batch visualization test validation logic (business code only)

2025-12-09 13:20:39 +08:00
129 changed files with 21380 additions and 1984 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,19 @@
 # Images / large binary files
 *.png
 *.jpg
 *.jpeg
 # Database files
 *.sqlite3
 *.db
 # Logs / caches
 logs/
 *.log
 __pycache__/
 *.pyc
 *.pid
 # Temporary test files
 tools/api_test*.log
 tools/test_validate/validation_*/
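
Ignore rules only apply to untracked paths: files that are already in the index (for example `logs/embedding_model.log`, which this commit still modifies) stay tracked until they are removed from it. A minimal sketch of that cleanup, assuming it is run from the repository root and that GNU `xargs` is available; the function name `untrack_ignored` is hypothetical and not part of this repository:

```shell
# Hypothetical helper: stop tracking every file that matches the current
# ignore rules, while leaving the files themselves on disk.
untrack_ignored() {
    # -c: list tracked (cached) files
    # -i: keep only those matching an ignore pattern
    # --exclude-standard: read patterns from .gitignore and the usual exclude files
    # -z / -0: NUL-separated names, safe for paths with spaces
    # -r: do not run "git rm" at all when nothing matches (GNU xargs)
    git ls-files -ci --exclude-standard -z \
        | xargs -0 -r git rm --cached --quiet --
}
```

Calling `untrack_ignored` and then committing records the removals; later changes to those paths no longer appear in diffs.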
--- a/backend_service/generated_visualizations/py_tree.png
+++ b/backend_service/generated_visualizations/py_tree.png
--- a/backend_service/src/__pycache__/__init__.cpython-310.pyc
+++ b/backend_service/src/__pycache__/__init__.cpython-310.pyc
--- a/backend_service/src/__pycache__/main.cpython-310.pyc
+++ b/backend_service/src/__pycache__/main.cpython-310.pyc
--- a/backend_service/src/__pycache__/models.cpython-310.pyc
+++ b/backend_service/src/__pycache__/models.cpython-310.pyc
--- a/backend_service/src/__pycache__/py_tree_generator.cpython-310.pyc
+++ b/backend_service/src/__pycache__/py_tree_generator.cpython-310.pyc
--- a/backend_service/src/__pycache__/websocket_manager.cpython-310.pyc
+++ b/backend_service/src/__pycache__/websocket_manager.cpython-310.pyc
--- a/logs/embedding_model.log
+++ b/logs/embedding_model.log
@@ -1,18 +1,19 @@
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
+  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
-build: 6097 (9515c613) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
+main: setting n_parallel = 4 and kv_unified = true (add -kvu to disable this)
-system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
+build: 7212 (ff90508d6) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
 system info: n_threads = 8, n_threads_batch = 8, total_threads = 32
-system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
+system_info: n_threads = 8 (n_threads_batch = 8) / 32 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
-main: binding port with default address family
+init: using 31 threads for HTTP server
-main: HTTP server is listening, hostname: 0.0.0.0, port: 8090, http threads: 15
+start: binding port with default address family
 main: loading model
-srv    load_model: loading model '/home/huangfukk/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf'
+srv    load_model: loading model '/home/iscas/models/Qwen3-Embedding-4B-Q5_K_M.gguf'
-llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15225 MiB free
+llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) (0000:01:00.0) - 29667 MiB free
-llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /home/huangfukk/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf (version GGUF V3 (latest))
+llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /home/iscas/models/Qwen3-Embedding-4B-Q5_K_M.gguf (version GGUF V3 (latest))
 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
 llama_model_loader: - kv   0:                       general.architecture str              = qwen3
 llama_model_loader: - kv   1:                               general.type str              = model
@@ -49,13 +50,13 @@ llama_model_loader: - kv  31:               tokenizer.ggml.add_eos_token bool
 llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
 llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
 llama_model_loader: - kv  34:               general.quantization_version u32              = 2
-llama_model_loader: - kv  35:                          general.file_type u32              = 15
+llama_model_loader: - kv  35:                          general.file_type u32              = 17
 llama_model_loader: - type  f32:  145 tensors
-llama_model_loader: - type q4_K:  216 tensors
+llama_model_loader: - type q5_K:  216 tensors
 llama_model_loader: - type q6_K:   37 tensors
 print_info: file format = GGUF V3 (latest)
-print_info: file type   = Q4_K - Medium
+print_info: file type   = Q5_K - Medium
-print_info: file size   = 2.32 GiB (4.95 BPW) 
+print_info: file size   = 2.68 GiB (5.73 BPW) 
 load: printing all EOG tokens:
 load:   - 151643 ('<|endoftext|>')
 load:   - 151645 ('<|im_end|>')
@@ -68,6 +69,7 @@ print_info: arch             = qwen3
 print_info: vocab_only       = 0
 print_info: n_ctx_train      = 40960
 print_info: n_embd           = 2560
 print_info: n_embd_inp       = 2560
 print_info: n_layer          = 36
 print_info: n_head           = 32
 print_info: n_head_kv        = 8
@@ -88,6 +90,8 @@ print_info: f_attn_scale     = 0.0e+00
 print_info: n_ff             = 9728
 print_info: n_expert         = 0
 print_info: n_expert_used    = 0
 print_info: n_expert_groups  = 0
 print_info: n_group_used     = 0
 print_info: causal attn      = 1
 print_info: pooling type     = 3
 print_info: rope type        = 2
@@ -122,27 +126,28 @@ print_info: max token length = 256
 load_tensors: loading model tensors, this can take a while... (mmap = true)
 load_tensors: offloading 36 repeating layers to GPU
 load_tensors: offloaded 36/37 layers to GPU
 load_tensors:        CUDA0 model buffer size =  2071.62 MiB
 load_tensors:   CPU_Mapped model buffer size =   303.75 MiB
-.........................................................................................
+load_tensors:        CUDA0 model buffer size =  2445.68 MiB
 ..........................................................................................
 llama_context: constructing llama_context
-llama_context: n_seq_max     = 1
+llama_context: n_seq_max     = 4
 llama_context: n_ctx         = 4096
-llama_context: n_ctx_per_seq = 4096
+llama_context: n_ctx_seq     = 4096
 llama_context: n_batch       = 2048
 llama_context: n_ubatch      = 512
 llama_context: causal_attn   = 1
-llama_context: flash_attn    = 0
+llama_context: flash_attn    = auto
-llama_context: kv_unified    = false
+llama_context: kv_unified    = true
 llama_context: freq_base     = 1000000.0
 llama_context: freq_scale    = 1
-llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
+llama_context: n_ctx_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
-llama_context:        CPU  output buffer size =     0.59 MiB
+llama_context:        CPU  output buffer size =     2.35 MiB
-llama_kv_cache_unified:      CUDA0 KV buffer size =   576.00 MiB
+llama_kv_cache:      CUDA0 KV buffer size =   576.00 MiB
-llama_kv_cache_unified: size =  576.00 MiB (  4096 cells,  36 layers,  1/1 seqs), K (f16):  288.00 MiB, V (f16):  288.00 MiB
+llama_kv_cache: size =  576.00 MiB (  4096 cells,  36 layers,  4/1 seqs), K (f16):  288.00 MiB, V (f16):  288.00 MiB
 llama_context: Flash Attention was auto, set to enabled
 llama_context:      CUDA0 compute buffer size =   604.96 MiB
-llama_context:  CUDA_Host compute buffer size =    17.01 MiB
+llama_context:  CUDA_Host compute buffer size =    13.01 MiB
-llama_context: graph nodes  = 1411
+llama_context: graph nodes  = 1268
 llama_context: graph splits = 4 (with bs=512), 3 (with bs=1)
 common_init_from_params: added <|endoftext|> logit bias = -inf
 common_init_from_params: added <|im_end|> logit bias = -inf
@@ -151,10 +156,16 @@ common_init_from_params: added <|repo_name|> logit bias = -inf
 common_init_from_params: added <|file_sep|> logit bias = -inf
 common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
 common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
-srv          init: initializing slots, n_slots = 1
+srv          init: initializing slots, n_slots = 4
-slot         init: id  0 | task -1 | new slot n_ctx_slot = 4096
+slot         init: id  0 | task -1 | new slot, n_ctx = 4096
-main: model loaded
+slot         init: id  1 | task -1 | new slot, n_ctx = 4096
-main: chat template, chat_template: {%- if tools %}
+slot         init: id  2 | task -1 | new slot, n_ctx = 4096
 slot         init: id  3 | task -1 | new slot, n_ctx = 4096
 srv          init: prompt cache is enabled, size limit: 8192 MiB
 srv          init: use `--cache-ram 0` to disable the prompt cache
 srv          init: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
 srv          init: thinking = 0
 init: chat template, chat_template: {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
@@ -218,142 +229,148 @@ Hi there<|im_end|>
 How are you?<|im_end|>
 <|im_start|>assistant
 '
-main: server is listening on http://0.0.0.0:8090 - starting the main loop
+main: model loaded
 main: server is listening on http://0.0.0.0:8090
 main: starting the main loop...
 srv  update_slots: all slots are idle
 srv  log_server_r: request: GET /health 127.0.0.1 200
-slot launch_slot_: id  0 | task 0 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
-slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 0 | kv cache rm [0, end)
+slot launch_slot_: id  3 | task 0 | processing task
-slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 2, n_tokens = 2, progress = 1.000000
+slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 2
-slot update_slots: id  0 | task 0 | prompt done, n_past = 2, n_tokens = 2
+slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
-slot      release: id  0 | task 0 | stop processing: n_past = 2, truncated = 0
+slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 2, batch.n_tokens = 2, progress = 1.000000
 slot update_slots: id  3 | task 0 | prompt done, n_tokens = 2, batch.n_tokens = 2
 slot      release: id  3 | task 0 | stop processing: n_tokens = 2, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 2 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.111 (> 0.100 thold), f_keep = 0.500
-slot update_slots: id  0 | task 2 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 2 | kv cache rm [1, end)
+slot launch_slot_: id  3 | task 2 | processing task
-slot update_slots: id  0 | task 2 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
+slot update_slots: id  3 | task 2 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 9
-slot update_slots: id  0 | task 2 | prompt done, n_past = 9, n_tokens = 8
+slot update_slots: id  3 | task 2 | n_tokens = 1, memory_seq_rm [1, end)
-slot      release: id  0 | task 2 | stop processing: n_past = 9, truncated = 0
+slot update_slots: id  3 | task 2 | prompt processing progress, n_tokens = 9, batch.n_tokens = 8, progress = 1.000000
 slot update_slots: id  3 | task 2 | prompt done, n_tokens = 9, batch.n_tokens = 8
 slot      release: id  3 | task 2 | stop processing: n_tokens = 9, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 4 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.778
-slot update_slots: id  0 | task 4 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 4 | kv cache rm [7, end)
+slot launch_slot_: id  3 | task 4 | processing task
-slot update_slots: id  0 | task 4 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
+slot update_slots: id  3 | task 4 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
-slot update_slots: id  0 | task 4 | prompt done, n_past = 12, n_tokens = 5
+slot update_slots: id  3 | task 4 | n_tokens = 7, memory_seq_rm [7, end)
-slot      release: id  0 | task 4 | stop processing: n_past = 12, truncated = 0
+slot update_slots: id  3 | task 4 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
 slot update_slots: id  3 | task 4 | prompt done, n_tokens = 12, batch.n_tokens = 5
 slot      release: id  3 | task 4 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 6 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.583
-slot update_slots: id  0 | task 6 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 6 | kv cache rm [1, end)
+slot launch_slot_: id  3 | task 6 | processing task
-slot update_slots: id  0 | task 6 | prompt processing progress, n_past = 2, n_tokens = 1, progress = 0.500000
+slot update_slots: id  3 | task 6 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
-slot update_slots: id  0 | task 6 | prompt done, n_past = 2, n_tokens = 1
+slot update_slots: id  3 | task 6 | n_tokens = 7, memory_seq_rm [7, end)
-slot      release: id  0 | task 6 | stop processing: n_past = 2, truncated = 0
+slot update_slots: id  3 | task 6 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
 slot update_slots: id  3 | task 6 | prompt done, n_tokens = 12, batch.n_tokens = 5
 slot      release: id  3 | task 6 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 8 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.583
-slot update_slots: id  0 | task 8 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 8 | kv cache rm [1, end)
+slot launch_slot_: id  3 | task 8 | processing task
-slot update_slots: id  0 | task 8 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
+slot update_slots: id  3 | task 8 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
-slot update_slots: id  0 | task 8 | prompt done, n_past = 9, n_tokens = 8
+slot update_slots: id  3 | task 8 | n_tokens = 7, memory_seq_rm [7, end)
-slot      release: id  0 | task 8 | stop processing: n_past = 9, truncated = 0
+slot update_slots: id  3 | task 8 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
 slot update_slots: id  3 | task 8 | prompt done, n_tokens = 12, batch.n_tokens = 5
 slot      release: id  3 | task 8 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 10 | processing task
+slot get_availabl: id  2 | task -1 | selected slot by LRU, t_last = -1
-slot update_slots: id  0 | task 10 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 2
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 10 | kv cache rm [1, end)
+slot launch_slot_: id  2 | task 10 | processing task
-slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 2, n_tokens = 1, progress = 0.500000
+slot update_slots: id  2 | task 10 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 10
-slot update_slots: id  0 | task 10 | prompt done, n_past = 2, n_tokens = 1
+slot update_slots: id  2 | task 10 | n_tokens = 0, memory_seq_rm [0, end)
-slot      release: id  0 | task 10 | stop processing: n_past = 2, truncated = 0
+slot update_slots: id  2 | task 10 | prompt processing progress, n_tokens = 10, batch.n_tokens = 10, progress = 1.000000
 slot update_slots: id  2 | task 10 | prompt done, n_tokens = 10, batch.n_tokens = 10
 slot      release: id  2 | task 10 | stop processing: n_tokens = 10, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 12 | processing task
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.700
-slot update_slots: id  0 | task 12 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 9
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 12 | kv cache rm [1, end)
+slot launch_slot_: id  2 | task 12 | processing task
-slot update_slots: id  0 | task 12 | prompt processing progress, n_past = 9, n_tokens = 8, progress = 0.888889
+slot update_slots: id  2 | task 12 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12
-slot update_slots: id  0 | task 12 | prompt done, n_past = 9, n_tokens = 8
+slot update_slots: id  2 | task 12 | n_tokens = 7, memory_seq_rm [7, end)
-slot      release: id  0 | task 12 | stop processing: n_past = 9, truncated = 0
+slot update_slots: id  2 | task 12 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000
 slot update_slots: id  2 | task 12 | prompt done, n_tokens = 12, batch.n_tokens = 5
 slot      release: id  2 | task 12 | stop processing: n_tokens = 12, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 14 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.500 (> 0.100 thold), f_keep = 0.583
-slot update_slots: id  0 | task 14 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 14 | kv cache rm [7, end)
+slot launch_slot_: id  3 | task 14 | processing task
-slot update_slots: id  0 | task 14 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
+slot update_slots: id  3 | task 14 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 14
-slot update_slots: id  0 | task 14 | prompt done, n_past = 12, n_tokens = 5
+slot update_slots: id  3 | task 14 | n_tokens = 7, memory_seq_rm [7, end)
-slot      release: id  0 | task 14 | stop processing: n_past = 12, truncated = 0
+slot update_slots: id  3 | task 14 | prompt processing progress, n_tokens = 14, batch.n_tokens = 7, progress = 1.000000
 slot update_slots: id  3 | task 14 | prompt done, n_tokens = 14, batch.n_tokens = 7
 slot      release: id  3 | task 14 | stop processing: n_tokens = 14, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 16 | processing task
+slot get_availabl: id  1 | task -1 | selected slot by LRU, t_last = -1
-slot update_slots: id  0 | task 16 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
+slot launch_slot_: id  1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 16 | kv cache rm [7, end)
+slot launch_slot_: id  1 | task 16 | processing task
-slot update_slots: id  0 | task 16 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
+slot update_slots: id  1 | task 16 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 17
-slot update_slots: id  0 | task 16 | prompt done, n_past = 12, n_tokens = 5
+slot update_slots: id  1 | task 16 | n_tokens = 0, memory_seq_rm [0, end)
-slot      release: id  0 | task 16 | stop processing: n_past = 12, truncated = 0
+slot update_slots: id  1 | task 16 | prompt processing progress, n_tokens = 17, batch.n_tokens = 17, progress = 1.000000
 slot update_slots: id  1 | task 16 | prompt done, n_tokens = 17, batch.n_tokens = 17
 slot      release: id  1 | task 16 | stop processing: n_tokens = 17, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 18 | processing task
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.333 (> 0.100 thold), f_keep = 0.417
-slot update_slots: id  0 | task 18 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 18 | kv cache rm [7, end)
+slot launch_slot_: id  2 | task 18 | processing task
-slot update_slots: id  0 | task 18 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
+slot update_slots: id  2 | task 18 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 15
-slot update_slots: id  0 | task 18 | prompt done, n_past = 12, n_tokens = 5
+slot update_slots: id  2 | task 18 | n_tokens = 5, memory_seq_rm [5, end)
-slot      release: id  0 | task 18 | stop processing: n_past = 12, truncated = 0
+slot update_slots: id  2 | task 18 | prompt processing progress, n_tokens = 15, batch.n_tokens = 10, progress = 1.000000
 slot update_slots: id  2 | task 18 | prompt done, n_tokens = 15, batch.n_tokens = 10
 slot      release: id  2 | task 18 | stop processing: n_tokens = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 20 | processing task
+slot get_availabl: id  2 | task -1 | selected slot by LCP similarity, sim_best = 0.333 (> 0.100 thold), f_keep = 0.400
-slot update_slots: id  0 | task 20 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 10
+slot launch_slot_: id  2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 20 | kv cache rm [0, end)
+slot launch_slot_: id  2 | task 20 | processing task
-slot update_slots: id  0 | task 20 | prompt processing progress, n_past = 10, n_tokens = 10, progress = 1.000000
+slot update_slots: id  2 | task 20 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 18
-slot update_slots: id  0 | task 20 | prompt done, n_past = 10, n_tokens = 10
+slot update_slots: id  2 | task 20 | n_tokens = 6, memory_seq_rm [6, end)
-slot      release: id  0 | task 20 | stop processing: n_past = 10, truncated = 0
+slot update_slots: id  2 | task 20 | prompt processing progress, n_tokens = 18, batch.n_tokens = 12, progress = 1.000000
 slot update_slots: id  2 | task 20 | prompt done, n_tokens = 18, batch.n_tokens = 12
 slot      release: id  2 | task 20 | stop processing: n_tokens = 18, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 22 | processing task
+slot get_availabl: id  1 | task -1 | selected slot by LCP similarity, sim_best = 0.267 (> 0.100 thold), f_keep = 0.235
-slot update_slots: id  0 | task 22 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 12
+slot launch_slot_: id  1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 22 | kv cache rm [7, end)
+slot launch_slot_: id  1 | task 22 | processing task
-slot update_slots: id  0 | task 22 | prompt processing progress, n_past = 12, n_tokens = 5, progress = 0.416667
+slot update_slots: id  1 | task 22 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 15
-slot update_slots: id  0 | task 22 | prompt done, n_past = 12, n_tokens = 5
+slot update_slots: id  1 | task 22 | n_tokens = 4, memory_seq_rm [4, end)
-slot      release: id  0 | task 22 | stop processing: n_past = 12, truncated = 0
+slot update_slots: id  1 | task 22 | prompt processing progress, n_tokens = 15, batch.n_tokens = 11, progress = 1.000000
 slot update_slots: id  1 | task 22 | prompt done, n_tokens = 15, batch.n_tokens = 11
 slot      release: id  1 | task 22 | stop processing: n_tokens = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
 slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1
 slot launch_slot_: id  0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
 slot launch_slot_: id  0 | task 24 | processing task
-slot update_slots: id  0 | task 24 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 14
+slot update_slots: id  0 | task 24 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 34
-slot update_slots: id  0 | task 24 | kv cache rm [0, end)
+slot update_slots: id  0 | task 24 | n_tokens = 0, memory_seq_rm [0, end)
-slot update_slots: id  0 | task 24 | prompt processing progress, n_past = 14, n_tokens = 14, progress = 1.000000
+slot update_slots: id  0 | task 24 | prompt processing progress, n_tokens = 34, batch.n_tokens = 34, progress = 1.000000
-slot update_slots: id  0 | task 24 | prompt done, n_past = 14, n_tokens = 14
+slot update_slots: id  0 | task 24 | prompt done, n_tokens = 34, batch.n_tokens = 34
-slot      release: id  0 | task 24 | stop processing: n_past = 14, truncated = 0
+slot      release: id  0 | task 24 | stop processing: n_tokens = 34, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
-slot launch_slot_: id  0 | task 26 | processing task
+slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = 284111945431
-slot update_slots: id  0 | task 26 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 17
+slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
-slot update_slots: id  0 | task 26 | kv cache rm [1, end)
+slot launch_slot_: id  3 | task 26 | processing task
-slot update_slots: id  0 | task 26 | prompt processing progress, n_past = 17, n_tokens = 16, progress = 0.941176
+slot update_slots: id  3 | task 26 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 37
-slot update_slots: id  0 | task 26 | prompt done, n_past = 17, n_tokens = 16
+slot update_slots: id  3 | task 26 | n_tokens = 0, memory_seq_rm [0, end)
-slot      release: id  0 | task 26 | stop processing: n_past = 17, truncated = 0
+slot update_slots: id  3 | task 26 | prompt processing progress, n_tokens = 37, batch.n_tokens = 37, progress = 1.000000
-srv  update_slots: all slots are idle
+slot update_slots: id  3 | task 26 | prompt done, n_tokens = 37, batch.n_tokens = 37
-srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
+slot      release: id  3 | task 26 | stop processing: n_tokens = 37, truncated = 0
 slot launch_slot_: id  0 | task 28 | processing task
 slot update_slots: id  0 | task 28 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 15
 slot update_slots: id  0 | task 28 | kv cache rm [0, end)
 slot update_slots: id  0 | task 28 | prompt processing progress, n_past = 15, n_tokens = 15, progress = 1.000000
 slot update_slots: id  0 | task 28 | prompt done, n_past = 15, n_tokens = 15
 slot      release: id  0 | task 28 | stop processing: n_past = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
 slot launch_slot_: id  0 | task 30 | processing task
 slot update_slots: id  0 | task 30 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 18
 slot update_slots: id  0 | task 30 | kv cache rm [6, end)
 slot update_slots: id  0 | task 30 | prompt processing progress, n_past = 18, n_tokens = 12, progress = 0.666667
 slot update_slots: id  0 | task 30 | prompt done, n_past = 18, n_tokens = 12
 slot      release: id  0 | task 30 | stop processing: n_past = 18, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
 slot launch_slot_: id  0 | task 32 | processing task
 slot update_slots: id  0 | task 32 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 15
 slot update_slots: id  0 | task 32 | kv cache rm [0, end)
 slot update_slots: id  0 | task 32 | prompt processing progress, n_past = 15, n_tokens = 15, progress = 1.000000
 slot update_slots: id  0 | task 32 | prompt done, n_past = 15, n_tokens = 15
 slot      release: id  0 | task 32 | stop processing: n_past = 15, truncated = 0
 srv  update_slots: all slots are idle
 srv  log_server_r: request: POST /v1/embeddings 127.0.0.1 200
--- a/logs/fastapi.log
+++ b/logs/fastapi.log
--- a/logs/inference_model.log
+++ b/logs/inference_model.log
--- a/logs/services.pid
+++ b/logs/services.pid
@@ -1,3 +1,3 @@
-19618
+4171746
-19619
+4171747
-19713
+4171892
--- a/start_all.sh
+++ b/start_all.sh
@@ -19,8 +19,8 @@ NC='\033[0m' # No Color
 # Default configuration (can be overridden via environment variables)
 PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 LLAMA_SERVER_DIR="${LLAMA_SERVER_DIR:-~/llama.cpp/build/bin}"
-INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/gguf/Qwen/Qwen3-4B/Qwen3-4B-Q5_K_M.gguf}"
+INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/Qwen2-VL-2B-Instruct-GGUF/Qwen2-VL-2B-Instruct-Q8_0.gguf}"
-EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/gguf/Qwen/Qwen3-Embedding-4B/Qwen3-Embedding-4B-Q4_K_M.gguf}"
+EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/Qwen3-Embedding-4B-Q5_K_M.gguf}"
 VENV_PATH="${VENV_PATH:-${PROJECT_ROOT}/backend_service/venv}"
 LOG_DIR="${PROJECT_ROOT}/logs"
 PID_FILE="${LOG_DIR}/services.pid"
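
The `${VAR:-default}` assignments above take the value from the environment when the variable is set and non-empty, and fall back to the default otherwise. A `~` inside a double-quoted default may not be expanded, depending on the shell, so the sketch below normalizes it explicitly. It is only an illustration: `expand_tilde` is a hypothetical helper and does not exist in `start_all.sh`.

```shell
# Hypothetical helper (not part of start_all.sh): turn a leading "~" or "~/"
# into $HOME; any other path is returned unchanged.
expand_tilde() {
    local path="$1"
    case "$path" in
        "~")   printf '%s\n' "$HOME" ;;
        "~/"*) printf '%s/%s\n' "$HOME" "${path:2}" ;;  # drop the leading "~/"
        *)     printf '%s\n' "$path" ;;
    esac
}

# Same override pattern as in the script: environment value if set, else the default.
EMBEDDING_MODEL="$(expand_tilde "${EMBEDDING_MODEL:-~/models/Qwen3-Embedding-4B-Q5_K_M.gguf}")"
echo "EMBEDDING_MODEL=${EMBEDDING_MODEL}"
```

With this pattern, an invocation such as `EMBEDDING_MODEL=/data/other.gguf ./start_all.sh` would replace the default for that run only.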
--- a/start_all_src.sh
+++ b/start_all_src.sh
@@ -0,0 +1,406 @@
 #!/bin/bash
 # ==============================================================================
 # Drone natural-language control project - one-click startup script
 # ==============================================================================
 # Purpose: start all required services (llama-server inference model, embedding model, FastAPI backend)
 # Usage: ./start_all.sh [options]
 # ==============================================================================
 set -e  # Exit immediately on error
 # Color definitions
 RED='\033[0;31m'
 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
 BLUE='\033[0;34m'
 NC='\033[0m' # No Color
 # Default configuration (can be overridden via environment variables)
 PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 LLAMA_SERVER_DIR="${LLAMA_SERVER_DIR:-~/llama.cpp/build/bin}"
 INFERENCE_MODEL="${INFERENCE_MODEL:-~/models/Qwen3-4B-Q5_K_M.gguf}"
 EMBEDDING_MODEL="${EMBEDDING_MODEL:-~/models/Qwen3-Embedding-4B-Q5_K_M.gguf}"
 VENV_PATH="${VENV_PATH:-${PROJECT_ROOT}/backend_service/venv}"
 LOG_DIR="${PROJECT_ROOT}/logs"
 PID_FILE="${LOG_DIR}/services.pid"
 # 端口配置
 INFERENCE_PORT=8081
 EMBEDDING_PORT=8090
 API_PORT=8000
 # 创建日志目录
 mkdir -p "${LOG_DIR}"
 # ==============================================================================
 # 辅助函数
 # ==============================================================================
 print_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
 }
 print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
 }
 print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
 }
 print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
 }
 # Check whether a command exists
 check_command() {
    if ! command -v "$1" &> /dev/null; then
        print_error "Command not found: $1. Please install it first."
        return 1
    fi
    return 0
 }
 # Check whether a port is in use
 check_port() {
    local port=$1
    if lsof -Pi :"${port}" -sTCP:LISTEN -t >/dev/null 2>&1 ; then
        return 0  # port is in use
    else
        return 1  # port is free
    fi
 }
 # Wait until a service responds
 wait_for_service() {
    local url=$1
    local service_name=$2
    local max_attempts=30
    local attempt=0
    print_info "Waiting for ${service_name} to start..."
    while [ $attempt -lt $max_attempts ]; do
        # -f makes curl fail on HTTP error statuses, so an error response is not treated as ready
        if curl -sf "${url}" > /dev/null 2>&1; then
            print_success "${service_name} is ready"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
    print_error "${service_name} timed out while starting"
    return 1
 }
 # Stop all services
 stop_services() {
    print_info "Stopping all services..."
    if [ -f "${PID_FILE}" ]; then
        while read -r pid; do
            if ps -p "$pid" > /dev/null 2>&1; then
                print_info "Stopping process PID: $pid"
                kill "$pid" 2>/dev/null || true
            fi
        done < "${PID_FILE}"
        rm -f "${PID_FILE}"
    fi
    # Also try to stop anything still listening on the service ports
    for port in ${INFERENCE_PORT} ${EMBEDDING_PORT} ${API_PORT}; do
        if check_port ${port}; then
            local pid=$(lsof -ti:${port})
            if [ ! -z "$pid" ]; then
                print_info "Stopping process on port ${port} (PID: $pid)"
                # Left unquoted on purpose: lsof may return several PIDs
                kill $pid 2>/dev/null || true
            fi
        fi
    done
    print_success "All services stopped"
 }
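 # Cleanup function (called when the script exits)
 cleanup() {
    if [ "$?" -ne 0 ]; then
        print_error "An error occurred during startup; cleaning up..."
    fi
    # Note: services are not stopped automatically here; the user controls that manually
 }
 trap cleanup EXIT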
 # ==============================================================================
 # 主函数
 # ==============================================================================
 start_services() {
    print_info "=========================================="
    print_info "  无人机自然语言控制项目 - 服务启动"
    print_info "=========================================="
    echo ""
    # 检查必要的命令
    print_info "检查必要的命令..."
    check_command "python3" || exit 1
    check_command "curl" || exit 1
    check_command "lsof" || print_warning "lsof 未安装，将无法检查端口占用"
    echo ""
    # 检查端口占用
    print_info "检查端口占用..."
    if check_port ${INFERENCE_PORT}; then
        print_warning "端口 ${INFERENCE_PORT} 已被占用，推理模型可能已在运行"
    fi
    if check_port ${EMBEDDING_PORT}; then
        print_warning "端口 ${EMBEDDING_PORT} 已被占用，Embedding模型可能已在运行"
    fi
    if check_port ${API_PORT}; then
        print_error "端口 ${API_PORT} 已被占用，请先停止占用该端口的服务"
        exit 1
    fi
    echo ""
    # Locate llama-server (expand a leading ~ in the path without eval)
    local llama_server_dir_expanded="${LLAMA_SERVER_DIR/#\~/$HOME}"
    local llama_server="${llama_server_dir_expanded}/llama-server"
    if [ ! -f "${llama_server}" ]; then
        print_error "llama-server 未找到: ${llama_server}"
        print_info "请设置 LLAMA_SERVER_DIR 环境变量指向正确的路径"
        print_info "当前路径: ${LLAMA_SERVER_DIR}"
        print_info "展开后路径: ${llama_server_dir_expanded}"
        exit 1
    fi
    print_success "找到 llama-server: ${llama_server}"
    echo ""
    # Check the model files (expand a leading ~ without eval)
    local inference_model_expanded="${INFERENCE_MODEL/#\~/$HOME}"
    local embedding_model_expanded="${EMBEDDING_MODEL/#\~/$HOME}"
    if [ ! -f "${inference_model_expanded}" ]; then
        print_error "推理模型文件未找到: ${inference_model_expanded}"
        print_info "请设置 INFERENCE_MODEL 环境变量指向正确的模型路径"
        exit 1
    fi
    print_success "找到推理模型: ${inference_model_expanded}"
    if [ ! -f "${embedding_model_expanded}" ]; then
        print_error "Embedding模型文件未找到: ${embedding_model_expanded}"
        print_info "请设置 EMBEDDING_MODEL 环境变量指向正确的模型路径"
        exit 1
    fi
    print_success "找到Embedding模型: ${embedding_model_expanded}"
    echo ""
    # 检查ROS2环境
    local ros2_setup="${PROJECT_ROOT}/install/setup.bash"
    if [ ! -f "${ros2_setup}" ]; then
        print_warning "ROS2 setup文件未找到: ${ros2_setup}"
        print_warning "如果项目已与ROS2解耦，可以忽略此警告"
    else
        print_success "找到ROS2 setup文件: ${ros2_setup}"
    fi
    echo ""
    # Check the venv virtual environment (expand a leading ~ without eval)
    local venv_path_expanded="${VENV_PATH/#\~/$HOME}"
    print_info "检查venv虚拟环境: ${venv_path_expanded}"
    if [ ! -d "${venv_path_expanded}" ]; then
        print_error "venv虚拟环境目录不存在: ${venv_path_expanded}"
        print_info "请先创建venv环境: python3 -m venv ${venv_path_expanded}"
        print_info "然后安装依赖: ${venv_path_expanded}/bin/pip install -r backend_service/requirements.txt"
        exit 1
    fi
    if [ ! -f "${venv_path_expanded}/bin/activate" ]; then
        print_error "venv激活脚本不存在: ${venv_path_expanded}/bin/activate"
        print_error "这看起来不是一个有效的venv环境"
        exit 1
    fi
    print_success "venv虚拟环境存在: ${venv_path_expanded}"
    echo ""
    # 初始化PID文件
    > "${PID_FILE}"
    # ==========================================================================
    # 启动推理模型服务
    # ==========================================================================
    print_info "启动推理模型服务 (端口 ${INFERENCE_PORT})..."
    cd "${llama_server_dir_expanded}"
    nohup ./llama-server \
        -m "${inference_model_expanded}" \
        --port ${INFERENCE_PORT} \
        --gpu-layers 36 \
        --host 0.0.0.0 \
        -c 8192 \
        > "${LOG_DIR}/inference_model.log" 2>&1 &
    local inference_pid=$!
    echo $inference_pid >> "${PID_FILE}"
    print_success "推理模型服务已启动 (PID: $inference_pid)"
    print_info "日志文件: ${LOG_DIR}/inference_model.log"
    echo ""
    # ==========================================================================
    # 启动Embedding模型服务
    # ==========================================================================
    print_info "启动Embedding模型服务 (端口 ${EMBEDDING_PORT})..."
    nohup ./llama-server \
        -m "${embedding_model_expanded}" \
        --gpu-layers 36 \
        --port ${EMBEDDING_PORT} \
        --embeddings \
        --pooling last \
        --host 0.0.0.0 \
        > "${LOG_DIR}/embedding_model.log" 2>&1 &
    local embedding_pid=$!
    echo $embedding_pid >> "${PID_FILE}"
    print_success "Embedding模型服务已启动 (PID: $embedding_pid)"
    print_info "日志文件: ${LOG_DIR}/embedding_model.log"
    echo ""
    # ==========================================================================
    # 等待模型服务就绪
    # ==========================================================================
    print_info "等待模型服务就绪..."
    sleep 3  # 给服务一些启动时间
    # 等待推理模型服务
    if ! wait_for_service "http://localhost:${INFERENCE_PORT}/health" "推理模型服务"; then
        # If /health is unavailable, fall back to /v1/models
        if ! wait_for_service "http://localhost:${INFERENCE_PORT}/v1/models" "推理模型服务"; then
            print_warning "推理模型服务可能未完全就绪，但将继续启动"
        fi
    fi
    # 等待Embedding模型服务
    if ! wait_for_service "http://localhost:${EMBEDDING_PORT}/health" "Embedding模型服务"; then
        if ! wait_for_service "http://localhost:${EMBEDDING_PORT}/v1/models" "Embedding模型服务"; then
            print_warning "Embedding模型服务可能未完全就绪，但将继续启动"
        fi
    fi
    echo ""
    # ==========================================================================
    # 启动FastAPI后端服务
    # ==========================================================================
    print_info "启动FastAPI后端服务 (端口 ${API_PORT})..."
    cd "${PROJECT_ROOT}"
    # 激活venv虚拟环境并启动FastAPI服务
    # 使用bash -c来在新的shell中激活venv环境
    bash -c "
        # 激活ROS2环境（如果存在）
        if [ -f '${ros2_setup}' ]; then
            source '${ros2_setup}'
        fi
        # 激活venv虚拟环境
        source '${venv_path_expanded}/bin/activate' && \
        cd '${PROJECT_ROOT}/backend_service' && \
        uvicorn src.main:app --host 0.0.0.0 --port ${API_PORT}
    " > "${LOG_DIR}/fastapi.log" 2>&1 &
    local api_pid=$!
    echo $api_pid >> "${PID_FILE}"
    print_success "FastAPI服务已启动 (PID: $api_pid)"
    print_info "日志文件: ${LOG_DIR}/fastapi.log"
    echo ""
    # 等待FastAPI服务就绪
    sleep 3
    if wait_for_service "http://localhost:${API_PORT}/docs" "FastAPI服务"; then
        print_success "所有服务已成功启动！"
    else
        print_warning "FastAPI服务可能未完全就绪，请检查日志: ${LOG_DIR}/fastapi.log"
    fi
    echo ""
    # 显示服务访问信息
    print_info "=========================================="
    print_info "  服务启动完成！"
    print_info "=========================================="
    print_info "推理模型API: http://localhost:${INFERENCE_PORT}/v1"
    print_info "Embedding模型API: http://localhost:${EMBEDDING_PORT}/v1"
    print_info "FastAPI后端: http://localhost:${API_PORT}"
    print_info "API文档: http://localhost:${API_PORT}/docs"
    print_info ""
    print_info "日志文件位置:"
    print_info "  - 推理模型: ${LOG_DIR}/inference_model.log"
    print_info "  - Embedding模型: ${LOG_DIR}/embedding_model.log"
    print_info "  - FastAPI服务: ${LOG_DIR}/fastapi.log"
    print_info ""
    print_info "按 Ctrl+C 停止所有服务"
    print_info "=========================================="
    echo ""
    # 设置信号处理，确保Ctrl+C时能清理
    trap 'print_info "\n正在停止服务..."; stop_services; exit 0' INT TERM
    # 等待所有后台进程（保持脚本运行）
    print_info "所有服务正在运行中，查看日志请使用:"
    print_info "  tail -f ${LOG_DIR}/*.log"
    echo ""
    # 等待所有后台进程
    wait
 }
 # ==============================================================================
 # 脚本入口
 # ==============================================================================
 case "${1:-start}" in
    start)
        start_services
        ;;
    stop)
        stop_services
        ;;
    restart)
        stop_services
        sleep 2
        start_services
        ;;
    status)
        print_info "检查服务状态..."
        if [ -f "${PID_FILE}" ]; then
            print_info "已记录的服务进程:"
            while read pid; do
                if ps -p $pid > /dev/null 2>&1; then
                    print_success "PID $pid: 运行中"
                else
                    print_warning "PID $pid: 已停止"
                fi
            done < "${PID_FILE}"
        else
            print_info "未找到PID文件，服务可能未启动"
        fi
        echo ""
        print_info "端口占用情况:"
        for port in ${INFERENCE_PORT} ${EMBEDDING_PORT} ${API_PORT}; do
            if check_port ${port}; then
                # No `local` here: this runs outside a function, where `local` fails and `set -e` aborts the script
                pid=$(lsof -ti:${port})
                print_success "端口 ${port}: 被占用 (PID: $pid)"
            else
                print_warning "端口 ${port}: 空闲"
            fi
        done
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        echo ""
        echo "Commands:"
        echo "  start   - start all services (default)"
        echo "  stop    - stop all services"
        echo "  restart - restart all services"
        echo "  status  - show service status"
        echo ""
        echo "Environment variables:"
        echo "  LLAMA_SERVER_DIR  - directory containing llama-server (default: ~/llama.cpp/build/bin)"
        echo "  INFERENCE_MODEL   - path to the inference model (default: ~/models/Qwen3-4B-Q5_K_M.gguf)"
        echo "  EMBEDDING_MODEL   - path to the embedding model (default: ~/models/Qwen3-Embedding-4B-Q5_K_M.gguf)"
        echo "  VENV_PATH         - path to the venv virtual environment (default: \${PROJECT_ROOT}/backend_service/venv)"
        exit 1
        ;;
 esac
--- a/tools/api_test_qwen2.5_vl_3b.log
+++ b/tools/api_test_qwen2.5_vl_3b.log
--- a/tools/api_test_qwen2_vl_2b.log
+++ b/tools/api_test_qwen2_vl_2b.log
--- a/tools/api_test_qwen3_4b.log
+++ b/tools/api_test_qwen3_4b.log
--- a/tools/api_test_qwen3_vl_2b.log
+++ b/tools/api_test_qwen3_vl_2b.log
--- a/tools/api_test_qwen3_vl_4b.log
+++ b/tools/api_test_qwen3_vl_4b.log
--- a/tools/test_api.py
+++ b/tools/test_api.py
@@ -14,7 +14,8 @@ BASE_URL = "http://127.0.0.1:8000"
 ENDPOINT = "/generate_plan"
 # The user prompt we will send for the test
-TEST_PROMPT = "起飞"
+#TEST_PROMPT = "无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作"
 TEST_PROMPT = "已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒"
 # Log file path (will be created in the same directory as this script)
 LOG_FILE = os.path.join(os.path.dirname(__file__), "api_test.log")
--- a/tools/test_validate/api_test_log_qwen2.5_vl_3b.txt
+++ b/tools/test_validate/api_test_log_qwen2.5_vl_3b.txt
--- a/tools/test_validate/api_test_log_qwen2_vl_2b.txt
+++ b/tools/test_validate/api_test_log_qwen2_vl_2b.txt
--- a/tools/test_validate/api_test_log_qwen3_vl_2b.txt
+++ b/tools/test_validate/api_test_log_qwen3_vl_2b.txt
--- a/tools/test_validate/api_test_log_qwen3_vl_4b.txt
+++ b/tools/test_validate/api_test_log_qwen3_vl_4b.txt
--- a/tools/test_validate/batch_visualize.py
+++ b/tools/test_validate/batch_visualize.py
@@ -0,0 +1,269 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
 Extract JSON responses from the API test log and visualize them in batch
 """
 import json
 import os
 import re
 import logging
 import platform
 import random
 import html
 from typing import Dict, List, Tuple
 from collections import defaultdict
 # 配置日志
 logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
 )
 def sanitize_filename(text: str) -> str:
    """Convert text into a safe file name."""
    # Replace characters that are unsafe in file names
    text = re.sub(r'[<>:"/\\|?*]', '_', text)
    # Limit the length
    if len(text) > 100:
        text = text[:100]
    return text
 def _pick_zh_font():
    """Pick a font that can render Chinese text on the current platform."""
    system = platform.system()
    if system == "Windows":
        return "Microsoft YaHei"
    elif system == "Darwin":
        return "PingFang SC"
    else:
        return "Noto Sans CJK SC"
 def _add_nodes_and_edges(node: dict, dot, parent_id: str | None = None) -> str:
    """Recursive helper that adds nodes and edges to `dot`; returns the ID of the node it created."""
    # graphviz is imported and checked once in _visualize_pytree, which builds `dot`
    current_id = f"{id(node)}_{random.randint(1000, 9999)}"
    # 准备节点标签（HTML-like，正确换行与转义）
    name = html.escape(str(node.get('name', '')))
    ntype = html.escape(str(node.get('type', '')))
    label_parts = [f"<B>{name}</B> <FONT POINT-SIZE='10'><I>({ntype})</I></FONT>"]
    # 格式化参数显示
    params = node.get('params') or {}
    if params:
        params_lines = []
        for key, value in params.items():
            k = html.escape(str(key))
            if isinstance(value, float):
                value_str = f"{value:.2f}".rstrip('0').rstrip('.')
            else:
                value_str = str(value)
            v = html.escape(value_str)
            params_lines.append(f"{k}: {v}")
        params_text = "<BR ALIGN='LEFT'/>".join(params_lines)
        label_parts.append(f"<FONT POINT-SIZE='9' COLOR='#555555'>{params_text}</FONT>")
    node_label = f"<{'<BR/>'.join(label_parts)}>"
    # 根据类型设置节点样式和颜色
    node_type = (node.get('type') or '').lower()
    shape = 'ellipse'
    style = 'filled'
    fillcolor = '#e6e6e6'   # 默认灰色填充
    border_color = '#666666' # 默认描边色
    if node_type == 'action':
        shape = 'box'
        style = 'rounded,filled'
        fillcolor = "#cde4ff"  # 浅蓝
    elif node_type == 'condition':
        shape = 'diamond'
        style = 'filled'
        fillcolor = "#fff2cc"  # 浅黄
    elif node_type == 'sequence':
        shape = 'ellipse'
        style = 'filled'
        fillcolor = '#d5e8d4'  # 绿色
    elif node_type == 'selector':
        shape = 'ellipse'
        style = 'filled'
        fillcolor = '#ffe6cc'  # 橙色
    elif node_type == 'parallel':
        shape = 'ellipse'
        style = 'filled'
        fillcolor = '#e1d5e7'  # 紫色
    # 特别标记安全相关节点
    if node.get('name') in ['battery_above', 'gps_status', 'SafetyMonitor']:
        border_color = '#ff0000'  # 红色边框突出显示安全节点
        style = 'filled,bold'  # 加粗
    dot.node(current_id, label=node_label, shape=shape, style=style, fillcolor=fillcolor, color=border_color)
    # 连接父节点
    if parent_id:
        dot.edge(parent_id, current_id)
    # 递归处理子节点
    children = node.get("children", [])
    if not children:
        return current_id
    # 记录所有子节点的ID
    child_ids = []
    # 正确的递归连接：每个子节点都连接到当前节点
    for child in children:
        child_id = _add_nodes_and_edges(child, dot, current_id)
        child_ids.append(child_id)
    # 子节点同级排列（横向排布，更直观地表现同层）
    if len(child_ids) > 1:
        with dot.subgraph(name=f"rank_{current_id}") as s:
            s.attr(rank='same')
            for cid in child_ids:
                s.node(cid)
    return current_id
 def _visualize_pytree(node: Dict, file_path: str):
    """
    Render the Pytree dict with Graphviz and save the image to the given path.
    """
    try:
        from graphviz import Digraph
    except ImportError:
        logging.critical("错误：未安装graphviz库。请运行: pip install graphviz")
        return
    fontname = _pick_zh_font()
    dot = Digraph('Pytree', comment='Drone Mission Plan')
    dot.attr(rankdir='TB', label='Drone Mission Plan', fontsize='20', fontname=fontname)
    dot.attr('node', shape='box', style='rounded,filled', fontname=fontname)
    dot.attr('edge', fontname=fontname)
    _add_nodes_and_edges(node, dot)
    try:
        # 确保输出目录存在，并避免生成 .png.png
        base_path, ext = os.path.splitext(file_path)
        render_path = base_path if ext.lower() == '.png' else file_path
        out_dir = os.path.dirname(render_path)
        if out_dir and not os.path.exists(out_dir):
            os.makedirs(out_dir, exist_ok=True)
        # 保存为 .png 文件，并自动删除源码 .gv 文件
        output_path = dot.render(render_path, format='png', cleanup=True, view=False)
        logging.info(f"✅ 可视化成功: {output_path}")
    except Exception as e:
        logging.error(f"❌ 生成可视化图形失败: {e}")
 def parse_log_file(log_file_path: str) -> Dict[str, List[Dict]]:
    """
    Parse the log file and extract each original instruction with its full API response JSON.
    Returns: {original instruction: [list of JSON responses]}
    """
    with open(log_file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Split entries on separator lines of 80 or more '=' characters
    entries = re.split(r'={80,}', content)
    results = defaultdict(list)
    for entry in entries:
        if not entry.strip():
            continue
        # Extract the original instruction (the label in the log is written in Chinese)
        instruction_match = re.search(r'原始指令:\s*(.+)', entry)
        if not instruction_match:
            continue
        original_instruction = instruction_match.group(1).strip()
        # Extract the full API response JSON (greedy match up to the last '}' in the entry)
        json_match = re.search(r'完整API响应:\s*\n(\{.*\})', entry, re.DOTALL)
        if not json_match:
            logging.warning(f"No JSON response found for instruction '{original_instruction}'")
            continue
        json_str = json_match.group(1).strip()
        try:
            json_obj = json.loads(json_str)
            results[original_instruction].append(json_obj)
            logging.info(f"Extracted JSON response for instruction '{original_instruction}'")
        except json.JSONDecodeError as e:
            logging.error(f"Failed to parse JSON for instruction '{original_instruction}': {e}")
            continue
    return results
 def process_and_visualize(log_file_path: str, output_dir: str):
    """
    Process the log file and visualize every extracted response in batch.
    """
    # 创建输出目录
    os.makedirs(output_dir, exist_ok=True)
    # 解析日志文件
    logging.info(f"开始解析日志文件: {log_file_path}")
    instruction_responses = parse_log_file(log_file_path)
    logging.info(f"共找到 {len(instruction_responses)} 个不同的原始指令")
    # 处理每个指令的所有响应
    for instruction, responses in instruction_responses.items():
        logging.info(f"\n处理指令: {instruction} (共 {len(responses)} 个响应)")
        # 创建指令目录（使用安全的文件名）
        safe_instruction_name = sanitize_filename(instruction)
        instruction_dir = os.path.join(output_dir, safe_instruction_name)
        os.makedirs(instruction_dir, exist_ok=True)
        # 处理每个响应
        for idx, response in enumerate(responses, 1):
            try:
                # 提取root节点
                root_node = response.get('root')
                if not root_node:
                    logging.warning(f"响应 #{idx} 没有root节点，跳过")
                    continue
                # 生成文件名
                json_filename = f"response_{idx}.json"
                png_filename = f"response_{idx}.png"
                json_path = os.path.join(instruction_dir, json_filename)
                png_path = os.path.join(instruction_dir, png_filename)
                # 保存JSON文件
                with open(json_path, 'w', encoding='utf-8') as f:
                    json.dump(response, f, ensure_ascii=False, indent=2)
                logging.info(f"  保存JSON: {json_filename}")
                # 生成可视化
                _visualize_pytree(root_node, png_path)
                logging.info(f"  生成可视化: {png_filename}")
            except Exception as e:
                logging.error(f"处理响应 #{idx} 时出错: {e}")
                continue
    logging.info(f"\n✅ 所有处理完成！结果保存在: {output_dir}")
 if __name__ == "__main__":
    # Resolve paths relative to this script rather than a machine-specific absolute path
    script_dir = os.path.dirname(os.path.abspath(__file__))
    log_file = os.path.join(script_dir, "api_test_log.txt")
    output_directory = os.path.join(script_dir, "validation")
    process_and_visualize(log_file, output_directory)
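Review note: `parse_log_file` implies a particular log layout. Entries are separated by a run of at least 80 `=` characters, and each entry carries an `原始指令:` line plus a `完整API响应:` line followed by a JSON object. The sketch below exercises the same two regular expressions on an in-memory string. The sample entry and the name `parse_log_text` are hypothetical and are not part of this commit.

```python
import json
import re
from collections import defaultdict

# Hypothetical sample entry, built only to match the regexes used by parse_log_file.
SAMPLE_LOG = (
    "=" * 80 + "\n"
    "原始指令: 起飞\n"
    "完整API响应:\n"
    '{"plan_id": "demo", "root": {"name": "takeoff", "type": "action"}}\n'
    + "=" * 80 + "\n"
)


def parse_log_text(content: str) -> dict:
    """Group parsed JSON responses by their original instruction."""
    results = defaultdict(list)
    for entry in re.split(r'={80,}', content):
        instruction = re.search(r'原始指令:\s*(.+)', entry)
        payload = re.search(r'完整API响应:\s*\n(\{.*\})', entry, re.DOTALL)
        if not instruction or not payload:
            continue  # separator fragments and incomplete entries are skipped
        try:
            parsed_json = json.loads(payload.group(1))
        except json.JSONDecodeError:
            continue  # malformed JSON is ignored in this sketch
        results[instruction.group(1).strip()].append(parsed_json)
    return dict(results)


parsed = parse_log_text(SAMPLE_LOG)
```

Because the JSON pattern is greedy, any text containing `}` that follows the response inside the same entry would be swallowed into the match and fail to parse; the log writer presumably keeps the JSON last in each entry.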
--- a/tools/test_validate/instructions.txt
+++ b/tools/test_validate/instructions.txt
@@ -10,5 +10,6 @@
 飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆
 飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆
 起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资
-
+无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作
 已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒
--- a/tools/test_validate/test_results.csv
+++ b/tools/test_validate/test_results.csv
@@ -1,13 +1,15 @@
 instruction_index,instruction,run_number,success,attempts,response_time,plan_id,error,timestamp
-1,起飞,1,True,1,2.4630444049835205,42903026-b02b-4089-859d-aec5cfa2435e,,2025-12-03 17:09:32
+1,起飞,1,True,1,0.3463160991668701,0ffa333d-574d-453d-99cd-f8852411b7be,,2025-12-08 16:08:41
-2,起飞后移动到学生宿舍上方降落,1,True,1,10.017558574676514,86238ad2-e275-4d50-905c-175bd2f26fd0,,2025-12-03 17:09:43
+2,起飞后移动到学生宿舍上方降落,1,True,1,0.1823880672454834,46c5741f-1e51-4cbe-bd5a-1e099d0d53f5,,2025-12-08 16:08:42
-3,起飞后移动到学生宿舍上方查找蓝色的车,1,True,1,12.420023202896118,d8345bc3-b70f-41d7-b9fc-3e4898d7409e,,2025-12-03 17:09:56
+3,起飞后移动到学生宿舍上方查找蓝色的车,1,True,1,0.24654889106750488,636bfbb8-c3be-42b6-93cf-caf87ac6424c,,2025-12-08 16:08:43
-4,起飞后移动到学生宿舍上方寻找蓝色的车,1,True,1,12.864884614944458,29b5ee20-c809-4511-af08-80a85240c729,,2025-12-03 17:10:10
+4,起飞后移动到学生宿舍上方寻找蓝色的车,1,True,1,0.23946380615234375,744d6e87-5067-4f91-9f73-7f65432e1b83,,2025-12-08 16:08:45
-5,起飞后移动到学生宿舍上方检测蓝色的车,1,True,1,10.438142538070679,5e7eb8c7-287a-469a-b6c0-a4102c1b0dac,,2025-12-03 17:10:21
+5,起飞后移动到学生宿舍上方检测蓝色的车,1,True,1,3.5440704822540283,1bf6820d-0c04-4961-b624-49e9d919ac56,,2025-12-08 16:08:49
-6,飞到学生宿舍上方查找蓝色的车,1,True,1,11.751057386398315,ef3d1981-1d51-433d-b2f4-2e92838075fd,,2025-12-03 17:10:34
+6,飞到学生宿舍上方查找蓝色的车,1,False,1,3.451496124267578,,,2025-12-08 16:08:54
-7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,True,1,32.890604972839355,d8fc4658-08af-4910-89c4-b029c9a5daa0,,2025-12-03 17:11:08
+7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,False,1,3.321821689605713,,,2025-12-08 16:08:58
-8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,False,1,33.2862343788147,,,2025-12-03 17:11:42
+8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,False,1,18.552793502807617,,,2025-12-08 16:09:17
-9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,True,1,12.312166213989258,7fbf0091-f7d3-4c3a-a6b7-4c0bfd4df66e,,2025-12-03 17:11:56
+9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,True,1,1.5930235385894775,1ae38dbd-4e25-4a51-ac4f-8ed851fe8b1f,,2025-12-08 16:09:20
-10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,True,1,12.204660892486572,3ae0b258-b7e4-460c-9cfe-4b224266edc4,,2025-12-03 17:12:09
+10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,False,1,17.402809381484985,,,2025-12-08 16:09:38
-11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,True,1,12.808414936065674,2acb84cf-c89e-460d-a4d9-8d1edb4ee69a,,2025-12-03 17:12:23
+11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,True,1,1.269315481185913,24b02e2d-291d-4213-9e0f-acdd1165a1f1,,2025-12-08 16:09:41
-12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,True,1,11.071707487106323,c05d46c9-1b1b-4c8d-b64b-86b76d0c4099,,2025-12-03 17:12:35
+12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,True,1,3.885636329650879,685b1d6d-8a82-463c-ab68-051348403c89,,2025-12-08 16:09:46
 13,无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作,1,False,1,16.88854742050171,,,2025-12-08 16:10:04
 14,已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒,1,False,1,3.594463586807251,,,2025-12-08 16:10:08
--- a/tools/test_validate/test_summary.csv
+++ b/tools/test_validate/test_summary.csv
@@ -1,13 +1,15 @@
 instruction_index,instruction,total_runs,successful_runs,success_rate,avg_response_time,min_response_time,max_response_time,total_response_time
-1,起飞,1,1,100.00%,2.46s,2.46s,2.46s,2.46s
+1,起飞,1,1,100.00%,0.35s,0.35s,0.35s,0.35s
-2,起飞后移动到学生宿舍上方降落,1,1,100.00%,10.02s,10.02s,10.02s,10.02s
+2,起飞后移动到学生宿舍上方降落,1,1,100.00%,0.18s,0.18s,0.18s,0.18s
-3,起飞后移动到学生宿舍上方查找蓝色的车,1,1,100.00%,12.42s,12.42s,12.42s,12.42s
+3,起飞后移动到学生宿舍上方查找蓝色的车,1,1,100.00%,0.25s,0.25s,0.25s,0.25s
-4,起飞后移动到学生宿舍上方寻找蓝色的车,1,1,100.00%,12.86s,12.86s,12.86s,12.86s
+4,起飞后移动到学生宿舍上方寻找蓝色的车,1,1,100.00%,0.24s,0.24s,0.24s,0.24s
-5,起飞后移动到学生宿舍上方检测蓝色的车,1,1,100.00%,10.44s,10.44s,10.44s,10.44s
+5,起飞后移动到学生宿舍上方检测蓝色的车,1,1,100.00%,3.54s,3.54s,3.54s,3.54s
-6,飞到学生宿舍上方查找蓝色的车,1,1,100.00%,11.75s,11.75s,11.75s,11.75s
+6,飞到学生宿舍上方查找蓝色的车,1,0,0.00%,N/A,N/A,N/A,0.00s
-7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,1,100.00%,32.89s,32.89s,32.89s,32.89s
+7,飞到学生宿舍上方查找蓝色车辆并进行打击,1,0,0.00%,N/A,N/A,N/A,0.00s
 8,起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击,1,0,0.00%,N/A,N/A,N/A,0.00s
-9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,1,100.00%,12.31s,12.31s,12.31s,12.31s
+9,起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资,1,1,100.00%,1.59s,1.59s,1.59s,1.59s
-10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,1,100.00%,12.20s,12.20s,12.20s,12.20s
+10,飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆,1,0,0.00%,N/A,N/A,N/A,0.00s
-11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,1,100.00%,12.81s,12.81s,12.81s,12.81s
+11,飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆,1,1,100.00%,1.27s,1.27s,1.27s,1.27s
-12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,1,100.00%,11.07s,11.07s,11.07s,11.07s
+12,起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资,1,1,100.00%,3.89s,3.89s,3.89s,3.89s
 13,无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作,1,0,0.00%,N/A,N/A,N/A,0.00s
 14,已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒,1,0,0.00%,N/A,N/A,N/A,0.00s
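Review note: the summary rows look like a per-instruction aggregation of test_results.csv in which the timing columns cover successful runs only, which would explain the `N/A` and `0.00s` values for failed instructions. The script that produces the summary is not shown in this diff, so the sketch below is an assumption about that logic. `summarize` is a hypothetical name, and the two input rows are copied from the results file above.

```python
import csv
import io

# Two rows copied from test_results.csv above (instructions 5 and 6).
RESULTS_CSV = """instruction_index,instruction,run_number,success,attempts,response_time,plan_id,error,timestamp
5,起飞后移动到学生宿舍上方检测蓝色的车,1,True,1,3.5440704822540283,1bf6820d-0c04-4961-b624-49e9d919ac56,,2025-12-08 16:08:49
6,飞到学生宿舍上方查找蓝色的车,1,False,1,3.451496124267578,,,2025-12-08 16:08:54
"""


def _fmt(seconds: float) -> str:
    """Format a duration the way the summary file does, e.g. 3.54s."""
    return f"{seconds:.2f}s"


def summarize(results_csv: str) -> list:
    """Aggregate per-instruction stats; timings use successful runs only (assumed)."""
    groups = {}
    for row in csv.DictReader(io.StringIO(results_csv)):
        groups.setdefault(row["instruction_index"], []).append(row)

    summary = []
    for index, rows in groups.items():
        times = [float(r["response_time"]) for r in rows if r["success"] == "True"]
        summary.append({
            "instruction_index": index,
            "total_runs": len(rows),
            "successful_runs": len(times),
            "success_rate": f"{len(times) / len(rows):.2%}",
            "avg_response_time": _fmt(sum(times) / len(times)) if times else "N/A",
            "min_response_time": _fmt(min(times)) if times else "N/A",
            "max_response_time": _fmt(max(times)) if times else "N/A",
            "total_response_time": _fmt(sum(times)),
        })
    return summary


summary = summarize(RESULTS_CSV)
```

With this reading, a failed run's response time (3.45s for instruction 6) is dropped from the summary entirely, so the summary alone cannot show how long failures took.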
--- a/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen2.5_vl_3b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen2_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/无人机起飞到一米高度后，先移动至搜索区，搜索并锁定任一穿黑色衣服的人，在对其进行跟踪30秒后进行打击操作/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍上方搜索被困人员，并为被困人员投递救援物资/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞到学生宿舍搜索有没有被困人员，然后抛洒救援物资/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方寻找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方搜索蓝色车辆，并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方检测蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/起飞后移动到学生宿舍上方降落/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索半径为10米区域范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方搜索方圆10米范围内的蓝色车辆/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色的车/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.json
--- a/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
+++ b/tools/test_validate/validation_qwen3_vl_2b/飞到学生宿舍上方查找蓝色车辆并进行打击/response_1.png
--- a/tools/test_validate/validation_qwen3_vl_4b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json
+++ b/tools/test_validate/validation_qwen3_vl_4b/已知目标检测红色气球危险性高于蓝色气球高于绿色气球，飞往搜索区搜索并锁定危险性最高的气球，对其跟踪30秒/response_1.json