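chore: add virtual environment to the repository
- Add the backend_service/venv virtual environment - Includes all Python dependency packages - Note: the virtual environment is about 393 MB and contains 12,655 files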
@@ -0,0 +1,276 @@
Metadata-Version: 2.4
Name: json_repair
Version: 0.53.0
Summary: A package to repair broken json strings
Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/mangiucugna/json_repair/
Project-URL: Bug Tracker, https://github.com/mangiucugna/json_repair/issues
Project-URL: Live demo, https://mangiucugna.github.io/json_repair/
Keywords: JSON,REPAIR,LLM,PARSER
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

This simple package repairs invalid JSON strings. To see every case it handles, check out the unit tests.

---
# Offer me a beer
If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna

---

# Demo
If you are unsure whether this library will fix your specific problem, or simply want your JSON validated online, you can visit the demo site on GitHub Pages: https://mangiucugna.github.io/json_repair/

Or listen to an [audio deep dive generated by Google's NotebookLM](https://notebooklm.google.com/notebook/05312bb3-f6f3-4e49-a99b-bd51db64520b/audio) for an introduction to the module.

---

# Motivation
Some LLMs are a bit iffy when it comes to returning well-formed JSON data: sometimes they skip a parenthesis and sometimes they add some words to it, because that's what an LLM does.
Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content.

I searched for a lightweight Python package that was able to reliably fix this problem but couldn't find any.

*So I wrote one*

# Supported use cases

### Fixing Syntax Errors in JSON

- Missing quotes, misplaced commas, unescaped characters, and incomplete key-value pairs.
- Missing quotation marks, improperly formatted values (true, false, null), and corrupted key-value structures.

### Repairing Malformed JSON Arrays and Objects

- Incomplete or broken arrays/objects, repaired by adding necessary elements (e.g., commas, brackets) or default values (null, "").
- JSON that includes extra non-JSON characters such as comments or improperly placed characters, which are cleaned up while maintaining a valid structure.

### Auto-Completion for Missing JSON Values

- Automatically completes missing values in JSON fields with reasonable defaults (like empty strings or null), ensuring validity.

# How to use

Install the library with pip

    pip install json-repair

then you can use it in your code like this

    from json_repair import repair_json

    good_json_string = repair_json(bad_json_string)
    # If the string was super broken this will return an empty string

You can use this library to completely replace `json.loads()`:

    import json_repair

    decoded_object = json_repair.loads(json_string)

or just

    import json_repair

    decoded_object = json_repair.repair_json(json_string, return_objects=True)

### Avoid this antipattern
Some users of this library adopt the following pattern:

    obj = {}
    try:
        obj = json.loads(string)
    except json.JSONDecodeError as e:
        obj = json_repair.loads(string)
    ...

This is wasteful because `json_repair` already checks whether the JSON is valid for you. If you still want to use this pattern, add `skip_json_loads=True` to the `json_repair` call, as explained in the Performance considerations section below.

### Read json from a file or file descriptor

JSON repair also provides a drop-in replacement for `json.load()`:

    import json_repair

    try:
        file_descriptor = open(fname, 'rb')
    except OSError:
        ...

    with file_descriptor:
        decoded_object = json_repair.load(file_descriptor)

and another method to read from a file:

    import json_repair

    try:
        decoded_object = json_repair.from_file(json_file)
    except OSError:
        ...
    except IOError:
        ...

Keep in mind that the library will not catch any IO-related exceptions; you will need to handle those yourself.
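
A minimal sketch of handling those IO errors around a load call. To keep it runnable without the package installed, it uses the standard library's `json.load` as a stand-in; since `json_repair.load` is described above as a drop-in replacement, it could be passed as `loader` instead. The helper name `read_json_or_none` is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def read_json_or_none(path: str, loader: Callable[[Any], Any] = json.load) -> Optional[Any]:
    """Return the decoded file content, or None if the file cannot be opened or read."""
    try:
        with open(path, "rb") as file_descriptor:
            return loader(file_descriptor)
    except OSError as error:
        # In Python 3, IOError is an alias of OSError, so this single handler
        # also covers FileNotFoundError and PermissionError.
        print(f"could not read {path}: {error}")
        return None


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "data.json"
    existing.write_text('{"ok": true}', encoding="utf-8")
    found = read_json_or_none(str(existing))
    missing = read_json_or_none(str(Path(tmp) / "missing.json"))

print(found, missing)
```

Returning `None` on failure is only one possible policy; re-raising or logging may suit your application better.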

### Non-Latin characters

When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass `ensure_ascii=False` to `repair_json()` in order to preserve the non-Latin characters in the output.

Here's an example using Chinese characters:

    repair_json("{'test_chinese_ascii':'统一码'}")

will return

    {"test_chinese_ascii": "\u7edf\u4e00\u7801"}

Passing `ensure_ascii=False` instead:

    repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)

will return

    {"test_chinese_ascii": "统一码"}

### JSON dumps parameters

More generally, `repair_json` accepts all the parameters that `json.dumps` accepts and simply passes them through (for example `indent`).
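
Because these parameters are forwarded to `json.dumps`, their effect can be previewed with the standard library alone. The sketch below calls `json.dumps` directly (not `repair_json`) on an already decoded object, just to show what `ensure_ascii` and `indent` change in the serialized output.

```python
import json

data = {"test_chinese_ascii": "统一码"}

# Default behaviour: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(data)

# ensure_ascii=False keeps the original characters in the output.
preserved = json.dumps(data, ensure_ascii=False)

# indent pretty-prints the result across several lines.
pretty = json.dumps(data, ensure_ascii=False, indent=2)

print(escaped)
print(preserved)
print(pretty)
```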

### Performance considerations
If you find this library too slow because it is using `json.loads()`, you can skip that by passing `skip_json_loads=True` to `repair_json`. Like:

    from json_repair import repair_json

    good_json_string = repair_json(bad_json_string, skip_json_loads=True)

I chose not to use any fast JSON library to avoid having any external dependency, so that anybody can use it regardless of their stack.

Some rules of thumb to use:
- Setting `return_objects=True` will always be faster because the parser already returns an object and doesn't have to serialize that object to JSON
- `skip_json_loads` is faster only if you know for certain that the string is not valid JSON
- If you are having issues with escaping, pass the string as a **raw** string, like: `r"string with escaping\""`
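
The trade-off behind `skip_json_loads` can be sketched with the standard library only. This is not the library's code: it assumes the flow described above (try a strict `json.loads()` first, repair only on failure), and `toy_repair` is a hypothetical stand-in for the real repair parser.

```python
import json
from typing import Any, Callable


def toy_repair(text: str) -> Any:
    # Hypothetical stand-in for a repair parser: it only drops a trailing comma before "}".
    return json.loads(text.replace(",}", "}"))


def loads_with_fallback(
    text: str,
    repair: Callable[[str], Any] = toy_repair,
    skip_json_loads: bool = False,
) -> Any:
    if not skip_json_loads:
        try:
            return json.loads(text)  # valid input returns here; repair never runs
        except json.JSONDecodeError:
            pass  # invalid input: fall through to the repair step
    return repair(text)


valid = loads_with_fallback('{"a": 1}')
broken = loads_with_fallback('{"a": 1,}')
broken_skipping_check = loads_with_fallback('{"a": 1,}', skip_json_loads=True)
print(valid, broken, broken_skipping_check)
```

For input that is known to be broken, the strict attempt is wasted work, which is why skipping it only pays off in that case.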

### Use json_repair with streaming

Sometimes you are streaming some data and want to repair the JSON coming from it. Normally this won't work, but you can pass `stream_stable=True` to `repair_json()` or `loads()` to make it work:

```
stream_output = repair_json(stream_input, stream_stable=True)
```

### Use json_repair from CLI

Install the library for command-line use with:
```
pipx install json-repair
```
To see all available options:
```
$ json_repair -h
usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT] [filename]

Repair and parse JSON files.

positional arguments:
  filename              The JSON file to repair (if omitted, reads from stdin)

options:
  -h, --help            show this help message and exit
  -i, --inline          Replace the file inline instead of returning the output to stdout
  -o TARGET, --output TARGET
                        If specified, the output will be written to TARGET filename instead of stdout
  --ensure_ascii        Pass ensure_ascii=True to json.dumps()
  --indent INDENT       Number of spaces for indentation (Default 2)
```

## Adding to requirements
**Please pin this library only on the major version!**

We use TDD and strict semantic versioning, so there will be frequent updates and no breaking changes in minor and patch versions.
To ensure that you only pin the major version of this library in your `requirements.txt`, specify the package name followed by the major version and a wildcard for minor and patch versions. For example:

    json_repair==0.*

In this example, any version that starts with `0.` will be acceptable, allowing for updates on minor and patch versions.

---
# How to cite
If you are using this library in your academic work (as I know many folks are) please find the BibTeX here:

    @software{Baccianella_JSON_Repair_-_2025,
        author  = "Stefano {Baccianella}",
        month   = "feb",
        title   = "JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs",
        url     = "https://github.com/mangiucugna/json_repair",
        version = "0.39.1",
        year    = 2025
    }

Thank you for citing my work and please send me a link to the paper if you can!

---

# How it works
This module will parse the JSON file following the BNF definition:

    <json> ::= <primitive> | <container>

    <primitive> ::= <number> | <string> | <boolean>
    ; Where:
    ; <number> is a valid real number expressed in one of a number of given formats
    ; <string> is a string of valid characters enclosed in quotes
    ; <boolean> is one of the literal strings 'true', 'false', or 'null' (unquoted)

    <container> ::= <object> | <array>
    <array> ::= '[' [ <json> *(', ' <json>) ] ']' ; A sequence of JSON values separated by commas
    <object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
    <member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value

If something is wrong (a missing parenthesis or missing quotes, for example) it will use a few simple heuristics to fix the JSON string:
- Add the missing parentheses if the parser believes that the array or object should be closed
- Quote strings or add missing single quotes
- Adjust whitespaces and remove line breaks

I am sure some corner cases will be missing. If you have examples, please open an issue or, even better, push a PR.
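
To make the first heuristic concrete, here is a toy sketch that closes whatever arrays, objects, or strings are still open at the end of a truncated input. It is an illustration of the idea only, not the parser this module uses; the function name `close_open_containers` is hypothetical, and it handles none of the other cases listed above.

```python
import json


def close_open_containers(text: str) -> str:
    """Append the closing quote, brackets, and braces that a truncated JSON string lacks."""
    closers = {"{": "}", "[": "]"}
    stack = []  # closing characters still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is taken literally
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()              # container closed properly in the input
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))


repaired = close_open_containers('{"name": "Alice", "tags": ["a", "b"')
print(repaired)
print(json.loads(repaired))
```

A single pass with a stack is enough here because brackets inside strings are ignored and containers must close in reverse order of opening.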

# How to develop
Just create a virtual environment with `requirements.txt`. The setup uses [pre-commit](https://pre-commit.com/) to make sure all tests are run.

Make sure that the GitHub Actions running after pushing a new commit don't fail as well.

# How to release
You will need owner access to this repository
- Edit `pyproject.toml` and update the version number appropriately using `semver` notation
- **Commit and push all changes to the repository before continuing or the next steps will fail**
- Run `python -m build`
- Create a new release in GitHub, making sure to tag all the issues solved and contributors. Create the new tag, same as the one in the build configuration
- Once the release is created, a new GitHub Actions workflow will start to publish on PyPI; make sure it didn't fail
---
# Repair JSON in other programming languages
- TypeScript: https://github.com/josdejong/jsonrepair
- Go: https://github.com/RealAlexandreAI/json-repair
- Ruby: https://github.com/sashazykov/json-repair-rb
- Rust: https://github.com/oramasearch/llm_json
- R: https://github.com/cgxjdzz/jsonRepair
- Java: https://github.com/du00cs/json-repairj
---
## Star History

https://star-history.com/#mangiucugna/json_repair&Date