Metadata-Version: 2.4
Name: mmh3
Version: 5.2.0
Summary: Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.
Author-email: Hajime Senuma <hajime.senuma@gmail.com>
License: MIT License

Copyright (c) 2011-2025 Hajime Senuma

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Project-URL: Homepage, https://pypi.org/project/mmh3/
Project-URL: Repository, https://github.com/hajimes/mmh3
Project-URL: Changelog, https://github.com/hajimes/mmh3/blob/master/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/hajimes/mmh3/issues
Keywords: utility,hash,MurmurHash
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: test
Requires-Dist: pytest==8.4.1; extra == "test"
Requires-Dist: pytest-sugar==1.0.0; extra == "test"
Provides-Extra: lint
Requires-Dist: black==25.1.0; extra == "lint"
Requires-Dist: clang-format==20.1.8; extra == "lint"
Requires-Dist: isort==6.0.1; extra == "lint"
Requires-Dist: pylint==3.3.7; extra == "lint"
Provides-Extra: type
Requires-Dist: mypy==1.17.0; extra == "type"
Provides-Extra: docs
Requires-Dist: myst-parser==4.0.1; extra == "docs"
Requires-Dist: shibuya==2025.7.24; extra == "docs"
Requires-Dist: sphinx==8.2.3; extra == "docs"
Requires-Dist: sphinx-copybutton==0.5.2; extra == "docs"
Provides-Extra: benchmark
Requires-Dist: pymmh3==0.0.5; extra == "benchmark"
Requires-Dist: pyperf==2.9.0; extra == "benchmark"
Requires-Dist: xxhash==3.5.0; extra == "benchmark"
Provides-Extra: plot
Requires-Dist: matplotlib==3.10.3; extra == "plot"
Requires-Dist: pandas==2.3.1; extra == "plot"
Dynamic: license-file

# mmh3

`mmh3` is a Python extension for
[MurmurHash (MurmurHash3)](https://en.wikipedia.org/wiki/MurmurHash), a set of
fast and robust non-cryptographic hash functions invented by Austin Appleby.

By combining `mmh3` with probabilistic techniques like
[Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter),
[MinHash](https://en.wikipedia.org/wiki/MinHash), and
[feature hashing](https://en.wikipedia.org/wiki/Feature_hashing), you can
develop high-performance systems in fields such as data mining, machine
learning, and natural language processing.

Another popular use of `mmh3` is to
[calculate favicon hashes](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a),
which are used by [Shodan](https://www.shodan.io), the world's first IoT
search engine.

This page provides a quick start guide. For more comprehensive information,
please refer to the [documentation](https://mmh3.readthedocs.io/en/stable/).

## Installation

```shell
pip install mmh3
```

## Usage

### Basic usage

```pycon
>>> import mmh3
>>> mmh3.hash(b"foo") # returns a 32-bit signed int
-156908512
>>> mmh3.hash("foo") # accepts str (UTF-8 encoded)
-156908512
>>> mmh3.hash(b"foo", 42) # uses 42 as the seed
-1322301282
>>> mmh3.hash(b"foo", 0, False) # returns a 32-bit unsigned int
4138058784
```

`mmh3.mmh3_x64_128_digest()`, introduced in version 5.0.0, efficiently hashes
objects that implement the buffer protocol
([PEP 688](https://peps.python.org/pep-0688/)) without internal memory copying.
The function returns a `bytes` object of 16 bytes (128 bits). It is
particularly suited for hashing large memory views, such as
`bytearray`, `memoryview`, and `numpy.ndarray`, and runs faster than
32-bit variants such as `hash()` on 64-bit machines.

```pycon
>>> import numpy
>>> mmh3.mmh3_x64_128_digest(numpy.random.rand(100))  # random input, so the digest varies
b'\x8c\xee\xc6z\xa9\xfeR\xe8o\x9a\x9b\x17u\xbe\xdc\xee'
```

Other variants are available that return different types (e.g., signed
integers or tuples of unsigned integers) or are optimized for different
architectures. For a comprehensive list of functions, refer to the
[API Reference](https://mmh3.readthedocs.io/en/stable/api.html).

### `hashlib`-style hashers

`mmh3` also provides hasher objects with interfaces similar to those of
`hashlib` in the standard library. These hashers are still experimental; see
[Hasher Classes](https://mmh3.readthedocs.io/en/stable/api.html#hasher-classes)
in the API Reference for more information.

## Changelog

See [Changelog (latest version)](https://mmh3.readthedocs.io/en/latest/changelog.html)
for the complete changelog.

### [5.2.0] - 2025-07-29

#### Added

- Add support for Python 3.14, including 3.14t (no-GIL) wheels. However, thread
  safety for the no-GIL variant has not been fully tested yet. Please report any
  issues you encounter ([#134](https://github.com/hajimes/mmh3/pull/134),
  [#136](https://github.com/hajimes/mmh3/pull/136)).
- Add support for Android (Python 3.13 only) and iOS (Python 3.13 and 3.14)
  wheels, enabled by the major version update of
  [cibuildwheel](https://github.com/pypa/cibuildwheel)
  ([#135](https://github.com/hajimes/mmh3/pull/135)).

### [5.1.0] - 2025-01-25

#### Added

- Improve the performance of `hash128()`, `hash64()`, and `hash_bytes()` by
  using
  [METH_FASTCALL](https://docs.python.org/3/c-api/structures.html#c.METH_FASTCALL),
  which reduces function call overhead
  ([#116](https://github.com/hajimes/mmh3/pull/116)).
- Add the software paper for this library
  ([doi:10.21105/joss.06124](https://doi.org/10.21105/joss.06124)), following
  its publication in the
  [_Journal of Open Source Software_](https://joss.theoj.org)
  ([#118](https://github.com/hajimes/mmh3/pull/118)).

#### Removed

- Drop support for Python 3.8, as it reached its end of life on 2024-10-07
  ([#117](https://github.com/hajimes/mmh3/pull/117)).

### [5.0.1] - 2024-09-22

#### Fixed

- Fix an issue where the package could not be built from the source distribution
  ([#90](https://github.com/hajimes/mmh3/issues/90)).

## License

[MIT](https://github.com/hajimes/mmh3/blob/master/LICENSE), unless otherwise
noted within a file.

## Frequently Asked Questions

### Different results from other MurmurHash3-based libraries

By default, `mmh3` returns **signed** values for the 32-bit and 64-bit versions
and **unsigned** values for `hash128`, for historical reasons. To get the
desired result, use the `signed` keyword argument.

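If you already hold a value in one form (for example, one produced by another library), converting between the two forms is plain two's-complement arithmetic. The helpers below are hypothetical and not part of `mmh3`; they work for any bit width, and the check uses the two values printed for `mmh3.hash(b"foo")` in the basic usage example. Within `mmh3` itself, passing the `signed` argument is the simpler route.

```python
def to_unsigned(value: int, bits: int = 32) -> int:
    """Hypothetical helper: reinterpret a two's-complement signed int as unsigned."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 32) -> int:
    """Hypothetical helper: reinterpret an unsigned int as two's-complement signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


# The signed and unsigned results shown for mmh3.hash(b"foo") in the basic usage example
assert to_unsigned(-156908512) == 4138058784
assert to_signed(4138058784) == -156908512
```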
Starting with version 4.0.0, **`mmh3` is endian-neutral**, meaning that its
hash functions return the same values on big-endian platforms as they do on
little-endian ones. In contrast, the original C++ library by Appleby is
endian-sensitive. If you need results that match the original library on
big-endian systems, please use version 3.\*.

For compatibility with [Google Guava (Java)](https://github.com/google/guava),
see
<https://stackoverflow.com/questions/29932956/murmur3-hash-different-result-between-python-and-java-implementation>.

For compatibility with
[murmur3 (Go)](https://pkg.go.dev/github.com/spaolacci/murmur3), see
<https://github.com/hajimes/mmh3/issues/46>.

### Handling errors with negative seeds

Starting with version 5.0.0, `mmh3` functions accept only **unsigned** 32-bit
integer seeds to enable faster type-checking and conversion. However, this
change may cause issues if you need to calculate hash values using negative
seeds within the range of signed 32-bit integers. For instance,
[Telegram-iOS](https://github.com/TelegramMessenger/Telegram-iOS) uses
`-137723950` as a hard-coded seed (bitwise equivalent to `4157243346`). To
handle such cases, you can convert a signed 32-bit integer to its unsigned
equivalent by applying a bitwise AND operation with `0xffffffff`. Here's an
example:

```pycon
>>> import mmh3
>>> mmh3.hash(b"quux", 4294967295)
258499980
>>> d = -1
>>> mmh3.hash(b"quux", d & 0xffffffff)
258499980
```

Alternatively, if the seed is hard-coded (as in the Telegram-iOS case), you can
precompute the unsigned value for simplicity.

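To precompute or validate such seeds in one place, a small helper can accept either representation and always return the unsigned form. `normalize_seed` is a hypothetical name, and rejecting values outside the combined signed/unsigned 32-bit range is an assumption about which inputs you want to allow.

```python
def normalize_seed(seed: int) -> int:
    """Hypothetical helper: unsigned 32-bit equivalent of a signed or unsigned 32-bit seed."""
    if not -(2**31) <= seed <= 2**32 - 1:
        raise ValueError(f"seed does not fit in 32 bits: {seed}")
    return seed & 0xFFFFFFFF


# The hard-coded Telegram-iOS seed mentioned above
TELEGRAM_IOS_SEED = normalize_seed(-137723950)  # 4157243346
```

The result can then be passed as the seed, for example `mmh3.hash(b"quux", normalize_seed(-1))`.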
## Contributing Guidelines

See [Contributing](https://mmh3.readthedocs.io/en/stable/CONTRIBUTING.html).

## Authors

MurmurHash3 was originally developed by Austin Appleby and released into the
public domain
([https://github.com/aappleby/smhasher](https://github.com/aappleby/smhasher)).

Ported and modified for Python by Hajime Senuma.

## External Tutorials

### High-performance computing

The following textbooks and tutorials are great resources for learning how to
use `mmh3` (and other hash algorithms in general) for high-performance computing.

- Chapter 11: _Using Less RAM_ in Micha Gorelick and Ian Ozsvald. 2014. _High
  Performance Python: Practical Performant Programming for Humans_. O'Reilly
  Media. [ISBN: 978-1-4493-6159-4](https://www.amazon.com/dp/1449361595).
  - 2nd edition of the above (2020).
    [ISBN: 978-1492055020](https://www.amazon.com/dp/1492055026).
- Max Burstein. February 2, 2013.
  _[Creating a Simple Bloom Filter](http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/)_.
- Duke University. April 14, 2016.
  _[Efficient storage of data in memory](http://people.duke.edu/~ccc14/sta-663-2016/20B_Big_Data_Structures.html)_.
- Bugra Akyildiz. August 24, 2016.
  _[A Gentle Introduction to Bloom Filter](https://www.kdnuggets.com/2016/08/gentle-introduction-bloom-filter.html)_.
  KDnuggets.

### Internet of things

[Shodan](https://www.shodan.io), the world's first
[IoT](https://en.wikipedia.org/wiki/Internet_of_things) search engine, uses
MurmurHash3 hash values for [favicons](https://en.wikipedia.org/wiki/Favicon)
(icons associated with web pages). [ZoomEye](https://www.zoomeye.org) follows
Shodan's convention.
[Calculating these values with mmh3](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a)
is useful for OSINT and cybersecurity activities.

- Jan Kopriva. April 19, 2021.
  _[Hunting phishing websites with favicon hashes](https://isc.sans.edu/diary/Hunting+phishing+websites+with+favicon+hashes/27326)_.
  SANS Internet Storm Center.
- Nikhil Panwar. May 2, 2022.
  _[Using Favicons to Discover Phishing & Brand Impersonation Websites](https://bolster.ai/blog/how-to-use-favicons-to-find-phishing-websites)_.
  Bolster.
- Faradaysec. July 25, 2022.
  _[Understanding Spring4Shell: How used is it?](https://faradaysec.com/understanding-spring4shell/)_.
  Faraday Security.
- Debjeet. August 2, 2022.
  _[How To Find Assets Using Favicon Hashes](https://payatu.com/blog/favicon-hash/)_.
  Payatu.

## How to Cite This Library

If you use this library in your research, it would be appreciated if you could
cite the following paper published in the
[_Journal of Open Source Software_](https://joss.theoj.org):

Hajime Senuma. 2025.
[mmh3: A Python extension for MurmurHash3](https://doi.org/10.21105/joss.06124).
_Journal of Open Source Software_, 10(105):6124.

In BibTeX format:

```tex
@article{senumaMmh3PythonExtension2025,
  title = {{mmh3}: A {Python} extension for {MurmurHash3}},
  author = {Senuma, Hajime},
  year = {2025},
  month = jan,
  journal = {Journal of Open Source Software},
  volume = {10},
  number = {105},
  pages = {6124},
  issn = {2475-9066},
  doi = {10.21105/joss.06124},
  copyright = {http://creativecommons.org/licenses/by/4.0/}
}
```

## Related Libraries

- <https://github.com/wc-duck/pymmh3>: mmh3 in pure Python (Fredrik Kihlander
  and Swapnil Gusani)
- <https://github.com/escherba/python-cityhash>: Python bindings for CityHash
  (Eugene Scherba)
- <https://github.com/veelion/python-farmhash>: Python bindings for FarmHash
  (Veelion Chong)
- <https://github.com/escherba/python-metrohash>: Python bindings for MetroHash
  (Eugene Scherba)
- <https://github.com/ifduyue/python-xxhash>: Python bindings for xxHash (Yue
  Du)

[5.2.0]: https://github.com/hajimes/mmh3/compare/v5.1.0...v5.2.0
[5.1.0]: https://github.com/hajimes/mmh3/compare/v5.0.1...v5.1.0
[5.0.1]: https://github.com/hajimes/mmh3/compare/v5.0.0...v5.0.1