Generative AI.
Replace with charset_normalizer.
There is news about the chardet Python library. Its maintainer relicensed the LGPL project in release 7.0.0, claiming that the full AI rewrite is sufficient to count as a clean-room implementation. https://github.com/chardet/chardet/issues/327
Here you can read about the backwards compatibility of charset_normalizer: https://charset-normalizer.readthedocs.io/en/latest/user/getstarted.html#backward-compatibility
You can fall back to chardet when replacing the import, if you dare face the implications (that’s a full AI rewrite done in a two-week pass), by importing it in an except ModuleNotFoundError block.
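A minimal sketch of that fallback import, assuming both packages expose a chardet-style detect() function at module level. The helper name load_detect and its candidates parameter are hypothetical and only there so the lookup order is explicit; a plain try/except around the two imports does the same job.

```python
import importlib


def load_detect(candidates=("charset_normalizer", "chardet")):
    """Return detect() from the first importable module in `candidates`.

    `load_detect` and `candidates` are illustrative names, not part of
    either library. The default order prefers charset_normalizer and only
    falls back to chardet when the former is not installed.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError:
            # This candidate is not installed; try the next one.
            continue
        return module.detect
    raise ModuleNotFoundError(
        "none of these modules are installed: " + ", ".join(candidates)
    )
```

With either package installed, detect = load_detect() gives a callable that takes bytes and returns a dict with an "encoding" key, so existing chardet.detect(...) call sites should not need further changes.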
python-charset-normalizer is already present in Ubuntu repositories.
Of other alternatives, there’s faust-cchardet.
You can stop reading here.
How the guy describes it (entire PR description seems partially written by a language model): https://github.com/chardet/chardet/pull/322
This PR is for a ground-up, MIT-licensed rewrite of chardet. It maintains API compatibility with chardet 5.x and 6.x, but with 27x improvements to detection speed, and highly accurate support for even more encodings. It fixes numerous longstanding issues (like poor accuracy on short strings and poor performance), and is just all-around better than previous versions of chardet in every possible respect. It’s even faster, more memory efficient, and more accurate than charset-normalizer, which is something I’m particularly proud of. The test data has also been moved to a separate repo to help prevent any licensing issues having it here might have lead to.
(Emphasis above mine, so that you can notice he already compares against the alternative: he is proud not of being supposedly more accurate than chardet 6.x, but of being more accurate than charset-normalizer, which he has included in the benchmark suite residing in the repo. However, the reason might be that the alternative had already been superior for a long time; I have not researched that.)
Below you can read the entire commit subject history, verbatim (it might disappear soon if cloning an LGPL project somehow turns out to be an issue for whoever pays him):
2025-12-18 Remove Python 3.9 support, add Python 3.14 support (#311)
2026-01-05 Ignore .claude
2026-01-06 Add github codespace support (#312)
2026-02-20 Pass max_bytes parameter to UniversalDetector in detect() and detect_all() (#314)
2026-02-21 Create CLAUDE.md and update AGENTS.md
2026-02-21 Fix SJISDistributionAnalysis discarding valid second-byte range >= 0x80 (#315)
2026-02-21 Add test for distribution analysis get_order() coverage of valid characters
2026-02-21 Pre-release fixes: bump to 6.0.0, fix get_charset crash, cleanup
2026-02-21 Add --encoding-era CLI flag and improve heuristic selection
2026-02-21 Add LEGACY_REGIONAL encoding era and reclassify misplaced encodings
2026-02-21 Update documentation for 6.0.0 release
2026-02-21 Fix pyright type errors in chardetect.py and test.py
2026-02-21 Add .readthedocs.yaml to fix RTD builds
2026-02-21 docs: update copyright to 2015-2026 chardet contributors
2026-02-21 docs: fix copyright start year and remove first-person reference
2026-02-21 docs: modernize usage examples and reorganize table of contents
2026-02-22 Update version to 6.0.0.post1
2026-02-22 Bump version to 6.0.1.dev0 for future dev
2026-02-23 Potential fix for code scanning alert no. 2: Workflow does not contain permissions (#320)
2026-02-23 Refactor create_language_model to cache unfiltered bigrams
2026-02-23 Add Urdu and Cornish probers that were missing
2026-02-23 Fix remove_xml_tags treating bare > as tag closer
2026-02-24 Fix CP932 misclassified as single byte
2026-02-24 Retrain with latest version of create_language_model
2026-02-24 Update AGENTS.md now that training is cached
2026-02-24 Add accuracy investigation design doc
2026-02-24 Add accuracy investigation implementation plan
2026-02-25 Fix 20 regressions in XML/markup file detection
2026-02-25 Improve detection accuracy with non-ASCII bigram analysis
2026-02-25 Add fallback encoding when no prober meets threshold
2026-02-26 Remove binary file from ASCII test data
2026-03-01 Wipe it clean
2026-02-25 Add chardet rewrite design document
2026-02-25 Add chardet rewrite implementation plan
2026-02-25 chore: scaffold project with uv, ruff, pytest, pre-commit
2026-02-25 feat: add EncodingEra IntFlag enum
2026-02-25 feat: add encoding registry with era classifications
2026-02-25 feat: add DetectionResult type for pipeline stages
2026-02-25 feat: add Stage 0 binary content detection
2026-02-25 feat: add Stage 1a BOM detection
2026-02-25 feat: add Stage 1b ASCII detection
2026-02-25 feat: add Stage 1c UTF-8 structural validation
2026-02-25 feat: add Stage 1d markup charset extraction
2026-02-25 feat: add Stage 2a byte validity filtering
2026-02-25 feat: add Stage 2b multi-byte structural probing
2026-02-25 feat: add bigram model format, loading, and training pipeline
2026-02-25 feat: add Stage 3 statistical scoring with parallel support
2026-02-25 feat: add pipeline orchestrator connecting all stages
2026-02-25 feat: add detect() and detect_all() public API
2026-02-25 feat: add UniversalDetector streaming API
2026-02-25 feat: add chardetect CLI
2026-02-25 feat: add accuracy test suite against chardet test data
2026-02-25 feat: add benchmark suite and performance regression tests
2026-02-25 fix: move BOM before binary detection, add UTF-16/32 null-byte detection
2026-02-25 feat: add encoding equivalence classes to accuracy test
2026-02-25 feat: add escape-sequence encoding detection for ISO-2022 and HZ-GB-2312
2026-02-25 feat: improve accuracy with high-byte bigram weighting and equivalence classes
2026-02-25 feat: retrain models with CulturaX data and 5000 samples per language
2026-02-25 fix: restore missing encoding models and clean up training script
2026-02-25 Retrain with max-samples set to 15000
2026-02-25 Bump up minimum overall accuracy to 79%
2026-02-25 Go nuts with ruff rules
2026-02-25 docs: add accuracy improvements design and diagnostic scripts
2026-02-25 docs: add accuracy improvements implementation plan
2026-02-25 test: replace bidirectional equivalences with directional superset classes
2026-02-25 feat: add windows-1252 fallback and binary confidence to pipeline
2026-02-25 feat: gate CJK multi-byte candidates on structural evidence
2026-02-25 perf: cache structural scores to avoid duplicate computation
2026-02-25 feat: add era-based tiebreaking for close statistical scores
2026-02-25 Revert "feat: add era-based tiebreaking for close statistical scores"
2026-02-25 refactor: update diagnostic scripts to use directional equivalences
2026-02-25 feat: add --encodings filter to train.py for targeted retraining
2026-02-25 feat: retrain koi8-r/cp866 on Russian-only to improve Cyrillic discrimination
2026-02-26 feat: retrain Western European models with expanded language coverage
2026-02-26 refactor: extract shared equivalence logic into chardet.equivalences
2026-02-26 refactor: improve train.py and add deserialize_models tests
2026-02-26 docs: add design for test parallelism, perf, and accuracy improvements
2026-02-26 docs: add implementation plan for test parallelism, perf, and accuracy
2026-02-26 build: add pytest-xdist for parallel test execution
2026-02-26 refactor: extract collect_test_files into shared scripts/utils.py
2026-02-26 test: parametrize accuracy tests per-file for xdist parallelism
2026-02-26 refactor: consolidate compare scripts into single compare_detectors.py
2026-02-26 feat: add detection profiling script
2026-02-26 perf: remove ProcessPoolExecutor from statistical scoring
2026-02-26 perf: use flat bytearray lookup tables for bigram scoring
2026-02-26 feat: add encoding equivalences for iso-8859-1, cp037/cp500, gb2312, and Central European encodings
2026-02-26 feat: add Phase 2 encoding equivalences for Hebrew, Greek, Baltic, Cyrillic, and Western European
2026-02-26 feat: unify Western European encodings into a single bidirectional equivalence group
2026-02-26 style: fix ruff FURB110 lint warnings in scripts
2026-02-26 fix: remove invalid encoding equivalences and add memory/startup benchmarking
2026-02-26 Add initial performance report for rewrite
2026-02-26 refactor: unify all detectors to subprocesses and extract benchmark script
2026-02-26 docs: update performance report with all 7 detectors
2026-02-26 feat: download test data from chardet/test-data and exclude from builds
2026-02-26 docs: add design for accuracy improvements to 95%
2026-02-26 fix: clone test data at collection time when missing
2026-02-26 docs: add implementation plan for accuracy improvements to 95%
2026-02-26 feat: add is_equivalent_detection() for base-letter equivalence checking
2026-02-26 fix: use normalized encoding names in decode and add symbol diff test
2026-02-26 feat: integrate is_equivalent_detection into test harness and diagnosis script
2026-02-26 fix: tighten CJK multi-byte gating to reduce false positives
2026-02-26 feat: add currency/euro symbol equivalence to decoded-output comparison
2026-02-26 fix: demote iso-8859-10 when no distinguishing bytes present
2026-02-26 fix: preserve confidence ordering in iso-8859-10 demotion
2026-02-26 feat: add post-decode mess detection to penalize wrong encodings
2026-02-26 fix: address mess detection review issues
2026-02-26 feat: train per-language-encoding bigram models with language detection
2026-02-26 fix: restore score_bigrams backward compat and add encoding index
2026-02-26 perf: tune detection pipeline for 95% accuracy target
2026-02-26 docs: update performance report with 95.4% accuracy
2026-02-26 fix: address code review issues and remove mess detection
2026-02-26 perf: optimize scoring loop, binary detection, and registry caching
2026-02-26 fix: address code review feedback
2026-02-27 docs: update performance report with 95.0% accuracy
2026-02-27 docs: add mypyc optimization design plan
2026-02-27 docs: add mypyc optimization implementation plan
2026-02-27 refactor: prepare models module for mypyc compilation
2026-02-27 refactor: prepare structural module for mypyc compilation
2026-02-27 refactor: prepare validity module for mypyc compilation
2026-02-27 refactor: prepare statistical module for mypyc compilation
2026-02-27 build: add hatch-mypyc configuration for optional compilation
2026-02-27 fix: resolve mypyc compilation type errors and add .so to gitignore
2026-02-27 docs: update performance report with mypyc benchmark results
2026-02-27 refactor: switch mypyc config from exclude to include list
2026-02-27 refactor: split benchmark.py into benchmark_time.py and benchmark_memory.py
2026-02-27 fix: apply decoded-output equivalence in compare_detectors.py and update performance report
2026-02-27 docs: add accuracy improvements v2 design plan
2026-02-27 docs: add accuracy improvements v2 implementation plan
2026-02-27 feat: add stdout flushing to train.py for tee compatibility
2026-02-27 feat: write training metadata YAML alongside models.bin
2026-02-27 feat: bump max_samples default to 25000 for better model quality
2026-02-27 feat: add KOI8-T promotion heuristic for Tajik text detection
2026-02-27 feat: add lead byte diversity check to CJK false-positive gating
2026-02-27 feat: retrain all 233 bigram models at 25K samples
2026-02-27 Revert "feat: retrain all 233 bigram models at 25K samples"
2026-02-27 revert: restore max_samples default to 15000
2026-02-27 docs: add parallel training design and implementation plan
2026-02-27 feat(train): add --build-workers CLI argument
2026-02-27 refactor(train): extract _build_one_model function
2026-02-27 feat(train): wire ProcessPoolExecutor into model build loop
2026-02-27 fix(train): use _load_cached_articles in workers to prevent hanging processes
2026-02-27 feat: retrain models with parallel training pipeline
2026-02-27 docs: add CLAUDE.md with project overview and conventions
2026-02-27 fix: suppress FA100 for mypyc-compiled modules and fix lint warnings
2026-02-27 refactor: address code review findings
2026-02-27 fix: match chardet 6.0.0 public API signatures for backward compatibility
2026-02-27 feat: implement should_rename_legacy, ignore_threshold, and lang_filter DeprecationWarning
2026-02-27 docs: update performance comparison with expanded ISO-to-Windows superset equivalences
2026-02-27 fix: add DeprecationWarning for unused chunk_size param and suppress FBT lint
2026-02-27 refactor: address code review findings across codebase
2026-02-27 fix: harden model loading and validate max_bytes parameter
2026-02-27 fix: address staff-level code review findings across codebase
2026-02-28 docs: update performance benchmarks with verified measurements
2026-02-28 feat: add --pure flag to benchmark scripts to detect stale mypyc .so files
2026-02-28 docs: add confusion group resolution design for accuracy improvement
2026-02-28 docs: add confusion group resolution implementation plan
2026-02-28 feat: add confusion group computation for similar encodings
2026-02-28 feat: add distinguishing byte maps with Unicode categories
2026-02-28 feat: add binary serialization for confusion group data
2026-02-28 feat: generate confusion.bin during model training
2026-02-28 feat: add lazy loading of confusion.bin at runtime
2026-02-28 feat: add Unicode category voting resolution strategy
2026-02-28 feat: add distinguishing-bigram re-scoring resolution strategy
2026-02-28 feat: integrate confusion group resolution into detection pipeline
2026-02-28 feat: add env var override for confusion strategy experimentation
2026-02-28 feat: regenerate models with full training data (15k samples)
2026-02-28 fix: resolve lint issues in confusion module and tests
2026-02-28 perf: add utf1632, utf8, and escape modules to mypyc compilation
2026-02-28 style: apply pyupgrade --py310-plus modernizations
2026-02-28 fix: replace dot product with cosine similarity in bigram scoring
2026-02-28 fix: score all candidates in single pool to fix normalization bug
2026-02-28 fix: use raw cosine scores as confidence instead of max-normalization
2026-02-28 fix: use weighted category voting instead of binary votes
2026-02-28 fix: run chardet-rewrite in isolated venv for fair comparison
2026-02-28 refactor: replace --use-encoding-era bool flag with --encoding-era choice
2026-02-28 docs: add language detection accuracy tracking design
2026-02-28 docs: add language detection accuracy implementation plan
2026-02-28 feat: add detected_language to benchmark_time.py JSON output
2026-02-28 feat: track language detection accuracy in compare_detectors.py
2026-02-28 feat: add language accuracy reporting to compare_detectors.py output
2026-02-28 fix: use shared denominator in per-encoding language accuracy table
2026-02-28 feat: add language mismatch warnings to test_accuracy.py
2026-02-28 feat: infer language from single-language encodings
2026-02-28 feat: normalize ISO 639-1 language codes in scripts and tests
2026-02-28 docs: update performance comparison with latest benchmark results
2026-02-28 docs: add language detection accuracy section to performance doc
2026-02-28 docs: add design for statistical language inference in _fill_language
2026-02-28 feat: use statistical bigram scoring for language on multi-language encodings
2026-02-28 docs: add design for UTF-8 language models with universal fallback
2026-02-28 docs: add implementation plan for UTF-8 language models
2026-02-28 feat: train UTF-8 byte bigram models for all 48 languages
2026-02-28 feat: add Tier 3 decode-to-UTF-8 language fallback in _fill_language
2026-02-28 perf: cap language scoring data to 2KB and fix mypyc BigramProfile bug
2026-02-28 docs: note previous mypyc numbers were from stale build
2026-02-28 fix: add BigramProfile.from_weighted_freq() factory classmethod
2026-02-28 fix: add len(data) to structural analysis cache key for safety
2026-02-28 refactor: remove score_bigrams wrapper and dead models parameter
2026-02-28 refactor: deduplicate _resolve_rename and _validate_max_bytes
2026-02-28 refactor: move build-time confusion functions and script tests to scripts/
2026-02-28 refactor: minor cleanups from code review
2026-02-28 build: switch from pre-commit to prek for git hooks
2026-02-28 docs: add thread-safety design for PipelineContext + cache locking
2026-02-28 feat: add PipelineContext dataclass for per-run pipeline state
2026-02-28 refactor: thread PipelineContext through structural.py, remove module-level cache
2026-02-28 refactor: create PipelineContext in orchestrator, thread through call chain
2026-02-28 fix: add double-checked locking to model caches for thread safety
2026-02-28 fix: add double-checked locking to confusion and registry caches
2026-02-28 test: add thread-safety integration test for concurrent detect()
2026-02-28 fix: remove inner re-check in mypyc-compiled cache locks
2026-02-28 test: add cold-cache race, high-concurrency, and GIL diagnostic tests
2026-02-28 ci: add GitHub Actions workflow with free-threaded Python testing
2026-02-28 ci: add Python 3.14 and 3.14t to CI matrix
2026-02-28 ci: add coverage configuration
2026-02-28 ci: run tests under coverage and upload to Codecov
2026-02-28 docs: add design for performance doc update with thread safety
2026-02-28 docs: add implementation plan for performance doc update
2026-02-28 docs: update performance doc with thread safety results
2026-02-28 ci: add ty type checking to prek and xfail known accuracy gaps
2026-02-28 fix: UTF-32 BOM alignment, UTF-8 binary false positives, and CJK gating
2026-02-28 docs: add design for docstrings and type annotations
2026-02-28 docs: add implementation plan for docstrings and type annotations
2026-02-28 chore: enforce docstring and return-type ruff rules for src/chardet
2026-02-28 docs: add Sphinx reST docstrings to all src/chardet/ modules
2026-03-01 refactor: code review fixes - DRY constants, cleaner types, test hygiene
2026-03-01 Update performance report
2026-03-01 Remove unnecessary TYPE_CHECKING guards
2026-03-01 Remove unused strategy param from resolve_confusion_groups
2026-03-01 analyser -> analyzer
2026-03-01 Commit thread safety implementation plan for posterity
2026-03-01 Add documentation setup design doc
2026-03-01 Add documentation setup implementation plan
2026-03-01 Add docs dependency group (Sphinx, Furo, sphinx-copybutton)
2026-03-01 Add Sphinx configuration and ReadTheDocs config
2026-03-01 Add docs landing page (index.rst)
2026-03-01 Add usage documentation page
2026-03-01 Add supported encodings page (generated from registry)
2026-03-01 Add 'How It Works' documentation page
2026-03-01 Add performance benchmarks documentation page
2026-03-01 Add FAQ documentation page
2026-03-01 Add API reference page (autodoc from source docstrings)
2026-03-01 Fix copyright year and clarify FAQ version narrative
2026-03-01 Add documentation build commands to CLAUDE.md
2026-03-01 Update copyright date to reflect rewrite
2026-03-01 Unify UniversalDetector with detect()/detect_all() pipeline
2026-03-01 Add release workflow for PyPI publishing on tag push
2026-03-01 feat: add UTF-7 detection via escape sequence matching
2026-03-01 fix: tighten UTF-7 validation to reject false positives
2026-03-01 style: fix ruff lint violations in escape.py
2026-03-01 feat: add cp273 (EBCDIC German) to registry and training config
2026-03-01 feat: add hp-roman8 to registry and training config
2026-03-01 fix: accept RFC 2152 implicit termination in UTF-7 detection
2026-03-01 feat: train bigram models for cp273 and hp-roman8
2026-03-01 test: add end-to-end detection tests for UTF-7, cp273, hp-roman8
2026-03-01 docs: update supported encoding count to 84
2026-03-01 docs: add design and implementation plan for cp273, hp-roman8, UTF-7
2026-03-01 perf: skip irrelevant escape checks when only '+' is present
2026-03-01 fix: reject UTF-7 detection when data contains bytes > 0x7F
2026-03-01 docs: update README examples with accurate output comments
2026-03-01 docs: add English ASCII and UTF-8 examples to README
2026-03-01 feat: train English bigram models for improved language detection
2026-03-01 docs: add design for language metadata in encoding registry
2026-03-01 docs: add implementation plan for registry language metadata
2026-03-01 feat: add languages field to EncodingInfo and utf-7 to registry
2026-03-01 test: add tests for registry languages field and utf-7 entry
2026-03-01 refactor: derive _SINGLE_LANG_MAP from registry instead of hardcoding
2026-03-01 refactor: derive ENCODING_LANG_MAP from registry instead of hardcoding
2026-03-01 refactor: standardize escape.py language strings to ISO 639-1 codes
2026-03-01 test: use ISO 639-1 code in pipeline types test for consistency
2026-03-01 docs: add design for full Python character encoding coverage
2026-03-01 docs: add implementation plan for full Python encoding coverage
2026-03-01 refactor: flip encoding supersets and add missing aliases in registry
2026-03-01 refactor: update structural analyzer dispatch keys for renamed encodings
2026-03-01 feat: resolve registry aliases in model index, remove gb2312 hack
2026-03-01 feat: differentiate ISO-2022-JP branches in escape detector
2026-03-01 feat: update equivalences for flipped supersets and ISO-2022-JP branches
2026-03-01 fix: regenerate confusion.bin and update tests for renamed encodings
2026-03-01 revert: remove cp037 from cp500 equivalences (out of scope)
2026-03-01 fix: correct EBCDIC superset: cp1140 replaces cp037, not cp500
2026-03-01 docs: correct EBCDIC section in design doc
2026-03-01 refactor: convert REGISTRY from tuple to MappingProxyType dict
2026-03-01 refactor: address code review findings across the codebase
2026-03-01 docs: update benchmark numbers and revamp README comparison table
2026-03-01 docs: update supported encoding count to 99 across all docs
2026-03-01 fix: score_best_language early-exit bug and move training-only dict
2026-03-01 docs: report speeds as files/second instead of total time
2026-03-01 Update README some more
2026-03-01 refactor: default encoding_era to ALL and simplify should_rename_legacy
2026-03-01 Remove plans to keep PR less noisy
2026-03-01 build: derive version from git tags via hatch-vcs
2026-03-02 docs: update CLAUDE.md for hatch-vcs versioning and current architecture
2026-03-02 docs: update docs for encoding_era=ALL default, fix stale numbers, add missing pipeline stages
2026-03-02 docs: fix stale encoding_era example in README
2026-03-02 ci: restrict GITHUB_TOKEN to read-only permissions
2026-03-02 fix: address PR review findings across error handling, types, docs, and tests
2026-03-02 docs: fix stale confidence, language count, and stage count in docs
2026-03-02 fix: resolve all ruff and ty lint failures
2026-03-02 chore: update devcontainer to install prek hooks and add Ruff extension
2026-03-02 Add authors and maintainers back to pyproject.toml
2026-03-02 fix: test harness None-None support, UTF-7 false positives, and WHATWG replacement era reclassification
2026-03-02 fix: handle binary files systematically via collect_test_files() and equivalences
2026-03-02 docs: update performance benchmarks for 2179 test files (incl. binary)
2026-03-02 fix: CLI defaults to ALL encoding era, remove --legacy flag
2026-03-02 docs: comprehensive documentation update
2026-03-02 Slight tweak to index.rst
2026-03-02 docs: hide inline toctree on landing page, keep sidebar nav
2026-03-02 Tweak FAQs
2026-03-02 Tweak README
2026-03-02 Update tag trigger for PyPI releases
2026-03-02 Make readthedocs get tags
2026-03-02 Make index.rst example match one from README
2026-03-02 docs: fix code examples and benchmark numbers to match actual output
2026-03-02 tests: add streaming parity test for detect vs UniversalDetector (GH-296)
2026-03-02 fix: use post_checkout to unshallow clone for ReadTheDocs builds
2026-03-02 Add slug for codecov
2026-03-02 Add codecov badge
2026-03-02 docs: add keywords and changelog URL to project metadata
2026-03-02 fix: update cibuildwheel action from v2 to v3
2026-03-02 fix: use cibuildwheel@v3.3 (no rolling major version tag exists)
2026-03-02 fix: install uv and pass mypyc env var into cibuildwheel builds
2026-03-02 Add 7.0.0 release date to changelog
2026-03-02 docs: clarify that mypyc wheels are prebuilt on PyPI
2026-03-02 docs: add mypyc vs pure Python speed stats across all docs
2026-03-02 docs: add cross-Python-version timing benchmarks (7.0.0rc4)
2026-03-02 docs: add import timing to cross-version benchmark tables
2026-03-02 docs: extend changelog to cover all PyPI releases (1.0-2.1.1)
2026-03-02 docs: add test data coverage design plan
2026-03-02 docs: add test data coverage implementation plan
2026-03-03 Make tests/data get ignored if it is a symlink too
2026-03-03 Allow symlink tests/data to simplify updating test-data repo
2026-03-03 Remove known failure entries for deleted duplicate test files
2026-03-03 Add known failure entries for new DOS codepage test files
2026-03-03 docs: add test coverage gap analysis and improvement design
2026-03-03 docs: add test coverage gap implementation plan
2026-03-03 refactor: remove dead code branches and add pragma for invariant assertion
2026-03-03 Add tests for uncovered branches across pipeline modules
2026-03-03 Add CLI, confusion, models, and script utility tests
2026-03-03 Reach 100% test coverage across all chardet modules
2026-03-03 Update test infrastructure for ISO 639-1 language codes
2026-03-03 Update language equivalences for mutual intelligibility
2026-03-04 Update CLAUDE.md to stop a couple stupid things
2026-03-04 Fix _SINGLE_LANG_MAP missing aliases
2026-03-04 Retrain models and remove 24 resolved xfails
2026-03-04 Fix false UTF-7 detection of SHA-1 git hashes (#324)
2026-03-04 Add separate lint job back
2026-03-04 Bump coverage requirements up to 95% since we have 100%
2026-03-04 Fix precommit hook failures
2026-03-04 Use package name in cache filenames and enrich display labels
2026-03-04 Remove plans
2026-03-04 feat: add helpers for venv-less version/tag resolution and cache checking
2026-03-04 fix: use project_root parameter instead of pip_args[0] in _resolve_version_without_venv
2026-03-04 feat: skip venv creation when full cache exists for detector
2026-03-04 fix: remove unused cached_specs and add version mismatch diagnostic
2026-03-04 docs: update benchmark numbers for expanded test suite (2,510 files)
2026-03-06 Update changelog with missing 7.0.1 info
2026-03-06 fix: prevent false UTF-7 detection of ASCII with ++ or +word (#332) (#335)
2026-03-06 fix: eliminate 0.5s startup cost by computing model norms during loading (#333)
2026-03-06 feat: add 1st detect and max columns to compare_detectors output
2026-03-06 fix: correct import time measurement when --pure is used
2026-03-06 refactor: replace unwieldy tuple returns with _TimingResult dataclass
2026-03-06 fix: display startup times in milliseconds to avoid rounding to 0.000s
2026-03-06 Fix issue where language was not returned by charset_normalizer
2026-03-06 fix: generalize warmup comment to apply to all detector types
2026-03-06 feat: add --no-memory flag and "time to 1st result" column to compare_detectors
2026-03-06 perf: use struct.iter_unpack for bulk model parsing
2026-03-06 fix: detect truncated model data with iter_unpack and extract parser
2026-03-06 fix: normalize language names to ISO 639-1 for cross-detector comparison
2026-03-06 docs: update benchmark numbers for chardet 7.0.2
2026-03-06 test: add coverage for struct.error and UnicodeDecodeError paths
2026-03-06 Update incorrect date in LICENSE
2026-03-06 fix: restore _MODEL_NORMS in mock_models_bin fixture to prevent test slowdown
2026-03-07 feat: pin test-data cloning to chardet release version tags
2026-03-07 test: add unit tests for get_data_dir cache logic
2026-03-07 docs: add test-data tagging step to versioning notes in CLAUDE.md
2026-03-07 fix: remove v prefix from test-data tags to match release tag format
2026-03-07 Fix xpasses
2026-03-08 Fix undocumented encoding name changes (#338)
2026-03-09 docs: add 7.0.2 changelog entry
2026-03-09 Update benchmark numbers and fix subprocess import error
2026-03-10 Improve ISO-2022-JP family escape sequence detection
2026-03-10 docs: add design spec for switching to Python codec canonical names
2026-03-10 docs: add implementation plan for switching to Python codec canonical names
2026-03-10 fix: patch chardet.__version__ directly in test_utils
2026-03-10 fix: build mypyc wheel explicitly in compare_detectors --mypyc
2026-03-10 refactor: replace _LEGACY_NAMES with _COMPAT_NAMES (codec name keys)
2026-03-10 feat: add compat_names and prefer_superset parameters to detect()
2026-03-10 feat: add compat_names and prefer_superset to UniversalDetector
2026-03-10 refactor: rename apply_legacy_rename to apply_preferred_superset
2026-03-10 refactor: switch pipeline and equivalences to Python codec names
2026-03-10 test: update all assertions for Python codec name internals
2026-03-10 refactor: update registry EncodingName and train.py to use Python codec names
2026-03-10 refactor: remove redundant EncodingInfo.python_codec field
2026-03-11 refactor: clean up post-codec-name-switch and document new API parameters
2026-03-11 refactor: replace manual caches with functools.cache, consolidate normalization
2026-03-11 docs: update accuracy to 98.2%, use prefer_superset for accuracy eval
2026-03-11 test: replace old benchmarks with three-layer performance regression suite
2026-03-11 test: tune benchmark thresholds based on measured baselines
2026-03-11 docs: add -n auto to pytest commands, fix markdown spacing
2026-03-11 refactor: deduplicate code and derive inverse dicts from single sources
2026-03-11 fix: add chardet.universaldetector stub for 6.x backward compatibility
2026-03-11 feat: detect PEP 263 encoding declarations (# -*- coding: ... -*-)
2026-03-11 chore: remove docs/plans/ design and plan files
2026-03-11 docs: add threaded benchmark design spec
2026-03-11 docs: address spec review feedback for threaded benchmark
2026-03-11 docs: add threaded benchmark implementation plan
2026-03-11 feat: add --threads option to benchmark_time.py for concurrent detection
2026-03-11 feat: add --threads passthrough to compare_detectors.py
2026-03-11 fix: only include threads in timing cache keys, not memory cache keys
2026-03-11 fix: add --threads validation and docstring updates in compare_detectors.py
2026-03-11 Remove plans that got thrown in other directory
2026-03-11 docs: update thread scaling table with GIL vs free-threaded benchmarks
2026-03-11 fix: adjust benchmark speedup threshold for pure Python vs mypyc
2026-03-11 test: achieve 100% test coverage
2026-03-11 refactor: use pathlib.Path instead of str for filesystem paths in scripts
2026-03-11 perf: add early-exit check in PEP 263 detection for non-Python data
2026-03-11 docs: expand changelog with contributor attribution, PR links, and charade history