Releases: moj-analytical-services/splink
v3.9.15
What's Changed
- Document first-time developer setup, add conda option by @zmbc in #2083
- fix links by @RobinL in #2097
- Add dirty reload for much faster updates by @RobinL in #2096
- Add documentation for spellchecker and spellcheck docs by @zslade in #2025
- Add graph definition to docs by @zslade in #1979
- Minor fixes to spellchecker by @zslade in #2113
- Changing args as kwargs by @jlb52 in #2116
- Update threshold_selection_tool.json by @aalexandersson in #2120
- Fix broken link by @samnlindsay in #2098
- added tf_minimum_u_value to as_dict method by @aymonwuolanne in #2122
- Fix a bug in conda script and make minor improvements to quickstart by @zmbc in #2125
- Fix documentation Github Action for forks by @zmbc in #2126
- Add better check for whether conda is already installed by @zmbc in #2130
- Update PULL_REQUEST_TEMPLATE.md with spellchecker tick box by @zslade in #2128
- Clusters topic guide by @zslade in #1883
- Splink blog March 2024: Splink 3 update and Splink 4 development announcement by @RobinL in #2081
- Fix link to linter by @RobinL in #2121
- add probabilistic section to graphs definitions by @RossKen in #2137
- Update PULL_REQUEST_TEMPLATE.md by @zslade in #2138
- Minor bug in filtering predict table by @samnlindsay in #2152
- Update documentation on settings validation in response to code changes by @ThomasHepworth in #2149
- Remove reference to github action that will not come to be by @zslade in #2163
- Fixing spurious error messages with Databricks enable_splink by @aymonwuolanne in #2159
- Fix Splink 4 blog post link by @probjects in #2172
- Make spellcheck work cross-platform by @zmbc in #2131
- add marie curie by @RobinL in #2201
- Fix bug giving warning messages in term_frequencies.py by @DavidFrenchSG in #2204
- Fix lint by @RobinL in #2205
- Improve performance of SQL generation by using deepcopy less by @RobinL in #2212
- 3.9.15 release by @RobinL in #2213
New Contributors
- @zmbc made their first contribution in #2083
- @jlb52 made their first contribution in #2116
- @aalexandersson made their first contribution in #2120
- @probjects made their first contribution in #2172
- @DavidFrenchSG made their first contribution in #2204
Full Changelog: v3.9.14...v3.9.15
v4.0.0.dev6
What's Changed
Full Changelog: v4.0.0.dev5...v4.0.0.dev6
v4.0.0.dev5
v4.0.0.dev4
What's Changed
- Simple extension to term frequency adjustments for inexact matches by @samkodes in #2020
- Update bug report template by @ADBond in #2073
- update colab links by @RobinL in #2080
- Fix mkdocs rendering symbols in notebook code by @ADBond in #2033
- Enqueue and compute methods by @RobinL in #2086
- rm deprecated action and bash scripts by @ThomasHepworth in #2094
- Fix sqlglot>=23.0.0 issue by @RobinL in #2079
- 3.9.14 release by @RobinL in #2095
- Document first-time developer setup, add conda option by @zmbc in #2083
- fix links by @RobinL in #2097
- Add dirty reload for much faster updates by @RobinL in #2096
- Remove `_pipeline` from linker and refactor CTE pipeline by @RobinL in #2069
- Splink 4 blocking rule/blocking rule creator fixes by @RobinL in #2103
- remove deprecated and outdated code by @RobinL in #2107
- Further br fixes by @RobinL in #2106
- Fix find matches input column by @RobinL in #2109
- tf_logic_simplify by @RobinL in #2110
- Add documentation for spellchecker and spellcheck docs by @zslade in #2025
- Add graph definition to docs by @zslade in #1979
- Minor fixes to spellchecker by @zslade in #2113
- Changing args as kwargs by @jlb52 in #2116
- Update threshold_selection_tool.json by @aalexandersson in #2120
- Fix broken link by @samnlindsay in #2098
- added tf_minimum_u_value to as_dict method by @aymonwuolanne in #2122
- Stricter mypy checks by @ADBond in #2108
- Merge 3 4 2123 by @RobinL in #2124
- Fix a bug in conda script and make minor improvements to quickstart by @zmbc in #2125
- Refactor and simplify how TF adjustments are made in `_find_new_matches_mode` and `_compare_two_records_mode` by @RobinL in #2111
- Faster tests: Split out tests into separate backends and use altair 5.3.0 by @RobinL in #2117
- Fix documentation Github Action for forks by @zmbc in #2126
- Add better check for whether conda is already installed by @zmbc in #2130
- Restore Settings Validation (Splink 4) by @ADBond in #2127
- Update PULL_REQUEST_TEMPLATE.md with spellchecker tick box by @zslade in #2128
- Clusters topic guide by @zslade in #1883
- Splink blog March 2024: Splink 3 update and Splink 4 development announcement by @RobinL in #2081
- Merge/splink 3 to 4 by @RobinL in #2134
- Fix link to linter by @RobinL in #2121
- add probabilistic section to graphs definitions by @RossKen in #2137
- Update PULL_REQUEST_TEMPLATE.md by @zslade in #2138
- Remove flags from `block_using_rules_sqls` logic (`_find_new_matches_mode` and `_compare_two_records_mode` etc.) by @RobinL in #2129
- Merge/splink 3 to 4 by @RobinL in #2148
- Process input tables simplification by @RobinL in #2143
- Type decorator by @ADBond in #2151
- Allow df_concat to be created without a linker by @RobinL in #2144
- Specify generic types by @ADBond in #2153
- switch to ruff by @RobinL in #2156
- Mark spark tests by @ADBond in #2161
- Fix bugs in calculations for true negatives when using accuracy `_from_column` functions by @RobinL in #2150
- Move missingness chart out of linker and move profile_columns to splink.exploratory by @RobinL in #2157
- Test pythons > 3.9 in CI by @ADBond in #2164
- Adding type-hints, part 1 by @ADBond in #2169
- More type hints - remaining incomplete definitions by @ADBond in #2171
- Estimate u - default value warning by @ADBond in #2181
- Refactor blocking to not need linker by @RobinL in #2180
New Contributors
- @samkodes made their first contribution in #2020
- @jlb52 made their first contribution in #2116
- @aalexandersson made their first contribution in #2120
Full Changelog: v4.0.0.dev3...v4.0.0.dev4
v3.9.14
What's Changed
- Update u probability formula and example in fellegi_sunter.md by @jacuna88 in #2036
- Splink 3: Increment minimum python version from 3.7 to 3.8 by @RobinL in #2031
- Make graph metrics public by @zslade in #2027
- Add PUDL to list of use cases by @zaneselvans in #2044
- Threshold selection tool by @samnlindsay in #2003
- Simple extension to term frequency adjustments for inexact matches by @samkodes in #2020
- Update bug report template by @ADBond in #2073
- Fix mkdocs rendering symbols in notebook code by @ADBond in #2033
- rm deprecated action and bash scripts by @ThomasHepworth in #2094
- Fix sqlglot>=23.0.0 issue by @RobinL in #2079
- 3.9.14 release by @RobinL in #2095
New Contributors
- @jacuna88 made their first contribution in #2036
- @zaneselvans made their first contribution in #2044
- @samkodes made their first contribution in #2020
Full Changelog: v3.9.13...v3.9.14
v4.0.0.dev3
update release workflow
v3.9.13
What's Changed
- Mkdocs preprocess hooks by @ADBond in #1913
- Docs workflow - build and check links on PRs by @ADBond in #1915
- minor homepage tweaks by @RossKen in #1919
- Model evaluation guide by @RossKen in #1916
- convert accuracy metrics to float by @ThomasHepworth in #1893
- Use CASE instead of bool to float casting in truth_space_table by @cinnq346 in #1928
- add NICD x Gateshead use case by @RossKen in #1931
- Update venv to use a custom name and edit errors by @ThomasHepworth in #1918
- Add comparison level validation check by @ThomasHepworth in #1926
- Update load settings and make it the defacto load logic by @ThomasHepworth in #1921
- Cast row_count as float8 in truth_table by @cinnq346 in #1936
- Trim documentation dependencies by @ADBond in #1917
- Fix docs build by @ADBond in #1953
- Implement `is_bridge` edge metric by @ADBond in #1894
- add parameter to anonymise waterfall chart by @RossKen in #1938
- Clarify naming of hide_details on waterfall chart by @RobinL in #1963
- (Try to) fix css styling for the summary/details tags in .vega-embed by @RobinL in #1966
- Accuracy chart - altair bug by @RossKen in #1965
- use .sql not .execute by @RobinL in #1952
- CI - update splink4 'to-merge' branch by @ADBond in #1984
- sqlglot.parse_one - use read keyword argument by @ADBond in #1996
- Edge evaluation guide by @RossKen in #1927
- Adding support for DBR 13.x and 14.x by @boobay in #1973
- SplinkDataFrame metadata in clustering + metrics by @ADBond in #1981
- Refine additional installs in the readme by @ThomasHepworth in #2007
- `compute_graph_metrics` - compute what we can without `igraph` by @ADBond in #1982
- Add a section on dependency management within Splink by @ThomasHepworth in #1985
- Spell check single files by @ThomasHepworth in #2000
- Change file name to reflect graph naming conventions by @zslade in #2015
- Relax Splink 3 Dependency Requirements - demonstrate all tests pass with latest sqlglot by @RobinL in #1998
- Fix test failures in duckdb 0.10.0 by @RobinL in #1999
- v3.9.13 release by @RobinL in #2024
Full Changelog: v3.9.12...v3.9.13
v3.9.12
What's Changed
- Update mkdocs.yml by @RossKen in #1858
- Add support for SaltedBlockingRule for EM training (again) by @RobinL in #1853
- Update performance.md by @DanielOX in #1865
- add initial usecases to homepage by @RossKen in #1864
- fix edit link by @RossKen in #1866
- Minor correction to docstring by @zslade in #1867
- Fixes #1872 Update deduplicate_1k_synthetic.ipynb to fix spark error by @w2o-hbrashear in #1873
- Document duckdb parallelism by @RobinL in #1877
- Ethics Blog & blog docs by @RossKen in #1849
- Initial evaluation topic guide by @RossKen in #1876
- Update 2024-01-25-ethics.md by @RossKen in #1879
- add datafirst datasets to use cases by @RossKen in #1880
- Minor tweaks to sampling by cluster size by @zslade in #1829
- fix broken link by @RossKen in #1900
- Update sampling logic for density by @zslade in #1831
- return data class instead of dictionary by @zslade in #1887
- CI link-checking + fixed links by @ADBond in #1902
- SQLAlchemy 1.x and 2.x compatibility: Use explicit transactions, remove sqlalchemy version constraint by @RobinL in #1908
- Type hinting and variable renaming (mypy conformance stage 1) by @ADBond in #1780
- 3.9.12 Release by @RobinL in #1911
New Contributors
- @DanielOX made their first contribution in #1865
- @w2o-hbrashear made their first contribution in #1873
Full Changelog: v3.9.11...v3.9.12
v3.9.11
What's Changed
- Faster duckdb train u by @RobinL in #1800
- remove reference to deleted token by @RossKen in #1802
- material by mkdocs upgrade by @RossKen in #1803
- authors yml format fix by @RossKen in #1804
- Fix broken links by @RossKen in #1805
- Cluster studio sample by density by @zslade in #1754
- Cluster metrics - node degree + cluster centralisation by @ADBond in #1806
- Parallelise duckdb resulting in e.g. 2-4x speedup on 6 core machine by @RobinL in #1796
- Remove brittleness of convergence test by @RobinL in #1798
- Enable salting for EM training by @RobinL in #1832
- Fix linting workflows by @ADBond in #1836
- Refactor of 1664: add ability to do efficient blocking based on list/array intersections by @RobinL in #1692
- changelog: note add ability to block on array columns to by @RobinL in #1847
- 3.9.11 release by @RobinL in #1848
Full Changelog: v3.9.10...v3.9.11
v3.9.10
What's Changed
- Bump aiohttp from 3.8.5 to 3.8.6 by @dependabot in #1741
- Bump urllib3 from 1.26.16 to 1.26.18 by @dependabot in #1656
- BlockingRule: Refactor to enable better iteration by @RobinL in #1701
- Fix issue with `_source_dataset_col` and `_source_dataset_input_column` by @RobinL in #1731
- [MAINT] Improve speed of tests by @RobinL in #1736
- When you go to open an issue, add a link to discussions below our issue templates by @RobinL in #1746
- Finds blocking rules which return a comparison count below a given threshold by @RobinL in #1665
- Compute the cost of combinations of blocking rules by @RobinL in #1667
- Fix docs build by @ADBond in #1748
- [BUG] Delete cached tables before resetting the cache by @ThomasHepworth in #1752
- Automatically detect blocking rules for prediction and blocking rules for EM training by @RobinL in #1668
- added argument for register_udfs_automatically by @JonathanLaidler in #1774
- Make notebook tests run faster by @RobinL in #1772
- Improve speed of link only sample test by @RobinL in #1773
- Remove unused code and improve the Athena Linker by @ThomasHepworth in #1775
- Fixes to _compute_cluster_metrics by @zslade in #1763
- Add Mypy setup to `pyproject.toml` by @ADBond in #1779
- Introduce a `ColumnTreeBuilder` to aid in the construction of our column ASTs by @ThomasHepworth in #1757
- [MAINT] Revamp the settings validation steps by @ThomasHepworth in #1764
- Fix brl comp test by @ADBond in #1784
- add cs awards to readme by @RossKen in #1792
- v3.9.10 by @RossKen in #1790
- Blog - Dec 2023 by @RossKen in #1791
New Contributors
- @dependabot made their first contribution in #1741
- @JonathanLaidler made their first contribution in #1774
Full Changelog: v3.9.9...v3.9.10