
fix: correct suffix boundary lookup for prefixed last names (#100) #179

Merged
derek73 merged 3 commits into master from fix/prefix-suffix-lookup-issue-100 on Jun 29, 2026

derek73 (Owner) commented Jun 29, 2026

Summary

  • Fixes #100 (Strange parsing of name w lastname prefix and title before and after): "dr Vincent van Gogh dr" produced a corrupted middle name (" dr Vincent van") because pieces.index(stop_at) searched from position 0 and matched the leading "dr" (a title that is also a suffix acronym) instead of the trailing one
  • One-line fix: passes i + 1 as the start argument, i.e. pieces.index(stop_at, i + 1), which makes it consistent with the sibling next_prefix lookup just above it (that lookup was already correct)
  • Adds a regression guard for #108 (MemoryError for a name with a lot of prefixes): many repeated prefixes must not exhaust memory. That blow-up was already fixed by a prior refactor; the test ensures it cannot silently come back
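The root cause comes down to how Python's list.index() works: with no start argument it returns the first match counting from position 0, even when the value was originally found by filtering a later slice. A minimal sketch on a plain token list follows; the SUFFIXES set is a hypothetical stand-in for the parser's is_suffix check, and the prefix position is looked up directly rather than by the real prefix-joining loop.

```python
# Hypothetical stand-in for the parser's is_suffix check.
SUFFIXES = {"dr"}

pieces = ["dr", "Vincent", "van", "Gogh", "dr"]
i = pieces.index("van")  # position of the prefix being joined: 2

# The stop token is found by scanning only the pieces *after* the prefix...
stop_at = next(p for p in pieces[i + 1:] if p in SUFFIXES)

# ...but index() with no start argument searches from position 0 again.
buggy_j = pieces.index(stop_at)         # 0: the leading title "dr"
fixed_j = pieces.index(stop_at, i + 1)  # 4: the trailing suffix "dr"

print(buggy_j, fixed_j)  # 0 4
```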

Test Plan

  • test_title_before_and_after_prefixed_last_name: asserts the agreed output for #100 (title="dr", first="Vincent", middle="", last="van Gogh", suffix="dr")
  • test_many_repeated_prefixes_does_not_blow_up: parses "Jan van der … Berg" (30× prefix) without hanging or raising
  • Full suite: 821 passed, 22 xfailed, 0 failed (up from 817 baseline)
  • No regressions in test_prefix_is_first_name (Van Johnson), test_portuguese_prefixes, test_portuguese_dos, or test_prefix_before_two_part_last_name_with_acronym_suffix
  • mypy and ruff clean

Out of scope

Issues #121 and #132 were evaluated and excluded: they are irreducible ambiguities that collide with real names, not corruption bugs.

🤖 Generated with Claude Code

derek73 and others added 3 commits June 29, 2026 13:40
The prefix-joining loop located the suffix stop boundary with a
value-based pieces.index() that searched from position 0. When a token
value repeated (a trailing title that is also a suffix acronym, e.g.
the second 'dr' in 'dr Vincent van Gogh dr'), it matched the leading
occurrence, producing an empty slice that duplicated pieces and
corrupted the middle name. Constrain the lookup to start at i + 1,
consistent with the sibling next_prefix lookup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
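The empty-slice corruption described in this commit can be reproduced in isolation. The sketch below assumes a simplified, hypothetical join_prefix helper (not nameparser's actual code) that joins pieces[i:j] up to the next suffix token, with SUFFIXES again standing in for is_suffix.

```python
SUFFIXES = {"dr"}  # hypothetical stand-in for is_suffix


def join_prefix(pieces: list[str], i: int, *, fixed: bool) -> list[str]:
    """Join pieces[i:j] into one piece, where j is the next suffix after i."""
    stop_at = next(p for p in pieces[i + 1:] if p in SUFFIXES)
    # Pre-fix lookup searches from 0; the fix starts the search at i + 1.
    j = pieces.index(stop_at, i + 1) if fixed else pieces.index(stop_at)
    new_piece = " ".join(pieces[i:j])  # "" whenever j <= i
    return pieces[:i] + [new_piece] + pieces[j:]


tokens = ["dr", "Vincent", "van", "Gogh", "dr"]
buggy_pieces = join_prefix(tokens, 2, fixed=False)
fixed_pieces = join_prefix(tokens, 2, fixed=True)
print(buggy_pieces)  # ['dr', 'Vincent', '', 'dr', 'Vincent', 'van', 'Gogh', 'dr']
print(fixed_pieces)  # ['dr', 'Vincent', 'van Gogh', 'dr']

# Taking title, first, last and suffix from the two ends of the buggy list
# leaves this string in between:
print(repr(" ".join(buggy_pieces[2:-2])))  # ' dr Vincent van'
```

With j = 0 the slice pieces[i:j] is empty, so the splice inserts an empty string and then re-appends the whole list from position 0. In this toy, whatever remains between the first name and the last name is the same " dr Vincent van" string reported as the middle name in #100.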
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix inline comment in join_on_conjunctions: clarify that filter()
  finds the value in pieces[i+1:] but index() searches from 0 by
  default, and drop the misleading "title" framing (the token only
  needs to satisfy is_suffix, not is_title)
- Add test for two-word prefix collision ("van der"), which takes a
  different number of loop iterations than the single-word case
- Add test with a genuine middle name alongside the repeated token,
  since the pre-fix bug corrupted the middle field specifically
- Add @pytest.mark.timeout(2) to the #108 guard so the timeout is
  enforced locally and in CI, not just by CI job limits
- Assert hn.last contains "Berg" in the #108 guard to catch silent
  last-name corruption
- Add pytest-timeout dev dependency
- Resolve pre-existing stash conflict in docs/resources.rst (keep upstream)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
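The extra cases added in this commit (a two-word prefix, and a genuine middle name next to the repeated token) can be illustrated with a toy field splitter built on the fixed lookup. Everything here is hypothetical: TITLES, SUFFIXES, PREFIXES, join_last_name and split_name are illustrative stand-ins rather than nameparser's API, and the two extra inputs are made up for illustration, not copied from the PR's tests. Only the first case's expected fields come from the agreed output for #100.

```python
TITLES = {"dr"}            # hypothetical stand-in for is_title
SUFFIXES = {"dr"}          # hypothetical stand-in for is_suffix
PREFIXES = {"van", "der"}  # hypothetical stand-in for is_prefix


def join_last_name(pieces: list[str]) -> list[str]:
    """Join the first prefix (not at position 0) up to the next suffix."""
    starts = [k for k, p in enumerate(pieces) if k > 0 and p in PREFIXES]
    if not starts:
        return pieces
    i = starts[0]
    stops = [p for p in pieces[i + 1:] if p in SUFFIXES]
    # Fixed lookup: search for the stop token only after the prefix.
    j = pieces.index(stops[0], i + 1) if stops else len(pieces)
    return pieces[:i] + [" ".join(pieces[i:j])] + pieces[j:]


def split_name(full: str) -> dict[str, str]:
    """Assign fields from a non-empty name string (toy rules only)."""
    pieces = join_last_name(full.split())
    title = pieces.pop(0) if len(pieces) > 1 and pieces[0] in TITLES else ""
    suffix = pieces.pop() if len(pieces) > 1 and pieces[-1] in SUFFIXES else ""
    return {
        "title": title,
        "first": pieces[0],
        "middle": " ".join(pieces[1:-1]),
        "last": pieces[-1] if len(pieces) > 1 else "",
        "suffix": suffix,
    }


for name in (
    "dr Vincent van Gogh dr",         # #100: agreed output
    "dr Vincent van der Gogh dr",     # hypothetical two-word prefix
    "dr Vincent Willem van Gogh dr",  # hypothetical genuine middle name
):
    print(split_name(name))
```

In this toy, starting the stop lookup at i + 1 means the leading title can never be chosen as the boundary, however many prefix tokens precede the suffix, so the middle field only receives tokens that sit between the first name and the joined last name.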
derek73 self-assigned this Jun 29, 2026
derek73 added this to the v1.3.0 milestone Jun 29, 2026
derek73 merged commit 8cb62a9 into master Jun 29, 2026
8 checks passed
derek73 deleted the fix/prefix-suffix-lookup-issue-100 branch June 29, 2026 21:04