bug(medcat): CU-869djf7qd Fix trainer detected name preprocessing by mart-r · Pull Request #527 · CogStack/cogstack-nlp

mart-r · 2026-06-04T09:18:37Z

This PR fixes the preprocessing of names during supervised training.

The issue was that, for names containing tokens that would normally be skipped (e.g. new lines), those tokens were not skipped during the prepare_name call. That's because these skips are annotated by the tagger, not the tokenizer.
For instance, in an example from Adam, the annotation for:

electrolytes
were monitored

would previously be processed into:

electrolytes~\n~were~monitored

This PR uses the Pipeline.tokenizer_with_tag property instead of the plain .tokenizer. The former is a tokenizer that also runs the tagger component, so the skip annotations are already in place when the name is prepared.

With this change, the same name is correctly processed to:

electrolytes~were~monitored
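To make the difference concrete, here is a minimal, self-contained sketch. It does not use MedCAT at all: `Token`, `toy_tokenizer`, `toy_tagger`, `ToyPipeline` and this `prepare_name` are hypothetical stand-ins that only mimic the behaviour described above, assuming that the tagger (not the tokenizer) flags tokens to skip and that name tokens are lowercased and joined with `~`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    text: str
    to_skip: bool = False  # hypothetical flag; only the tagger sets it in this sketch


def toy_tokenizer(text: str) -> List[Token]:
    # Split into words and runs of new lines; plain spaces are dropped.
    # Note: no skip flags are set here.
    return [Token(t) for t in re.findall(r"\n+|[^\s]+", text)]


def toy_tagger(tokens: List[Token]) -> List[Token]:
    # Mark whitespace-only tokens (e.g. new lines) as tokens to skip.
    for tok in tokens:
        if tok.text.isspace():
            tok.to_skip = True
    return tokens


class ToyPipeline:
    @property
    def tokenizer(self) -> Callable[[str], List[Token]]:
        # Tokenizer only: skip flags are never populated.
        return toy_tokenizer

    @property
    def tokenizer_with_tag(self) -> Callable[[str], List[Token]]:
        # Tokenize, then run the tagger so skip flags are populated.
        return lambda text: toy_tagger(toy_tokenizer(text))


def prepare_name(raw_name: str, tokenize: Callable[[str], List[Token]],
                 sep: str = "~") -> str:
    # Join the lowercased, non-skipped tokens with the separator.
    tokens = tokenize(raw_name)
    return sep.join(t.text.lower() for t in tokens if not t.to_skip)


if __name__ == "__main__":
    pipe = ToyPipeline()
    name = "electrolytes\nwere monitored"
    print(repr(prepare_name(name, pipe.tokenizer)))           # 'electrolytes~\n~were~monitored'
    print(repr(prepare_name(name, pipe.tokenizer_with_tag)))  # 'electrolytes~were~monitored'
```

The only difference between the two calls is which callable produces the tokens: because the skip flag is set by the tagging step, a tokenizer that never runs it lets the new line leak into the prepared name.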

The PR also adds a few tests covering this behaviour.

adam-sutton-1992

Yeah nice, lucky to find that issue.

* CU-869djf7qd: Fix issue with name preparation during supervised training
* CU-869djf7qd: Add test for prepare_name call with tagger
* CU-869djf7qd: Moved name preparation to its own method in trainer
* CU-869djf7qd: Add more specific / targetted test for name trainer name processor

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-actions Bot added 4 commits June 4, 2026 09:57

CU-869djf7qd: Fix issue with name preparation during supervised training

5305de4

CU-869djf7qd: Add test for prepare_name call with tagger

dacb3a2

CU-869djf7qd: Moved name preparation to its own method in trainer

6a0d82e

CU-869djf7qd: Add more specific / targetted test for name trainer name processor

fdea519

adam-sutton-1992 approved these changes Jun 4, 2026

mart-r merged commit 5563e3b into main Jun 4, 2026
25 of 29 checks passed

mart-r deleted the bug/medcat/CU-869djf7qd-fix-trainer-detected-name-preprocessing branch June 4, 2026 09:57

mart-r merged 4 commits into main from bug/medcat/CU-869djf7qd-fix-trainer-detected-name-preprocessing