Thanks for the help so far!
Problem
I am trying to use MedCAT to annotate the main clinically relevant findings in pathology reports, using a large pretrained SNOMED International CDB in MedCATtrainer. My goal is not to annotate every SNOMED concept in the text; I am only interested in findings such as histological type, grade, ER/PR/HER2/Ki-67, margins, and lymphovascular invasion (highlighted in blue in the screenshot). Currently, each document is automatically pre-annotated with many concepts ("material", "size", "cells", "protocol", etc.) that are not relevant for my use case, yet I still need to manually mark each of them as "terminate" or "incorrect" before I can submit. This makes the annotation process quite slow. I realise my use case may differ from the intended use of MedCAT, but I am wondering if there is a better way.
I have a curated whitelist of ~200 clinically relevant CUIs per organ. The CUI File project filter restricts concept lookup but does not restrict automatic pre-annotation, so irrelevant concepts outside the whitelist are still automatically recognised in grey.
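To make the behaviour I am after concrete, here is a minimal sketch in plain Python. It assumes the pre-annotations arrive as a dictionary with an `entities` mapping whose values carry a `cui` key; that shape, the function name `filter_to_whitelist`, and the placeholder CUIs are all illustrative assumptions, not MedCATtrainer's actual internals.

```python
from typing import Any, Dict, Set


def filter_to_whitelist(doc: Dict[str, Any], whitelist: Set[str]) -> Dict[str, Any]:
    """Return a copy of `doc` keeping only entities whose CUI is whitelisted."""
    kept = {
        key: ent
        for key, ent in doc.get("entities", {}).items()
        if ent.get("cui") in whitelist
    }
    return {**doc, "entities": kept}


# Hypothetical placeholder CUIs, standing in for the ~200 whitelisted ones.
whitelist = {"CUI_GRADE", "CUI_MARGIN"}

# Hypothetical pre-annotation output for one report.
doc = {
    "entities": {
        0: {"cui": "CUI_GRADE", "source_value": "grade 2"},
        1: {"cui": "CUI_MATERIAL", "source_value": "material"},
        2: {"cui": "CUI_MARGIN", "source_value": "margins"},
    }
}

filtered = filter_to_whitelist(doc, whitelist)
print(sorted(filtered["entities"]))
```

In other words, only whitelisted concepts would be pre-annotated, while the full CDB would still be searchable when I add a concept by hand.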
Questions
- Is there a supported way to disable automatic NER pre-annotation entirely, while keeping the full CDB available for manual concept lookup?
- Alternatively, can I restrict automatic pre-annotation to only the CUIs in the project CUI File, while keeping the full CDB available for manual lookup when concepts are missing from the whitelist?
- Or would it be better to build a small CDB containing only the ~200 whitelisted concepts? If so, can I still manually search and annotate concepts outside that CDB using the CDB search filter, or does every concept need to be fully added, including CUI, name, and synonyms (which also seems inefficient)?
- Or is the best approach to continue with the full CDB and terminate unwanted concepts as negative training examples?