fix: prevent VAD from driving user_state when turn_detection=stt #5582

Open
MdSadiqMd wants to merge 13 commits into livekit:main from MdSadiqMd:fix/prevent-vad-from-driving-userstate

Conversation

MdSadiqMd commented Apr 28, 2026

Closes #5580

Summary

Added a user_state_source option to TurnHandlingOptions with three modes: "vad", "stt", and "auto" (default). Implemented a _vad_drives_user_state property in AudioRecognition that encapsulates the decision logic. With user_state_source="stt", VAD still runs for interruption detection but no longer affects user_state, which avoids false positives from background noise. The change is fully backward compatible because "auto" remains the default.
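
The decision logic summarized above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: `TurnHandlingSketch`, `AudioRecognitionSketch`, the `has_vad` flag, the counters, and the handler names are hypothetical stand-ins. The handling of "auto" (VAD keeps writing user state whenever a VAD is configured) is an assumption based on the statement that "auto" keeps the existing behavior.

```python
from dataclasses import dataclass
from typing import Literal

UserStateSource = Literal["vad", "stt", "auto"]


@dataclass
class TurnHandlingSketch:
    """Hypothetical stand-in for TurnHandlingOptions (not the real class)."""

    user_state_source: UserStateSource = "auto"


@dataclass
class AudioRecognitionSketch:
    """Hypothetical stand-in for AudioRecognition (not the real class)."""

    options: TurnHandlingSketch
    has_vad: bool = True
    speaking: bool = False  # stand-in for the _speaking flag behind user_state
    interruption_checks: int = 0  # counts VAD events seen by interruption logic

    @property
    def vad_drives_user_state(self) -> bool:
        # Without a VAD there is nothing that could drive the state.
        if not self.has_vad:
            return False
        # "stt": VAD still runs, but must not write user state.
        if self.options.user_state_source == "stt":
            return False
        # "vad" and "auto": assumed here to let VAD write user state.
        return True

    def on_vad_start_of_speech(self) -> None:
        # VAD output always reaches interruption detection.
        self.interruption_checks += 1
        # Only write the speaking flag when VAD is the chosen source.
        if self.vad_drives_user_state:
            self.speaking = True

    def on_stt_start_of_speech(self) -> None:
        # Assumption: STT speech events take over when VAD is not the source.
        if not self.vad_drives_user_state:
            self.speaking = True
```

With `TurnHandlingSketch("stt")`, a VAD start-of-speech event increments `interruption_checks` but leaves `speaking` unchanged until an STT event arrives; with the default options, the VAD event sets `speaking` directly.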

CLAassistant commented Apr 28, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration bot left a comment
Contributor

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

@miguelmoralai

I think you should implement Option B described in the issue for a more reliable solution:

Option B — explicit configuration. Add a user_state_source: Literal["vad", "stt", "auto"] field to TurnHandlingOptions (or AgentSession). "auto" keeps current behavior. "stt" makes the VAD branch skip the _speaking writes while still running VAD inference for interruption detection. "vad" is today's default

@MdSadiqMd

Author

I thought Option A might be a good fit, since it just acts as a fix for the existing system. After thinking it over, Option B makes more sense for the long term, so I'm making the change now.

miguelmoralai commented Apr 28, 2026

Yep, but Option A (the one you implemented) has trade-offs. Selecting vad/stt as turn_detection does not mean you also want user_state handled through the same method. IMO that is a reasonable-looking but still wrong assumption. Ideally the user should be able to select both.
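
The trade-off described here can be illustrated with a small sketch. Both functions are hypothetical and only model one question: does VAD write user_state? The Option A rule (derive the answer from turn_detection) is inferred from the PR title, "auto" is assumed to behave like "vad", and only the two turn_detection modes mentioned in this thread are modeled.

```python
from itertools import product
from typing import Literal

TurnDetection = Literal["vad", "stt"]
UserStateSource = Literal["vad", "stt", "auto"]


def vad_drives_option_a(turn_detection: TurnDetection) -> bool:
    # Option A (assumed rule): user_state follows the turn_detection choice.
    return turn_detection != "stt"


def vad_drives_option_b(
    turn_detection: TurnDetection, user_state_source: UserStateSource
) -> bool:
    # Option B: turn_detection is accepted but deliberately ignored, so the
    # two settings stay independent of each other.
    if user_state_source == "stt":
        return False
    return True  # "vad", and "auto" under the assumption stated above


def reachable_option_a() -> set[tuple[str, bool]]:
    # Every (turn_detection, vad_drives_user_state) pair Option A can produce.
    return {(td, vad_drives_option_a(td)) for td in ("vad", "stt")}


def reachable_option_b() -> set[tuple[str, bool]]:
    # Every (turn_detection, vad_drives_user_state) pair Option B can produce.
    return {
        (td, vad_drives_option_b(td, src))
        for td, src in product(("vad", "stt"), ("vad", "stt", "auto"))
    }
```

Under these assumptions, Option A can never produce VAD-based turn detection together with STT-driven user state, i.e. the pair `("vad", False)`, while Option B can produce all four pairs.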

devin-ai-integration[bot]

This comment was marked as resolved.

MdSadiqMd commented Apr 28, 2026
Author

@miguelmoralai, can you please verify the changes?

@miguelmoralai

@claude review

@MdSadiqMd

Author

Looks like Claude is not up.

cc: @miguelmoralai

@MdSadiqMd

Author

Bump @miguelmoralai

Development

Successfully merging this pull request may close these issues.

Allow decoupling user_state source from VAD when STT emits speech events

3 participants