Skip to content

Back out "Re-apply D101260086: Android unified error reporting"#19201

Merged
kirklandsign merged 1 commit intomainfrom
android/backout-unified-error-reporting
Apr 29, 2026
Merged

Back out "Re-apply D101260086: Android unified error reporting"#19201
kirklandsign merged 1 commit intomainfrom
android/backout-unified-error-reporting

Conversation

@kirklandsign
Copy link
Copy Markdown
Contributor

Summary

Make internal repo in sync

Test plan

CI

The original backout D102493838 only reverted fbcode/executorch mirror
files but missed xplat/executorch source files, leaving GitHub out of
sync. This applies the same backout to the xplat source.
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Apr 29, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19201

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 8bd0329 with merge base c4ec988 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 29, 2026
@kirklandsign kirklandsign marked this pull request as ready for review April 29, 2026 00:47
@kirklandsign kirklandsign requested a review from Gasoonjia as a code owner April 29, 2026 00:47
Copilot AI review requested due to automatic review settings April 29, 2026 00:47
@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR backs out the prior “Android unified error reporting” changes to resync with an internal repository, shifting several Android APIs away from exception-based reporting and updating JNI method bindings accordingly.

Changes:

  • Switch Android Module.loadMethod() and several LLM APIs to return integer status codes instead of throwing.
  • Rename/update JNI registered method names (e.g., *Native → direct names) and adjust corresponding Java declarations.
  • Update Android instrumentation tests and minibench runners to use status codes.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/ModelRunner.java Uses status code returned by Module.loadMethod() for benchmark reporting.
extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/LlmModelRunner.java Removes exception handling around LLM load/generate and forwards returned status.
extension/android/jni/jni_layer_llama.cpp Changes LLM JNI load error handling and renames registered native method names.
extension/android/jni/jni_layer.cpp Renames registered method names and changes EValue input type handling in execute path.
extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java Converts generate/load/resetContext APIs to native/status-based flow; removes destroyed-state guards; changes prefill error handling.
extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmCallback.java Makes default onError implementation a no-op.
extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java Changes method signatures and replaces some wrapper methods with direct native methods.
extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecutorchRuntimeException.java Adjusts error-code constants/mappings and message formatting.
extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleInstrumentationTest.kt Updates tests to assert status codes instead of exceptions (with one expected-exception mismatch).
extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt Updates tests to check load() return status.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

private const val TEST_FILE_NAME = "/stories.pte"
private const val TOKENIZER_FILE_NAME = "/tokenizer.bin"
private const val TEST_PROMPT = "Hello"
private const val OK = 0x00
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test introduces a local OK constant (0x00). To keep tests aligned with the runtime’s source of truth and avoid drift, prefer using ExecutorchRuntimeException.OK (or a shared constant) instead of duplicating the numeric value here.

Suggested change
private const val OK = 0x00

Copilot uses AI. Check for mistakes.
@@ -385,12 +385,6 @@ class ExecuTorchJni : public facebook::jni::HybridClass<ExecuTorchJni> {
static const auto toBoolMethod =
JEValue::javaClassStatic()->getMethod<jboolean()>("toBool");
evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue)));
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the EValue input conversion loop, unsupported type codes are now silently ignored (the previous branch that threw InvalidArgument was removed). This can result in executing the method with missing/shifted inputs, producing incorrect results or confusing downstream errors. Restore explicit handling for unknown type codes (e.g., throw an ExecuTorch exception / return empty result) so invalid inputs fail fast.

Suggested change
evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue)));
evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue)));
} else {
ET_LOG(
Error,
"Unsupported JEValue type code %d at input index %d.",
static_cast<int>(typeCode),
i);
return facebook::jni::JArrayClass<JEValue>::newArray(0);

Copilot uses AI. Check for mistakes.
} catch (Exception e) {
status = -1;
}
int status = mLlmModelRunner.mModule.load();
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleMessage() no longer catches exceptions around mModule.load(). With the current JNI implementation, load() can throw (e.g., on non-zero native load result), which would crash the Handler thread and prevent onModelLoaded(status) from being called. Either restore exception handling here or ensure LlmModule.load() is a pure status-returning API that never throws.

Suggested change
int status = mLlmModelRunner.mModule.load();
int status = -1;
try {
status = mLlmModelRunner.mModule.load();
} catch (RuntimeException e) {
// Preserve handler thread liveness and report load failure via status callback.
}

Copilot uses AI. Check for mistakes.
} finally {
mLock.unlock();
}
return readLogBufferNative();
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readLogBuffer() now forwards directly to readLogBufferNative() without any mHybridData.isValid() check. After destroy(), this risks invoking JNI on an invalid HybridData/native pointer (potential crash). Please add the same isValid guard used by execute/loadMethod (return an empty array or similar when destroyed) and consider also guarding with mLock if the native side isn’t thread-safe.

Suggested change
return readLogBufferNative();
mLock.lock();
try {
if (!mHybridData.isValid()) {
return new String[0];
}
return readLogBufferNative();
} finally {
mLock.unlock();
}

Copilot uses AI. Check for mistakes.
if (errorCode != 0) {
throw new ExecutorchRuntimeException(errorCode, "Failed to load method: " + methodName);
Log.e("ExecuTorch", "Attempt to use a destroyed module");
return 0x2; // InvalidState
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadMethod() returns a hard-coded 0x2 for the destroyed-module case. Since this class already depends on ExecutorchRuntimeException’s error-code constants (and the comment says “keep in sync with runtime/core/error.h”), prefer using a named constant (e.g., ExecutorchRuntimeException.INVALID_STATE) to avoid magic numbers drifting from native definitions.

Suggested change
return 0x2; // InvalidState
return ExecutorchRuntimeException.INVALID_STATE;

Copilot uses AI. Check for mistakes.
Comment on lines 187 to +188
@DoNotStrip
private native String[] getMethodsNative();
public native String[] getMethods();
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMethods() is now a direct native call with no mLock/mHybridData.isValid() guard. This breaks the class’s existing thread-safety + destroyed-object pattern (execute/loadMethod still lock and check isValid), and it can call into freed native state after destroy(). Consider restoring a Java wrapper that locks + checks isValid (returning an empty array on destroyed), and keep the native method private (e.g., getMethodsNative).

Copilot uses AI. Check for mistakes.
Comment on lines 226 to +228
@Experimental
public boolean etdump() {
mLock.lock();
try {
if (!mHybridData.isValid()) {
throw new IllegalStateException("Module has been destroyed");
}
return etdumpNative();
} finally {
mLock.unlock();
}
}

@DoNotStrip
private native boolean etdumpNative();
public native boolean etdump();
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etdump() is now a direct native method without an isValid check or locking. Calling this after destroy() (or concurrently with execute()) can race with native teardown and lead to crashes or corrupted dumps. Wrap the call in a Java method that acquires mLock and checks mHybridData.isValid(), returning false when destroyed.

Copilot uses AI. Check for mistakes.
Comment on lines 188 to 190
public void resetNative() {
if (mDestroyed) return;
mDestroyed = true;
mHybridData.resetNative();
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LlmModule removed the destroyed-state guard (mDestroyed/checkNotDestroyed), but resetNative() still frees the underlying HybridData. After resetNative(), subsequent calls into native methods like generate()/load()/resetContext() can invoke JNI on an invalid native pointer (potential crash). Consider restoring an isValid/destroyed guard (similar to Module/TrainingModule) to fail fast after resetNative().

Copilot uses AI. Check for mistakes.
Comment on lines 103 to 110
@Test(expected = RuntimeException::class)
@Throws(IOException::class)
fun testNonPteFile() {
val module = Module.load(getTestFilePath(NON_PTE_FILE_NAME))

module.loadMethod(FORWARD_METHOD)
val loadMethod = module.loadMethod(FORWARD_METHOD)
Assert.assertEquals(loadMethod.toLong(), INVALID_ARGUMENT.toLong())
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testNonPteFile is still annotated with @test(expected = RuntimeException::class), but the test body now asserts that loadMethod() returns INVALID_ARGUMENT rather than throwing. Unless Module.load() is expected to throw before reaching the assertions, this test will fail. Update the test to either assert the returned error code (and remove the expected-exception annotation) or keep exception-based assertions consistently.

Copilot uses AI. Check for mistakes.
System.err.println("ExecuTorch LLM error " + errorCode + ": " + message);
}
}
default void onError(int errorCode, String message) {}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LlmCallback.onError’s default implementation was changed from logging to a no-op. If callers don’t override onError, runtime failures will become silent, making it hard to diagnose issues in production/benchmarks. Consider keeping a minimal default log (or at least documenting that implementations should override onError) to preserve baseline observability.

Copilot uses AI. Check for mistakes.
@psiddh
Copy link
Copy Markdown
Contributor

psiddh commented Apr 29, 2026

Thanks @kirklandsign , Yes we need this to keep things in sync

@kirklandsign kirklandsign merged commit 38fa0ff into main Apr 29, 2026
180 of 184 checks passed
@kirklandsign kirklandsign deleted the android/backout-unified-error-reporting branch April 29, 2026 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants