Back out "Re-apply D101260086: Android unified error reporting"#19201
Back out "Re-apply D101260086: Android unified error reporting"#19201kirklandsign merged 1 commit intomainfrom
Conversation
The original backout D102493838 only reverted fbcode/executorch mirror files but missed xplat/executorch source files, leaving GitHub out of sync. This applies the same backout to the xplat source.
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19201
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ You can merge normally! (2 Unrelated Failures)As of commit 8bd0329 with merge base c4ec988 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
This PR backs out the prior “Android unified error reporting” changes to resync with an internal repository, shifting several Android APIs away from exception-based reporting and updating JNI method bindings accordingly.
Changes:
- Switch Android
Module.loadMethod()and several LLM APIs to return integer status codes instead of throwing. - Rename/update JNI registered method names (e.g.,
*Native→ direct names) and adjust corresponding Java declarations. - Update Android instrumentation tests and minibench runners to use status codes.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/ModelRunner.java | Uses status code returned by Module.loadMethod() for benchmark reporting. |
| extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/LlmModelRunner.java | Removes exception handling around LLM load/generate and forwards returned status. |
| extension/android/jni/jni_layer_llama.cpp | Changes LLM JNI load error handling and renames registered native method names. |
| extension/android/jni/jni_layer.cpp | Renames registered method names and changes EValue input type handling in execute path. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java | Converts generate/load/resetContext APIs to native/status-based flow; removes destroyed-state guards; changes prefill error handling. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmCallback.java | Makes default onError implementation a no-op. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java | Changes method signatures and replaces some wrapper methods with direct native methods. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecutorchRuntimeException.java | Adjusts error-code constants/mappings and message formatting. |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleInstrumentationTest.kt | Updates tests to assert status codes instead of exceptions (with one expected-exception mismatch). |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt | Updates tests to check load() return status. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private const val TEST_FILE_NAME = "/stories.pte" | ||
| private const val TOKENIZER_FILE_NAME = "/tokenizer.bin" | ||
| private const val TEST_PROMPT = "Hello" | ||
| private const val OK = 0x00 |
There was a problem hiding this comment.
This test introduces a local OK constant (0x00). To keep tests aligned with the runtime’s source of truth and avoid drift, prefer using ExecutorchRuntimeException.OK (or a shared constant) instead of duplicating the numeric value here.
| private const val OK = 0x00 |
| @@ -385,12 +385,6 @@ class ExecuTorchJni : public facebook::jni::HybridClass<ExecuTorchJni> { | |||
| static const auto toBoolMethod = | |||
| JEValue::javaClassStatic()->getMethod<jboolean()>("toBool"); | |||
| evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue))); | |||
There was a problem hiding this comment.
In the EValue input conversion loop, unsupported type codes are now silently ignored (the previous branch that threw InvalidArgument was removed). This can result in executing the method with missing/shifted inputs, producing incorrect results or confusing downstream errors. Restore explicit handling for unknown type codes (e.g., throw an ExecuTorch exception / return empty result) so invalid inputs fail fast.
| evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue))); | |
| evalues.emplace_back(static_cast<bool>(toBoolMethod(jevalue))); | |
| } else { | |
| ET_LOG( | |
| Error, | |
| "Unsupported JEValue type code %d at input index %d.", | |
| static_cast<int>(typeCode), | |
| i); | |
| return facebook::jni::JArrayClass<JEValue>::newArray(0); |
| } catch (Exception e) { | ||
| status = -1; | ||
| } | ||
| int status = mLlmModelRunner.mModule.load(); |
There was a problem hiding this comment.
handleMessage() no longer catches exceptions around mModule.load(). With the current JNI implementation, load() can throw (e.g., on non-zero native load result), which would crash the Handler thread and prevent onModelLoaded(status) from being called. Either restore exception handling here or ensure LlmModule.load() is a pure status-returning API that never throws.
| int status = mLlmModelRunner.mModule.load(); | |
| int status = -1; | |
| try { | |
| status = mLlmModelRunner.mModule.load(); | |
| } catch (RuntimeException e) { | |
| // Preserve handler thread liveness and report load failure via status callback. | |
| } |
| } finally { | ||
| mLock.unlock(); | ||
| } | ||
| return readLogBufferNative(); |
There was a problem hiding this comment.
readLogBuffer() now forwards directly to readLogBufferNative() without any mHybridData.isValid() check. After destroy(), this risks invoking JNI on an invalid HybridData/native pointer (potential crash). Please add the same isValid guard used by execute/loadMethod (return an empty array or similar when destroyed) and consider also guarding with mLock if the native side isn’t thread-safe.
| return readLogBufferNative(); | |
| mLock.lock(); | |
| try { | |
| if (!mHybridData.isValid()) { | |
| return new String[0]; | |
| } | |
| return readLogBufferNative(); | |
| } finally { | |
| mLock.unlock(); | |
| } |
| if (errorCode != 0) { | ||
| throw new ExecutorchRuntimeException(errorCode, "Failed to load method: " + methodName); | ||
| Log.e("ExecuTorch", "Attempt to use a destroyed module"); | ||
| return 0x2; // InvalidState |
There was a problem hiding this comment.
loadMethod() returns a hard-coded 0x2 for the destroyed-module case. Since this class already depends on ExecutorchRuntimeException’s error-code constants (and the comment says “keep in sync with runtime/core/error.h”), prefer using a named constant (e.g., ExecutorchRuntimeException.INVALID_STATE) to avoid magic numbers drifting from native definitions.
| return 0x2; // InvalidState | |
| return ExecutorchRuntimeException.INVALID_STATE; |
| @DoNotStrip | ||
| private native String[] getMethodsNative(); | ||
| public native String[] getMethods(); |
There was a problem hiding this comment.
getMethods() is now a direct native call with no mLock/mHybridData.isValid() guard. This breaks the class’s existing thread-safety + destroyed-object pattern (execute/loadMethod still lock and check isValid), and it can call into freed native state after destroy(). Consider restoring a Java wrapper that locks + checks isValid (returning an empty array on destroyed), and keep the native method private (e.g., getMethodsNative).
| @Experimental | ||
| public boolean etdump() { | ||
| mLock.lock(); | ||
| try { | ||
| if (!mHybridData.isValid()) { | ||
| throw new IllegalStateException("Module has been destroyed"); | ||
| } | ||
| return etdumpNative(); | ||
| } finally { | ||
| mLock.unlock(); | ||
| } | ||
| } | ||
|
|
||
| @DoNotStrip | ||
| private native boolean etdumpNative(); | ||
| public native boolean etdump(); |
There was a problem hiding this comment.
etdump() is now a direct native method without an isValid check or locking. Calling this after destroy() (or concurrently with execute()) can race with native teardown and lead to crashes or corrupted dumps. Wrap the call in a Java method that acquires mLock and checks mHybridData.isValid(), returning false when destroyed.
| public void resetNative() { | ||
| if (mDestroyed) return; | ||
| mDestroyed = true; | ||
| mHybridData.resetNative(); | ||
| } |
There was a problem hiding this comment.
LlmModule removed the destroyed-state guard (mDestroyed/checkNotDestroyed), but resetNative() still frees the underlying HybridData. After resetNative(), subsequent calls into native methods like generate()/load()/resetContext() can invoke JNI on an invalid native pointer (potential crash). Consider restoring an isValid/destroyed guard (similar to Module/TrainingModule) to fail fast after resetNative().
| @Test(expected = RuntimeException::class) | ||
| @Throws(IOException::class) | ||
| fun testNonPteFile() { | ||
| val module = Module.load(getTestFilePath(NON_PTE_FILE_NAME)) | ||
|
|
||
| module.loadMethod(FORWARD_METHOD) | ||
| val loadMethod = module.loadMethod(FORWARD_METHOD) | ||
| Assert.assertEquals(loadMethod.toLong(), INVALID_ARGUMENT.toLong()) | ||
| } |
There was a problem hiding this comment.
testNonPteFile is still annotated with @test(expected = RuntimeException::class), but the test body now asserts that loadMethod() returns INVALID_ARGUMENT rather than throwing. Unless Module.load() is expected to throw before reaching the assertions, this test will fail. Update the test to either assert the returned error code (and remove the expected-exception annotation) or keep exception-based assertions consistently.
| System.err.println("ExecuTorch LLM error " + errorCode + ": " + message); | ||
| } | ||
| } | ||
| default void onError(int errorCode, String message) {} |
There was a problem hiding this comment.
LlmCallback.onError’s default implementation was changed from logging to a no-op. If callers don’t override onError, runtime failures will become silent, making it hard to diagnose issues in production/benchmarks. Consider keeping a minimal default log (or at least documenting that implementations should override onError) to preserve baseline observability.
|
Thanks @kirklandsign , Yes we need this to keep things in sync |
Summary
Make internal repo in sync
Test plan
CI