Add Tensor.copyDataInto to Java API #19171

Open

jgibson2 wants to merge 2 commits into pytorch:main from PolyCam:android-tensor-copyDataInto

Conversation

Contributor

@jgibson2 jgibson2 commented Apr 27, 2026

Summary

Adds Tensor.copyDataInto(... dst) to the Android Java API, initially for the float32 and float16 dtypes. It copies the tensor's data into a caller-provided destination buffer instead of allocating a fresh float[] per call (as getDataAsFloatArray() does today). A second commit repeats the same pattern for the remaining dtypes.

Motivation

While profiling depth inference on Android with Perfetto, output extraction was a meaningful contributor to ART GC pressure. Each call to output.toTensor().dataAsFloatArray allocates a new Java float[] sized to the tensor's element count and bulk-copies from the underlying off-heap buffer into it.

The native side already exposes the underlying FloatBuffer directly (zero-copy view of the C++ tensor's data_ptr()), so the only thing missing was a public way for callers to drain it into a destination buffer they already own and reuse across calls.

API

public void copyDataInto(FloatBuffer dst)
  • Implemented on all datatypes

Caller-side usage example

// One-time setup
FloatBuffer depthBuf = Tensor.allocateFloatBuffer(numelDepth);

// Per inference
EValue[] outputs = module.forward(...);
depthBuf.rewind();
outputs[0].toTensor().copyDataInto(depthBuf);   // no allocation
// ... read from depthBuf ...

Test plan

  • Added unit tests in TensorTest.kt:
    • testCopyDataIntoFloat32 — round-trip with reuse across two calls
    • testCopyDataIntoFloat32_writesAtDstPosition — verifies the call writes at dst.position() and advances it (does not overwrite from index 0)
    • testCopyDataIntoFloat32_overflowBufferOverflowException on undersized destination
    • testCopyDataIntoFloat16 — verifies fp16→fp32 widening matches getDataAsFloatArray
    • testCopyDataIntoFloat_unsupportedDtypeIllegalStateException from base default for non-float dtypes

This PR was authored with Claude.

Add Tensor.copyDataInto(FloatBuffer) for allocation-free output reads

The existing Tensor.getDataAsFloatArray() allocates a fresh float[] on every
call and copies from the underlying off-heap buffer into it. In a
steady-state inference loop this is a per-frame allocation proportional to
the output tensor size — for a 144x192 single-channel depth output that's
~110 KB per call, multiplied by however many output tensors a model
returns. Profiling on Android (Perfetto) showed this driving substantial
ART GC pressure that ended up stealing CPU back from the inference thread
itself via run-queue contention.

The off-heap FloatBuffer that backs the output Tensor is already owned by
the native side and exposed internally via the package-private
getRawDataBuffer(). copyDataInto(FloatBuffer dst) lets callers reuse a
pre-allocated destination across calls, eliminating the per-call float[]
allocation while keeping ownership unambiguous (the destination is
caller-owned, so it's safe to hand off to async consumers without racing
against the next forward() overwriting native memory).

Implemented for float32 (zero-copy bulk put) and float16 (per-element
half->float widening, mirroring getDataAsFloatArray). Other dtypes inherit
the base-class IllegalStateException default.
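For reference, the per-element fp16 widening described above can be sketched as follows. This is an illustrative standalone version, not ExecuTorch's actual implementation; halfToFloat and copyWidening are hypothetical names.

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class HalfWidenDemo {
    // Widen one IEEE 754 binary16 value (raw bits in a short) to a float.
    // Hypothetical helper; the real ExecuTorch conversion may differ.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits >>> 15) == 0 ? 1 : -1;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        if (exp == 0) {
            // Zero or subnormal half: value = +/- mant * 2^-24
            // (signed zero collapses to +0.0 in this simplified sketch)
            return sign * mant * 0x1p-24f;
        }
        if (exp == 0x1F) {
            // Infinity or NaN
            return mant == 0 ? sign * Float.POSITIVE_INFINITY : Float.NaN;
        }
        // Normal: rebias the exponent (15 -> 127), widen the mantissa 10 -> 23 bits
        int f = ((bits >>> 15) << 31) | ((exp + 112) << 23) | (mant << 13);
        return Float.intBitsToFloat(f);
    }

    // Element-by-element widening copy, mirroring the fp16 -> FloatBuffer path.
    static void copyWidening(ShortBuffer src, FloatBuffer dst) {
        src.rewind();
        while (src.hasRemaining()) {
            dst.put(halfToFloat(src.get()));
        }
    }

    public static void main(String[] args) {
        ShortBuffer src = ShortBuffer.wrap(new short[]{0x3C00, 0x4000, (short) 0xC000});
        FloatBuffer dst = FloatBuffer.allocate(3);
        copyWidening(src, dst);
        System.out.println(dst.get(0) + " " + dst.get(1) + " " + dst.get(2));
        // prints: 1.0 2.0 -2.0
    }
}
```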

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jgibson2 jgibson2 requested a review from kirklandsign as a code owner April 27, 2026 21:10

pytorch-bot Bot commented Apr 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19171

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit 068212e with merge base 79b799d:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 27, 2026
@jgibson2 jgibson2 changed the title Add Tensor.copyDataInto(FloatBuffer) for allocation-free output reads on Android Add Tensor.copyDataInto Apr 27, 2026
@jgibson2 jgibson2 changed the title Add Tensor.copyDataInto Add Tensor.copyDataInto to Java API Apr 27, 2026
@kirklandsign
Contributor

@claude review

@claude

claude Bot commented Apr 27, 2026

Claude finished @kirklandsign's task in 1m 17s


Code Review: Add Tensor.copyDataInto to Java API

  • Read changed files and gather context
  • Review Tensor.java changes
  • Review TensorTest.kt changes
  • Post review feedback

Overall: This is a clean, well-motivated PR. The API design is consistent with existing patterns, and test coverage is thorough. A few observations below, mostly minor.

What's good

  • Motivation is solid — eliminating per-frame float[] allocations in inference loops is a real win for GC pressure on Android.
  • API mirrors existing patterns — the copyDataInto overrides follow the same data.rewind(); dst.put(data) approach as getDataAs*Array(), just with a caller-provided destination.
  • int8/uint8 disambiguation via copyDataIntoUnsigned is the right call since both share ByteBuffer.
  • Test coverage is comprehensive — happy path, reuse, position-awareness, overflow, cross-dtype error messages, and signed/unsigned discrimination are all covered.
  • Float16 → FloatBuffer widening correctly mirrors the existing getDataAsFloatArray() logic for Tensor_float16.

Issues

  1. Thread safety of data.rewind() (pre-existing, not a blocker)
    Tensor.java:829 — Every copyDataInto override calls data.rewind() on the tensor's internal buffer before reading. This is the same pattern getDataAs*Array() already uses, so it's consistent. But worth knowing: java.nio.Buffer operations are not thread-safe. If two threads call copyDataInto (or any mix of copyDataInto/getDataAs*Array) on the same Tensor concurrently, they can corrupt each other's reads. Since the PR description mentions "steady-state inference loops" — if callers ever share a Tensor across threads, they'll need external synchronization. This is a pre-existing concern, not introduced here.
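The interleaving hazard described above can be reproduced deterministically in a single thread by simulating the second caller's rewind on a shared buffer. This uses plain java.nio as a stand-in for the Tensor's internal data buffer; no ExecuTorch code is involved.

```java
import java.nio.FloatBuffer;

public class SharedPositionDemo {
    public static void main(String[] args) {
        // One shared buffer, standing in for a Tensor's internal data buffer.
        FloatBuffer data = FloatBuffer.wrap(new float[]{10f, 20f, 30f, 40f});

        // "Thread A" starts its copy: rewind, then read the first element.
        data.rewind();
        float first = data.get();   // 10.0; position is now 1

        // "Thread B" interleaves and rewinds the same buffer for its own copy.
        data.rewind();              // position reset to 0 behind A's back

        // "Thread A" resumes, expecting element index 1 (20.0)...
        float second = data.get();  // ...but reads 10.0 again

        System.out.println(first + " " + second);
        // prints: 10.0 10.0  (A's copy is corrupted)
    }
}
```

This is why concurrent calls to copyDataInto or getDataAs*Array on the same Tensor need external synchronization.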

  2. Partial write on overflow for float16 copyDataInto(FloatBuffer)
    Tensor.java:893-898 — The element-by-element loop for fp16→fp32 widening means a BufferOverflowException can leave dst partially written and data's position somewhere in the middle. The bulk dst.put(data) used by other dtypes is atomic at the buffer level (it checks remaining capacity upfront before transferring). This is a minor inconsistency — in practice callers should size their buffers correctly — but you could guard with an upfront check:

    if (dst.remaining() < data.remaining()) {
        throw new BufferOverflowException();
    }

    Not a blocker, but would make the fp16 path behave identically to other dtypes on undersized buffers.
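A minimal standalone sketch of the suggested all-or-nothing behavior, using a trivial short-to-float cast as a stand-in for the real fp16 widening (copyAllOrNothing is a hypothetical name, not ExecuTorch API):

```java
import java.nio.BufferOverflowException;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class AtomicOverflowDemo {
    // Element-by-element copy with an upfront capacity check, so an undersized
    // destination fails before anything is written.
    static void copyAllOrNothing(ShortBuffer src, FloatBuffer dst) {
        src.rewind();
        if (dst.remaining() < src.remaining()) {
            throw new BufferOverflowException(); // fail fast; dst untouched
        }
        while (src.hasRemaining()) {
            dst.put((float) src.get()); // stand-in for half->float widening
        }
    }

    public static void main(String[] args) {
        ShortBuffer src = ShortBuffer.wrap(new short[]{1, 2, 3, 4});
        FloatBuffer small = FloatBuffer.allocate(2); // undersized on purpose
        try {
            copyAllOrNothing(src, small);
        } catch (BufferOverflowException e) {
            // Without the guard, two elements would already be in `small`.
            System.out.println("overflow, dst.position() = " + small.position());
            // prints: overflow, dst.position() = 0
        }
    }
}
```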

Nits

  • The Javadoc on the FloatBuffer base method (lines 557–576) is quite detailed compared to the other copyDataInto overloads. This makes sense since it's the one most callers will reach for; just noting that the asymmetry appears intentional.
  • testCopyDataIntoFloat64 and testCopyDataIntoFloat16RawBits manually construct DoubleBuffer/ShortBuffer via ByteBuffer.allocateDirect(...).order(...).asFooBuffer(). Tensor.allocateDoubleBuffer and Tensor.allocateHalfBuffer already exist and would shorten these.

Summary

Looks good to land. The only suggestion worth considering before merge is the upfront capacity check in Tensor_float16.copyDataInto(FloatBuffer) to avoid partial writes (issue #2), but it's not a must-have.

@jgibson2
Contributor Author

@pytorchbot label "release notes: none"

@pytorch-bot pytorch-bot Bot added the release notes: none Do not include this in the release notes label Apr 27, 2026
@kirklandsign kirklandsign added release notes: android Android Java and JNI code and removed release notes: none Do not include this in the release notes labels Apr 27, 2026
Extends the previous commit (which added copyDataInto(FloatBuffer) for
float32 / float16) to cover every dtype that the Tensor class supports:

  copyDataInto(ByteBuffer)            int8
  copyDataIntoUnsigned(ByteBuffer)    uint8
  copyDataInto(IntBuffer)             int32
  copyDataInto(LongBuffer)            int64
  copyDataInto(DoubleBuffer)          float64
  copyDataInto(ShortBuffer)           float16 (raw fp16 bits, no widening)

Mirrors the asymmetry of getDataAsByteArray (int8) vs
getDataAsUnsignedByteArray (uint8) — calling copyDataInto(ByteBuffer) on a
uint8 tensor throws, and copyDataIntoUnsigned(ByteBuffer) on int8 throws,
just like the array accessors. The two methods are intentionally separate
even though the underlying bits are identical, so a misuse caused by a
dtype switch surfaces as an exception instead of silently producing values
with the wrong sign interpretation.
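The sign-interpretation hazard this separation guards against comes down to how the same byte bits are widened to int in Java:

```java
public class ByteSignDemo {
    public static void main(String[] args) {
        byte b = (byte) 0xFF;       // the same underlying bits either way
        int asSigned = b;           // int8 view: sign-extended
        int asUnsigned = b & 0xFF;  // uint8 view: zero-extended
        System.out.println(asSigned + " " + asUnsigned);
        // prints: -1 255
    }
}
```

Since the copy itself is bit-identical, only the caller's later interpretation differs; the separate method names force that choice to be explicit at the call site.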

The float16 ShortBuffer overload writes the raw 16-bit half-precision bits
without widening; copyDataInto(FloatBuffer) on the same tensor still
performs the half->float widening, matching the dual getDataAsShortArray /
getDataAsFloatArray accessors that float16 already exposes.

Tests cover each new variant's happy path plus the asymmetric int8/uint8
rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jgibson2 jgibson2 force-pushed the android-tensor-copyDataInto branch from 4475cb5 to 068212e Compare April 28, 2026 11:41
@jgibson2
Contributor Author

@claude re-review

@mergennachin mergennachin requested a review from psiddh April 28, 2026 19:20
@mergennachin
Contributor

@claude Review this PR

@claude

claude Bot commented Apr 28, 2026

Claude finished @mergennachin's task in 1m 1s


Code Review: Add Tensor.copyDataInto to Java API

  • Read changed files and gather context
  • Review Tensor.java changes
  • Review TensorTest.kt changes
  • Post review feedback

Overall: This is a well-designed, well-tested addition. The API is consistent with existing patterns, the motivation (reducing GC pressure in inference loops) is sound, and the test coverage is thorough. Ready to land with one minor suggestion.

What's good

  • Complete dtype coverage — every supported tensor subclass (float32, float16, int8, uint8, int32, int64, float64) gets a copyDataInto override, plus the float16 → FloatBuffer widening path. No gaps.
  • copyDataIntoUnsigned disambiguation — since both int8 and uint8 back onto ByteBuffer, a single copyDataInto(ByteBuffer) could not signal which interpretation the caller expects; giving unsigned its own method name makes a dtype mismatch surface as an explicit runtime exception rather than silently reinterpreted values. The tests at TensorTest.kt:453-459 and TensorTest.kt:475-481 verify cross-type rejection, which is great.
  • Upfront capacity check for fp16 widening (Tensor.java:899-901) — the previous review flagged the partial-write risk and it's been addressed. The testCopyDataIntoFloat16_overflowIsAtomic test (TensorTest.kt:350-362) verifies dst.position() stays at 0 on failure. Well done.
  • Test coverage is comprehensive — happy path, buffer reuse, position-awareness, overflow, cross-dtype rejection, and signed/unsigned discrimination are all covered across 15 new test methods.

Minor suggestion

data.rewind() could be data.position(0) (or left as-is). Buffer.rewind() always discards the mark, while position(0) keeps a mark set at index 0. In practice no code in this class uses marks, so this is purely a hygiene note and consistent with the existing getDataAs*Array() methods. Not worth changing; just noting for awareness.
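A quick standalone illustration of the rewind() vs position(0) distinction, using pure java.nio (no ExecuTorch code):

```java
import java.nio.FloatBuffer;
import java.nio.InvalidMarkException;

public class MarkDemo {
    public static void main(String[] args) {
        FloatBuffer buf = FloatBuffer.allocate(4);
        buf.mark();       // mark at position 0
        buf.position(3);

        buf.position(0);  // back to the start; the mark at 0 survives
        buf.reset();      // fine: returns to the marked position
        System.out.println("after position(0), reset ok at " + buf.position());

        buf.position(3);
        buf.rewind();     // also sets position 0, but the mark is discarded
        try {
            buf.reset();
        } catch (InvalidMarkException e) {
            System.out.println("after rewind(), mark discarded");
        }
        // prints:
        // after position(0), reset ok at 0
        // after rewind(), mark discarded
    }
}
```

Note that position(int) also discards a mark larger than the new position, so the difference only matters for a mark at index 0.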

Summary

Clean PR. The API mirrors the existing getDataAs*Array() pattern, the fp16 widening path has all-or-nothing overflow semantics, and the int8/uint8 disambiguation is well handled. Looks good to land.


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: android Android Java and JNI code


4 participants