[core] Introduce 'file-io.atomic-rename.enabled' for atomic rename control#6575

Open
lsm1 wants to merge 3 commits into apache:master from lsm1:features/support-ignore-rename-overwrite

@lsm1 lsm1 commented Nov 10, 2025

Contributor

Purpose

Add a new configuration option file-io.atomic-rename.enabled to control whether to attempt atomic rename for file overwrite operations.

When enabled (default), Paimon attempts to use atomic rename (write to temp file then rename with OVERWRITE option) via reflection on the FileSystem's 3-parameter rename method. This is supported on distributed file systems like HDFS (DistributedFileSystem). On object storage systems like S3/OSS that don't implement this method, it automatically falls back to direct overwrite.

When disabled, Paimon skips the atomic rename attempt and always uses direct overwrite. This avoids the overhead of reflection calls and temporary file operations, which is especially useful on object storage systems where atomic rename is not supported.
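The control flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual HadoopFileIO code: `RenameCapableFs`, `PlainFs`, and `OverwriteSketch` are hypothetical stand-ins, and the string-typed `rename` only mimics the shape of a 3-parameter rename method so that the example runs without Hadoop on the classpath.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a file system that exposes a 3-parameter rename (HDFS-like). */
class RenameCapableFs {
    boolean renamed = false;

    // Illustrative signature only; the real method takes Hadoop path and option types.
    public void rename(String src, String dst, String option) {
        renamed = true;
    }
}

/** Hypothetical stand-in for an object store client without such a method. */
class PlainFs {}

class OverwriteSketch {
    /** Returns which path was taken: "atomic-rename" or "direct-overwrite". */
    static String overwrite(Object fs, boolean atomicRenameEnabled) {
        if (atomicRenameEnabled) {
            try {
                // Look up the 3-parameter rename reflectively, as the description outlines.
                Method rename =
                        fs.getClass().getMethod("rename", String.class, String.class, String.class);
                rename.invoke(fs, "file.tmp", "file", "OVERWRITE");
                return "atomic-rename";
            } catch (NoSuchMethodException e) {
                // Method not implemented: fall back to direct overwrite below.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("rename invocation failed", e);
            }
        }
        // Flag disabled or rename unsupported: no reflection result is used.
        return "direct-overwrite";
    }
}
```

With the flag set to false, the reflective lookup is never reached, which is the overhead the option is meant to avoid.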

Tests

API and Format

Documentation

github-actions Bot commented May 9, 2026

This pull request has had no activity for 90 days. If you'd like to keep it open, please push a new commit or leave a comment. Thanks for the contribution.

@github-actions github-actions Bot added the stale label May 9, 2026

@JingsongLi JingsongLi left a comment

Contributor

Useful configuration for object storage environments.

Review:

  1. Default is enabled (attempts atomic rename) — this preserves existing behavior. Users on object storage can set file-io.atomic-rename.enabled=false to skip the reflection overhead. Good.

  2. +26/-2 is minimal. The config option is in CatalogOptions and the behavior change is in HadoopFileIO.

  3. Naming: file-io.atomic-rename.enabled is clear. Consider whether this should be per-FileIO rather than global (e.g., for catalogs that use HDFS for metadata but S3 for data).

  4. Documentation: The generated config HTML is updated. Good.

  5. Relationship to #7223 (Hadoop 3.4+ atomic writes): These are related — #7223 adds native conditional writes while this PR allows disabling the reflection-based rename. Ensure they don't conflict.

Minor: No tests added. Since this is a simple boolean flag that gates an existing code path, existing tests should cover the default behavior. But consider adding a test that verifies the "disabled" path actually skips reflection.

LGTM with the above noted.

@JingsongLi

Contributor

Please rebase onto master.

@lsm1 lsm1 force-pushed the features/support-ignore-rename-overwrite branch from 91d445c to db8ed9e Compare May 24, 2026 08:38
@github-actions github-actions Bot removed the stale label May 26, 2026
@lsm1 lsm1 force-pushed the features/support-ignore-rename-overwrite branch from db8ed9e to 81658e0 Compare May 26, 2026 07:03

private org.apache.paimon.options.Options options;

private boolean atomicRenameEnabled = true;

Contributor

Could we avoid adding this as a primitive boolean that relies only on the field initializer? HadoopFileIO is Serializable (and has a serialization test), and since serialVersionUID is unchanged, instances serialized by an older Paimon version will deserialize this newly added field as the Java default false, not the option default true. After a rolling upgrade or a restore from an old checkpoint/savepoint, overwriteFileUtf8 would silently skip the atomic rename path even though file-io.atomic-rename.enabled defaults to true. A nullable Boolean treated as true when null, or a custom readObject that initializes the field to true, would preserve the previous behavior.
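A minimal sketch of the nullable Boolean variant, assuming a simplified class rather than the real HadoopFileIO: `BoxedFlagFileIO` and its `roundTrip` helper are hypothetical names introduced for illustration. A stream written before the field existed would leave the boxed field null, which is modeled here by constructing the object with null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical, simplified stand-in for a Serializable FileIO with the new flag. */
class BoxedFlagFileIO implements Serializable {
    private static final long serialVersionUID = 1L;

    // Boxed so that a missing value shows up as null instead of the primitive default false.
    private final Boolean atomicRenameEnabled;

    BoxedFlagFileIO(Boolean atomicRenameEnabled) {
        this.atomicRenameEnabled = atomicRenameEnabled;
    }

    /** Treats null (value absent) as the option default, true. */
    boolean isAtomicRenameEnabled() {
        return atomicRenameEnabled == null || atomicRenameEnabled;
    }

    /** Serializes and deserializes the instance in memory. */
    static BoxedFlagFileIO roundTrip(BoxedFlagFileIO io) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(io);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BoxedFlagFileIO) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is that every read of the flag has to go through the null-aware accessor rather than the field itself.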

@JingsongLi

Contributor

I don't think this readObject gives the intended default for old serialized HadoopFileIO instances.

The code sets atomicRenameEnabled = true before calling in.defaultReadObject(). For streams written by older versions, the new primitive boolean field is absent, and Java deserialization applies the default value for that missing field. That means defaultReadObject() can overwrite the earlier true with false, so old serialized objects would deserialize with atomic rename disabled.

Could we detect the missing field explicitly, e.g. via readFields().defaulted("atomicRenameEnabled"), or use a nullable/boxed representation plus defaulting after deserialization? Otherwise upgrade compatibility appears to regress the old HDFS atomic overwrite behavior.
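A sketch of the readFields() approach, again on a simplified hypothetical class (`DefaultedFlagFileIO` and its helpers are not part of the PR). It relies on `ObjectInputStream.GetField.defaulted(String)`, which reports whether the named field was absent from the stream. Note that once readFields() is used, every serializable field of the class has to be assigned by hand in readObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical, simplified stand-in for a Serializable FileIO with the new flag. */
class DefaultedFlagFileIO implements Serializable {
    private static final long serialVersionUID = 1L;

    private boolean atomicRenameEnabled = true;

    DefaultedFlagFileIO(boolean atomicRenameEnabled) {
        this.atomicRenameEnabled = atomicRenameEnabled;
    }

    boolean isAtomicRenameEnabled() {
        return atomicRenameEnabled;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        if (fields.defaulted("atomicRenameEnabled")) {
            // Field absent from the stream (written by an older version): use the option default.
            atomicRenameEnabled = true;
        } else {
            // The second argument is only a fallback; the stream value is returned when present.
            atomicRenameEnabled = fields.get("atomicRenameEnabled", true);
        }
    }

    static byte[] serialize(DefaultedFlagFileIO io) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(io);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static DefaultedFlagFileIO deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DefaultedFlagFileIO) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike setting the field before defaultReadObject(), this assigns the value after the stream has been read, so nothing can overwrite it afterwards.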
