[core] Introduce 'file-io.atomic-rename.enabled' for atomic rename control#6575
[core] Introduce 'file-io.atomic-rename.enabled' for atomic rename control#6575lsm1 wants to merge 3 commits into
Conversation
|
This pull request has had no activity for 90 days. If you'd like to keep it open, please push a new commit or leave a comment. Thanks for the contribution. |
JingsongLi
left a comment
There was a problem hiding this comment.
Useful configuration for object storage environments.
Review:
-
Default is enabled (attempts atomic rename) — this preserves existing behavior. Users on object storage can set
file-io.atomic-rename.enabled=falseto skip the reflection overhead. Good. -
+26/-2 is minimal. The config option is in
CatalogOptionsand the behavior change is inHadoopFileIO. -
Naming:
file-io.atomic-rename.enabledis clear. Consider whether this should be per-FileIO rather than global (e.g., for catalogs that use HDFS for metadata but S3 for data). -
Documentation: The generated config HTML is updated. Good.
-
Relationship to #7223 (Hadoop 3.4+ atomic writes): These are related — #7223 adds native conditional writes while this PR allows disabling the reflection-based rename. Ensure they don't conflict.
Minor: No tests added. Since this is a simple boolean flag that gates an existing code path, existing tests should cover the default behavior. But consider adding a test that verifies the "disabled" path actually skips reflection.
LGTM with the above noted.
|
Please rebase master. |
91d445c to
db8ed9e
Compare
db8ed9e to
81658e0
Compare
|
|
||
| private org.apache.paimon.options.Options options; | ||
|
|
||
| private boolean atomicRenameEnabled = true; |
There was a problem hiding this comment.
Could we avoid adding this as a primitive boolean with only the field initializer? HadoopFileIO is Serializable (and has a serialization test), and with the unchanged serialVersionUID, instances serialized by an older Paimon version will deserialize this newly added field as Java default false, not the option default true. After a rolling upgrade or restoring from an old checkpoint/savepoint, overwriteFileUtf8 would silently skip the atomic rename path even though file-io.atomic-rename.enabled defaults to true. A nullable Boolean treated as true when null, or a custom readObject that initializes the field to true, would preserve the previous behavior.
|
I don't think this The code sets Could we detect the missing field explicitly, e.g. via |
…ng deserialization
Purpose
Add a new configuration option
file-io.atomic-rename.enabledto control whether to attempt atomic rename for file overwrite operations.When enabled (default), Paimon attempts to use atomic rename (write to temp file then rename with OVERWRITE option) via reflection on the FileSystem's 3-parameter rename method. This is supported on distributed file systems like HDFS (DistributedFileSystem). On object storage systems like S3/OSS that don't implement this method, it automatically falls back to direct overwrite.
When disabled, Paimon skips the atomic rename attempt and always uses direct overwrite, which can avoid the overhead of reflection calls and temporary file operations, especially useful on object storage systems where atomic rename is not supported.
Tests
API and Format
Documentation