[spark] Make saveAsTable+overwrite behave as INSERT OVERWRITE #8225

Merged: JingsongLi merged 3 commits into apache:master from Zouxxyy:xinyu/df-overwrite-drop-table on Jun 14, 2026

Zouxxyy commented Jun 13, 2026

Contributor

Purpose

Previously, df.write.mode("overwrite").saveAsTable("t") produced a ReplaceTableAsSelect plan, which could drop + recreate the table when the user did not re-specify partitionBy() and primary-key options — silently losing the partition spec, primary keys, and table properties.

With this PR, saveAsTable + overwrite on an existing table (Spark 3.4+) is rewritten to OverwriteByExpression (or to OverwritePartitionsDynamic when partitionOverwriteMode=dynamic), which preserves the existing table definition. This matches the behavior of INSERT OVERWRITE and is consistent with Delta Lake.

SQL CREATE OR REPLACE TABLE AS SELECT and V2 writeTo().replace() are not affected.
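
To make the rule above concrete, here is a minimal Python sketch of the plan selection as described in this PR. It is a toy model, not the actual implementation: the names Plan and choose_plan are hypothetical, and it assumes that a missing table or a Spark version below 3.4 leaves the original ReplaceTableAsSelect plan in place.

```python
from enum import Enum


class Plan(Enum):
    REPLACE_TABLE_AS_SELECT = "ReplaceTableAsSelect"
    OVERWRITE_BY_EXPRESSION = "OverwriteByExpression"
    OVERWRITE_PARTITIONS_DYNAMIC = "OverwritePartitionsDynamic"


def choose_plan(table_exists, spark_version, partition_overwrite_mode="static"):
    """Pick the logical plan for df.write.mode("overwrite").saveAsTable(...).

    Hypothetical helper: spark_version is a (major, minor) tuple.
    """
    # Assumption: the rewrite only applies to an existing table on Spark 3.4+;
    # every other case keeps the plan Spark produced originally.
    if not table_exists or spark_version < (3, 4):
        return Plan.REPLACE_TABLE_AS_SELECT
    # Dynamic mode replaces only the partitions that receive new data.
    if partition_overwrite_mode.lower() == "dynamic":
        return Plan.OVERWRITE_PARTITIONS_DYNAMIC
    # Otherwise overwrite the data while keeping the table definition.
    return Plan.OVERWRITE_BY_EXPRESSION
```

For example, choose_plan(True, (3, 5)) selects OverwriteByExpression, while choose_plan(False, (3, 5)) keeps ReplaceTableAsSelect so that the table can still be created.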

Tests

Added cases in DataFrameWriteTestBase:

  • saveAsTable overwrite preserves table definition and snapshots
  • saveAsTable overwrite on non-partitioned table
  • saveAsTable overwrite creates table when not exists
  • saveAsTable overwrite respects dynamic partition overwrite mode
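
The difference that the last case exercises can be illustrated with a small pure-Python model of the two overwrite modes. This is a sketch under stated assumptions, not the test code from this PR: the overwrite function is hypothetical, rows are plain dicts, and static mode is assumed to replace all existing rows.

```python
def overwrite(table, incoming, partition_col, mode="static"):
    """Return the table contents after an overwrite (hypothetical model).

    table and incoming are lists of row dicts; partition_col names the
    partition column.
    """
    if mode == "dynamic":
        # Only partitions that appear in the incoming data are replaced.
        touched = {row[partition_col] for row in incoming}
        kept = [row for row in table if row[partition_col] not in touched]
        return kept + list(incoming)
    # Static mode (assumed here): all existing rows are replaced.
    return list(incoming)


existing = [{"pt": "a", "v": 1}, {"pt": "b", "v": 2}]
new_rows = [{"pt": "a", "v": 10}]

static_result = overwrite(existing, new_rows, "pt")
dynamic_result = overwrite(existing, new_rows, "pt", mode="dynamic")
```

In this model, static_result holds only the new row for partition a, while dynamic_result keeps the untouched row in partition b alongside it. In both modes the table definition itself is left alone, which is the point of the change.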

Zouxxyy force-pushed the xinyu/df-overwrite-drop-table branch from d7c976f to b37fdaa on June 13, 2026 04:39
@JingsongLi

Contributor

+1

JingsongLi merged commit bd9274d into apache:master on Jun 14, 2026
13 checks passed