feat(parquet): apply column default values when reading missing fields (2/4) by huan233usc · Pull Request #792 · apache/iceberg-cpp

huan233usc · 2026-06-29T21:48:20Z

What

Part 2 of 4 of Iceberg v3 column default-value support (POC #731), built on the
schema-layer support merged in #746.

When a column is present in the read (table) schema but absent from a Parquet
data file — because the column was added after those rows were written — fill it
with the column's v3 initial-default instead of null.
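The read semantics described above can be sketched in a few lines. This is a simplified, stdlib-only model and not the PR's code: `ReadField` and `ReadColumn` are hypothetical names, and an `int` column stands in for any primitive type.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model; these are not iceberg-cpp types.
struct ReadField {
  std::string name;
  std::optional<int> initial_default;  // v3 initial-default, if the field has one
};

// Produce the values of `field` for a data file holding `num_rows` rows.
// `file_column` is nullptr when the file was written before the column existed.
std::vector<std::optional<int>> ReadColumn(
    const ReadField& field,
    const std::vector<std::optional<int>>* file_column,
    std::size_t num_rows) {
  if (file_column != nullptr) {
    return *file_column;  // column is in the file: stored values win
  }
  // Column is missing: every row gets the initial-default, or null
  // (std::nullopt) when the field carries no default.
  return std::vector<std::optional<int>>(num_rows, field.initial_default);
}
```

Note that the default only applies when the column is absent from the file; a null that was actually written to the file is returned as null.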

Changes

- arrow/literal_util (new): a shared Arrow materializer that turns a
  Literal into an Arrow scalar (ToArrowScalar) or a constant-filled array
  (MakeDefaultArray), or appends it once to a builder
  (AppendDefaultToBuilder). Covers all primitive types, including
  decimal/uuid/fixed and the four timestamp variants.
- Parquet projection (parquet_schema_util.cc): when a field is missing
  from the file and carries an initial-default, project it as
  FieldProjection::Kind::kDefault, mirroring the generic schema_util.cc path
  already merged in #746.
- Parquet data (parquet_data_util.cc): materialize the kDefault branch
  via MakeDefaultArray.
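The projection decision in the list above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in parquet_schema_util.cc: `Kind`, `Field`, and `ResolveKind` are hypothetical stand-ins, only `kDefault` is named in this PR, and the handling of the other cases (project when present, null for optional fields, error for required fields without a default) is an assumption about how such a resolver would typically behave.

```cpp
#include <optional>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical mirror of a projection kind; only kDefault is named in the PR.
enum class Kind { kProjected, kDefault, kNull };

// Hypothetical read-schema field; the default is kept as a string for brevity.
struct Field {
  int id;
  bool optional;
  std::optional<std::string> initial_default;
};

// Decide how a read-schema field is produced for one data file, given the
// set of field ids that the file actually contains.
Kind ResolveKind(const Field& field, const std::set<int>& file_field_ids) {
  if (file_field_ids.count(field.id) > 0) {
    return Kind::kProjected;  // present in the file: read it
  }
  if (field.initial_default.has_value()) {
    return Kind::kDefault;  // missing, but has an initial-default
  }
  if (field.optional) {
    return Kind::kNull;  // missing optional field without default
  }
  // Assumption: a missing required field with no default cannot be read.
  throw std::invalid_argument("missing required field without default: id " +
                              std::to_string(field.id));
}
```

In this model the default check comes before the optional check, so an optional field with an initial-default is filled with the default rather than null, which matches the behaviour the PR describes for both required and optional default fields.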

Tests

- literal_util_test: scalar/array/builder conversion across primitive types,
  null and narrowing-sentinel handling, and casting to the target Arrow type.
- parquet_data_test: ProjectRecordBatch fills missing required and optional
  default fields, including a nested struct.
- parquet_test: end-to-end test that writes a file with an old schema, then
  reads it through ReaderFactoryRegistry with an evolved schema carrying
  defaults (ReadMissingFieldsWithDefaults).

Stack

- schema: represent / serialize / validate (#746, merged)
- read path: Parquet (this PR)
- read path: Avro (follows)
- schema evolution: addColumn / updateColumnDefault (follows)

…s (2/4)

When a column is present in the read schema but missing from a Parquet data file (written before the column existed), fill it with the column's v3 initial-default instead of null.

Adds a shared Arrow materializer (arrow/literal_util) that turns a Literal into an Arrow scalar/array, and a kDefault projection branch in the Parquet schema/data projection paths. For extension types such as `arrow.uuid`, the materializer wraps the storage array in the extension type, because compute::Cast has no storage->extension kernel.

Part 2 of the v3 column-default-values work (POC apache#731), built on the schema support merged in apache#746.

huan233usc marked this pull request as draft June 29, 2026 21:51

huan233usc mentioned this pull request Jun 29, 2026

feat(evolution): support v3 column default values in UpdateSchema (3/4) #793

Open

huan233usc force-pushed the feat/default-values-read-parquet branch from 925d8cd to 378150c Compare June 29, 2026 22:47

huan233usc closed this Jun 29, 2026

huan233usc reopened this Jun 29, 2026

huan233usc marked this pull request as ready for review June 30, 2026 02:12

huan233usc force-pushed the feat/default-values-read-parquet branch from 6b08bed to 8db3067 Compare June 30, 2026 02:55

huan233usc wants to merge 1 commit into apache:main from huan233usc:feat/default-values-read-parquet
