
Dov/iceberg docs#34781

Open
DAlperin wants to merge 9 commits intoMaterializeInc:mainfrom
DAlperin:dov/iceberg-docs

Conversation


@DAlperin DAlperin commented Jan 21, 2026

https://gist.github.com/DAlperin/0765693e5afbda68c7a0bb05f63e00eb

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.


## Syntax

```mzsql

Similar to how Kafka has a data/examples/create_sink_kafka.yml file and uses the include-syntax shortcode, could we create a create_sink_iceberg.yml and use include-syntax?
This way, this file can do {{% include-syntax ... %}} while
doc/user/content/sql/create-sink/_index.md can use {{% include-example ... %}}.

Also, in the yml file, you can include additional blurbs in there for single-sourcing.
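For illustration, a sketch of what such a file might contain. The filename follows the Kafka example, but the field names and the CREATE SINK shape below are assumptions pieced together from fragments in this diff, not a confirmed schema:

```yaml
# data/examples/create_sink_iceberg.yml (hypothetical sketch)
# Field names mirror the assumed shape of create_sink_kafka.yml.
blurb: |
  Iceberg sinks continuously stream changes from Materialize to an
  Iceberg table hosted on AWS S3 Tables.
syntax: |
  CREATE SINK <sink_name>
    FROM <item_name>
    INTO ICEBERG CATALOG CONNECTION <catalog_connection> (
      NAMESPACE = '<namespace>',
      TABLE = '<table>'
    )
    USING AWS CONNECTION <aws_connection>
    KEY (<key_columns>)
    COMMIT INTERVAL '<interval>';
```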

| **NOT ENFORCED** | Optional. Disable validation of key uniqueness. Use only when you have outside knowledge that the key is unique. |
| **COMMIT INTERVAL** `'<interval>'` | **Required.** How frequently to commit snapshots to Iceberg (e.g., `'10s'`, `'1m'`). |
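As a concrete illustration, a sink using both options might be declared like this. The sink, view, and connection names are placeholders, and the overall INTO clause shape is an assumption assembled from fragments elsewhere in this diff:

```mzsql
-- Hypothetical example: names and the INTO clause shape are assumptions.
CREATE SINK my_iceberg_sink
  FROM my_materialized_view
  INTO ICEBERG CATALOG CONNECTION iceberg_catalog (
    NAMESPACE = 'my_namespace',
    TABLE = 'my_table'
  )
  USING AWS CONNECTION aws_connection
  KEY (id) NOT ENFORCED
  COMMIT INTERVAL '1m';
```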

## How Iceberg sinks work

I would just make this ## Details

| `<sink_name>` | The name for the sink. |
| **IF NOT EXISTS** | Optional. Do not throw an error if a sink with the same name already exists. |
| **IN CLUSTER** `<cluster_name>` | Optional. The [cluster](/sql/create-cluster) to maintain this sink. |
| `<item_name>` | The name of the source, table, or materialized view to sink. |

I know this is the same as in Kafka sinks (and that content hasn't been vetted), but do you know off the top of your head whether by "source" we actually mean the subsources?

Member Author

Maybe? Probably? I'll investigate


## How Iceberg sinks work

Iceberg sinks continuously stream changes from your source relation to an

instead of "source relation", which is a little ambiguous because of "source" in the general sense and "source" in our Materialize source sense, maybe "from Materialize".


Iceberg sinks continuously stream changes from your source relation to an
Iceberg table. If the table doesn't exist, Materialize automatically creates it
with a schema matching your source.

Do we need this sentence? since we mention it in the table above?

relation. Materialize uses these columns to generate equality delete files when
rows are updated or deleted.

If Materialize cannot validate that your key is unique, you'll receive an error.

Mention first that Materialize performs a validation; then follow with the "If Materialize cannot verify ..." sentence.

you may see commit conflict errors. Materialize will automatically retry, but
if conflicts persist, ensure no other writers are modifying the same table.

## Reference

The sections here can go into the actual create sink reference page below under details.

- **Record types**: Composite/record types are not currently supported. Use
scalar types or flatten your data structure.

## Troubleshooting

This probably can also go into yaml and be included here and in the create sink reference page below.

- **Record types**: Composite/record types are not supported. Use scalar types
or flatten your data structure.

## Technical reference

Oh ... mentioned above ... but would move (or also repeat) various content from the guide into here.


#### Syntax {#iceberg-catalog-syntax}

```mzsql

ditto about syntax and options in a yaml file. There's a data/examples/create_connection.yml file.

{{< public-preview />}}

This guide walks you through the steps required to export results from
Materialize to [Apache Iceberg](https://iceberg.apache.org/) tables. Iceberg

Let's specify that this is for S3 tables only:

This guide walks you through the steps required to export results from
Materialize to Apache Iceberg tables, hosted on AWS S3 Tables


This guide walks you through the steps required to export results from
Materialize to [Apache Iceberg](https://iceberg.apache.org/) tables. Iceberg
sinks are useful for maintaining a continuously updated analytical table that

Let's add a bit more here:

Apache Iceberg is an open table format for large-scale analytics datasets that brings reliable, performant ACID transactions, schema evolution, and time travel to data lakes. It gives you data warehouse-like reliability, with the cost advantages of object storage.

Amazon S3 Tables is an AWS feature that provides fully managed Apache Iceberg tables as a native S3 storage type, eliminating the need to manage separate metadata catalogs or table maintenance operations. It automatically handles compaction & snapshot management.


Iceberg sinks allow you to deliver analytical data from Materialize into an Iceberg table, hosted on AWS S3 Tables. As data changes in Materialize, your Iceberg tables are automatically kept up to date.


Q: Am curious as to why we want to pop blurbs that seem more marketing for Apache Iceberg and Amazon S3 tables into our docs.

API. Add the following to your IAM policy:

```json
{

Let's make all the IAM policies a single statement so that it is easier for the user to copy paste
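For example, the separate statements could be merged into one. The action list below is illustrative, combining the S3 actions that appear elsewhere on this page; substitute the final vetted list:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket>",
        "arn:aws:s3:::<bucket>/<prefix>/*"
      ]
    }
  ]
}
```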


## Step 2. Create connections

Iceberg sinks require two connections:

Nit: let's describe what a connection is.

Connections allow Materialize to authenticate to an external system.
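A quick way for readers to confirm what they have created so far (SHOW CONNECTIONS is existing Materialize syntax):

```mzsql
SHOW CONNECTIONS;
```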

}
```

You'll update the external ID after creating the AWS connection in Materialize.

let's make it clear that the user needs to keep track of their ARN: Once you have created the IAM role, you should be able to get the ARN from the AWS console. You'll use the ARN in the next step.

2. An **Iceberg catalog connection** to interact with the Iceberg catalog

### Create an AWS connection


Nit: Fill in the ARN from step 1

SELECT external_id
FROM mz_internal.mz_aws_connections awsc
JOIN mz_connections c ON awsc.id = c.id
WHERE c.name = 'aws_connection';
@maheshwarip maheshwarip Jan 23, 2026

let's be more descriptive here:

Run the query below to fetch the external_id. Once you have the external_id, go back to the trust policy for the IAM role created in step 1. Add the external_id to the policy, in the field labeled sts:ExternalId. At the end of this step, your IAM trust policy should look like this:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::664411391173:role/MaterializeConnection"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "mz_5ef0f19e-3172-4b35-a7f8-19862d214677_u191"
                }
            }
        }
    ]
}
```

);
```

Replace `<region>` with your AWS region (e.g., `us-east-1`) and `<table-bucket-name>`

We could also just tell the user to copy the ARN for their table bucket right?

{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::664411391173:role/MaterializeConnection"

Just double checking - is this the right AWS account? Would this differ by region, i.e. for us-east-1 vs. us-west?

JOIN mz_connections c ON awsc.id = c.id
WHERE c.name = 'aws_connection';
```


Let's add a step here to validate the connection
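For example, using Materialize's connection validation command:

```mzsql
VALIDATE CONNECTION aws_connection;
```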


```mzsql
CREATE CONNECTION aws_connection
TO AWS (ASSUME ROLE ARN = 'arn:aws:iam::<account-id>:role/<role>');

We should be explicit about setting the region / link out to our usual connection creation steps

```

You'll update the external ID after creating the AWS connection in Materialize.


Missing: we need to tell the user to attach the permissions policy they created before
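e.g., with the AWS CLI. The role and policy names below are placeholders for whatever the user created in step 1:

```shell
# Attach the permissions policy created earlier to the IAM role.
aws iam attach-role-policy \
  --role-name <role> \
  --policy-arn "arn:aws:iam::<account-id>:policy/<policy-name>"
```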


- Ensure you have access to an AWS account with permissions to create and manage
IAM policies and roles.
- Ensure you have an AWS S3 Tables bucket configured in your AWS account.

Also:

  • Ensure that you have created a namespace

NAMESPACE = 'my_namespace',
TABLE = 'my_table'
)
USING AWS CONNECTION aws_connection

missing: envelope upsert
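i.e., if Iceberg sinks follow the Kafka sink convention, the statement would carry an ENVELOPE clause. The exact placement below is an assumption based on this diff's fragments:

```mzsql
-- Hypothetical: the ENVELOPE UPSERT placement is an assumption.
  USING AWS CONNECTION aws_connection
  ENVELOPE UPSERT;
```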

| `NAMESPACE` | The Iceberg namespace (database) containing the table. |
| `TABLE` | The name of the Iceberg table to write to. |
| `KEY` | **Required.** The columns that uniquely identify rows. Used to track updates and deletes. |
| `COMMIT INTERVAL` | **Required.** How frequently to commit snapshots to Iceberg. See [Commit interval tradeoffs](#commit-interval-tradeoffs) below. |

Also missing envelope here


we should document what the min & max commit intervals are

{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::664411391173:role/MaterializeConnection"

This might need to change for self-managed, since users wouldn't go through our cloud acct


Also: how would this work for our emulator?

| Shorter commit interval | Longer commit interval |
|---------------------------------|-------------------------------|
| Lower latency - data visible sooner | Higher latency - data takes longer to appear |
| More small files - can degrade query performance | Fewer, larger files - better query performance |
| Higher catalog overhead | Lower catalog overhead |

Also: s3 write costs!

- **Partitioning**: Materialize creates unpartitioned tables. Partitioned tables
are not yet supported.
- **Record types**: Composite/record types are not supported. Use scalar types
or flatten your data structure.

Let's add the limitations that the users have to deliver data to the same region

results.
{{< /warning >}}

## Limitations

Should we add a section around best practices, recommending that Materialize is the only writer to the iceberg table? And that users have only 1 sink to the iceberg table in question @DAlperin ?

"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::<bucket>/<prefix>/*"

What should the `<prefix>` be? Just the namespace? What if I have multiple namespaces? (I assume it's one per sink.)

@kay-kim kay-kim Feb 4, 2026

this will go away as this isn't needed. Only the following is needed.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3tables:*",
            "Resource": "*"
        }
    ]
}
```

Comment on lines 130 to 131
The AWS account ID `664411391173` is the Materialize AWS account. This may
differ for self-managed deployments.

Why do we say "may differ"? It's definitely different for customers I think.

"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "PENDING"
@def- def- Feb 4, 2026

What is the meaning of PENDING here? Edit: I see, I should have read further! (or we could explain earlier that it will be filled later)


(have a local wip patch to this draft).


### Create an IAM role

Create an [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html)

What kind of role do I need? I see a bunch of trusted entity types to choose from.


For this tutorial "Custom trust policy" (have a local wip patch to this draft).

"Action": [
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::<bucket>",

Should probably mention that you have to replace `<bucket>` and `<prefix>`.


Hey Dennis --
I have a local patch on my machine ... it's still a WIP, but this will be updated to:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3tables:*",
            "Resource": "*"
        }
    ]
}
```

That is, that statement allows all the specific actions.


I pushed up my WIP patch ... I'm still working on it ... but figured current work can still help people as they test.

@DAlperin DAlperin marked this pull request as ready for review February 13, 2026 21:25
@DAlperin DAlperin requested a review from a team as a code owner February 13, 2026 21:25
