4027 Commits

Author SHA1 Message Date
anuunchin
6f925caa89 Fix/3464 sync error results in success label (#3492)
* Last executed info

* Any step failure results in failure badge

* Test adjustments
2025-12-17 17:12:32 +01:00
daniel-nagish
302dec4e20 feat: Support OAuth and base GCP credentials for BigQuery destination (#3382)
* feat: Support OAuth and base GCP credentials for BigQuery destination

Fixes #3380

- Add Union type to allow GcpOAuthCredentials and GcpCredentials
- Maintains backward compatibility with GcpServiceAccountCredentials
- Enables OAuth authentication for Workload Identity Federation
- Add tests for OAuth credentials acceptance

This change allows BigQuery destination to work with OAuth tokens
from GitHub Actions Workload Identity Federation and other OAuth flows,
without breaking existing service account authentication.

* remove baseclass from union

* fix tests

* lazy import google library

* type update

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-12-16 17:11:50 +01:00
anuunchin
266052eb76 Docs: Converting Jupyter notebooks in education to marimo notebooks (#3068)
* Initial commit

* lesson_1_quick_start adjusted for marimo

* lesson_2_dlt_sources_and_resources_create_first_dlt_pipeline marimo

* Fundamentals course 3 improved

* Marimo badges added

* Fundamentals: course 8

* Marimo badge link fix

* Fundamentals: course 7

* Fundamentals: course 6

* Fundamentals: course 5

* Fundamentals: course 4

* Fundamentals: course 3

* Fundamentals: course 2

* Fundamentals: course 1

* marimo links corrected

* Inline deps

* Fundamentals: fix lesson 2

* Fundamentals: fix lesson 3

* Fundamentals: fix lesson 4

* Formatting moved to build-molabs

* Fundamentals: fix lesson 5

* Removal of scrolls

* Fundamentals: fix lesson 6

* Fundamentals: fix lesson 7

* Fundamentals: fix lesson 8

* os.environ replaced with dlt.secrets where relevant

* Advanced: fix lesson 5

* Advanced fix lesson 9

* os.environ fixes

* Advanced: fix lesson 1

* Comments cleanup

* Additional comment removal, fix lesson 6 advanced

* Clean main makefile

* Get rid of constants.py

* Nicer json.loads()

* Better functions in preprocess_to_molab

* Tests for doc tooling funcs

* Validate molab command

* Marimo check added

* docs pages adjustment

* limits sqlglot in dev group until fixed

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-16 16:30:32 +01:00
anuunchin
052a15803d poke test disabled (#3487) 2025-12-16 11:53:44 +01:00
Somasundaram Sekar
87b812e3a6 feat(snowflake): add column_comment/description hint support (#3462)
Add support for column comments in Snowflake adapter, following the same
pattern as the Databricks adapter.

Changes:
- Add escape_snowflake_literal() function for proper SQL escaping
- Add COLUMN_COMMENT_HINT constant for Snowflake-specific hints
- Override _get_column_def_sql() to append COMMENT clause
- Support both generic "description" field and Snowflake-specific hint
- Add tests for column comments including special character escaping

Fixes #3312

Co-authored-by: Somasundaram Sekar <somasundaramsekar.1986@gmail.com>
2025-12-15 18:11:32 +01:00
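The literal escaping and COMMENT clause described in the Snowflake column-comment commit above can be illustrated with a minimal sketch. The helper names `escape_comment_literal` and `column_def_with_comment` are hypothetical and are not the functions added in that commit. The sketch assumes a dialect where single-quoted string literals treat backslash as an escape character and accept a doubled single quote.

```python
from typing import Optional


def escape_comment_literal(text: str) -> str:
    """Hypothetical helper: render text as a single-quoted SQL string literal."""
    # Double backslashes first so they are not read as escape sequences,
    # then double single quotes so they do not terminate the literal.
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def column_def_with_comment(
    name: str, data_type: str, description: Optional[str] = None
) -> str:
    """Hypothetical helper: build a column definition, appending COMMENT if given."""
    # Assumption: the column name contains no double quotes.
    column_sql = f'"{name}" {data_type}'
    if description:
        column_sql += f" COMMENT {escape_comment_literal(description)}"
    return column_sql
```

A column without a description is emitted unchanged, so existing table definitions would not be affected by the extra clause.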
anuunchin
f4169878ac Docstring fix (#3482) 2025-12-15 13:27:42 +01:00
Thierry Jean
bb8ab6272b add None to Container._Instance typing (#3469) 2025-12-12 14:47:22 +01:00
Rakesh V.
34669f1ac7 Feat/iceberg advanced partitioning (#3053)
* feat: implement advanced Iceberg partitioning with explicit ordering

- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format

* Port iceberg_partition and build_iceberg_partition_spec to dlt core

* update type hint in IcebergLoadFilesystemJob

* Add tests for Iceberg advanced partitioning; remove unused partition extraction code

* Add docs for iceberg_adapter

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-12 10:57:56 +01:00
Menna
6658d5468d Fix load retrieval to only show loads that contain a schema name that is in the pipeline.schema_names (#3446)
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-10 14:56:12 +01:00
Anton Burnashev
1c49c2081c wait in marimo UI in test_e2e.py for schema selection (#3453)
Should fix flaky test
2025-12-10 14:32:18 +01:00
djudjuu
be9aa1bf03 pyarrow: respect resource hints before extract (#3436)
* merge resource hints before extract for all backends

* check load package directly

* better type check

* log if unsupported hints

* better log message

* do not use ensure_table_schema_columns

* test for desired behavior

* refactor

* clarified test assertions

* lint
2025-12-10 14:31:39 +01:00
rudolfix
db8a66b085 (chores) bumps to version 1.20.0 (#3443)
* bumps to version 1.20.0

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test
2025-12-09 20:18:30 +01:00
ivasio
99207237fe docs: add runtime docs to CLI reference (#3445)
* bumps to version 1.20.0

* update the hub reference docs, add CI check

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test

* bumps hub extra

* updates cli docs linting

* fixes docs lock

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-09 17:30:53 +01:00
Violetta Mishechkina
d0dc21bd45 Add runtime tutorial draft (#3449)
* Add tutorial draft

* lint: Line breaks in tutorial

* improves workspace and profiles docs

* moves snowflake docs

* fixes deprecated docusaurus broken links handlers

* updates docs lock

* Update the runtime part

* Final fixes

---------

Co-authored-by: elvis kahoro <github@elvis.ai>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-09 17:09:58 +01:00
Thierry Jean
d17b0cb93d docs: LLM workflow update (#3422)
* working copy of docs

* added diagram; wip

* checkpoint

* Misc docusaurus fixes

* Remove placeholder text and whitespace

* Move images to the gcp bucket

* add data quality section

* fixed linting

* Escape curly braces

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-09 16:56:39 +01:00
rudolfix
a9b526e751 (feat) small dashboard improvements (#3450)
* enables child tables by default

* renames to internal tables
2025-12-09 09:18:06 +01:00
segetsy
c678d35343 [fix/3358] add pagination stopping to JSONResponseCursorPaginator (#3374)
* [fix/3358] add pagination stopping to JSONResponseCursorPaginator
* [fix/3358] add some tests when there are more pages
* [fix/3358] fix naming
* [fix/3374] make stop_after_empty_page robust to data = None
* [fix/3358] align has more handling with RangePaginator and add test cases
* Compile path in __init__
short-circuit on empty page before touching has_more

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-08 16:29:43 +01:00
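The stopping behaviour listed in the cursor paginator commit above (stop on an empty page, tolerate `data = None`, check the empty page before looking at `has_more`) can be sketched in plain Python. This is not the `JSONResponseCursorPaginator` implementation. The function `iterate_pages`, the `fetch` callable, and the response keys `data`, `next_cursor` and `has_more` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iterate_pages(
    fetch: Callable[[Optional[str]], Dict[str, Any]],
    stop_after_empty_page: bool = True,
) -> Iterator[List[Any]]:
    """Hypothetical cursor loop: yield pages until a stop condition is met."""
    cursor: Optional[str] = None
    while True:
        response = fetch(cursor)
        # Treat a missing or null "data" field as an empty page.
        data = response.get("data") or []
        # Short-circuit on an empty page before inspecting has_more.
        if stop_after_empty_page and not data:
            return
        yield data
        # An explicit has_more = False ends pagination even if a cursor is present.
        if response.get("has_more") is False:
            return
        cursor = response.get("next_cursor")
        if not cursor:
            return
```

Checking the empty page first means a server that keeps returning a cursor with no data cannot keep the loop running.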
King Chung Huang
c4515d7112 Add offset/limit body_path fields to OffsetPaginatorConfig (#3260)
* Add offset/limit body_path fields
* Add offset/limit body_path
* Remove duplicate line

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-08 14:42:27 +01:00
Anton Burnashev
1713342624 Fix race condition in LimitItem causing extra items with parallelized resources (#3442) 2025-12-08 14:10:55 +01:00
rudolfix
3e11effbdb implements cancellation of normalize jobs (#3444)
* allows load jobs to separately set failed message and exception to be re-raised

* allows to cancel normalize via flag in load package, returns metrics when failed

* corrects cleaning of current load id, re-raises job exceptions in load, passes load and job ids in exception chain

* adds warnings on pending and partially loaded packages in pipeline failed exception

* creates schema when package is created

* makes internal pipeline load storage readonly

* fixes test

* fixes utime on windows

* review code reorg
2025-12-08 14:10:19 +01:00
Anton Burnashev
d0fb75b747 Skip examples requiring secrets on fork PRs (#3438)
Fixes CI failures for external contributors
2025-12-08 12:09:56 +01:00
djudjuu
289e00dece data quality checks cell in dashboard (#3413)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* basic cell appearing if installed

* use data quality cell

* show raw data too

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests

* update text

* adds quality widget as python functions

* adds data_quality as module to hub

* adds hub extra to docs deps

* fixes dashboard imports

* bumps to alpha x.20.0a1

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-07 12:59:21 +01:00
anuunchin
c3d8afe51a Color fix in dashboard (#3439) 2025-12-06 12:53:27 +01:00
rudolfix
06bc05848b (chore) adds hub extra (#3428)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests
2025-12-05 16:15:19 +01:00
anuunchin
aed34a065d extensive .gitignore (#3437) 2025-12-05 14:57:12 +01:00
ivasio
e8d45369f1 implements run artifacts sync to a bucket using filesystem (#3339)
* a tracker that sends pipeline trace, schemas and trace to a bucket is activated when RUN_ID and workspace context are present
* a sync step is executed under the conditions above when workspace dashboard starts
* improves deployment packager (hash computation)
2025-12-04 15:48:39 +01:00
ivasio
28557bc82f Add config value for Runtime CLI invite code (#3432)
* add invite code to WorkspaceRuntimeConfiguration

* add default values for api urls

* fix the url

---------

Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-04 13:49:27 +01:00
ivasio
044ea90d0b add runtime CLI configs in WorkspaceRuntimeConfiguration (#3424)
* add runtime CLI configs in WorkspaceRuntimeConfiguration

* set default URLs to None

---------

Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-03 16:37:27 +01:00
ivasio
8608197026 Fix: reset config in PluggableRunContext.reload_providers (#3409)
* implement RunContext.reset_config, call it in PluggableRunContext.reload_providers

* fix _config access

* reinitialize RunContext._runtime_config on access

* adjust the test to .runtime_config being always available

* fixes dlthub tests

---------

Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-03 01:24:05 +01:00
ivasio
af8908968e reimplement, add tests (#3418)
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-02 23:02:28 +01:00
rudolfix
3e84f7aaa9 blocks failed sqlglot version, bumps sqlglot in lockfile (#3420) 2025-12-02 22:23:54 +01:00
rudolfix
dd38c80fb4 fixes arrow import in sql_database (#3411)
* fixes pyarrow import in sql_database

* bumps to 1.19.1

* linter fix

* fixes common workflow
2025-12-02 18:33:03 +01:00
David Scharf
e5977c1ace Fixes historic builds (#3412)
* fixes historic builds

* fix broken link

* constrain docs build env to python 3.10

* switch snippets testing to python 3.10

* allows python up to py3.12 in docs project

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-02 16:51:41 +01:00
rudolfix
a0e5bd073d bumps to version 1.19.0 (#3401)
* bumps to version 1.19.0

* fixes lakeformation test
2025-12-01 11:38:02 +01:00
Luqman
110b640a0a chore: add proper optional typehint to dlt/extract/hints.py module (#3332)
* chore: add proper optional typehint

* simplify write_disposition logic
2025-12-01 09:25:19 +01:00
rudolfix
f0349d7efc does not overwrite local file context in destination factory (#3398) 2025-11-28 21:39:53 +01:00
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
anuunchin
e128a9e876 Programmatic section colors in dashboard (#3393) 2025-11-28 13:48:17 +01:00
rudolfix
e15f5510b3 sets ducklake fingerprint to storage fingerprint (#3388) 2025-11-27 17:08:46 +01:00
Thierry Jean
ff6d28185d docs: data_quality concept page (#3341)
* wrote data quality docs page
2025-11-26 10:18:11 -05:00
anuunchin
91eacbff4c Explicit passing of arguments to drop (#3386) 2025-11-26 15:54:44 +01:00
Will Russell
1ef1d37c0b Fix a few broken links on the Kestra page in the docs. It also updates the Docker image to use latest 2025-11-26 14:07:37 +01:00
Katharina Lenz
6c5e43218c docs/snowflake native app architecture docs (#3359) 2025-11-26 13:23:50 +01:00
rudolfix
cc3b88d73a (fix) 3351 fixes default type var (#3373)
* tests minimal typing extensions in alpine docker

* keeps typevar default but does not use it in the code for backward compat
2025-11-26 09:26:52 +01:00
anuunchin
7d7b7af00c docs: lifecycle of @dlt.hub.transformation and dlt.Relation (#3329)
* Lifecycle of a dlt transformation

* Added test to match lifecycle docs
2025-11-25 14:55:51 -05:00
Jorrit Sandbrink
9619002c04 feat: snowflake clustering key modifications (#3365)
* add support for snowflake clustering key modifications

* add cluster column order test case

* update snowflake cluster hint docs

* switch to reading snowflake cluster hints from table schema
2025-11-25 17:39:13 +01:00
Menna
1e73d678ff Refactor boundary timestamp handling in SqlMergeFollowupJob and SqlalchemyMergeFollowupJob to ensure current load package creation time is used when no boundary timestamp is provided. Update DltResourceHints class to streamline timestamp validation for active_record_timestamp and boundary_timestamp. Adjust tests accordingly. (#3378) 2025-11-25 17:34:11 +01:00
Thierry Jean
382eb6bab7 feat: Schema.to_mermaid() (#3364)
* Add dlt.Schema.to_mermaid() method

---------

Co-authored-by: jayant <jayant746@gmail.com>
2025-11-24 22:31:59 -05:00
rudolfix
661c6c1ada fix flaky dashboard tests (#3370)
* improves dashboard multi schema test

* closes and waits for sections in multi-schema test

* removes command line snippet with generic text in exceptions

* disables transformers pokeapi test
2025-11-24 22:52:36 +01:00
anuunchin
81ebbcca43 Uncalled source in pipeline.run( (#3369) 2025-11-24 13:45:12 +01:00