1350 Commits

Author SHA1 Message Date
anuunchin
6f925caa89 Fix/3464 sync error results in success label (#3492)
* Last executed info

* Any step failure results in failure badge

* Test adjustments
2025-12-17 17:12:32 +01:00
daniel-nagish
302dec4e20 feat: Support OAuth and base GCP credentials for BigQuery destination (#3382)
* feat: Support OAuth and base GCP credentials for BigQuery destination

Fixes #3380

- Add Union type to allow GcpOAuthCredentials and GcpCredentials
- Maintains backward compatibility with GcpServiceAccountCredentials
- Enables OAuth authentication for Workload Identity Federation
- Add tests for OAuth credentials acceptance

This change allows BigQuery destination to work with OAuth tokens
from GitHub Actions Workload Identity Federation and other OAuth flows,
without breaking existing service account authentication.

* remove baseclass from union

* fix tests

* lazy import google library

* type update

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-12-16 17:11:50 +01:00
anuunchin
052a15803d poke test disabled (#3487) 2025-12-16 11:53:44 +01:00
Somasundaram Sekar
87b812e3a6 feat(snowflake): add column_comment/description hint support (#3462)
Add support for column comments in Snowflake adapter, following the same
pattern as the Databricks adapter.

Changes:
- Add escape_snowflake_literal() function for proper SQL escaping
- Add COLUMN_COMMENT_HINT constant for Snowflake-specific hints
- Override _get_column_def_sql() to append COMMENT clause
- Support both generic "description" field and Snowflake-specific hint
- Add tests for column comments including special character escaping

Fixes #3312

Co-authored-by: Somasundaram Sekar <somasundaramsekar.1986@gmail.com>
2025-12-15 18:11:32 +01:00
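The Snowflake column-comment change above (#3462) can be illustrated with a minimal sketch. This is not the code from the commit: `escape_literal_sketch` and `column_def_with_comment` are hypothetical names chosen for illustration. The sketch assumes that a single-quoted Snowflake string literal needs single quotes doubled and backslashes doubled, and that the comment is appended to the column definition as a `COMMENT '...'` clause.

```python
from typing import Optional


def escape_literal_sketch(value: str) -> str:
    """Return `value` as a single-quoted SQL string literal (hypothetical helper)."""
    # Double backslashes first, then double single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def column_def_with_comment(
    name: str, data_type: str, description: Optional[str] = None
) -> str:
    """Build a column definition, appending a COMMENT clause when a description is set."""
    # Quote the identifier; embedded double quotes are doubled.
    quoted_name = '"' + name.replace('"', '""') + '"'
    column_sql = f"{quoted_name} {data_type}"
    if description:
        column_sql += f" COMMENT {escape_literal_sketch(description)}"
    return column_sql


print(column_def_with_comment("user_name", "VARCHAR", "User's display name"))
```

A column without a description is emitted unchanged, so existing table definitions would not be affected under this assumption.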
Rakesh V.
34669f1ac7 Feat/iceberg advanced partitioning (#3053)
* feat: implement advanced Iceberg partitioning with explicit ordering

- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format

* Port iceberg_partition and build_iceberg_partition_spec to dlt core

* update type hint in IcebergLoadFilesystemJob

* Add tests for Iceberg advanced partitioning; remove unused partition extraction code

* Add docs for iceberg_adapter

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-12 10:57:56 +01:00
Menna
6658d5468d Fix load retrieval to only show loads that contain a schema name that is in the pipeline.schema_names (#3446)
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-10 14:56:12 +01:00
Anton Burnashev
1c49c2081c wait in marimo UI in test_e2e.py for schema selection (#3453)
Should fix flaky test
2025-12-10 14:32:18 +01:00
djudjuu
be9aa1bf03 pyarrow: respect resource hints before extract (#3436)
* merge resource hints before extract for all backends

* check load package directly

* better type check

* log if unsupported hints

* better log message

* do not use ensure_table_schema_columns

* test for desired behavior

* refactor

* clarified test assertions

* lint
2025-12-10 14:31:39 +01:00
ivasio
99207237fe docs: add runtime docs to CLI reference (#3445)
* bumps to version 1.20.0

* update the hub reference docs, add CI check

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test

* bumps hub extra

* updates cli docs linting

* fixes docs lock

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-09 17:30:53 +01:00
rudolfix
a9b526e751 (feat) small dashboard improvements (#3450)
* enables child tables by default

* renames to internal tables
2025-12-09 09:18:06 +01:00
segetsy
c678d35343 [fix/3358] add pagination stopping to JSONResponseCursorPaginator (#3374)
* [fix/3358] add pagination stopping to JSONResponseCursorPaginator
* [fix/3358] add some tests when there are more pages
* [fix/3358] fix naming
* [fix/3374] make stop_after_empty_page robust to data = None
* [fix/3358] align has more handling with RangePaginator and add test cases
* Compile path in __init__
short-circuit on empty page before touching has_more

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-08 16:29:43 +01:00
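The stopping behaviour described in the paginator fix above (#3374) can be sketched as a standalone function. This is an assumption-laden illustration, not the `JSONResponseCursorPaginator` implementation: the function name, the `cursor_key` and `has_more_key` parameters, and the choice to treat a missing "has more" flag as "no more pages" are all hypothetical. It shows the ordering named in the commit message: an empty page (including `data = None`) ends pagination before the "has more" flag is read.

```python
from typing import Any, Mapping, Optional, Sequence


def next_cursor_sketch(
    body: Mapping[str, Any],
    data: Optional[Sequence[Any]],
    cursor_key: str = "next_cursor",
    has_more_key: Optional[str] = None,
    stop_after_empty_page: bool = True,
) -> Optional[str]:
    """Return the cursor for the next page, or None when pagination should stop."""
    # Short-circuit on an empty page; `not data` also covers data being None.
    if stop_after_empty_page and not data:
        return None
    # Only consult the "has more" flag when the caller configured one.
    # Assumption: a missing flag is treated as "no more pages".
    if has_more_key is not None and not body.get(has_more_key, False):
        return None
    # An absent or empty cursor value also ends pagination.
    return body.get(cursor_key) or None


print(next_cursor_sketch({"next_cursor": "abc"}, [{"id": 1}]))
print(next_cursor_sketch({"next_cursor": "abc"}, None))
```

Checking the page contents first means a response that still carries a cursor but returns no rows cannot loop indefinitely.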
rudolfix
3e11effbdb implements cancellation of normalize jobs (#3444)
* allows load jobs to separately set failed message and exception to be re-raised

* allows to cancel normalize via flag in load package, returns metrics when failed

* corrects cleaning of current load id, re-raises job exceptions in load, passes load and job ids in exception chain

* adds warnings on pending and partially loaded packages in pipeline failed exception

* creates schema when package is created

* makes internal pipeline load storage readonly

* fixes test

* fixes utime on windows

* review code reorg
2025-12-08 14:10:19 +01:00
djudjuu
289e00dece data quality checks cell in dashboard (#3413)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* basic cell appearing if installed

* use data quality cell

* show raw data too

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests

* update text

* adds quality widget as python functions

* adds data_quality as module to hub

* adds hub extra to docs deps

* fixes dashboard imports

* bumps to alpha x.20.0a1

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-07 12:59:21 +01:00
rudolfix
06bc05848b (chore) adds hub extra (#3428)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests
2025-12-05 16:15:19 +01:00
ivasio
e8d45369f1 implements run artifacts sync to a bucket using filesystem (#3339)
* a tracker that sends pipeline trace, schemas and trace to a bucket is activated when RUN_ID and workspace context are present
* a sync step is executed under the conditions above when workspace dashboard starts
* improves deployment packager (hash computation)
2025-12-04 15:48:39 +01:00
ivasio
8608197026 Fix: reset config in PluggableRunContext.reload_providers (#3409)
* implement RunContext.reset_config, call it in PluggableRunContext.reload_providers

* fix _config access

* reinitialize RunContext._runtime_config on access

* adjust the test to .runtime_config being always available

* fixes dlthub tests

---------

Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-03 01:24:05 +01:00
ivasio
af8908968e reimplement, add tests (#3418)
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-02 23:02:28 +01:00
rudolfix
3e84f7aaa9 blocks failed sqlglot version, bumps sqlglot in lockfile (#3420) 2025-12-02 22:23:54 +01:00
rudolfix
dd38c80fb4 fixes arrow import in sql_database (#3411)
* fixes pyarrow import in sql_database

* bumps to 1.19.1

* linter fix

* fixes common workflow
2025-12-02 18:33:03 +01:00
rudolfix
a0e5bd073d bumps to version 1.19.0 (#3401)
* bumps to version 1.19.0

* fixes lakeformation test
2025-12-01 11:38:02 +01:00
rudolfix
f0349d7efc does not overwrite local file context in destination factory (#3398) 2025-11-28 21:39:53 +01:00
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
rudolfix
e15f5510b3 sets ducklake fingerprint to storage fingerprint (#3388) 2025-11-27 17:08:46 +01:00
anuunchin
91eacbff4c Explicit passing of arguments to drop (#3386) 2025-11-26 15:54:44 +01:00
rudolfix
cc3b88d73a (fix) 3351 fixes default type var (#3373)
* tests minimal typing extensions in alpine docker

* keeps typevar default but does not use it in the code for backward compat
2025-11-26 09:26:52 +01:00
anuunchin
7d7b7af00c docs: lifecycle of @dlt.hub.transformation and dlt.Relation (#3329)
* Lifecycle of a dlt transformation

* Added test to match lifecycle docs
2025-11-25 14:55:51 -05:00
Jorrit Sandbrink
9619002c04 feat: snowflake clustering key modifications (#3365)
* add support for snowflake clustering key modifications

* add cluster column order test case

* update snowflake cluster hint docs

* switch to reading snowflake cluster hints from table schema
2025-11-25 17:39:13 +01:00
Menna
1e73d678ff Refactor boundary timestamp handling in SqlMergeFollowupJob and SqlalchemyMergeFollowupJob to ensure current load package creation time is used when no boundary timestamp is provided. Update DltResourceHints class to streamline timestamp validation for active_record_timestamp and boundary_timestamp. Adjust tests accordingly. (#3378) 2025-11-25 17:34:11 +01:00
Thierry Jean
382eb6bab7 feat: Schema.to_mermaid() (#3364)
* Add dlt.Schema.to_mermaid() method

---------

Co-authored-by: jayant <jayant746@gmail.com>
2025-11-24 22:31:59 -05:00
rudolfix
661c6c1ada fix flaky dashboard tests (#3370)
* improves dashboard multi schema test

* closes and waits for sections in multi-schema test

* removes command line snippet with generic text in exceptions

* disables transformers pokeapi test
2025-11-24 22:52:36 +01:00
anuunchin
81ebbcca43 Uncalled source in pipeline.run( (#3369) 2025-11-24 13:45:12 +01:00
anuunchin
033312d373 Fix: The child table column remains in the schema as a partial column with seen-null-first=True (#3131)
* child table column removed from parent

* A utility function that checks whether a column has seen-null-first set

* Improved comments and docstrings, separate method in worker

* null column not inferred if exists as compound

* Column level x-normalizer cleaning moved outside of worker

* Test for empty column becoming compound

* Test clean_seen_null_first_hint
2025-11-23 18:42:33 +01:00
rudolfix
5242790b13 (fix) use sparse checkout for dlt init dlthub (#3356)
* adds option to sparse checkout repo

* use sparse checkout for llm context

* fixes sqlglot from find

* adds checkout after sparse clone

* explains unknown path tests
2025-11-22 12:15:40 +01:00
djudjuu
bbc1cb81cd fix: dashboard no longer crashes on broken home cell (#3348)
* split home and workspace render methods

* header row dry-er

* catch-all errors in home()-cell

* local try-catch for broken traces

* e2e test for broken trace

* removes this

* shows navigation on pipeline attach error

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-21 20:56:28 +01:00
rudolfix
c29264114f fix: backwards compatible traces (#3354)
* makes trace backward compat with 1.17.0 and earlier

* skips trace if any error in unpickle

* always saves merged pipeline trace to have consistent pipeline.last_trace property

* tests for past traces, broken traces and other improvements
2025-11-21 09:09:05 -05:00
rudolfix
8bd0b116fb fixes athena refresh mode (#3313)
* adds filter to exclude dropped tables in staging destination, implements for athena

* enables refresh mode tests for athena, fixes tests

* fixes staging_allowed_local_path on databricks, bumps databricks connector in lockfile

* passes dropped tables schemas to filter, adjust athena filter

* allows to disable lake formation
2025-11-21 10:58:54 +01:00
rudolfix
3bd5099951 fixes sqlglot from find (#3357) 2025-11-20 22:42:36 +01:00
David Scharf
4a5ffd82b3 Chore: Update docs npm dependencies and clean up docs build tooling (#3247)
* bump npm deps

* remove unneeded netlify redirects file

* remove unneeded lockfile

* remove another unneeded lockfile

* post rebase lockfile update

* remove old netlify command

* create new docs tools project and move api docs gen there

* tmp

* add uv to build docs workflow

* move docs pyproject

* re-org docs package and move snippet linter

* move notebook linting commands and deps to tools folder
add flake8 to tools linting

* remove unneeded files

* fix linting and formatting errors

* remove wrong file

* move docs processing script to new package

* fix gen api ref

* clean up package json and use commands from parent makefile

* update build website workflow

* move linting to docs makefile partially

* fix python version for docs project

* consolidate docs commands in docs makefile

* fix docs linter

* fully update docs test flow

* fixes some linting and dependency problems

* fix constants

* move notebook formatting to docs project

* fix lint embedded snippets

* fix examples tests

* add missing dependencies

* fix snippet linting

* add missing lint dependencies to core and missing test dependencies to docs

* add missing weaviate

* add missing regex module

* add forked dependency and updates readme file

* revert accidental change to example

* fix main linter

* * Move relevant pytest options to subproject
* Remove shims / path inserts that are now managed by pytest options
* Some typing fixes
* Clean up base project pytest ini
* Enable transformation snippets tests

* remove unneeded raw import of intro snippets

* downgrade alive progress

* uses dlt logger which also fixes internal alive error

* enables transformation snippets linting

* fixes dashboard races again

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-16 18:01:30 +01:00
Taha Muzammil
7b6f8c4ebd fix: minor typos and redundant variable (#3314) 2025-11-14 21:52:56 +01:00
anuunchin
e7e54b2cdf Feat: last pipeline run section in dashboard (#3250)
* Initial commit

* Html cleaned

* Summary moved to home section, migration badge added

* Load package status badges improved

* Test getting steps data, migrations count

* Various tests

* Fix in test

* Styles moved, improved ui

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-14 19:58:06 +01:00
ivasio
dc1a0467f8 Feat: support return_type = arrow_stream for connectorx backend (#3218)
* make arrow_stream default return_type for connectorx backend

* formatting

* bump connectorx version

* return to arrow by default, keep arrow_stream support, add info message

* document arrow_stream corner cases in the docs

* add the test for connectorx arrow_stream return type

* fix formatting

* fix test typo

* fix the tests

* fix package version check, return original version constraint

* adds utils function to losslessly cast date64 to timestamp[us]

* cast date64 to timestamp for connectorx, update test

---------

Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-14 19:57:06 +01:00
Menna
4d25a6c5b5 feat/3198-add-workspace-info-and-profile-selection
Added a dropdown for profile selection in the dashboard interface and updated the layout to display profile and workspace information inline with pipeline selection.
2025-11-14 18:44:45 +01:00
Menna
8a16442293 fix/3165: Athena LakeFormation permissions are required even though Lakeformation is not used
Fixed the bug that enforces the need for Lakeformation permissions when Lakeformation is not being used.
2025-11-12 14:09:36 +01:00
rudolfix
d671376e68 fixes default limit in ibis backend (#3273) 2025-11-01 16:34:14 -04:00
rudolfix
4a431d60ed refresh docs intro (#3270)
* renames pipeline to workspace dashboard

* refreshes intro

* review changes

* sidebar, references, dataset.table( cleanup
2025-10-31 17:14:49 +01:00
rudolfix
192296f4f8 fixes git import and enables tests (#3262)
* enable hub tests

* removes erroneous git import

* enables tests with importing dlt into minimal alpine container

* imports workspace modules on demand

* bumps dlt to version 1.18.1

* fixes mssql hub test on mac

* review fixes
2025-10-29 21:32:07 +01:00
Thierry Jean
0bdf8dc424 feat: add dlt.hub.data_quality entrypoint (#3259) 2025-10-29 08:12:49 -04:00
Marcin Rudolf
df8ccecbb8 fixes flaky signal tests in pipelines 2025-10-28 14:03:36 +01:00
rudolfix
e56f617c0e adds more signal options (#3248)
* adds option in load that prevents draining pool on signal

* adds runtime pipeline option to not intercept signals

* refactors signal module

* tests new cases

* describes signal handling in running in prod docs

* bumps dlt to 1.18.0

* fixes tests forked

* removes logging and buffered console output from signals

* adds retry count to load job metrics, generates started_at in init of runnable load job

* allows to update existing metrics in load step

* finalized jobs require start and finish dates

* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics

* fixes remote metrics

* replaces event with package bound semaphore to complete load jobs early

* fixes dashboard on windows

* improves signals docs

* renames delayed_signals to intercepted_signals
2025-10-28 13:56:24 +01:00
anuunchin
449d914d7a Fix: Empty columns that were previously flattened into compound ones violate freeze contract (#3226)
* Initial commit

* adds commented out test case that leaves columns with None

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-27 23:20:05 +01:00