4027 Commits

Author SHA1 Message Date
Anton Burnashev
c42ca688d5 rest_api: remove the unused exceptions file (#3143) 2025-09-30 10:52:29 +02:00
djudjuu
2f518dae4a docs/prefect integration (#3037)
* prefect integration docs

* decomposition helper

* rewrite

* decomposition image

* text update

* images from bucket

* fixed absolute link

* fix broken link to custom callbacks

* bad snippet

* show it in sidebar

* Apply suggestions from code review

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* rewording

* updated parallelization docs

* missing comma

* post merge fixes

---------

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
Co-authored-by: dave <shrps@posteo.net>
2025-09-29 16:41:47 +02:00
David Scharf
210dd3780f move test of newest lib version to macos (#3142) 2025-09-29 14:15:37 +02:00
Jorrit Sandbrink
195a029685 Remove obsolete instruction from CONTRIBUTING.md (#3135)
* remove obsolete instruction

* Revert "remove obsolete instruction"

This reverts commit 24cde85adc.

* change instruction for venv activation
2025-09-29 09:58:37 +02:00
David Scharf
c1af18819d Fixes more links (#3127) 2025-09-24 15:23:08 +02:00
David Scharf
133b60a7cf Fix remaining routing problems in cloudflare setup (#3124)
* remove dev vars
route to devel docs in preview
remove unneeded path normalization

* fix links on sql_database page
handle trailing slashes in cloudflare worker settings

* remove staging deployment

* fix two more links and temporarily disable error on broken links to test deployment

* re-enable throwing on broken links in docusaurus

* fix some more links

* Revert "Auxiliary commit to revert individual files from 5ff91d5dad5a70f80a1876e3f58b3c9a8fd66d53"

This reverts commit 29f79b93dac885c0137845230dfe5fc47a66bef3.

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-09-24 15:16:27 +02:00
David Scharf
3765dd83b1 Fix remaining routing problems in cloudflare setup (#3126)
* remove dev vars
route to devel docs in preview
remove unneeded path normalization

* fix links on sql_database page
handle trailing slashes in cloudflare worker settings

* remove staging deployment

* fix two more links and temporarily disable error on broken links to test deployment

* re-enable throwing on broken links in docusaurus

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-09-24 15:01:12 +02:00
rudolfix
4361afe8de Merge pull request #3121 from dlt-hub/devel
master merge for 1.17.0 release
1.17.0
2025-09-24 12:50:40 +02:00
Marcin Rudolf
53b94352cb bump to version 1.17.0 2025-09-24 08:30:48 +02:00
Thierry Jean
8565a2ac06 feat: ducklake destination (#3015)
* move duckdb capabilities to utility function

* add basic DuckLake files based on DuckDB / Motherduck

* refactor ducklake config

* wip; ducklake destination

* simplified testing

* ignore ducklake files

* completed default config; TODO fix write

* unicode issues

* commented out patches

* lint

* uses destination_type as final fallback when creating default local file names, allows to copy local file context in WithLocalFiles

* creates connection pool for duckdb

* fixes exception handling in open_connection in sql_client, fixes racing when connections opened in duckdb, improves error handling if commit tx fails

* handles ducklake attach/detach in sql_client

* modifies ducklake configuration to: (1) use sqlite as default catalog (2) point all local files to local_dir (3) allow various urls to configure ducklake name (4) uses parquet as default file format

* adjust caps to execute load jobs sequentially for duckdb and sqlite catalogs

* passes ducklake conn to ibis, improves how duckdb conn is passed (via open_connection which provides full context)

* adds configuration and credential tests, smoke tests for supported catalogs

* enables ducklake on ci

* fixes ducklake imports

* fixes how secrets are created from filesystem

* generates remote_url in load job metrics with real url of the ducklake table

* tests for all buckets

* adds ducklake extra

* adds hints for secrets.toml gen

* implements cursor for ducklake with correct df vector size

* forces use of ducklake/duckdb datasets in ibis handover, tests non existing dataset behavior

* removes dashboard e2e from common tests on ci

* docs WIP

* implements field resolution check and recursive copy for base configuration

* copies credentials before using as default when resolving capabilities

* allows recursive resolution traces in config field missing exception

* improves config resolve: collects traces recursive, keeps resolving if embedded config fails, collects resolved keys

* decouples connection string credentials and base duckdb credentials

* improves how duckdb handles exceptions when executing query

* makes catalog name explicit in ducklake credentials, creates default db and storage folder names after it

* supports ducklake partitioning on duckdb 1.4

* supports metadata schema on postgres, adds experimental ducklake catalog support on Motherduck

* fixes union config resolve with single base config in union

* docs WIP

* enabled ducklake remote test

* improves ibis filesystem con handover, enables databricks

* fixes tests

* fixes lancedb default name

* propagates only top level config section, replaces with embedded field name in other cases

* adds tests and examples for programmatic creation of ducklake factory

* adds merge selector in duckdb caps to enable upsert on 1.4

* ducklake code cleanups

* makes sure pipeline is dropped before run_context goes out of scope

* finalizes ducklake docs

* fallback in duckdb merge selector if duckdb not installed

* propagates persist_secret flag in filesystem sql client

* fixes tests and ci

* runs remote ducklake on local postgres catalog for low latency

* uses packaging version, not semver for python packages comparisons

* Update docs/website/docs/dlt-ecosystem/destinations/duckdb.md

* fixes recursive re-raise in sql_client

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-09-24 08:27:16 +02:00
David Scharf
c4d365c106 filter out assets from tracked spans (#3116) 2025-09-23 09:55:36 +02:00
anuunchin
6de404d23a Feat: allowing custom metrics to be added to dlt resources and transform steps (#3078)
* Custom metrics added to resource as well as transform steps

* custom metrics merged into DataWriterAndCustomMetrics

* Resource.md adjusted

* Simplified DataWriterAndCustomMetrics, improved tests

* custom metrics in transforms moved to base class
2025-09-22 21:41:03 +02:00
anuunchin
3879f144af no circular error cause (#3111) 2025-09-22 20:47:58 +02:00
Anton Burnashev
20c44851c9 cli: updated error in the dlt pipeline show command (#3095)
* Updated to correctly reference the "show" command; removed redundant text about additional dependencies

* Update dlt/helpers/dashboard/runner.py

* Detect dashboard command
2025-09-22 20:42:11 +02:00
David Scharf
b923062c51 Docs docusaurus / cloudflare fixes (#3114)
* bump all dependencies

* fix one admonition

* normalize docs urls

* migrate deprecated admonitions

* fix admonition type for source info header

* some comments
2025-09-22 18:32:17 +02:00
David Scharf
7d2dcaa770 Updates CONTRIBUTING.md and README.md to remove outdated information and add more info (#3101)
* grammar correct and format contribution guide

* update existing testing section

* add testing tips and tricks

* small updates to readme file
2025-09-22 18:07:20 +02:00
David Scharf
024980693f Docs Cloudflare worker deployment (#3105)
* Docs Cloudflare worker deployment (#3104)

* add wrangler config

* fix wrangler config

* add wrangler to dev deps
add stage domain route
enable preview urls

* change docusaurus base url

* add worker to docs deployment

* add basic roots

* enable logs and add 404 route

* disable worker rewriting

* fix urls locally and deployed

* add tracking to docs deployment
add cloudflare commands to package json

* include old redirects

* update readme file

* add updated routing and updated dataset for production
2025-09-22 13:38:28 +02:00
rudolfix
b062dcafa4 docs/removes dlt plus docs and adds eula (#3079)
* answers defaults in cli if tty disconnected

* adds method to send anon tracker event even if disabled

* fixes types in source/resource build in generator

* adds dlt.hub with transformation decorator

* moves dlt-plus to separate sidebar in docs, renames to dltHub Features, adds EULA

* renamed plus to hub in docs

* fixes docs logos

* removes more dlt+

* renames plus tests

* fixes ci run main

* fixes hub workflows
2025-09-21 00:15:08 +02:00
rudolfix
6f015553eb feat/explains partition and split loading (#2737)
* extracts a method to count rows in items in data writers

* drains mssql cursor from recordsets, disables multi-statement execution due to driver problems

* allows to enable and disable root key propagation via source setting, uses normalizer config prop, adds tests and docs

* allows to use parent_key if nesting level < 2

* documents standard http session settings, sets shorter timeouts in telemetry

* adds way to count rows in add_limit, fixes edge cases

* propagates error when generating sql jobs

* skips two step table create in pyiceberg if no partitions

* adds docs and examples for backfilling

* excludes md from lfs

* fixes incorrect exit condition in python object incremental open start range

* simplifies and documents pipeline drop

* updates tables in schema in nesting order

* makes encoding NotRequired in FileItem

* makes filesystem source to follow row_order

* explains partition and split loading, sql_database tests and examples

* fixes add_limit max_items and legacy root key with tests

* fixes docs link

* fixes and tests schema.drop_tables

* improves docs, fixes links and tests

* tests, docs and regression fixes

* also counts empty pages in add_limit

* fixes scd2 tests

* fixes wrong root_key usage in scd2 sqlalchemy

* fixes tests

* makes Incremental to return None on fully filtered batches

* review fixes

* fixes tx scope in backfill db test
2025-09-20 11:53:26 +02:00
ianedmundson1
855536cab7 fix: fixed error in import of BaseOperator in airflow_helper.py (#2601) (#3043)
Co-authored-by: Ian Edmundson <imedmundson@outlook.com>
2025-09-19 23:29:30 +02:00
David Scharf
58ae6303c7 Run common and dashboard tests also with newest available allowed packages for all deps (#3100)
* run common and dashboard tests also with newest available packages

* fix language in code block

* make basic tests works with updated versions of dependent packages
2025-09-19 08:52:46 +02:00
Menna
8d67e869ed Fix/3047 prevent same naming for staging and final datasets (#3096)
* feat: add method to create dataset names with validation for staging dataset (#XXXX)

* Introduced `create_dataset_names` static method in `WithStagingDataset` class.
* Added validation to ensure staging dataset name is not the same as the final dataset name, raising a ValueError if they match.
* Updated documentation for the new method and error handling.

* refactor: enhance create_dataset_names method with improved error messaging and add unit tests

* Reformatted the `create_dataset_names` method for better readability.
* Improved error messages to clarify the consequences of identical dataset names.
* Added unit tests for `create_dataset_names` to validate functionality and error handling.

* refactor: update dataset name handling in destination clients

* Replaced direct calls to `normalize_dataset_name` and `normalize_staging_dataset_name` with a new method `create_dataset_names` in multiple destination client classes.
* This change improves consistency in dataset name generation across various implementations, ensuring proper handling of dataset names and staging datasets.

* refactor: improve formatting and readability of dataset name assignments in destination clients

* Updated dataset name assignments in multiple destination client classes to enhance readability by breaking long lines.
* Maintained consistency in the use of the `create_dataset_names` method across implementations.

* Improve pipeline dashboard test coverage (#3091)

* disable most tests

* try correct windows command for running marimo e2e tests

* try without timeout

* test only launch marimo

* bump python version

* try install playwright deps

* fix e2e tests for dashboard on windows

* enable e2e tests for dashboard

* test macos 14 for dashboard e2e tests

* add basic tests for ui elements

* improve ui elements tests

* revert changes to main github workflow

* review fixes

---------

Co-authored-by: Your Name <you@example.com>

* Fix/67 normalizer child table behavior (#3048)

* add code to fix behavior of normalizer when None or primitives are encountered for child tables

(cherry picked from commit 5f442781d8c14592db646e3245c7c2a86ada3e3c)

* fixes one existing test that would not work with cached schema otherwise

* add tests and small fixes to dashboard

* fix implementation and add more tests

* Long names handled, get_nested_tables test, cached table lookups

* relational normalizer returns unshortened parent path

* Schema contract test added

---------

Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>

* Add redirect from dlt-plus page (#3084)

* docs: update staging dataset naming guidelines to prevent data loss

Added important notes and examples to the staging dataset configuration section, emphasizing the need for unique names between staging and final datasets to avoid `ValueError` and potential data loss during setup commands.

---------

Co-authored-by: David Scharf <shrps@posteo.net>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
2025-09-19 08:33:10 +02:00
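The staging-name validation described in 8d67e869ed can be sketched roughly as follows. This is an illustrative stand-in and not dlt's actual `create_dataset_names`: the helper name `make_dataset_names`, its signature, and the `{dataset_name}_staging` layout are all assumptions made for the example.

```python
from typing import Tuple


def make_dataset_names(
    dataset_name: str, staging_layout: str = "{dataset_name}_staging"
) -> Tuple[str, str]:
    """Hypothetical helper: return (final, staging) names and reject identical ones."""
    staging_name = staging_layout.format(dataset_name=dataset_name)
    if staging_name == dataset_name:
        # Identical names would make staging and final data share one dataset,
        # so cleanup of the staging dataset could remove final data.
        raise ValueError(
            f"Staging dataset name {staging_name!r} must differ from "
            f"the final dataset name {dataset_name!r}."
        )
    return dataset_name, staging_name
```

With the assumed default layout, `make_dataset_names("sales")` yields a distinct staging name, while a layout of `"{dataset_name}"` triggers the `ValueError`.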
Violetta Mishechkina
47460c2ef6 Add redirect from dlt-plus page (#3084) 2025-09-18 13:47:09 +02:00
David Scharf
c8b3da5b52 Fix/67 normalizer child table behavior (#3048)
* add code to fix behavior of normalizer when None or primitives are encountered for child tables

(cherry picked from commit 5f442781d8c14592db646e3245c7c2a86ada3e3c)

* fixes one existing test that would not work with cached schema otherwise

* add tests and small fixes to dashboard

* fix implementation and add more tests

* Long names handled, get_nested_tables test, cached table lookups

* relational normalizer returns unshortened parent path

* Schema contract test added

---------

Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
2025-09-18 09:57:05 +02:00
David Scharf
d143c29e35 Improve pipeline dashboard test coverage (#3091)
* disable most tests

* try correct windows command for running marimo e2e tests

* try without timeout

* test only launch marimo

* bump python version

* try install playwright deps

* fix e2e tests for dashboard on windows

* enable e2e tests for dashboard

* test macos 14 for dashboard e2e tests

* add basic tests for ui elements

* improve ui elements tests

* revert changes to main github workflow

* review fixes

---------

Co-authored-by: Your Name <you@example.com>
2025-09-17 19:58:18 +02:00
Andrei Bondarenko
31a9c64bc1 fix: convert local file path to posix before PUT to Databricks destination (#3086)
* fix: convert local file path to posix before PUT

* fix: formatting

---------

Co-authored-by: Andrei Bondarenko <bondarenko.andrei@DEME-GROUP.COM>
2025-09-17 09:54:10 +02:00
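The path conversion mentioned in 31a9c64bc1 can be illustrated with the standard library. This is a minimal sketch, assuming the problem is backslash-separated Windows paths ending up in a PUT statement; the helper name `to_posix_path` is hypothetical and the actual fix in dlt may be implemented differently.

```python
from pathlib import PureWindowsPath


def to_posix_path(local_path: str) -> str:
    """Hypothetical helper: rewrite a Windows-style path with forward slashes."""
    # PureWindowsPath parses both "\\" and "/" separators on any platform,
    # and as_posix() renders the result with "/" only.
    return PureWindowsPath(local_path).as_posix()
```

Paths that already use forward slashes pass through unchanged, so the conversion is safe to apply unconditionally.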
rik-adegeest
3939c4a491 Fix parameter reference in IncrementalCursorPathHasValueNone exception message (#3070) 2025-09-16 16:44:10 +02:00
David Scharf
431c6b6f48 add -s flag to read command in publish-library command in Makefile (#3089) 2025-09-16 13:30:03 +02:00
David Scharf
e0c6d2061b Improved pipeline attach command and Dashboard launcher extensions (#3060)
* prototype for remote attaching and launching dashboard against script path

* Revert "prototype for remote attaching and launching dashboard against script path"

This reverts commit 46edfa06de98d0ffa135a46b989593a9dd2fe1e8.

* sync_destination if pipeline not attachable

* add port argument to dashboard launcher

* add host arg to marimo

* incorporate review notes and test all edge cases

* wording fixes
2025-09-16 13:28:48 +02:00
Anton Burnashev
261efce2ef restclient: misc Paginators improvements (#2924)
* Adding keyword-only arguments separator to `RangePaginator`, `PageNumberPaginator`, and `OffsetPaginator`
* Use str instead of TJsonPath for body paths
* Enforce parameter validation
2025-09-16 13:16:16 +03:00
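The keyword-only separator mentioned in 261efce2ef is the bare `*` in a Python signature. The sketch below shows the general technique on a made-up class; `ExamplePaginator` and its parameters are hypothetical and do not reproduce the real signatures of `RangePaginator`, `PageNumberPaginator`, or `OffsetPaginator`.

```python
from typing import Optional


class ExamplePaginator:
    """Hypothetical paginator used only to illustrate keyword-only arguments."""

    def __init__(
        self,
        param_name: str,
        initial_value: int = 0,
        *,  # everything after this marker must be passed by keyword
        total_path: Optional[str] = None,
        maximum_value: Optional[int] = None,
    ) -> None:
        # Simple parameter validation: at least one stop condition is required.
        if total_path is None and maximum_value is None:
            raise ValueError("Provide total_path or maximum_value.")
        self.param_name = param_name
        self.initial_value = initial_value
        self.total_path = total_path
        self.maximum_value = maximum_value
```

Calling `ExamplePaginator("page", 1, maximum_value=5)` works, whereas passing a third positional argument raises a `TypeError`, which keeps call sites readable and lets parameters be reordered later without silently breaking callers.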
Thierry Jean
b428bfa3d2 repo(ci): disable docker container autorestart (#3083)
* disable docker container autorestart

* update compose with container_name
2025-09-15 12:50:13 +02:00
rudolfix
5236e15f2a documents standard http session settings, sets shorter timeouts in telemetry (#3074) 2025-09-11 21:02:45 +02:00
rudolfix
61625b047b do not change initial_value of incremental once state is created (#3075) 2025-09-11 21:02:13 +02:00
rudolfix
54ad6373e6 dashboard: fixes file opener on WSL (#3076)
* fixes file opener on WSL

* binds dremio container to 25.0 to avoid MAX() bug
2025-09-11 16:45:41 +02:00
David Scharf
71a2fbb3cb Merge pull request #3073 from dlt-hub/devel
Master Merge for 1.16 Release
1.16.0
2025-09-10 08:50:47 +02:00
David Scharf
90819a9618 Merge branch 'master' into devel 2025-09-10 06:32:48 +02:00
David Scharf
801eb285a2 bumps to version 1.16 (#3071) 2025-09-10 06:31:11 +02:00
David Scharf
5d29c0ded0 Dashboard updates and fixes (#3055)
* fix bug in child tables data browsing

* fixes streamlit launch, prevents streamlit launch after marimo launch

* disables trace json serialization

* removes streamlit hot reload cli flag

* fix smaller bugs and start adding parametrized tests to pipeline utils

* update cli docs

* parametrize utils tests with different pipeline types and states

* start fixing e2e tests

* change filesystem bucket url

* move example pipelines into separate folder

* extracts more helpers into utils
improves error handling and messaging

* add more tests and move sql query under utils exception wrapper

* final fixes to e2e test and add no destination pipeline to unit tests

* render mo tables in unit tests for applicable helper functions
use mo.json object view for state in all cases instead of yaml

* allow map_nested_in_place to also process keys
use this in trace sanitizing
use repr to keep nested hint keys and show a good string representation
add test case that makes sure traces of nested hints can be rendered

* update e2e tests to respect new json view of state

* remove cloning of dict from map_nested_in_place

* remove streamlit mentions and add marimo references in appropriate places

* update dashboard page and insert some images

* separate mapping function for nested keys and values

* update dashboard utils to new mapping function

* post merge fixes

* add dlt+ fix for backwards compatibility

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-09-09 16:01:02 +02:00
Thierry Jean
0e464a65fa feat(dataset): simplify public interface for dlt.Dataset and dlt.Relation (#3059)
* remove Protocols from public interface

* remove use of Protocols internally

moved docstring to implementation;
moved Ibis import to lazy location

* cleaned up dataset public interface

* cleaned up relation public interface

* formatting and linting

* fix recursion error

* fix docs reference

* linting / format

* added test to normalize and qualify queries

* added back .columns_schema; saved destination reference on dataset

* fix; format; lint

* added test for is_instance_lib()

* renamed .scalar() to .fetchscalar()

* renamed sqlglot_dialect to destination_dialect

* rename .cursor to private method

* left TODO for new exception class

* update docs snippets

* add back Relation.query_dialect property

* lint and format

* revert a small change that had unwanted side-effects

---------

Co-authored-by: David Scharf <shrps@posteo.net>
2025-09-09 12:45:59 +02:00
anuunchin
46a64ac4f4 Docs: Forcing root key propagation section improved (#3063)
* Root key propagation example improved
2025-09-08 19:29:52 +02:00
dat-a-man
3812c8f517 Updated resource docs on info for materializing empty tables (#2973)
* Updated resource docs on info for materializing empty tables

* Updated

* updated

* Updated

* Updated
2025-09-08 14:09:08 +02:00
anuunchin
7614bb757e Feat: dataset access telemetry (#3056)
* Dataset access telemetry tracker

* With dataset access telemetry delegated to a simple function

* on_first_dataset_access fun added to track.py
2025-09-04 21:16:09 +02:00
Alena Astrakhantseva
5442f2078f Release notes 1.15 (#3038)
* release highlights v1.15

* add to sidebar

* fix link

* fix snippet

* fix snippet

* remove bold font from titles

* remove diagram from table of content

* remove callbacks, change ai command
2025-09-03 14:23:27 +02:00
David Scharf
a75151e7e4 add up to date check for uv lockfile as first lint step (#3052)
* add check for uv lockfile

* update lockfile

* add some info about lockfiles to contributing.md
2025-09-02 17:09:10 +02:00
anuunchin
096d769828 Docs: Education notebooks formatted and linted (#3017)
* Formatted and linted ed content

* Notebook filenames lowercased, no special chars
2025-09-02 08:41:47 +02:00
Jinso-o
9e42cbd621 Jinso o fix/cors playground (#2995)
* Update playground.py

Replace REST demo with DummyJSON users (CORS-safe for Playground)

* update with actual use case

* feat: improve playground example with real API data

- Switch from dummyjson to JSONPlaceholder users API
- Add dev_mode=True for better development experience
- Fix function parameter (remove unused dlt parameter)
- Update table references from 'items' to 'users' consistently
- Fix test assertion to expect 10 users instead of 50
- Add write_disposition='replace' to prevent data duplication
- Improve data display with proper return statements

* feat(playground): make pipeline idempotent with refresh=True

* fix: remove problematic sqlite3 micropip install causing linting failure

* lint: ruff fix & format playground notebook

* Playground: updated assert

* feat: add improved Marimo with better optimization and formatting

* code edit : removed argument refresh to avoid users confusion

* feat: update playground to use customers pipeline

- Changed from users API (jsonplaceholder) to customers API (jaffle-shop)
- Updated resource name from 'users' to 'customers'
- Updated pipeline name from 'users_pipeline' to 'customers_pipeline'
- Updated all table references and test assertions
- Improved response handling with proper yield from pattern

* fix: resolve linting issues in playground notebook

- Fixed unused marimo import (auto-removed by ruff)
- Fixed unused variable 'con' by returning it from connect function
- Applied proper code formatting with ruff
- All tests pass locally

* Address review feedback: update yield and add limit parameter

- Change 'yield from response.json()' to 'yield response.json()' as requested by reviewer
- Add ?limit=100 parameter to API call for consistent results
- Update assertion to expect exactly 100 customers (== 100)
- Addresses feedback from AstrakhantsevaAA in PR #2995
2025-09-01 15:46:57 +02:00
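The review change in 9e42cbd621 from `yield from response.json()` to `yield response.json()` alters what the generator emits. The sketch below shows only the plain Python difference; the `PAGE` data and both function names are made up for illustration, and how a consumer treats a yielded list is outside its scope.

```python
from typing import Any, Dict, Iterator, List

# Stand-in for a decoded JSON response body (hypothetical data).
PAGE: List[Dict[str, Any]] = [{"id": 1}, {"id": 2}, {"id": 3}]


def rows_one_by_one(page: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # `yield from` delegates to the list and emits each row separately.
    yield from page


def rows_as_page(page: List[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    # A plain `yield` emits the whole list once, as a single page of rows.
    yield page
```

Iterating `rows_one_by_one(PAGE)` produces three dictionaries, while `rows_as_page(PAGE)` produces one list containing all three.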
David Scharf
4af60c80e6 fix grammar on timezone specific docs sections (#3044) 2025-09-01 15:36:06 +02:00
rudolfix
823bf3865f fully support naive and tz-aware timestamp/time data types (#2570)
* adds databricks timestamp NTZ

* improves error messages in pyarrow tuples to arrow

* decreases timestamp precision to 6 for mssql

* adds naive datetime to all data types case, enables fallback when testing destinations not supporting it

* other test fixes

* always stores incremental state last value as present in the data, tests tz-awareness edge cases

* fixes ntz timestamp tests

* fixes sqlalchemy destination to work with mssql

* adds func to current module to get current resource instance

* generates LIMIT clause in sql_database when limit step is present

* adds basic tests for mssql in sql_database

* adds docs on tz-awareness in datetime columns in sql_database

* adds naive and tz aware datetimes to destination caps, implements for various destinations

* caches dlt type to python type conversion

* normalizes timezone handling in timestamp and time data types, fixes remaining pendulum timezone problems, applies tz/non-tz preserving methods when necessary, improves test coverage

* fixes incremental and lag so they always follow the tz-awareness of the data under cursor column, fixes pendulum tz problems, adds tests

* moves schema inference and data coercion from Schema to item_normalizers, applies timezone normalization to json data, adjusts new columns to destination caps for json data, tests

* casts timezones in arrow table normalizations, datetime and time cases in row tuples to arrow, refactors to get generic method to cast tables to dlt schemas, tests

* tracks resource parent, along pipe parent, fixes resource cloning when adding to source, fixes source and resource iterators, makes sure that list of extracted resources always includes implicit and explicit resources

* updates dbapi sql client for dremio

* adjust column schema inferred from arrow to destination caps in extractor, tests

* moves schema and data setup for all data types tests to common code

* adds option to exclude columns in sql_table, uses LimitItem to generate LIMIT statements, tests incl. proper cursor tests for naive/tz aware incremental cursor columns

* tests sql_database on mssql for all data types and incremental cursor on dates

* improves tests for row tuples to arrow with cast to dlt schema, tests for naive datetimes

* improved test for timestamps and int with precision on duckdb

* disables Python 3.14 tests and dashboard test on mac

* better maybe transaction in job client: takes into account ddl and regular transaction destination caps

* pyodbc py3.13 bump
2025-08-31 20:06:22 +02:00
Thierry Jean
0d90a83b8d repo: add ruff check for linting (#2967)
* Config ruff `check` 

* Add `ruff` to existing `flake8` linting for transition period
2025-08-29 11:13:26 -04:00
Thierry Jean
b29f33c5ed feat: dlt widgets for marimo (#3021)
* added marimo widget + tutorial

* load package inspector added

* added schema inspector

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-27 13:25:23 -04:00