* feat: implement advanced Iceberg partitioning with explicit ordering
- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios
Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format
* Port iceberg_partition and build_iceberg_partition_spec to dlt core
* update type hint in IcebergLoadFilesystemJob
* Add tests for Iceberg advanced partitioning; remove unused partition extraction code
* Add docs for iceberg_adapter
---------
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
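
The ordering and priority rules listed in the Iceberg commit above can be sketched roughly as follows. This is a minimal illustration, not the dlt or PyIceberg implementation: the dict layout of the column hints, the `ordered_partitions` helper, and the reading of "overrides" as "legacy columns are ignored once any advanced spec exists" are all assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical column hints; the exact layout is assumed for illustration only.
columns: Dict[str, Dict[str, Any]] = {
    "region": {"partition": True},  # legacy boolean syntax
    "created_at": {"partition": {"transform": "day", "index": 1, "name": "created_day"}},
    "user_id": {"partition": {"transform": "bucket", "index": 0}},
}


def ordered_partitions(cols: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (partition_name, source_column, transform) tuples in final order."""
    hinted = {name: hint["partition"] for name, hint in cols.items() if hint.get("partition")}
    if not hinted:
        # early exit for schemas without any partition hints
        return []
    advanced = [(name, spec) for name, spec in hinted.items() if isinstance(spec, dict)]
    if advanced:
        # assumed priority rule: advanced specs win over legacy `partition: True`
        advanced.sort(key=lambda item: item[1]["index"])
        return [(spec.get("name", col), col, spec["transform"]) for col, spec in advanced]
    # legacy fallback: identity transform in declaration order
    return [(name, name, "identity") for name in hinted]


print(ordered_partitions(columns))
```

Sorting on an explicit `index` keeps the partition order stable even if the columns are declared in a different order.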
* bumps to version 1.20.0
* update the hub reference docs, add CI check
* use dependency specifier in hub for plugin version check
* minimum dlt runtime cli check
* rolls back to old fsspec min version
* fixes test_hub ci workflow
* fixes flaky test
* bumps hub extra
* updates cli docs linting
* fixes docs lock
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
* adds hub extra
* makes hub module more user friendly when hub not installed
* test and lint fixes
* adds plugin version check util function
* basic cell appearing if installed
* use data quality cell
* show raw data too
* adds dlt-runtime to hub extra, minimal import tests
* bumps to dlthub 0.20.0 alpha
* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default
* adds configured profiles method on context so only profiles with configs or pipelines are listed
* adds list of locations that contained actual configs to provider interface
* improves workspace and profile commands
* test fixes
* fixes tests
* update text
* adds quality widget as python functions
* adds data_quality as module to hub
* adds hub extra to docs deps
* fixes dashboard imports
* bumps to alpha x.20.0a1
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* extracts adbc parquet load job with file format selector
* ports postgres parquet job to base job
* implements mssql adbc job
* adds pickle test for all destination caps
* adds dbc to adbc group, updates test workflow
* fixes sqlglot from find
* fixes docs
* adds sqlalchemy adbc docs
* adds support for sqlite and mysql in sqlalchemy
* fixes and tests str annotation resolving
* allows to disable adbc and does that in tests
* fixes imports
* docs lock bump
* fixes globalns extraction
* clarifies how adbc drivers are installed, implements fallback for postgres
* improves dashboard multi schema test
* fixes followup jobs
* fixes connection string escaping
* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md
Co-authored-by: djudjuu <djudju@proton.me>
* removes code dedup
* fixes columns that receive None, simple and nested values
---------
Co-authored-by: djudjuu <djudju@proton.me>
* adds option in load that prevents draining pool on signal
* adds runtime pipeline option to not intercept signals
* refactors signal module
* tests new cases
* describes signal handling in running in prod docs
* bumps dlt to 1.18.0
* fixes tests forked
* removes logging and buffered console output from signals
* adds retry count to load job metrics, generates started_at in init of runnable load job
* allows to update existing metrics in load step
* finalized jobs require start and finish dates
* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics
* fixes remote metrics
* replaces event with package bound semaphore to complete load jobs early
* fixes dashboard on windows
* improves signals docs
* renames delayed_signals to intercepted_signals
* Feature, Add support of http based paths
* Feature, Add support of http resources
* Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format
* Feature, Add cloudfront base_url to the configurations
* Feature, Add a test for http based resources
* Feature, Add a test case for RFC 1123 datetime format
* Feature, Remove test cases related to datetime parsing in RFC and timestamp formats
* Revert "Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format"
This reverts commit 142624b24a.
* Feature, Restore the structure of the url for the cdn
* Feature, Replace custom datetime parser function with a single dispatched one
* Feature, Add a stub package for singledispatch
* Feature, Refactor pendulum datetime processing functions
* Feature, Fix the linting errors in time related tests
* Feature, Fix the declaration
* Feature, Revert the changes related to datetime parsing
* Feature, Add http schema for testing. Add pendulum parser to support RFC 1123 format
* Feature, Update the configuration for http bucket
* Feature, Add a http server. Update the test for http fs
* Feature, Upgrade fsspec
* Feature, Fix codestyle
* Feature, Fix the protocol validation for fsspec args
* Feature, Fix the typing annotations
* Add an example for http filesystem
* Feature, Add schema to the urlparse call
* Feature, Fix the codestyle for http entries in MIME_DISPATCH
* Feature, Expand the list of supported locations in the docs
* uses more random port and closes httpd to release it properly, drops auto fixture as it would be attached to all tests
* moves httpd tests to common tests
* adds http extra to support fsspec
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
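
For context on the RFC 1123 entries above: HTTP servers report timestamps such as `Last-Modified` in this format. The sketch below parses one with the Python standard library. It is a stand-in for illustration, not the pendulum-based parser the commits refer to, and `parse_http_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value: str) -> datetime:
    """Parse an RFC 1123 date string and normalize it to UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # assumption: a date without a usable offset is treated as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


modified = parse_http_date("Wed, 21 Oct 2015 07:28:00 GMT")
print(modified.isoformat())
```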
* does not fail config resolution if native value is provided to a config that does not implement native values
* updates databricks docs
* allows to replace hints regexes on schema
* removes partition hint on eth merge test on databricks
* adds pokemon table count consts
* reorgs databricks dlt fix
* fixes lancedb custom destination example
* reduces no sql_database examples run on ci
* fixes merge
* marks and skips rfam tests
* Refactor: Replace hexbytes dependency with custom HexBytes implementation
* Removed the hexbytes library and integrated a custom HexBytes class to ensure compatibility with the codebase.
* Updated imports across multiple files to use the new HexBytes class.
* Added tests for the HexBytes class to validate its functionality and ensure proper behavior with various input types.
* Update hexbytes error handling test to reject lists as input type
* Remove TypeError test for unsupported list input in HexBytes error handling
* Refactor: Improve formatting of hex method in HexBytes class for better readability
* Refactor: Clean up comments and improve readability in hex method of HexBytes class
* Refactor: Rename methods in HexBytes class for clarity and consistency
* Updated method names from `to_bytes` to `_to_bytes` and `hexstr_to_bytes` to `_hexstr_to_bytes` to indicate their private nature.
* Adjusted method calls within the class to reflect the new names, enhancing code readability and maintainability.
* Removed support for bool and int types in HexBytes constructor, streamlining input handling, and introduced a new fromhex method to create HexBytes from hex strings, improving clarity.
* Remove hexbytes dependency from lockfile and related configurations
* Enhance hex method in HexBytes class to support custom separators and bytes per separator. This improves flexibility in hex encoding output while maintaining the existing functionality.
* Refactor hex method in HexBytes class to improve parameter handling and readability. Updated the method signature to clarify the use of custom separators and bytes per separator, ensuring consistent behavior with existing functionality.
* Update hex method in HexBytes class to remove unnecessary noqa comments, enhancing code clarity and consistency.
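
The HexBytes entries above describe a small `bytes` subclass. A minimal sketch of such a class, following the method names in the commits (`_to_bytes`, `_hexstr_to_bytes`, `fromhex`, `hex`), could look like this. It is not the dlt implementation: the accepted input types, the optional `0x` prefix handling, and the unprefixed `hex()` output are assumptions.

```python
from typing import Optional, Union


class HexBytes(bytes):
    """Illustrative bytes subclass that also accepts hex strings."""

    def __new__(cls, value: Union[bytes, bytearray, memoryview, str]) -> "HexBytes":
        return super().__new__(cls, cls._to_bytes(value))

    @staticmethod
    def _to_bytes(value: object) -> bytes:
        if isinstance(value, (bytes, bytearray, memoryview)):
            return bytes(value)
        if isinstance(value, str):
            return HexBytes._hexstr_to_bytes(value)
        # bool and int are rejected on purpose, as in the commits above
        raise TypeError(f"Cannot convert {type(value).__name__} to HexBytes")

    @staticmethod
    def _hexstr_to_bytes(hexstr: str) -> bytes:
        # assumption: an optional "0x" prefix is tolerated
        digits = hexstr[2:] if hexstr[:2].lower() == "0x" else hexstr
        return bytes.fromhex(digits)

    @classmethod
    def fromhex(cls, hexstr: str) -> "HexBytes":
        return cls(cls._hexstr_to_bytes(hexstr))

    def hex(self, sep: Optional[str] = None, bytes_per_sep: int = 1) -> str:
        # delegate to the builtin, which supports separators since Python 3.8
        raw = bytes(self)
        return raw.hex() if sep is None else raw.hex(sep, bytes_per_sep)


value = HexBytes("0xdeadbeef")
print(value.hex(":", 2))
```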
* adds selective required context, checks profile support in switch_profile
* creates and tests hub module
* adds plugin version to telemetry
* renames imports in docs
* renames ci workflows
* fixes lint
* tests deploy command on duckdb
* moves cli module to workspace
* moves cli tests to workspace module
* renames fixtures, rewrites fixture to patch run context to _storage
* allows to patch global dir in workspace context
* when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set
* imports git utils only when need to clone package in dbt runner
* runs workspace tests as part of common
* fixes tests, config tests sideeffects
* moves dashboards to workspace
* fixes pipeline trace test
* moves dashboard helper tests
* excludes additional secret files and pinned profile from gitignore
* cleans up hatchling files in pyproject
* fixes dashboard running tests in ci
* moves git module to libs
* diff fix
* fixes fixture names
* ports toml config provider with profiles
* supports run context with profiles
* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging
* implements workspace run context with profiles and basic cli
* displays workspace name and profile name before executing cli commands if run context supports profiles
* exposes dlt.current.workspace()
* converts run context protocol into abstract class
* fixes plugins tests
* refactors _workspace: private and public modules
* adds workspace test cases
* launches workspace and pipeline mcp with cli, sse by default
* tests basic workspace behaviors
* refactors code to switch context and profile
* adds default profile to run context interface
* ports pipeline and oss mcp, changes derivation structure
* adds safeguards and tests to workspace cleanup cli helper
* adds run_context to SupportsPipeline, checks run_context change on pipeline activation
* adds mcp dependency to workspace extra, fixes types
* renames test fixture
* mcp export tweak
* updates cli reference and common ci workflow
* disables dlt-plus deps in ci
* removes df from mcp tools, fixes workspace tests
* fixes tests
* move duckdb capabilities to utility function
* add basic DuckLake files based on DuckDB / Motherduck
* refactor ducklake config
* wip; ducklake destination
* simplified testing
* ignore ducklake files
* completed default config; TODO fix write
* unicode issues
* commented out patches
* lint
* uses destination_type as final fallback when creating default local file names, allows to copy local file context in WithLocalFiles
* creates connection pool for duckdb
* fixes exception handling in open_connection in sql_client, fixes racing when connections opened in duckdb, improves error handling if commit tx fails
* handles ducklake attach/detach in sql_client
* modifies ducklake configuration to: (1) use sqlite as default catalog (2) point all local files to local_dir (3) allow various urls to configure ducklake name (4) use parquet as default file format
* adjust caps to execute load jobs sequentially for duckdb and sqlite catalogs
* passes ducklake conn to ibis, improves how duckdb conn is passed (via open_connection which provides full context)
* adds configuration and credential tests, smoke tests for supported catalogs
* enables ducklake on ci
* fixes ducklake imports
* fixes how secrets are created from filesystem
* generates remote_url in load job metrics with real url of the ducklake table
* tests for all buckets
* adds ducklake extra
* adds hints for secrets.toml gen
* implements cursor for ducklake with correct df vector size
* forces use of ducklake/duckdb datasets in ibis handover, tests non existing dataset behavior
* removes dashboard e2e from common tests on ci
* docs WIP
* implements field resolution check and recursive copy for base configuration
* copies credentials before using as default when resolving capabilities
* allows recursive resolution traces in config field missing exception
* improves config resolve: collects traces recursive, keeps resolving if embedded config fails, collects resolved keys
* decouples connection string credentials and base duckdb credentials
* improves how duckdb handles exceptions when executing query
* makes catalog name explicit in ducklake credentials, creates default db and storage folder names after it
* supports ducklake partitioning on duckdb 1.4
* supports metadata schema on postgres, adds experimental ducklake catalog support on Motherduck
* fixes union config resolve with single base config in union
* docs WIP
* enabled ducklake remote test
* improves ibis filesystem con handover, enables databricks
* fixes tests
* fixes lancedb default name
* propagates only top level config section, replaces with embedded field name in other cases
* adds tests and examples for programmatic creation of ducklake factory
* adds merge selector in duckdb caps to enable upsert on 1.4
* ducklake code cleanups
* makes sure pipeline is dropped before run_context goes out of scope
* finalizes ducklake docs
* fallback in duckdb merge selector if duckdb not installed
* propagates persist_secret flag in filesystem sql client
* fixes tests and ci
* runs remote ducklake on local postgres catalog for low latency
* uses packaging version, not semver for python packages comparisons
* Update docs/website/docs/dlt-ecosystem/destinations/duckdb.md
* fixes recursive re-raise in sql_client
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
* adds databricks timestamp NTZ
* improves error messages in pyarrow tuples to arrow
* decreases timestamp precision to 6 for mssql
* adds naive datetime to all data types case, enables fallback when testing destinations not supporting it
* other test fixes
* always stores incremental state last value as present in the data, tests tz-awareness edge cases
* fixes ntz timestamp tests
* fixes sqlalchemy destination to work with mssql
* adds func to current module to get current resource instance
* generates LIMIT clause in sql_database when limit step is present
* adds basic tests for mssql in sql_database
* adds docs on tz-awareness in datetime columns in sql_database
* adds naive and tz-aware datetimes to destination caps, implements for various destinations
* caches dlt type to python type conversion
* normalizes timezone handling in timestamp and time data types, fixes remaining pendulum timezone problems, applies tz/non-tz preserving methods when necessary, improves test coverage
* fixes incremental and lag so they always follow the tz-awareness of the data under cursor column, fixes pendulum tz problems, adds tests
* moves schema inference and data coercion from Schema to item_normalizers, applies timezone normalization to json data, adjusts new columns to destination caps for json data, tests
* casts timezones in arrow table normalizations, datetime and time cases in row tuples to arrow, refactors to get generic method to cast tables to dlt schemas, tests
* tracks resource parent, along pipe parent, fixes resource cloning when adding to source, fixes source and resource iterators, makes sure that list of extracted resources always includes implicit and explicit resources
* updates dbapi sql client for dremio
* adjust column schema inferred from arrow to destination caps in extractor, tests
* moves schema and data setup for all data types tests to common code
* adds option to exclude columns in sql_table, uses LimitItem to generate LIMIT statements, tests incl. proper cursor tests for naive/tz aware incremental cursor columns
* tests sql_database on mssql for all data types and incremental cursor on dates
* improves tests for row tuples to arrow with cast to dlt schema, tests for naive datetimes
* improved test for timestamps and int with precision on duckdb
* disables Python 3.14 tests and dashboard test on mac
* better maybe transaction in job client: takes into account ddl and regular transaction destination caps
* pyodbc py3.13 bump
* Add pkey, disabled_algorithms, transport_factory and auth_strategy parameters to paramiko.client.connect. Also update filesystem docs for SFTP creds
* Move paramiko imports after the pytest.skip
---------
Co-authored-by: Ayush Patel <Ayush.Patel@imc.com>
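
The incremental entries above say that cursor bounds follow the tz-awareness of the data under the cursor column. One way to picture that rule is the sketch below. `align_to_data` is a hypothetical helper, not dlt code, and treating naive values as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone


def align_to_data(bound: datetime, sample: datetime) -> datetime:
    """Make `bound` naive or tz-aware to match a sample value from the data."""
    data_is_aware = sample.tzinfo is not None
    bound_is_aware = bound.tzinfo is not None
    if data_is_aware and not bound_is_aware:
        # assumption: naive bounds are interpreted as UTC
        return bound.replace(tzinfo=timezone.utc)
    if bound_is_aware and not data_is_aware:
        # convert to UTC first, then drop the offset to get a naive value
        return bound.astimezone(timezone.utc).replace(tzinfo=None)
    return bound


naive_row = datetime(2024, 1, 1, 12, 0)
aware_bound = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
# the aligned bound can now be compared with the naive row without a TypeError
print(align_to_data(aware_bound, naive_row) < naive_row)
```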
* bumps to version 1.15.0
* handled duckdb 1.3.2 in iceberg scanner and bumps dev version - seems to work with adlfs
* binds old dev duckdb on windows until segfault is fixed
* test fixes, docs update
* replace arrow2 with arrow backend for connectorx
* updated docs/
* updated minimal deps
* update docs and pyproject.toml deps
* updated minimal deps to support 3.9
* converts +00:00 to UTC right after handover from connectorx
* fixes examples connectorx lint
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* dbml WIP
* dbml exporter
* full reference support; Schema.to_dbml()
* revert uv.lock changes
* fixed condition for _dlt tables ref
* rename _dbml.py to private module; use json encoder
* use TStoredSchema as entrypoint
* implementation completed
* added documentation
* added CLI support
* support unknown data type
* please the linter gods
* minified the image
* enables dbml for schema export
* image link to bucket; renamed constant
* updated docs linting
* include recommended VSCode extension
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* enable 3.14 with orjson branch
* make example plugin a uv project
* post rebase pyproject update
* fix one dependency
update readme
* update readme about python 3.14
* run full linter step on docs changes
* disable dashboard e2e tests on 3.11
enable dashboard e2e and unit tests on 3.13
* bump marimo min dependency
* Revert "Auxiliary commit to revert individual files from 52165eaeeb543932bc917bb5efc373c02ab2937b"
This reverts commit b7c5baf7c0c51e67ad323cd1b2cb9423f48f4165.
* re-lock changes
* revert incorrect change in secrets toml
* adds dlt workspace extra, updates exception and github workflows
* renames app from "marimo app" to "pipeline dashboard"
updates --marimo flag to --dashboard
* rename studio folders to dashboard
* removes all other references to studio
* exclude lockfile and markdown files from lfs
* update workspace extra dependency versions
* bump version
* bump to latest lancedb
* do not pass api-key to embedding_func, align schema for orphan deletion
* bump lancedb
* updated example
* use pyarrow helpers in type mapper
* removes code duplication from lancedb_client, moves jobs to a separate module
* sets nullability, fixes schema on merge to include vector column if not added by the user, removes nullability on auto-embed columns in adapter
* read vector field from config
* fix nullability test hint
* unit test add_vector_column
* more specific ValueError parsing
* no longer accept value error when opening table
* schema alignment test next versions
* no fusion datatype typecasting
* refactor
* problems with json loading
* test fixes
* fixes column normalization when reading existing schema
* warn against orphan removal without settings
* added docs
* todos, check for merge-disposition
* fixed missing load tests
* fixed tests
* fixed multiple merge keys condition
* pyarrow precision types
* remove unused code
* added max precision in LanceDB tests
* remove arrow to fsiont_tupe tests
* refactor
* prepare_load_table in orphan removal job
* documentation update
* refactor
* adds method to get dict of non-default values from configuration
* moves parquet and csv format configuration from data writers to destination
* adds parquet format to destination caps to allow lancedb to have custom settings
* adds more lancedb configs, moves connect method to credentials, allows lancedb client to be passed instead of creds
* forces arrow list struct to be saved in parquet, not the parquet default
* looks for row key only for merge disposition
* moves fill_empty_source_column_values_with_placeholder to pyarrow helper
* tests bring own vector and explicit client as credentials
* ignores lancedb in mypy.ini
* adds missing docs
* deprecates file format configs in data writers
* fix unit tests for add_vector_column
* adjust example code to updated lancedb exceptions
* skip lancedb example (because running on fork breaks)
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: MOLKA ZHANI <molka@dlthub.com>
* moves source state handling to extract, uses contextvars to propagate current pipe context, does not store last state in global var
* implements thread pool with shutdown timeout, adds warning when threads do not join, switch default method to spawn if in orchestrator
* detects prefect, dagster and marimo in telemetry
* propagates pipe context in pipe iterator using contextvars
* cleans up dlt.current module
* enables running in wasm/pyodide
* bumps for 1.12.4a0 wasm release
* Update tests/common/runners/test_runners.py
Co-authored-by: djudjuu <djudju@proton.me>
---------
Co-authored-by: djudjuu <djudju@proton.me>
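
The contextvars entries above replace a global variable with context-local state. The general pattern can be sketched with the standard library alone. The names `current_pipe`, `read_pipe_name` and `run_in_worker` are hypothetical and only illustrate the technique, not dlt's internals.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# context-local replacement for a module-level global
current_pipe = contextvars.ContextVar("current_pipe", default=None)


def read_pipe_name() -> Optional[str]:
    """Code deep in the call stack reads the value without it being passed in."""
    return current_pipe.get()


def run_in_worker(pipe_name: str) -> Optional[str]:
    """Set the pipe name, then run a reader in a pool thread with a copied context."""
    token = current_pipe.set(pipe_name)
    try:
        # an explicit copy makes the value visible in the worker thread
        ctx = contextvars.copy_context()
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(ctx.run, read_pipe_name).result()
    finally:
        # restore the previous value so nothing leaks to later callers
        current_pipe.reset(token)


print(run_in_worker("users"))
```

Resetting the token in `finally` restores the previous value, which is what a shared global cannot guarantee.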
* feat(athena): apply lakeformation tags on database
* uses credentials from destination to create tags, creates tags together with schema migration
* extracts athena sql_client to a separate module
* fixes lakeformation tests to use ci credentials and to run tests selectively
* adds snippet lang in docs
---------
Co-authored-by: Alexander Hagelborn <alex@datadao.se>
* make dlt app ejectable
* update app file url in makefile and tests
add missing stylesheet to package
* start marimo app in process
* convert caching toggle to button for clearer use
* exclude incomplete columns
* adds a bunch of tests for marimo app utils
* make normalized query output pretty and disable tests on 3.9
* filter out incomplete tables
* update cli strings and small changes to app ejection
* run all common tests with resolution-lowest on sync
* make model item normalizer tests pass, disable on time test for now
* fix duckdb instantiation for old versions
bump pyarrow to have version that supports "append_column" on recordbatch
exclude deltalake tests for too low pyarrow versions
* fixes errors in makefile
bump minimum pytest version to what was in lockfile
* bump pendulum min requirement
* fix common test file
* bump ibis dependency
* go back to old version of pendulum
bump to prerelease