* feat: implement advanced Iceberg partitioning with explicit ordering
- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios
Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format
* Port iceberg_partition and build_iceberg_partition_spec to dlt core
* update type hint in IcebergLoadFilesystemJob
* Add tests for Iceberg advanced partitioning; remove unused partition extraction code
* Add docs for iceberg_adapter
---------
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
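The ordering and priority rules described in this commit can be sketched in plain Python. This is only an illustration under stated assumptions, not the code that was merged: the function name `resolve_partition_order` and the hint keys `transform` and `name` are hypothetical, and only the `index` property and the legacy `partition: True` form are taken from the notes above.

```python
from typing import Any, Dict, List, Tuple

# Transforms named in the commit notes, plus identity for untransformed columns.
ALLOWED_TRANSFORMS = {"identity", "year", "month", "day", "hour", "bucket", "truncate"}


def resolve_partition_order(
    columns: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str, str]]:
    """Return (partition_name, source_column, transform) in explicit order.

    Hypothetical hint shape: `partition` is True (legacy), a dict, or a list
    of dicts with optional `transform`, `index` and `name` keys.
    """
    advanced: List[Tuple[int, str, str, str]] = []
    legacy: List[Tuple[str, str, str]] = []
    for column, hints in columns.items():
        spec = hints.get("partition")
        if spec is True:
            legacy.append((column, column, "identity"))
            continue
        if isinstance(spec, dict):
            specs = [spec]
        elif isinstance(spec, list):
            specs = spec
        else:
            continue  # column is not partitioned
        for item in specs:
            transform = item.get("transform", "identity")
            if transform not in ALLOWED_TRANSFORMS:
                raise ValueError(f"unknown transform {transform!r} on {column!r}")
            name = item.get("name", f"{column}_{transform}")
            advanced.append((item.get("index", 0), name, column, transform))
    if advanced:
        # Advanced specs take priority: legacy `partition: True` hints are ignored.
        advanced.sort(key=lambda entry: entry[0])  # stable sort on `index`
        return [(name, column, transform) for _, name, column, transform in advanced]
    return legacy  # empty list when nothing is partitioned
```

With a schema that mixes both styles, only the advanced specs survive, ordered by `index`; a schema with no partition hints returns an empty list, which corresponds to the early exit mentioned above.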
* bumps to version 1.20.0
* update the hub reference docs, add CI check
* use dependency specifier in hub for plugin version check
* minimum dlt runtime cli check
* rolls back to old fsspec min version
* fixes test_hub ci workflow
* fixes flaky test
* bumps hub extra
* updates cli docs linting
* fixes docs lock
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
* working copy of docs
* added diagram; wip
* checkpoint
* Misc docusaurus fixes
* Remove placeholder text and whitespace
* Move images to the gcp bucket
* add data quality section
* fixed linting
* Escape curly braces
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
* adds hub extra
* makes hub module more user friendly when hub not installed
* test and lint fixes
* adds plugin version check util function
* basic cell appearing if installed
* use data quality cell
* show raw data too
* adds dlt-runtime to hub extra, minimal import tests
* bumps to dlthub 0.20.0 alpha
* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default
* adds configured profiles method on context so only profiles with configs or pipelines are listed
* adds list of locations that contained actual configs to provider interface
* improves workspace and profile commands
* test fixes
* fixes tests
* update text
* adds quality widget as python functions
* adds data_quality as module to hub
* adds hub extra to docs deps
* fixes dashboard imports
* bumps to alpha x.20.0a1
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* a tracker that sends pipeline trace and schemas to a bucket is activated when RUN_ID and workspace context are present
* a sync step is executed under the conditions above when workspace dashboard starts
* improves deployment packager (hash computation)
* fixes historic builds
* fix broken link
* constrain docs build env to python 3.10
* switch snippets testing to python 3.10
* allows python up to py3.12 in docs project
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* extracts adbc parquet load job with file format selector
* ports postgres parquet job to base job
* implements mssql adbc job
* adds pickle test for all destination caps
* adds dbc to adbc group, updates test workflow
* fixes sqlglot from find
* fixes docs
* adds sqlalchemy adbc docs
* adds support for sqlite and mysql in sqlalchemy
* fixes and tests str annotation resolving
* allows disabling adbc and does that in tests
* fixes imports
* docs lock bump
* fixes globalns extraction
* clarifies how adbc drivers are installed, implements fallback for postgres
* improves dashboard multi schema test
* fixes followup jobs
* fixes connection string escaping
* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md
Co-authored-by: djudjuu <djudju@proton.me>
* removes code dedup
* fixes columns that receive None, simple and nested values
---------
Co-authored-by: djudjuu <djudju@proton.me>
* add support for snowflake clustering key modifications
* add cluster column order test case
* update snowflake cluster hint docs
* switch to reading snowflake cluster hints from table schema
* improves dashboard multi schema test
* closes and waits for sections in multi-schema test
* removes command line snippet with generic text in exceptions
* disables transformers pokeapi test
* split home and workspace render methods
* header row dry-er
* catch-all errors in home()-cell
* local try-catch for broken traces
* e2e test for broken trace
* removes this
* shows navigation on pipeline attach error
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* make arrow_stream default return_type for connectorx backend
* formatting
* bump connectorx version
* return to arrow by default, keep arrow_stream support, add info message
* document arrow_stream corner cases in the docs
* add the test for connectorx arrow_stream return type
* fix formatting
* fix test typo
* fix the tests
* fix package version check, return original version constraint
* adds utils function to losslessly cast date64 to timestamp[us]
* cast date64 to timestamp for connectorx, update test
---------
Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* updated the sql databases configuration docs
* Updated sql database and table sources as well
* updated
* Updated
* Updated docstrings for defer_table_reflect parameter in SQL Database source.
* Updated
* adds tools to generate api reference for workspace
* writes install, mcp, api reference and improves other docs in hub
* Apply suggestions from code review
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
* fixes free tier
---------
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
* Minor hub docs polishing
* fixes workflow setup wrt not running certain steps if there are only docs changes
* Remove the duplicate content
* Fix build
---------
Co-authored-by: David Scharf <shrps@posteo.net>
* adds option in load that prevents draining pool on signal
* adds runtime pipeline option to not intercept signals
* refactors signal module
* tests new cases
* describes signal handling in running in prod docs
* bumps dlt to 1.18.0
* fixes forked tests
* removes logging and buffered console output from signals
* adds retry count to load job metrics, generates started_at in init of runnable load job
* allows to update existing metrics in load step
* finalized jobs require start and finish dates
* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics
* fixes remote metrics
* replaces event with package bound semaphore to complete load jobs early
* fixes dashboard on windows
* improves signals docs
* renames delayed_signals to intercepted_signals
* use dlt.Dataset query normalization in _DltBackend
* pass dlt SQL cursor to _DltBackend instead of return values
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* implements signal handlers that allow graceful shutdown on a first signal. tests pipelines in forked tests
* includes KeyboardInterrupt in exception handlers in Pipeline to leave proper trace
* saves package state on each batch in custom destination
* initializes step in progress collectors
* Add new dlthub structure for docs (#3199)
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* fix: it should be destination (#3217)
* adds pokemon table count consts (#3232)
* fixes docstrings on signals
---------
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
Co-authored-by: Xiatong 夏童 <40656281+Magicbeanbuyer@users.noreply.github.com>
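The graceful shutdown behaviour described in this commit (finish cleanly on the first signal) can be sketched with the standard `signal` module. This is a minimal sketch under assumptions, not the merged implementation: the class `GracefulShutdown`, the helper `run_batches`, and the choice to abort with `KeyboardInterrupt` on a second signal are all hypothetical.

```python
import signal
from types import FrameType
from typing import Iterable, List, Optional


class GracefulShutdown:
    """Hypothetical helper: the first signal requests a stop, a second one aborts."""

    def __init__(self) -> None:
        self.stop_requested = False

    def handle(self, signum: int, frame: Optional[FrameType]) -> None:
        if self.stop_requested:
            # Assumption: a repeated signal means the user wants to abort now.
            raise KeyboardInterrupt(f"signal {signum} received twice, aborting")
        self.stop_requested = True

    def install(self) -> None:
        # signal.signal may only be called from the main thread.
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)


def run_batches(batches: Iterable[int], shutdown: GracefulShutdown) -> List[int]:
    """Process batches until done or until a stop was requested."""
    done: List[int] = []
    for batch in batches:
        if shutdown.stop_requested:
            break  # leave remaining work for the next run
        done.append(batch)  # stand-in for real work on one batch
    return done
```

Checking the flag between batches, rather than interrupting work in flight, is what lets state be saved at a consistent point before the process exits.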
* add support for yield_map in rest resource, add tests
* fix tests
* document usage of yield_map in rest_api resource
* add record count asserts in tests
* formatting
---------
Co-authored-by: ivasio <ivan@dlthub.com>
* Feature, Add support of http based paths
* Feature, Add support of http resources
* Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format
* Feature, Add cloudfront base_url to the configurations
* Feature, Add a test for http based resources
* Feature, Add a test case for RFC 1123 datetime format
* Feature, Remove test cases related to datetime parsing in RFC and timestamp formats
* Revert "Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format"
This reverts commit 142624b24a.
* Feature, Restore the structure of the url for the cdn
* Feature, Replace custom datetime parser function with a single dispatched one
* Feature, Add a stub package for singledispatch
* Feature, Refactor pendulum datetime processing functions
* Feature, Fix the linting errors in time related tests
* Feature, Fix the declaration
* Feature, Revert the changes related to datetime parsing
* Feature, Add http schema for testing. Add pendulum parser to support RFC 1123 format
* Feature, Update the configuration for http bucket
* Feature, Add a http server. Update the test for http fs
* Feature, Upgrade fsspec
* Feature, Fix codestyle
* Feature, Fix the protocol validation for fsspec args
* Feature, Fix the typing annotations
* Add an example for http filesystem
* Feature, Add scheme to the urlparse call
* Feature, Fix the codestyle for http entries in MIME_DISPATCH
* Feature, Expand the list of supported locations in the docs
* uses more random port and closes httpd to release it properly, drops auto fixture as it would be attached to all tests
* moves httpd tests to common tests
* adds http extra to support fsspec
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
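For context on the RFC 1123 datetime format mentioned in this commit: HTTP servers typically report modification times in this format, so an http filesystem has to parse it. The commit uses a pendulum-based parser; the sketch below deliberately swaps in the Python standard library (`email.utils`) to stay self-contained, and the helper names `parse_rfc1123` and `to_rfc1123` are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def parse_rfc1123(value: str) -> datetime:
    """Parse an RFC 1123 date such as 'Wed, 21 Oct 2015 07:28:00 GMT' into UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Assumption: treat zone-less values as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def to_rfc1123(moment: datetime) -> str:
    """Format an aware datetime back into the RFC 1123 form with a GMT suffix."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)
```

Normalizing to UTC on the way in keeps comparisons between remote timestamps and locally stored ones independent of the server's zone notation.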