1782 Commits

Author SHA1 Message Date
anuunchin
266052eb76 Docs: Converting Jupyter notebooks in education to marimo notebooks (#3068)
* Initial commit

* lesson_1_quick_start adjusted for marimo

* lesson_2_dlt_sources_and_resources_create_first_dlt_pipeline marimo

* Fundamentals course 3 improved

* Marimo badges added

* Fundamentals: course 8

* Marimo badge link fix

* Fundamentals: course 7

* Fundamentals: course 6

* Fundamentals: course 5

* Fundamentals: course 4

* Fundamentals: course 3

* Fundamentals: course 2

* Fundamentals: course 1

* marimo links corrected

* Inline deps

* Fundamentals: fix lesson 2

* Fundamentals: fix lesson 3

* Fundamentals: fix lesson 4

* Formatting moved to build-molabs

* Fundamentals: fix lesson 5

* Removal of scrolls

* Fundamentals: fix lesson 6

* Fundamentals: fix lesson 7

* Fundamentals: fix lesson 8

* os.environ replaced with dlt.secrets where relevant

* Advanced: fix lesson 5

* Advanced: fix lesson 9

* os.environ fixes

* Advanced: fix lesson 1

* Comments cleanup

* Additional comment removal, fix lesson 6 advanced

* Clean main makefile

* Get rid of constants.py

* Nicer json.loads()

* Better functions in preprocess_to_molab

* Tests for doc tooling funcs

* Validate molab command

* Marimo check added

* docs pages adjustment

* limits sqlglot in dev group until fixed

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-16 16:30:32 +01:00
Rakesh V.
34669f1ac7 Feat/iceberg advanced partitioning (#3053)
* feat: implement advanced Iceberg partitioning with explicit ordering

- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format

* Port iceberg_partition and build_iceberg_partition_spec to dlt core

* update type hint in IcebergLoadFilesystemJob

* Add tests for Iceberg advanced partitioning; remove unused partition extraction code

* Add docs for iceberg_adapter

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-12 10:57:56 +01:00
ivasio
99207237fe docs: add runtime docs to CLI reference (#3445)
* bumps to version 1.20.0

* update the hub reference docs, add CI check

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test

* bumps hub extra

* updates cli docs linting

* fixes docs lock

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-09 17:30:53 +01:00
Violetta Mishechkina
d0dc21bd45 Add runtime tutorial draft (#3449)
* Add tutorial draft

* lint: Line breaks in tutorial

* improves workspace and profiles docs

* moves snowflake docs

* fixes deprecated docusaurus broken links handlers

* updates docs lock

* Update the runtime part

* Final fixes

---------

Co-authored-by: elvis kahoro <github@elvis.ai>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-09 17:09:58 +01:00
Thierry Jean
d17b0cb93d docs: LLM workflow update (#3422)
* working copy of docs

* added diagram; wip

* checkpoint

* Misc docusaurus fixes

* Remove placeholder text and whitespace

* Move images to the gcp bucket

* add data quality section

* fixed linting

* Escape curly braces

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-09 16:56:39 +01:00
King Chung Huang
c4515d7112 Add offset/limit body_path fields to OffsetPaginatorConfig (#3260)
* Add offset/limit body_path fields
* Add offset/limit body_path
* Remove duplicate line

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-08 14:42:27 +01:00
Anton Burnashev
d0fb75b747 Skip examples requiring secrets on fork PRs (#3438)
Fixes CI failures for external contributors
2025-12-08 12:09:56 +01:00
djudjuu
289e00dece data quality checks cell in dashboard (#3413)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* basic cell appearing if installed

* use data quality cell

* show raw data too

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests

* update text

* adds quality widget as python functions

* adds data_quality as module to hub

* adds hub extra to docs deps

* fixes dashboard imports

* bumps to alpha x.20.0a1

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-07 12:59:21 +01:00
ivasio
e8d45369f1 implements run artifacts sync to a bucket using filesystem (#3339)
* a tracker that sends pipeline trace, schemas and trace to a bucket is activated when RUN_ID and workspace context are present
* a sync step is executed under the conditions above when workspace dashboard starts
* improves deployment packager (hash computation)
2025-12-04 15:48:39 +01:00
David Scharf
e5977c1ace Fixes historic builds (#3412)
* fixes historic builds

* fix broken link

* constrain docs build env to python 3.10

* switch snippets testing to python 3.10

* allows python up to py3.12 in docs project

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-02 16:51:41 +01:00
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
Thierry Jean
ff6d28185d docs: data_quality concept page (#3341)
* wrote data quality docs page
2025-11-26 10:18:11 -05:00
Will Russell
1ef1d37c0b Fix a few broken links on the Kestra page in the docs. It also updates the Docker image to use latest 2025-11-26 14:07:37 +01:00
Katharina Lenz
6c5e43218c docs/snowflake native app architecture docs (#3359) 2025-11-26 13:23:50 +01:00
anuunchin
7d7b7af00c docs: lifecycle of @dlt.hub.transformation and dlt.Relation (#3329)
* Lifecycle of a dlt transformation

* Added test to match lifecycle docs
2025-11-25 14:55:51 -05:00
Jorrit Sandbrink
9619002c04 feat: snowflake clustering key modifications (#3365)
* add support for snowflake clustering key modifications

* add cluster column order test case

* update snowflake cluster hint docs

* switch to reading snowflake cluster hints from table schema
2025-11-25 17:39:13 +01:00
Menna
1e73d678ff Refactor boundary timestamp handling in SqlMergeFollowupJob and SqlalchemyMergeFollowupJob to ensure current load package creation time is used when no boundary timestamp is provided. Update DltResourceHints class to streamline timestamp validation for active_record_timestamp and boundary_timestamp. Adjust tests accordingly. (#3378) 2025-11-25 17:34:11 +01:00
Thierry Jean
382eb6bab7 feat: Schema.to_mermaid() (#3364)
* Add dlt.Schema.to_mermaid() method

---------

Co-authored-by: jayant <jayant746@gmail.com>
2025-11-24 22:31:59 -05:00
rudolfix
661c6c1ada fix flaky dashboard tests (#3370)
* improves dashboard multi schema test

* closes and waits for sections in multi-schema test

* removes command line snippet with generic text in exceptions

* disables transformers pokeapi test
2025-11-24 22:52:36 +01:00
djudjuu
bbc1cb81cd fix: dashboard no longer crashes on broken home cell (#3348)
* split home and workspace render methods

* header row dry-er

* catch-all errors in home()-cell

* local try-catch for broken traces

* e2e test for broken trace

* removes this

* shows navigation on pipeline attach error

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-21 20:56:28 +01:00
rudolfix
c943d1c898 (docs) adds community destinations (#3326)
* adds community destinations

* Apply suggestions from code review

applies crate fixes

Co-authored-by: Andreas Motl <andreas.motl@elmyra.de>

---------

Co-authored-by: Andreas Motl <andreas.motl@elmyra.de>
2025-11-21 20:12:05 +01:00
Violetta Mishechkina
b08f2334a8 docs: update weaviate destination docs and version (#3352) 2025-11-20 15:45:00 -05:00
Anton Burnashev
fa06885fe2 Fix DocSearch v4 styles (#3338)
* Fix DocSearch v4 styles
* Fix search input styles for light and dark modes
2025-11-20 15:21:02 +01:00
David Scharf
4a5ffd82b3 Chore: Update docs npm dependencies and clean up docs build tooling (#3247)
* bump npm deps

* remove unneeded netlify redirects file

* remove unneeded lockfile

* remove another unneeded lockfile

* post rebase lockfile update

* remove old netlify command

* create new docs tools project and move api docs gen there

* tmp

* add uv to build docs workflow

* move docs pyproject

* re-org docs package and move snippet linter

* move notebook linting commands and deps to tools folder
add flake8 to tools linting

* remove unneeded files

* fix linting and formatting errors

* remove wrong file

* move docs processing script to new package

* fix gen api ref

* clean up package json and use commands from parent makefile

* update build website workflow

* move linting to docs makefile partially

* fix python version for docs project

* consolidate docs commands in docs makefile

* fix docs linter

* fully update docs test flow

* fixes some linting and dependency problems

* fix constants

* move notebook formatting to docs project

* fix lint embedded snippets

* fix examples tests

* add missing dependencies

* fix snippet linting

* add missing lint dependencies to core and missing test dependencies to docs

* add missing weaviate

* add missing regex module

* add forked dependency and updates readme file

* revert accidental change to example

* fix main linter

* * Move relevant pytest options to subproject
* Remove shims / path inserts that are now managed by pytest options
* Some typing fixes
* Clean up base project pytest ini
* Enable transformation snippets tests

* remove unneeded raw import of intro snippets

* downgrade alive progress

* uses dlt logger which also fixes internal alive error

* enables transformation snippets linting

* fixes dashboard races again

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-16 18:01:30 +01:00
ivasio
dc1a0467f8 Feat: support return_type = arrow_stream for connectorx backend (#3218)
* make arrow_stream default return_type for connectorx backend

* formatting

* bump connectorx version

* return to arrow by default, keep arrow_stream support, add info message

* document arrow_stream cornercases in the docs

* add the test for connectorx arrow_stream return type

* fix formatting

* fix test typo

* fix the tests

* fix package version check, return original version constraint

* adds utils function to losslessly cast date64 to timestamp[us]

* cast date64 to timestamp for connectorx, update test

---------

Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-14 19:57:06 +01:00
Martin Bach
4224e88c29 Docs: fix footer in dark mode, add scaffoldings link (#3309) 2025-11-14 13:15:49 +01:00
dat-a-man
7dfa61fc60 updated the sql databases configuration docs (#3107)
* updated the sql databases configuration docs

* Updated sql database and table sources as well which is nice

* updated

* Updated

* Updated docstrings for defer_table_reflect parameter in SQL Database source.

* Updated
2025-11-14 11:55:16 +01:00
Alena Astrakhantseva
f18f6b8d4a Update deploy-with-dagster.md (#3287) 2025-11-13 16:36:33 +01:00
dat-a-man
4fa832ee9e Add example to SQL docs: updated docs on how to filter rows using query_adapter_callback (#3253)
* Updated docs final

* Updating the section, making it LLM friendly

* Minor linting errors
2025-11-13 16:12:59 +01:00
ivasio
0243f95781 fix formatting (#3305)
Co-authored-by: ivasio <ivan@dlthub.com>
2025-11-12 16:56:44 -05:00
molkazhani2001
229f05f42f add init_replication description and required permissions (#3020)
* add init_replication description and required permissions

* add unit 3

* pg_replication cleaned up

* pg replication docs improved

* pg_replication.md made clearer

---------

Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
2025-11-10 16:14:54 +01:00
Aashish Nair
b71feca1b0 Marimo docs page: added quotations to pip install ibis-framework[duckdb] command (#3304) 2025-11-10 16:07:46 +01:00
Jay Jaisankar
83245608b0 Basics course - Reinitialize packages after exit() is called (#3300)
* Add package after exit()

* Add TDataItems module
2025-11-10 15:39:09 +01:00
rudolfix
928310aefb docs - improves hub docs (#3282)
* adds tools to generate api reference for workspace

* writes install, mcp, api reference and improves other docs in hub

* Apply suggestions from code review

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* fixes free tier

---------

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
2025-11-04 15:14:34 +01:00
Violetta Mishechkina
eb2d3a21fe Minor hub docs polishing (#3284)
* Minor hub docs polishing

* fixes workflow setup wrt not running certain steps if there are only docs changes

* Remove the duplicate content

* Fix build

---------

Co-authored-by: David Scharf <shrps@posteo.net>
2025-11-04 12:39:40 +01:00
Marcin Rudolf
9a2b7a7db0 fixes typo in docs intro 2025-11-03 13:31:40 +01:00
rudolfix
4a431d60ed refresh docs intro (#3270)
* renames pipeline to workspace dashboard

* refreshes intro

* review changes

* sidebar, references, dataset.table( cleanup
2025-10-31 17:14:49 +01:00
rudolfix
192296f4f8 fixes git import and enables tests (#3262)
* enable hub tests

* removes erroneous git import

* enables tests with importing dlt into minimal alpine container

* imports workspace modules on demand

* bumps dlt to version 1.18.1

* fixes mssql hub test on mac

* review fixes
2025-10-29 21:32:07 +01:00
David Scharf
cbf9db47c4 Fix docs deployment (#3266)
* install watchdog in install command

* remove unneeded file

* amend update versions to run outside of uv
2025-10-29 15:14:29 +01:00
rudolfix
c050556cc0 fixes installation and intro pages in hub (#3257) 2025-10-28 18:50:36 +01:00
rudolfix
e56f617c0e adds more signal options (#3248)
* adds option in load that prevents draining pool on signal

* adds runtime pipeline option to not intercept signals

* refactors signal module

* tests new cases

* describes signal handling in running in prod docs

* bumps dlt to 1.18.0

* fixes tests forked

* removes logging and buffered console output from signals

* adds retry count to load job metrics, generates started_at in init of runnable load job

* allows to update existing metrics in load step

* finalized jobs require start and finish dates

* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics

* fixes remote metrics

* replaces event with package bound semaphore to complete load jobs early

* fixes dashboard to on windows

* improves signals docs

* renames delayed_signals to intercepted_signals
2025-10-28 13:56:24 +01:00
Thierry Jean
718b636045 fix: .to_ibis() query normalization + docs update (#3225)
* use dlt.Dataset query normalization in _DltBackend

* pass dlt SQL cursor to _DltBackend instead of return values

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-27 13:50:04 -04:00
Violetta Mishechkina
38b0dec5a1 Add dlthub intro docs (#3241)
* Add dlthub intro

* Update with comments
2025-10-27 16:23:37 +01:00
Alena Astrakhantseva
73e861f850 add profiles (#3252) 2025-10-25 08:14:04 +02:00
Alena Astrakhantseva
b87923673f init pipeline in three ways page (#3222)
* init pipeline in three ways page

* add run pipelines and what is workspace

* move install workspace as step 0

* remove dashboard

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* resolve Violetta's comments

* fix lang in snippets

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* Update docs/website/docs/hub/init.md

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>

* fix links

* fix link

* move dashboard link on top

* add init to sidebar

---------

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
2025-10-24 17:25:49 +02:00
rudolfix
a94f5c7c0f graceful signal handler (#3234)
* implements signal handlers that allow graceful shutdown on a first signal. tests pipelines in forked tests

* includes KeyboardInterrupt in exception handlers in Pipeline to leave proper trace

* saves package state on each batch in custom destination

* initializes step in progress collectors

* Add new dlthub structure for docs (#3199)

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>

* fix: it should be destination (#3217)

* adds pokemon table count consts (#3232)

* fixes docstrings on signals

---------

Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
Co-authored-by: Xiatong 夏童 <40656281+Magicbeanbuyer@users.noreply.github.com>
2025-10-24 11:35:31 +02:00
ivasio
26d0cfa6ae Fix: add support for yield_map in rest resource (#3211)
* add support for yield_map in rest resource, add tests

* fix tests

* document usage of yield_map in rest_api resource

* add record count asserts in tests

* formatting

---------

Co-authored-by: ivasio <ivan@dlthub.com>
2025-10-23 19:56:46 +02:00
Willi Müller
a3d73a51f6 Fixes docs on schema file naming convention (#3244)
Neither `{source name}_schema.yml` nor `{source name}.schema.yml` worked in my experiments.
2025-10-23 18:54:36 +02:00
Max Yakovenko
98c81466ea Feature: Introduce support of http based resources for fs source (#3029)
* Feature, Add support of http based paths

* Feature, Add support of http resources

* Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format

* Feature, Add cloudfront base_url to the configurations

* Feature, Add a test for http based resources

* Feature, Add a test case for RFC 1123 datetime format

* Feature, Remove test cases related to datetime parsing in RFC and timestamp formats

* Revert "Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format"

This reverts commit 142624b24a.

* Feature, Restore the structure of the url for the cdn

* Feature, Replace custom datetime parser function with a single dispatched one

* Feature, Add a stub package for singledispatch

* Feature, Refactor pendulum datetime processing functions

* Feature, Fix the linting errors in time related tests

* Feature, Fix the declaration

* Feature, Revert the changes related to datetime parsing

* Feature, Add http schema for testing. Add pendulum parser to support RFC 1123 format

* Feature, Update the configuration for http bucket

* Feature, Add a http server. Update the test for http fs

* Feature, Upgrade fsspec

* Feature, Fix codestyle

* Feature, Fix the protocol validation for fsspec args

* Feature, Fix the typing annotations

* Add an example for http filesystem

* Feature, Add schema to the urlparse call

* Feature, Fix the codestyle for http entries in MIME_DISPATCH

* Feature, Expand the list of supported locations in the docs

* uses more random port and closes httpd to release it properly, drops auto fixture as it would be attached to all tests

* moves httpd tests to common tests

* adds http extra to support fsspec

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-23 17:08:15 +02:00
anuunchin
f14eca1cfb Initial commit with add_metrics (#3240) 2025-10-23 15:33:47 +02:00