4027 Commits

Author SHA1 Message Date
David Scharf
b7c8eee206 Small dashboard fixes (#3036)
* move dashboard tests to own workflow

* do not crash dashboard app if credentials not available
* do not sort columns in dataset browser

* try sleep in e2e tests

* disable python 3.14 tests for now

* disable mac e2e tests for dashboard
clean up step conditions
2025-08-27 11:52:35 +02:00
Thierry Jean
9da4787406 improve type hints for dataset and relation (#2997) 2025-08-27 10:16:24 +02:00
djudjuu
a81aed6224 docs: dlt_plus.runner docs (#2886)
* basic runner docs

* fix extension

* removed unintended file

* extra section on current.runner(), linted snippets

* fix typo

* fix example

* refactor: dlt_plus.PipelineRunner

* dlt_plus.runner()

* proper keywords and an extra note

* better text

* change requests

* fix link

* fix snippets

* fix title

* updates, removed runner page

* changed order of clean folder and trace-saving sections

* full config in python api example

* forgotten comma

* better project runner vs pipeline runner description

* typo

* more tabs

* missing link

* improved docs

* another update

* observability docs

* mention custom callbacks in runner

* broken link fix

* i give up on this link

* change requests

* another final attempt at this broken link
2025-08-26 16:15:38 +02:00
djudjuu
ec1f8e4851 docs: pip install marimo -> dlt[workspace] (#3035) 2025-08-26 14:58:08 +02:00
Alena Astrakhantseva
ec567ce708 Update advanced-course.md
unified course names
2025-08-25 13:37:35 +02:00
anuunchin
3939751729 Page description fix in llm native workflow (#3033) 2025-08-25 13:22:08 +02:00
David Scharf
5bf932f69e use license command for plus connection test (#3026) 2025-08-25 12:20:41 +02:00
anuunchin
dc5cfba292 llm workflow docs updated with AI editors other than cursor (#3001) 2025-08-22 16:57:18 +02:00
Thierry Jean
c2dc9bbc20 fix: dlt.Pipeline.__repr__ (#3022) 2025-08-22 09:01:02 -04:00
Tomas Pulmano
f6b785ffc6 fix: avoid setting "None" string for aws session token (#2978)
* avoid setting "None" string for aws session token

* Update aws_credentials.py
2025-08-18 21:17:31 +02:00
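A minimal sketch of the kind of bug this commit message describes, assuming the token was being stringified before use. The helper name `build_session_kwargs` and its parameters are hypothetical and are not taken from `aws_credentials.py`; the point is only that `str(None)` produces the literal text `"None"`, so an optional token should be omitted rather than converted.

```python
from typing import Dict, Optional


def build_session_kwargs(
    key_id: str, secret: str, session_token: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: collect credential kwargs, skipping a missing token."""
    kwargs = {
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }
    # Only include the token when one was actually provided.
    # Writing str(session_token) here would store the string "None".
    if session_token is not None:
        kwargs["aws_session_token"] = session_token
    return kwargs


without_token = build_session_kwargs("key", "secret")
with_token = build_session_kwargs("key", "secret", "token-123")
```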
Thierry Jean
b49edd3c61 ignore temporary __marimo__/ folders (#3008) 2025-08-18 10:58:45 -04:00
anuunchin
684dccad4a Fix: Max table nesting is ignored for the first run when import schema path is specified (#2992)
* Max nesting is preserved when creating an import schema

* import version hash correctly set when import schema is first created
2025-08-18 16:37:36 +02:00
anuunchin
378b7ce624 docs: move educational content to core repo (#2996)
* dlt fundamental and advanced courses

* branch reference in colab links set to master

* Fundamental and advanced courses live in separate pages
2025-08-15 15:57:34 -04:00
David Scharf
b75e4aa721 Dashboard Improvements (#2965)
* remove unneeded file

* fix forwarding of pipelines dir to marimo app

* disable state sync and display all schemas and remote state and schemas in pipeline overview

* add support for multiple schemas

* fix e2e tests, further updates pending

* use dropdown instead of multiselect for schema selection
add multi schema pipeline to fixtures

* add last run info in pipeline overview
add buttons to open pipeline folder and local data folder if present

* fix loads browser to select correct schema

* allow to start dashboard for a pipeline that is not there yet and add helpful error message in this case

* nicer last run time formatting
show pipeline error screen also when manually changing the pipeline name in the url

* move buttons to top, add refresh buttons to sections

* use raw query when constructing queries

* lazy load remote state tab

* fix traces and trace typing (mostly)

* add exception traces to ui

* add file watcher

* remove test code

* add source and resource state viewer to data panel

* update existing unit tests

* add unit test for new utils

* make marimo dashboard the default app for pipeline show

* update docs

* update existing e2e tests for new yaml based rendering of state

* move streamlit app down in sidebar

* grammar fixes for dashboard strings

* open duckdb in read-only mode in datapanel in dashboard

* remove old tests
re-enable dashboard main command

* add missing args to dashboard command

* small fixes to e2e tests

* add tests for exceptions

* re-organize e2e tests into individual tests

* add basic schema selection checks

* improve dashboard help and dashboard docs page

* shorten some strings in testing to make selecting predictable

* merge devel

* typo

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-08-15 16:56:52 +02:00
Thierry Jean
0e19b5f0d0 docs: added docs page to index (#2994) 2025-08-13 16:56:45 -04:00
Thierry Jean
518e71cb90 feat: add top-level dlt.dataset() (#2983)
* renamed protocol classes Dataset to SuportsDataset

* rename ReadableDBAPIRelation to Relation

* redirect deprecated entrypoints

* fix imports in tests and dlt.Pipeline
2025-08-13 09:56:26 -04:00
Thierry Jean
3be08570d4 feat: dlt.Schema.to_dot() graphviz export (#2959)
* graphviz renderer added

* dlt.Schema._repr_html_ added

* updated docs

* update CLI docs

* updated linting rule

* added tests for formatting kwargs

* added utility to validate dot
2025-08-12 14:02:15 -04:00
Jinso-o
6348115c91 Jinso o fix/cors playground (#2986)
* Update playground.py

Replace REST demo with DummyJSON users (CORS-safe for Playground)

* update with actual use case

* feat: improve playground example with real API data

- Switch from dummyjson to JSONPlaceholder users API
- Add dev_mode=True for better development experience
- Fix function parameter (remove unused dlt parameter)
- Update table references from 'items' to 'users' consistently
- Fix test assertion to expect 10 users instead of 50
- Add write_disposition='replace' to prevent data duplication
- Improve data display with proper return statements

* feat(playground): make pipeline idempotent with refresh=True

* fix: remove problematic sqlite3 micropip install causing linting failure

* lint: ruff fix & format playground notebook

* Playground: updated assert

* feat: add improved Marimo with better optimization and formatting

* code edit: removed refresh argument to avoid user confusion
2025-08-11 17:36:18 +02:00
AyushPatel101
f17e98122d Add remaining paramiko connect params to SFTP filesystem (#2823)
* Add pkey, disabled_algorithms, transport_factory and auth_strategy parameters to paramiko.client.connect. Also update filesystem docs for SFTP creds

* Move paramiko imports after the pytest.skip

---------

Co-authored-by: Ayush Patel <Ayush.Patel@imc.com>
2025-08-11 10:46:22 +02:00
Thierry Jean
440a7a35a4 fix: MissingDependencyException now inherits ImportError (#2977) 2025-08-08 07:36:04 -04:00
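A short illustration of why this inheritance change matters to callers, under stated assumptions: `DemoMissingDependency` below is a stand-in class written for this sketch and does not reproduce the constructor of dlt's `MissingDependencyException`. Once an exception subclasses `ImportError`, the common `except ImportError` guard around optional imports also catches it.

```python
from typing import List


class DemoMissingDependency(ImportError):
    """Stand-in exception (hypothetical), subclassing ImportError."""

    def __init__(self, caller: str, dependencies: List[str]) -> None:
        self.caller = caller
        self.dependencies = dependencies
        super().__init__(f"{caller} requires: {', '.join(dependencies)}")


def load_optional_feature() -> None:
    # Simulates a code path that needs a package that is not installed.
    raise DemoMissingDependency("demo feature", ["some_package"])


caught = None
try:
    load_optional_feature()
except ImportError as exc:
    # A generic ImportError handler is enough; no special-casing needed.
    caught = exc
```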
David Scharf
105904fd25 re-enable python 3.10 common tests (#2979) 2025-08-08 11:28:46 +02:00
David Scharf
ef92ffcd77 Refactor transformations (#2970)
* remove transformation code and tests that now live in dlt_plus

* move lineage code and tests into dataset folder scope

* start fixing model item format tests

* revert model item format tests back to version before last big change (with some updates)

* disable transformations snippets linting and testing for now

* remove unneeded test
2025-08-06 15:28:29 +02:00
Thierry Jean
3273d27f6c fix: avoid private interfaces; explicit compiler mapping (#2966)
* use public `ibis` and `sqlglot` interfaces
2025-08-06 07:41:24 -04:00
rudolfix
273420b257 Merge pull request #2962 from dlt-hub/devel
master merge for 1.15.0 release
1.15.0
2025-08-05 19:40:39 +02:00
David Scharf
dc0ab55976 fix failing top level module imports (#2963) 2025-08-05 16:38:39 +02:00
anuunchin
514393872c Allow setting streamed_exec in delta upsert (#2961) 2025-08-05 14:15:28 +02:00
rudolfix
e9b64d6f09 bumps to version 1.15.0 (#2958)
* bumps to version 1.15.0

* handled duckdb 1.3.2 in iceberg scanner and bumps dev version - seems to work with adlfs

* binds old dev duckdb on windows until segfault is fixed

* test fixes, docs update
2025-08-05 14:08:02 +02:00
anuunchin
e23a302a88 AI Command: extended with IDEs (#2937)
* Initial commit

* Test improved for readability, ai command helper reverted

* dlt init dlthub asks for ide

* editor choice in dlt init tests

* 20 hardcoded vibe source
2025-08-04 17:06:42 +02:00
Thierry Jean
eb95c36f3c fix: replace arrow2 with arrow backend for connectorx (#2933)
* replace arrow2 with arrow backend for connectorx

* updated docs/

* updated minimal deps

* update docs and pyproject.toml deps

* updated minimal deps to support 3.9

* converts +00:00 to UTC right after handover from connectorx

* fixes examples connectorx lint

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-04 17:01:34 +02:00
rudolfix
edad825a59 2946 sqlalchemy destination fixes (#2951)
* fixes sqlalchemy destination to work with mssql

* do not generate ; in merge jobs

* fixes engine version type

* demonstrates plugging TypeMapper into sqlalchemy destination

* excludes temp files from snippet linting

* adds precision to _dlt_load_id and _dlt_id columns

* adds json field support for mssql

* fallback for alembic migrations when dialect not supported, i.e. trino

* normalizes use of ; to separate queries

* adds type mappers for mysql, mssql and trino

* fixes type mapper import

* updates destination caps from explicit destination params at the end to overwrite adjustments

* normalizes ; usage, forward tracebacks when handling database exception

* fixes sqlalchemy merge eq condition and tests

* fixes clickhouse temporary table engine

* synth unpickle synthesizes on any error

* fixes duckdb with table scanners accessing self.execute... in open_connection

* fixes synapse json column fallback for index

* moves adding _dlt_load_id to arrow table after it is merged and normalized

* fixes more tests

* moves _dlt_load_id add in arrow extractor after normalization, before table merging

* tests run context plug passthrough

* fixes BIGQUERY numeric creation

* fixes databricks PRIMARY KEY injection in tests
2025-08-04 16:59:45 +02:00
djudjuu
93483191d9 QoL: improve DataValidationError output: use identifying columns if present (#2915)
* improve DataValidationError output: use identifying columns if present

* removing duplicate `schema: {schema_name}`  from error message

* refactor
2025-08-04 14:50:05 +02:00
Thierry Jean
02c461a09c feat: Schema.to_dbml() (#2929)
* dbml WIP

* dbml exporter

* full reference support; Schema.to_dbml()

* revert uv.lock changes

* fixed condition for _dlt tables ref

* rename _dbml.py to private module; use json encoder

* use TStoredSchema as entrypoint

* implementation completed

* added documentation

* added CLI support

* support unknown data type

* please the linter gods

* minified the image

* enables dbml for schema export

* image link to bucket; renamed constant

* updated docs linting

* include recommended VSCode extension

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-04 13:00:49 +02:00
Fran Lozano
84bdc4272c fix: prevent DuplicateSchema error when using public schema in Redshift (#2953)
* fix: prevent DuplicateSchema error when using public schema in Redshift

- Override has_dataset() method in RedshiftSqlClient to return True for 'public' schema
- This prevents dlt from trying to CREATE SCHEMA public when it already exists
- Add test to verify the fix works correctly
- Fixes issue where pipeline fails with 'Schema "public" already exists' error

Closes #2770

* Fix flake8 error

* Fix flake8 error

* Fix black errors
2025-08-04 12:17:08 +02:00
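The override described in this commit can be sketched roughly as follows. Both classes are hypothetical stand-ins written for this illustration and are not the real `RedshiftSqlClient` or its base class; the existence check against an in-memory set is an assumption that replaces a real catalog query.

```python
from typing import Set


class DemoSqlClient:
    """Hypothetical base client that checks a known set of schemas."""

    def __init__(self, dataset_name: str, existing_schemas: Set[str]) -> None:
        self.dataset_name = dataset_name
        self.existing_schemas = existing_schemas

    def has_dataset(self) -> bool:
        return self.dataset_name in self.existing_schemas


class DemoRedshiftClient(DemoSqlClient):
    """Hypothetical subclass: treat 'public' as always present."""

    def has_dataset(self) -> bool:
        # 'public' exists by default, so report it as present and
        # let the caller skip issuing CREATE SCHEMA for it.
        if self.dataset_name == "public":
            return True
        return super().has_dataset()


public_client = DemoRedshiftClient("public", existing_schemas=set())
other_client = DemoRedshiftClient("analytics", existing_schemas=set())
```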
anuunchin
007953acc4 Fix: saving compressed load files with .gz extension (#2835)
* enable_gz_extension added to client configs

--amend

* docs added

* Unnecessary flag removed, fs storage versioning added

* Redundancies removed, storage version cached

* Test for imported files improved

* Initial version stored separately
2025-08-04 12:10:30 +02:00
Anton Burnashev
17fe9f83ef fix: restclient: handle null data in response (#2936)
- Added a check in the RESTClient to return an empty list when the extracted data is None + test
- Added a log warning for None data extraction
2025-08-04 11:41:28 +02:00
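A minimal sketch of the guard described in this commit, assuming a standalone function rather than the real `RESTClient` internals. The name `normalize_page_data` is hypothetical, and wrapping a single non-list item in a list is an extra assumption made here to keep the return type uniform.

```python
import logging
from typing import Any, List, Optional

logger = logging.getLogger(__name__)


def normalize_page_data(data: Optional[Any]) -> List[Any]:
    """Hypothetical helper: always hand a list of items to the caller."""
    if data is None:
        # Mirrors the commit description: warn and return an empty list.
        logger.warning("Extracted data is None; returning an empty list")
        return []
    # Assumption for this sketch: a single item is wrapped in a list.
    return data if isinstance(data, list) else [data]


empty_page = normalize_page_data(None)
list_page = normalize_page_data([{"id": 1}, {"id": 2}])
single_page = normalize_page_data({"id": 3})
```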
dat-a-man
a5c37befc2 Updated merge loading docs on scd2 strategy handling nested structures (#2944)
* Updated

* Update docs/website/docs/general-usage/merge-loading.md

---------

Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>
2025-08-04 11:28:13 +02:00
Katharina Lenz
c89ef1c461 Docs/dlt plus project docs rest api restructuring (#2911)
* change project structure

* add sql database source to source config doc

* add rest api and sql database example

* add filesystem docs

* finalize index file

* update sidebars.js and typos

* fix links and sidebars.js

* change headlines and resolve links

* correct typos and add definitions

* resolve pr comments

* link troubleshoot#
2025-08-04 11:08:38 +02:00
Thierry Jean
a5fbf756c0 added repr (#2940) 2025-08-03 23:09:50 +02:00
Violetta Mishechkina
abaed03e50 Optimize add_limit docs title for the search (#2949) 2025-07-31 17:08:22 +02:00
Alena Astrakhantseva
64181c6d39 Release highlights 1.12.3-1.14.1 (#2939)
* add release highlights 1.12.3-1.14.1

* fix dlthub/docs links

* fix snippets language

* fix json lang

* spelling

---------

Co-authored-by: adrianbr <adrian.brudaru@gmail.com>
2025-07-31 10:18:43 +02:00
rudolfix
8aea949975 skips inferring incomplete column when already incomplete (#2935)
* skips inferring incomplete column when already incomplete

* handles seen-null-first properly
2025-07-30 12:02:57 +02:00
adrianbr
9a3f6ef63c adjust cursor docs to new flow (#2885)
* adjust to new flow

* adjust to new flow

* adjust to new flow

* fixes links in docs

* Update instructions

* improves restapi-cursor workflow and leaves several todos

* Minor cleaning

* Rename the page, add redirect

* Reverse redirect

---------

Co-authored-by: Adrian <Adrian>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
2025-07-30 11:02:01 +02:00
djudjuu
68271e3ce6 callback collector (#2922)
* simple callback collector

* prefect-collector

* inherit from Tracking, no circular imports

* Collector -> SupportsTracking with noop implementation

* plus_log_collector plus tests

* keep prefect-collector for dlt-plus

* lint

* test subclass receives callbacks and has counter access

* simpler comment (re-trigger CI pipeline)

* change requests
2025-07-30 10:53:32 +02:00
molkazhani2001
1265a5074e link to add_map and add_yield_map usage example (#2916)
* link to add_map and add_yield_map usage example

* fixing
2025-07-28 16:25:52 +02:00
Anton Burnashev
81eb87118a docs: rest_api: add tip for escaping curly braces (#2925) 2025-07-25 16:27:13 +02:00
David Scharf
5f087a4863 fix sync destination warning (#2927) 2025-07-24 19:45:14 +02:00
Giacomo Gamba
6a6b32e25f restclient: json param range paginator (#2917) 2025-07-23 17:59:51 +02:00
dat-a-man
f70b50a46e Updating custom configurations with @configspec decorator (#2826) 2025-07-23 15:55:02 +02:00
rudolfix
c7bda1e1a5 removes init files from dlt tables in filesystem (#2868)
* sets request and response validation in aws credentials

* removes init files from dlt tables, makes folders only when schema is updated

* fixes tests

* fixes table data exists/not exists check in filesystem

* unifies when data tables and dlt tables in filesystem are considered empty vs not existing

* fixes synth unpickler to handle enums
2025-07-23 14:38:36 +02:00
anuunchin
d09e3044f9 Docs adjusted for filesystem gdrive (#2912) 2025-07-23 10:34:45 +02:00