289 Commits

Author SHA1 Message Date
anuunchin
266052eb76 Docs: Converting Jupyter notebooks in education to marimo notebooks (#3068)
* Initial commit

* lesson_1_quick_start adjusted for marimo

* lesson_2_dlt_sources_and_resources_create_first_dlt_pipeline marimo

* Fundamentals course 3 improved

* Marimo badges added

* Fundamentals: course 8

* Marimo badge link fix

* Fundamentals: course 7

* Fundamentals: course 6

* Fundamentals: course 5

* Fundamentals: course 4

* Fundamentals: course 3

* Fundamentals: course 2

* Fundamentals: course 1

* marimo links corrected

* Inline deps

* Fundamentals: fix lesson 2

* Fundamentals: fix lesson 3

* Fundamentals: fix lesson 4

* Formatting moved to build-molabs

* Fundamentals: fix lesson 5

* Removal of scrolls

* Fundamentals: fix lesson 6

* Fundamentals: fix lesson 7

* Fundamentals: fix lesson 8

* os.environ replaced with dlt.secrets where relevant

* Advanced: fix lesson 5

* Advanced fix lesson 9

* os.environ fixes

* Advanced: fix lesson 1

* Comments cleanup

* Additional comment removal, fix lesson 6 advanced

* Clean main makefile

* Get rid of constants.py

* Nicer json.loads()

* Better functions in preprocess_to_molab

* Tests for doc tooling funcs

* Validate molab command

* Marimo check added

* docs pages adjustment

* limits sqlglot in dev group until fixed

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-16 16:30:32 +01:00
ivasio
99207237fe docs: add runtime docs to CLI reference (#3445)
* bumps to version 1.20.0

* update the hub reference docs, add CI check

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test

* bumps hub extra

* updates cli docs linting

* fixes docs lock

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-09 17:30:53 +01:00
djudjuu
289e00dece data quality checks cell in dashboard (#3413)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* basic cell appearing if installed

* use data quality cell

* show raw data too

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests

* update text

* adds quality widget as python functions

* adds data_quality as module to hub

* adds hub extra to docs deps

* fixes dashboard imports

* bumps to alpha x.20.0a1

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-07 12:59:21 +01:00
rudolfix
06bc05848b (chore) adds hub extra (#3428)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests
2025-12-05 16:15:19 +01:00
ivasio
8608197026 Fix: reset config in PluggableRunContext.reload_providers (#3409)
* implement RunContext.reset_config, call it in PluggableRunContext.reload_providers

* fix _config access

* reinitialize RunContext._runtime_config on access

* adjust the test to .runtime_config being always available

* fixes dlthub tests

---------

Co-authored-by: ivasio <ivan@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-03 01:24:05 +01:00
rudolfix
dd38c80fb4 fixes arrow import in sql_database (#3411)
* fixes pyarrow import in sql_database

* bumps to 1.19.1

* linter fix

* fixes common workflow
2025-12-02 18:33:03 +01:00
David Scharf
e5977c1ace Fixes historic builds (#3412)
* fixes historic builds

* fix broken link

* constrain docs build env to python 3.10

* switch snippets testing to python 3.10

* allows python up to py3.12 in docs project

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-02 16:51:41 +01:00
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
David Scharf
4a5ffd82b3 Chore: Update docs npm dependencies and clean up docs build tooling (#3247)
* bump npm deps

* remove unneeded netlify redirects file

* remove unneeded lockfile

* remove another unneeded lockfile

* post rebase lockfile update

* remove old netlify command

* create new docs tools project and move api docs gen there

* tmp

* add uv to build docs workflow

* move docs pyproject

* re-org docs package and move snippet linter

* move notebook linting commands and deps to tools folder
add flake8 to tools linting

* remove unneeded files

* fix linting and formatting errors

* remove wrong file

* move docs processing script to new package

* fix gen api ref

* clean up package json and use commands from parent makefile

* update build website workflow

* move linting to docs makefile partially

* fix python version for docs project

* consolidate docs commands in docs makefile

* fix docs linter

* fully update docs test flow

* fixes some linting and dependency problems

* fix constants

* move notebook formatting to docs project

* fix lint embedded snippets

* fix examples tests

* add missing dependencies

* fix snippet linting

* add missing lint dependencies to core and missing test dependencies to docs

* add missing weaviate

* add missing regex module

* add forked dependency and updates readme file

* revert accidental change to example

* fix main linter

* * Move relevant pytest options to subproject
* Remove shims / path inserts that are now managed by pytest options
* Some typing fixes
* Clean up base project pytest ini
* Enable transformation snippets tests

* remove unneeded raw import of intro snippets

* downgrade alive progress

* uses dlt logger which also fixes internal alive error

* enables transformation snippets linting

* fixes dashboard races again

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-16 18:01:30 +01:00
Menna
4d25a6c5b5 feat/3198-add-workspace-info-and-profile-selection
Added a dropdown for profile selection in the dashboard interface and updated the layout to display profile and workspace information inline with pipeline selection.
2025-11-14 18:44:45 +01:00
David Scharf
619402857b enable ci runs for PRs against the runtime branch (#3317) 2025-11-14 11:54:02 +01:00
Violetta Mishechkina
eb2d3a21fe Minor hub docs polishing (#3284)
* Minor hub docs polishing

* fixes workflow setup wrt not running certain steps if there are only docs changes

* Remove the duplicate content

* Fix build

---------

Co-authored-by: David Scharf <shrps@posteo.net>
2025-11-04 12:39:40 +01:00
rudolfix
d671376e68 fixes default limit in ibis backend (#3273) 2025-11-01 16:34:14 -04:00
rudolfix
4a431d60ed refresh docs intro (#3270)
* renames pipeline to workspace dashboard

* refreshes intro

* review changes

* sidebar, references, dataset.table( cleanup
2025-10-31 17:14:49 +01:00
rudolfix
192296f4f8 fixes git import and enables tests (#3262)
* enable hub tests

* removes erroneous git import

* enables tests with importing dlt into minimal alpine container

* imports workspace modules on demand

* bumps dlt to version 1.18.1

* fixes mssql hub test on mac

* review fixes
2025-10-29 21:32:07 +01:00
rudolfix
e56f617c0e adds more signal options (#3248)
* adds option in load that prevents draining pool on signal

* adds runtime pipeline option to not intercept signals

* refactors signal module

* tests new cases

* describes signal handling in running in prod docs

* bumps dlt to 1.18.0

* fixes tests forked

* removes logging and buffered console output from signals

* adds retry count to load job metrics, generates started_at in init of runnable load job

* allows to update existing metrics in load step

* finalized jobs require start and finish dates

* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics

* fixes remote metrics

* replaces event with package bound semaphore to complete load jobs early

* fixes dashboard on windows

* improves signals docs

* renames delayed_signals to intercepted_signals
2025-10-28 13:56:24 +01:00
David Scharf
84de7115a2 Re-enable python 3.14 common tests (#3242)
* enable python 3.14

* try on mac

* remove beta 4 disclaimer

* adds sleep before starting windows e2e tests

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-24 09:32:25 +02:00
Max Yakovenko
98c81466ea Feature: Introduce support of http based resources for fs source (#3029)
* Feature, Add support of http based paths

* Feature, Add support of http resources

* Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format

* Feature, Add cloudfront base_url to the configurations

* Feature, Add a test for http based resources

* Feature, Add a test case for RFC 1123 datetime format

* Feature, Remove test cases related to datetime parsing in RFC and timestamp formats

* Revert "Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format"

This reverts commit 142624b24a.

* Feature, Restore the structure of the url for the cdn

* Feature, Replace custom datetime parser function with a single dispatched one

* Feature, Add a stub package for singledispatch

* Feature, Refactor pendulum datetime processing functions

* Feature, Fix the linting errors in time related tests

* Feature, Fix the declaration

* Feature, Revert the changes related to datetime parsing

* Feature, Add http schema for testing. Add pendulum parser to support RFC 1123 format

* Feature, Update the configuration for http bucket

* Feature, Add a http server. Update the test for http fs

* Feature, Upgrade fsspec

* Feature, Fix codestyle

* Feature, Fix the protocol validation for fsspec args

* Feature, Fix the typing annotations

* Add an example for http filesystem

* Feature, Add schema to the urlparse call

* Feature, Fix the codestyle for http entries in MIME_DISPATCH

* Feature, Expand the list of supported locations in the docs

* uses more random port and closes httpd to release it properly, drops auto fixture as it would be attached to all tests

* moves httpd tests to common tests

* adds http extra to support fsspec

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-23 17:08:15 +02:00
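
The commit above mentions replacing a custom datetime parser with a single-dispatched one and adding RFC 1123 support. As a rough illustration of that idea only, the sketch below uses the standard library (`functools.singledispatch` and `email.utils.parsedate_to_datetime`). It is not the code from the PR, which the messages say is built on pendulum; the function name `to_datetime` and the set of supported input types are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from functools import singledispatch


@singledispatch
def to_datetime(value: object) -> datetime:
    """Hypothetical helper: coerce a value to a datetime, dispatching on its type."""
    raise TypeError(f"unsupported type: {type(value).__name__}")


@to_datetime.register
def _(value: datetime) -> datetime:
    # Assume naive datetimes are meant as UTC (an assumption of this sketch).
    return value if value.tzinfo else value.replace(tzinfo=timezone.utc)


@to_datetime.register
def _(value: str) -> datetime:
    # RFC 1123 style date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    return parsedate_to_datetime(value)


@to_datetime.register(int)
@to_datetime.register(float)
def _(value) -> datetime:
    # Numbers are treated as Unix timestamps in seconds.
    return datetime.fromtimestamp(value, tz=timezone.utc)


parsed = to_datetime("Wed, 21 Oct 2015 07:28:00 GMT")
```

Dispatching on the argument type keeps each format in its own small function, so supporting another input type means registering one more implementation rather than extending a chain of `isinstance` checks.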
rudolfix
0dcdcf0e33 ignores native config values if config spec does not implement those (#3233)
* does not fail config resolution if native valued provided to a config that does not implement native values

* updates databricks docs

* allows to replace hints regexes on schema

* removes partition hint on eth merge test on databricks

* adds pokemon table count consts

* reorgs databricks dlt fix

* fixes lancedb custom destination example

* fixes lancedb custom destination example

* reduces no. of sql_database examples run on ci

* fixes merge

* marks and skips rfam tests
2025-10-22 22:48:13 +02:00
Menna
773a649c19 Feat/3154 convert script preprocess docs to python and add destination capabilities section to destination pages (#3188)
* Add DLT destination capabilities tags to documentation files

This commit introduces the `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags to various destination documentation files. The following files were updated:
- athena.md
- bigquery.md
- clickhouse.md
- databricks.md
- destination.md
- dremio.md
- duckdb.md
- ducklake.md
- filesystem.md
- lancedb.md
- motherduck.md
- mssql.md
- postgres.md
- qdrant.md
- redshift.md
- snowflake.md
- sqlalchemy.md
- synapse.md
- weaviate.md

* Enhance documentation by adding destination capabilities sections

This commit adds the `## Destination capabilities` section along with the corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags to various destination documentation files. The following files were updated:
- athena.md
- bigquery.md
- clickhouse.md
- databricks.md
- destination.md
- dremio.md
- duckdb.md
- ducklake.md
- filesystem.md
- lancedb.md
- motherduck.md
- mssql.md
- postgres.md
- qdrant.md
- redshift.md
- snowflake.md
- sqlalchemy.md
- synapse.md
- weaviate.md

* Add new script for inserting DLT destination capabilities

* Update package.json and package-lock.json to include new script for inserting destination capabilities

This commit modifies the `package.json` to add a new script for inserting destination capabilities and updates the `package-lock.json` to reflect the changes in dependencies. The new script allows for better integration of destination capabilities into the documentation process.

* Revert "Update package.json and package-lock.json to include new script for inserting destination capabilities"

This reverts commit cd5d6c2fae.

* Add script for inserting destination capabilities into documentation

This commit introduces a new Python script, `insert_destination_capabilities.py`. For now it contains only a placeholder that prints to the console for testing the setup.

* Add destination capabilities execution

This commit introduces a new function, `executeDestinationCapabilities`, which executes a Python script to insert destination capabilities into the documentation process.

* Enhance destination capabilities insertion script

This commit refines the `insert_destination_capabilities.py` script by adding functionality to dynamically generate and insert destination capabilities tables into documentation files. It introduces a new data structure for capabilities, improves file processing logic, and ensures that only relevant files are processed. Additionally, it enhances error handling and logging for better traceability during execution.

* Refactor destination capabilities insertion script

This commit updates the `insert_destination_capabilities.py` script to improve its functionality by dynamically retrieving supported destination names from the source directory. It enhances the file processing logic to ensure only relevant files are processed based on available destinations. Additionally, it improves error handling and logging for better execution traceability.

* Refactor and enhance destination capabilities insertion script

This commit refines the `insert_destination_capabilities.py` script by adding functionality to dynamically retrieve and format destination capabilities into markdown tables. It introduces improved error handling, validation for destination names, and enhances the file processing logic to ensure only relevant files are processed. Additionally, it updates the main function to include pre-checks for source and target directories, ensuring a more robust execution flow.

* Refactor and improve destination capabilities insertion script

This commit enhances the `insert_destination_capabilities.py` script by refining the logic for generating markdown tables of destination capabilities. It introduces new patterns for documentation links, improves error handling, and optimizes the processing of relevant capabilities. Additionally, it streamlines the file processing logic and ensures that only valid capabilities are included in the output, resulting in cleaner and more informative documentation.

* Remove destination capabilities sections from various documentation files

This commit removes the `## Destination capabilities` sections and their corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags from multiple destination documentation files, including athena.md, bigquery.md, clickhouse.md, databricks.md, dremio.md, duckdb.md, ducklake.md, filesystem.md, lancedb.md, motherduck.md, mssql.md, postgres.md, qdrant.md, redshift.md, snowflake.md, sqlalchemy.md, synapse.md, and weaviate.md. This cleanup helps streamline the documentation and focuses on relevant content.

* Add destination capabilities sections to various documentation files

This commit introduces `## Destination capabilities` sections along with their corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags in multiple destination documentation files, including athena.md, bigquery.md, clickhouse.md, databricks.md, dremio.md, duckdb.md, ducklake.md, filesystem.md, lancedb.md, motherduck.md, mssql.md, postgres.md, qdrant.md, redshift.md, snowflake.md, sqlalchemy.md, synapse.md, and weaviate.md. This addition enhances the documentation by providing clear insights into the capabilities of each destination, improving user understanding and usability.

* Update documentation for various destinations with formatting improvements

This commit enhances the documentation for multiple destinations, including BigQuery, ClickHouse, Databricks, Dremio, DuckDB, DuckLake, Filesystem, LanceDB, MotherDuck, MSSQL, Postgres, Qdrant, Redshift, Snowflake, SQLAlchemy, Synapse, and Weaviate. Changes include improved formatting for warnings, notes, and tips, as well as minor adjustments to the content for clarity and consistency. These updates aim to enhance the readability and usability of the documentation for users.

* Remove destination capabilities sections from various documentation files

* Update destinations with capabilities marker

* Added type guard to guard against Any

* Temporarily commit preprocessed docs

* Add new constants for documentation preprocessing and update requirements

This commit introduces a new `constants.py` file containing various constants for documentation preprocessing, including directory paths, file extensions, timing settings, and markers. Additionally, the `requirements.txt` file is updated to include `watchdog` and `requests` packages, enhancing the project's dependencies.

* Add tuba links processing script and remove unused line from constants

This commit introduces a new script, `preprocess_tuba.py`, which handles the fetching and formatting of tuba links for documentation. It includes functions for fetching configuration, extracting tags, and inserting links into markdown files. Additionally, an unused line has been removed from `constants.py` to clean up the code.

* Refactor tuba link processing and extract utility function

This commit refactors the `preprocess_tuba.py` script by moving the `extract_marker_content` function to a new `utils.py` file for better organization and reusability. The logic for checking the presence of the TUBA marker has been simplified, and the formatting function for tuba links has been updated to improve clarity and maintainability. These changes enhance the overall structure of the documentation preprocessing tools.

* Add snippet processing functionality for documentation

This commit introduces a new script, `preprocess_snippets.py`, which provides functions for building a map of code snippets, retrieving snippets from files, and inserting them into markdown documents. The script enhances the documentation preprocessing tools by allowing for better management and formatting of code snippets. Additionally, the `utils.py` file is updated with new utility functions for directory traversal and marker content extraction, improving overall code organization and reusability.

* Add example processing script for documentation generation

This commit introduces a new script, `process_examples.py`, which automates the generation of example documentation from Python files. The script includes functionality to build documentation by extracting headers, comments, and code snippets, while also handling exclusions and errors gracefully. Additionally, the `utils.py` file is updated with a new utility function, `trim_array`, to enhance the management of line arrays. These changes improve the documentation process by streamlining example integration and ensuring better formatting.

* Enhance documentation preprocessing with Python integration and new script

This commit updates the `package.json` to include a new script for installing Python dependencies and modifies the start and build scripts to incorporate Python preprocessing. Additionally, a new `preprocess_docs.py` script is introduced, which automates the processing of markdown files by inserting code snippets, managing links, and syncing examples. The `requirements.txt` is also updated to include a new dependency, `python-debouncer`, improving the documentation workflow.

* Refactor documentation preprocessing scripts for improved async handling and example processing

This commit enhances the `preprocess_docs.py` script by integrating asynchronous file handling and introducing a lock mechanism to manage concurrent processing. The `package.json` is updated to modify the start script for better coordination of preprocessing tasks. Additionally, a new `preprocess_examples.py` script is added to streamline the generation of example documentation, ensuring proper formatting and error handling. The `preprocess_snippets.py` script is also updated to maintain consistency in line reading methods. These changes collectively improve the efficiency and reliability of the documentation workflow.

* Refactor documentation preprocessing scripts for improved efficiency and caching

This commit updates the `package.json` to streamline the start script by removing the lock file mechanism and enhancing the coordination of preprocessing tasks. The `preprocess_docs.py` script is refactored to eliminate the lock file usage, simplifying the processing flow. Additionally, the `preprocess_tuba.py` script introduces a caching mechanism for tuba configuration to reduce redundant network requests, improving performance. These changes collectively enhance the documentation workflow and processing efficiency.

* Refactor file change handling in documentation preprocessing scripts

This commit enhances the `preprocess_docs.py` script by simplifying the file change handling logic through the introduction of a new `handle_change_impl` function. The previous `should_process` function is removed to streamline the decision-making process for file processing. Additionally, whitespace cleanup is performed for better code readability. The `preprocess_tuba.py` script also receives minor whitespace adjustments. These changes collectively improve the maintainability and clarity of the documentation preprocessing workflow.

* Add destination capabilities processing and refactor related scripts

This commit introduces a new script, `preprocess_destination_capabilities.py`, which handles the generation of destination capabilities tables for documentation. It includes caching mechanisms for improved performance and integrates with existing constants for consistency. The `insert_destination_capabilities` function is now called within `preprocess_docs.py` to streamline the documentation processing workflow. Additionally, the `insert_destination_capabilities.py` script is removed as its functionality is now encapsulated in the new script. These changes enhance the documentation generation process by providing structured capabilities information.

* Update package-lock.json and package.json for improved documentation preprocessing

This commit updates the `package-lock.json` to reflect changes in dependencies and their versions, ensuring compatibility and performance enhancements. The `package.json` is modified to streamline the `start` and `preprocess-docs` scripts by removing the installation of Python dependencies from the start command and adjusting the environment variable settings. These changes collectively enhance the efficiency and reliability of the documentation generation workflow.

* Add processed docs entry to .gitignore

This commit updates the .gitignore file to include the 'docs_processed' entry, ensuring that preprocessed documentation files are excluded from version control. This change helps maintain a cleaner repository by preventing unnecessary files from being tracked.

* Stop tracking docs_processed directory

* Remove the `preprocess_docs.js` script, which handled documentation preprocessing tasks including snippet insertion and link management. This deletion streamlines the codebase by eliminating unused functionality, following recent refactoring efforts to improve documentation processing workflows.

* Refactor destination capabilities processing script for type hinting and formatting improvements

This commit updates the `preprocess_destination_capabilities.py` script by adding type hints for caching variables, enhancing code clarity and maintainability. Additionally, it modifies the formatting of the capabilities table to ensure consistent output and appends a newline for better readability. These changes collectively improve the structure and presentation of destination capabilities in the documentation.

* Refactor documentation processing scripts by removing unnecessary argument documentation

This commit simplifies the `insert_destination_capabilities` function in `preprocess_destination_capabilities.py` by removing the detailed argument and return type documentation. Additionally, the `format_tuba_links_section` function in `preprocess_tuba.py` is updated to streamline its docstring, enhancing clarity while maintaining essential information. These changes improve the readability and maintainability of the documentation processing scripts.

* Update package.json to streamline documentation processing scripts

This commit modifies the `package.json` to include a new script for installing Python dependencies and updates the `start` and `build` scripts to ensure a more efficient workflow. The changes enhance the coordination of documentation preprocessing tasks, improving the overall efficiency of the documentation generation process.

* Added dependency installation in start

* Refactor package.json scripts for improved documentation processing

This commit updates the `package.json` to streamline the `start`, `build`, and `build:cloudflare` scripts by removing redundant installation of Python dependencies. The `preprocess-docs` script is now defined separately, enhancing clarity and efficiency in the documentation generation workflow.

* Add type checking configurations for additional modules in mypy.ini

This commit extends the mypy.ini configuration by adding ignore_missing_imports settings for several new modules, including constants and various preprocess modules. These changes aim to improve type checking flexibility and reduce false positives during type analysis, enhancing the overall development experience.

* Enhance type hinting in preprocessing scripts for improved clarity

This commit updates the type hints in `preprocess_destination_capabilities.py`, `preprocess_snippets.py`, and `preprocess_tuba.py` to provide more specific type information. Changes include casting for constants and refining list and dictionary type annotations. These improvements enhance code readability and maintainability, supporting better type checking and development practices.

* Update dependencies and refactor documentation processing scripts

This commit adds the `python-debouncer` dependency to `pyproject.toml` for improved event handling in documentation processing. Additionally, it refines the `package.json` scripts by separating the `preprocess-docs` command and optimizing the `start` script for better efficiency. The `preprocess_docs.py` script is also updated to utilize lazy imports for certain modules, enhancing performance during documentation processing. These changes collectively improve the clarity and efficiency of the documentation generation workflow.

* Remove requirements.txt and clean up whitespace in preprocess_docs.py

This commit deletes the `requirements.txt` file, which is no longer needed, and cleans up unnecessary whitespace in the `preprocess_docs.py` script. These changes help streamline the codebase and improve overall readability.

* Update documentation for Databricks and DuckLake destinations

This commit enhances the documentation for Databricks by adding a note about loading data to Managed Iceberg tables and refining the descriptions of table and column-level hints. Additionally, it updates the DuckLake documentation to recommend using a more explicit catalog name in configuration examples. These changes improve clarity and usability for users working with these destinations.

* Enhance documentation for various destinations and add requirements.txt for project dependencies

* Fix typo in DuckDB documentation regarding spatial extension installation

* Remove destination capabilities section from AWS Athena documentation

* Feat/adds workspace (#3171)

* ports toml config provider with profiles

* supports run context with profiles

* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging

* implements workspace run context with profiles and basic cli

* displays workspace name and profile name before executing cli commands if run context supports profiles

* exposes dlt.current.workspace()

* converts run context protocol into abstract class

* fixes plugins tests

* refactors _workspace: private and public modules

* adds workspace test cases

* launches workspace and pipeline mcp with cli, sse by default

* tests basic workspace behaviors

* refactors code to switch context and profile

* adds default profile to run context interface

* ports pipeline and oss mcp, changes derivation structure

* adds safeguards and tests to workspace cleanup cli helper

* adds run_context to SupportsPipeline, checks run_context change on pipeline activation

* adds mcp dependency to workspace extra, fixes types

* renames test fixture

* mcp export tweak

* updates cli reference and common ci workflow

* disables dlt-plus deps in ci

* removes df from mcp tools, fixes workspace tests

* fixes tests

* Fix build scripts for Cloudflare integration in package.json

* Fix preprocess-docs:cloudflare script to use python directly instead of uv

* Restore preprocess-docs scripts in package.json for consistency

* Update preprocess-docs:cloudflare script to include requirements installation

* Update preprocess-docs:cloudflare script to include requirements installation

* Add __init__.py file to tools directory

* Refactor import statements to use relative imports in preprocessing scripts

* Update import statements to use absolute paths for consistency across preprocessing scripts

* Add mypy configuration for additional modules to ignore missing imports

* Removed duplicated line

* Add mypy configuration to ignore missing imports for tools module

* Update ducklake.md

* temporarily add netlify build command back

* fix typing in snippets and update mypy.ini a bit

* reverse build commands back to previous order

* Fixed watch by changing implementation into queue and locks

* Refactor package.json for improved script organization and maintainability

* Add mypy configuration to ignore missing imports for additional modules

* Add mypy configuration to ignore missing imports for more modules

* Remove mypy configuration for preprocess_examples to streamline settings

* Update mypy configuration: rename dlt hub section to dlt plus and remove unused preprocess settings

* Refactor import statements to remove 'tools' prefix, improving module accessibility across preprocess scripts

* Refactor import statements in preprocessing scripts to use relative imports, enhancing module organization and consistency

* Refactor import statements in preprocessing scripts to use absolute imports from the tools module, improving clarity and consistency across the codebase

* Update mypy.ini

* Fix formatting in _generate_doc_link function by removing unnecessary whitespace in return statement for improved readability

* fix linting and script execution

* remove sleeping after preprocessing in favor of predictable processing before docusaurus launch

* remove unnecessary whitespace in preprocess_docs.py for cleaner code

* Update deployment script in package.json and enhance file change handling in preprocess_docs.py; remove obsolete preprocess_change.py

* Refactor preprocess_docs.py to improve file change handling; replace change counter with a pending changes flag for better processing control and enhance logging for file modifications.

* Enhance capabilities table generation in preprocess_destination_capabilities.py by adding a descriptive header and introductory text for improved clarity and context.

* Remove destination capabilities sections from multiple destination documentation files for consistency and clarity.

* Fix formatting in start script of package.json for improved readability

* Enhance capabilities table generation by improving destination name formatting; streamline file change handling in preprocess_docs.py by removing unnecessary print statements.

* update files incrementally only when in watcher mode
make tuba link generation random per day with a seed

* fix duplicate page at examples error

* remove outdated docs deploy action

* add build docs action for better debuggability

* revert unintentional change to md file

* add info about where capabilities links should go

* refactor: improve documentation link generation for capabilities

* fix: update documentation link for replace strategy and improve link formatting

---------

Co-authored-by: rudolfix <rudolfix@rudolfix.org>
Co-authored-by: dave <shrps@posteo.net>
2025-10-22 18:59:48 +02:00
rudolfix
fe567414dc chore/moves cli to _workspace module (#3215)
* adds selective required context, checks profile support in switch_profile

* creates and tests hub module

* adds plugin version to telemetry

* renames imports in docs

* renames ci workflows

* fixes lint

* tests deploy command on duckdb

* moves cli module to workspace

* moves cli tests to workspace module

* renames fixtures, rewrites fixture to patch run context to _storage

* allows to patch global dir in workspace context

* when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set

* imports git utils only when need to clone package in dbt runner

* runs workspace tests as part of common

* fixes tests, config tests side effects

* moves dashboards to workspace

* fixes pipeline trace test

* moves dashboard helper tests

* excludes additional secret files and pinned profile from gitignore

* cleans up hatchling files in pyproject

* fixes dashboard running tests in ci

* moves git module to libs

* diff fix

* fixes fixture names
2025-10-19 15:21:42 +02:00
rudolfix
bc2706b63a renames dlt_plus plugin to dlthub (#3192)
* adds selective required context, checks profile support in switch_profile

* creates and tests hub module

* adds plugin version to telemetry

* renames imports in docs

* renames ci workflows

* fixes lint
2025-10-14 11:47:27 +02:00
rudolfix
01698752db Feat/adds workspace (#3171)
* ports toml config provider with profiles

* supports run context with profiles

* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging

* implements workspace run context with profiles and basic cli

* displays workspace name and profile name before executing cli commands if run context supports profiles

* exposes dlt.current.workspace()

* converts run context protocol into abstract class

* fixes plugins tests

* refactors _workspace: private and public modules

* adds workspace test cases

* launches workspace and pipeline mcp with cli, sse by default

* tests basic workspace behaviors

* refactors code to switch context and profile

* adds default profile to run context interface

* ports pipeline and oss mcp, changes derivation structure

* adds safeguards and tests to workspace cleanup cli helper

* adds run_context to SupportsPipeline, checks run_context change on pipeline activation

* adds mcp dependency to workspace extra, fixes types

* renames test fixture

* mcp export tweak

* updates cli reference and common ci workflow

* disables dlt-plus deps in ci

* removes df from mcp tools, fixes workspace tests

* fixes tests
2025-10-08 20:16:34 +02:00
David Scharf
210dd3780f move test of newest lib version to macos (#3142) 2025-09-29 14:15:37 +02:00
Thierry Jean
8565a2ac06 feat: ducklake destination (#3015)
* move duckdb capabilities to utility function

* add basic DuckLake files based on DuckDB / Motherduck

* refactor ducklake config

* wip; ducklake destination

* simplified testing

* ignore ducklake files

* completed default config; TODO fix write

* unicode issues

* commented out patches

* lint

* uses destination_type as final fallback when creating default local file names, allows to copy local file context in WithLocalFiles

* creates connection pool for duckdb

* fixes exception handling in open_connection in sql_client, fixes racing when connections opened in duckdb, improves error handling if commit tx fails

* handles ducklake attach/detach in sql_client

* modifies ducklake configuration to: (1) use sqlite as default catalog (2) point all local files to local_dir (3) allow various urls to configure ducklake name (4) use parquet as default file format

* adjust caps to execute load jobs sequentially for duckdb and sqlite catalogs

* passes ducklake conn to ibis, improves how duckdb conn is passed (via open_connection which provides full context)

* adds configuration and credential tests, smoke tests for supported catalogs

* enables ducklake on ci

* fixes ducklake imports

* fixes how secrets are created from filesystem

* generates remote_url in load job metrics with real url of the ducklake table

* tests for all buckets

* adds ducklake extra

* adds hints for secrets.toml gen

* implements cursor for ducklake with correct df vector size

* forces use of ducklake/duckdb datasets in ibis handover, tests non existing dataset behavior

* removes dashboard e2e from common tests on ci

* docs WIP

* implements field resolution check and recursive copy for base configuration

* copies credentials before using as default when resolving capabilities

* allows recursive resolution traces in config field missing exception

* improves config resolve: collects traces recursive, keeps resolving if embedded config fails, collects resolved keys

* decouples connection string credentials and base duckdb credentials

* improves how duckdb handles exceptions when executing query

* makes catalog name explicit in ducklake credentials, creates default db and storage folder names after it

* supports ducklake partitioning on duckdb 1.4

* supports metadata schema on postgres, adds experimental ducklake catalog support on Motherduck

* fixes union config resolve with single base config in union

* docs WIP

* enabled ducklake remote test

* improves ibis filesystem con handover, enables databricks

* fixes tests

* fixes lancedb default name

* propagates only top level config section, replaces with embedded field name in other cases

* adds tests and examples for programmatic creation of ducklake factory

* adds merge selector in duckdb caps to enable upsert on 1.4

* ducklake code cleanups

* makes sure pipeline is dropped before run_context goes out of scope

* finalizes ducklake docs

* fallback in duckdb merge selector if duckdb not installed

* propagates persist_secret flag in filesystem sql client

* fixes tests and ci

* runs remote ducklake on local postgres catalog for low latency

* uses packaging version, not semver for python packages comparisons

* Update docs/website/docs/dlt-ecosystem/destinations/duckdb.md

* fixes recursive re-raise in sql_client

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-09-24 08:27:16 +02:00
David Scharf
024980693f Docs Cloudflare worker deployment (#3105)
* Docs Cloudflare worker deployment (#3104)

* add wrangler config

* fix wrangler config

* add wrangler to dev deps
add stage domain route
enable preview urls

* change docusaurus base url

* add worker to docs deployment

* add basic roots

* enable logs and add 404 route

* disable worker rewriting

* fix urls locally and deployed

* add tracking to docs deployment
add cloudflare commands to package json

* include old redirects

* update readme file

* add updated routing and updated dataset for production
2025-09-22 13:38:28 +02:00
rudolfix
b062dcafa4 docs/removes dlt plus docs and adds eula (#3079)
* answers defaults in cli if tty disconnected

* adds method to send anon tracker event even if disabled

* fixes types in source/resource build in generator

* adds dlt.hub with transformation decorator

* moves dlt-plus to separate sidebar in docs, renames to dltHub Features, adds EULA

* renamed plus to hub in docs

* fixes docs logos

* removes more dlt+

* renames plus tests

* fixes ci run main

* fixes hub workflows
2025-09-21 00:15:08 +02:00
David Scharf
58ae6303c7 Run common and dashboard tests also with newest available allowed packages for all deps (#3100)
* run common and dashboard tests also with newest available packages

* fix language in code block

* make basic tests works with updated versions of dependent packages
2025-09-19 08:52:46 +02:00
David Scharf
d143c29e35 Improve pipeline dashboard test coverage (#3091)
* disable most tests

* try correct windows command for running marimo e2e tests

* try without timeout

* test only launch marimo

* bump python version

* try install playwright deps

* fix e2e tests for dashboard on windows

* enable e2e tests for dashboard

* test macos 14 for dashboard e2e tests

* add basic tests for ui elements

* improve ui elements tests

* revert changes to main github workflow

* review fixes

---------

Co-authored-by: Your Name <you@example.com>
2025-09-17 19:58:18 +02:00
David Scharf
a75151e7e4 add up to date check for uv lockfile as first lint step (#3052)
* add check for uv lockfile

* update lockfile

* add some info about lockfiles to contributing.md
2025-09-02 17:09:10 +02:00
anuunchin
096d769828 Docs: Education notebooks formatted and linted (#3017)
* Formatted and linted ed content

* Notebook filenames lowercased, no special chars
2025-09-02 08:41:47 +02:00
rudolfix
823bf3865f fully support naive and tz-aware timestamp/time data types (#2570)
* adds databricks timestamp NTZ

* improves error messages in pyarrow tuples to arrow

* decreases timestamp precision to 6 for mssql

* adds naive datetime to all data types case, enables fallback when testing destinations not supporting it

* other test fixes

* always stores incremental state last value as present in the data, tests tz-awareness edge cases

* fixes ntz timestamp tests

* fixes sqlalchemy destination to work with mssql

* adds func to current module to get current resource instance

* generates LIMIT clause in sql_database when limit step is present

* adds basic tests for mssql in sql_database

* adds docs on tz-awareness in datetime columns in sql_database

* adds naive and tz-aware datetimes to destination caps, implements for various destinations

* caches dlt type to python type conversion

* normalizes timezone handling in timestamp and time data types, fixes remaining pendulum timezone problems, applies tz/non-tz preserving methods when necessary, improves test coverage

* fixes incremental and lag so they always follow the tz-awareness of the data under cursor column, fixes pendulum tz problems, adds tests

* moves schema inference and data coercion from Schema to item_normalizers, applies timezone normalization to json data, adjusts new columns to destination caps for json data, tests

* casts timezones in arrow table normalizations, datetime and time cases in row tuples to arrow, refactors to get generic method to cast tables to dlt schemas, tests

* tracks resource parent, along pipe parent, fixes resource cloning when adding to source, fixes source and resource iterators, makes sure that list of extracted resources always includes implicit and explicit resources

* updates dbapi sql client for dremio

* adjust column schema inferred from arrow to destination caps in extractor, tests

* moves schema and data setup for all data types tests to common code

* adds option to exclude columns in sql_table, uses LimitItem to generate LIMIT statements, tests incl. proper cursor tests for naive/tz aware incremental cursor columns

* tests sql_database on mssql for all data types and incremental cursor on dates

* improves tests for row tuples to arrow with cast to dlt schema, tests for naive datetimes

* improved test for timestamps and int with precision on duckdb

* disables Python 3.14 tests and dashboard test on mac

* better maybe transaction in job client: takes into account ddl and regular transaction destination caps

* pyodbc py3.13 bump
2025-08-31 20:06:22 +02:00
David Scharf
b7c8eee206 Small dashboard fixes (#3036)
* move dashboard tests to own workflow

* * do not crash dashboard app if credentials not available
* do not sort columns in dataset browser

* try sleep in e2e tests

* disable python 3.14 tests for now

* disable mac e2e tests for dashboard
clean up step conditions
2025-08-27 11:52:35 +02:00
anuunchin
378b7ce624 docs: move educational content to core repo (#2996)
* dlt fundamental and advanced courses

* branch reference in colab links set to master

* Fundamental and advanced courses live in separate pages
2025-08-15 15:57:34 -04:00
David Scharf
b75e4aa721 Dashboard Improvements (#2965)
* remove unneeded file

* fix forwarding of pipelines dir to marimo app

* disable state sync and display all schemas and remote state and schemas in pipeline overview

* add support for multiple schemas

* fix e2e tests, further updates pending

* use dropdown instead of multiselect for schema selection
add multi schema pipeline to fixtures

* add last run info in pipeline overview
add buttons to open pipeline folder and local data folder if present

* fix loads browser to select correct schema

* allow to start dashboard for a pipeline that is not there yet and add helpful error message in this case

* nicer last run time formatting
show pipeline error screen also when manually changing the pipeline name in the url

* move buttons to top, add refresh buttons to sections

* use raw query when constructing queries

* lazy load remote state tab

* fix traces and trace typing (mostly)

* add exception traces to ui

* add file watcher

* remove test code

* add source and resource state viewer to data panel

* update existing unit tests

* add unit test for new utils

* make marimo dashboard the default app for pipeline show

* update docs

* update existing e2e tests for new yaml based rendering of state

* move streamlit app down in sidebar

* grammar fixes for dashboard strings

* open duckdb in readme mode in datapanel in dashboard

* remove old tests
re-enable dashboard main command

* add missing args to dashboard command

* small fixes to e2e tests

* add tests for exceptions

* re-organize e2e tests into individual tests

* add basic schema selection checks

* improve dashboard help and dashboard docs page

* shorten some strings in testing to make selecting predictable

* merge devel

* typo

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-08-15 16:56:52 +02:00
David Scharf
105904fd25 re-enable python 3.10 common tests (#2979) 2025-08-08 11:28:46 +02:00
David Scharf
ef92ffcd77 Refactor transformations (#2970)
* remove transformation code and tests that now live in dlt_plus

* move lineage code and tests into dataset folder scope

* start fixing model item format tests

* revert model item format tests back to version before last big change (with some updates)

* disable transformations snippets linting and testing for now

* remove unneeded test
2025-08-06 15:28:29 +02:00
David Scharf
d1daade6af Enable and test python 3.14 support (#2789)
* enable 3.14 with orjson branch

* make example plugin a uv project

* post rebase pyproject update

* fix one dependency
update readme

* update readme about python 3.14
2025-07-22 14:29:01 +02:00
David Scharf
4920a15fbc bump default python version in CI from 3.10 to 3.11 (#2904) 2025-07-22 12:12:04 +02:00
David Scharf
682e900492 make docs snippets tests use local secrets (#2903)
* make docs snippets tests independent of secrets
* move examples tests into own workflow and remove github fork marker
* make custom naming example use secrets file instead of hardcoded secrets
2025-07-17 10:42:45 +02:00
David Scharf
ea1447554a deploy docs on push to master (#2902) 2025-07-17 09:58:06 +02:00
rudolfix
c262022bfe fixes arrow/pandas dependencies in extras and dep groups (#2895)
* bumps to version 1.14.1

* removes non dev dependencies from dev group

* sets good pandas dep in extras
2025-07-16 21:49:05 +02:00
David Scharf
983a33e6b6 Run full linter step on docs changes, bump marimo min version, enable marimo tests for python 3.13 (#2884)
* run full linter step on docs changes

* disable dashboard e2e tests on 3.11
enable dashboard e2e and unit tests on 3.13

* bump marimo min dependency

* Revert "Auxiliary commit to revert individual files from 52165eaeeb543932bc917bb5efc373c02ab2937b"

This reverts commit b7c5baf7c0c51e67ad323cd1b2cb9423f48f4165.

* re-lock changes

* revert incorrect change in secrets toml
2025-07-15 14:35:22 +02:00
Anton Burnashev
9ba5681d92 rest_api: Redact secrets in logs, add configurable response body in errors (#2867)
* Redact secrets in URL when logging and raising for status
* Add configuration options to show HTTP response body in exceptions and logs
* Exclude markdown files from size checks
* Moved configuration reading from _dlt_raise_for_status to RESTClient
2025-07-15 11:47:36 +02:00
David Scharf
21b68e61f1 Add workspace extra and rename marimo app to "pipeline dashboard" (#2876)
* adds dlt workspace extra, updates exception and github workflows

* renames app from "marimo app" to "pipeline dashboard"
updates --marimo flag to --dashboard

* rename studio folders to dashboard

* removes all other references to studio

* exclude lockfile and markdown files from lfs

* update workspace extra dependency versions

* bump version
2025-07-14 21:26:50 +02:00
David Scharf
befe9ced13 transformations - updates (#2718)
* rename flag for executing raw queries to "execute_raw_query"

* return sge queries from the internal _query method which removes a lot of unneeded transpiling
clean up make_transformation function
tests still pending

* adds some tests to readable dataset and a test for column hint merging

* allows any dialect when writing queries and fixes tests

* update docs and set correct quoting to queries in normalization and load stage

* fixes normalizer tests

* fix limit on mssql
normalize aliases in normalization step

* add missing quote to alias

* revert identifier normalization step in normalizer_query and use bigquery compiler for bigquery destinations

* post rebase fix

* smallish pr fixes

* add materializable sqlmodel and handle hints in extractor

* add and test always_materialize setting

* add test for sql transformation type

* convert transformation functions to need yield instead of return

* migrate tests and docs snippets to yield in transformations

* add simple test for materializable model

* use correct compiler for converting ibis into sqlglot for each dialect
fixes on transformation test

* add first simple version of using unbound ibis tables in transformations

* skip ibis test on python 3.9

* fix query building in new relation

* return a "real" relation from a transformation

* add ibis option when getting table from dataset
natively support unbound ibis tables in transformations and when getting relations from dataset

* update model item format tests to use relation

* * remove one unneeded test (same thing is already tested in transformations)
* fix wei conversion in lineage

* adds support for adding resource hints to pyarrow items

* switch most read access tests to default dataset

* update datasets and transformations docs pages

* separate ibis and default dbapi datasets and fix typing

* update transformation tests and small typing fixes for updated datasets

* fix default dataset type

* fix wei sqlglot conversion

* add sqlglot dialect type and some cleanup

* fix dataset snippets

* fix sqlglot schema test

* removes ibis relation and dataset
consolidates relation and dataset baseclasses with implementations
updates interfaces/protocols for relation and dataset and makes those the publicly available interface with "Relation" and "Dataset"
remove query method from relation interface

* fix one doc snippet

* rename dataset and relation interfaces

* fix relationship between cursor and relation, remove function wiring hack in favor of explicit forwarding for better typing

* clean up readablerelation (no actual code changes)

* fix str test to assume pretty sql (which it is now)
fix one transformation snippet

* small changes from review comments:
* query method on dataset
* typing update of table method

* rename query method to "to_sql" on relation

* clean up transform function a bit (could maybe be even better)
reject non-sql strings in transformation to not shadow errors

* add support for "non-generator" transformations

* move hints computation into resource class

* smallish PR fixes

* add support for dynamic hints in transformations
-> this allows to have multiple relations with different schemas in the relation, so this is allowed now too

* fixes dynamic table caching

* Enhances ReadableDBAPIRelation: min/max, filter with expression (#2833)

* Min max, filter with expr_or_string

* Fix in min max test

* Overload fix and docs

* Test read interfaces partially uses default relation max

* prevent sqlglot schema from adding default hints info, only allow parametrized types and don't supply hints if none are present in dlt schema

* make multi schema transformations work again

* move model item format tests to transformations folder

* re-order interface tests and fix playground dataset access

* PR review test updated

* update dataset and transformation pages

* update transformations tests to new fruitshop

* Last PR fixes

* update columns_schema property

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
2025-07-11 17:10:08 +02:00
David Scharf
de53e174ad Add linter step to prevent large files being merged (#2588)
* add lint step to check file sizes

* move filesizes check to bottom
2025-07-09 11:12:08 +02:00
David Scharf
cb8d2dfdae deploy marimo wasm playground to github pages and add to docs (#2832)
* add simple wasm notebook

* add first version of deployment script

* adds pyodide exec_info helper

* small updates to the example notebook

* add example page with transformations notebook into docs

* fix stupid typing error

* disable threading in dlt if platform without threading detected

* move to playground

* simplify playground notebook
fix typos
add tests for playground notebook

* add missing marimo dependency for tests

* PR reviews plus simple tests

* add playground link to intro page

* adds marimo wasm contributing guide

* one more contributing note

* move notebook deployment to own file with own rules

* add comments to marimo cells
2025-07-04 12:41:39 +02:00
David Scharf
3ba504c65d marimo app updates (#2778)
* make dlt app ejectable

* update app file url in makefile and tests
add missing stylesheet to package

* start marimo app in process

* convert caching toggle to button for clearer use

* exclude incomplete columns

* adds a bunch of tests for marimo app utils

* make normalized query output pretty and disable tests on 3.9

* filter out incomplete tables

* update cli strings and small changes to app ejection
2025-06-25 13:49:56 +02:00
David Scharf
5245a42536 run all common tests with --resolution lowest-direct on uv sync (#2787)
* run all common tests with resolution-lowest on sync

* make model item normalizer tests pass, disable on time test for now

* fix duckdb instantiation for old versions
bump pyarrow to have version that supports "append_column" on recordbatch
exclude deltalake tests for too low pyarrow versions

* fixes errors in makefile
bump minimum pytest version to what was in lockfile

* bump pendulum min requirement

* fix common test file

* bump ibis dependency

* go back to old version of pendulum
bump to prerelease
2025-06-23 21:30:58 +02:00