38 Commits

Author SHA1 Message Date
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
David Scharf
4a5ffd82b3 Chore: Update docs npm dependencies and clean up docs build tooling (#3247)
* bump npm deps

* remove unneeded netlify redirects file

* remove unneeded lockfile

* remove another unneeded lockfile

* post rebase lockfile update

* remove old netlify command

* create new docs tools project and move api docs gen there

* tmp

* add uv to build docs workflow

* move docs pyproject

* re-org docs package and move snippet linter

* move notebook linting commands and deps to tools folder
add flake8 to tools linting

* remove unneeded files

* fix linting and formatting errors

* remove wrong file

* move docs processing script to new package

* fix gen api ref

* clean up package json and use commands from parent makefile

* update build website workflow

* move linting to docs makefile partially

* fix python version for docs project

* consolidate docs commands in docs makefile

* fix docs linter

* fully update docs test flow

* fixes some linting and dependency problems

* fix constants

* move notebook formatting to docs project

* fix lint embedded snippets

* fix examples tests

* add missing dependencies

* fix snippet linting

* add missing lint dependencies to core and missing test dependencies to docs

* add missing weaviate

* add missing regex module

* add forked dependency and updates readme file

* revert accidental change to example

* fix main linter

* * Move relevant pytest options to subproject
* Remove shims / path inserts that are now managed by pytest options
* Some typing fixes
* Clean up base project pytest ini
* Enable transformation snippets tests

* remove unneeded raw import of intro snippets

* downgrade alive progress

* uses dlt logger which also fixes internal alive error

* enables transformation snippets linting

* fixes dashboard races again

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-16 18:01:30 +01:00
Menna
773a649c19 Feat/3154 convert script preprocess docs to python and add destination capabilities section to destination pages (#3188)
* Add DLT destination capabilities tags to documentation files

This commit introduces the `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags to various destination documentation files. The following files were updated:
- athena.md
- bigquery.md
- clickhouse.md
- databricks.md
- destination.md
- dremio.md
- duckdb.md
- ducklake.md
- filesystem.md
- lancedb.md
- motherduck.md
- mssql.md
- postgres.md
- qdrant.md
- redshift.md
- snowflake.md
- sqlalchemy.md
- synapse.md
- weaviate.md
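
As a rough illustration of the marker described above, a preprocessing step could locate the tags with a regular expression. This is only a sketch: the marker text is quoted from the commit message, while the helper name `find_capabilities_markers` is hypothetical and not taken from the repository.

```python
import re
from typing import List

# Marker format as quoted in the commit message above.
MARKER_RE = re.compile(r"<!--@@@DLT_DESTINATION_CAPABILITIES\s+(\S+?)\s*-->")


def find_capabilities_markers(markdown: str) -> List[str]:
    """Return the destination names referenced by capabilities markers.

    Hypothetical helper, for illustration only.
    """
    return MARKER_RE.findall(markdown)
```

A caller could then use each returned name to decide which capabilities table belongs in the page.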

* Enhance documentation by adding destination capabilities sections

This commit adds the `## Destination capabilities` section along with the corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags to various destination documentation files. The following files were updated:
- athena.md
- bigquery.md
- clickhouse.md
- databricks.md
- destination.md
- dremio.md
- duckdb.md
- ducklake.md
- filesystem.md
- lancedb.md
- motherduck.md
- mssql.md
- postgres.md
- qdrant.md
- redshift.md
- snowflake.md
- sqlalchemy.md
- synapse.md
- weaviate.md

* Add new script for inserting DLT destination capabilities

* Update package.json and package-lock.json to include new script for inserting destination capabilities

This commit modifies the `package.json` to add a new script for inserting destination capabilities and updates the `package-lock.json` to reflect the changes in dependencies. The new script allows for better integration of destination capabilities into the documentation process.

* Revert "Update package.json and package-lock.json to include new script for inserting destination capabilities"

This reverts commit cd5d6c2fae.

* Add script for inserting destination capabilities into documentation

This commit introduces a new Python script, `insert_destination_capabilities.py`. For now it contains only a placeholder that prints to the console to test the setup.

* Add destination capabilities execution

This commit introduces a new function, `executeDestinationCapabilities`, which executes a Python script to insert destination capabilities into the documentation process.

* Enhance destination capabilities insertion script

This commit refines the `insert_destination_capabilities.py` script by adding functionality to dynamically generate and insert destination capabilities tables into documentation files. It introduces a new data structure for capabilities, improves file processing logic, and ensures that only relevant files are processed. Additionally, it enhances error handling and logging for better traceability during execution.

* Refactor destination capabilities insertion script

This commit updates the `insert_destination_capabilities.py` script to improve its functionality by dynamically retrieving supported destination names from the source directory. It enhances the file processing logic to ensure only relevant files are processed based on available destinations. Additionally, it improves error handling and logging for better execution traceability.

* Refactor and enhance destination capabilities insertion script

This commit refines the `insert_destination_capabilities.py` script by adding functionality to dynamically retrieve and format destination capabilities into markdown tables. It introduces improved error handling, validation for destination names, and enhances the file processing logic to ensure only relevant files are processed. Additionally, it updates the main function to include pre-checks for source and target directories, ensuring a more robust execution flow.

* Refactor and improve destination capabilities insertion script

This commit enhances the `insert_destination_capabilities.py` script by refining the logic for generating markdown tables of destination capabilities. It introduces new patterns for documentation links, improves error handling, and optimizes the processing of relevant capabilities. Additionally, it streamlines the file processing logic and ensures that only valid capabilities are included in the output, resulting in cleaner and more informative documentation.

* Remove destination capabilities sections from various documentation files

This commit removes the `## Destination capabilities` sections and their corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags from multiple destination documentation files, including athena.md, bigquery.md, clickhouse.md, databricks.md, dremio.md, duckdb.md, ducklake.md, filesystem.md, lancedb.md, motherduck.md, mssql.md, postgres.md, qdrant.md, redshift.md, snowflake.md, sqlalchemy.md, synapse.md, and weaviate.md. This cleanup helps streamline the documentation and focuses on relevant content.

* Add destination capabilities sections to various documentation files

This commit introduces `## Destination capabilities` sections along with their corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags in multiple destination documentation files, including athena.md, bigquery.md, clickhouse.md, databricks.md, dremio.md, duckdb.md, ducklake.md, filesystem.md, lancedb.md, motherduck.md, mssql.md, postgres.md, qdrant.md, redshift.md, snowflake.md, sqlalchemy.md, synapse.md, and weaviate.md. This addition enhances the documentation by providing clear insights into the capabilities of each destination, improving user understanding and usability.

* Update documentation for various destinations with formatting improvements

This commit enhances the documentation for multiple destinations, including BigQuery, ClickHouse, Databricks, Dremio, DuckDB, DuckLake, Filesystem, LanceDB, MotherDuck, MSSQL, Postgres, Qdrant, Redshift, Snowflake, SQLAlchemy, Synapse, and Weaviate. Changes include improved formatting for warnings, notes, and tips, as well as minor adjustments to the content for clarity and consistency. These updates aim to enhance the readability and usability of the documentation for users.

* Remove destination capabilities sections from various documentation files

* Update destinations with capabilities marker

* Added type guard to guard against Any

* Temporarily commit preprocessed docs

* Add new constants for documentation preprocessing and update requirements

This commit introduces a new `constants.py` file containing various constants for documentation preprocessing, including directory paths, file extensions, timing settings, and markers. Additionally, the `requirements.txt` file is updated to include `watchdog` and `requests` packages, enhancing the project's dependencies.

* Add tuba links processing script and remove unused line from constants

This commit introduces a new script, `preprocess_tuba.py`, which handles the fetching and formatting of tuba links for documentation. It includes functions for fetching configuration, extracting tags, and inserting links into markdown files. Additionally, an unused line has been removed from `constants.py` to clean up the code.

* Refactor tuba link processing and extract utility function

This commit refactors the `preprocess_tuba.py` script by moving the `extract_marker_content` function to a new `utils.py` file for better organization and reusability. The logic for checking the presence of the TUBA marker has been simplified, and the formatting function for tuba links has been updated to improve clarity and maintainability. These changes enhance the overall structure of the documentation preprocessing tools.

* Add snippet processing functionality for documentation

This commit introduces a new script, `preprocess_snippets.py`, which provides functions for building a map of code snippets, retrieving snippets from files, and inserting them into markdown documents. The script enhances the documentation preprocessing tools by allowing for better management and formatting of code snippets. Additionally, the `utils.py` file is updated with new utility functions for directory traversal and marker content extraction, improving overall code organization and reusability.

* Add example processing script for documentation generation

This commit introduces a new script, `process_examples.py`, which automates the generation of example documentation from Python files. The script includes functionality to build documentation by extracting headers, comments, and code snippets, while also handling exclusions and errors gracefully. Additionally, the `utils.py` file is updated with a new utility function, `trim_array`, to enhance the management of line arrays. These changes improve the documentation process by streamlining example integration and ensuring better formatting.

* Enhance documentation preprocessing with Python integration and new script

This commit updates the `package.json` to include a new script for installing Python dependencies and modifies the start and build scripts to incorporate Python preprocessing. Additionally, a new `preprocess_docs.py` script is introduced, which automates the processing of markdown files by inserting code snippets, managing links, and syncing examples. The `requirements.txt` is also updated to include a new dependency, `python-debouncer`, improving the documentation workflow.

* Refactor documentation preprocessing scripts for improved async handling and example processing

This commit enhances the `preprocess_docs.py` script by integrating asynchronous file handling and introducing a lock mechanism to manage concurrent processing. The `package.json` is updated to modify the start script for better coordination of preprocessing tasks. Additionally, a new `preprocess_examples.py` script is added to streamline the generation of example documentation, ensuring proper formatting and error handling. The `preprocess_snippets.py` script is also updated to maintain consistency in line reading methods. These changes collectively improve the efficiency and reliability of the documentation workflow.

* Refactor documentation preprocessing scripts for improved efficiency and caching

This commit updates the `package.json` to streamline the start script by removing the lock file mechanism and enhancing the coordination of preprocessing tasks. The `preprocess_docs.py` script is refactored to eliminate the lock file usage, simplifying the processing flow. Additionally, the `preprocess_tuba.py` script introduces a caching mechanism for tuba configuration to reduce redundant network requests, improving performance. These changes collectively enhance the documentation workflow and processing efficiency.
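
The caching idea mentioned above can be sketched as a fetch-once helper. This is an assumption about the general shape, not the code in `preprocess_tuba.py`: the names `_config_cache` and `get_config` are hypothetical, and the fetch function is passed in so the example needs no network access.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical module-level cache; filled on the first call.
_config_cache: Optional[Dict[str, Any]] = None


def get_config(fetch: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Call `fetch` once and reuse its result on later calls."""
    global _config_cache
    if _config_cache is None:
        _config_cache = fetch()
    return _config_cache
```

With this shape, repeated preprocessing passes in one process trigger a single fetch, which is the effect the commit message describes.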

* Refactor file change handling in documentation preprocessing scripts

This commit enhances the `preprocess_docs.py` script by simplifying the file change handling logic through the introduction of a new `handle_change_impl` function. The previous `should_process` function is removed to streamline the decision-making process for file processing. Additionally, whitespace cleanup is performed for better code readability. The `preprocess_tuba.py` script also receives minor whitespace adjustments. These changes collectively improve the maintainability and clarity of the documentation preprocessing workflow.

* Add destination capabilities processing and refactor related scripts

This commit introduces a new script, `preprocess_destination_capabilities.py`, which handles the generation of destination capabilities tables for documentation. It includes caching mechanisms for improved performance and integrates with existing constants for consistency. The `insert_destination_capabilities` function is now called within `preprocess_docs.py` to streamline the documentation processing workflow. Additionally, the `insert_destination_capabilities.py` script is removed as its functionality is now encapsulated in the new script. These changes enhance the documentation generation process by providing structured capabilities information.

* Update package-lock.json and package.json for improved documentation preprocessing

This commit updates the `package-lock.json` to reflect changes in dependencies and their versions, ensuring compatibility and performance enhancements. The `package.json` is modified to streamline the `start` and `preprocess-docs` scripts by removing the installation of Python dependencies from the start command and adjusting the environment variable settings. These changes collectively enhance the efficiency and reliability of the documentation generation workflow.

* Add processed docs entry to .gitignore

This commit updates the .gitignore file to include the 'docs_processed' entry, ensuring that preprocessed documentation files are excluded from version control. This change helps maintain a cleaner repository by preventing unnecessary files from being tracked.

* Stop tracking docs_processed directory

* Remove the `preprocess_docs.js` script, which handled documentation preprocessing tasks including snippet insertion and link management. This deletion streamlines the codebase by eliminating unused functionality, following recent refactoring efforts to improve documentation processing workflows.

* Refactor destination capabilities processing script for type hinting and formatting improvements

This commit updates the `preprocess_destination_capabilities.py` script by adding type hints for caching variables, enhancing code clarity and maintainability. Additionally, it modifies the formatting of the capabilities table to ensure consistent output and appends a newline for better readability. These changes collectively improve the structure and presentation of destination capabilities in the documentation.

* Refactor documentation processing scripts by removing unnecessary argument documentation

This commit simplifies the `insert_destination_capabilities` function in `preprocess_destination_capabilities.py` by removing the detailed argument and return type documentation. Additionally, the `format_tuba_links_section` function in `preprocess_tuba.py` is updated to streamline its docstring, enhancing clarity while maintaining essential information. These changes improve the readability and maintainability of the documentation processing scripts.

* Update package.json to streamline documentation processing scripts

This commit modifies the `package.json` to include a new script for installing Python dependencies and updates the `start` and `build` scripts to ensure a more efficient workflow. The changes enhance the coordination of documentation preprocessing tasks, improving the overall efficiency of the documentation generation process.

* Added dependency installation in start

* Refactor package.json scripts for improved documentation processing

This commit updates the `package.json` to streamline the `start`, `build`, and `build:cloudflare` scripts by removing redundant installation of Python dependencies. The `preprocess-docs` script is now defined separately, enhancing clarity and efficiency in the documentation generation workflow.

* Add type checking configurations for additional modules in mypy.ini

This commit extends the mypy.ini configuration by adding ignore_missing_imports settings for several new modules, including constants and various preprocess modules. These changes aim to improve type checking flexibility and reduce false positives during type analysis, enhancing the overall development experience.

* Enhance type hinting in preprocessing scripts for improved clarity

This commit updates the type hints in `preprocess_destination_capabilities.py`, `preprocess_snippets.py`, and `preprocess_tuba.py` to provide more specific type information. Changes include casting for constants and refining list and dictionary type annotations. These improvements enhance code readability and maintainability, supporting better type checking and development practices.

* Update dependencies and refactor documentation processing scripts

This commit adds the `python-debouncer` dependency to `pyproject.toml` for improved event handling in documentation processing. Additionally, it refines the `package.json` scripts by separating the `preprocess-docs` command and optimizing the `start` script for better efficiency. The `preprocess_docs.py` script is also updated to utilize lazy imports for certain modules, enhancing performance during documentation processing. These changes collectively improve the clarity and efficiency of the documentation generation workflow.

* Remove requirements.txt and clean up whitespace in preprocess_docs.py

This commit deletes the `requirements.txt` file, which is no longer needed, and cleans up unnecessary whitespace in the `preprocess_docs.py` script. These changes help streamline the codebase and improve overall readability.

* Update documentation for Databricks and DuckLake destinations

This commit enhances the documentation for Databricks by adding a note about loading data to Managed Iceberg tables and refining the descriptions of table and column-level hints. Additionally, it updates the DuckLake documentation to recommend using a more explicit catalog name in configuration examples. These changes improve clarity and usability for users working with these destinations.

* Enhance documentation for various destinations and add requirements.txt for project dependencies

* Fix typo in DuckDB documentation regarding spatial extension installation

* Remove destination capabilities section from AWS Athena documentation

* Feat/adds workspace (#3171)

* ports toml config provider with profiles

* supports run context with profiles

* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging

* implements workspace run context with profiles and basic cli

* displays workspace name and profile name before executing cli commands if run context supports profiles

* exposes dlt.current.workspace()

* converts run context protocol into abstract class

* fixes plugins tests

* refactors _workspace: private and public modules

* adds workspace test cases

* launches workspace and pipeline mcp with cli, sse by default

* tests basic workspace behaviors

* refactors code to switch context and profile

* adds default profile to run context interface

* ports pipeline and oss mcp, changes derivation structure

* adds safeguards and tests to workspace cleanup cli helper

* adds run_context to SupportsPipeline, checks run_context change on pipeline activation

* adds mcp dependency to workspace extra, fixes types

* renames test fixture

* mcp export tweak

* updates cli reference and common ci workflow

* disables dlt-plus deps in ci

* removes df from mcp tools, fixes workspace tests

* fixes tests

* Fix build scripts for Cloudflare integration in package.json

* Fix preprocess-docs:cloudflare script to use python directly instead of uv

* Restore preprocess-docs scripts in package.json for consistency

* Update preprocess-docs:cloudflare script to include requirements installation

* Update preprocess-docs:cloudflare script to include requirements installation

* Add __init__.py file to tools directory

* Refactor import statements to use relative imports in preprocessing scripts

* Update import statements to use absolute paths for consistency across preprocessing scripts

* Add mypy configuration for additional modules to ignore missing imports

* Removed duplicated line

* Add mypy configuration to ignore missing imports for tools module

* Update ducklake.md

* temporarily add netlify build command back

* fix typing in snippets and update mypy.ini a bit

* reverse build commands back to previous order

* Fixed watch by changing implementation into queue and locks

* Refactor package.json for improved script organization and maintainability

* Add mypy configuration to ignore missing imports for additional modules

* Add mypy configuration to ignore missing imports for more modules

* Remove mypy configuration for preprocess_examples to streamline settings

* Update mypy configuration: rename dlt hub section to dlt plus and remove unused preprocess settings

* Refactor import statements to remove 'tools' prefix, improving module accessibility across preprocess scripts

* Refactor import statements in preprocessing scripts to use relative imports, enhancing module organization and consistency

* Refactor import statements in preprocessing scripts to use absolute imports from the tools module, improving clarity and consistency across the codebase

* Update mypy.ini

* Fix formatting in _generate_doc_link function by removing unnecessary whitespace in return statement for improved readability

* fix linting and script execution

* remove sleeping after preprocessing in favor of predictable processing before docusaurus launch

* remove unnecessary whitespace in preprocess_docs.py for cleaner code

* Update deployment script in package.json and enhance file change handling in preprocess_docs.py; remove obsolete preprocess_change.py

* Refactor preprocess_docs.py to improve file change handling; replace change counter with a pending changes flag for better processing control and enhance logging for file modifications.

* Enhance capabilities table generation in preprocess_destination_capabilities.py by adding a descriptive header and introductory text for improved clarity and context.

* Remove destination capabilities sections from multiple destination documentation files for consistency and clarity.

* Fix formatting in start script of package.json for improved readability

* Enhance capabilities table generation by improving destination name formatting; streamline file change handling in preprocess_docs.py by removing unnecessary print statements.

* update files incrementally only when in watcher mode
make tuba link generation random per day with a seed
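
One way to get a selection that is random but stable within a day is to seed a random generator with the date. The sketch below assumes that approach; `pick_links_for_day` and its parameters are hypothetical names, and the actual script may do this differently.

```python
import random
from datetime import date
from typing import List


def pick_links_for_day(links: List[str], count: int, day: date) -> List[str]:
    """Pick up to `count` links; the same day always yields the same pick."""
    # Seeding with the ISO date string makes the choice repeatable per day.
    rng = random.Random(day.isoformat())
    return rng.sample(links, min(count, len(links)))
```

Rebuilding the docs several times on the same day would then produce the same links, while the next day's build may pick different ones.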

* fix duplicate page at examples error

* remove outdated docs deploy action

* add build docs action for better debuggability

* revert unintentional change to md file

* add info about where capabilities links should go

* refactor: improve documentation link generation for capabilities

* fix: update documentation link for replace strategy and improve link formatting

---------

Co-authored-by: rudolfix <rudolfix@rudolfix.org>
Co-authored-by: dave <shrps@posteo.net>
2025-10-22 18:59:48 +02:00
rudolfix
fe567414dc chore/moves cli to _workspace module (#3215)
* adds selective required context, checks profile support in switch_profile

* creates and tests hub module

* adds plugin version to telemetry

* renames imports in docs

* renames ci workflows

* fixes lint

* tests deploy command on duckdb

* moves cli module to workspace

* moves cli tests to workspace module

* renames fixtures, rewrites fixture to patch run context to _storage

* allows to patch global dir in workspace context

* when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set

* imports git utils only when need to clone package in dbt runner

* runs workspace tests as part of common

* fixes tests, config tests side effects

* moves dashboards to workspace

* fixes pipeline trace test

* moves dashboard helper tests

* excludes additional secret files and pinned profile from gitignore

* cleans up hatchling files in pyproject

* fixes dashboard running tests in ci

* moves git module to libs

* diff fix

* fixes fixture names
2025-10-19 15:21:42 +02:00
rudolfix
bc2706b63a renames dlt_plus plugin to dlthub (#3192)
* adds selective required context, checks profile support in switch_profile

* creates and tests hub module

* adds plugin version to telemetry

* renames imports in docs

* renames ci workflows

* fixes lint
2025-10-14 11:47:27 +02:00
rudolfix
01698752db Feat/adds workspace (#3171)
* ports toml config provider with profiles

* supports run context with profiles

* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging

* implements workspace run context with profiles and basic cli

* displays workspace name and profile name before executing cli commands if run context supports profiles

* exposes dlt.current.workspace()

* converts run context protocol into abstract class

* fixes plugins tests

* refactors _workspace: private and public modules

* adds workspace test cases

* launches workspace and pipeline mcp with cli, sse by default

* tests basic workspace behaviors

* refactors code to switch context and profile

* adds default profile to run context interface

* ports pipeline and oss mcp, changes derivation structure

* adds safeguards and tests to workspace cleanup cli helper

* adds run_context to SupportsPipeline, checks run_context change on pipeline activation

* adds mcp dependency to workspace extra, fixes types

* renames test fixture

* mcp export tweak

* updates cli reference and common ci workflow

* disables dlt-plus deps in ci

* removes df from mcp tools, fixes workspace tests

* fixes tests
2025-10-08 20:16:34 +02:00
Thierry Jean
b29f33c5ed feat: dlt widgets for marimo (#3021)
* added marimo widget + tutorial

* load package inspector added

* added schema inspector

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-27 13:25:23 -04:00
djudjuu
96014481be update lancedb orphan deletion mechanism (#2820)
* bump to latest lancedb

* do not pass api-key to embedding_func, align schema for orphan deletion

* bump lancedb

* updated example

* use pyarrow helpers in type mapper

* removes code duplication from lancedb_client, moves jobs to a separate module

* sets nullability, fixes schema on merge to include vector column if not added by the user, removes nullability on auto-embed columns in adapter

* read vector field from config

* fix nullability test hint

* unit test add_vector_column

* more specific ValueError parsing

* no longer accept value error when opening table

* schema alignment test next versions

* no fusion datatype typecasting

* refactor

* problems with json loading

* test fixes

* fixes column normalization when reading existing schema

* warn against orphan removal without settings

* added docs

* todos, check for merge-disposition

* fixed missing load tests

* fixed tests

* fixed multiple merge keys condition

* pyarrow precision types

* remove unused code

* added max precision in LanceDB tests

* remove arrow to fusion_type tests

* refactor

* prepare_load_table in orphan removal job

* documentation update

* refactor

* adds method to get dict of non-default values from configuration

* moves parquet and csv format configuration from data writers to destination

* adds parquet format to destination caps to allow lancedb to have custom settings

* adds more lancedb configs, moves connect method to credentials, allows lancedb client to be passed instead of creds

* forces arrow list struct to be saved in parquet, not the parquet default

* looks for row key only for merge disposition

* moves fill_empty_source_column_values_with_placeholder to pyarrow helper

* tests bring own vector and explicit client as credentials

* ignores lancedb in mypy.ini

* adds missing docs

* deprecates file format configs in data writers

* fix unit tests for add_vector_column

* adjust example code to updated lancedb exceptions

* skip lancedb example (because running on fork breaks)

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: MOLKA ZHANI <molka@dlthub.com>
2025-07-07 19:09:17 +02:00
rudolfix
b1cff8cc66 allows to copy files from vibe-hub repo (#2760)
* copies files from hub repo

* adds ai setup and hub copy commands to dlt init for dlthub:

* adds vibe rest api pipeline with full AST rewrite

* moves repo locations to init

* fixes run context in venv
2025-06-18 11:25:12 +02:00
rudolfix
1dc29d7f01 adds parquet support to postgres via adbc (#2685)
* adds parquet support to postgres via adbc

* use selector to compute list of file formats in caps

* adds docs on failing data types

* adds direct test for all data types

* fixes test
2025-06-06 00:02:40 +02:00
rudolfix
b472ab7168 [transformations] decouples sqlglot lineage and schema generation from destination identifiers (#2705)
* uppercase env var

* fix linting and marimo e2e tests

* enables only x-annotation propagation, fixes lineage test to include clickhouse, sqlalchemy and clickhouse

* computes sqlglot schema and lineage solely on dlt schema identifiers, disables any normalization and table name expansion

* computes ibis unbound table solely on dlt schema identifiers, disables any normalization and table name expansion

* makes ibis relation to work on dlt schema identifiers

* decouples query generation from query normalization in base relation. query normalization will expand table names, qualify tables, case fold and quote

* adds capability to check if nulls are enforced on alter

* adds option to get table path without casefolding

* rewrites how identifiers are normalized in sqlalchemy

* makes test_read_interfaces work with all destinations without escaping, WIP

* fixes how credentials are emitted by destination_config

* fixes linting issues for marimo

* revert name / type scoped destination configs

* fix pii annotations hint

* quote table names in row counts (will not work with table names with white spaces otherwise)

* format

* fix marimo app linting errors

* normalize database name in sqlglot schema
fix anonymous column detection in lineage

* disables one lineage test..

* fix dataset mismatch in query resolution the correct way

* remove qualified table names from some selectors

* fix a couple more tests

* make normalizing of query for pure sql relations optional
use normalized query in transformations

* fix default of normalizing query
cache sqlglot schema on dataset

* move query normalization into utils, cache result and do not modify original qualified query

* directly access normalized_query from relation

* disable sqlglot schema cache on dataset

* fix filesystem tests and disallow access of non-existent table

* fix unrelated breakage in lancedb example

* update tests that were using tables not in schema on datasets

* fix snowflake tests, re-enable two disabled tests

* fix last snowflake test

---------

Co-authored-by: djudjuu <julius@dlthub.com>
Co-authored-by: David Scharf <shrps@posteo.net>
2025-06-04 20:29:30 +02:00
David Scharf
7eb4570f8e dlt marimo app pre-release version (#2662)
* start marimo app

* some more work

* a few small additional changes

* move marimo to dlt helpers and some small changes

* a bunch of improvements

* ui improvements and start fixing types

* clean up imports and make app more typesafe

* nicer tables

* start data page with row counts

* first version of query explorer

* make db browser nicer and dataset faster

* add pipeline quickstart links
add query cache and fast query execution

* add studio extra

* add first very simple test

* add studio command

* add more first tests

* fix dropdown

* rename helpers to utils
fix linter

* incomplete work on e2e tests

* tmp

* move e2e tests

* add tests to common file

* fallback when getting pipelines

* add poetry context to marimo start command

* fix folder

* add basic page checking for all e2e test pipelines

* small change

* add python caching (marimo caching does not work properly) and make dlt_pipeline a top level object

* start adding load info tab

* add ibis to e2e dependencies

* add loads page and data browser query history

* update basic e2e tests

* basic grammar fixes

* start adding trace view

* clean up imports

* start reworking tabs / switches

* finish conversion into grid friendly version

* fix types

* clean up strings and cell names

* a bit of styling

* make schema page one cell

* some style updates

* changes to schema browser

* stg

* some text improvements

* fix unit tests

* fixes tests

* fix load id based row counting

* small css improvements

* add more info to trace section

* fix tests and small changes to trace page

* small string change

* fix warnings in edit mode

* extract all strings

* fix strings

* comments and some formatting

* remove incorrect info

* add config and make tests work again

* use string refs in e2e tests

* update test file

* add better timestamp rendering for loads and update tests

* fix rest api tests

* disable marimo tests on python 3.13

* use marimo state for some caching

* slightly re-organize utils

* add generated version of utils tests

* exclude python 3.9 for marimo e2e tests

* run e2e tests headless

* disable marimo e2e tests on windows

* remove marimo extra and create dependency groups for marimo and streamlit

* add marimo dependencies to linter

(cherry picked from commit e4235a981ee2d79d1e51cb7728b551acad562e3b)

* streamlit should be present for linting

* re-enable relevant fixtures for e2e tests
remove unused imports

* move marimo tests first for debugging purposes

* print html from test to see what is going on

* another test

* do not set duckdb credentials and move marimo tests back to end

* fix marimo app dependencies
2025-05-30 17:12:58 +02:00
djudjuu
0d5176fbbd refactor init-command for use in dlt project (#2568)
* refactor init-command for use in dlt project

* remove config.toml from project docs

* fix ibis mypy error

---------

Co-authored-by: dave <shrps@posteo.net>
2025-05-07 16:09:44 +02:00
Steinthor Palsson
42c5c9b50a Apply hints for nested tables (#2165)
* unifies ResourceHints typed dict

* Apply nested hints and compute table chain from nested hints

* Arrow fix

* Handle TableNameMeta

* Fix name hint

* Arrow fix, all tests/extract running

* Nested hints tests, handle table name overrides

* lint

* required Optional type annotation added

Adding this type annotation fixed 69 failing tests. The missing Optional
impacted the dlt.common.validation.validate_dict().validate_prop()
functions to parse the RESTAPIConfig object

* updated pokemon source to pokeapi==2.7.0

* refactored tests; fixed syntaxerror

* use naming convention when resolving nested tables

* use naming convention when resolving nested tables

* added comments to tests

* removed unused path param

* flat map to apply nested hints

* added failing test for reference

* temporarily fix default table template

* allows for normalizer to break nesting, fixes key_hash generation bug

* generates parent and resource depending on the primary/merge keys present in nested resources

* format fixes

* fixes and declares tests

* renames parent to parent_table_name in ResourceHints, fixes bug where table name from meta was not applied in compute_table_schema of hints

* add nested_hints to resource decorators

* allows schema contracts to be defined on nested tables

* implements missing tests

* adds nested hints docs

* generates load id for all root tables

* warns on usage of write disposition in nested hints, fixes typo and commented leftovers

* fixes inferred columns order in relational normalizer

* installs latest dlt-plus from PyPI when validating docs

* includes snippets setting in mypy.ini

---------

Co-authored-by: zilto <zilto@github.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-03-25 20:19:23 +01:00
rudolfix
1a5d7740f6 adds common tmp dir and ref importer (#2276)
* calls on_resolved etc. on reverse mro

* adds tmp_dir to run context, defaults to cwd or DLT_TMP_DIR

* does not set destination_name to destination_type when resolving DestinationClientConfiguration

* adds common ref importer with traces, missing dependency importer and applies to destination and source refs

* tests tmp_dir in plugin

* removes package check for mypy

* makes all destination using local files to follow tmp_dir, unifies how local files are named, uses destination name to name databases

* allows for callable destination attr for ref

* allows for () -> Destination in import typechecker

* allows for explicit and necessary prereleases if uv is used for venv

* tmp dir does not depend on PROJECT_DIR env

* converts query prop into method in relation.py

* bumps sqlglot in lock to allow for ibis 10

* corrects destination name in pipeline state to represent configured name

* improves some tests

* adds warning when duckdb catalog is identical to dataset name

* bumps lock file to get dev env on Py 3.12

* fixes tests

* adds string encoding option to postgres destination, set to utf-8 in redshift

* shifts tests to pymysql

* warns if dataset name is normalized and changed

* disables r2 delta login test due to bug in delta-rs

* adjusts max identifier length in naming convention for dynamic destination caps

* fixes other tests

* allows to run ibis 10 with redshift
2025-02-11 01:49:31 +01:00
rudolfix
8d8b4c3bad typed entity registries (#2236)
* fixes inserted_at to datetime

* adds pipeline configuration into (pipelines, name) section

* registers custom destinations via synthesized types

* adds destination registry, adds autoimport to registries, refactors common destination files

* sets destination_name to callable name for custom destination

* fixes sources registry fixture

* bumps dlt to 1.6.0a0

* adds global dir path

* adds plus info to anon tracker

* checks if dlt can be imported in alpine container

* adds top level module to run context

* stores top level plugin modules to resolve shorthand references

* adds destination references with shorthand expansion

* adds preferred table formats to destination caps

* uses types as reference to sources, creates DltResource instance in from_reference

* tests plugin discovery with references

* tests plus plugin telemetry

* converts dict arrow types before sending to delta or iceberg

* updates deps, incl duckdb

* adds plug/unplug callbacks for run context

* improves reading snapshots on iceberg

* fixes tests and deps

* allows to push and pop context on stack, fixes some tests

* plugs and unplugs content on reload

* always refresh views on abfss + sql client filesystem

* fixes some tests
2025-01-29 11:08:27 +01:00
David Scharf
cbcff925ba drop python 3.8, enable python 3.13, and enable full linting for 3.12 (#2194)
* add python 3.12 linting

* update locked versions to make project installable on py 3.12

* update flake8

* downgrade poetry for all tests relying on python3.8

* drop python 3.8

* enable python3.13

* copy test updates from python3.13 branch

* update locked sentry version

* pin poetry to 1.8.5

* install ibis outside of poetry

* rename to workflows for consistency

* switch to published alpha version of dlt-pendulum for python 3.13

* fix images

* add note to readme
2025-01-12 16:40:41 +01:00
Jorrit Sandbrink
4e5a2405e2 iceberg table format support for filesystem destination (#2067)
* add pyiceberg dependency and upgrade mypy

- mypy upgrade needed to solve this issue: https://github.com/apache/iceberg-python/issues/768
- uses <1.13.0 requirement on mypy because 1.13.0 gives error
- new lint errors arising due to version upgrade are simply ignored

* extend pyiceberg dependencies

* remove redundant delta annotation

* add basic local filesystem iceberg support

* add active table format setting

* disable merge tests for iceberg table format

* restore non-redundant extra info

* refactor to in-memory iceberg catalog

* add s3 support for iceberg table format

* add schema evolution support for iceberg table format

* extract _register_table function

* add partition support for iceberg table format

* update docstring

* enable child table test for iceberg table format

* enable empty source test for iceberg table format

* make iceberg catalog namespace configurable and default to dataset name

* add optional typing

* fix typo

* improve typing

* extract logic into dedicated function

* add iceberg read support to filesystem sql client

* remove unused import

* add todo

* extract logic into separate functions

* add azure support for iceberg table format

* generalize delta table format tests

* enable get tables function test for iceberg table format

* remove ignores

* undo table directory management change

* enable test_read_interfaces tests for iceberg

* fix active table format filter

* use mixin for object store rs credentials

* generalize catalog typing

* extract pyiceberg scheme mapping into separate function

* generalize credentials mixin test setup

* remove unused import

* add centralized fallback to append when merge is not supported

* Revert "add centralized fallback to append when merge is not supported"

This reverts commit 54cd0bcebf.

* fall back to append if merge is not supported on filesystem

* fix test for s3-compatible storage

* remove obsolete code path

* exclude gcs read interface tests for iceberg

* add gcs support for iceberg table format

* switch to UnsupportedAuthenticationMethodException

* add iceberg table format docs

* use shorter pipeline name to prevent too long sql identifiers

* add iceberg catalog note to docs

* black format

* use shorter pipeline name to prevent too long sql identifiers

* correct max id length for sqlalchemy mysql dialect

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit 6cce03b771.

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit ef29aa7c2f.

* replace show with execute to prevent useless print output

* add abfss scheme to test

* remove az support for iceberg table format

* remove iceberg bucket test exclusion

* add note to docs on azure scheme support for iceberg table format

* exclude iceberg from duckdb s3-compatibility test

* disable pyiceberg info logs for tests

* extend table format docs and move into own page

* upgrade adlfs to enable account_host attribute

* Merge branch 'devel' of https://github.com/dlt-hub/dlt into feat/1996-iceberg-filesystem

* fix lint errors

* re-add pyiceberg dependency

* enabled iceberg in dbt-duckdb

* upgrade pyiceberg version

* remove pyiceberg mypy errors across python version

* does not install airflow group for dev

* fixes gcp oauth iceberg credentials handling

* fixes ca cert bundle duckdb azure on ci

* allow for airflow dep to be present during type check

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2024-12-11 09:35:59 +01:00
rudolfix
f4faa836df #2087 allows double underscores in identifiers (#2098)
* removes astunparse and aiohttp

* allows for built-in ast unparse if present

* uses break path for normalization to allow names containing path separators, migrates old schema to enable compat mode with old behavior

* adds removeprefix util

* updates docs

* bumps dlt to version 1.4.1

* linter fixes

* fixes tests

* fixes and tests saving pandas indexes

* fixes sqlite read interface tests

* updates docs
2024-12-02 16:24:57 +01:00
rudolfix
bfd0b52848 azure account host docs (#2091)
* updates docs

* bumps to alpha 1.4.1a0

* fixes mypy linting
2024-11-24 00:54:03 +01:00
rudolfix
810e619cd4 adds engine adapter and passes incremental and engine to query adapter (#2070)
* adds engine adapter and passes incremental and engine to query adapter

* adds usage for extended query adapter

* supports custom account host for azure blob storage

* does not wake loader in retry in job

* fixes typing and azure url generation

* fixes docs

* allows subqueries in table adapter and accepts write disposition in sql_table

* fixes type error due to signature detection

* adds table adapter to remove nullability

* fixes subquery table adapter test
2024-11-23 22:38:32 +01:00
David Scharf
956619996e docs: add typechecking to embedded snippets (#1130)
* fixes a couple of typechecking errors in the docs

* fix more snippets and enable mypy typechecking on embedded snippets

* switch to glob for file discovery

* additional typing checks for merged in blog posts

* fix snippets after devel merge
2024-03-26 13:38:21 +01:00
David Scharf
8b282264e2 add grammar fixing script to docs tools (#1117) 2024-03-20 11:50:36 +01:00
David Scharf
713aa314d0 Extend custom destination (#1107)
* rename tests file

* add setting to skip dlt internal tables and columns in custom destination

* add nesting level setting to custom destination
update readme

* use correct internal dlt schema item marker
propagate the max_nesting_level the correct way from the destination caps

* add example for custom destination bigquery

* fix embedded snippet checker output

* add custom destination example to docs

* update custom destination example

* pin flake8-encodings to fork

* fix snippet marker

* ignore google imports

* Docs: fix custom destination (#1113)

* removed sink mentions, fixed code snippets

* rename title

* trigger tests

* trigger tests 2

* revert changes

* small edits

* pin databind.json python package

* pin databind core

* add bigquery extra for snippets tests

* updates to the readme

* rename function for nesting level test

---------

Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>
2024-03-20 08:09:47 +01:00
Steinthor Palsson
36d1718e01 Testing databricks with s3 staging 2024-01-26 16:31:42 -05:00
rudolfix
1f94a3b21a Rfix/load package extract (#790)
* implements package storage

* removes deprecated to_service_account_credentials

* allows to reopen buffered writer

* WIP extract state with other resources

* adds version hash to state

* adds load id to extract step

* adds typing based deprecations

* adds high precision timer for windows

* adds listing of packages and tests
2023-11-29 22:10:51 +01:00
rudolfix
7325c4f029 arrow example (#707)
* sends region name to s3 fsspec client

* shows warning in config missing exception when pipeline script not in working folder

* counts rows of arrow tables in writers to rotate files properly

* adds arrow + connector x example

* bumps to mypy 1.6.1 and fixes transformer decorator

* explains variant column and other fixes in docs

* reuses primary key for index in incremental

* removes unix ts autodetect by default, add/remove detects in schema

* passes column schema to arrow writer

* fixes tests

* adds blog post

---------

Co-authored-by: Adrian <Adrian>
2023-10-24 11:52:59 +02:00
Steinthor Palsson
e71bd37602 Support aws config endpoint_url with fsspec (#701)
* Support aws config `endpoint_url` with fsspec

* Test r2 bucket
2023-10-22 17:46:04 +02:00
David Scharf
58f8ad1049 examples for docs (#616)
* remove snipsync (#613)

* add custom snippets element

* remove snipsync

* migrate performance snippets

* add missing init files

* refine snippet watcher

* docs examples

* fixes chess example

* fixes DLT -> dlt

* more work on transformers example

* make header smaller

* example for zendesk incremental loading

* move incremental loading example to right location

* added text and output example to incremental zendesk


* allow secrets files in examples

* add zendesk credentials

* correct text and code snippets for zendesk example

* add main clause

* add config example

* pytest marker to skip tests running in PRs from forks on github

* removes more typings and adds comments to zendesk example

* shortens example titles

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: AstrakhantsevaAA <astra92293@gmail.com>
2023-10-04 11:05:18 +02:00
Steinthor Palsson
bda6c0ca08 Ignore snippet imports 2023-09-28 22:54:17 -04:00
Steinthor Palsson
9032f43401 No pandas stubs, lockfile 2023-09-28 20:12:07 -04:00
Steinthor Palsson
1135b735d4 Run mypy on tests, fix/ignore all type errors in tests 2023-09-28 18:46:04 -04:00
Steinthor Palsson
d993e288c1 Install missing stubs and remove global ignore_missing_imports 2023-09-28 18:33:09 -04:00
Alena Astrakhantseva
40869ddaa5 Docs: getting started (#568)
* implement normaliser table counts

* fix small bug in extractor and add trace tests to replace tests

* some more tests

* update running doc

* add multi process row count test

* add row count utils tests

* moves step info getters to PipelineTrace class

* serializes dataclasses directly

* getting started

* fix sidebar

* fix LINK

* fix sidebar

* add tabs

* fix mdx

* fix tabs

* fix tabs

* fix tabs

* add tabs for sources

* fix code blocks

* update with Marcin version

* delete empty link

* fixed images, refactored database example

* delete full refresh

* start adding tested snippets for getting started index page

* rename first snippet

* added tests for load data snippets

* test

* fix code blocks tabulation

* tests for incremental snippets

* replace requests with dlt.sources.helpers.requests

* reorgs getting started and snippets

* adds events example, getting started reorg, removes yarn

* fix docs icons and clean up docusaurus stylesheet a bit

* adds weaviate example

* adds getting started weaviate text

* sets up snippets lints and tests

* post merge fixes

* fix deps in docs lint workflow

* small fixes

* md to mdx

* makes dev groups optional

* sentry sdk default group

* fixes intro

* fixes drop staging dataset

---------

Co-authored-by: Dave <shrps@posteo.net>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2023-08-31 10:17:07 +02:00
Marcin Rudolf
128b248240 bumps mypy, adds deprecated module 2023-04-22 22:23:39 +02:00
Steinthor Palsson
5943b17613 mypy show error codes 2023-04-09 20:36:01 -04:00
Marcin Rudolf
11a80d8909 various typing improvements 2022-10-31 20:30:04 +01:00
Marcin Rudolf
99de9b897a adds project package, dependencies and dev pipeline 2022-05-23 22:48:04 +02:00