repo-mirrors/dlt - dlt

mirror of https://github.com/dlt-hub/dlt.git synced 2025-12-17 19:31:30 +00:00

Author	SHA1	Message	Date
David Scharf	4a5ffd82b3	Chore: Update docs npm dependencies and clean up docs build tooling (#3247) * bump npm deps * remove unneeded netlify redirects file * remove unneeded lockfile * remove another unneeded lockfile * post rebase lockfile update * remove old netlify command * create new docs tools project and move api docs gen there * tmp * add uv to build docs workflow * move docs pyproject * re-org docs package and move snippet linter * move notebook linting commands and deps to tools folder add flake8 to tools linting * remove unneeded files * fix linting and formatting errors * remove wrong file * move docs processing script to new package * fix gen api ref * clean up package json and use commands from parent makefile * update build website workflow * move linting to docs makefile partially * fix python version for docs project * consolidate docs commands in docs makefile * fix docs linter * fully update docs test flow * fixes some linting and dependency problems * fix constants * move notebook formatting to docs project * fix lint embedded snippets * fix examples tests * add missing dependencies * fix snippet linting * add missing lint dependencies to core and missing test dependencies to docs * add missing weaviate * add missing regex module * add forked dependency and updates readme file * revert accidental change to example * fix main linter * Move relevant pytest options to subproject * Remove shims / path inserts that are now managed by pytest options * Some typing fixes * Clean up base project pytest ini * Enable transformation snippets tests * remove unneeded raw import of intro snippets * downgrade alive progress * uses dlt logger which also fixes internal alive error * enables transformation snippets linting * fixes dashboard races again --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-11-16 18:01:30 +01:00
rudolfix	91dc3d955f	avoids passing naming conventions as modules (#3229) * adds /home/rudolfix/src/dlt to sys.path when running dlt commands and a cli flag to disable it * adds cli docs check to lint * avoids passing custom naming as modules in docs * removes cli docs check due to Python 3.9 * fixes deploy cli * adds pokemon table count consts * improves custom naming convention docs	2025-10-23 13:45:06 +02:00
rudolfix	0dcdcf0e33	ignores native config values if config spec does not implement those (#3233) * does not fail config resolution if native value provided to a config that does not implement native values * updates databricks docs * allows to replace hints regexes on schema * removes partition hint on eth merge test on databricks * adds pokemon table count consts * reorgs databricks dlt fix * fixes lancedb custom destination example * fixes lancedb custom destination example * reduces no sql_database examples run on ci * fixes merge * marks and skips rfam tests	2025-10-22 22:48:13 +02:00
rudolfix	563c764f29	Feat/adds workspace configuration (#3221) * removes runtime configuration from pipeline context to run context with corresponding action of initializing local runtime * improves telemetry instrumentation decorator + tests + disable telemetry in dlt tests by default * resolves DashboardConfiguration so it is placed within workspace or pipeline configuration * adds and resolves WorkspaceConfiguration and corresponding WorkspaceRuntimeConfiguration * reorganizes cli commands, wrappers, adds missing telemetry tracking * uses working and local dir overrides without adding profile names * uses python language to display stacktraces in marimo * restores runtime_config on pipeline pointing to new PipelineRuntimeConfiguration * renames working dir def _data to .var and updates .gitignore * adds workspace show command * adds reload method on run context, fixes requests helper test * slight cli improvements * new workspace writeup without sidebar * docstring for cli plugins	2025-10-20 23:18:54 +02:00
rudolfix	fe567414dc	chore/moves cli to `_workspace` module (#3215) * adds selective required context, checks profile support in switch_profile * creates and tests hub module * adds plugin version to telemetry * renames imports in docs * renames ci workflows * fixes lint * tests deploy command on duckdb * moves cli module to workspace * moves cli tests to workspace module * renames fixtures, rewrites fixture to patch run context to _storage * allows to patch global dir in workspace context * when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set * imports git utils only when need to clone package in dbt runner * runs workspace tests as part of common * fixes tests, config tests side effects * moves dashboards to workspace * fixes pipeline trace test * moves dashboard helper tests * excludes additional secret files and pinned profile from gitignore * cleans up hatchling files in pyproject * fixes dashboard running tests in ci * moves git module to libs * diff fix * fixes fixture names	2025-10-19 15:21:42 +02:00
rudolfix	01698752db	Feat/adds workspace (#3171) * ports toml config provider with profiles * supports run context with profiles * separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging * implements workspace run context with profiles and basic cli * displays workspace name and profile name before executing cli commands if run context supports profiles * exposes dlt.current.workspace() * converts run context protocol into abstract class * fixes plugins tests * refactors _workspace: private and public modules * adds workspace test cases * launches workspace and pipeline mcp with cli, sse by default * tests basic workspace behaviors * refactors code to switch context and profile * adds default profile to run context interface * ports pipeline and oss mcp, changes derivation structure * adds safeguards and tests to workspace cleanup cli helper * adds run_context to SupportsPipeline, checks run_context change on pipeline activation * adds mcp dependency to workspace extra, fixes types * renames test fixture * mcp export tweak * updates cli reference and common ci workflow * disables dlt-plus deps in ci * removes df from mcp tools, fixes workspace tests * fixes tests	2025-10-08 20:16:34 +02:00
David Scharf	b923062c51	Docs docusaurus / cloudflare fixes (#3114) * bump all dependencies * fix one admonition * normalize docs urls * migrate deprecated admonitions * fix admonition type for source info header * some comments	2025-09-22 18:32:17 +02:00
David Scharf	5d29c0ded0	Dashboard updates and fixes (#3055) * fix bug in child tables data browsing * fixes streamlit launch, prevents streamlit launch after marimo launch * disables trace json serialization * removes streamlit hot reload cli flag * fix smaller bugs and start adding parametrized tests to pipeline utils * update cli docs * parametrize utils tests with different pipeline types and states * start fixing e2e tests * change filesystem bucket url * move example pipelines into separate folder * extracts more helpers into utils improves error handling and messaging * add more tests and move sql query under utils exception wrapper * final fixes to e2e test and add no destination pipeline to unit tests * render mo tables in unit tests for applicable helper functions use mo.json object view for state in all cases instead of yaml * allow map_nested_in_place to also process keys use this in trace sanitizing use repr to keep nested hint keys and show a good string representation add test case that makes sure traces of nested hints can be rendered * update e2e tests to respect new json view of state * remove cloning of dict from map_nested_in_place * remove streamlit mentions and add marimo references in appropriate places * update dashboard page and insert some images * separate mapping function for nested keys and values * update dashboard utils to new mapping function * post merge fixes * add dlt+ fix for backwards compatibility --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-09-09 16:01:02 +02:00
rudolfix	823bf3865f	fully support naive and tz-aware timestamp/time data types (#2570) * adds databricks timestamp NTZ * improves error messages in pyarrow tuples to arrow * decreases timestamp precision to 6 for mssql * adds naive datetime to all data types case, enables fallback when testing destinations not supporting it * other test fixes * always stores incremental state last value as present in the data, tests tz-awareness edge cases * fixes ntz timestamp tests * fixes sqlalchemy destination to work with mssql * adds func to current module to get current resource instance * generates LIMIT clause in sql_database when limit step is present * adds basic tests for mssql in sql_database * adds docs on tz-awareness in datetime columns in sql_database * adds naive and tz aware datetimes to destination caps, implements for various destinations * caches dlt type to python type conversion * normalizes timezone handling in timestamp and time data types, fixes remaining pendulum timezone problems, applies tz/non-tz preserving methods when necessary, improves test coverage * fixes incremental and lag so they always follow the tz-awareness of the data under cursor column, fixes pendulum tz problems, adds tests * moves schema inference and data coercion from Schema to item_normalizers, applies timezone normalization to json data, adjusts new columns to destination caps for json data, tests * casts timezones in arrow table normalizations, datetime and time cases in row tuples to arrow, refactors to get generic method to cast tables to dlt schemas, tests * tracks resource parent, along pipe parent, fixes resource cloning when adding to source, fixes source and resource iterators, makes sure that list of extracted resources always includes implicit and explicit resources * updates dbapi sql client for dremio * adjust column schema inferred from arrow to destination caps in extractor, tests * moves schema and data setup for all data types tests to common code * adds option to exclude columns in sql_table, uses LimitItem to generate LIMIT statements, tests incl. proper cursor tests for naive/tz aware incremental cursor columns * tests sql_database on mssql for all data types and incremental cursor on dates * improves tests for row tuples to arrow with cast to dlt schema, tests for naive datetimes * improved test for timestamps and int with precision on duckdb * disables Python 3.14 tests and dashboard test on mac * better maybe transaction in job client: takes into account ddl and regular transaction destination caps * pyodbc py3.13 bump	2025-08-31 20:06:22 +02:00
Thierry Jean	eb95c36f3c	fix: replace `arrow2` with `arrow` backend for `connectorx` (#2933) * replace arrow2 with arrow backend for connectorx * updated docs/ * updated minimal deps * update docs and pyproject.toml deps * updated minimal deps to support 3.9 * converts +00:00 to UTC right after handover from connectorx * fixes examples connectorx lint --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-08-04 17:01:34 +02:00
dat-a-man	f70b50a46e	Updating custom configurations with @configspec decorator (#2826)	2025-07-23 15:55:02 +02:00
David Scharf	682e900492	make docs snippets tests use local secrets (#2903) * make docs snippets tests independent of secrets * move examples tests into own workflow and remove github fork marker * make custom naming example use secrets file instead of hardcoded secrets	2025-07-17 10:42:45 +02:00
David Scharf	21b68e61f1	Add workspace extra and rename marimo app to "pipeline dashboard" (#2876) * adds dlt workspace extra, updates exception and github workflows * renames app from "marimo app" to "pipeline dashboard" updates --marimo flag to --dashboard * rename studio folders to dashboard * removes all other references to studio * exclude lockfile and markdown files from lfs * update workspace extra dependency versions * bump version	2025-07-14 21:26:50 +02:00
djudjuu	8040ca2da0	do not run lancedb custom destination example test on forked subprocess (#2854) * do not run lancedb custom destination example test on forked subprocess * use lancedb connection correctly	2025-07-14 11:10:00 +02:00
djudjuu	96014481be	update lancedb orphan deletion mechanism (#2820) * bump to latest lancedb * do not pass api-key to embedding_func, align schema for orphan deletion * bump lancedb * updated example * use pyarrow helpers in type mapper * removes code duplication from lancedb_client, moves jobs to a separate module * sets nullability, fixes schema on merge to include vector column if not added by the user, removes nullability on auto-embed columns in adapter * read vector field from config * fix nullability test hint * unit test add_vector_column * more specific ValueError parsing * no longer accept value error when opening table * schema alignment test next versions * no fusion datatype typecasting * refactor * problems with json loading * test fixes * fixes column normalization when reading existing schema * warn against orphan removal without settings * added docs * todos, check for merge-disposition * fixed missing load tests * fixed tests * fixed multiple merge keys condition * pyarrow precision types * remove unused code * added max precision in LanceDB tests * remove arrow to fsiont_tupe tests * refactor * prepare_load_table in orphan removal job * documentation update * refactor * adds method to get dict of non-default values from configuration * moves parquet and csv format configuration from data writers to destination * adds parquet format to destination caps to allow lancedb to have custom settings * adds more lancedb configs, moves connect method to credentials, allows lancedb client to be passed instead of creds * forces arrow list struct to be saved in parquet, not the parquet default * looks for row key only for merge disposition * moves fill_empty_source_column_values_with_placeholder to pyarrow helper * tests bring own vector and explicit client as credentials * ignores lancedb in mypy.ini * adds missing docs * deprecates file format configs in data writers * fix unit tests for add_vector_column * adjust example code to updated lancedb exceptions * skip lancedb example (because running on fork breaks) --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> Co-authored-by: MOLKA ZHANI <molka@dlthub.com>	2025-07-07 19:09:17 +02:00
David Scharf	3ebbfa1f9e	migrate to uv (#2766) * move pyproject.toml and makefile from old branch and add inbetween changes * update workflow files to use uv * run new version of formatter * fix building of images with uv * possibly fix docs linting * downgrade lancedb dependency to fix tests * fix gcs compat mode for s3 for newest boto * fix docstrings in examples * add some uv constraints * update readme.md and contributing.md and some other places * allow duckdb 0.8 in range * add link-mode copy to uv venv on windows * remove poetry lockfile and unneeded lockfile checker * fix chess api related failures * sleep after dremio start.. * set correct package in pyproject * Revert "add some uv constraints" This reverts commit `d611e9ecce`. # Conflicts: # pyproject.toml # uv.lock * add missing databricks sql connector version bounds	2025-06-19 10:11:24 +02:00
David Scharf	670ac1d940	always test docs snippets, not only after authorization (#2769) * always test snippets * maybe fix chess examples (cherry picked from commit ef5577bde5405398e327455b61c99e2a053263a3) * fix docs snippets linting	2025-06-17 22:01:24 +02:00
rudolfix	f821d21165	fixes leaking datasets tests (#2730) * adds optional pipeline activation history to context * allows to configure configs and pragmas for duckdb, improves sql_client, tests * allows query string for motherduck, tests WIP * mocks local_dir correctly to place local files, drop duckdb in pipeline fixture in most places * enables activation factory to drop datasets from all pipelines * uses correct fixture scope in test read interfaces * bumps duckdb and pyarrow * ignores some flake8 errors * logs resolved traces thread-wise, clears log between pipeline runs * improves duckdb tests and docs * bumps arrow to v20 because duckdb 1.3 needs at least 19 for its types * fixes tests - mostly duckdb database locations * fixes lockfile * fixes edge cases when passing setting to duckdb connection * disables iceberg abfss tests * refactors WithLocalFiles so they can be used independent from destination * more local dir test fixes * moves WithLocalFiles to common storages configuration * tests edge cases when setting configs on duckdb fails * updates docs * reverts duckdb to 1.2.1 - last stable version * more test fixes * moves create_secret to duckdb sqlclient * disables building of Dockerfile until we upgrade arrow * skip gcs compat test for local clickhouse tests --------- Co-authored-by: dave <shrps@posteo.net>	2025-06-11 22:17:05 +02:00
David Scharf	36ee706122	Update github workflow setup (#2728) * use both pull request and pull request target on destination workflows * remove additional triggers * marks one test as smoke test and only runs this for the time being * only run one test in common, needs to be reverted later * run common tests only on linter success * fix common workflow * only start workflows on call (do not call them yet) * test master workflow * remove docs changes step from lint * remove local destinations docs change * rename master trigger workflows * change concurrency key * try other dependencies * add destination tests with authorize step * remove authorize and docs step from destination tests * fix destination test * rename main workflow * test inherit secrets * add more workflows to main file * fix starting conditions for some workflows * rename plus tests matrix job * remove concurrency settings for now * add first remote destinations workflow version * move some more remote destinations * remove pytest args * try to fix extras string * add more remote destination tests * rename some workflows and add concurrency settings to main workflow * move test_destinations * fix link to called workflow * add better main workflow labels move clickhouse remote tests * create local destinations test * disabled some workflows * disable clickhouse oss for now split duckdb and postgres local tests into own matrix job * copy ssh agent key * move all local destination secrets into template secrets file * small fixes * enable all tests again * fix local tests * add missing openai dep * try to fix qdrant creds * fix qdrant server / local file differentiation * fix cli test * change workflow dependencies * remove telemetry info and other small changes * run dummy destination with the local tests * remove duckdb from remote tests, always run all mssql and postgres tests * enable clickhouse oss * fix condition for always running all tests * move cli commands to postgres tests * rename clickhouse-compose to be in line with other services * fix clickhouse local credentials and disable tests which require staging destinations * adapt postgres to postgres example to new fixture * fix clickhouse excluded configs * update essential test handling * skip gcs compat test for local clickhouse tests	2025-06-11 15:09:06 +02:00
rudolfix	b472ab7168	[transformations] decouples sqlglot lineage and schema generation from destination identifiers (#2705) * uppercase env var * fix linting and marimo e2e tests * enables only x-annotation propagation, fixes lineage test to include clickhouse, sqlalchemy and clickhouse * computes sqlglot schema and lineage solely on dlt schema identifiers, disables any normalization and table name expansion * computes ibis unbound table solely on dlt schema identifiers, disables any normalization and table name expansion * makes ibis relation to work on dlt schema identifiers * decouples query generation from query normalization in base relation. query normalization will expand table names, qualify tables, case fold and quote * adds capability to check if nulls are enforced on alter * adds option to get table path without casefolding * rewrites how identifiers are normalized in sqlalchemy * makes test_read_interfaces work with all destinations without escaping, WIP * fixes how credentials are emitted by destination_config * fixes linting issues for marimo * revert name / type scoped destination configs * fix pii annotations hint * quote table names in row counts (will not work with table names with white spaces otherwise) * format * fix marimo app linting errors * normalize database name in sqlglot schema fix anonymous column detection in lineage * disables one lineage test.. * fix dataset mismatch in query resolution the correct way * remove qualified table names from some selectors * fix a couple more tests * make normalizing of query for pure sql relations optional use normalized query in transformations * fix default of normalizing query cache sqlglot schema on dataset * move query normalization into utils, cache result and do not modify original qualified query * directly access normalized_query from relation * disable sqlglot schema cache on dataset * fix filesystem tests and disallow access of non-existent table * fix unrelated breakage in lancedb example * update tests that were using tables not in schema on datasets * fix snowflake tests, re-enable two disabled tests * fix last snowflake test --------- Co-authored-by: djudjuu <julius@dlthub.com> Co-authored-by: David Scharf <shrps@posteo.net>	2025-06-04 20:29:30 +02:00
David Scharf	5ceba48757	dlt.transformation implementation (#2528) * triggers devel tests * fixed malformed docstring * use native sqlglot type annotation * pass hints via SQLGlot metadata * fix linter errors and tests * fix a few more tests and edge cases * fix bug in lineage * enable columns schema for both ReadableRelation Types * add more tests and make lineage tests independent from loading * add lineage tests for all sql destinations * enable tests on ci and disable column schema for sqlalchemy for now * fix some more tests * add sqlalchemy hack * first fix for snowflake and some smaller changes and clarifications * fix sqlglot schema creation, makes clickhouse work * re-add transformations tests folder * fix lineage datatype * disable databricks and synapse ibis backend tests * move transformation code from prototype excluding old lineage and including updates so that linter passes, no real code changes yet. * fix some of the python extractor based transformations * fix most tests * make basic transformation tests run on all destinations * enable all current transformation tests for all destinations run some duckdb transformations on all OSes * a little bit of cleanup * move common transactions and mark all destination transaction tests as essential for now * Add improvements from review in prototype PR and some cleanup * exclude dremio * fix some transformations tests * fix row_counts for snowflake and add some comments * converts SupportsReadableRelation to an ABC * add scalar access to SupportsReadableRelation * simplify transformation signature * add top level dlt objects and some small changes * second part of removing transformation extra args * add clickhouse tests * add config based transformation source * add better transformation examples * use fruitshop template for testing * remove custom row_counts method in favor of "global" test one * first draft of transformations doc * some work on the docs page * feat: 2540 lineage `allow_unknown_columns` and `allow_anonymous_columns` (#2577) * test compute_columns_schema() and exception handling * convert transformation code examples to snippets * finish first round of transformation docs * Quite a few PR fixes * fixes some tests * add support and docs for dataframe and arrow operations * add config and fallback if destination not reachable * fix scalar method fallback to models if pipeline destination is not available * hopefully fix one test * Docs: addition of normalizer behaviour to transformations docs (#2639) * Normalizer info added * Unnecessary paragraph removed, regular normalization linked * feat: 2540 - SQLGlot type mapping (#2587) * fixes some tests * post rebase cleanup * renamed kwarg * type handling done; WIP * sqlglot-dlt type mapping completed * added docstrings to tests * removed unused test file * attach metadata to DataType * refactored test to parameterized form * refactor function names * bug fix .to_py() * rename compute_columns_schema() kwargs * refactor type conversion branches * fixes some tests * add support and docs for dataframe and arrow operations * add config and fallback if destination not reachable * fix scalar method fallback to models if pipeline destination is not available * fix: update return type in athena_adapter docstring to reflect correct destination (#2599) * list secrets in vault config provider to avoid calls to backend (#2597) * fixes bug where configuration section was not propagated when embedded configuration is resolved * splits vault provider settings per vault type * adds option to list secrets to vault and google secrets provider * uses google secrets provider with global cache for tests * documents vault provider * test and docs fixes * slightly clarify clickhouse docs (#2594) * slightly clarify clickhouse docs * Update clickhouse.md * Extract dataset code snippets into tests snippets system (#2598) * extracts dataset code blocks into tested snippets and uses fruitshop pipeline as base dataset for demonstration purposes * add ibis group * Enabling 'model' loader_file_format for athena, synapse and dremio (#2556) * Athena model loader format initial support * test_verify_capabilities_data_types adjusted for athena * Synapse enabled * The offset logic for tsql made unreachable * Athena test config without iceberg removed, dremio added * Unnecessary synapse workaround removed * fix some typos in cursor-restapi docs (#2608) * fix some typos in cursor-restapi docs * fix typo * refactor init-command for use in dlt project (#2568) * refactor init-command for use in dlt project * remove config.toml from project docs * fix ibis mypy error --------- Co-authored-by: dave <shrps@posteo.net> * docs: Fix incorrect nesting in secrets.toml (#2614) * fixes parquet data writer settings docs & rewrites configuration docs (#2583) * fixes parquet data writer settings docs * adds section to dlt resource decorator * fixes and tests how config sections are created when single resource is extracted * fixes config sections for parallel doc example * exports postgres adapter * rewrites configuration docs, moves a few docs sections in sidebar * snippet fixes * accepts docs changes from review Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> * adds tip how to eject core source * linter fixes --------- Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> * enables fsspec per-thread instance cache and updates documentation (#2621) * bumps pendulum and docs (#2624) * fixes sql database docstrings and docs * bumps poetry to 3.0.1 and drop dlt poetry * Added dedup sort example (#2235) * Added dedup sort example * Updated formatting * Updated * Updated * Update docs/website/docs/general-usage/incremental-loading.md --------- Co-authored-by: Alena Astrakhantseva <alena@dlthub.com> Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> * Docs: add advanced project tutorial (#2338) * hopefully fix one test * trigger ci * improve tests, lint --------- Co-authored-by: David Scharf <shrps@posteo.net> Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> Co-authored-by: rudolfix <rudolfix@rudolfix.org> Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: djudjuu <djudju@proton.me> Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info> Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Co-authored-by: Alena Astrakhantseva <alena@dlthub.com> * qualify all queries that come into the transformations * fix lineage for snowflake and clickhouse lineage * apply schema fix for sqlglot and remove special treatment of snowflake * align datasets interfaces with ibis implementation ["col"] selects column and not table with one column * disable incremental on transformations decorator and warn if incremental args are discovered * fixes one more test * fixes snowflake tests after sqlglot schema fix * removes standalone resources, fixes transformation function wrapping (#2684) * changes contrib and README (#2666) * changes contrib and README * Apply suggestions from code review Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> * raises if resolving dataclass without configspec * adds function type inspect that follows wrappers * removes make fun, uses wraps * adds conftest to transformations * (1) fixes transformation overloads (2) passes TransformationConfiguration as base spec so buffer is always injected (3) wraps tranformation_function (4) makes str SQL a model (5) tests configurations and parametrized transformations * (1) removes resources returning resources (2) allows resources to be also functions (3) allows base spec to be passed to resource function (4) makes DltResource and SourceFactory to wrap decorated function and fixes signatures (5) allows inner resources to be injectable, warns for transformers (6) normalizes and tests how functions are wrapped and unwrapped so signatures and configs are available * normalizes config resolve behavior: default values can be overridden from providers but explicit cannot. if those were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults). also if native value is found for a config and it does not accept native values, config resolution will fail, previously it was ignored * do not use config specs cached in module when creating autospecs * fixes venv tests when uv is present * if incremental parses from another incremental as native value, it copies original type correctly * merges standalone resources with regular resources: (1) all are DltResources (2) we generate the correct types for __call__! (3) all resources can be configured including inner resources and including default params, previously only standalone could. that unifies behavior for resources and sources re. config injection (4) resources can return another resources if have DltResource in type annotation (5) resources can be renamed with lambda names also sections can be renamed * fixes transformation decorators so they generate correct typing * binds params to resource function instead of using defaults to avoid generating config injection in rest_api * removes remaining full_refresh flags * fixes Makefile commands to run common and local destination tests * fixes xdg home test * fixes venv tests for uv * linter and docstring fixes --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> * allows for initial values that are configurations also in case no native initial values are supported * fixes docs linting * Outer select quotes columns (#2694) * fix normalizer tests * fix a few small tests * remove dependency on ibis for common tests (not supported on python 3.13) * fixes for python 3.9 * fix sqlglot schema propagation and retrieval * fixes leaking sqlalchemy credentials into other test * skip not materialized columns in sqlglot schema generation --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> Co-authored-by: zilto <zilto@github.com> Co-authored-by: Thierry Jean <68975210+zilto@users.noreply.github.com> Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com> Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: djudjuu <djudju@proton.me> Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info> Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>	2025-05-30 17:42:28 +02:00
Anton Burnashev	bdb6904041	docs: split incremental loading page (#2592 ) * Split incremental loading page * Restructure docs references to incremental loading; fix broken links	2025-05-14 10:28:10 +02:00
rudolfix	794bc853d0	2457-refactors iceberg and duckdb cache support (#2430 ) * makes pyiceberg helper more generic, makes clear catalog is ephemeral * filesystem config normalizes bucket url also on partial, saved original version * extracts base cache sql client to create views on any destination * refactors filesystem config to add with local files mixin * bumps pyiceberg to 0.9 * passes file_format via schema so it can be used to recognize file formats in filesystem sql_client * improves how secrets are handles in WithTableScanners * fixes wrong resolve for WithLocalFiles configuration * implements aws credentials from fileio * defines SupportsOpenTables interface and implements it for filesystem * defines exceptions for supports open tables * bumps and simplifies deltalake * fixes nullability warning and skips NOT NULL on duckdb ALTER with a warning * adds FileIO to credentials ops * makes Athena Iceberg location tag configurable * disables duckdb skipping NOT NULL on alter, adds tests * adds open table client tests * adds replace strategy selector, internal x-replace-strategy hint, removes sql_params * excludes certain statements from transactions when running jobs * borrows and returns sqlalchemy connections in destination * better recognition of terminal and not terminal errors in sqlalchemy * bumps to alpha release * fixes dropping of temp tables in sqlalchemy merge job * fixes some tests * adds a public property to get config locations from Provider * shows info on locations for config providers when displaying exceptions, hides warnings when project context is present * detaches sqllite databases before returning connection. sql alchemy does not do that and locks on connection reuse * raises when open table client not available * applies naming convention to sql client with table scanners	2025-04-10 11:05:00 +02:00
David Scharf	d7ee07541a	update docusaurus & other docs dependency updates (#2331 ) * start updating docusaurus * update one vulnerability * disable versions generation * disabled pydoc markdown on netlify removed links to now non existing api-reference disabled checks for broken anchors * re-enable api reference * update sidebar entries * Revert disabling of api reference links * fix a couple of anchors * fix sidebar logos * fix linter in sidebar fixing script * re-enable sources lists * fixes anchors (cherry picked from commit `38f43c60f2`) * add encoding to sidebar script * fix onboarding call link * disable versions even more * fix merged in broken anchor * fix footer link styling --------- Co-authored-by: akelad <akela@dlthub.com>	2025-03-05 12:34:44 +01:00
David Scharf	fb2a31ae61	add workflow for running plus tests (#2270 ) * add workflow for running plus tests * adds a few more tests * exclude mssql tests from non linux platforms * add dlt nightly package to docs tests * add dlt nightly package to docs tests * cleansup workflow and example * fixes example tests and workflows --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-02-10 18:51:04 +01:00
David Scharf	c7c33709e0	Force use of typeddict from typingexentions, pin poetry in tests, simple disable python 3.8 (#2185 ) * import correct typeddict version for use in pydantic, disallow use of usual python typeddict imports * add test * update import for examples * fixed some imports * remove python 3.8 lint and test for now * always use typeddict from typing_extensions pin poetry in tests to 1.8.5	2025-01-09 15:38:46 +01:00
David Scharf	268768f78b	convert add_limit to pipe step based limiting (#2131 ) * convert add_limit to step based limiting * prevent late arriving items to be forwarded from limit add some convenience methods for pipe step management * added a few more tests for limit * add more limit functions from branch * remove rate-limiting * fix limiting bug and update docs * revert back to inserting validator step at the same position if replaced * make time limit tests more lenient for mac os tests * tmp * add test for testing incremental with limit * improve limit tests with parallelized case * add backfill example with sql_database * fix linting * remove extra file * only wrap iterators on demand * move items transform steps into extra file	2024-12-16 21:16:40 +01:00
David Scharf	dfde0718ea	ibis support - hand over credentials to ibis backend for a number of destinations (#2004 ) * add PoC for ibis table support on readabledbapidataset * add PoC for exposing an ibis backend for a destination * install ibis dependency for tests * add support for filesystem * remove print statments * remove ibis tables from dbapirelation * clean up interfaces * move backend creation and skip tests for unsupported backend * fix dependencies and typing * mark import not found, can't be linted on 3.8 and 3.9 * add snowflake and bigquery support * add redshift and maybe fix linter * fix linter * remove unneeded dependency * add in missing pipeline drop * fix snowflake table access test * add mssql support * enable synapse * add clickhouse support * enable motherduck * post rebase lock file update * enable motherduck * add missing ibis framework extras * remove argument of create ibis backend * extract destination client factories into dataset file * fix partial loading example * fix setting of default schema name in destination config * fix default dataset for staging destination * post rebase lockfile update * always set azure transport connection	2024-11-23 22:49:17 +01:00
Alena Astrakhantseva	0c6fd65806	make title shorter (#2032 )	2024-11-06 19:09:21 +01:00
dat-a-man	17847f1d8f	Added partial loading example (#1993 ) * added partial loading example * Updated formatting * Updated * Updated * Updated the logic according to comment	2024-11-06 12:44:17 +01:00
Anton Burnashev	1933c3dfa4	Fix Zendesk example: make test resilient to data changes (#1999 ) * Makes zendesk tests resilient to data changes * Use requests as a module	2024-10-28 20:53:59 +01:00
rudolfix	f290522b8b	unifies run configuration and run context (#1944 ) * allows to pass run_dir via plugin hook + arbitrary args * adds name, data_dir and pipeline deprecation to run_configuration, renames to runtime_configuration * adds before_add, after_remove and improves add_extra when adding to container, tracks reference to container in context * merges run context and provider context, exposes init providers via run context * initializes loggers with run context * does not use config injection when creating default requests Client * removes duplicated code for examples and doc snippets * allows to init requests helper without runtime injection, uses re-entrant locks when injecting context * disables sentry on CI * renames config provider context to container, improves telemetry fixtures in tests	2024-10-15 10:37:55 +02:00
rudolfix	c87e399c7d	adds registries and plugins (#1894 ) * adds sources registry and factory, allows for late config binding and rename, wraps standalone resources * converts rest_api to a standard source * marks secret values with Annotated, allows regular types to be used in configs * reduces the number of modules imported on initial dlt import * removes resource rename via AST in dlt init, provides new templates * replaces hardcoded paths to settings and data with pluggable run context * fixes init command tests * adds plugin system and example plugin tests * uses run context to load secrets / configs * adds run context name to source reference and uses it to resolve * fixes module name and wrong SPEC for single resource sources when registering * adds pluggy * adds methods to get location of entities to run context * fixes toml provider to write toml objects, fixes toml writing to not override old documents and preserve comments * simplifies init command, makes sure it creates files according to run context * fixes dbt test venv, prepares to use uv * adds SPEC for callable resources * fixes wrong SPEC passed to single resource source * allows mock run context to read from env * fixes oauth2 auth dataclass * fixes secrets masking for shorthand auth * adds rest_api auth secret config injections tests, fixes some others * fixes docstrings * allows source references to python modules out of registry * fixes lock	2024-10-09 11:34:31 +02:00
rudolfix	866bce3df3	bumps to 1.0.0 + docs cleanup (#1809 ) * removes blog files * updates schema docs for nested references * updates docs to use nested instead of parent child * adds more migration tests * bumps to 1.0.0 * adds scd2 tests	2024-09-16 14:44:33 +02:00
rudolfix	a6857c9c66	migrates complex data type and nested reference hints (#1792 ) * adds fallback to complex variant column if it exists * adds mogrations for comples data type and preferred dt * renames complex in docs * renames complex * fixes bug with dynamic columns in make_hints * adds v10 schema engine fixture * finalizes comples -> json rename, adds more tests * adds row_key and parent_key, drops foreign_key, adds migrations and updates test schemas * test fixes * deprecates skip_complex_types Pydantic config, updates trace contract	2024-09-10 21:48:12 +02:00
Willi Müller	79c70c91e3	Feat/1749 abort load package and raise exception on terminal errors in jobs (#1781 ) * defaults `raise_on_failed_jobs = True`. Adapts test_dummy_client.py * updates docs on terminal exceptions on failed jobs * undoes change of test assertion, changes test setup instead * removes calls to raise_on_failed_jobs() in docs * Enables setting of raise_on_failed_jobs in airflow_helper, removes fail_task_if_any_job_failed * removes setting of os.environ["LOAD__RAISE_ON_FAILED_JOBS"] = "true" and calls to raise_on_failed_jobs() * Removes redundant calls to raise_on_failed_jobs() in entire test suite. Refactors tests where necessary. * fixes default arg overwriting config value in load of Pipeline * fixes some test cases that started to abort * requests errors set to transient for databrics * fixes even more tests --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2024-09-10 15:34:01 +02:00
dat-a-man	84a8e25579	Refined documentation with minor improvements (#1760 ) * small improvements * Updated lancedb title	2024-09-06 18:53:35 +02:00
novica	36c0d140ba	fix installation command (#1741 )	2024-09-02 09:27:10 +02:00
rudolfix	935dc09efd	Feat/1711 create with not exists dlt tables (#1740 ) * uses normalized column names when linking tables in relational * destination cap if create table if not exits supported * generates IF NOT EXISTS for dlt tables * adds logging for terminal and retry exception in run_managed of load job * passes schema update to be collected in trace in filesystem * fixes job log exception message	2024-08-27 00:20:06 +02:00
dat-a-man	6f7591e2d7	Add custom parent-child relationships example (#1678 )	2024-08-23 16:33:48 +02:00
VioletM	a0e2996c48	Restructure credentials docs (#1508 ) Co-authored-by: akelad <akela@dlthub.com>	2024-08-08 10:50:22 +02:00
David Scharf	7676e4cec6	prevent accidental wrapping of sources in resources when using adapters (#1645 ) * prevent accidental wrapping of sources in resources when using adapters * fix typo * fix qdrant zendesk example * another fix to the qdrant example * rename utils function add support for source with single resource add tests * add logger warning when setting default name for resource * only use selected resources in get_resource_for_adapter * switch to value error	2024-07-30 17:50:37 +02:00
rudolfix	9823773e70	Feat/1596 adds custom docs config provider (#1642 ) * initial decoupling of config generation from toml writer * keeps pure Python object in docs config provider, adds yaml and json support to vault providers, refactors set_value in formet TomlBaseProvider * adds a method to register config providers to config accessor * adds example for yaml loader custom config provider * implements config provider with user supplied loader function * typos and small fixes * adds reference to example in docs * slightly improve docs * update one snippet --------- Co-authored-by: dave <shrps@posteo.net>	2024-07-29 16:27:30 +02:00
Steinthor Palsson	48c93f5864	Fix/qdrant tests in CI (#1526 ) * Run qdrant server in local tests * Add qdrant to test destination configs * Fix stringify UUID objects * Install qdrant deps * Fix qdrant image version * Disable httpx logging in tests * Add index and use order by for fetching state * Try qdrant local support * Fix qdrant load stored state * Disable parallel load in qdrant local * Test destination config for qdrant local and server * Fixes * qdrant example test * Missing module * Cleanup * resolves configuration to get full capabilities in load * uses embedded qdrant for zendesk example --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2024-07-04 19:45:57 +02:00
rudolfix	6aedfdd379	removes deprecated credentials argument from Pipeline (#1537 ) * removes deprected credentials argument from Pipeline * fixes dependency in tests * fixes explicit creds tests dependencies	2024-07-04 15:52:48 +02:00
rudolfix	af7752725b	Feat/simplifies naming convention writing (#1523 ) * adds naming convention example * improves naming convention docs * simplifies naming convention classes and configurations, implements sql cs, adds tests * bumps to version 0.5.1a0 * linter fixes * format fixes	2024-06-27 18:55:01 +02:00
rudolfix	b76f8f4130	allows naming conventions to be changed (#998 ) * allows to decorate async function with dlt.source * adds pytest-async and updates pytest to 7.x * fixes forked teardown issue 7.x * bumps deps for py 3.12 * adds py 12 common tests * fixes typings after deps bump * bumps airflow, yanks duckdb to 0.9.2 * fixes tests * fixes pandas version * adds 3.12 duckdb dep * adds right hand pipe operator * fixes docker ci build * adds docs on async sources and resources * normalizes default hints and preferred types in schema * defines pipeline state table in utils, column normalization in simple regex * normalizes all identifiers used by relational normalizer, fixes other modules * fixes sql job client to use normalized identifiers in queries * runs state sync tests for lower and upper case naming conventions * fixes weaviate to use normalized identifiers in queries * partially fixes qdrant incorrect state and version retrieval queries * initial sql uppercase naming convention * adds native df readers to databricks and bigquery * adds casing identifier capability to support different casing in naming conventions, fixes how identifiers are normalized in destinations * cleans typing for relational normalizer * renames escape functions * destination capabilities for case fold and case sensitivity * drops supports naming module and allows naming to be instance in config and schema * checks all tables in information schema in one go, observes case folding and sensitivity in sql destinations * moves schema verification to destination utils * adds method to remove processing hints from schema, helper functions for schema settings, refactor, tests * accepts naming convention instances when resolving configs * fixes the cloning of schema in decorator, removes processing hints * removes processing hints when saving imported schema * adds docs on naming conventions, removes technical docs * adds casing info to databrick caps, makes caps an instance attr * adjusts 
destination casing in caps from schema naming and config * raises detailed schema identifier clash exceptions * adds is_case_sensitive and name to NamingConvention * adds sanity check if _dlt prefix is preserved * finds genric types in non generic classes deriving from generic * uses casefold INSERT VALUES job column names * adds a method make_qualified_table_name_path that calculates components of fully qualified table name and uses it to query INFO SCHEMA * adds casing info to destinations, caps as instance attrs, custom table name paths * adds naming convention to restore state tests, make them essential * fixes table builder tests * removes processing hints when exporting schema to import folder, warns on schema import overriding local schema, warns on processing hints present * allows to subclass INFO SCHEMA query generation and uses specialized big query override * uses correct schema escaping function in sql jobs * passes pipeline state to package state via extract * fixes optional normalizers module * excludes version_hash from pipeline state SELECT * passes pipeline state to package state pt.2 * re-enables sentry tests * bumps qdrant client, makes test running for local version * makes weaviate running * uses schemata to find databases on athena * uses api get_table for hidden dataset on bigquery to reflect schemas, support case insensitive datasets * adds naming conventions to two restore state tests * fixes escape identifiers to column escape * fix conflicts in docs * adjusts capabilities in capabilities() method, uses config and naming optionally * allows to add props to classes without vectorizer in weaviate * moves caps function into factories, cleansup adapters and custom destination * sentry_dsn * adds basic destination reference tests * fixes table builder tests * fix deps and docs * fixes more tests * case sensitivity docs stubs * fixes drop_pipeline fixture * improves partial config generation for capabilities * adds snowflake csv support * 
creates separate csv tests * allows to import files into extract storage, adds import file writer and spec * handles ImportFileMeta in extractor * adds import file item normalizer and router to normalize * supports csv format config for snowflake * removes realpath wherever possible and adds fast make_full_path to FileStorage * adds additional methods to load_package storage to make listings faster * adds file_format to dlt.resource, uses preferred file format for dlt state table * docs for importing files, file_format * code improvements and tests * docs hard links note * moves loader parallelism test to pipeliens, solves duckdb ci test error issue * fixes tests * moves drop_pipeline fixture level up * drops default naming convention from caps so naming in saved schema persists, allows (section, <schema_name>, schema) config section for schema settings * unifies all representations of pipeline state * tries to decompress text file first in fs_client * tests get stored state in test_job_client * removes credentials from dlt.attach, addes destination and staging factories * cleans up env variables and pipeline dropping fixutere precedence * removes dev_mode from dlt.attach * adds missing arguments to filesystem factory * fixes tests * updates destination and naming convention docs * removes is_case_sensitive from naming convention initializer * simplifies with_file_import mark * adds case sensitivity tests * uses dev_mode everywhere * improves csv docs * fixes encodings in fsspec * improves naming convention docs * fixes tests and renames clash to collision * fixes getting original bases from instance	2024-06-26 23:08:09 +02:00
Marcel Coetzee	6b83ceec9d	Add LanceDB custom destination example code (#1323 ) * Add LanceDB custom destination example code Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Format Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Remove Postgres credentials from example.secrets.toml Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Format Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Add typing Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Refactor code documentation and add type ignore comments Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Ignore checks Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * wrap in main if statement Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Add lancedb to install dependencies in test_doc_snippets workflow Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * poetry Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Update deps Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Update LanceDB version and replace Sentence-Transformers with OpenAIEmbeddings Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Poetry lock Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Format Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Update versions Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Replace OpenAI with Cohere in LanceDB custom destination example Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Format Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Add error handling to custom destination lanceDB example Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Lift config to secrets/config Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Ignore example lancedb local dir Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Why was this uncommented Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Remove unnecessary lock Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Cleanup Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Remove 
print statements from custom_destination_lancedb.py Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Print info Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Print info Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Use rest_client Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * noqa Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * Remove `cohere` dependency and add `embeddings` extra to `lancedb` Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> * changing secrets path for cohere to pass docs tests * fixes lock file * moves get lancedb path to run within the test * fix dependencies * fix linting * fix lancedb deps * update lock file * change source name * moved client_id to secrets * switch lancedb example to openai and small fixes * small fixes * add openai to docs deps * fix grammar gpt typing --------- Signed-off-by: Marcel Coetzee <marcel@mooncoon.com> Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> Co-authored-by: rahuljo <rahuljoshi8227@gmail.com> Co-authored-by: Dave <shrps@posteo.net> Co-authored-by: Alena <alena@dlthub.com>	2024-06-24 15:38:35 +02:00
Steinthor Palsson	d4340d830c	Fix databricks pandas error (#1443 ) * update dependencies for databricks/dbt * use kwargs if args not defined, fix typing * Revert to use inline params to keep support for 13.x cluster * Typing fix * adds dbt support for mssql * converts dbt deps from extra to group, allows databricks client >2.9.3 * fixes dict to env util * limits dbt version to <1.8 in destination tests * skips chess dbt package for mssql --------- Co-authored-by: Oon Tong Tan <oony_oontong@hotmail.com> Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2024-06-11 16:30:58 +02:00
David Scharf	a9021fe8de	Fix streamlit bug on chess example (#1425 ) * fix error on missing nullable hint * remove unneeded function (and unrelated formatting :) )	2024-06-11 15:35:06 +02:00


129 Commits