repo-mirrors/dlt

mirror of https://github.com/dlt-hub/dlt.git synced 2025-12-17 19:31:30 +00:00

Author	SHA1	Message	Date
rudolfix	06bc05848b	(chore) adds hub extra (#3428) * adds hub extra * makes hub module more user friendly when hub not installed * test and lint fixes * adds plugin version check util function * adds dlt-runtime to hub extra, minimal import tests * bumps to dlthub 0.20.0 alpha * lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default * adds configured propfiles method on context so only profiles with configs or pipelines are listed * adds list of locations that contained actual configs to provider interface * improves workspace and profile commands * test fixes * fixes tests	2025-12-05 16:15:19 +01:00
David Scharf	4a5ffd82b3	Chore: Update docs npm dependencies and clean up docs build tooling (#3247 ) * bump npm deps * remove unneeded netlify redirects file * remove unneeded lockfile * remove another unneeded lockfile * post rebase lockfile update * remove old netlify command * create new docs tools project and move api docs gen there * tmp * add uv to build docs workflow * move docs pyproject * re-org docs pcakage and move snippet linter * move notebook linting commands and deps to tools folder add flake8 to tools linting * remove unneeded files * fix linting and formatting errors * remove wrong file * move docs processing script to new package * fix gen api ref * clean up package json and use commands from parent makefile * update build website workflow * move linting to docs makefile partially * fix python version for docs project * consolidate docs commands in docs makefile * fix docs linter * fully update docs test flow * fixes some linting and dependency problems * fix constants * move notebook formatting to docs project * fix lint embedded snippets * fix examples tests * add missing dependencies * fix snippet linting * add missing lint dependencies to core and missing test dependencies to docs * add missing weaviate * add missing regex module * add forked dependency and updates readme file * revert accidental change to example * fix main linter * * Move relevant pytest options to subproject * Remove shims / path inserts that are now managed by pytest options * Some typing fixes * Clean up base project pytest ini * Enable transformation snippets tests * remove unneeded raw import of intro snippets * downgrade alive progress * uses dlt logger which also fixes internal alive error * enables transformation snippets linting * fixes dashboard races again --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-11-16 18:01:30 +01:00
Menna	4d25a6c5b5	feat/3198-add-workspace-info-and-profile-selection Added a dropdown for profile selection in the dashboard interface and updated the layout to display profile and workspace information inline with pipeline selection.	2025-11-14 18:44:45 +01:00
rudolfix	192296f4f8	fixes git import and enables tests (#3262) * enable hub tests * removes erroneous git import * enables tests with importing dlt into minimal alpine container * imports workspace modules on demand * bumps dlt to version 1.18.1 * fixes mssql hub test on mac * review fixes	2025-10-29 21:32:07 +01:00
Violetta Mishechkina	38b0dec5a1	Add dlthub intro docs (#3241) * Add dlthub intro * Update with comments	2025-10-27 16:23:37 +01:00
rudolfix	91dc3d955f	avoids passing naming conventions as modules (#3229) * adds /home/rudolfix/src/dlt to sys.path when running dlt commands and a cli flag to disable it * adds cli docs check to lint * avoids passing custom naming as modules in docs * removes cli docs check due to Python 3.9 * fixes deploy cli * adds pokemon table count consts * improves custom naming convention docs	2025-10-23 13:45:06 +02:00
rudolfix	fe567414dc	chore/moves cli to `_workspace` module (#3215) * adds selective required context, checks profile support in switch_profile * creates and tests hub module * adds plugin version to telemetry * renames imports in docs * renames ci workflows * fixes lint * tests deploy command on duckdb * moves cli module to workspace * moves cli tests to workspace module * renames fixtures, rewrites fixture to patch run context to _storage * allows to patch global dir in workspace context * when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set * imports git utils only when need to clone package in dbt runner * runs workspace tests as part of common * fixes tests, config tests sideeffects * moves dashboards to workspace * fixes pipeline trace test * moves dashboard helper tests * excludes additional secret files and pinned profile from gitignore * cleansup hatchling files in pyproject * fixes dashboard running tests in ci * moves git module to libs * diff fix * fixes fixture names	2025-10-19 15:21:42 +02:00
Thierry Jean	6f1b0b979a	feat: unify `dlt.Relation` API and create bound Ibis tables (#3179) * port code from PR#2498 * added .to_ibis() * fix override; linting; fix transpiling * narrowed var type * add docstrings to tests * remove pytest marks * wrap _safe_raw_sql() to handle closing connections * lint / format * 3.9 typing; use public dlt interface * revert back to simple yield without context manager * formatting * revert to return tuples * try to skip <3.9 * hack to skip test_ibis.py on <3.10 * closes ibis conn in ibis tests * bumps tokenizers library in lockfile * fixes to lint on windows * closes ibis conn properly --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-10-18 09:44:05 +02:00
rudolfix	bc2706b63a	renames `dlt_plus` plugin to `dlthub` (#3192) * adds selective required context, checks profile support in switch_profile * creates and tests hub module * adds plugin version to telemetry * renames imports in docs * renames ci workflows * fixes lint	2025-10-14 11:47:27 +02:00
rudolfix	b062dcafa4	docs/removes dlt plus docs and adds eula (#3079) * answers defaults in cli if tty disconnected * adds method to send anon tracker event even if disabled * fixes types in source/resource build in generator * adds dlt.hub with transformation decorator * moves dlt-plus to separate sidebar in docs, renames to dltHub Features, adds EULA * renamed plus to hub in docs * fixes docs logos * removes more dlt+ * renames plus tests * fixes ci run main * fixes hub workflows	2025-09-21 00:15:08 +02:00
David Scharf	d143c29e35	Improve pipeline dashboard test coverage (#3091) * disable most tests * try correct windows command for runnig marimo e2e tests * try without timeout * test only launch marimo * bump python version * try install playwright deps * fix e2e tests for dashboard on windows * enable e2e tests for dashboard * test macos 14 for dashboard e2e tests * add basic tests for ui elements * improve ui elements tests * revert changes to main github workflow * review fixes --------- Co-authored-by: Your Name <you@example.com>	2025-09-17 19:58:18 +02:00
David Scharf	431c6b6f48	add -s flag to read command in publish-library command in Makefile (#3089)	2025-09-16 13:30:03 +02:00
David Scharf	5d29c0ded0	Dashboard updates and fixes (#3055 ) * fix bug in child tables data browsing * fixes streamlit launch, prevents streamlit launch after marimo launch * disables trace json serialization * removes streamlit hot reload cli flag * fix smaller bugs and start adding parametrized tests to pipeline utils * update cli docs * parametrize utils tests with different pipeline types and states * start fixing e2e tests * change filesystem bucket url * move example pipelines into separate folder * extracts more helpers into utils improves error handling and messaging * add more tests and move sql query under utils exception wrapper * final fixes to e2e test and add no destination pipeline to unit tests * render mo tables in unit tests for applicable helper functions use mo.json object view for state in all cases instead of yaml * allow map_nested_in_place to also process keys use this in trace sanitizing use repr to keep nested hint keys and show a good string representation add test case that makes sure traces of nested hints can be rendered * update e2e tests to respect new json view of state * remove cloning of dict from map_nested_in_place * remove streamlit mentions and add marimo references in appropriate places * update dashboard page and insert some images * separate mapping function for nested keys and values * update dashboard utils to new mapping function * post merge fixes * add dlt+ fix for backwards compatibility --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-09-09 16:01:02 +02:00
anuunchin	096d769828	Docs: Education notebooks formatted and linted (#3017) * Formated and linted ed content * Notebook filenames lowercased, no special chars	2025-09-02 08:41:47 +02:00
Thierry Jean	0d90a83b8d	repo: add `ruff check` for linting (#2967) * Config ruff `check` * Add `ruff` to existing `flake8` linting for transition period	2025-08-29 11:13:26 -04:00
David Scharf	b75e4aa721	Dashboard Improvements (#2965 ) * remove uneeded file * fix forwarding of pipelines dir to marimo app * disable state sync and display all schemas and remote state and schemas in pipeline overview * add support for multiple schemas * fix e2e tests, further updates pending * use dropdown instead of multiselect for schema selection add multi schema pipeline to fixtures * add last run info in pipeline overview add buttons to open pipeline folder and local data folder if present * fix loads browser to select correct schema * allow to start dashboard for a pipeline that is not there yet and add helpful error message in this case * nicer last run time formatting show pipeline error screen also when manually chnaing the pipeline name in the url * move buttons to top, add refresh buttons to sections * use raw query when constructing queries * lazy load remote state tab * fix traces and trace typing (mostly) * add exception traces to ui * add file watcher * remove test code * add source and resource state viewer to data panel * update existing unit tests * add unit test for new utils * make marimo dashboard the default app for pipeline show * update docs * update existing e2e tests for new yaml based rendering of state * move streamlit app down in sidebar * grammar fixes for dashboard strings * open duckdb in readme mode in datapanel in dashboard * remove old tests re-enable dashboard main command * add missing args to dashboard command * small fixes to e2e tests * add tests for exceptions * re-organize e2e tests into invidual tests * add basic schema selection checks * improve dashboard help and dashboard docs page * short some strings in testing to make selecting predictable * merge devel * typo --------- Co-authored-by: djudjuu <djudju@proton.me>	2025-08-15 16:56:52 +02:00
David Scharf	ef92ffcd77	Refactor transformations (#2970) * remove transformation code and tests that now live in dlt_plus * move lineage code and tests into dataset folder scope * start fixing model item format tests * revert model item format tests back to version before last big change (with some updates) * disable transformations snippets linting and testing for now * remove uneeded test	2025-08-06 15:28:29 +02:00
rudolfix	edad825a59	2946 sqlalchemy destination fixes (#2951 ) * fixes sqlalchemy destination to work with mssql * do not generate ; in merge jobs * fixes engine version type * demonstrates plugging TypeMapper into sqlalchemy destination * excludes temp files from snippet linting * adds precision to _dlt_load_id and _dlt_id columns * adds json field support for mssql * fallback for alembic migrations when dialect not supported ie trino * normalizes use of ; to separate queries * adds type mappers for mysql, mssql and trino * fixes type mapper import * updates destination caps from explicit destination params at the end to overwrite adjustments * normalizes ; usage, forward trackebacks when handling database exception * fixes sqlalchemy merge eq condition and tests * fixes clickhouse temporary table engine * synth unpickle synthesizes on any error * fixes duckdb with table scanners accessing self.execute... in open_connection * fixes synpase json column fallback for index * moves adding _dlt_load_id to arrow table after it is merged and normalized * fixes more tests * moves _dlt_load_id add in arrow extractor after normalization, before table merrging * tests run context plug passthrough * fixes BIGQUERY numeric creation * fixes databricks PRIMARY KEY injection in tests	2025-08-04 16:59:45 +02:00
David Scharf	21b68e61f1	Add workspace extra and rename marimo app to "pipeline dashboard" (#2876) * adds dlt workspace extra, updates exception and github workflows * renames app from "marimo app" to "pipeline dashboard" updates --marimo flag to --dashboard * rename studio folders to dashboard * removes all other references to studio * exclude lockfile and markdown files from lfs * update workspace extra dependency versions * bump version	2025-07-14 21:26:50 +02:00
rudolfix	c0d41d97da	cleans dist folder before publish (#2811)	2025-06-25 16:25:03 +02:00
David Scharf	3ba504c65d	marimo app updates (#2778) * make dlt app ejectable * update app file url in makefile and tests add missing stylesheet to package * start marimo app in process * convert caching toggle to button for clearer use * exlcude incomplete columns * adds a bunch of tests for marimo app utils * make normalized query output pretty and disable tests on 3.9 * filter out incomplete tables * update cli strings and small changes to app ejection	2025-06-25 13:49:56 +02:00
David Scharf	9ff0dd254a	add hidden input for pypi token (#2804)	2025-06-24 12:12:21 +02:00
David Scharf	5245a42536	run all common tests with --resolution lowest-direct on uv sync (#2787) * run all common tests with resolution-lowest on sync * make model item normalizer tests pass, disable on time test for now * fix duckdb instantiation for old versions bump pyarrow to have version that supports "append_column" on recordbatch exclude deltalake tests for too low pyarrow versions * fixes errors in makefile bump minimum pytest version to what was in lockfile * bump pendulum min requirement * fix common test file * bump ibis dependency * go back to old version of pendulum bump to prerelease	2025-06-23 21:30:58 +02:00
rudolfix	af94e584ac	(feat) allows to add SQL statements to schema migration executed after tables were created/altered (#2791)	2025-06-22 13:39:36 +03:00
David Scharf	3ebbfa1f9e	migrate to uv (#2766 ) * move pyproject.toml and makefile from old branch and add inbetween changes * update workflow files to use uv * run new version of formatter * fix building of images with uv * possibly fix docs linting * downgrade lancedb dependency to fix tests * fix gcs compat mode for s3 for newest boto * fix docstrings in examples * add some uv constraints * update readme.md and contributing.md and some other places * allow duckdb 0.8 in range * add link-mode copy to uv venv on windows * remove poetry lockfile and unneeded lockfile checker * fix chess api related failures * sleep after dremio start.. * set correct package in pyproject * Revert "add some uv constraints" This reverts commit `d611e9ecce`. # Conflicts: # pyproject.toml # uv.lock * add missing databricks sql connector version bounds	2025-06-19 10:11:24 +02:00
rudolfix	f821d21165	fixes leaking datasets tests (#2730 ) * adds optional pipeline activation history to context * allows to configure configs and pragmas for duckdb, improves sql_client, tests * allows query string for motherduck, tests WIP * mocks local_dir correctly to place local files, drop duckdb in pipeline fixture in most places * enables activation factory to drop datasets from all pipelines * uses correct fixture scope in test read interfaces * bumps duckdb and pyarrow * ignores some flake8 errors * logs resolved traces thread-wise, clears log between pipeline runs * improves duckdb tests and docs * bumps arrow to v20 because duckdb 1.3 needs at least 19 for its types * fixes tests - mostly duckdb database locations * fixes lockfile * fixes edge cases when passing setting to duckdb connection * disables iceberg abfss tests * refactors WithLocalFiles so they can be used independent from destination * more local dir test fixes * moves WithLocalFiles to common storages configuration * tests edge cases when setting configs on duckdb fails * updates docs * reverts duckdb to 1.2.1 - last stable version * more test fixes * moves create_secret to duckdb sqlclient * disables building of Dockerfile until we upgrade arrow * skip gcs compat test for local clickhouse tests --------- Co-authored-by: dave <shrps@posteo.net>	2025-06-11 22:17:05 +02:00
David Scharf	36ee706122	Update github workflow setup (#2728 ) * use both pull request and pull request target on destination workflows * remove additional triggers * marks one test as smoke test and only runs this for the time being * only run one test in common, needs to be reverted later * run common tests only on linter success * fix common workflow * only start workflows on call (do not call them yet) * test master workflow * remove docs changes step from lint * remove local destinations docs change * rename master trigger workflows * change concurrency key * try other dependencies * add destination tests with authorize step * remove authorize and docs step from destination tests * fix destination test * rename main workflow * test inherit secrets * add more workflows to main file * fix starting conditions for some workflows * rename plus tests matrix job * remove concurrency settings for now * add first remote destinations workflow version * move some more remote destinations * remove pytest args * try to fix extras string * add more remote destination tests * rename some workflows and add concurrency settings to main workflow * move test_destinations * fix link to called workflow * add better main workflow labels move clickhouse remote tests * create local destinations test * disabled some workflows * disable clickhouse oss for now split duckdb and postgres local tests into own matrix job * copy ssh agent key * move all local destination secrets into template secrets file * small fixes * enable all tests again * fix local tests * add missing openai dep * try to fix qdrant creds * fix qdrant server / local file differentiation * fix cli test * change workflow dependencies * remove telemetry info and other small changes * run dummy destination with the local tests * remove duckdb from remote tests, always run all mssql and postgres tests * enable clickhouse oss * fix condition for always running all tests * move cli commands to postgres tests * rename 
clickhouse-compose to be inline with other services * fix clickhouse local credentials and disable tests which require staging destinations * adapt postgres to postgres example to new fixture * fix clickhouse excluded configs * update essential test handling * skip gcs compat test for local clickhouse tests	2025-06-11 15:09:06 +02:00
David Scharf	9b392e9cab	remove marimo from dev deps (#2723)	2025-06-06 13:18:38 +02:00
rudolfix	1dc29d7f01	adds parquet support to postgres via adbc (#2685) * adds parquet support to postgres via adbc * use selector to compute list of file formats in caps * adds docs on failing data types * adds direct test for all data types * fixes test	2025-06-06 00:02:40 +02:00
rudolfix	b472ab7168	[transformations] decouples sqlglot lineage and schema generation from destination identifiers (#2705 ) * uppercase env var * fix linting and marimo e2e tests * enables only x-annotation propagation, fixes lineage test to include clickhouse, sqlalchemy and clickhouse * computes sqlglot schema and lineage solely on dlt schema identifiers, disables any normalization and table name expansion * computes ibis unbound table solely on dlt schema identifiers, disables any normalization and table name expansion * makes ibis relation to work on dlt schema identifiers * decouples query generation from query normalization in base relation. query normalization will expand table names, qualify tables, case fold and quote * adds capability to check if nulls are enforced on alter * adds option to get table path without casefolding * rewrites how identifiers are normalized in sqlalchemy * makes test_read_interfaces work with all destinations without escaping, WIP * fixes how credentials are emitted by destination_config * fixes linting issues for marimo * revert name / type scoped destination configs * fix pii annotations hint * quote table names in row counts (will not work with table names with white spaces otherwise) * format * fix marimo app linting errors * normalize database name in sqlglot schema fix anoynmous column detection in lineage * disables one lineage test.. 
* fix dataset mismatch in query resolution the correct way * remove qualified table names from some selectors * fix a couple more tests * make normalizing of query for pure sql relations optional use normalized query in transformations * fix default of normalizing query cache sqlglot schema on dataset * move query normalization into utils, cache result and do not modify original qualified query * directly access normalized_query from relation * disable sqlglot schema cache on dataset * fix filesystem tests and disallow access of non-existent table * fix unrelated breakage in lancedb example * update tests that were using tables not in schema on datasets * fix snowflake tests, re-enable two disabled tests * fix last snowflake test --------- Co-authored-by: djudjuu <julius@dlthub.com> Co-authored-by: David Scharf <shrps@posteo.net>	2025-06-04 20:29:30 +02:00
David Scharf	5ceba48757	dlt.transformation implementation (#2528 ) * triggers devel tests * fixed malformed docstring * use native sqlglot type annotation * pass hints via SQLGlot metadata * fix linter errors and tests * fix a few more tests and edge cases * fix bug in lineage * enable columns schema for both ReadableRelation Types * add more tests and make lineage tests independent from loading * add lineage tests for all sql destinations * enable tests on ci and disable column schema for sqlalchemy for now * fix some more tests * add sqlalchemy hack * first fix for snowflake and some smaller chnages and clarifications * fix sqlglot schema creation, makes clickhouse work * re-add transformations tests folder * fix lineage datatype * disable databricks and synapse ibis backend tests * move transformation code from prototype excluding old lineage and including updates so that linter passes, no real code changes yet. * fix some of the python extractor based transformations * fix most tests * make basic transformation tests run on all destinations * enable all current transformation tests for all destinations run some duckdb transformations on all OSes * a little bit of cleanup * move common transactions and mark all destination transaction tests as essential for now * Add improvements from review in prototype PR and some cleanup * exclude dremio * fix some transformations tests * fix row_counts for snowflake and add some comments * converts SupportsReadableRelation to an ABC * add scalar access to SupportsReadableRelation * simplify transformation signature * add top level dlt objects and some small changes * second part of removing transformation extra args * add clickhouse tests * add config based transformation source * add better transformation examples * use fruitshop template for testing * remove custom row_counts method in favor of "global" test one * first draft of transformations doc * some work on the docs page * feat: 2540 lineage `allow_unknown_columns` 
and `allow_anonymous_columns` (#2577) * test compute_columns_schema() and exception handling * convert transformation code examples to snippets * finish first round of transformation docs * Quite a few PR fixes * fixes some tests * add support and docs for dataframe and arrow operations * add config and fallback if destination not reachable * fix scalar method fallback to models if pipeline destination is not available * hopefully fix one test * Docs: addition of normalizer behaviour to transformations docs (#2639) * Normalizer info added * Unnecessary paragraph removed, regular normalization linked * feat: 2540 - SQLGlot type mapping (#2587) * fixes some tests * post rebase cleanup * renamed kwarg * type handling done; WIP * sqlglot-dlt type mapping completed * added docstrings to tests * removed unused test file * attach metadata to DataType * refactored test to parameterized form * refactor function names * bug fix .to_py() * rename compute_columns_schema() kwargs * refactor type conversion branches * fixes some tests * add support and docs for dataframe and arrow operations * add config and fallback if destination not reachable * fix scalar method fallback to models if pipeline destination is not available * fix: update return type in athena_adapter docstring to reflect correct destination (#2599) * list secrets in vault config provider to avoid calls to backend (#2597) * fixes bug where configuration section was not propagated when embedded configuration is resolved * splits vault provider settings per vault type * adds option to list secrets to vault and google secrets provider * uses google secrets provider with global cache for tests * documents vault provider * test and docs fixes * slightly clarify clickhouse docs (#2594) * slightly clarify clickhouse docs * Update clickhouse.md * Extract dataset code snippets into tests snippets system (#2598) * extracts dataset code blocks into tested snippets and uses fruitshop pipeline as base dataset for 
demonstration purposes * add ibis group * Enabling 'model' loader_file_format for athena, synapse and dremio (#2556) * Athena model loader format initial support * test_verify_capabilities_data_types adjusted for athena * Synapse enabled * The offset logic for tsql made unreachable * Athena test config without iceberg removed, dremio added * Unnecessary synapse workaround removed * fix some typos in cursor-restapi docs (#2608) * fix some typos in cursor-restapi docs * fix typo * refactor init-command for use in dlt project (#2568) * refactor init-command for use in dlt project * remove config.toml from project docs * fix ibis mypy error --------- Co-authored-by: dave <shrps@posteo.net> * docs: Fix incorrect nesting in secrets.toml (#2614) * fixes parquet data writer settings docs & rewrites configuration docs (#2583) * fixes parquet data writer settings docs * adds section to dlt resource decorator * fixes and tests how config sections are created when single resource is extracted * fixes config sections for parallel doc example * exports postgres adapter * rewrites configuration docs, moves a few docs sections in sidebar * snippet fixes * accepts docs changes from review Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> * adds tip how to eject core source * linter fixes --------- Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> * enables fsspec per-thread instance cache and updates documentation (#2621) * bumps pendulum and docs (#2624) * fixes sql database docstrings and docs * bumps poetry to 3.0.1 and drop dlt poetry * Added dedup sort example (#2235) * Added dedup sort example * Updated formatting * Updated * Updated * Update docs/website/docs/general-usage/incremental-loading.md --------- Co-authored-by: Alena Astrakhantseva <alena@dlthub.com> Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> * Docs: add advanced project tutorial (#2338) * hopefully fix one test * trigger ci * improve tests, lint --------- Co-authored-by: David 
Scharf <shrps@posteo.net> Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> Co-authored-by: rudolfix <rudolfix@rudolfix.org> Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: djudjuu <djudju@proton.me> Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info> Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Co-authored-by: Alena Astrakhantseva <alena@dlthub.com> * qualify all queries that come into the transformations * fix lineage for snowflake and clickhouse lineage * apply schema fix for sqlglot and remove special treatment of snowflake * align datasets interfaces with ibis implementation ["col"] selects column and not table with one column * disable incremental on transformations decorator and warn if incremental args are discovered * fixes one more test * fixes snowflake tests after sqlglot schema fix * removes standalone resources, fixes transformation function wrapping (#2684) * changes contrib and README (#2666) * changes contrib and README * Apply suggestions from code review Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> * raises if resolving dataclass without configspec * adds function type inspect that follows wrappers * removes make fun, uses wraps * adds conftest to transformations * (1) fixes tranformation overloads (2) passes TransformationConfiguration as base spec so buffer is always injected (3) wraps tranformation_function (4) makes str SQL a model (5) tests configurations and parametrized transformations * (1) removes resources returning resources (2) allows resources to be also functions (3) allows base spec to be passed to resource function (4) makes DltResource and SourceFactory to wrap decorated function and fixes signatures (5) allows inner resources to be 
injectable, warns for transformers (6) normalizes and tests how functions are wrapped and unwrapped so signatures and configs are available * normalizes config resolve behavior: default values can be overriden from providers but explicit cannot. if those were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults). also if native value is found for a config and it does not accept native values, config resolution will fail, previously it was ignored * do not use config specs cached in module when creating autospecs * fixes venv tests when uv is present * if incremental parses from another incremental as native value, it copies origina type correctly * merges standalone resources with regular resources: (1) all are DltResources (2) we generate the correct types for __call__! (3) all resources can be configured including inner resources and including default params, previously only standalone could. that unifies behavior for resources and sources re. 
config injection (4) resources can return another resources if have DltResource in type annotation (5) resources can be renamed with lambda names also sections can be renamed * fixes transformation decorators so they generate correct typing * binds params to resource function instead of using defaults to avoid generating config injection in rest_api * removes remaining full_refresh flags * fixes Makefile commands to run common and local destination tests * fixes xdg home test * fixes venv tests for uv * linter and docsstring fixes --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> * allows for initial values that are configurations also in case no native initial values are supported * fixes docs linting * Outer select quotes columns (#2694) * fix normalizer tests * fix a few small tests * remove dependency on ibis for common tests (not supported on python 3.13) * fixes for python 3.9 * fix sqlglot schema propagation and retrieval * fixes leaking sqlalchemy credentials into other test * skip not materialized columns in sqlglot schema generation --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org> Co-authored-by: zilto <zilto@github.com> Co-authored-by: Thierry Jean <68975210+zilto@users.noreply.github.com> Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com> Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: djudjuu <djudju@proton.me> Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info> Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com> Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>	2025-05-30 17:42:28 +02:00
David Scharf	7eb4570f8e	dlt marimo app pre-release version (#2662 ) * start marimo app * some more work * a few small additional changes * move marimo to dlt helpers and some small changes * a bunch of improvements * ui improvments and start fixing types * clean up imports and make app more typesafe * nicer tables * start data page with row counts * first version of query explorer * make db browser nicer and dataset faster * add pipeline quickstart links add query cache and fast query execution * add studio extra * add first very simple test * add studio command * add more first tests * fix dropdown * rename helpers to utils fix linter * incomplete work on e2e tests * tmp * move e2e tests * add tests to common file * fallback when getting pipelines * add poetry context to marimo start command * fix folder * add basic page checking for all e2e test pipelines * small change * add python caching (marimo caching does not work properly) and make dlt_pipeline a top level object * start adding load info tab * add ibis to e2e dependencies * add loads page and data browser query history * update basic e2e tests * basic grammar fixes * start adding trace view * clean up imports * start reworking tabs / switches * finish conversion into grid friendly version * fix types * clean up strings and cell names * a bit of styling * make schema page one cell * some style updates * changes to schema browser * stg * some text improvements * fix unit tests * fixes tests * fix load id based row counting * small css improvements * add more info to trace section * fix tests and small changes to trace page * small string change * fix warnings in edit mode * extract all strings * fix strings * comments and some formatting * remove incorrect info * add config and make tests work again * us string refs in e2e tests * update test file * add better timestamp rendering for loads and update tests * fix rest api tests * disable marimo tests on python 3.13 * use marimo state for some caching * 
slightly re-organize utils * add generated version of utils tests * exclude python 3.9 for marimo e2e tests * run e2e tests headless * disable marimo e2e tests on windows * remove marimo extra and create dependency groups for marimo and streamlit * add marimo dependencies to linter (cherry picked from commit e4235a981ee2d79d1e51cb7728b551acad562e3b) * streamlit should be present for linting * re-enable relevant fixtures for e2e tests remove unused imports * move marimo tests first for debugging purposes * print html from test to see what is going on * another test * do not set duckdb credentials and move marimo tests back to end * fix marimo app dependencies	2025-05-30 17:12:58 +02:00
David Scharf	a3534b392d	Simplify pipeline test utils (#2566 ) * remove some duplicate test utils * use dataset to get table counts * add exception for sftp but use dataset otherwise for loading table counts and contents * update checking of empty tables in filesystem tests * support filesystemsqlclient for tables that have prefixes rather than folders * fix table location resolution for internal tables * make sftp check raise same errors as filesystemsqlclient * more cleanup * fix replace disposition tests * simplify table count code in many places * small cleanup * fix tables to dicts function * disable databricks and synapse ibis backend tests (cherry picked from commit `aba8de4706`) * simplify table assertions * add tests for tests :) * fix two tests * fix dbt tests * makes open table locations to work in windows fs * review comments * adds docstrings plus linting to pipeline utils * fix docstring linting on utils class * bump adlfs in lockfile * test loading abfss first * test getting tables one by one for azure * fix resolving of sql_client * change folder detection * add comment for abfss fix fix iceberg * move abfss fallback into utils method * normalizes trailing separator in paths in filesystem * fixes two tests * fix glob resolution for tables that have nested folders * removes globs from duckdb filesystem sql client, adds tests for edge cases * disables globbing for iceberg, adds optional autorefresh flag for view, fixes tests and docs --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2025-05-30 13:18:45 +02:00
rudolfix	0785e7e487	removes airflow DummyOperator import (#2628 )	2025-05-13 17:24:51 +02:00
David Scharf	45a43d9f7d	add basic docstring linting (#2520 ) * first version of using pydoclint * set up linting only for some files * first round of fixing important public interfaces * fix destination factory docstrings add missing dep for docs * add missing pipeline classes to linting * small tweaks to docs rendering * try to fix CI errors * fix lockfile * trigger ci * post merge lockfile fix * add docstrings to dataset protocols * small changes and revert PR target * Update dlt/common/destination/dataset.py --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>	2025-04-29 21:10:24 +02:00
Marcin Rudolf	10cbc27dc8	adds ibis as dependency group	2025-04-22 16:44:59 +02:00
rudolfix	e677b33266	renames tmp_dir to local_dir (#2305 ) * bumps to prerelease 1.6.2a1 * adds option to import missing modules to Reference.find * adds commonly used props to dataset * renames tmp_dir to local_dir * enables bandit on lll * fixes push and pop run context * allows earlier dlt version when preparing dbt venv * converts pyiceberg local paths to file urls * more deps and iceberg fixes * adds pip to venv only when pip used * normalizes pyiceberg paths for windows in all cases * allows to get current venv for dbt get_venv	2025-02-17 13:49:55 +01:00
rudolfix	1a5d7740f6	adds common tmp dir and ref importer (#2276 ) * calls on_resolved etc. on reverse mro * adds tmp_dir to run context, defaults to cwd or DLT_TMP_DIR * does not set destination_name to destination_type when resolving DestinationClientConfiguration * adds common ref importer with traces, missing dependency importer and applies to destination and source refs * tests tmp_dir in plugin * removes package check for mypy * makes all destination using local files to follow tmp_dir, unifies how local files are named, uses destination name to name databases * allows for callable destination attr for ref * allows for () -> Destination in import typechecker * allows for explicit and necessary prereleases if uv is used for venv * tmp dir does not depend on PROJECT_DIR env * converts query prop into method in relation.py * bumps sqlglot in lock to allow for ibis 10 * corrects destination name in pipeline state to represent configured name * improves some tests * adds warning when duckdb catalog is identical to dataset name * bumps lock file to get dev env on Py 3.12 * fixes tests * adds string encoding option to postgres destination, set to utf-8 in redshift * shifts tests to pymysql * warns if dataset name is normalized and changed * disables r2 delta login test due to bug in delta-rs * adjusts max identifier length in naming convention for dynamic destination caps * fixes other tests * allows to run ibis 10 with redshift	2025-02-11 01:49:31 +01:00
David Scharf	c0735acbe4	Nicer cli help output and generated cli reference (#2232 ) * make argparse output nicer * small update for auto docs generation * fix invoke test * add docs command for autogenerating cli docs * add generated cli page to docs * add make commands and checks for outdated docs * adds nicer listing of args * only check docs output on py 3.11 * update rest api pokemon tests * add some developer notes and make debugging more convenient * add anchor links to subcommands * adds inheritance information for each command * add link to cli docs to default help. * update sidebar layout * Update docs/website/docs/reference/command-line-interface-generated.md Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com> * Update docs/website/docs/reference/command-line-interface.md Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com> * merge generated and old cli page * add additional warning * re-order help and description * fix linting * put arguments and options into collapsible * remove dlt+ mention * post merge lock --------- Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>	2025-01-30 12:58:38 +01:00
rudolfix	8d8b4c3bad	typed entity registries (#2236 ) * fixes inserted_at to datetime * adds pipeline configuration into (pipelines, name) section * registers custom destinations via synthesized types * adds destination registry, adds autoimport to registries, refactors common destination files * sets destination_name to callable name for custom destination * fixes sources registry fixture * bumps dlt to 1.6.0a0 * adds global dir path * adds plus info to anon tracker * checks if dlt can be imported in alpine container * adds top level module to run context * stores top level plugin modules to resolve shorthand references * adds destination references with shothand expansion * adds preferred table formats to destination caps * uses types as reference to sources, creates DltResource instance in from_reference * tests plugin disovery with references * tests plus plugin telemetry * converts dict arrow types before sending to delta or iceberg * updates deps, incl duckdb * adds plug/unplug callbacks for run context * improves reading snapshots on iceberg * fixes tests and deps * allows to push and pop context on stack, fixes some tests * plugs and unplugs content on reload * always refresh views on abfss + sql client filesystem * fixes some tests	2025-01-29 11:08:27 +01:00
David Scharf	cbcff925ba	drop python 3.8, enable python 3.13, and enable full linting for 3.12 (#2194 ) * add python 3.12 linting * update locked versions to make project installable on py 3.12 * update flake8 * downgrade poetry for all tests relying on python3.8 * drop python 3.8 * enable python3.13 * copy test updates from python3.13 branch * update locked sentry version * pin poetry to 1.8.5 * install ibis outside of poetry * rename to workflows for consistency * switch to published alpha version of dlt-pendulum for python 3.13 * fix images * add note to readme	2025-01-12 16:40:41 +01:00
rudolfix	95d6063961	Fix/refresh standalone resources (#2140 ) * drops tables from schema and relational * documents custom sections for sql_database and source rename * clones schema without data tables when resources without source are extacted, adds tests * skips airflow tests if not installed * adds doc on setting up FUSE on bucket * adds doc on setting up FUSE on bucket * adds row key propagation for table when its nested table require it * fixes tests	2024-12-15 16:49:14 +01:00
Jorrit Sandbrink	4e5a2405e2	`iceberg` table format support for `filesystem` destination (#2067 ) * add pyiceberg dependency and upgrade mypy - mypy upgrade needed to solve this issue: https://github.com/apache/iceberg-python/issues/768 - uses <1.13.0 requirement on mypy because 1.13.0 gives error - new lint errors arising due to version upgrade are simply ignored * extend pyiceberg dependencies * remove redundant delta annotation * add basic local filesystem iceberg support * add active table format setting * disable merge tests for iceberg table format * restore non-redundant extra info * refactor to in-memory iceberg catalog * add s3 support for iceberg table format * add schema evolution support for iceberg table format * extract _register_table function * add partition support for iceberg table format * update docstring * enable child table test for iceberg table format * enable empty source test for iceberg table format * make iceberg catalog namespace configurable and default to dataset name * add optional typing * fix typo * improve typing * extract logic into dedicated function * add iceberg read support to filesystem sql client * remove unused import * add todo * extract logic into separate functions * add azure support for iceberg table format * generalize delta table format tests * enable get tables function test for iceberg table format * remove ignores * undo table directory management change * enable test_read_interfaces tests for iceberg * fix active table format filter * use mixin for object store rs credentials * generalize catalog typing * extract pyiceberg scheme mapping into separate function * generalize credentials mixin test setup * remove unused import * add centralized fallback to append when merge is not supported * Revert "add centralized fallback to append when merge is not supported" This reverts commit `54cd0bcebf`. 
* fall back to append if merge is not supported on filesystem * fix test for s3-compatible storage * remove obsolete code path * exclude gcs read interface tests for iceberg * add gcs support for iceberg table format * switch to UnsupportedAuthenticationMethodException * add iceberg table format docs * use shorter pipeline name to prevent too long sql identifiers * add iceberg catalog note to docs * black format * use shorter pipeline name to prevent too long sql identifiers * correct max id length for sqlalchemy mysql dialect * Revert "use shorter pipeline name to prevent too long sql identifiers" This reverts commit `6cce03b771`. * Revert "use shorter pipeline name to prevent too long sql identifiers" This reverts commit `ef29aa7c2f`. * replace show with execute to prevent useless print output * add abfss scheme to test * remove az support for iceberg table format * remove iceberg bucket test exclusion * add note to docs on azure scheme support for iceberg table format * exclude iceberg from duckdb s3-compatibility test * disable pyiceberg info logs for tests * extend table format docs and move into own page * upgrade adlfs to enable account_host attribute * Merge branch 'devel' of https://github.com/dlt-hub/dlt into feat/1996-iceberg-filesystem * fix lint errors * re-add pyiceberg dependency * enabled iceberg in dbt-duckdb * upgrade pyiceberg version * remove pyiceberg mypy errors across python version * does not install airflow group for dev * fixes gcp oauth iceberg credentials handling * fixes ca cert bundle duckdb azure on ci * allow for airflow dep to be present during type check --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2024-12-11 09:35:59 +01:00
dat-a-man	f5a64be626	Added deploy with modal. (#1805 ) * Added deploy with modal. * A few minor fixes * updated links as per comment * Updated as per the comments. * Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-modal.md * Updated * Updated as per comments * Updated * minor fix for relative link * Incorporated comments and new script provided. * Added the snippets * Updated * Updated * updated poetry.lock * Updated "poetry.lock" * Added "__init__.py" * Updated snippets.py * Updated path in MAKEFILE * Added __init__.py in walkthroughs * Adjusted for black * Modified mypy.ini added a pattern module_name_pattern = '[a-zA-Z0-9_\-]+' * updated * renamed deploy-a-pipeline with deploy_a_pipeline * Updated for errors in linting * small changes * bring back deploy-a-pipeline * bring back deploy-a-pipeline in sidebar * fix path to snippet * update lock file * fix path to snippet in tags * fix Duplicate module named "snippets" * rename snippets to code, refactor article, fix mypy errors * fix black errors * rename code to deploy_snippets * add pytest testing for modal function * move example article to the bottom * update lock file --------- Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com> Co-authored-by: Alena <alena@dlthub.com>	2024-11-07 07:35:04 +01:00
David Scharf	b732eb229c	Super fast snippet linting & type checking (#2019 ) * lint and type check all snippets at once * start fixing snippets * don't ignore missing imports * small changes * fixes many snippets * fix a bunch of more stuff * fix linter * more small fixes * make pendulum and datetime top level imports * fixed timedelta occurences * fix linter	2024-11-04 20:19:39 +01:00
Anton Burnashev	d7bf4a8271	Excludes examles from mypy and flake8 checks (#1969 )	2024-11-01 17:53:19 +01:00
rudolfix	f290522b8b	unifies run configuration and run context (#1944 ) * allows to pass run_dir via plugin hook + arbitrary args * adds name, data_dir and pipeline deprecation to run_configuration, renames to runtime_configuration * adds before_add, after_remove and improves add_extra when adding to container, tracks reference to container in context * merges run context and provider context, exposes init providers via run context * initializes loggers with run context * does not use config injection when creating default requests Client * removes duplicated code for examples and doc snippets * allows to init requests helper without runtime injection, uses re-entrant locks when injecting context * disables sentry on CI * renames config provider context to container, improves telemetry fixtures in tests	2024-10-15 10:37:55 +02:00
David Scharf	4ee65a8269	data pond: expose readable datasets as dataframes and arrow tables (#1507 ) * add simple ibis helper * start working on dataframe reading interface * a bit more work * first simple implementation * small change * more work on dataset * some work on filesystem destination * add support for parquet files and compression on jsonl files in filesystem dataframe implementation * fix test after devel merge * add nice composable pipeline example * small updates to demo * enable tests for all bucket providers remove resource based dataset accessor * fix tests * create views in duckdb filesystem accessor * move to relations based interface * add generic duckdb interface to filesystem * move code for accessing frames and tables to the cursor and use duckdb dbapi cursor in filesystem * add native db api cursor fetching to exposed dataset * some small changes * switch dataaccess pandas to pyarrow * add native bigquery support for df and arrow tables * change iter functions to always expect chunk size (None will default to full frame/table) * add native implementation for databricks * add dremio native implementation for full frames and tables * fix filesystem test make filesystem duckdb instance use glob pattern * add test for evolving filesystem * fix empty dataframe retrieval * remove old df test * clean up interfaces a bit (more to come?) 
remove pipeline dependency from dataset * move dataset creation into destination client and clean up interfaces / reference a bit more * renames some interfaces and adds brief docstrings * add filesystem cached duckdb and remove the need to declare needed views for filesystem * fix tests for snowflake * make data set a function * fix db-types depdency for bigquery * create duckdb based sql client for filesystem * fix example pipeline * enable filesystem sql client to work on streamlit * add comments * rename sql to query remove unneeded code * fix tests that rely on sql client * post merge cleanups * move imports around a bit * exclude abfss buckets from test * add support for arrow schema creation from known dlt schema * re-use sqldatabase code for cursors * fix bug * add default columns where needed * add sql glot to filesystem deps * store filesystem tables in correct dataset * move cursor columns location * fix snowflake and mssql disable tests with sftp * clean up compose files a bit * fix sqlalchemy * add mysql docker compose file * fix linting * prepare hint checking * disable part of state test * enable hint check * add column type support for filesystem json * rename dataset implementation to DBAPI remove dataset specific code from destination client * wrap functions in dbapi readable dataset * remove example pipeline * rename test_decimal_name * make column code a bit clearer and fix mssql again * rename df methods to pandas * fix bug in default columns * fix hints test and columns bug removes some uneeded code * catch mysql error if no rows returned * add exceptions for not implemented bucket and filetypes * fix docs * add config section for getting pipeline clients * set default dataset in filesystem sqlclient * add config section for sync_destination * rename readablerelation methods * use more functions of the duckdb sql client in filesystem version * update dependencies * use active pipeline capabilities if available for arrow table * update types * 
rename dataset accessor function * add test for accessing tables with unquqlified tablename * fix sql client * add duckdb native support for azure, s3 and gcs (via s3) * some typing * add dataframes tests back in * add join table and update view tests for filesystem * start adding tests for creating views on remote duckdb * fix snippets * fix some dependencies and mssql/synapse tests * fix bigquery dependencies and abfss tests * add tests for adding view to external dbs and persistent secrets * add support for delta tables * add duckdb to read interface tests * fix delta tests * make default secret name derived from bucket url * try fix azure tests again * fix df access tests * PR fixes * correct internal table access * allow datasets without schema * skips parametrized queries, skips tables from non-dataset schemas * move filesystem specific sql_client tests to correct location and test a few more things * fix sql client tests * make secret name when dropping optional * fix gs test * remove moved filesystem tests from test_read_interfaces * fix sql client tests again... :) * clear duckdb secrets * disable secrets deleting for delta tests --------- Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>	2024-10-08 14:30:56 +02:00
Jorrit Sandbrink	ff1434b480	fix intermittent `delta` panic issue (#1832 ) * bring airflow group back to make dev * replace try_get_deltatable	2024-09-19 13:49:11 +02:00
David Scharf	c96ce7b957	docs: fix absolute links (#1834 ) * search and replace absolute links * fix after automatic replacement * fix devel links * add docs preprocessing step to ci docs tests * add check for devel and absolute links * post merge fix * add line number to error output * install node 20 * fix all root links in docs	2024-09-18 15:39:34 +02:00

108 Commits