* adds optional pipeline activation history to context
* allows configuring configs and pragmas for duckdb, improves sql_client, tests
* allows query string for motherduck, tests WIP
* mocks local_dir correctly to place local files, drop duckdb in pipeline fixture in most places
* enables activation factory to drop datasets from all pipelines
* uses correct fixture scope in test read interfaces
* bumps duckdb and pyarrow
* ignores some flake8 errors
* logs resolved traces thread-wise, clears log between pipeline runs
* improves duckdb tests and docs
* bumps arrow to v20 because duckdb 1.3 needs at least 19 for its types
* fixes tests - mostly duckdb database locations
* fixes lockfile
* fixes edge cases when passing settings to duckdb connection
* disables iceberg abfss tests
* refactors WithLocalFiles so they can be used independent from destination
* more local dir test fixes
* moves WithLocalFiles to common storages configuration
* tests edge cases when setting configs on duckdb fails
* updates docs
* reverts duckdb to 1.2.1 - last stable version
* more test fixes
* moves create_secret to duckdb sqlclient
* disables building of Dockerfile until we upgrade arrow
* skip gcs compat test for local clickhouse tests
---------
Co-authored-by: dave <shrps@posteo.net>
* triggers devel tests
* fixed malformed docstring
* use native sqlglot type annotation
* pass hints via SQLGlot metadata
* fix linter errors and tests
* fix a few more tests and edge cases
* fix bug in lineage
* enable columns schema for both ReadableRelation Types
* add more tests and make lineage tests independent from loading
* add lineage tests for all sql destinations
* enable tests on ci and disable column schema for sqlalchemy for now
* fix some more tests
* add sqlalchemy hack
* first fix for snowflake and some smaller changes and clarifications
* fix sqlglot schema creation, makes clickhouse work
* re-add transformations tests folder
* fix lineage datatype
* disable databricks and synapse ibis backend tests
* move transformation code from prototype excluding old lineage and including updates so that linter passes, no real code changes yet.
* fix some of the python extractor based transformations
* fix most tests
* make basic transformation tests run on all destinations
* enable all current transformation tests for all destinations
run some duckdb transformations on all OSes
* a little bit of cleanup
* move common transactions and mark all destination transaction tests as essential for now
* Add improvements from review in prototype PR and some cleanup
* exclude dremio
* fix some transformations tests
* fix row_counts for snowflake and add some comments
* converts SupportsReadableRelation to an ABC
* add scalar access to SupportsReadableRelation
* simplify transformation signature
* add top level dlt objects and some small changes
* second part of removing transformation extra args
* add clickhouse tests
* add config based transformation source
* add better transformation examples
* use fruitshop template for testing
* remove custom row_counts method in favor of "global" test one
* first draft of transformations doc
* some work on the docs page
* feat: 2540 lineage `allow_unknown_columns` and `allow_anonymous_columns` (#2577)
* test compute_columns_schema() and exception handling
* convert transformation code examples to snippets
* finish first round of transformation docs
* Quite a few PR fixes
* fixes some tests
* add support and docs for dataframe and arrow operations
* add config and fallback if destination not reachable
* fix scalar method
fallback to models if pipeline destination is not available
* hopefully fix one test
* Docs: addition of normalizer behaviour to transformations docs (#2639)
* Normalizer info added
* Unnecessary paragraph removed, regular normalization linked
* feat: 2540 - SQLGlot type mapping (#2587)
* fixes some tests
* post rebase cleanup
* renamed kwarg
* type handling done; WIP
* sqlglot-dlt type mapping completed
* added docstrings to tests
* removed unused test file
* attach metadata to DataType
* refactored test to parameterized form
* refactor function names
* bug fix .to_py()
* rename compute_columns_schema() kwargs
* refactor type conversion branches
* fixes some tests
* add support and docs for dataframe and arrow operations
* add config and fallback if destination not reachable
* fix scalar method
fallback to models if pipeline destination is not available
* fix: update return type in athena_adapter docstring to reflect correct destination (#2599)
* list secrets in vault config provider to avoid calls to backend (#2597)
* fixes bug where configuration section was not propagated when embedded configuration is resolved
* splits vault provider settings per vault type
* adds option to list secrets to vault and google secrets provider
* uses google secrets provider with global cache for tests
* documents vault provider
* test and docs fixes
* slightly clarify clickhouse docs (#2594)
* slightly clarify clickhouse docs
* Update clickhouse.md
* Extract dataset code snippets into tests snippets system (#2598)
* extracts dataset code blocks into tested snippets and uses fruitshop pipeline as base dataset for demonstration purposes
* add ibis group
* Enabling 'model' loader_file_format for athena, synapse and dremio (#2556)
* Athena model loader format initial support
* test_verify_capabilities_data_types adjusted for athena
* Synapse enabled
* The offset logic for tsql made unreachable
* Athena test config without iceberg removed, dremio added
* Unnecessary synapse workaround removed
* fix some typos in cursor-restapi docs (#2608)
* fix some typos in cursor-restapi docs
* fix typo
* refactor init-command for use in dlt project (#2568)
* refactor init-command for use in dlt project
* remove config.toml from project docs
* fix ibis mypy error
---------
Co-authored-by: dave <shrps@posteo.net>
* docs: Fix incorrect nesting in secrets.toml (#2614)
* fixes parquet data writer settings docs & rewrites configuration docs (#2583)
* fixes parquet data writer settings docs
* adds section to dlt resource decorator
* fixes and tests how config sections are created when single resource is extracted
* fixes config sections for parallel doc example
* exports postgres adapter
* rewrites configuration docs, moves a few docs sections in sidebar
* snippet fixes
* accepts docs changes from review
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
* adds tip how to eject core source
* linter fixes
---------
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
* enables fsspec per-thread instance cache and updates documentation (#2621)
* bumps pendulum and docs (#2624)
* fixes sql database docstrings and docs
* bumps poetry to 3.0.1 and drop dlt poetry
* Added dedup sort example (#2235)
* Added dedup sort example
* Updated formatting
* Updated
* Updated
* Update docs/website/docs/general-usage/incremental-loading.md
---------
Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* Docs: add advanced project tutorial (#2338)
* hopefully fix one test
* trigger ci
* improve tests, lint
---------
Co-authored-by: David Scharf <shrps@posteo.net>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
Co-authored-by: rudolfix <rudolfix@rudolfix.org>
Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
Co-authored-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: djudjuu <djudju@proton.me>
Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info>
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>
* qualify all queries that come into the transformations
* fix lineage for snowflake and clickhouse
* apply schema fix for sqlglot and remove special treatment of snowflake
* align datasets interfaces with ibis implementation ["col"] selects column and not table with one column
* disable incremental on transformations decorator and warn if incremental args are discovered
* fixes one more test
* fixes snowflake tests after sqlglot schema fix
* removes standalone resources, fixes transformation function wrapping (#2684)
* changes contrib and README (#2666)
* changes contrib and README
* Apply suggestions from code review
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
---------
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
* raises if resolving dataclass without configspec
* adds function type inspect that follows wrappers
* removes make fun, uses wraps
* adds conftest to transformations
* (1) fixes transformation overloads (2) passes TransformationConfiguration as base spec so buffer is always injected (3) wraps tranformation_function (4) makes str SQL a model (5) tests configurations and parametrized transformations
* (1) removes resources returning resources (2) allows resources to be also functions (3) allows base spec to be passed to resource function (4) makes DltResource and SourceFactory to wrap decorated function and fixes signatures (5) allows inner resources to be injectable, warns for transformers (6) normalizes and tests how functions are wrapped and unwrapped so signatures and configs are available
* normalizes config resolve behavior: default values can be overridden from providers but explicit values cannot. if those were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults). also, if a native value is found for a config that does not accept native values, config resolution will fail; previously it was ignored
* do not use config specs cached in module when creating autospecs
* fixes venv tests when uv is present
* if incremental parses from another incremental as native value, it copies original type correctly
* merges standalone resources with regular resources: (1) all are DltResources (2) we generate the correct types for __call__! (3) all resources can be configured, including inner resources and including default params; previously only standalone could. that unifies behavior for resources and sources re. config injection (4) resources can return other resources if they have DltResource in the type annotation (5) resources can be renamed with lambda names, and sections can also be renamed
* fixes transformation decorators so they generate correct typing
* binds params to resource function instead of using defaults to avoid generating config injection in rest_api
* removes remaining full_refresh flags
* fixes Makefile commands to run common and local destination tests
* fixes xdg home test
* fixes venv tests for uv
* linter and docstring fixes
---------
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
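
The config resolve rule noted above (default values can be overridden from providers, explicit values cannot) can be illustrated with a minimal sketch. This is not dlt's resolver: `resolve_value`, the `_MISSING` sentinel and the key names are hypothetical and only show the precedence order.

```python
from typing import Any, Dict

# Sentinel meaning "the caller passed no explicit value" (hypothetical, not a dlt name).
_MISSING = object()


def resolve_value(
    key: str,
    default: Any,
    providers: Dict[str, Any],
    explicit: Any = _MISSING,
) -> Any:
    """Pick a config value by precedence: explicit > provider > default."""
    # An explicitly passed value always wins; providers cannot override it.
    if explicit is not _MISSING:
        return explicit
    # Without an explicit value, a provider (env, toml, ...) overrides the default.
    if key in providers:
        return providers[key]
    # Nothing else is known, so the default stays in place.
    return default


# Illustrative provider content; the keys are made up for this sketch.
providers = {"buffer_max_items": 500}

from_provider = resolve_value("buffer_max_items", 100, providers)
from_explicit = resolve_value("buffer_max_items", 100, providers, explicit=250)
from_default = resolve_value("file_max_items", 1000, providers)
```

The sentinel is used instead of `None` so that an explicit `None` is still treated as an explicit value.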
* allows for initial values that are configurations also in case no native initial values are supported
* fixes docs linting
* Outer select quotes columns (#2694)
* fix normalizer tests
* fix a few small tests
* remove dependency on ibis for common tests (not supported on python 3.13)
* fixes for python 3.9
* fix sqlglot schema propagation and retrieval
* fixes leaking sqlalchemy credentials into other test
* skip not materialized columns in sqlglot schema generation
---------
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: zilto <zilto@github.com>
Co-authored-by: Thierry Jean <68975210+zilto@users.noreply.github.com>
Co-authored-by: anuunchin <88698977+anuunchin@users.noreply.github.com>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
Co-authored-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: djudjuu <djudju@proton.me>
Co-authored-by: Alexander Grueneberg <com.github@agrueneberg.info>
Co-authored-by: Violetta Mishechkina <sansiositres@gmail.com>
Co-authored-by: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Co-authored-by: Alena Astrakhantseva <alena@dlthub.com>
* first version of using pydoclint
* set up linting only for some files
* first round of fixing important public interfaces
* fix destination factory docstrings
add missing dep for docs
* add missing pipeline classes to linting
* small tweaks to docs rendering
* try to fix CI errors
* fix lockfile
* trigger ci
* post merge lockfile fix
* add docstrings to dataset protocols
* small changes and revert PR target
* Update dlt/common/destination/dataset.py
---------
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
* import correct typeddict version for use in pydantic, disallow use of usual python typeddict imports
* add test
* update import for examples
* fixed some imports
* remove python 3.8 lint and test for now
* always use typeddict from typing_extensions
pin poetry in tests to 1.8.5
* adds more info to pipeline drop and info commands
* extracts known env variables to separate module
* drops tables on staging
* tests create/drop datasets and tables
* simplifies drop command and helpers + tests
* adds no print linter module and a few other small fixes
* improves collision detection when normalizers change
* allows glob to work with memory filesystem
* replaces walk in filesystem destination with own glob
* standardizes drop_dataset behavior for all destinations
* creates athena iceberg tables in random locations
* format examples
* add core functionality for scd2 merge strategy
* make scd2 validity column names configurable
* make alias descriptive
* add validity column name conflict checking
* extend write disposition with dictionary configuration option
* add default delete-insert merge strategy
* update write_disposition type hints
* extend tested destinations
* 2nd time setup (#1202)
* remove obsolete deepcopy
* add scd2 docs
* add write_disposition existence condition
* add nullability hints to validity columns
* cache functions to limit schema lookups
* add row_hash_column_name config option
* default to default merge strategy
* replace hardcoded column name with variable to fix test
* fix doc snippets
* compares records without order and with caps timestamps precision in scd2 tests
* defines create load id, stores package state typed, allows package state to be passed on, uses load_id as created_at if possible
* creates new package to normalize from extracted package so state is carried on
* bans direct pendulum import
* uses timestamps with properly reduced precision in scd2
* selects newest state by load_id, not created_at. this will not affect execution as long as packages are processed in order
* adds formatting of datetime literals to escape
* renames x-row-hash to x-row-version
* corrects json and pendulum imports
* uses unique column in scd2 sql generation
* renames arrow items literal
* adds limitations to docs
* passes only complete columns to arrow normalize
* renames mode to disposition
* saves parquet with timestamp precision corresponding to the destination and updates schema in the normalizer
* adds transform that computes hashes of tables
* tests arrow/pandas + scd2
* allows scd2 columns to be added to arrow items
* various renames
* uses generic caps when writing parquet if no destination context
* disables coercing timestamps in parquet arrow writer
---------
Co-authored-by: Jorrit Sandbrink <sandbj01@heiway.net>
Co-authored-by: adrianbr <adrian.brudaru@gmail.com>
Co-authored-by: rudolfix <rudolfix@rudolfix.org>
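
The scd2 commits above mention a row hash column and validity columns. The sketch below shows the general idea in plain Python under stated assumptions: it does not use dlt, the column names `row_hash`, `valid_from` and `valid_to` are placeholders, and retiring records that are absent from the incoming snapshot is an assumption of this sketch rather than a confirmed detail of the implementation.

```python
import hashlib
import json
from typing import Any, Dict, List


def row_hash(row: Dict[str, Any]) -> str:
    """Stable hash of a row; key order does not affect the result."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def scd2_merge(
    table: List[Dict[str, Any]],
    snapshot: List[Dict[str, Any]],
    load_ts: str,
) -> List[Dict[str, Any]]:
    """Apply a full snapshot to an scd2 table (modified in place and returned)."""
    incoming = {row_hash(row): row for row in snapshot}
    still_active = set()
    for record in table:
        if record["valid_to"] is not None:
            continue  # already retired, history is never rewritten
        if record["row_hash"] in incoming:
            still_active.add(record["row_hash"])  # unchanged version stays open
        else:
            record["valid_to"] = load_ts  # changed or missing: close the version
    for digest, row in incoming.items():
        if digest not in still_active:
            # New or changed row: open a new version valid from this load.
            table.append(
                {**row, "row_hash": digest, "valid_from": load_ts, "valid_to": None}
            )
    return table


table: List[Dict[str, Any]] = []
scd2_merge(table, [{"id": 1, "city": "Berlin"}], "2024-01-01T00:00:00")
scd2_merge(table, [{"id": 1, "city": "Paris"}], "2024-01-02T00:00:00")
```

After the second call the table holds two versions of the record: the Berlin row closed at the second load timestamp and the Paris row still open.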