379 Commits

Author SHA1 Message Date
anuunchin
266052eb76 Docs: Converting Jupyter notebooks in education to marimo notebooks (#3068)
* Initial commit

* lesson_1_quick_start adjusted for marimo

* lesson_2_dlt_sources_and_resources_create_first_dlt_pipeline marimo

* Fundamentals course 3 improved

* Marimo badges added

* Fundamentals: course 8

* Marimo badge link fix

* Fundamentals: course 7

* Fundamentals: course 6

* Fundamentals: course 5

* Fundamentals: course 4

* Fundamentals: course 3

* Fundamentals: course 2

* Fundamentals: course 1

* marimo links corrected

* Inline deps

* Fundamentals: fix lesson 2

* Fundamentals: fix lesson 3

* Fundamentals: fix lesson 4

* Formatting moved to build-molabs

* Fundamentals: fix lesson 5

* Removal of scrolls

* Fundamentals: fix lesson 6

* Fundamentals: fix lesson 7

* Fundamentals: fix lesson 8

* os.environ replaced with dlt.secrets where relevant

* Advanced: fix lesson 5

* Advanced fix lesson 9

* os.environ fixes

* Advanced: fix lesson 1

* Comments cleanup

* Additional comment removal, fix lesson 6 advanced

* Clean main makefile

* Get rid of constants.py

* Nicer json.loads()

* Better functions in preprocess_to_molab

* Tests for doc tooling funcs

* Validate molab command

* Marimo check added

* docs pages adjustment

* limits sqlglot in dev group until fixed

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-16 16:30:32 +01:00
Rakesh V.
34669f1ac7 Feat/iceberg advanced partitioning (#3053)
* feat: implement advanced Iceberg partitioning with explicit ordering

- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format

* Port iceberg_partition and build_iceberg_partition_spec to dlt core

* update type hint in IcebergLoadFilesystemJob

* Add tests for Iceberg advanced partitioning; remove unused partition extraction code

* Add docs for iceberg_adapter

---------

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-12-12 10:57:56 +01:00
ivasio
99207237fe docs: add runtime docs to CLI reference (#3445)
* bumps to version 1.20.0

* update the hub reference docs, add CI check

* use dependency specifier in hub for plugin version check

* minimum dlt runtime cli check

* rollback to old fsspec min version

* fixes test_hub ci workflow

* fixes flaky test

* bumps hub extra

* updates cli docs linting

* fixes docs lock

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: ivasio <ivan@dlthub.com>
2025-12-09 17:30:53 +01:00
djudjuu
289e00dece data quality checks cell in dashboard (#3413)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* basic cell appearing if installed

* use data quality cell

* show raw data too

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests

* update text

* adds quality widget as python functions

* adds data_quality as module to hub

* adds hub extra to docs deps

* fixes dashboard imports

* bumps to alpha x.20.0a1

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-12-07 12:59:21 +01:00
rudolfix
06bc05848b (chore) adds hub extra (#3428)
* adds hub extra

* makes hub module more user friendly when hub not installed

* test and lint fixes

* adds plugin version check util function

* adds dlt-runtime to hub extra, minimal import tests

* bumps to dlthub 0.20.0 alpha

* lists pipelines with cli using the same functions as dashboard, dlt pipeline will list pipelines by default

* adds configured profiles method on context so only profiles with configs or pipelines are listed

* adds list of locations that contained actual configs to provider interface

* improves workspace and profile commands

* test fixes

* fixes tests
2025-12-05 16:15:19 +01:00
rudolfix
3e84f7aaa9 blocks failed sqlglot version, bumps sqlglot in lockfile (#3420) 2025-12-02 22:23:54 +01:00
rudolfix
dd38c80fb4 fixes arrow import in sql_database (#3411)
* fixes pyarrow import in sql_database

* bumps to 1.19.1

* linter fix

* fixes common workflow
2025-12-02 18:33:03 +01:00
rudolfix
a0e5bd073d bumps to version 1.19.0 (#3401)
* bumps to version 1.19.0

* fixes lakeformation test
2025-12-01 11:38:02 +01:00
rudolfix
fc47edd280 ingests parquet into mssql, mysql and sqlite via ADBC (#3333)
* extracts adbc parquet load job with file format selector

* ports postgres parquet job to base job

* implements mssql adbc job

* adds pickle test for all destination caps

* adds dbc to adbc group, updates test workflow

* fixes sqlglot from find

* fixes docs

* adds sqlalchemy adbc docs

* adds support for sqlite and mysql in sqlalchemy

* fixes and tests str annotation resolving

* allows to disable adbc and does that in tests

* fixes imports

* docs lock bump

* fixes globalns extraction

* clarifies how adbc drivers are installed, implements fallback for postgres

* improves dashboard multi schema test

* fixes followup jobs

* fixes connection string escaping

* Update docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md

Co-authored-by: djudjuu <djudju@proton.me>

* removes code dedup

* fixes columns that receive None, simple and nested values

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-11-28 17:13:19 +01:00
rudolfix
8bd0b116fb fixes athena refresh mode (#3313)
* adds filter to exclude dropped tables in staging destination, implements for athena

* enables refresh mode tests for athena, fixes tests

* fixes staging_allowed_local_path on databricks, bumps databricks connector in lockfile

* passes dropped tables schemas to filter, adjust athena filter

* allows to disable lake formation
2025-11-21 10:58:54 +01:00
Violetta Mishechkina
b08f2334a8 docs: update weaviate destination docs and version (#3352) 2025-11-20 15:45:00 -05:00
David Scharf
4a5ffd82b3 Chore: Update docs npm dependencies and clean up docs build tooling (#3247)
* bump npm deps

* remove unneeded netlify redirects file

* remove unneeded lockfile

* remove another unneeded lockfile

* post rebase lockfile update

* remove old netlify command

* create new docs tools project and move api docs gen there

* tmp

* add uv to build docs workflow

* move docs pyproject

* re-org docs package and move snippet linter

* move notebook linting commands and deps to tools folder
add flake8 to tools linting

* remove unneeded files

* fix linting and formatting errors

* remove wrong file

* move docs processing script to new package

* fix gen api ref

* clean up package json and use commands from parent makefile

* update build website workflow

* move linting to docs makefile partially

* fix python version for docs project

* consolidate docs commands in docs makefile

* fix docs linter

* fully update docs test flow

* fixes some linting and dependency problems

* fix constants

* move notebook formatting to docs project

* fix lint embedded snippets

* fix examples tests

* add missing dependencies

* fix snippet linting

* add missing lint dependencies to core and missing test dependencies to docs

* add missing weaviate

* add missing regex module

* add forked dependency and updates readme file

* revert accidental change to example

* fix main linter

* * Move relevant pytest options to subproject
* Remove shims / path inserts that are now managed by pytest options
* Some typing fixes
* Clean up base project pytest ini
* Enable transformation snippets tests

* remove unneeded raw import of intro snippets

* downgrade alive progress

* uses dlt logger which also fixes internal alive error

* enables transformation snippets linting

* fixes dashboard races again

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-11-16 18:01:30 +01:00
rudolfix
6811dd7044 clones command repos in global_dir & bumps to 1.18.2 (#3279)
* clones command repos into global_dir, not data_dir

* bumps to version 1.18.2
2025-11-03 13:22:18 +01:00
rudolfix
192296f4f8 fixes git import and enables tests (#3262)
* enable hub tests

* removes erroneous git import

* enables tests with importing dlt into minimal alpine container

* imports workspace modules on demand

* bumps dlt to version 1.18.1

* fixes mssql hub test on mac

* review fixes
2025-10-29 21:32:07 +01:00
rudolfix
e56f617c0e adds more signal options (#3248)
* adds option in load that prevents draining pool on signal

* adds runtime pipeline option to not intercept signals

* refactors signal module

* tests new cases

* describes signal handling in running in prod docs

* bumps dlt to 1.18.0

* fixes tests forked

* removes logging and buffered console output from signals

* adds retry count to load job metrics, generates started_at in init of runnable load job

* allows to update existing metrics in load step

* finalized jobs require start and finish dates

* generates metrics in each job state and in each completed loop, does not complete package if pool drained but jobs left, adds detailed tests for metrics

* fixes remote metrics

* replaces event with package bound semaphore to complete load jobs early

* fixes dashboard to on windows

* improves signals docs

* renames delayed_signals to intercepted_signals
2025-10-28 13:56:24 +01:00
Max Yakovenko
98c81466ea Feature: Introduce support of http based resources for fs source (#3029)
* Feature, Add support of http based paths

* Feature, Add support of http resources

* Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format

* Feature, Add cloudfront base_url to the configurations

* Feature, Add a test for http based resources

* Feature, Add a test case for RFC 1123 datetime format

* Feature, Remove test cases related to datetime parsing in RFC and timestamp formats

* Revert "Feature, Enforce coercion to pendulum types. Add support of RFC 1123 format"

This reverts commit 142624b24a.

* Feature, Restore the structure of the url for the cdn

* Feature, Replace custom datetime parser function with a single dispatched one

* Feature, Add a stub package for singledispatch

* Feature, Refactor pendulum datetime processing functions

* Feature, Fix the linting errors in time related tests

* Feature, Fix the declaration

* Feature, Revert the changes related to datetime parsing

* Feature, Add http schema for testing. Add pendulum parser to support RFC 1123 format

* Feature, Update the configuration for http bucket

* Feature, Add a http server. Update the test for http fs

* Feature, Upgrade fsspec

* Feature, Fix codestyle

* Feature, Fix the protocol validation for fsspec args

* Feature, Fix the typing annotations

* Add an example for http filesystem

* Feature, Add schema to the urlparse call

* Feature, Fix the codestyle for http entries in MIME_DISPATCH

* Feature, Expand the list of supported locations in the docs

* uses more random port and closes httpd to release it properly, drops auto fixture as it would be attached to all tests

* moves httpd tests to common tests

* adds http extra to support fsspec

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-10-23 17:08:15 +02:00
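The RFC 1123 format mentioned in this commit is the date format HTTP servers use in headers such as `Last-Modified`. The commit itself adds a pendulum-based parser; the sketch below is not that code. It only illustrates the format using Python's standard library, and the helper name `parse_http_date` is hypothetical.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def parse_http_date(value: str) -> datetime:
    """Parse an RFC 1123 date string (hypothetical helper, stdlib only)."""
    # parsedate_to_datetime returns a timezone-aware datetime when the
    # string carries a zone such as "GMT".
    return parsedate_to_datetime(value)


# Example value in the shape of an HTTP Last-Modified header.
last_modified = parse_http_date("Sun, 06 Nov 1994 08:49:37 GMT")
print(last_modified.isoformat())
```

A timezone-aware result matters here because file modification times from HTTP resources are compared against other timestamps during incremental loading.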
rudolfix
0dcdcf0e33 ignores native config values if config spec does not implement those (#3233)
* does not fail config resolution if native valued provided to a config that does not implement native values

* updates databricks docs

* allows to replace hints regexes on schema

* removes partition hint on eth merge test on databricks

* adds pokemon table count consts

* reorgs databricks dlt fix

* fixes lancedb custom destination example

* fixes lancedb custom destination example

* reduces no sql_database examples run on ci

* fixes merge

* marks and skips rfam tests
2025-10-22 22:48:13 +02:00
anuunchin
3298b4059f Feat: workspace file selector, package builder (#3207)
* File selector, package builder

* pathspec added, improvements

* Test for file selector

* Test for package builder

* digest256_tar_stream util

* Unnecessary file selector protocol removed

* Posix path in builder

* Relevant notes and docstring improvements
2025-10-20 23:18:13 +02:00
Menna
e2ef7c1ec8 feat/3103: Ensure consistency in HexBytes coercion (#3200)
* Refactor: Replace hexbytes dependency with custom HexBytes implementation

* Removed the hexbytes library and integrated a custom HexBytes class to ensure compatibility with the codebase.
* Updated imports across multiple files to use the new HexBytes class.
* Added tests for the HexBytes class to validate its functionality and ensure proper behavior with various input types.

* Update hexbytes error handling test to reject lists as input type

* Remove TypeError test for unsupported list input in HexBytes error handling

* Refactor: Improve formatting of hex method in HexBytes class for better readability

* Refactor: Clean up comments and improve readability in hex method of HexBytes class

* Refactor: Rename methods in HexBytes class for clarity and consistency

* Updated method names from `to_bytes` to `_to_bytes` and `hexstr_to_bytes` to `_hexstr_to_bytes` to indicate their private nature.
* Adjusted method calls within the class to reflect the new names, enhancing code readability and maintainability.

* Removed support for bool and int types in the HexBytes constructor, streamlining input handling, and introduced a new fromhex method to create HexBytes from hex strings, improving clarity.

* Remove hexbytes dependency from lockfile and related configurations

* Enhance hex method in HexBytes class to support custom separators and bytes per separator. This improves flexibility in hex encoding output while maintaining the existing functionality.

* Refactor hex method in HexBytes class to improve parameter handling and readability. Updated the method signature to clarify the use of custom separators and bytes per separator, ensuring consistent behavior with existing functionality.

* Update hex method in HexBytes class to remove unnecessary noqa comments, enhancing code clarity and consistency.
2025-10-20 22:22:06 +02:00
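The "custom separators and bytes per separator" wording in this commit matches the parameters of Python's built-in `bytes.hex(sep, bytes_per_sep)`. The sketch below uses plain `bytes` only, on the assumption that the custom HexBytes class follows the same convention; it does not call dlt's class.

```python
# Plain bytes stand in for HexBytes here (assumption: same hex() convention).
raw = bytes.fromhex("deadbeef")

# No separator: one contiguous hex string.
plain = raw.hex()

# A separator between every byte.
colon_separated = raw.hex(":")

# A separator between groups of two bytes.
grouped = raw.hex("_", 2)

print(plain, colon_separated, grouped)
```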
rudolfix
fe567414dc chore/moves cli to _workspace module (#3215)
* adds selective required context, checks profile support in switch_profile

* creates and tests hub module

* adds plugin version to telemetry

* renames imports in docs

* renames ci workflows

* fixes lint

* tests deploy command on duckdb

* moves cli module to workspace

* moves cli tests to workspace module

* renames fixtures, rewrites fixture to patch run context to _storage

* allows to patch global dir in workspace context

* when finding git repo, does not look up if GIT_CEILING_DIRECTORIES is set

* imports git utils only when need to clone package in dbt runner

* runs workspace tests as part of common

* fixes tests, config tests sideeffects

* moves dashboards to workspace

* fixes pipeline trace test

* moves dashboard helper tests

* excludes additional secret files and pinned profile from gitignore

* cleans up hatchling files in pyproject

* fixes dashboard running tests in ci

* moves git module to libs

* diff fix

* fixes fixture names
2025-10-19 15:21:42 +02:00
Thierry Jean
8a46409dad repo(pytest): migrate to pyproject.toml and reduce verbosity (#3205)
* migrate pytest.ini to pyproject.toml

* decrease verbosity level to quiet
2025-10-16 08:07:24 -04:00
rudolfix
01698752db Feat/adds workspace (#3171)
* ports toml config provider with profiles

* supports run context with profiles

* separates pluggy hooks from impls, uses pyproject and __plugins__.py for self-plugging

* implements workspace run context with profiles and basic cli

* displays workspace name and profile name before executing cli commands if run context supports profiles

* exposes dlt.current.workspace()

* converts run context protocol into abstract class

* fixes plugins tests

* refactors _workspace: private and public modules

* adds workspace test cases

* launches workspace and pipeline mcp with cli, sse by default

* tests basic workspace behaviors

* refactors code to switch context and profile

* adds default profile to run context interface

* ports pipeline and oss mcp, changes derivation structure

* adds safeguards and tests to workspace cleanup cli helper

* adds run_context to SupportsPipeline, checks run_context change on pipeline activation

* adds mcp dependency to workspace extra, fixes types

* renames test fixture

* mcp export tweak

* updates cli reference and common ci workflow

* disables dlt-plus deps in ci

* removes df from mcp tools, fixes workspace tests

* fixes tests
2025-10-08 20:16:34 +02:00
Thierry Jean
a3ec5bf4e0 fix(dashboard): remove pandas deps, use pyarrow (#3157) 2025-10-02 13:42:11 -04:00
rudolfix
499afaf5dc bump to version 1.17.1 (#3158) 2025-10-02 15:41:30 +02:00
Marcin Rudolf
53b94352cb bump to version 1.17.0 2025-09-24 08:30:48 +02:00
Thierry Jean
8565a2ac06 feat: ducklake destination (#3015)
* move duckdb capabilities to utility function

* add basic DuckLake files based on DuckDB / Motherduck

* refactor ducklake config

* wip; ducklake destination

* simplified testing

* ignore ducklake files

* completed default config; TODO fix write

* unicode issues

* commented out patches

* lint

* uses destination_type as final fallback when creating default local file names, allows to copy local file context in WithLocalFiles

* creates connection pool for duckdb

* fixes exception handling in open_connection in sql_client, fixes racing when connections opened in duckdb, improves error handling if commit tx fails

* handles ducklake attach/detach in sql_client

* modifies ducklake configuration to: (1) use sqlite as default catalog (2) point all local files to local_dir (3) allow various urls to configure ducklake name (4) use parquet as default file format

* adjust caps to execute load jobs sequentially for duckdb and sqlite catalogs

* passes ducklake conn to ibis, improves how duckdb conn is passed (via open_connection which provides full context)

* adds configuration and credential tests, smoke tests for supported catalogs

* enables ducklake on ci

* fixes ducklake imports

* fixes how secrets are created from filesystem

* generates remote_url in load job metrics with real url of the ducklake table

* tests for all buckets

* adds ducklake extra

* adds hints for secrets.toml gen

* implements cursor for ducklake with correct df vector size

* forces use of ducklake/duckdb datasets in ibis handover, tests non existing dataset behavior

* removes dashboard e2e from common tests on ci

* docs WIP

* implements field resolution check and recursive copy for base configuration

* copies credentials before using as default when resolving capabilities

* allows recursive resolution traces in config field missing exception

* improves config resolve: collects traces recursive, keeps resolving if embedded config fails, collects resolved keys

* decouples connection string credentials and base duckdb credentials

* improves how duckdb handles exceptions when executing query

* makes catalog name explicit in ducklake credentials, creates default db and storage folder names after it

* supports ducklake partitioning on duckdb 1.4

* supports metadata schema on postgres, adds experimental ducklake catalog support on Motherduck

* fixes union config resolve with single base config in union

* docs WIP

* enabled ducklake remote test

* improves ibis filesystem con handover, enables databricks

* fixes tests

* fixes lancedb default name

* propagates only top level config section, replaces with embedded field name in other cases

* adds tests and examples for programmatic creation of ducklake factory

* adds merge selector in duckdb caps to enable upsert on 1.4

* ducklake code cleanups

* makes sure pipeline is dropped before run_context goes out of scope

* finalizes ducklake docs

* fallback in duckdb merge selector if duckdb not installed

* propagates persist_secret flag in filesystem sql client

* fixes tests and ci

* runs remote ducklake on local postgres catalog for low latency

* uses packaging version, not semver for python packages comparisons

* Update docs/website/docs/dlt-ecosystem/destinations/duckdb.md

* fixes recursive re-raise in sql_client

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
2025-09-24 08:27:16 +02:00
David Scharf
801eb285a2 bumps to version 1.16 (#3071) 2025-09-10 06:31:11 +02:00
anuunchin
096d769828 Docs: Education notebooks formatted and linted (#3017)
* Formatted and linted ed content

* Notebook filenames lowercased, no special chars
2025-09-02 08:41:47 +02:00
rudolfix
823bf3865f fully support naive and tz-aware timestamp/time data types (#2570)
* adds databricks timestamp NTZ

* improves error messages in pyarrow tuples to arrow

* decreases timestamp precision to 6 for mssql

* adds naive datetime to all data types case, enables fallback when testing destinations not supporting it

* other test fixes

* always stores incremental state last value as present in the data, tests tz-awareness edge cases

* fixes ntz timestamp tests

* fixes sqlalchemy destination to work with mssql

* adds func to current module to get current resource instance

* generates LIMIT clause in sql_database when limit step is present

* adds basic tests for mssql in sql_database

* adds docs on tz-awareness in datetime columns in sql_database

* adds naive and tz-aware datetimes to destination caps, implements for various destinations

* caches dlt type to python type conversion

* normalizes timezone handling in timestamp and time data types, fixes remaining pendulum timezone problems, applies tz/non-tz preserving methods when necessary, improves test coverage

* fixes incremental and lag so they always follow the tz-awareness of the data under cursor column, fixes pendulum tz problems, adds tests

* moves schema inference and data coercion from Schema to item_normalizers, applies timezone normalization to json data, adjusts new columns to destination caps for json data, tests

* casts timezones in arrow table normalizations, datetime and time cases in row tuples to arrow, refactors to get generic method to cast tables to dlt schemas, tests

* tracks resource parent, along pipe parent, fixes resource cloning when adding to source, fixes source and resource iterators, makes sure that list of extracted resources always includes implicit and explicit resources

* updates dbapi sql client for dremio

* adjust column schema inferred from arrow to destination caps in extractor, tests

* moves schema and data setup for all data types tests to common code

* adds option to exclude columns in sql_table, uses LimitItem to generate LIMIT statements, tests incl. proper cursor tests for naive/tz aware incremental cursor columns

* tests sql_database on mssql for all data types and incremental cursor on dates

* improves tests for row tuples to arrow with cast to dlt schema, tests for naive datetimes

* improved test for timestamps and int with precision on duckdb

* disables Python 3.14 tests and dashboard test on mac

* better maybe transaction in job client: takes into account ddl and regular transaction destination caps

* pyodbc py3.13 bump
2025-08-31 20:06:22 +02:00
Thierry Jean
0d90a83b8d repo: add ruff check for linting (#2967)
* Config ruff `check` 

* Add `ruff` to existing `flake8` linting for transition period
2025-08-29 11:13:26 -04:00
Thierry Jean
3be08570d4 feat: dlt.Schema.to_dot() graphviz export (#2959)
* graphviz renderer added

* dlt.Schema._repr_html_ added

* updated docs

* update CLI docs

* updated linting rule

* added tests for formatting kwargs

* added utility to validate dot
2025-08-12 14:02:15 -04:00
AyushPatel101
f17e98122d Add remaining paramiko connect params to SFTP filesystem (#2823)
* Add pkey, disabled_algorithms, transport_factory and auth_strategy parameters to paramiko.client.connect. Also update filesystem docs for SFTP creds

* Move paramiko imports after the pytest.skip

---------

Co-authored-by: Ayush Patel <Ayush.Patel@imc.com>
2025-08-11 10:46:22 +02:00
rudolfix
e9b64d6f09 bumps to version 1.15.0 (#2958)
* bumps to version 1.15.0

* handled duckdb 1.3.2 in iceberg scanner and bumps dev version - seems to work with adlfs

* binds old dev duckdb on windows until segfault is fixed

* test fixes, docs update
2025-08-05 14:08:02 +02:00
Thierry Jean
eb95c36f3c fix: replace arrow2 with arrow backend for connectorx (#2933)
* replace arrow2 with arrow backend for connectorx

* updated docs/

* updated minimal deps

* update docs and pyproject.toml deps

* updated minimal deps to support 3.9

* converts +00:00 to UTC right after handover from connectorx

* fixes examples connectorx lint

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-04 17:01:34 +02:00
Thierry Jean
02c461a09c feat: Schema.to_dbml() (#2929)
* dbml WIP

* dbml exporter

* full reference support; Schema.to_dbml()

* revert uv.lock changes

* fixed condition for _dlt tables ref

* rename _dbml.py to private module; use json encoder

* use TStoredSchema as entrypoint

* implementation completed

* added documentation

* added CLI support

* support unknown data type

* please the linter gods

* minified the image

* enables dbml for schema export

* image link to bucket; renamed constant

* updated docs linting

* include recommended VSCode extension

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
2025-08-04 13:00:49 +02:00
David Scharf
d1daade6af Enable and test python 3.14 support (#2789)
* enable 3.14 with orjson branch

* make example plugin a uv project

* post rebase pyproject update

* fix one dependency
update readme

* update readme about python 3.14
2025-07-22 14:29:01 +02:00
rudolfix
c262022bfe fixes arrow/pandas dependencies in extras and dep groups (#2895)
* bumps to version 1.14.1

* removes non dev dependencies from dev group

* sets good pandas dep in extras
2025-07-16 21:49:05 +02:00
rudolfix
c16abfe2b8 bumps dlt to version 1.14.0 (#2889) 2025-07-16 10:22:32 +02:00
David Scharf
983a33e6b6 Run full linter step on docs changes, bump marimo min version, enable marimo tests for python 3.13 (#2884)
* run full linter step on docs changes

* disable dashboard e2e tests on 3.11
enable dashboard e2e and unit tests on 3.13

* bump marimo min dependency

* Revert "Auxiliary commit to revert individual files from 52165eaeeb543932bc917bb5efc373c02ab2937b"

This reverts commit b7c5baf7c0c51e67ad323cd1b2cb9423f48f4165.

* re-lock changes

* revert incorrect change in secrets toml
2025-07-15 14:35:22 +02:00
David Scharf
21b68e61f1 Add workspace extra and rename marimo app to "pipeline dashboard" (#2876)
* adds dlt workspace extra, updates exception and github workflows

* renames app from "marimo app" to "pipeline dashboard"
updates --marimo flag to --dashboard

* rename studio folders to dashboard

* removes all other references to studio

* exclude lockfile and markdown files from lfs

* update workspace extra dependency versions

* bump version
2025-07-14 21:26:50 +02:00
Marcin Rudolf
c19111c684 bumps to 1.13.0 2025-07-07 19:20:02 +02:00
djudjuu
96014481be update lancedb orphan deletion mechanism (#2820)
* bump to latest lancedb

* do not pass api-key to embedding_func, align schema for orphan deletion

* bump lancedb

* updated example

* use pyarrow helpers in type mapper

* removes code duplication from lancedb_client, moves jobs to a separate module

* sets nullability, fixes schema on merge to include vector column if not added by the user, removes nullability on auto-embed columns in adapter

* read vector field from config

* fix nullability test hint

* unit test add_vector_column

* more specific ValueError parsing

* no longer accept value error when opening table

* schema alignment test next versions

* no fusion datatype typecasting

* refactor

* problems with json loading

* test fixes

* fixes column normalization when reading existing schema

* warn against orphan removal without settings

* added docs

* todos, check for merge-disposition

* fixed missing load tests

* fixed tests

* fixed multiple merge keys condition

* pyarrow precision types

* remove unused code

* added max precision in LanceDB tests

* remove arrow to fsiont_tupe tests

* refactor

* prepare_load_table in orphan removal job

* documentation update

* refactor

* adds method to get dict of non-default values from configuration

* moves parquet and csv format configuration from data writers to destination

* adds parquet format to destination caps to allow lancedb to have custom settings

* adds more lancedb configs, moves connect method to credentials, allows lancedb client to be passed instead of creds

* forces arrow list struct to be saved in parquet, not the parquet default

* looks for row key only for merge disposition

* moves fill_empty_source_column_values_with_placeholder to pyarrow helper

* tests bring own vector and explicit client as credentials

* ignores lancedb in mypy.ini

* adds missing docs

* deprecates file format configs in data writers

* fix unit tests for add_vector_column

* adjust example code to updated lancedb exceptions

* skip lancedb example (because running on fork breaks)

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Co-authored-by: MOLKA ZHANI <molka@dlthub.com>
2025-07-07 19:09:17 +02:00
anuunchin
f986d322a9 Chore: Pyiceberg's python constraint moved from project wide constraints (#2839)
* Pyiceberg's python constraint moved from project wide deps

* Python restriction removed from unnecessary places
2025-07-07 08:48:14 +02:00
rudolfix
1ab1ac14cc (chore) improves pool executor behaviors (#2818)
* moves source state handling to extract, uses contextvars to propagate current pipe context, does not store last state in global var

* implements thread pool with shutdown timeout, adds warning when threads do not join, switch default method to spawn if in orchestrator

* detects prefect, dagster and marimo in telemetry

* propagates pipe context in pipe iterator using contextvars

* cleans up dlt.current module

* enables running in wasm/pyodide

* bumps for 1.12.4a0 wasm release

* Update tests/common/runners/test_runners.py

Co-authored-by: djudjuu <djudju@proton.me>

---------

Co-authored-by: djudjuu <djudju@proton.me>
2025-07-02 11:17:45 +02:00
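This commit mentions using contextvars to propagate the current pipe context instead of a global variable. The sketch below shows the general standard-library technique only: a context variable is set in the caller and carried into a pool worker by copying the context. The names `current_pipe`, `read_pipe` and `run_in_worker` are hypothetical and are not taken from dlt.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical context variable holding the name of the active pipe.
current_pipe: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_pipe", default=None
)


def read_pipe() -> Optional[str]:
    """Return the pipe name visible in the calling context."""
    return current_pipe.get()


def run_in_worker(pool: ThreadPoolExecutor, fn: Callable[[], Optional[str]]) -> Future:
    """Submit fn so that it runs inside a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return pool.submit(ctx.run, fn)


current_pipe.set("my_resource")
with ThreadPoolExecutor(max_workers=1) as pool:
    propagated = run_in_worker(pool, read_pipe).result()

print(propagated)
```

Copying the context per submitted task keeps each task's view isolated, which is the usual reason to prefer this over a module-level global.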
rudolfix
80ed2cd244 feat(athena): apply lakeformation tags on database (cont.) (#2808)
* feat(athena): apply lakeformation tags on database

* uses credentials from destination to create tags, creates tags together with schema migration

* extracts athena sql_client to a separate module

* fixes lakeformation tests to use ci credentials and to run tests selectively

* adds snippet lang in docs

---------

Co-authored-by: Alexander Hagelborn <alex@datadao.se>
2025-06-25 22:45:45 +02:00
David Scharf
04457ddd05 Hotfix - fix marimo start command (#2812)
* start marimo in subprocess again

* bump dlt version

* bump lockfile
catch keyboard interrupt

---------

Co-authored-by: rudolfix <rudolfix@rudolfix.org>
2025-06-25 16:24:40 +02:00
David Scharf
3ba504c65d marimo app updates (#2778)
* make dlt app ejectable

* update app file url in makefile and tests
add missing stylesheet to package

* start marimo app in process

* convert caching toggle to button for clearer use

* exclude incomplete columns

* adds a bunch of tests for marimo app utils

* make normalized query output pretty and disable tests on 3.9

* filter out incomplete tables

* update cli strings and small changes to app ejection
2025-06-25 13:49:56 +02:00
Marcin Rudolf
0e96a79f52 Merge branch 'master' into devel 2025-06-25 12:24:16 +02:00
David Scharf
5245a42536 run all common tests with --resolution lowest-direct on uv sync (#2787)
* run all common tests with resolution-lowest on sync

* make model item normalizer tests pass, disable on time test for now

* fix duckdb instantiation for old versions
bump pyarrow to have version that supports "append_column" on recordbatch
exclude deltalake tests for too low pyarrow versions

* fixes errors in makefile
bump minimum pytest version to what was in lockfile

* bump pendulum min requirement

* fix common test file

* bump ibis dependency

* go back to old version of pendulum
bump to prerelease
2025-06-23 21:30:58 +02:00
David Scharf
7be54d5d01 enable linting on python 3.13 (#2790) 2025-06-23 09:15:03 +02:00