## Writing Source Extractors
`dlt` sources are iterators or lists, and writing them requires no knowledge beyond basic Python. `dlt` sources are also Pythonic in nature: they are simple and can be chained, pipelined and composed like any other Python iterator or sequence.
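For instance, a source can be a plain generator decorated as a resource and handed straight to a pipeline. The sketch below is only an illustration of that idea, not one of the examples listed further down; the resource name, sample rows and dataset name are made up.

```python
import dlt

# a dlt source can be as simple as a generator yielding dicts
# (hypothetical data; the resource and dataset names are made up)
@dlt.resource(name="players")
def players():
    yield {"name": "magnus", "rating": 2850}
    yield {"name": "hikaru", "rating": 2780}

# load the iterator into a local duckdb database
pipeline = dlt.pipeline(destination="duckdb", dataset_name="demo")
load_info = pipeline.run(players())
print(load_info)
```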
### Examples
- `quickstart` loads a nested json document into `duckdb` and then queries it with the built-in `sql_client`, demonstrating parent-child table joins.
- `sql_query` source and `read_table` example. This source iterates over any `SELECT` statement made against a database system supported by `SqlAlchemy`. The example connects to Redshift and iterates over a table containing Ethereum transactions, then shows the inferred schema (which nicely preserves typing). Mind that our source is a one-liner :)
- `rasa` example and `rasa_tracker_store` source extract rasa tracker store events into a set of inferred tables. It shows a few common patterns (see the transformer sketch after this list):
  - shows how to pipeline resources: it depends on a "head" resource that reads the base data (i.e. events from kafka/postgres/file); the dependent resource is called a `transformer`
  - shows how to write a stream resource which creates table schemas and sends data to those tables depending on the event type
  - stores `last_timestamp_value` in the state
- `singer_tap`, `stdout` and `singer_tap_example` together form a fully functional wrapper for any singer/meltano source:
  - clones the desired tap, installs it and runs it in a virtual env
  - passes the catalog and config files
  - like rasa, it is a transformer (on a stdio pipe) and a `stream` resource
  - it stores the singer state in `dlt` state
- `singer_tap_jsonl_example` is like the above, but instead of a process pipe it reads singer messages from a file. It creates a huge hubspot schema.
- `google_sheets` is a source that returns values from a specified sheet. The example takes a sheet, infers a schema, loads it to BigQuery/Redshift and displays the inferred schema. It uses `secrets.toml` to manage credentials and is an example of a one-liner pipeline.
- `chess` is an example of a pipeline project with its own config and credential files. It also shows how transformers are connected to resources and how resources are selected. It should be run from the `examples/chess` folder. It also demonstrates how to use the retry decorator and how to run resources/transformers in parallel with a decorator.
- `chess/chess_dbt.py`: an example of a `dbt` transformations package working with a dataset loaded by `dlt`. The package incrementally processes the loaded data, following the new load packages stored in the `_dlt_loads` table at the end of every pipeline run. Note the automatic use of an isolated virtual environment to run dbt and the sharing of credentials.
- `run_dbt_jaffle` runs dbt's jaffle shop example taken directly from the github repo and queries the results with `sql_client`. A `duckdb` database is used to load and transform the data. The database write access is passed from `dlt` to `dbt` and back.
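The "head" resource plus `transformer` pattern used by the rasa and singer examples roughly follows the sketch below. This is a simplified illustration under assumptions (the event shapes, names and table naming scheme are made up), not the actual example code.

```python
import dlt

# hypothetical "head" resource that reads the base data
# (in the real examples this is kafka/postgres/file or a stdio pipe)
@dlt.resource
def raw_events():
    yield {"event": "user", "timestamp": 1}
    yield {"event": "bot", "timestamp": 2}

# the dependent resource is a transformer fed by the head resource;
# it routes each item to a table derived from the event type
@dlt.transformer(data_from=raw_events)
def events_by_type(event):
    yield dlt.mark.with_table_name(event, f"event_{event['event']}")

pipeline = dlt.pipeline(destination="duckdb", dataset_name="tracker")
pipeline.run(events_by_type)
```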
Not yet ported:
- `discord_iterator`: an example that loads sample discord data (messages, channels) into a warehouse from supplied files. It shows several auxiliary pipeline functions and an example of pipelining iterators (with the `map` function; see the sketch below). You can also see that the produced schema is quite complicated due to several layers of nesting.
- `ethereum` source shows that you can build highly scalable, parallel and robust sources as simple iterators.
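Because sources are plain iterators, standard Python tools such as `map` can be used to pipeline them before handing them to a pipeline, which is the pattern the `discord_iterator` example refers to. The sketch below is a generic illustration with made-up field names, not the discord example itself.

```python
import dlt

# a plain generator acting as a source (made-up messages)
def messages():
    yield {"channel": "general", "text": "hello"}
    yield {"channel": "random", "text": "WORLD"}

# pipeline the iterator with map before loading; any iterator of dicts works as a source
normalized = map(lambda m: {**m, "text": m["text"].lower()}, messages())

pipeline = dlt.pipeline(destination="duckdb", dataset_name="discord_demo")
pipeline.run(normalized, table_name="messages")
```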