Compare commits

...

107 Commits

Author SHA1 Message Date
github-actions[bot]
e660fefc52 Bumping version to 1.1.3 and generate changelog (#6524)
* Bumping version to 1.1.3rc1 and generate CHANGELOG

* Bumping version to 1.1.3 and generate CHANGELOG (#6525)

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-01-04 21:53:07 -05:00
github-actions[bot]
7e23a3e05d Update Version Bump to include Homebrew in PATH (#5963) (#5993) (#6523)
(cherry picked from commit f903aed3e0)

Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2023-01-04 21:16:00 -05:00
Gerda Shank
6dbea380de [Backport 1.1.latest] Partial parsing bug with empty schema file - en… (#6516)
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2023-01-04 16:38:52 -05:00
Peter Webb
cfd6a6cc93 Paw/ct 1643 tox passenv (#6445)
* Break tox's passenv parameters into multiple lines

* ct-1643: remove trailing whitespace -_-

* CT-1643: Add missed parameters

* CT-1643: more whitespace fixing
2022-12-14 10:47:31 -05:00
Jeremy Cohen
2298c9fbfa Disable test_postgres_package_redirect_fail (#6410) 2022-12-08 15:27:30 +01:00
leahwicz
9f865fb148 Updating version to remove rc1 (#6373) 2022-12-05 09:06:58 -05:00
github-actions[bot]
b6c9221100 s/gitlab/github for flake8 precommit repo (#6252) (#6253)
(cherry picked from commit eae98677b9)

Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
2022-11-15 11:24:06 -05:00
Emily Rockman
ad710f51cd add python version and upgrade action (#6204) (#6209)
(cherry picked from commit c3ccbe3357)
2022-11-04 09:33:29 -05:00
github-actions[bot]
8ef7259d9c Bumping version to 1.1.2 and generate changelog (#5575)
* Bumping version to 1.1.2 and generate CHANGELOG

* Remove newline

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-07-28 15:40:23 -04:00
leahwicz
424b454918 Add changelog and whitespace fix to version bump Action (#5563) (#5573)
* Add changelog and whitespace fix to version bump Action

* Fixing whitespace

* Remove tabs

* Update .github/workflows/version-bump.yml

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

* Update .github/workflows/version-bump.yml

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

* Update .github/workflows/version-bump.yml

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

* Update .github/workflows/version-bump.yml

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

* Updating per comments

* Fix whitespace

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
2022-07-28 14:37:55 -04:00
github-actions[bot]
437b763390 Bumping version to 1.1.2rc1 (#5498)
* Bumping version to 1.1.2rc1

* Fix whitespace

* Update changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-07-20 09:00:30 -04:00
Jeremy Cohen
d10ebecdb6 Backport #5382 to 1.1.latest (#5415)
* Backport #5382 to 1.1.latest

* Rm changelog entry for previous change
2022-06-28 17:26:47 +02:00
Jeremy Cohen
8052331439 Declare compatibility for previous artifact versions (#5346) (#5383)
* Add functional test

* Add is_compatible_version logic

* Add changelog entry

* Fix mypy
2022-06-16 18:22:43 +02:00
github-actions[bot]
50d2981320 Bumping version to 1.1.1 (#5377)
* Bumping version to 1.1.1

* Fix whitespace

* Add Changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-06-15 10:39:50 -04:00
github-actions[bot]
5ef0442654 Bumping version to 1.1.1rc2 (#5342)
* Bumping version to 1.1.1rc2

* Remove whitespace

* Add Changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-06-07 11:28:46 -04:00
github-actions[bot]
4d63fb2ed2 Fixing Windows color regression (#5327) (#5340)
* Fixing Windows color regression

* Cleaning up logic

* Consolidating logic to the logger

* Cleaning up vars

* Updating comment

* Removing unused import

* Fixing whitespace

* Adding changelog

(cherry picked from commit e48f7ab32e)

Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-06-07 09:32:28 -04:00
Jeremy Cohen
413a19abe6 Backports: networkx, context cleanup (#5335)
* Update context readme, small code cleanup (#5334)

* Pin networkx dependency to <2.8.4 for v1.1
2022-06-07 11:00:45 +02:00
github-actions[bot]
a3dcc14f87 [Backport 1.1.latest] CT-159 Remove docs file from manifest when deleting doc node (#5293)
* CT-159 Remove docs file from manifest when deleting doc node (#5270)

* Convert partial parsing docs tests

* Failing test

* Remove doc file from manifest when doc node is removed

* Changie

(cherry picked from commit cda88d1948)

* Fix rm_file method

* comment out pip upgrade

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-06-01 10:31:38 -04:00
github-actions[bot]
5e298b1854 Bumping version to 1.1.1rc1 (#5285)
* Bumping version to 1.1.1rc1

* Remove whitespace

* Changelog update

* Fixing adapter versions

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-05-20 13:28:51 -04:00
github-actions[bot]
c74e9dde2b When parsing 'all_sources' should be a list of unique dirs (#5176) (#5256)
* When parsing 'all_sources' should be a list of unique dirs

* Changie

* Fix some unit tests of all_source_paths

* Convert 039_config_tests

* Remove old 039_config_tests

* Add test for duplicate directories in 'all_source_files'

(cherry picked from commit f633e9936f)

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-05-16 16:00:53 -04:00
github-actions[bot]
9c6bdcabf8 Bumping hologram version (#5218) (#5226)
* Bumping hologram version

* Add automated changelog yaml from template

* Updating issue

* Loosen requirement range

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
(cherry picked from commit aa8115aa5e)

Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-05-11 11:56:43 -04:00
leahwicz
4ae2e41cc6 Flexibilize MarkupSafe pinned version (#5039) (#5111)
* Flexibilize MarkupSafe pinned version

The current `MarkupSafe` pinned version has been added in #4746 as a
temporary fix for #4745.

However, the current restrictive approach isn't compatible with other
libraries that could require an even older version of `MarkupSafe`, like
Airflow `2.2.2` [0], which requires `markupsafe>=1.1.1, <2.0`.

To avoid that issue, we can allow a greater range of supported
`MarkupSafe` versions. Considering the direct dependency `dbt-core` has
is `Jinja2==2.11.3`, we can use its pinning as the lower bound, which is
`MarkupSafe>=0.23` [1].

This fix should also be backported to `1.0.latest` for inclusion in
the next v1.0 patch.

[0] https://github.com/adamantike/airflow/blob/2.2.2/setup.cfg#L125
[1] https://github.com/pallets/jinja/blob/2.11.3/setup.py#L53

Co-authored-by: Michael Manganiello <adamantike@users.noreply.github.com>
2022-05-09 14:49:56 -04:00
github-actions[bot]
cbd4570c67 Bumping version to 1.1.0 (#5186)
* Bumping version to 1.1.0

* Updating changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-28 12:54:30 -04:00
github-actions[bot]
5ae33b90bf Bumping version to 1.1.0rc3 (#5166)
* Bumping version to 1.1.0rc3

* Update changelog

* Fixing spacing issue

* Removing more whitespace

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-04-26 15:25:39 -04:00
github-actions[bot]
37d78338c2 [Backport 1.1.latest] Use yaml renderer (with target context) for rendering selectors (#5161)
* Use yaml renderer (with target context) for rendering selectors (#5136)

* Use yaml renderer (with target context) for rendering selectors

* Changie

* Convert cli_vars tests

* Add test for var in profiles

* Add test for cli vars in packages

* Add test for vars in selectors

(cherry picked from commit 1f898c859a)

* Tweak cli_vars_in_packages test to do a run and build an adapter

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-04-26 14:13:16 -04:00
github-actions[bot]
698420b420 Even more scrubbing (#5152) (#5156)
* Even more scrubbing

* Changelog entry

* Even more

* remove redundant scrub

* remove redundant scrub

* fix encoding issue

* keep scrubbed log in args

Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
(cherry picked from commit ce0bcc08a6)

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2022-04-26 10:59:07 -06:00
Emily Rockman
e23f0e0747 fix retry logic failures (#5137) (#5148)
* fix retry logic failures

* changelog

* add tests to make sure data is getting where it needs to

* rename file

* remove duplicate file

(cherry picked from commit 31a3f2bdee)
2022-04-25 10:45:35 -05:00
github-actions[bot]
106da05db7 Bumping version to 1.1.0rc2 (#5129)
* Bumping version to 1.1.0rc2

* release changelog

* fix trailing space

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
2022-04-21 09:52:08 -06:00
Chenyu Li
09a396a731 Restore ability to utilize updated_at for check_cols snapshots (#5077) (#5126)
* Restore ability to configure and utilize `updated_at` for snapshots using the check_cols strategy

* Changelog entry

* Optional comparison of column names starting with `dbt_`

* Functional test for check cols snapshots using `updated_at`

* Comments to explain the test implementation

(cherry picked from commit d09459c980)

Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
2022-04-21 09:06:04 -06:00
Emily Rockman
9c233f27ab remove commas from changelog yaml (#5113) 2022-04-19 19:57:14 -05:00
github-actions[bot]
a09bc28768 Bumping version to 1.1.0rc1 (#5044)
* Bumping version to 1.1.0rc1

* Adding changelog

* Updating adapter versions for Docker

* Remove extra spaces

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-04-12 17:10:57 -04:00
leahwicz
365414b5fc Bumping manifest schema to v5 (#5032)
* Bumping manifest schema to v5

* Adding changelog
2022-04-12 16:06:24 -04:00
Nathaniel May
ec46be7368 Perf regression testing - overhaul of readme and runner (#4602) 2022-04-12 16:00:55 -04:00
Stu Kilgore
f23a403468 Update version output and logic (#5029)
Update version output and logic
2022-04-12 14:36:55 -05:00
Benoit Perigaud
15ad34e415 Add selected_resources to the Jinja context (#5001)
* Add selected_resources in the Jinja context

* Add tests for the Jinja variable selected_resources

* Add Changie entry for the addition of selected_resources

* Move variable to the ProviderContext

* Move selected_resources from ModelContext to ProviderContext

* Update unit tests for context to cater for the new selected_resources variable

* Move tests to a Class where tests are run after a dbt build
2022-04-12 10:25:45 -06:00
Jeremy Cohen
bacc891703 Add experimental cache_selected_only config (#5036)
* cache schema for selected models

* Create Features-20220316-003847.yaml

* rename flag, update postgres adapter

rename flag to cache_selected_only, update postgres adapter: function _relations_cache_for_schemas

* Update Features-20220316-003847.yaml

* added test for cache_selected_only flag

* formatted as per pre-commit

* Add breaking change note for adapter plugin maintainers

* Fix whitespace

* Add a test

Co-authored-by: karunpoudel-chr <poudel.karun@gmail.com>
Co-authored-by: karunpoudel-chr <62040859+karunpoudel@users.noreply.github.com>
2022-04-12 18:04:39 +02:00
Emily Rockman
a2e167761c add more complete logic around changelog contributors section (#5037)
* add more complete logic around changelog contributors section

* add instructions for future core team members

* Update .changie.yaml
2022-04-12 10:35:21 -05:00
Emily Rockman
cce8fda06c Add enabled as a source config (#5008)
* initial pass at source config test w/o overrides

* Update tests/functional/sources/test_source_configs.py

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

* Update tests/functional/sources/test_source_configs.py

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

* tweaks from feedback

* clean up some test logic - add override tests

* add new fields to source config class

* fix odd formatting

* got a test working

* removed unused tests

* removed extra fields from SourceConfig class

* fixed next failing unit test

* adding back missing import

* first pass at adding source table configs

* updated remaining tests to pass

* remove source override tests

* add comment for config merging

* changelog

* remove old comments

* hacky fix for permission test

* remove unhelpful test

* adding back test file that was accidentally deleted

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
2022-04-12 10:27:29 -05:00
leahwicz
dd4ac1ba4a Updating tests and doc to support Python 3.10 (#5025)
* Updating tests and doc to support Python 3.10

* Single quotes needed for python version matrix

* Adding changelog
2022-04-12 10:52:44 -04:00
Sung Won Chung
401ebc2768 Smart Source Freshness Runs (#4256)
* first draft

* working selector code

* remove debug print logs

* copy test template

* add todo

* smarter depends on graph searching notes

* add excluded source children nodes

* remove prints and clean up logger

* opinionated fresh node selection

* better if handling

* include logs with meaningful info

* add concurrent selectors note

* cleaner logging

* Revert "Merge branch 'main' of https://github.com/dbt-labs/dbt into feature/smart-source-freshness-runs"

This reverts commit 7fee4d44bf, reversing
changes made to 17c47ff42d.

* tidy up logs

* remove comments

* handle criterion that does not match nodes

* use a blank set instead

* Revert "Revert "Merge branch 'main' of https://github.com/dbt-labs/dbt into feature/smart-source-freshness-runs""

This reverts commit 71125167a1.

* make compatible with rc and new logger

* new log format

* new selector flag name

* clarify that status needs to be correct

* compare current and previous state

* correct import

* add current state

* remove print

* add todo

* fix error conditions

* clearer refresh language

* don't run wasteful logs

* remove for now

* cleaner syntax

* turn generator into set

* remove print

* add fresh selector

* data bookmarks matter only

* remove exclusion logic for status

* keep it DRY

* remove unused import

* dynamic project root

* dynamic cwd

* add TODO

* simple profiles_dir import

* add default target path

* headless path utils

* draft work

* add previous sources artifact read

* make PreviousState aware of t-2 sources

* make SourceFreshSelectorMethod aware of t-2 sources

* add archive_path() for t-2 sources to freshness.py

* clean up merged branches

* add to changelog

* rename file

* remove files

* remove archive path logic

* add in currentstate and previousstate defaults

* working version of source fresher

* syntax source_fresher: works

* fix quoting

* working version of target_path default

* None default to sources_current

* updated source selection semantics

* remove todo

* move to test_sources folder

* copy over baseline source freshness tests

* clean up

* remove test file

* update state with version checks

* fix flake tests

* add changelog

* fix name

* add base test template

* delegate tests

* add basic test to ensure nothing runs

* add another basic test

* fix test with copy state

* run error test

* run warn test

* run pass test

* error handling for runtime error in source freshness

* error handling for runtime error in source freshness

* add back fresher selector condition

* top level selector condition

* add runtime error test

* testing source fresher test selection methods

* fix formatting issues

* fix broken tests

* remove old comments

* fix regressions in other tests

* add Anais test cases

* result selector test case

Co-authored-by: Matt Winkler <matt.winkler@fishtownanalytics.com>
2022-04-12 15:08:06 +02:00
Emily Rockman
83612a98b7 cache after retrying instead of while retrying (#5028) 2022-04-11 19:53:11 -05:00
Leopoldo Araujo
827eae2750 Added no-print flag (#4854)
* Added no-print flag

* Updated changelog

* Updated changelog

* Removed changes from CHANGELOG.md

* Updated CHANGELOG.MD with changie

* Update .changes/unreleased/Features-20220408-114118.yaml

Co-authored-by: Emily Rockman <ebuschang@gmail.com>

Co-authored-by: Emily Rockman <ebuschang@gmail.com>
2022-04-11 13:48:34 -05:00
Emily Rockman
3a3bedcd8e Update index file for docs generation (#4995)
* Update index file for docs generation

* add changelog entries
2022-04-11 11:31:03 -05:00
Stu Kilgore
c1dfb4e6e6 Convert --version tests to pytest (#5026)
Convert --version tests to pytest
2022-04-11 11:04:45 -05:00
Gerda Shank
5852f17f0b Fix hard_delete_snapshot test to do the right thing. (#5020) 2022-04-08 16:18:01 -04:00
dependabot[bot]
a94156703d Bump black from 22.1.0 to 22.3.0 (#4972)
* Bump black from 22.1.0 to 22.3.0
2022-04-08 15:10:36 -05:00
Ian Knox
2b3fb7a5d0 updated docker readme CT-452 (#5018) 2022-04-08 14:30:25 -05:00
Ian Knox
5f2a43864f Decouple project creation logic from tasks CT-299 (#4981) 2022-04-08 14:28:37 -05:00
Ian Knox
88fbc94db2 added git-blame-ignore-revs file (#5019) 2022-04-08 14:20:43 -05:00
Chenyu Li
6c277b5fe1 make graph_selection tests just checking selection (#5012)
* make graph_selection tests just checking selection

* use util function
2022-04-08 11:04:54 -06:00
Chenyu Li
40e64b238c adapter_methods (#4939)
* adapter_methods

* fix fixture scope

* update table compare method

* remove unneeded part

* update test name and comment
2022-04-08 08:32:21 -06:00
Ian Knox
581bf51574 updated event message (#5011) 2022-04-08 09:12:49 -05:00
Gerda Shank
899b0ef224 Remove TableComparison and convert existing calls to use dbt.tests.util (#4986) 2022-04-07 13:04:03 -04:00
Matthew McKnight
3ade206e86 init push up of converted unique_key tests (#4958)
* init push up of converted unique_key tests

* testing cause of failure

* adding changelog entry

* moving non basic test up one directory to be more broadly part of adapter zone

* minor changes to the bad_unique_key tests

* removed unused fixture

* moving tests to base class and inheriting in a simple class

* taking in chenyu's changes to fixtures

* remove older test_unique_key tests

* removed commented out code

* uncommenting seed_count

* v2 based on feedback for base version of testing, plus small removal of leftover breakpoint

* create incremental test directory in adapter zone

* commenting out TableComparison and trying to implement check_relations_equal instead

* remove unused commented out code

* changing cast for date to fix test to work on bigquery
2022-04-07 11:29:52 -05:00
agoblet
58bd750007 add DO_NOT_TRACK environment variable support (#5000) 2022-04-07 11:45:29 -04:00
Matthew McKnight
0ec829a096 include directory README (#4685)
* start of a README for the include directory

* minor updates

* minor updates after comments from gerda and emily

* trailing space issue?

* black formatting

* minor word change

* typo update

* minor fixes and changelog creation

* remove changelog
2022-04-06 11:53:59 -05:00
Emily Rockman
7f953a6d48 [CT-352] catch and retry malformed json (#4982)
* catch None and malformed json responses

* add json.dumps for format

* format

* Cache registry request results. Avoid one request per version

* updated to be direct in type checking

* add changelog entry

* add back logic for none check

* PR feedback: memoize > global

* add checks for expected types and keys

* consolidated cache and retry logic

* minor cleanup for clarity/consistency

* add pr review suggestions

* update unit test

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2022-04-05 10:44:00 -05:00
Snyk bot
0b92f04683 [Snyk] Security upgrade python from 3.9.9-slim-bullseye to 3.10.3-slim-bullseye (#4963)
* fix: docker/Dockerfile to reduce vulnerabilities

The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-DEBIAN11-EXPAT-2403512
- https://snyk.io/vuln/SNYK-DEBIAN11-EXPAT-2406127
- https://snyk.io/vuln/SNYK-DEBIAN11-OPENSSL-2388380
- https://snyk.io/vuln/SNYK-DEBIAN11-OPENSSL-2426309
- https://snyk.io/vuln/SNYK-DEBIAN11-OPENSSL-2426309

* add changelog entry

Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2022-04-04 12:57:43 -04:00
Jeremy Cohen
3f37a43a8c Remove unneeded code in default snapshot materialization (#4993)
* Rm unneeded create_schema in snapshot mtlzn

* Add changelog entry
2022-04-04 17:25:53 +02:00
Gerda Shank
204d53516a Create a dbt.tests.adapter release when releasing dbt and postgres (#4948)
* update black version for pre-commit
2022-03-29 19:38:33 -04:00
Jeremy Cohen
5071b00baa Custom names for generic tests (#4898)
* Support user-supplied name for generic tests

* Support dict-style generic test spec

* Add changelog entry

* Add TODO comment

* Rework raise_duplicate_resource_name

* Add functional tests

* Update comments, rm TODO

* PR feedback
2022-03-25 17:09:35 +01:00
Emily Rockman
81118d904a Convert source tests (#4935)
* convert 059 to new test framework

* remove replaced tests

* WIP, has pre-commit errors

* WIP, has pre-commit errors

* one failing test, most issued resolved

* fixed final test and cleaned up fixtures

* remove converted tests

* updated test to work on windows

* remove config version
2022-03-24 09:19:54 -05:00
Jeremy Cohen
69cdc4148e Cosmetic changelog/changie fixups (#4944)
* Reorder kinds in changie

* Reorder change categories for v1.1.0b1

* Update language for breaking change

* Contributors deserve an h3

* Make pre-commit happy? Update language

* Rm trailing whitespace
2022-03-24 12:17:55 +01:00
Chenyu Li
1c71bf414d remove capping version of typing extensions (#4934) 2022-03-23 14:08:26 -04:00
Chenyu Li
7cf57ae72d add compilation and cache tracking (#4912) 2022-03-23 14:05:50 -04:00
kadero
1b6f95fef4 Fix inconsistent timestamps snapshots (#4513) 2022-03-23 12:05:42 -05:00
github-actions[bot]
38940eeeea Bumping version to 1.1.0b1 (#4933)
* Bumping version to 1.1.0b1
2022-03-23 09:28:50 -05:00
Ian Knox
6c950bad7c updated bumpversion (#4932) 2022-03-22 15:02:52 -05:00
Joel Labes
5e681929ae Add space before justification periods (#4744)
* Update format.py

* Update CHANGELOG.md

* add change file

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-03-22 15:18:38 -04:00
Matthew McKnight
ea5a9da71e update of macro for postgres/redshift use of unique_key as a list (#4858)
* pre-commit additions

* added changie changelog entry

* moving integration test over

* Pair programming

* removing ref to mapping as seems to be unnecessary check, unique_key tests pass locally for postgres

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2022-03-22 10:24:21 -05:00
leahwicz
9c5ee59e19 Updating backport workflow to use forked action (#4920) 2022-03-22 09:10:30 -04:00
Emily Rockman
55b1d3a191 changie - convert changelogs to yaml files and make quality of life improvements (#4917)
* convert changelog to changie yaml files

* update contributor format and README instructions

* update action to rerun when labeled/unlabeled

* remove synchronize from action

* remove md file replaced by the yaml

* add synchronize and comment of what's happening

* tweak formatting
2022-03-21 20:15:52 -05:00
Ian Knox
a968aa7725 added permissions settings for docker release workflow (#4903) 2022-03-18 10:40:05 -05:00
Gerda Shank
5e0a765917 Set up adapter testing framework for use by adapter test repos (#4846) 2022-03-17 18:01:09 -04:00
Ian Knox
0aeb9976f4 remove missing setup.py file (holdover from pip install dbt (#4896) 2022-03-17 16:52:02 -05:00
Nathaniel May
30a7da8112 [HOTFIX] update dbt-extractor dependency (#4890)
* use pep 0440 compatible release operator for dbt-extractor dependency. bump to 0.4.1.
2022-03-17 16:44:30 -04:00
Matthew McKnight
f6a9dae422 FEAT: new columns in snapshots for adapters w/o bools (#4871)
* FEAT: new columns in snapshots for adapters w/o bools

* trigger gha workflow

* using changie to make changelog

* updating to be on par with main

Co-authored-by: swanderz <swanson.anders@gmail.com>
2022-03-17 10:10:23 -05:00
Gerda Shank
62a7163334 Use cli_vars instead of context to create package and selector renderers (#4878) 2022-03-17 09:27:39 -04:00
Mila Page
e2f0467f5d Add bugged version tag value to finds. (#4816)
* Change property file version exception to reflect current name and offer clearer guidance in comments.
* Add example in case of noninteger version tag just to drive the point home to readers.
2022-03-16 14:59:48 -07:00
Mila Page
3e3ecb1c3f get_response type hint is AdapterResponse only. (#4869)
* get_response type hint is AdapterResponse only.
* Propagate changes to get_response return type to execute
2022-03-16 14:54:39 -07:00
Nathaniel May
27511d807f update test project (#4875) 2022-03-16 16:35:07 -04:00
Ian Knox
15077d087c python 3.10 support (#4866)
* python 3.10 support
2022-03-15 19:35:28 -05:00
Emily Rockman
5b01cc0c22 catch all requests exceptions to retry (#4865)
* catch all requests exceptions to retry

* add changelog
2022-03-15 11:57:07 -05:00
Chenyu Li
d1bcff865d pytest conversion test_selection, schema_tests, fail_fast, permission (#4826) 2022-03-15 11:12:30 -04:00
Emily Rockman
0fce63665c Small changie fixes (#4857)
* fix broken links, update GHA to not repost comment

* tweak GHA

* convert GHA used

* consolidate GHA

* fix PR numbers and pull comment as var

* fix name of workflow step

* changie merge to fix link at top of changelog

* add changelog yaml
2022-03-11 14:54:33 -06:00
Emily Rockman
1183e85eb4 Er/ct 303 004 simple snapshot (#4838)
* convert single test in 004

* WIP

* incremental conversion

* WIP test not running

* WIP

* convert test_missing_strategy, cross_schema_snapshot

* comment

* converting to class based test

* clean up

* WIP

* converted 2 more tests

* convert hard delete test

* fixing inconsistencies, adding comments

* more conversion

* implementing class scope changes

* clean up unsed code

* remove old test, get all new ones running

* fix typos

* append file names with snapshot to reduce collision

* moved all fixtures into test files

* stop using tests as fixtures
2022-03-11 14:52:54 -06:00
dependabot[bot]
3b86243f04 Update typing-extensions requirement from <3.11,>=3.7.4 to >=3.7.4,<4.2 in /core (#4719)
* Update typing-extensions requirement in /core

Updates the requirements on [typing-extensions](https://github.com/python/typing) to permit the latest version.
- [Release notes](https://github.com/python/typing/releases)
- [Changelog](https://github.com/python/typing/blob/master/typing_extensions/CHANGELOG)
- [Commits](https://github.com/python/typing/compare/3.7.4...4.1.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Empty-Commit

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ChenyuLi <chenyu.li@dbtlabs.com>
2022-03-10 15:42:20 -05:00
willbowditch
c251dae75e [CT-271] [Feature] not_null test selects column instead of * (#4777)
* Only select target column for not_null test

* If storing failures include all columns in the select, if not, only select the column being tested

It's desirable for this test to include the full row output when using --store-failures. If the query result stored in the database contained just the null values of the null column, it can't do much to contextualize why those rows are null.

* Update changelog

* chore: update changelog using changie

* Revert "Update changelog"

This reverts commit 281d805959.
2022-03-09 21:31:15 -05:00
Emily Rockman
ecfd77f1ca Small updates to clarify change destinations (#4841)
* update to reflect this branch is for the 1.1 release

* update to use next

* remove next logic

* add yaml changes also marked for unreleased 1.0.4
2022-03-08 13:18:24 -06:00
Emily Rockman
9a0abc1bfc Automate changelog (#4743)
* initial setup to use changie

* added `dbt-core` to version line

* fix formatting

* rename to be more accurate

* remove extra file

* add stub for contributing section

* updated docs for contributing and changelog

* first pass at changelog check

* Fix workflow name

* comment on handling failure

* add automatic contributors section via footer

* removed unused initialization

* add script to automate entire changelog creation and handle prereleases

* stub out README

* add changelog entry!

* no longer need to add contributors ourselves

* fixed formatting and excluded core team

* fix typo and collapse if statement

* updated to reflect automatic pre-release handling

Removed custom script in favor of built in pre-release functionality in new version of changie.

* update contributing doc

* pass at GHA

* fix path

* all changed files

* more GHA work

* continued GHA work

* try another approach

* testing

* adding comment via GHA

* added uses for GHA

* more debugging

* fixed formatting

* another comment attempt

* remove read permission

* add label check

* fix quotes

* checking label logic

* test forcing failure

* remove extra script tag

* removed logic for having changelog

* Revert "removed logic for having changelog"

This reverts commit 490bda8256.

* remove unused workflow section

* update header and readme

* update with current version of changelog

* add step failure for missing changelog file

* fix typos and formatting

* small tweaks per feedback

* Update so changelog ends up only with current version, not past

* update changelog to recent contents

* added the rest of our releases to previous release list

* clarifying the readme

* updated to reflect current changelog state

* updated so only 1.1 changes are on main
2022-03-07 20:12:33 -06:00
Gerda Shank
490d68e076 Switch to using class scope fixtures (#4835)
* Switch to using class scope fixtures

* Reorganize some graph selection tests because of ci errors
2022-03-07 14:38:36 -05:00
Stu Kilgore
c45147fe6d Fix macro modified from previous state (#4820)
* Fix macro modified from previous state

Previously, if the first node selected by state:modified had multiple
dependencies, the first of which had not been changed, the rest of the
macro dependencies of the node would not be checked for changes. This
commit fixes this behavior, so the remainder of the macro dependencies
of the node will be checked as well.
2022-03-07 08:23:59 -06:00
Gerda Shank
bc3468e649 Convert tests in dbt-adapter-tests to use new pytest framework (#4815)
* Convert tests in dbt-adapter-tests to use new pytest framework

* Filter out ResourceWarning for log file

* Move run_sql to dbt.tests.util, fix check_cols definition

* Convert jaffle_shop fixture and test to use classes

* Tweak run_sql methods, rename some adapter file pieces, add comment
to dbt.tests.adapter.

* Add some more comments
2022-03-03 16:53:41 -05:00
Kyle Wigley
8fff6729a2 simplify and cleanup gha workflow (#4803) 2022-03-02 10:21:39 -05:00
varun-dc
08f50acb9e Fix stdout piped colored output on MacOS and Linux (#4792)
* Fix stdout pipe output coloring

* Update CHANGELOG.md

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>
2022-03-01 17:23:51 -05:00
Chenyu Li
436a5f5cd4 add coverage (#4791) 2022-02-28 09:17:33 -05:00
Emily Rockman
aca710048f ct-237 test conversion 002_varchar_widening_tests (#4795)
* convert 002 integration test

* remove original test

* moved varchar test under basic folder
2022-02-25 14:25:22 -06:00
Emily Rockman
673ad50e21 updated index file to fix DAG errors for operations & work around null columns (#4763)
* updated index file to fix DAG errors for operations

* update index file to reflect dbt-docs fixes

* add changelog
2022-02-25 13:02:26 -06:00
Chenyu Li
8ee86a61a0 rewrite graph selection (#4783)
* rewrite graph selection
2022-02-25 12:09:11 -05:00
Gerda Shank
0dda0a90cf Fix errors on Windows tests in new tests/functional (#4767)
* [#4781] Convert reads and writes in project fixture to text/utf-8 encoding

* Switch to using write_file and read_file functions

* Add comment
2022-02-25 11:13:15 -05:00
Gerda Shank
220d8b888c Fix "dbt found two resources" error with multiple snapshot blocks in one file (#4773)
* Fix handling of multiple snapshot blocks in partial parsing

* Update tests for partial parsing snapshots
2022-02-25 10:54:07 -05:00
dependabot[bot]
42d5812577 Bump black from 21.12b0 to 22.1.0 (#4718)
Bumps [black](https://github.com/psf/black) from 21.12b0 to 22.1.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/commits/22.1.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-24 13:28:23 -05:00
Ian Knox
dea4f5f8ff update flake8 to remove line length req (#4779) 2022-02-24 11:22:25 -06:00
Dmytro Kazanzhy
8f50eee330 Fixed misspellings, typos, and duplicated words (#4545) 2022-02-22 18:05:43 -05:00
Gerda Shank
8fd8dfcf74 Initial pass at switching integration tests to pytest (#4691)
Author: Emily Rockman <emily.rockman@dbtlabs.com>
    route logs to dbt-core/logs instead of each test folder (#4711)

 * Initial pass at switching integration tests to pytest

* Reorganize dbt.tests.tables. Cleanup adapter handling

* Move run_sql to TestProjInfo and TableComparison.
Add comments, cleanup adapter schema setup

* Tweak unique_schema name generation

* Update CHANGELOG.md
2022-02-22 15:34:14 -05:00
Hein Bekker
10b27b9633 Deduplicate postgres relations (#3058) (#4521)
* Deduplicate postgres relations (#3058)

* Add changelog entry for #3058, #4521
2022-02-21 16:48:15 -06:00
Gerda Shank
5808ee6dd7 Fix bug accessing target in deps and clean commands (#4758)
* Create DictDefaultNone for to_target_dict in deps and clean commands

* Update test case to handle

* update CHANGELOG.md

* Switch to DictDefaultEmptyStr for to_target_dict
2022-02-21 13:26:29 -05:00
2495 changed files with 19643 additions and 15669 deletions


@@ -1,5 +1,5 @@
[bumpversion]
current_version = 1.0.1
current_version = 1.1.3
parse = (?P<major>\d+)
\.(?P<minor>\d+)
\.(?P<patch>\d+)
@@ -24,8 +24,6 @@ values =
[bumpversion:part:pre]
first_value = 1
[bumpversion:file:setup.py]
[bumpversion:file:core/setup.py]
[bumpversion:file:core/dbt/version.py]
@@ -37,3 +35,7 @@ first_value = 1
[bumpversion:file:plugins/postgres/dbt/adapters/postgres/__version__.py]
[bumpversion:file:docker/Dockerfile]
[bumpversion:file:tests/adapter/setup.py]
[bumpversion:file:tests/adapter/dbt/tests/adapter/__version__.py]
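
The "Bumping version to X" commits in the list above are produced by running bumpversion against this configuration, which rewrites `current_version` and every file listed in a `[bumpversion:file:...]` section. A minimal sketch of that kind of invocation (the exact command used by the release workflow is not shown in this diff):

```
# hypothetical invocation; the Version Bump workflow wraps something like this
bumpversion --new-version 1.1.3 patch
```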

.changes/0.0.0.md Normal file

@@ -0,0 +1,16 @@
## Previous Releases
For information on prior major and minor releases, see their changelogs:
* [1.0](https://github.com/dbt-labs/dbt-core/blob/1.0.latest/CHANGELOG.md)
* [0.21](https://github.com/dbt-labs/dbt-core/blob/0.21.latest/CHANGELOG.md)
* [0.20](https://github.com/dbt-labs/dbt-core/blob/0.20.latest/CHANGELOG.md)
* [0.19](https://github.com/dbt-labs/dbt-core/blob/0.19.latest/CHANGELOG.md)
* [0.18](https://github.com/dbt-labs/dbt-core/blob/0.18.latest/CHANGELOG.md)
* [0.17](https://github.com/dbt-labs/dbt-core/blob/0.17.latest/CHANGELOG.md)
* [0.16](https://github.com/dbt-labs/dbt-core/blob/0.16.latest/CHANGELOG.md)
* [0.15](https://github.com/dbt-labs/dbt-core/blob/0.15.latest/CHANGELOG.md)
* [0.14](https://github.com/dbt-labs/dbt-core/blob/0.14.latest/CHANGELOG.md)
* [0.13](https://github.com/dbt-labs/dbt-core/blob/0.13.latest/CHANGELOG.md)
* [0.12](https://github.com/dbt-labs/dbt-core/blob/0.12.latest/CHANGELOG.md)
* [0.11 and earlier](https://github.com/dbt-labs/dbt-core/blob/0.11.latest/CHANGELOG.md)

.changes/1.1.0.md Normal file

@@ -0,0 +1,96 @@
## dbt-core 1.1.0 - April 28, 2022
### Breaking Changes
- **Relevant to maintainers of adapter plugins _only_:** The abstractmethods `get_response` and `execute` now only return a `connection.AdapterResponse` in type hints. (Previously, they could return a string.) We encourage you to update your methods to return an object of class `AdapterResponse`, or implement a subclass specific to your adapter
([#4499](https://github.com/dbt-labs/dbt-core/issues/4499), [#4869](https://github.com/dbt-labs/dbt-core/pull/4869))
- For adapter plugin maintainers only: Internal adapter methods `set_relations_cache` + `_relations_cache_for_schemas` each take an additional argument, for use with experimental `CACHE_SELECTED_ONLY` config ([#4688](https://github.com/dbt-labs/dbt-core/issues/4688), [#4860](https://github.com/dbt-labs/dbt-core/pull/4860))
### Features
- Change behaviour of the `not_null` test so that it selects all columns only if `--store-failures` is enabled. ([#4769](https://github.com/dbt-labs/dbt-core/issues/4769), [#4777](https://github.com/dbt-labs/dbt-core/pull/4777))
- Testing framework for dbt adapter testing ([#4730](https://github.com/dbt-labs/dbt-core/issues/4730), [#4846](https://github.com/dbt-labs/dbt-core/pull/4846))
- Allow unique key to take a list implementation for postgres/redshift ([#4738](https://github.com/dbt-labs/dbt-core/issues/4738), [#4858](https://github.com/dbt-labs/dbt-core/pull/4858))
- Add `--cache_selected_only` flag to cache schema object of selected models only. ([#4688](https://github.com/dbt-labs/dbt-core/issues/4688), [#4860](https://github.com/dbt-labs/dbt-core/pull/4860))
- Support custom names for generic tests ([#3348](https://github.com/dbt-labs/dbt-core/issues/3348), [#4898](https://github.com/dbt-labs/dbt-core/pull/4898))
- Added Support for Semantic Versioning ([#4453](https://github.com/dbt-labs/dbt-core/issues/4453), [#4644](https://github.com/dbt-labs/dbt-core/pull/4644))
- New Dockerfile to support specific db adapters and platforms. See docker/README.md for details ([#4495](https://github.com/dbt-labs/dbt-core/issues/4495), [#4487](https://github.com/dbt-labs/dbt-core/pull/4487))
- Allow unique_key to take a list ([#2479](https://github.com/dbt-labs/dbt-core/issues/2479), [#4618](https://github.com/dbt-labs/dbt-core/pull/4618))
- Add `--quiet` global flag and `print` Jinja function ([#3451](https://github.com/dbt-labs/dbt-core/issues/3451), [#4701](https://github.com/dbt-labs/dbt-core/pull/4701))
- Add space before justification periods ([#4737](https://github.com/dbt-labs/dbt-core/issues/4737), [#4744](https://github.com/dbt-labs/dbt-core/pull/4744))
- Enable dbt jobs to run downstream models based on fresher sources. Compare the source freshness results between previous and current state. If any source is fresher and/or new in current vs. previous state, dbt will run and test the downstream models in scope. Example command: `dbt build --select source_status:fresher+` ([#4050](https://github.com/dbt-labs/dbt-core/issues/4050), [#4256](https://github.com/dbt-labs/dbt-core/pull/4256))
- converting unique key as list tests to new pytest format ([#4882](https://github.com/dbt-labs/dbt-core/issues/4882), [#4958](https://github.com/dbt-labs/dbt-core/pull/4958))
- Add a variable called selected_resources in the Jinja context containing a list of all the resources matching the nodes for the --select, --exclude and/or --selector parameters. ([#3471](https://github.com/dbt-labs/dbt-core/issues/3471), [#5001](https://github.com/dbt-labs/dbt-core/pull/5001))
- Support the DO_NOT_TRACK environment variable from the consoledonottrack.com initiative ([#3540](https://github.com/dbt-labs/dbt-core/issues/3540), [#5000](https://github.com/dbt-labs/dbt-core/pull/5000))
- Add `--no-print` global flag ([#4710](https://github.com/dbt-labs/dbt-core/issues/4710), [#4854](https://github.com/dbt-labs/dbt-core/pull/4854))
- add enabled as a source config ([#3662](https://github.com/dbt-labs/dbt-core/issues/3662), [#5008](https://github.com/dbt-labs/dbt-core/pull/5008))
### Fixes
- Fix bug causing empty node level meta, snapshot config errors ([#4459](https://github.com/dbt-labs/dbt-core/issues/4459), [#4726](https://github.com/dbt-labs/dbt-core/pull/4726))
- Inconsistent timestamps between inserted/updated and deleted rows in snapshots ([#4347](https://github.com/dbt-labs/dbt-core/issues/4347), [#4513](https://github.com/dbt-labs/dbt-core/pull/4513))
- Catch all Requests Exceptions on deps install to attempt retries. Also log the exceptions hit. ([#4849](https://github.com/dbt-labs/dbt-core/issues/4849), [#4865](https://github.com/dbt-labs/dbt-core/pull/4865))
- Use cli_vars instead of context to create package and selector renderers ([#4876](https://github.com/dbt-labs/dbt-core/issues/4876), [#4878](https://github.com/dbt-labs/dbt-core/pull/4878))
- depend on new dbt-extractor version with fixed github links ([#4891](https://github.com/dbt-labs/dbt-core/issues/4891), [#4890](https://github.com/dbt-labs/dbt-core/pull/4890))
- Update bumpversion config to stop looking for missing setup.py ([#-1](https://github.com/dbt-labs/dbt-core/issues/-1), [#4896](https://github.com/dbt-labs/dbt-core/pull/4896))
- Corrected permissions settings for docker release workflow ([#4902](https://github.com/dbt-labs/dbt-core/issues/4902), [#4903](https://github.com/dbt-labs/dbt-core/pull/4903))
- User wasn't asked for permission to overwrite a profile entry when running init inside an existing project ([#4375](https://github.com/dbt-labs/dbt-core/issues/4375), [#4447](https://github.com/dbt-labs/dbt-core/pull/4447))
- Add project name validation to `dbt init` ([#4490](https://github.com/dbt-labs/dbt-core/issues/4490), [#4536](https://github.com/dbt-labs/dbt-core/pull/4536))
- Allow override of string and numeric types for adapters. ([#4603](https://github.com/dbt-labs/dbt-core/issues/4603), [#4604](https://github.com/dbt-labs/dbt-core/pull/4604))
- A change in secret environment variables won't trigger a full reparse ([#4650](https://github.com/dbt-labs/dbt-core/issues/4650), [#4665](https://github.com/dbt-labs/dbt-core/pull/4665))
- Fix misspellings and typos in docstrings ([#4904](https://github.com/dbt-labs/dbt-core/issues/4904), [#4545](https://github.com/dbt-labs/dbt-core/pull/4545))
- Catch more cases to retry package retrieval for deps pointing to the hub. Also start to cache the package requests. ([#4849](https://github.com/dbt-labs/dbt-core/issues/4849), [#4982](https://github.com/dbt-labs/dbt-core/pull/4982))
- Make the warning message for a full event deque more descriptive ([#4962](https://github.com/dbt-labs/dbt-core/issues/4962), [#5011](https://github.com/dbt-labs/dbt-core/pull/5011))
- Fix hard delete snapshot test ([#4916](https://github.com/dbt-labs/dbt-core/issues/4916), [#5020](https://github.com/dbt-labs/dbt-core/pull/5020))
- Restore ability to utilize `updated_at` for check_cols snapshots ([#5076](https://github.com/dbt-labs/dbt-core/issues/5076), [#5077](https://github.com/dbt-labs/dbt-core/pull/5077))
- Use yaml renderer (with target context) for rendering selectors ([#5131](https://github.com/dbt-labs/dbt-core/issues/5131), [#5136](https://github.com/dbt-labs/dbt-core/pull/5136))
- Fix retry logic to return values after initial try ([#5023](https://github.com/dbt-labs/dbt-core/issues/5023), [#5137](https://github.com/dbt-labs/dbt-core/pull/5137))
- Scrub secret env vars from CommandError in exception stacktrace ([#5151](https://github.com/dbt-labs/dbt-core/issues/5151), [#5152](https://github.com/dbt-labs/dbt-core/pull/5152))
### Docs
- Resolve errors related to operations preventing DAG from generating in the docs. Also patch a spark issue to allow search to filter accurately past the missing columns. ([#4578](https://github.com/dbt-labs/dbt-core/issues/4578), [#4763](https://github.com/dbt-labs/dbt-core/pull/4763))
- Fixed capitalization in UI for exposures of `type: ml` ([#4984](https://github.com/dbt-labs/dbt-core/issues/4984), [#4995](https://github.com/dbt-labs/dbt-core/pull/4995))
- List packages and tags in alphabetical order ([#4984](https://github.com/dbt-labs/dbt-core/issues/4984), [#4995](https://github.com/dbt-labs/dbt-core/pull/4995))
- Bump jekyll from 3.8.7 to 3.9.0 ([#4984](https://github.com/dbt-labs/dbt-core/issues/4984), [#4995](https://github.com/dbt-labs/dbt-core/pull/4995))
- Updated docker README to reflect necessity of using BuildKit ([#4990](https://github.com/dbt-labs/dbt-core/issues/4990), [#5018](https://github.com/dbt-labs/dbt-core/pull/5018))
### Under the Hood
- Automate changelog generation with changie ([#4652](https://github.com/dbt-labs/dbt-core/issues/4652), [#4743](https://github.com/dbt-labs/dbt-core/pull/4743))
- add performance regression testing runner without orchestration ([#4021](https://github.com/dbt-labs/dbt-core/issues/4021), [#4602](https://github.com/dbt-labs/dbt-core/pull/4602))
- Fix broken links for changelog generation and tweak GHA to only post a comment once when changelog entry is missing. ([#4848](https://github.com/dbt-labs/dbt-core/issues/4848), [#4857](https://github.com/dbt-labs/dbt-core/pull/4857))
- Add support for Python 3.10 ([#4562](https://github.com/dbt-labs/dbt-core/issues/4562), [#4866](https://github.com/dbt-labs/dbt-core/pull/4866))
- Enable more dialects to snapshot sources with added columns, even those that don't support boolean datatypes ([#4488](https://github.com/dbt-labs/dbt-core/issues/4488), [#4871](https://github.com/dbt-labs/dbt-core/pull/4871))
- Add Graph Compilation and Adapter Cache tracking ([#4625](https://github.com/dbt-labs/dbt-core/issues/4625), [#4912](https://github.com/dbt-labs/dbt-core/pull/4912))
- Testing cleanup ([#3648](https://github.com/dbt-labs/dbt-core/issues/3648), [#4509](https://github.com/dbt-labs/dbt-core/pull/4509))
- Clean up test deprecation warnings ([#3988](https://github.com/dbt-labs/dbt-core/issues/3988), [#4556](https://github.com/dbt-labs/dbt-core/pull/4556))
- Use mashumaro for serialization in event logging ([#4504](https://github.com/dbt-labs/dbt-core/issues/4504), [#4505](https://github.com/dbt-labs/dbt-core/pull/4505))
- Drop support for Python 3.7.0 + 3.7.1 ([#4584](https://github.com/dbt-labs/dbt-core/issues/4584), [#4585](https://github.com/dbt-labs/dbt-core/pull/4585))
- Drop support for Python <3.7.2 ([#4584](https://github.com/dbt-labs/dbt-core/issues/4584), [#4643](https://github.com/dbt-labs/dbt-core/pull/4643))
- Re-format codebase (except tests) using pre-commit hooks ([#3195](https://github.com/dbt-labs/dbt-core/issues/3195), [#4697](https://github.com/dbt-labs/dbt-core/pull/4697))
- Add deps module README ([#4904](https://github.com/dbt-labs/dbt-core/issues/4904), [#4686](https://github.com/dbt-labs/dbt-core/pull/4686))
- Initial conversion of tests to pytest ([#4690](https://github.com/dbt-labs/dbt-core/issues/4690), [#4691](https://github.com/dbt-labs/dbt-core/pull/4691))
- Fix errors in Windows for tests/functions ([#4782](https://github.com/dbt-labs/dbt-core/issues/4782), [#4767](https://github.com/dbt-labs/dbt-core/pull/4767))
- Create a dbt.tests.adapter release when releasing dbt and postgres ([#4812](https://github.com/dbt-labs/dbt-core/issues/4812), [#4948](https://github.com/dbt-labs/dbt-core/pull/4948))
- update docker image to use python 3.10.3 ([#4904](https://github.com/dbt-labs/dbt-core/issues/4904), [#4963](https://github.com/dbt-labs/dbt-core/pull/4963))
- updates black to 22.3.0 which fixes dependency incompatibility when running with precommit. ([#4904](https://github.com/dbt-labs/dbt-core/issues/4904), [#4972](https://github.com/dbt-labs/dbt-core/pull/4972))
- Adds config util for ad-hoc creation of project objs or dicts ([#4808](https://github.com/dbt-labs/dbt-core/issues/4808), [#4981](https://github.com/dbt-labs/dbt-core/pull/4981))
- Remove TableComparison and convert existing calls to use dbt.tests.util ([#4778](https://github.com/dbt-labs/dbt-core/issues/4778), [#4986](https://github.com/dbt-labs/dbt-core/pull/4986))
- Remove unneeded create_schema in snapshot materialization ([#4742](https://github.com/dbt-labs/dbt-core/issues/4742), [#4993](https://github.com/dbt-labs/dbt-core/pull/4993))
- Added .git-blame-ignore-revs file to mask re-formatting commits from git blame ([#5004](https://github.com/dbt-labs/dbt-core/issues/5004), [#5019](https://github.com/dbt-labs/dbt-core/pull/5019))
- Convert version tests to pytest ([#5024](https://github.com/dbt-labs/dbt-core/issues/5024), [#5026](https://github.com/dbt-labs/dbt-core/pull/5026))
- Updating tests and docs to show that we now support Python 3.10 ([#4974](https://github.com/dbt-labs/dbt-core/issues/4974), [#5025](https://github.com/dbt-labs/dbt-core/pull/5025))
- Update --version output and logic ([#4724](https://github.com/dbt-labs/dbt-core/issues/4724), [#5029](https://github.com/dbt-labs/dbt-core/pull/5029))
- Bumping manifest schema to v5 ([#5033](https://github.com/dbt-labs/dbt-core/issues/5033), [#5032](https://github.com/dbt-labs/dbt-core/pull/5032))
### Contributors
- [@NiallRees](https://github.com/NiallRees) ([#4447](https://github.com/dbt-labs/dbt-core/pull/4447))
- [@agoblet](https://github.com/agoblet) ([#5000](https://github.com/dbt-labs/dbt-core/pull/5000))
- [@alswang18](https://github.com/alswang18) ([#4644](https://github.com/dbt-labs/dbt-core/pull/4644))
- [@amirkdv](https://github.com/amirkdv) ([#4536](https://github.com/dbt-labs/dbt-core/pull/4536))
- [@anaisvaillant](https://github.com/anaisvaillant) ([#4256](https://github.com/dbt-labs/dbt-core/pull/4256))
- [@b-per](https://github.com/b-per) ([#5001](https://github.com/dbt-labs/dbt-core/pull/5001))
- [@dbeatty10](https://github.com/dbeatty10) ([#5077](https://github.com/dbt-labs/dbt-core/pull/5077))
- [@ehmartens](https://github.com/ehmartens) ([#4701](https://github.com/dbt-labs/dbt-core/pull/4701))
- [@joellabes](https://github.com/joellabes) ([#4744](https://github.com/dbt-labs/dbt-core/pull/4744))
- [@jonstacks](https://github.com/jonstacks) ([#4995](https://github.com/dbt-labs/dbt-core/pull/4995))
- [@kadero](https://github.com/kadero) ([#4513](https://github.com/dbt-labs/dbt-core/pull/4513))
- [@karunpoudel](https://github.com/karunpoudel) ([#4860](https://github.com/dbt-labs/dbt-core/pull/4860), [#4860](https://github.com/dbt-labs/dbt-core/pull/4860))
- [@kazanzhy](https://github.com/kazanzhy) ([#4545](https://github.com/dbt-labs/dbt-core/pull/4545))
- [@matt-winkler](https://github.com/matt-winkler) ([#4256](https://github.com/dbt-labs/dbt-core/pull/4256))
- [@mdesmet](https://github.com/mdesmet) ([#4604](https://github.com/dbt-labs/dbt-core/pull/4604))
- [@pgoslatara](https://github.com/pgoslatara) ([#4995](https://github.com/dbt-labs/dbt-core/pull/4995))
- [@poloaraujo](https://github.com/poloaraujo) ([#4854](https://github.com/dbt-labs/dbt-core/pull/4854))
- [@sungchun12](https://github.com/sungchun12) ([#4256](https://github.com/dbt-labs/dbt-core/pull/4256))
- [@willbowditch](https://github.com/willbowditch) ([#4777](https://github.com/dbt-labs/dbt-core/pull/4777))

.changes/1.1.1.md Normal file

@@ -0,0 +1,14 @@
## dbt-core 1.1.1 - June 15, 2022
### Fixes
- Relax minimum supported version of MarkupSafe ([#4745](https://github.com/dbt-labs/dbt-core/issues/4745), [#5039](https://github.com/dbt-labs/dbt-core/pull/5039))
- When parsing 'all_sources' should be a list of unique dirs ([#5120](https://github.com/dbt-labs/dbt-core/issues/5120), [#5176](https://github.com/dbt-labs/dbt-core/pull/5176))
- Remove docs file from manifest when removing doc node ([#4146](https://github.com/dbt-labs/dbt-core/issues/4146), [#5270](https://github.com/dbt-labs/dbt-core/pull/5270))
- Fixing Windows color regression ([#5191](https://github.com/dbt-labs/dbt-core/issues/5191), [#5327](https://github.com/dbt-labs/dbt-core/pull/5327))
### Under the Hood
- Update context readme + clean up context code ([#4796](https://github.com/dbt-labs/dbt-core/issues/4796), [#5334](https://github.com/dbt-labs/dbt-core/pull/5334))
### Dependencies
- Bumping hologram version ([#5219](https://github.com/dbt-labs/dbt-core/issues/5219), [#5218](https://github.com/dbt-labs/dbt-core/pull/5218))
- Pin networkx to <2.8.4 for v1.1 patches ([#5286](https://github.com/dbt-labs/dbt-core/issues/5286), [#5334](https://github.com/dbt-labs/dbt-core/pull/5334))
### Contributors
- [@adamantike](https://github.com/adamantike) ([#5039](https://github.com/dbt-labs/dbt-core/pull/5039))

.changes/1.1.2.md Normal file

@@ -0,0 +1,5 @@
## dbt-core 1.1.2 - July 28, 2022
### Fixes
- Define compatibility for older manifest versions when using state: selection methods ([#5213](https://github.com/dbt-labs/dbt-core/issues/5213), [#5346](https://github.com/dbt-labs/dbt-core/pull/5346))
### Under the Hood
- Add annotation to render_value method reimplemented in #5334 ([#4796](https://github.com/dbt-labs/dbt-core/issues/4796), [#5382](https://github.com/dbt-labs/dbt-core/pull/5382))

.changes/1.1.3.md Normal file

@@ -0,0 +1,3 @@
## dbt-core 1.1.3 - January 05, 2023
### Fixes
- Bug when partial parsing with an empty schema file ([#4850](https://github.com/dbt-labs/dbt-core/issues/4850), [#<no value>](https://github.com/dbt-labs/dbt-core/pull/<no value>))

.changes/README.md Normal file

@@ -0,0 +1,53 @@
# CHANGELOG Automation
We use [changie](https://changie.dev/) to automate `CHANGELOG` generation. For installation and format/command specifics, see the documentation.
### Quick Tour
- All new change entries are generated under `/.changes/unreleased` as yaml files
- `header.tpl.md` contains the header for the entire CHANGELOG file
- `0.0.0.md` contains the contents of the footer for the entire CHANGELOG file. changie looks to be in the process of supporting a footer file the same way it supports a header file; switch to that when it becomes available. For now, the 0.0.0 in the file name forces it to the bottom of the changelog no matter what version we are releasing.
- `.changie.yaml` contains the fields in a change, the format of a single change, as well as the format of the Contributors section for each version.
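To make the layout concrete, the `.changes` tree added in this comparison looks roughly like this (illustrative; only files visible in this diff are shown):
```
.changes/
├── 0.0.0.md          # footer: links to changelogs of prior releases
├── 1.1.0.md          # batched entries for each released version
├── 1.1.1.md
├── 1.1.2.md
├── 1.1.3.md
├── header.tpl.md     # header of the generated CHANGELOG.md
├── README.md
└── unreleased/       # one yaml file per change, consumed by `changie batch`
```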
### Workflow
#### Daily workflow
Almost every code change associated with an issue requires a `CHANGELOG` entry. After you have created the PR in GitHub, run `changie new` and follow the command prompts to generate a yaml file with your change details. This only needs to be done once per PR.
The `changie new` command will ensure correct file format and file name. There is a one to one mapping of issues to changes. Multiple issues cannot be lumped into a single entry. If you make a mistake, the yaml file may be directly modified and saved as long as the format is preserved.
Note: If your PR has been cleared by the Core Team as not needing a changelog entry, the `Skip Changelog` label may be put on the PR to bypass the GitHub action that blocks PRs from being merged when they are missing a `CHANGELOG` entry.
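For illustration, a generated entry under `/.changes/unreleased` looks roughly like the following. The field names come from the `.changie.yaml` shown further down in this diff; the file name and values are modeled on one of the fixes in the 1.1.0 changelog above and are only an example:
```
# .changes/unreleased/Fixes-20220315-115707.yaml (illustrative file name)
kind: Fixes
body: Catch all Requests exceptions on deps install to attempt retries
time: 2022-03-15T11:57:07.000000-05:00
custom:
  Author: emmyoop
  Issue: 4849
  PR: 4865
```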
#### Prerelease Workflow
These commands batch up the changes in `/.changes/unreleased` to be included in this prerelease and move those files to a directory named for the release version. The directory passed to `--move-dir` is created in `/.changes` if it does not already exist.
```
changie batch <version> --move-dir '<version>' --prerelease 'rc1'
changie merge
```
Example
```
changie batch 1.0.5 --move-dir '1.0.5' --prerelease 'rc1'
changie merge
```
#### Final Release Workflow
These commands batch up changes in `/.changes/unreleased` as well as `/.changes/<version>` to be included in this final release and delete all prereleases. This rolls all prereleases up into a single final release. All `yaml` files in `/unreleased` and `<version>` will be deleted at this point.
```
changie batch <version> --include '<version>' --remove-prereleases
changie merge
```
Example
```
changie batch 1.0.5 --include '1.0.5' --remove-prereleases
changie merge
```
### A Note on Manual Edits & Gotchas
- Changie generates markdown files in the `.changes` directory that are parsed together with the `changie merge` command. Every time `changie merge` is run, it regenerates the entire file. For this reason, any changes made directly to `CHANGELOG.md` will be overwritten on the next run of `changie merge`.
- If changes need to be made to the `CHANGELOG.md`, make the changes to the relevant `<version>.md` file located in the `/.changes` directory. You will then run `changie merge` to regenerate the `CHANGELOG.md`.
- Do not run `changie batch` again on released versions. Our final release workflow deletes all of the yaml files associated with individual changes. If for some reason modifications to the `CHANGELOG.md` are required after we've generated the final release `CHANGELOG.md`, the modifications need to be done manually to the `<version>.md` file in the `/.changes` directory.
- changie can modify, create and delete files depending on the command you run. This is expected. Be sure to commit everything that has been modified and deleted.

6
.changes/header.tpl.md Executable file
View File

@@ -0,0 +1,6 @@
# dbt Core Changelog
- This file provides a full account of all changes to `dbt-core` and `dbt-postgres`
- Changes are listed under the (pre)release in which they first appear. Subsequent releases include changes from previous releases.
- "Breaking changes" listed under a version may require action from end users or external maintainers when upgrading to that version.
- Do not edit this file directly. This file is auto-generated using [changie](https://github.com/miniscruff/changie). For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#adding-changelog-entry)

61
.changie.yaml Executable file
View File

@@ -0,0 +1,61 @@
changesDir: .changes
unreleasedDir: unreleased
headerPath: header.tpl.md
versionHeaderPath: ""
changelogPath: CHANGELOG.md
versionExt: md
versionFormat: '## dbt-core {{.Version}} - {{.Time.Format "January 02, 2006"}}'
kindFormat: '### {{.Kind}}'
changeFormat: '- {{.Body}} ([#{{.Custom.Issue}}](https://github.com/dbt-labs/dbt-core/issues/{{.Custom.Issue}}), [#{{.Custom.PR}}](https://github.com/dbt-labs/dbt-core/pull/{{.Custom.PR}}))'
kinds:
- label: Breaking Changes
- label: Features
- label: Fixes
- label: Docs
- label: Under the Hood
- label: Dependencies
- label: Security
custom:
- key: Author
label: GitHub Username(s) (separated by a single space if multiple)
type: string
minLength: 3
- key: Issue
label: GitHub Issue Number
type: int
minLength: 4
- key: PR
label: GitHub Pull Request Number
type: int
minLength: 4
footerFormat: |
{{- $contributorDict := dict }}
{{- /* any names added to this list should be all lowercase for later matching purposes */}}
{{- $core_team := list "emmyoop" "nathaniel-may" "gshank" "leahwicz" "chenyulinx" "stu-k" "iknox-fa" "versusfacit" "mcknight-42" "jtcohen6" "dependabot" }}
{{- range $change := .Changes }}
{{- $authorList := splitList " " $change.Custom.Author }}
{{- /* loop through all authors for a PR */}}
{{- range $author := $authorList }}
{{- $authorLower := lower $author }}
{{- /* we only want to include non-core team contributors */}}
{{- if not (has $authorLower $core_team)}}
{{- $pr := $change.Custom.PR }}
{{- /* check if this contributor has other PRs associated with them already */}}
{{- if hasKey $contributorDict $author }}
{{- $prList := get $contributorDict $author }}
{{- $prList = append $prList $pr }}
{{- $contributorDict := set $contributorDict $author $prList }}
{{- else }}
{{- $prList := list $change.Custom.PR }}
{{- $contributorDict := set $contributorDict $author $prList }}
{{- end }}
{{- end}}
{{- end}}
{{- end }}
{{- /* no indentation here for formatting so the final markdown doesn't have unneeded indentations */}}
{{- if $contributorDict}}
### Contributors
{{- range $k,$v := $contributorDict }}
- [@{{$k}}](https://github.com/{{$k}}) ({{ range $index, $element := $v }}{{if $index}}, {{end}}[#{{$element}}](https://github.com/dbt-labs/dbt-core/pull/{{$element}}){{end}})
{{- end }}
{{- end }}

View File

@@ -8,5 +8,5 @@ ignore =
W504
E203 # makes Flake8 work like black
E741
max-line-length = 99
E501 # long line checking is done in black
exclude = test

2
.git-blame-ignore-revs Normal file
View File

@@ -0,0 +1,2 @@
# Reformatting dbt-core via black, flake8, mypy, and assorted pre-commit hooks.
43e3fc22c4eae4d3d901faba05e33c40f1f1dc5a

View File

@@ -18,4 +18,4 @@ resolves #
- [ ] I have signed the [CLA](https://docs.getdbt.com/docs/contributor-license-agreements)
- [ ] I have run this code in development and it appears to resolve the stated issue
- [ ] This PR includes tests, or tests are not required/relevant for this PR
- [ ] I have updated the `CHANGELOG.md` and added information about my change
- [ ] I have added information about my change to be included in the [CHANGELOG](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#Adding-CHANGELOG-Entry).

View File

@@ -1,95 +0,0 @@
module.exports = ({ context }) => {
const defaultPythonVersion = "3.8";
const supportedPythonVersions = ["3.7", "3.8", "3.9"];
const supportedAdapters = ["postgres"];
// if PR, generate matrix based on files changed and PR labels
if (context.eventName.includes("pull_request")) {
// `changes` is a list of adapter names that have related
// file changes in the PR
// ex: ['postgres', 'snowflake']
const changes = JSON.parse(process.env.CHANGES);
const labels = context.payload.pull_request.labels.map(({ name }) => name);
console.log("labels", labels);
console.log("changes", changes);
const testAllLabel = labels.includes("test all");
const include = [];
for (const adapter of supportedAdapters) {
if (
changes.includes(adapter) ||
testAllLabel ||
labels.includes(`test ${adapter}`)
) {
for (const pythonVersion of supportedPythonVersions) {
if (
pythonVersion === defaultPythonVersion ||
labels.includes(`test python${pythonVersion}`) ||
testAllLabel
) {
// always run tests on ubuntu by default
include.push({
os: "ubuntu-latest",
adapter,
"python-version": pythonVersion,
});
if (labels.includes("test windows") || testAllLabel) {
include.push({
os: "windows-latest",
adapter,
"python-version": pythonVersion,
});
}
if (labels.includes("test macos") || testAllLabel) {
include.push({
os: "macos-latest",
adapter,
"python-version": pythonVersion,
});
}
}
}
}
}
console.log("matrix", { include });
return {
include,
};
}
// if not PR, generate matrix of python version, adapter, and operating
// system to run integration tests on
const include = [];
// run for all adapters and python versions on ubuntu
for (const adapter of supportedAdapters) {
for (const pythonVersion of supportedPythonVersions) {
include.push({
os: 'ubuntu-latest',
adapter: adapter,
"python-version": pythonVersion,
});
}
}
// additionally include runs for all adapters, on macos and windows,
// but only for the default python version
for (const adapter of supportedAdapters) {
for (const operatingSystem of ["windows-latest", "macos-latest"]) {
include.push({
os: operatingSystem,
adapter: adapter,
"python-version": defaultPythonVersion,
});
}
}
console.log("matrix", { include });
return {
include,
};
};
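For orientation, a rough sketch of the matrix this script would emit for a PR that touches postgres code and carries only the `test windows` label; per the logic above, only the default Python version qualifies because no `test pythonX.Y` label is present:
```
include:
  - os: ubuntu-latest
    adapter: postgres
    python-version: "3.8"
  - os: windows-latest
    adapter: postgres
    python-version: "3.8"
```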

View File

@@ -29,6 +29,6 @@ jobs:
name: Backport
steps:
- name: Backport
uses: tibdex/backport@v1.1.1
uses: dbt-labs/backport@v1.1.1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}

78
.github/workflows/changelog-check.yml vendored Normal file
View File

@@ -0,0 +1,78 @@
# **what?**
# Checks that a file has been committed under the /.changes directory
# as a new CHANGELOG entry. Cannot check for a specific filename as
# it is dynamically generated by change type and timestamp.
# This workflow should not require any secrets since it runs for PRs
# from forked repos.
# By default, secrets are not passed to workflows running from
# a forked repo.
# **why?**
# Ensure code change gets reflected in the CHANGELOG.
# **when?**
# This will run for all PRs going into main and *.latest. It will
# run when they are opened, reopened, when any label is added or removed
# and when new code is pushed to the branch. The action will then get
# skipped if the 'Skip Changelog' label is present in any of the labels.
name: Check Changelog Entry
on:
pull_request:
types: [opened, reopened, labeled, unlabeled, synchronize]
workflow_dispatch:
defaults:
run:
shell: bash
permissions:
contents: read
pull-requests: write
env:
changelog_comment: 'Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#adding-changelog-entry).'
jobs:
changelog:
name: changelog
if: "!contains(github.event.pull_request.labels.*.name, 'Skip Changelog')"
runs-on: ubuntu-latest
steps:
- name: Check if changelog file was added
# https://github.com/marketplace/actions/paths-changes-filter
# For each filter, it sets output variable named by the filter to the text:
# 'true' - if any of changed files matches any of filter rules
# 'false' - if none of changed files matches any of filter rules
# also, returns:
# `changes` - JSON array with names of all filters matching any of the changed files
uses: dorny/paths-filter@v2
id: filter
with:
token: ${{ secrets.GITHUB_TOKEN }}
filters: |
changelog:
- added: '.changes/unreleased/**.yaml'
- name: Check if comment already exists
uses: peter-evans/find-comment@v1
id: changelog_comment
with:
issue-number: ${{ github.event.pull_request.number }}
comment-author: 'github-actions[bot]'
body-includes: ${{ env.changelog_comment }}
- name: Create PR comment if changelog entry is missing, required, and does not exist
if: |
steps.filter.outputs.changelog == 'false' &&
steps.changelog_comment.outputs.comment-body == ''
uses: peter-evans/create-or-update-comment@v1
with:
issue-number: ${{ github.event.pull_request.number }}
body: ${{ env.changelog_comment }}
- name: Fail job if changelog entry is missing and required
if: steps.filter.outputs.changelog == 'false'
uses: actions/github-script@v6
with:
script: core.setFailed('Changelog entry required to merge.')

View File

@@ -1,222 +0,0 @@
# **what?**
# This workflow runs all integration tests for supported OS
# and python versions and core adapters. If triggered by PR,
# the workflow will only run tests for adapters related
# to code changes. Use the `test all` and `test ${adapter}`
# label to run all or additional tests. Use `ok to test`
# label to mark PRs from forked repositories that are safe
# to run integration tests for. Requires secrets to run
# against different warehouses.
# **why?**
# This checks the functionality of dbt from a user's perspective
# and attempts to catch functional regressions.
# **when?**
# This workflow will run on every push to a protected branch
# and when manually triggered. It will also run for all PRs, including
# PRs from forks. The workflow will be skipped until there is a label
# to mark the PR as safe to run.
name: Adapter Integration Tests
on:
# pushes to release branches
push:
branches:
- "main"
- "develop"
- "*.latest"
- "releases/*"
# all PRs, important to note that `pull_request_target` workflows
# will run in the context of the target branch of a PR
pull_request_target:
# manual trigger
workflow_dispatch:
# explicitly turn off permissions for `GITHUB_TOKEN`
permissions: read-all
# will cancel previous workflows triggered by the same event and for the same ref for PRs or same SHA otherwise
concurrency:
group: ${{ github.workflow }}-${{ github.event_name }}-${{ contains(github.event_name, 'pull_request') && github.event.pull_request.head.ref || github.sha }}
cancel-in-progress: true
# sets default shell to bash, for all operating systems
defaults:
run:
shell: bash
jobs:
# generate test metadata about what files changed and the testing matrix to use
test-metadata:
# run if not a PR from a forked repository or has a label to mark as safe to test
if: >-
github.event_name != 'pull_request_target' ||
github.event.pull_request.head.repo.full_name == github.repository ||
contains(github.event.pull_request.labels.*.name, 'ok to test')
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.generate-matrix.outputs.result }}
steps:
- name: Check out the repository (non-PR)
if: github.event_name != 'pull_request_target'
uses: actions/checkout@v2
with:
persist-credentials: false
- name: Check out the repository (PR)
if: github.event_name == 'pull_request_target'
uses: actions/checkout@v2
with:
persist-credentials: false
ref: ${{ github.event.pull_request.head.sha }}
- name: Check if relevant files changed
# https://github.com/marketplace/actions/paths-changes-filter
# For each filter, it sets output variable named by the filter to the text:
# 'true' - if any of changed files matches any of filter rules
# 'false' - if none of changed files matches any of filter rules
# also, returns:
# `changes` - JSON array with names of all filters matching any of the changed files
uses: dorny/paths-filter@v2
id: get-changes
with:
token: ${{ secrets.GITHUB_TOKEN }}
filters: |
postgres:
- 'core/**'
- 'plugins/postgres/**'
- 'dev-requirements.txt'
- name: Generate integration test matrix
id: generate-matrix
uses: actions/github-script@v4
env:
CHANGES: ${{ steps.get-changes.outputs.changes }}
with:
script: |
const script = require('./.github/scripts/integration-test-matrix.js')
const matrix = script({ context })
console.log(matrix)
return matrix
test:
name: ${{ matrix.adapter }} / python ${{ matrix.python-version }} / ${{ matrix.os }}
# run if not a PR from a forked repository or has a label to mark as safe to test
# also checks that the matrix generated is not empty
if: >-
needs.test-metadata.outputs.matrix &&
fromJSON( needs.test-metadata.outputs.matrix ).include[0] &&
(
github.event_name != 'pull_request_target' ||
github.event.pull_request.head.repo.full_name == github.repository ||
contains(github.event.pull_request.labels.*.name, 'ok to test')
)
runs-on: ${{ matrix.os }}
needs: test-metadata
strategy:
fail-fast: false
matrix: ${{ fromJSON(needs.test-metadata.outputs.matrix) }}
env:
TOXENV: integration-${{ matrix.adapter }}
PYTEST_ADDOPTS: "-v --color=yes -n4 --csv integration_results.csv"
DBT_INVOCATION_ENV: github-actions
steps:
- name: Check out the repository
if: github.event_name != 'pull_request_target'
uses: actions/checkout@v2
with:
persist-credentials: false
# explicitly check out the branch for the PR,
# this is necessary for the `pull_request_target` event
- name: Check out the repository (PR)
if: github.event_name == 'pull_request_target'
uses: actions/checkout@v2
with:
persist-credentials: false
ref: ${{ github.event.pull_request.head.sha }}
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Set up postgres (linux)
if: |
matrix.adapter == 'postgres' &&
runner.os == 'Linux'
uses: ./.github/actions/setup-postgres-linux
- name: Set up postgres (macos)
if: |
matrix.adapter == 'postgres' &&
runner.os == 'macOS'
uses: ./.github/actions/setup-postgres-macos
- name: Set up postgres (windows)
if: |
matrix.adapter == 'postgres' &&
runner.os == 'Windows'
uses: ./.github/actions/setup-postgres-windows
- name: Install python dependencies
run: |
pip install --user --upgrade pip
pip install tox
pip --version
tox --version
- name: Run tox (postgres)
if: matrix.adapter == 'postgres'
run: tox
- uses: actions/upload-artifact@v2
if: always()
with:
name: logs
path: ./logs
- name: Get current date
if: always()
id: date
run: echo "::set-output name=date::$(date +'%Y-%m-%dT%H_%M_%S')" #no colons allowed for artifacts
- uses: actions/upload-artifact@v2
if: always()
with:
name: integration_results_${{ matrix.python-version }}_${{ matrix.os }}_${{ matrix.adapter }}-${{ steps.date.outputs.date }}.csv
path: integration_results.csv
require-label-comment:
runs-on: ubuntu-latest
needs: test
permissions:
pull-requests: write
steps:
- name: Needs permission PR comment
if: >-
needs.test.result == 'skipped' &&
github.event_name == 'pull_request_target' &&
github.event.pull_request.head.repo.full_name != github.repository
uses: unsplash/comment-on-pr@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
msg: |
"You do not have permissions to run integration tests, @dbt-labs/core "\
"needs to label this PR with `ok to test` in order to run integration tests!"
check_for_duplicate_msg: true

View File

@@ -1,9 +1,8 @@
# **what?**
# Runs code quality checks, unit tests, and verifies python build on
# all code committed to the repository. This workflow should not
# require any secrets since it runs for PRs from forked repos.
# By default, secrets are not passed to workflows running from
# a forked repo.
# Runs code quality checks, unit tests, integration tests and
# verifies python build on all code committed to the repository. This workflow
# should not require any secrets since it runs for PRs from forked repos. By
# default, secrets are not passed to workflows running from forked repos.
# **why?**
# Ensure code for dbt meets a certain quality standard.
@@ -18,7 +17,6 @@ on:
push:
branches:
- "main"
- "develop"
- "*.latest"
- "releases/*"
pull_request:
@@ -39,26 +37,26 @@ jobs:
code-quality:
name: code-quality
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- name: Check out the repository
uses: actions/checkout@v2
with:
persist-credentials: false
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v4.3.0
with:
python-version: '3.8'
- name: Install python dependencies
run: |
pip install --user --upgrade pip
pip install pre-commit
pip install mypy==0.782
pip install -r editable-requirements.txt
pip --version
pip install pre-commit
pre-commit --version
pip install mypy==0.782
mypy --version
pip install -r editable-requirements.txt
dbt --version
- name: Run pre-commit hooks
@@ -67,12 +65,12 @@ jobs:
unit:
name: unit test / python ${{ matrix.python-version }}
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
python-version: [3.7, 3.8, 3.9]
python-version: ['3.7', '3.8', '3.9', '3.10']
env:
TOXENV: "unit"
@@ -81,19 +79,17 @@ jobs:
steps:
- name: Check out the repository
uses: actions/checkout@v2
with:
persist-credentials: false
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install python dependencies
run: |
pip install --user --upgrade pip
pip install tox
pip --version
pip install tox
tox --version
- name: Run tox
@@ -110,21 +106,88 @@ jobs:
name: unit_results_${{ matrix.python-version }}-${{ steps.date.outputs.date }}.csv
path: unit_results.csv
build:
name: build packages
integration:
name: integration test / python ${{ matrix.python-version }} / ${{ matrix.os }}
runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
python-version: ['3.7', '3.8', '3.9', '3.10']
os: [ubuntu-20.04]
include:
- python-version: 3.8
os: windows-latest
- python-version: 3.8
os: macos-latest
env:
TOXENV: integration
PYTEST_ADDOPTS: "-v --color=yes -n4 --csv integration_results.csv"
DBT_INVOCATION_ENV: github-actions
steps:
- name: Check out the repository
uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4.3.0
with:
persist-credentials: false
python-version: ${{ matrix.python-version }}
- name: Set up postgres (linux)
if: runner.os == 'Linux'
uses: ./.github/actions/setup-postgres-linux
- name: Set up postgres (macos)
if: runner.os == 'macOS'
uses: ./.github/actions/setup-postgres-macos
- name: Set up postgres (windows)
if: runner.os == 'Windows'
uses: ./.github/actions/setup-postgres-windows
- name: Install python tools
run: |
# pip install --user --upgrade pip
# pip --version
pip install tox
tox --version
- name: Run tests
run: tox
- name: Get current date
if: always()
id: date
run: echo "::set-output name=date::$(date +'%Y_%m_%dT%H_%M_%S')" #no colons allowed for artifacts
- uses: actions/upload-artifact@v2
if: always()
with:
name: logs_${{ matrix.python-version }}_${{ matrix.os }}_${{ steps.date.outputs.date }}
path: ./logs
- uses: actions/upload-artifact@v2
if: always()
with:
name: integration_results_${{ matrix.python-version }}_${{ matrix.os }}_${{ steps.date.outputs.date }}.csv
path: integration_results.csv
build:
name: build packages
runs-on: ubuntu-20.04
steps:
- name: Check out the repository
uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v4.3.0
with:
python-version: 3.8
python-version: '3.8'
- name: Install python dependencies
run: |
@@ -146,44 +209,6 @@ jobs:
run: |
check-wheel-contents dist/*.whl --ignore W007,W008
- uses: actions/upload-artifact@v2
with:
name: dist
path: dist/
test-build:
name: verify packages / python ${{ matrix.python-version }} / ${{ matrix.os }}
needs: build
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: [3.7, 3.8, 3.9]
steps:
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install python dependencies
run: |
pip install --user --upgrade pip
pip install --upgrade wheel
pip --version
- uses: actions/download-artifact@v2
with:
name: dist
path: dist/
- name: Show distributions
run: ls -lh dist/
- name: Install wheel distributions
run: |
find ./dist/*.whl -maxdepth 1 -type f | xargs pip install --force-reinstall --find-links=dist/

View File

@@ -1,176 +0,0 @@
name: Performance Regression Tests
# Schedule triggers
on:
# runs twice a day at 10:05am and 10:05pm
schedule:
- cron: "5 10,22 * * *"
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
jobs:
# checks fmt of runner code
# purposefully not a dependency of any other job
# will block merging, but not prevent developing
fmt:
name: Cargo fmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add rustfmt
- uses: actions-rs/cargo@v1
with:
command: fmt
args: --manifest-path performance/runner/Cargo.toml --all -- --check
# runs any tests associated with the runner
# these tests make sure the runner logic is correct
test-runner:
name: Test Runner
runs-on: ubuntu-latest
env:
# turns warnings into errors
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: test
args: --manifest-path performance/runner/Cargo.toml
# build an optimized binary to be used as the runner in later steps
build-runner:
needs: [test-runner]
name: Build Runner
runs-on: ubuntu-latest
env:
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: build
args: --release --manifest-path performance/runner/Cargo.toml
- uses: actions/upload-artifact@v2
with:
name: runner
path: performance/runner/target/release/runner
# run the performance measurements on the current or default branch
measure-dev:
needs: [build-runner]
name: Measure Dev Branch
runs-on: ubuntu-latest
steps:
- name: checkout dev
uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: "3.8"
- name: install dbt
run: pip install -r dev-requirements.txt -r editable-requirements.txt
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run
run: ./runner measure -b dev -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: dev-results
path: performance/results/
# run the performance measurements on the release branch which we use
# as a performance baseline. This part takes by far the longest, so
# we do everything we can first so the job fails fast.
# -----
# we need to checkout dbt twice in this job: once for the baseline dbt
# version, and once to get the latest regression testing projects,
# metrics, and runner code from the develop or current branch so that
# the calculations match for both versions of dbt we are comparing.
measure-baseline:
needs: [build-runner]
name: Measure Baseline Branch
runs-on: ubuntu-latest
steps:
- name: checkout latest
uses: actions/checkout@v2
with:
ref: "0.20.latest"
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: "3.8"
- name: move repo up a level
run: mkdir ${{ github.workspace }}/../baseline/ && cp -r ${{ github.workspace }} ${{ github.workspace }}/../baseline
- name: "[debug] ls new dbt location"
run: ls ${{ github.workspace }}/../baseline/dbt/
# installation creates egg-links so we have to preserve source
- name: install dbt from new location
run: cd ${{ github.workspace }}/../baseline/dbt/ && pip install -r dev-requirements.txt -r editable-requirements.txt
# checkout the current branch to get all the target projects
# this deletes the old checked out code which is why we had to copy before
- name: checkout dev
uses: actions/checkout@v2
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run runner
run: ./runner measure -b baseline -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: baseline-results
path: performance/results/
# detect regressions on the output generated from measuring
# the two branches. Exits with non-zero code if a regression is detected.
calculate-regressions:
needs: [measure-dev, measure-baseline]
name: Compare Results
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
with:
name: dev-results
- uses: actions/download-artifact@v2
with:
name: baseline-results
- name: "[debug] ls result files"
run: ls
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: make results directory
run: mkdir ./final-output/
- name: run calculation
run: ./runner calculate -r ./ -o ./final-output/
# always attempt to upload the results even if there were regressions found
- uses: actions/upload-artifact@v2
if: ${{ always() }}
with:
name: final-calculations
path: ./final-output/*

View File

@@ -12,6 +12,9 @@
name: Docker release
permissions:
packages: write
on:
workflow_dispatch:
inputs:

View File

@@ -6,7 +6,6 @@
# version of our structured logging and add new documentation to
# communicate these changes.
name: Structured Logging Schema Check
on:
push:
@@ -23,16 +22,15 @@ jobs:
# run the performance measurements on the current or default branch
test-schema:
name: Test Log Schema
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
env:
# turns warnings into errors
RUSTFLAGS: "-D warnings"
# points tests to the log file
LOG_DIR: "/home/runner/work/dbt-core/dbt-core/logs"
# tells integration tests to output into json format
DBT_LOG_FORMAT: 'json'
DBT_LOG_FORMAT: "json"
steps:
- name: checkout dev
uses: actions/checkout@v2
with:
@@ -49,8 +47,12 @@ jobs:
toolchain: stable
override: true
- name: install dbt
run: pip install -r dev-requirements.txt -r editable-requirements.txt
- name: Install python dependencies
run: |
pip install --user --upgrade pip
pip --version
pip install tox
tox --version
- name: Set up postgres
uses: ./.github/actions/setup-postgres-linux
@@ -61,7 +63,7 @@ jobs:
# integration tests generate a ton of logs in different files. the next step will find them all.
# we actually care if these pass, because the normal test run doesn't usually include many json log outputs
- name: Run integration tests
run: tox -e py38-postgres -- -nauto
run: tox -e integration -- -nauto
# apply our schema tests to every log event from the previous step
# skips any output that isn't valid json

View File

@@ -1,18 +1,15 @@
# **what?**
# This workflow will take a version number and a dry run flag. With that
# This workflow will take the new version number to bump to. With that
# it will run versionbump to update the version number everywhere in the
# code base and then generate an updated Docker requirements file. If this
# is a dry run, a draft PR will open with the changes. If this isn't a dry
# run, the changes will be committed to the branch this is run on.
# code base and then run changie to create the corresponding changelog.
# A PR will be created with the changes that can be reviewed before committing.
# **why?**
# This is to aid in releasing dbt and making sure we have updated
# the versions and Docker requirements in all places.
# the version in all places and generated the changelog.
# **when?**
# This is triggered either manually OR
# from the repository_dispatch event "version-bump" which is sent from
# the dbt-release repo Action
# This is triggered manually
name: Version Bump
@@ -20,35 +17,21 @@ on:
workflow_dispatch:
inputs:
version_number:
description: 'The version number to bump to'
description: 'The version number to bump to (ex. 1.2.0, 1.3.0b1)'
required: true
is_dry_run:
description: 'Creates a draft PR to allow testing instead of committing to a branch'
required: true
default: 'true'
repository_dispatch:
types: [version-bump]
jobs:
bump:
runs-on: ubuntu-latest
steps:
- name: "[DEBUG] Print Variables"
run: |
echo "all variables defined as inputs"
echo The version_number: ${{ github.event.inputs.version_number }}
- name: Check out the repository
uses: actions/checkout@v2
- name: Set version and dry run values
id: variables
env:
VERSION_NUMBER: "${{ github.event.client_payload.version_number == '' && github.event.inputs.version_number || github.event.client_payload.version_number }}"
IS_DRY_RUN: "${{ github.event.client_payload.is_dry_run == '' && github.event.inputs.is_dry_run || github.event.client_payload.is_dry_run }}"
run: |
echo Repository dispatch event version: ${{ github.event.client_payload.version_number }}
echo Repository dispatch event dry run: ${{ github.event.client_payload.is_dry_run }}
echo Workflow dispatch event version: ${{ github.event.inputs.version_number }}
echo Workflow dispatch event dry run: ${{ github.event.inputs.is_dry_run }}
echo ::set-output name=VERSION_NUMBER::$VERSION_NUMBER
echo ::set-output name=IS_DRY_RUN::$IS_DRY_RUN
- uses: actions/setup-python@v2
with:
python-version: "3.8"
@@ -59,51 +42,80 @@ jobs:
source env/bin/activate
pip install --upgrade pip
- name: Create PR branch
if: ${{ steps.variables.outputs.IS_DRY_RUN == 'true' }}
- name: Add Homebrew to PATH
run: |
git checkout -b bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_$GITHUB_RUN_ID
git push origin bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_$GITHUB_RUN_ID
git branch --set-upstream-to=origin/bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_$GITHUB_RUN_ID bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_$GITHUB_RUN_ID
echo "/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin" >> $GITHUB_PATH
# - name: Generate Docker requirements
# run: |
# source env/bin/activate
# pip install -r requirements.txt
# pip freeze -l > docker/requirements/requirements.txt
# git status
- name: Install Homebrew packages
run: |
brew install pre-commit
brew tap miniscruff/changie https://github.com/miniscruff/changie
brew install changie
- name: Audit Version and Parse Into Parts
id: semver
uses: dbt-labs/actions/parse-semver@v1
with:
version: ${{ github.event.inputs.version_number }}
- name: Set branch value
id: variables
run: |
echo "::set-output name=BRANCH_NAME::prep-release/${{ github.event.inputs.version_number }}_$GITHUB_RUN_ID"
- name: Create PR branch
run: |
git checkout -b ${{ steps.variables.outputs.BRANCH_NAME }}
git push origin ${{ steps.variables.outputs.BRANCH_NAME }}
git branch --set-upstream-to=origin/${{ steps.variables.outputs.BRANCH_NAME }} ${{ steps.variables.outputs.BRANCH_NAME }}
- name: Bump version
run: |
source env/bin/activate
pip install -r dev-requirements.txt
env/bin/bumpversion --allow-dirty --new-version ${{steps.variables.outputs.VERSION_NUMBER}} major
env/bin/bumpversion --allow-dirty --new-version ${{ github.event.inputs.version_number }} major
git status
- name: Commit version bump directly
uses: EndBug/add-and-commit@v7
if: ${{ steps.variables.outputs.IS_DRY_RUN == 'false' }}
with:
author_name: 'Github Build Bot'
author_email: 'buildbot@fishtownanalytics.com'
message: 'Bumping version to ${{steps.variables.outputs.VERSION_NUMBER}}'
- name: Run changie
run: |
if [[ ${{ steps.semver.outputs.is-pre-release }} -eq 1 ]]
then
changie batch ${{ steps.semver.outputs.base-version }} --move-dir '${{ steps.semver.outputs.base-version }}' --prerelease '${{ steps.semver.outputs.pre-release }}'
else
changie batch ${{ steps.semver.outputs.base-version }} --include '${{ steps.semver.outputs.base-version }}' --remove-prereleases
fi
changie merge
git status
# this step will fail on whitespace errors but also correct them
- name: Remove trailing whitespace
continue-on-error: true
run: |
pre-commit run trailing-whitespace --files .bumpversion.cfg CHANGELOG.md .changes/*
git status
# this step will fail on newline errors but also correct them
- name: Removing extra newlines
continue-on-error: true
run: |
pre-commit run end-of-file-fixer --files .bumpversion.cfg CHANGELOG.md .changes/*
git status
- name: Commit version bump to branch
uses: EndBug/add-and-commit@v7
if: ${{ steps.variables.outputs.IS_DRY_RUN == 'true' }}
with:
author_name: 'Github Build Bot'
author_email: 'buildbot@fishtownanalytics.com'
message: 'Bumping version to ${{steps.variables.outputs.VERSION_NUMBER}}'
branch: 'bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_${{GITHUB.RUN_ID}}'
push: 'origin origin/bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_${{GITHUB.RUN_ID}}'
message: 'Bumping version to ${{ github.event.inputs.version_number }} and generate CHANGELOG'
branch: '${{ steps.variables.outputs.BRANCH_NAME }}'
push: 'origin origin/${{ steps.variables.outputs.BRANCH_NAME }}'
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
if: ${{ steps.variables.outputs.IS_DRY_RUN == 'true' }}
with:
author: 'Github Build Bot <buildbot@fishtownanalytics.com>'
draft: true
base: ${{github.ref}}
title: 'Bumping version to ${{steps.variables.outputs.VERSION_NUMBER}}'
branch: 'bumping-version/${{steps.variables.outputs.VERSION_NUMBER}}_${{GITHUB.RUN_ID}}'
title: 'Bumping version to ${{ github.event.inputs.version_number }} and generate changelog'
branch: '${{ steps.variables.outputs.BRANCH_NAME }}'
labels: |
Skip Changelog
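To connect the `Run changie` step above to the commands documented in `.changes/README.md`: assuming `parse-semver` splits an input such as `1.2.0rc1` into `base-version` `1.2.0` and `pre-release` `rc1` with `is-pre-release` set to 1, the step resolves to the prerelease form of the batch command; for a final version such as `1.2.0` it resolves to the `--include` form instead. Roughly:
```
# prerelease input 1.2.0rc1 (assumed parse-semver outputs)
changie batch 1.2.0 --move-dir '1.2.0' --prerelease 'rc1'
changie merge
# final input 1.2.0
changie batch 1.2.0 --include '1.2.0' --remove-prereleases
changie merge
```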

View File

@@ -21,7 +21,7 @@ repos:
- "markdown"
- id: check-case-conflict
- repo: https://github.com/psf/black
rev: 21.12b0
rev: 22.3.0
hooks:
- id: black
args:
@@ -35,7 +35,7 @@ repos:
- "--target-version=py38"
- "--check"
- "--diff"
- repo: https://gitlab.com/pycqa/flake8
- repo: https://github.com/pycqa/flake8
rev: 4.0.1
hooks:
- id: flake8

3655
CHANGELOG.md Normal file → Executable file

File diff suppressed because it is too large

View File

@@ -103,7 +103,7 @@ There are some tools that will be helpful to you in developing locally. While th
A short list of tools used in `dbt-core` testing that will be helpful to your understanding:
- [`tox`](https://tox.readthedocs.io/en/latest/) to manage virtualenvs across python versions. We currently target the latest patch releases for Python 3.7, Python 3.8, and Python 3.9
- [`tox`](https://tox.readthedocs.io/en/latest/) to manage virtualenvs across python versions. We currently target the latest patch releases for Python 3.7, Python 3.8, Python 3.9, and Python 3.10
- [`pytest`](https://docs.pytest.org/en/latest/) to discover/run tests
- [`make`](https://users.cs.duke.edu/~ola/courses/programming/Makefiles/Makefiles.html) - but don't worry too much, nobody _really_ understands how make works and our Makefile is super simple
- [`flake8`](https://flake8.pycqa.org/en/latest/) for code linting
@@ -203,7 +203,7 @@ suites.
#### `tox`
[`tox`](https://tox.readthedocs.io/en/latest/) takes care of managing virtualenvs and installing dependencies in order to run tests. You can also run tests in parallel, for example, you can run unit tests for Python 3.7, Python 3.8, and Python 3.9 checks in parallel with `tox -p`. Also, you can run unit tests for specific python versions with `tox -e py37`. The configuration for these tests is located in `tox.ini`.
[`tox`](https://tox.readthedocs.io/en/latest/) takes care of managing virtualenvs and installing dependencies in order to run tests. You can also run tests in parallel, for example, you can run unit tests for Python 3.7, Python 3.8, Python 3.9, and Python 3.10 checks in parallel with `tox -p`. Also, you can run unit tests for specific python versions with `tox -e py37`. The configuration for these tests is located in `tox.ini`.
#### `pytest`
@@ -219,6 +219,15 @@ python -m pytest test/unit/test_graph.py::GraphTest::test__dependency_list
```
> [Here](https://docs.pytest.org/en/reorganize-docs/new-docs/user/commandlineuseful.html)
> is a list of useful command-line options for `pytest` to use while developing.
## Adding CHANGELOG Entry
We use [changie](https://changie.dev) to generate `CHANGELOG` entries. Do not edit the `CHANGELOG.md` directly. Your modifications will be lost.
Follow the steps to [install `changie`](https://changie.dev/guide/installation/) for your system.
Once changie is installed and your PR is created, simply run `changie new` and changie will walk you through the process of creating a changelog entry. Commit the file that's created and your changelog entry is complete!
## Submitting a Pull Request
dbt Labs provides a CI environment to test changes to specific adapters, and periodic maintenance checks of `dbt-core` through Github Actions. For example, if you submit a pull request to the `dbt-redshift` repo, GitHub will trigger automated code checks and tests against Redshift.

View File

@@ -46,6 +46,9 @@ RUN apt-get update \
python3.9 \
python3.9-dev \
python3.9-venv \
python3.10 \
python3.10-dev \
python3.10-venv \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@@ -41,26 +41,20 @@ unit: .env ## Runs unit tests with py38.
.PHONY: test
test: .env ## Runs unit tests with py38 and code checks against staged changes.
@\
$(DOCKER_CMD) tox -p -e py38; \
$(DOCKER_CMD) tox -e py38; \
$(DOCKER_CMD) pre-commit run black-check --hook-stage manual | grep -v "INFO"; \
$(DOCKER_CMD) pre-commit run flake8-check --hook-stage manual | grep -v "INFO"; \
$(DOCKER_CMD) pre-commit run mypy-check --hook-stage manual | grep -v "INFO"
.PHONY: integration
integration: .env integration-postgres ## Alias for integration-postgres.
integration: .env ## Runs postgres integration tests with py38.
@\
$(DOCKER_CMD) tox -e py38-integration -- -nauto
.PHONY: integration-fail-fast
integration-fail-fast: .env integration-postgres-fail-fast ## Alias for integration-postgres-fail-fast.
.PHONY: integration-postgres
integration-postgres: .env ## Runs postgres integration tests with py38.
integration-fail-fast: .env ## Runs postgres integration tests with py38 in "fail fast" mode.
@\
$(DOCKER_CMD) tox -e py38-postgres -- -nauto
.PHONY: integration-postgres-fail-fast
integration-postgres-fail-fast: .env ## Runs postgres integration tests with py38 in "fail fast" mode.
@\
$(DOCKER_CMD) tox -e py38-postgres -- -x -nauto
$(DOCKER_CMD) tox -e py38-integration -- -x -nauto
.PHONY: setup-db
setup-db: ## Setup Postgres database with docker-compose for system testing.

View File

@@ -3,10 +3,7 @@
</p>
<p align="center">
<a href="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml">
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml/badge.svg?event=push" alt="Unit Tests Badge"/>
</a>
<a href="https://github.com/dbt-labs/dbt-core/actions/workflows/integration.yml">
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/integration.yml/badge.svg?event=push" alt="Integration Tests Badge"/>
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml/badge.svg?event=push" alt="CI Badge"/>
</a>
</p>

View File

@@ -3,10 +3,7 @@
</p>
<p align="center">
<a href="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml">
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml/badge.svg?event=push" alt="Unit Tests Badge"/>
</a>
<a href="https://github.com/dbt-labs/dbt-core/actions/workflows/integration.yml">
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/integration.yml/badge.svg?event=push" alt="Integration Tests Badge"/>
<img src="https://github.com/dbt-labs/dbt-core/actions/workflows/main.yml/badge.svg?event=push" alt="CI Badge"/>
</a>
</p>

View File

@@ -4,7 +4,7 @@ import os
# multiprocessing.RLock is a function returning this type
from multiprocessing.synchronize import RLock
from threading import get_ident
from typing import Dict, Tuple, Hashable, Optional, ContextManager, List, Union
from typing import Dict, Tuple, Hashable, Optional, ContextManager, List
import agate
@@ -281,15 +281,15 @@ class BaseConnectionManager(metaclass=abc.ABCMeta):
@abc.abstractmethod
def execute(
self, sql: str, auto_begin: bool = False, fetch: bool = False
) -> Tuple[Union[str, AdapterResponse], agate.Table]:
) -> Tuple[AdapterResponse, agate.Table]:
"""Execute the given SQL.
:param str sql: The sql to execute.
:param bool auto_begin: If set, and dbt is not currently inside a
transaction, automatically begin one.
:param bool fetch: If set, fetch results.
:return: A tuple of the status and the results (empty if fetch=False).
:rtype: Tuple[Union[str, AdapterResponse], agate.Table]
:return: A tuple of the query status and results (empty if fetch=False).
:rtype: Tuple[AdapterResponse, agate.Table]
"""
raise dbt.exceptions.NotImplementedException(
"`execute` is not implemented for this adapter!"

View File

@@ -221,7 +221,7 @@ class BaseAdapter(metaclass=AdapterMeta):
@available.parse(lambda *a, **k: ("", empty_table()))
def execute(
self, sql: str, auto_begin: bool = False, fetch: bool = False
) -> Tuple[Union[str, AdapterResponse], agate.Table]:
) -> Tuple[AdapterResponse, agate.Table]:
"""Execute the given SQL. This is a thin wrapper around
ConnectionManager.execute.
@@ -229,8 +229,8 @@ class BaseAdapter(metaclass=AdapterMeta):
:param bool auto_begin: If set, and dbt is not currently inside a
transaction, automatically begin one.
:param bool fetch: If set, fetch results.
:return: A tuple of the status and the results (empty if fetch=False).
:rtype: Tuple[Union[str, AdapterResponse], agate.Table]
:return: A tuple of the query status and results (empty if fetch=False).
:rtype: Tuple[AdapterResponse, agate.Table]
"""
return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch)
@@ -270,12 +270,15 @@ class BaseAdapter(metaclass=AdapterMeta):
"""
return self._macro_manifest_lazy
def load_macro_manifest(self) -> MacroManifest:
def load_macro_manifest(self, base_macros_only=False) -> MacroManifest:
# base_macros_only is for the test framework
if self._macro_manifest_lazy is None:
# avoid a circular import
from dbt.parser.manifest import ManifestLoader
manifest = ManifestLoader.load_macros(self.config, self.connections.set_query_header)
manifest = ManifestLoader.load_macros(
self.config, self.connections.set_query_header, base_macros_only=base_macros_only
)
# TODO CT-211
self._macro_manifest_lazy = manifest # type: ignore[assignment]
# TODO CT-211
@@ -337,11 +340,14 @@ class BaseAdapter(metaclass=AdapterMeta):
# databases
return info_schema_name_map
def _relations_cache_for_schemas(self, manifest: Manifest) -> None:
def _relations_cache_for_schemas(
self, manifest: Manifest, cache_schemas: Set[BaseRelation] = None
) -> None:
"""Populate the relations cache for the given schemas. Returns an
iterable of the schemas populated, as strings.
"""
cache_schemas = self._get_cache_schemas(manifest)
if not cache_schemas:
cache_schemas = self._get_cache_schemas(manifest)
with executor(self.config) as tpe:
futures: List[Future[List[BaseRelation]]] = []
for cache_schema in cache_schemas:
@@ -367,14 +373,16 @@ class BaseAdapter(metaclass=AdapterMeta):
cache_update.add((relation.database, relation.schema))
self.cache.update_schemas(cache_update)
def set_relations_cache(self, manifest: Manifest, clear: bool = False) -> None:
def set_relations_cache(
self, manifest: Manifest, clear: bool = False, required_schemas: Set[BaseRelation] = None
) -> None:
"""Run a query that gets a populated cache of the relations in the
database and set the cache on this adapter.
"""
with self.cache.lock:
if clear:
self.cache.clear()
self._relations_cache_for_schemas(manifest)
self._relations_cache_for_schemas(manifest, required_schemas)
@available
def cache_added(self, relation: Optional[BaseRelation]) -> str:

View File

@@ -177,6 +177,10 @@ def get_adapter(config: AdapterRequiredConfig):
return FACTORY.lookup_adapter(config.credentials.type)
def get_adapter_by_type(adapter_type):
return FACTORY.lookup_adapter(adapter_type)
def reset_adapters():
"""Clear the adapters. This is useful for tests, which change configs."""
FACTORY.reset_adapters()

View File

@@ -155,7 +155,7 @@ class AdapterProtocol( # type: ignore[misc]
def execute(
self, sql: str, auto_begin: bool = False, fetch: bool = False
) -> Tuple[Union[str, AdapterResponse], agate.Table]:
) -> Tuple[AdapterResponse, agate.Table]:
...
def get_compiler(self) -> Compiler_T:

View File

@@ -1,6 +1,6 @@
import abc
import time
from typing import List, Optional, Tuple, Any, Iterable, Dict, Union
from typing import List, Optional, Tuple, Any, Iterable, Dict
import agate
@@ -78,7 +78,7 @@ class SQLConnectionManager(BaseConnectionManager):
return connection, cursor
@abc.abstractclassmethod
def get_response(cls, cursor: Any) -> Union[AdapterResponse, str]:
def get_response(cls, cursor: Any) -> AdapterResponse:
"""Get the status of the cursor."""
raise dbt.exceptions.NotImplementedException(
"`get_response` is not implemented for this adapter!"
@@ -117,7 +117,7 @@ class SQLConnectionManager(BaseConnectionManager):
def execute(
self, sql: str, auto_begin: bool = False, fetch: bool = False
) -> Tuple[Union[AdapterResponse, str], agate.Table]:
) -> Tuple[AdapterResponse, agate.Table]:
sql = self._add_query_comment(sql)
_, cursor = self.add_query(sql, auto_begin)
response = self.get_response(cursor)

View File

@@ -27,7 +27,7 @@ ALTER_COLUMN_TYPE_MACRO_NAME = "alter_column_type"
class SQLAdapter(BaseAdapter):
"""The default adapter with the common agate conversions and some SQL
methods implemented. This adapter has a different much shorter list of
methods was implemented. This adapter has a different much shorter list of
methods to implement, but some more macros that must be implemented.
To implement a macro, implement "${adapter_type}__${macro_name}". in the
@@ -218,3 +218,25 @@ class SQLAdapter(BaseAdapter):
kwargs = {"information_schema": information_schema, "schema": schema}
results = self.execute_macro(CHECK_SCHEMA_EXISTS_MACRO_NAME, kwargs=kwargs)
return results[0][0] > 0
# This is for use in the test suite
def run_sql_for_tests(self, sql, fetch, conn):
cursor = conn.handle.cursor()
try:
cursor.execute(sql)
if hasattr(conn.handle, "commit"):
conn.handle.commit()
if fetch == "one":
return cursor.fetchone()
elif fetch == "all":
return cursor.fetchall()
else:
return
except BaseException as e:
if conn.handle and not getattr(conn.handle, "closed", True):
conn.handle.rollback()
print(sql)
print(e)
raise
finally:
conn.transaction_open = False

View File

@@ -80,7 +80,7 @@ def table_from_rows(
def table_from_data(data, column_names: Iterable[str]) -> agate.Table:
"Convert list of dictionaries into an Agate table"
"Convert a list of dictionaries into an Agate table"
# The agate table is generated from a list of dicts, so the column order
# from `data` is not preserved. We can use `select` to reorder the columns

View File

@@ -28,7 +28,7 @@ def _is_commit(revision: str) -> bool:
def _raise_git_cloning_error(repo, revision, error):
stderr = error.stderr.decode("utf-8").strip()
stderr = error.stderr.strip()
if "usage: git" in stderr:
stderr = stderr.split("\nusage: git")[0]
if re.match("fatal: destination path '(.+)' already exists", stderr):
@@ -115,8 +115,8 @@ def checkout(cwd, repo, revision=None):
try:
return _checkout(cwd, repo, revision)
except CommandResultError as exc:
stderr = exc.stderr.decode("utf-8").strip()
bad_package_spec(repo, revision, stderr)
stderr = exc.stderr.strip()
bad_package_spec(repo, revision, stderr)
def get_current_sha(cwd):
@@ -142,7 +142,7 @@ def clone_and_checkout(
subdirectory=subdirectory,
)
except CommandResultError as exc:
err = exc.stderr.decode("utf-8")
err = exc.stderr
exists = re.match("fatal: destination path '(.+)' already exists", err)
if not exists:
raise_git_cloning_problem(repo)

View File

@@ -103,7 +103,7 @@ class NativeSandboxEnvironment(MacroFuzzEnvironment):
class TextMarker(str):
"""A special native-env marker that indicates that a value is text and is
"""A special native-env marker that indicates a value is text and is
not to be evaluated. Use this to prevent your numbery-strings from becoming
numbers!
"""
@@ -580,7 +580,7 @@ def extract_toplevel_blocks(
allowed_blocks: Optional[Set[str]] = None,
collect_raw_data: bool = True,
) -> List[Union[BlockData, BlockTag]]:
"""Extract the top level blocks with matching block types from a jinja
"""Extract the top-level blocks with matching block types from a jinja
file, with some special handling for block nesting.
:param data: The data to extract blocks from.

View File

@@ -1,7 +1,17 @@
import functools
from typing import Any, Dict, List
import requests
from dbt.events.functions import fire_event
from dbt.events.types import RegistryProgressMakingGETRequest, RegistryProgressGETResponse
from dbt.events.types import (
RegistryProgressMakingGETRequest,
RegistryProgressGETResponse,
RegistryIndexProgressMakingGETRequest,
RegistryIndexProgressGETResponse,
RegistryResponseUnexpectedType,
RegistryResponseMissingTopKeys,
RegistryResponseMissingNestedKeys,
RegistryResponseExtraNestedKeys,
)
from dbt.utils import memoized, _connection_exception_retry as connection_exception_retry
from dbt import deprecations
import os
@@ -12,55 +22,77 @@ else:
DEFAULT_REGISTRY_BASE_URL = "https://hub.getdbt.com/"
def _get_url(url, registry_base_url=None):
def _get_url(name, registry_base_url=None):
if registry_base_url is None:
registry_base_url = DEFAULT_REGISTRY_BASE_URL
url = "api/v1/{}.json".format(name)
return "{}{}".format(registry_base_url, url)
def _get_with_retries(path, registry_base_url=None):
get_fn = functools.partial(_get, path, registry_base_url)
def _get_with_retries(package_name, registry_base_url=None):
get_fn = functools.partial(_get, package_name, registry_base_url)
return connection_exception_retry(get_fn, 5)
def _get(path, registry_base_url=None):
url = _get_url(path, registry_base_url)
def _get(package_name, registry_base_url=None):
url = _get_url(package_name, registry_base_url)
fire_event(RegistryProgressMakingGETRequest(url=url))
# all exceptions from requests get caught in the retry logic so no need to wrap this here
resp = requests.get(url, timeout=30)
fire_event(RegistryProgressGETResponse(url=url, resp_code=resp.status_code))
resp.raise_for_status()
# It is unexpected for the content of the response to be None so if it is, raising this error
# will cause this function to retry (if called within _get_with_retries) and hopefully get
# a response. This seems to happen when there's an issue with the Hub.
# The response should always be a dictionary. Anything else is unexpected, raise error.
# Raising this error will cause this function to retry (if called within _get_with_retries)
# and hopefully get a valid response. This seems to happen when there's an issue with the Hub.
# Since we control what we expect the HUB to return, this is safe.
# See https://github.com/dbt-labs/dbt-core/issues/4577
if resp.json() is None:
raise requests.exceptions.ContentDecodingError(
"Request error: The response is None", response=resp
# and https://github.com/dbt-labs/dbt-core/issues/4849
response = resp.json()
if not isinstance(response, dict): # This will also catch Nonetype
error_msg = (
f"Request error: Expected a response type of <dict> but got {type(response)} instead"
)
return resp.json()
fire_event(RegistryResponseUnexpectedType(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
# check for expected top level keys
expected_keys = {"name", "versions"}
if not expected_keys.issubset(response):
error_msg = (
f"Request error: Expected the response to contain keys {expected_keys} "
f"but is missing {expected_keys.difference(set(response))}"
)
fire_event(RegistryResponseMissingTopKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
def index(registry_base_url=None):
return _get_with_retries("api/v1/index.json", registry_base_url)
# check for the keys we need nested under each version
expected_version_keys = {"name", "packages", "downloads"}
all_keys = set().union(*(response["versions"][d] for d in response["versions"]))
if not expected_version_keys.issubset(all_keys):
error_msg = (
"Request error: Expected the response for the version to contain keys "
f"{expected_version_keys} but is missing {expected_version_keys.difference(all_keys)}"
)
fire_event(RegistryResponseMissingNestedKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
index_cached = memoized(index)
def packages(registry_base_url=None):
return _get_with_retries("api/v1/packages.json", registry_base_url)
def package(name, registry_base_url=None):
response = _get_with_retries("api/v1/{}.json".format(name), registry_base_url)
# all version responses should contain identical keys.
has_extra_keys = set().difference(*(response["versions"][d] for d in response["versions"]))
if has_extra_keys:
error_msg = (
"Request error: Keys for all versions do not match. Found extra key(s) "
f"of {has_extra_keys}."
)
fire_event(RegistryResponseExtraNestedKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
# Either redirectnamespace or redirectname in the JSON response indicate a redirect
# redirectnamespace redirects based on package ownership
# redirectname redirects based on package name
# Both can be present at the same time, or neither. Fails gracefully to old name
if ("redirectnamespace" in response) or ("redirectname" in response):
if ("redirectnamespace" in response) and response["redirectnamespace"] is not None:
@@ -74,15 +106,59 @@ def package(name, registry_base_url=None):
use_name = response["name"]
new_nwo = use_namespace + "/" + use_name
deprecations.warn("package-redirect", old_name=name, new_name=new_nwo)
deprecations.warn("package-redirect", old_name=package_name, new_name=new_nwo)
return response
def package_version(name, version, registry_base_url=None):
return _get_with_retries("api/v1/{}/{}.json".format(name, version), registry_base_url)
_get_cached = memoized(_get_with_retries)
def get_available_versions(name):
response = package(name)
return list(response["versions"])
def package(package_name, registry_base_url=None) -> Dict[str, Any]:
# returns a dictionary of metadata for all versions of a package
response = _get_cached(package_name, registry_base_url)
return response["versions"]
def package_version(package_name, version, registry_base_url=None) -> Dict[str, Any]:
# returns the metadata of a specific version of a package
response = package(package_name, registry_base_url)
return response[version]
def get_available_versions(package_name) -> List["str"]:
# returns a list of all available versions of a package
response = package(package_name)
return list(response)
def _get_index(registry_base_url=None):
url = _get_url("index", registry_base_url)
fire_event(RegistryIndexProgressMakingGETRequest(url=url))
# all exceptions from requests get caught in the retry logic so no need to wrap this here
resp = requests.get(url, timeout=30)
fire_event(RegistryIndexProgressGETResponse(url=url, resp_code=resp.status_code))
resp.raise_for_status()
# The response should be a list. Anything else is unexpected, raise an error.
# Raising this error will cause this function to retry and hopefully get a valid response.
response = resp.json()
if not isinstance(response, list): # This will also catch Nonetype
error_msg = (
f"Request error: The response type of {type(response)} is not valid: {resp.text}"
)
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
return response
def index(registry_base_url=None) -> List[str]:
# this returns a list of all packages on the Hub
get_index_fn = functools.partial(_get_index, registry_base_url)
return connection_exception_retry(get_index_fn, 5)
index_cached = memoized(index)
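
The refactored helpers above compose in layers: `_get_cached` fetches and memoizes the raw response, `package()` returns the per-version metadata map, `package_version()` indexes into it, and `get_available_versions()` lists its keys. A minimal usage sketch (not part of the diff; it assumes the module is importable as `dbt.clients.registry`, uses an illustrative package name, and needs network access to the Hub):

```python
from dbt.clients import registry  # assumed import path

PACKAGE = "dbt-labs/dbt_utils"  # illustrative package name

versions = registry.get_available_versions(PACKAGE)  # list of version strings
some_version = versions[0]                           # ordering is not guaranteed here
metadata = registry.package_version(PACKAGE, some_version)
# keys per the response check near the top of this file; .get() used to stay defensive
print(metadata.get("name"), metadata.get("downloads"))
```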

View File

@@ -335,7 +335,7 @@ def _handle_posix_cmd_error(exc: OSError, cwd: str, cmd: List[str]) -> NoReturn:
def _handle_posix_error(exc: OSError, cwd: str, cmd: List[str]) -> NoReturn:
"""OSError handling for posix systems.
"""OSError handling for POSIX systems.
Some things that could happen to trigger an OSError:
- cwd could not exist
@@ -386,7 +386,7 @@ def _handle_windows_error(exc: OSError, cwd: str, cmd: List[str]) -> NoReturn:
def _interpret_oserror(exc: OSError, cwd: str, cmd: List[str]) -> NoReturn:
"""Interpret an OSError exc and raise the appropriate dbt exception."""
"""Interpret an OSError exception and raise the appropriate dbt exception."""
if len(cmd) == 0:
raise dbt.exceptions.CommandError(cwd, cmd)
@@ -501,7 +501,7 @@ def move(src, dst):
directory on windows when it has read-only files in it and the move is
between two drives.
This is almost identical to the real shutil.move, except it uses our rmtree
This is almost identical to the real shutil.move, except that it uses our rmtree
and skips handling non-Windows OSes, since the existing implementation works fine there.
"""
src = convert_path(src)
@@ -536,7 +536,7 @@ def move(src, dst):
def rmtree(path):
"""Recursively remove path. On permissions errors on windows, try to remove
"""Recursively remove the path. On permissions errors on windows, try to remove
the read-only flag and try again.
"""
path = convert_path(path)

View File

@@ -132,7 +132,11 @@ def _all_source_paths(
analysis_paths: List[str],
macro_paths: List[str],
) -> List[str]:
return list(chain(model_paths, seed_paths, snapshot_paths, analysis_paths, macro_paths))
# We need to turn a list of lists into just a list, then convert to a set to
# get only unique elements, then back to a list
return list(
set(list(chain(model_paths, seed_paths, snapshot_paths, analysis_paths, macro_paths)))
)
T = TypeVar("T")

View File

@@ -1,12 +1,15 @@
from typing import Dict, Any, Tuple, Optional, Union, Callable
import re
import os
from dbt.clients.jinja import get_rendered, catch_jinja
from dbt.context.target import TargetContext
from dbt.context.secret import SecretContext
from dbt.context.secret import SecretContext, SECRET_PLACEHOLDER
from dbt.context.base import BaseContext
from dbt.contracts.connection import HasCredentials
from dbt.exceptions import DbtProjectError, CompilationException, RecursionException
from dbt.utils import deep_map_render
from dbt.logger import SECRET_ENV_PREFIX
Keypath = Tuple[Union[str, int], ...]
@@ -114,11 +117,9 @@ class DbtProjectYamlRenderer(BaseRenderer):
def name(self):
"Project config"
# Uses SecretRenderer
def get_package_renderer(self) -> BaseRenderer:
return PackageRenderer(self.context)
def get_selector_renderer(self) -> BaseRenderer:
return SelectorRenderer(self.context)
return PackageRenderer(self.ctx_obj.cli_vars)
def render_project(
self,
@@ -136,8 +137,7 @@ class DbtProjectYamlRenderer(BaseRenderer):
return package_renderer.render_data(packages)
def render_selectors(self, selectors: Dict[str, Any]):
selector_renderer = self.get_selector_renderer()
return selector_renderer.render_data(selectors)
return self.render_data(selectors)
def render_entry(self, value: Any, keypath: Keypath) -> Any:
result = super().render_entry(value, keypath)
@@ -165,18 +165,10 @@ class DbtProjectYamlRenderer(BaseRenderer):
return True
class SelectorRenderer(BaseRenderer):
@property
def name(self):
return "Selector config"
class SecretRenderer(BaseRenderer):
def __init__(self, cli_vars: Optional[Dict[str, Any]] = None) -> None:
def __init__(self, cli_vars: Dict[str, Any] = {}) -> None:
# Generate contexts here because we want to save the context
# object in order to retrieve the env_vars.
if cli_vars is None:
cli_vars = {}
self.ctx_obj = SecretContext(cli_vars)
context = self.ctx_obj.to_dict()
super().__init__(context)
@@ -185,6 +177,28 @@ class SecretRenderer(BaseRenderer):
def name(self):
return "Secret"
def render_value(self, value: Any, keypath: Optional[Keypath] = None) -> Any:
# First, standard Jinja rendering, with special handling for 'secret' environment variables
# "{{ env_var('DBT_SECRET_ENV_VAR') }}" -> "$$$DBT_SECRET_START$$$DBT_SECRET_ENV_{VARIABLE_NAME}$$$DBT_SECRET_END$$$"
# This prevents Jinja manipulation of secrets via macros/filters that might leak partial/modified values in logs
rendered = super().render_value(value, keypath)
# Now, detect instances of the placeholder value ($$$DBT_SECRET_START...DBT_SECRET_END$$$)
# and replace them with the actual secret value
if SECRET_ENV_PREFIX in str(rendered):
search_group = f"({SECRET_ENV_PREFIX}(.*))"
pattern = SECRET_PLACEHOLDER.format(search_group).replace("$", r"\$")
m = re.search(
pattern,
rendered,
)
if m:
found = m.group(1)
value = os.environ[found]
replace_this = SECRET_PLACEHOLDER.format(found)
return rendered.replace(replace_this, value)
else:
return rendered
class ProfileRenderer(SecretRenderer):
@property

View File

@@ -11,7 +11,7 @@ from .renderer import DbtProjectYamlRenderer, ProfileRenderer
from .utils import parse_cli_vars
from dbt import flags
from dbt.adapters.factory import get_relation_class_by_name, get_include_paths
from dbt.helper_types import FQNPath, PathSet
from dbt.helper_types import FQNPath, PathSet, DictDefaultEmptyStr
from dbt.config.profile import read_user_config
from dbt.contracts.connection import AdapterRequiredConfig, Credentials
from dbt.contracts.graph.manifest import ManifestMetadata
@@ -312,22 +312,26 @@ class RuntimeConfig(Project, Profile, AdapterRequiredConfig):
warn_or_error(msg, log_fmt=warning_tag("{}"))
def load_dependencies(self) -> Mapping[str, "RuntimeConfig"]:
def load_dependencies(self, base_only=False) -> Mapping[str, "RuntimeConfig"]:
if self.dependencies is None:
all_projects = {self.project_name: self}
internal_packages = get_include_paths(self.credentials.type)
# raise exception if fewer installed packages than in packages.yml
count_packages_specified = len(self.packages.packages) # type: ignore
count_packages_installed = len(tuple(self._get_project_directories()))
if count_packages_specified > count_packages_installed:
raise_compiler_error(
f"dbt found {count_packages_specified} package(s) "
f"specified in packages.yml, but only "
f"{count_packages_installed} package(s) installed "
f'in {self.packages_install_path}. Run "dbt deps" to '
f"install package dependencies."
)
project_paths = itertools.chain(internal_packages, self._get_project_directories())
if base_only:
# Test setup -- we want to load macros without dependencies
project_paths = itertools.chain(internal_packages)
else:
# raise exception if fewer installed packages than in packages.yml
count_packages_specified = len(self.packages.packages) # type: ignore
count_packages_installed = len(tuple(self._get_project_directories()))
if count_packages_specified > count_packages_installed:
raise_compiler_error(
f"dbt found {count_packages_specified} package(s) "
f"specified in packages.yml, but only "
f"{count_packages_installed} package(s) installed "
f'in {self.packages_install_path}. Run "dbt deps" to '
f"install package dependencies."
)
project_paths = itertools.chain(internal_packages, self._get_project_directories())
for project_name, project in self.load_projects(project_paths):
if project_name in all_projects:
raise_compiler_error(
@@ -396,7 +400,7 @@ class UnsetProfile(Profile):
self.threads = -1
def to_target_dict(self):
return {}
return DictDefaultEmptyStr({})
def __getattribute__(self, name):
if name in {"profile_name", "target_name", "threads"}:
@@ -431,7 +435,7 @@ class UnsetProfileConfig(RuntimeConfig):
def to_target_dict(self):
# re-override the poisoned profile behavior
return {}
return DictDefaultEmptyStr({})
@classmethod
def from_parts(

View File

@@ -3,7 +3,7 @@ from typing import Dict, Any, Union
from dbt.clients.yaml_helper import yaml, Loader, Dumper, load_yaml_text # noqa: F401
from dbt.dataclass_schema import ValidationError
from .renderer import SelectorRenderer
from .renderer import BaseRenderer
from dbt.clients.system import (
load_file_contents,
@@ -57,7 +57,7 @@ class SelectorConfig(Dict[str, Dict[str, Union[SelectionSpec, bool]]]):
def render_from_dict(
cls,
data: Dict[str, Any],
renderer: SelectorRenderer,
renderer: BaseRenderer,
) -> "SelectorConfig":
try:
rendered = renderer.render_data(data)
@@ -72,7 +72,7 @@ class SelectorConfig(Dict[str, Dict[str, Union[SelectionSpec, bool]]]):
def from_path(
cls,
path: Path,
renderer: SelectorRenderer,
renderer: BaseRenderer,
) -> "SelectorConfig":
try:
data = load_yaml_text(load_file_contents(str(path)))

View File

@@ -1,9 +1,15 @@
from typing import Dict, Any
from argparse import Namespace
from typing import Any, Dict, Optional, Union
from xmlrpc.client import Boolean
from dbt.contracts.project import UserConfig
import dbt.flags as flags
from dbt.clients import yaml_helper
from dbt.config import Profile, Project, read_user_config
from dbt.config.renderer import DbtProjectYamlRenderer, ProfileRenderer
from dbt.events.functions import fire_event
from dbt.exceptions import raise_compiler_error, ValidationException
from dbt.events.types import InvalidVarsYAML
from dbt.exceptions import ValidationException, raise_compiler_error
def parse_cli_vars(var_string: str) -> Dict[str, Any]:
@@ -21,3 +27,49 @@ def parse_cli_vars(var_string: str) -> Dict[str, Any]:
except ValidationException:
fire_event(InvalidVarsYAML())
raise
def get_project_config(
project_path: str,
profile_name: str,
args: Namespace = Namespace(),
cli_vars: Optional[Dict[str, Any]] = None,
profile: Optional[Profile] = None,
user_config: Optional[UserConfig] = None,
return_dict: Boolean = True,
) -> Union[Project, Dict]:
"""Returns a project config (dict or object) from a given project path and profile name.
Args:
project_path: Path to project
profile_name: Name of profile
args: An argparse.Namespace that represents what would have been passed in on the
command line (optional)
cli_vars: A dict of any vars that would have been passed in on the command line (optional)
(see parse_cli_vars above for formatting details)
profile: A dbt.config.profile.Profile object (optional)
user_config: A dbt.contracts.project.UserConfig object (optional)
return_dict: Return a dict if true, return the full dbt.config.project.Project object if false
Returns:
A full project config
"""
# Generate a profile if not provided
if profile is None:
# Generate user_config if not provided
if user_config is None:
user_config = read_user_config(flags.PROFILES_DIR)
# Update flags
flags.set_from_args(args, user_config)
if cli_vars is None:
cli_vars = {}
profile = Profile.render_from_args(args, ProfileRenderer(cli_vars), profile_name)
# Generate a project
project = Project.from_project_root(
project_path,
DbtProjectYamlRenderer(profile),
verify_version=bool(flags.VERSION_CHECK),
)
# Return
return project.to_project_config() if return_dict else project
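
For orientation, a minimal call sketch of the new helper (illustrative only; the project path and profile name below are hypothetical, and the module is assumed to be `dbt.config.utils`, where `parse_cli_vars` lives):

```python
from dbt.config.utils import get_project_config  # assumed module path

# With the defaults, flags and user_config are read from the profiles dir, and a
# dict representation of the project (dbt_project.yml) is returned.
config_dict = get_project_config(
    project_path="/path/to/my_project",  # hypothetical path
    profile_name="my_profile",           # hypothetical profile name
)
print(config_dict.get("name"), config_dict.get("version"))
```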

View File

@@ -1 +1,51 @@
# Contexts and Jinja rendering
Contexts are used for Jinja rendering. They include context methods, executable macros, and various settings that are available in Jinja.
The most common entrypoint to Jinja rendering in dbt is a method named `get_rendered`, which takes two arguments: templated code (string), and a context used to render it (dictionary).
The context is the bundle of information that is in "scope" when rendering Jinja-templated code. For instance, imagine a simple Jinja template:
```
{% set new_value = some_macro(some_variable) %}
```
Both `some_macro()` and `some_variable` must be defined in that context; otherwise, rendering will raise an error.
Different contexts are used in different places because we allow access to different methods and data in different places. Executable SQL, for example, includes all available macros and the model being run. The variables and macros in scope for Jinja in yaml files are much more limited.
### Implementation
The context that is passed to Jinja is always in a dictionary format, not an actual class, so a `to_dict()` is executed on a context class before it is used for rendering.
Each context has a `generate_<name>_context` function to create the context. `ProviderContext` subclasses have different generate functions for parsing and for execution, so that certain functions (notably `ref`, `source`, and `config`) can return different results.
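A minimal sketch of that flow, using a hand-built context dictionary instead of a generated one (real contexts carry many more methods and macros):

```python
from dbt.clients.jinja import get_rendered  # import path shown elsewhere in this diff

# A toy context: callables behave like macros, plain values like variables.
context = {
    "some_variable": "orders",
    "some_macro": lambda value: f"analytics.{value}",
}

rendered = get_rendered(
    "{% set new_value = some_macro(some_variable) %}{{ new_value }}",
    context,
)
print(rendered)  # -> analytics.orders
```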
### Hierarchy
All contexts inherit from the `BaseContext`, which includes "pure" methods (e.g. `tojson`), `env_var()`, and `var()` (but only CLI values, passed via `--vars`).
Methods available in parent contexts are also available in child contexts.
```
BaseContext -- core/dbt/context/base.py
SecretContext -- core/dbt/context/secret.py
TargetContext -- core/dbt/context/target.py
ConfiguredContext -- core/dbt/context/configured.py
SchemaYamlContext -- core/dbt/context/configured.py
DocsRuntimeContext -- core/dbt/context/configured.py
MacroResolvingContext -- core/dbt/context/configured.py
ManifestContext -- core/dbt/context/manifest.py
QueryHeaderContext -- core/dbt/context/manifest.py
ProviderContext -- core/dbt/context/provider.py
MacroContext -- core/dbt/context/provider.py
ModelContext -- core/dbt/context/provider.py
TestContext -- core/dbt/context/provider.py
```
### Contexts for configuration
Contexts for rendering "special" `.yml` (configuration) files:
- `SecretContext`: Supports "secret" env vars, which are prefixed with `DBT_ENV_SECRET_`. Used for rendering in `profiles.yml` and `packages.yml` ONLY. Secrets defined elsewhere will raise explicit errors.
- `TargetContext`: The same as `Base`, plus `target` (connection profile). Used most notably in `dbt_project.yml` and `selectors.yml`.
Contexts for other `.yml` files in the project:
- `SchemaYamlContext`: Supports `vars` declared on the CLI and in `dbt_project.yml`. Does not support custom macros, beyond `var()` + `env_var()` methods. Used for all `.yml` files, to define properties and configuration.
- `DocsRuntimeContext`: Standard `.yml` file context, plus `doc()` method (with all `docs` blocks in scope). Used to resolve `description` properties.

View File

@@ -24,38 +24,7 @@ import pytz
import datetime
import re
# Contexts in dbt Core
# Contexts are used for Jinja rendering. They include context methods,
# executable macros, and various settings that are available in Jinja.
#
# Different contexts are used in different places because we allow access
# to different methods and data in different places. Executable SQL, for
# example, includes the available macros and the model, while Jinja in
# yaml files is more limited.
#
# The context that is passed to Jinja is always in a dictionary format,
# not an actual class, so a 'to_dict()' is executed on a context class
# before it is used for rendering.
#
# Each context has a generate_<name>_context function to create the context.
# ProviderContext subclasses have different generate functions for
# parsing and for execution.
#
# Context class hierarchy
#
# BaseContext -- core/dbt/context/base.py
# SecretContext -- core/dbt/context/secret.py
# TargetContext -- core/dbt/context/target.py
# ConfiguredContext -- core/dbt/context/configured.py
# SchemaYamlContext -- core/dbt/context/configured.py
# DocsRuntimeContext -- core/dbt/context/configured.py
# MacroResolvingContext -- core/dbt/context/configured.py
# ManifestContext -- core/dbt/context/manifest.py
# QueryHeaderContext -- core/dbt/context/manifest.py
# ProviderContext -- core/dbt/context/provider.py
# MacroContext -- core/dbt/context/provider.py
# ModelContext -- core/dbt/context/provider.py
# TestContext -- core/dbt/context/provider.py
# See the `contexts` module README for more information on how contexts work
def get_pytz_module_context() -> Dict[str, Any]:
@@ -552,9 +521,8 @@ class BaseContext(metaclass=ContextMeta):
{% endif %}
This supports all flags defined in flags submodule (core/dbt/flags.py)
TODO: Replace with object that provides read-only access to flag values
"""
return flags
return flags.get_flag_obj()
@contextmember
@staticmethod
@@ -569,7 +537,9 @@ class BaseContext(metaclass=ContextMeta):
{{ print("Running some_macro: " ~ arg1 ~ ", " ~ arg2) }}
{% endmacro %}"
"""
print(msg)
if not flags.NO_PRINT:
print(msg)
return ""

View File

@@ -62,6 +62,8 @@ from dbt.node_types import NodeType
from dbt.utils import merge, AttrDict, MultiDict
from dbt import selected_resources
import agate
@@ -1143,11 +1145,20 @@ class ProviderContext(ManifestContext):
msg = f"Env var required but not provided: '{var}'"
raise_parsing_error(msg)
@contextproperty
def selected_resources(self) -> List[str]:
"""The `selected_resources` variable contains a list of the resources
selected based on the parameters provided to the dbt command.
Currently, it is not populated for the `run-operation` command, which
does not support `--select`.
"""
return selected_resources.SELECTED_RESOURCES
class MacroContext(ProviderContext):
"""Internally, macros can be executed like nodes, with some restrictions:
- they don't have have all values available that nodes do:
- they don't have all values available that nodes do:
- 'this', 'pre_hooks', 'post_hooks', and 'sql' are missing
- 'schema' does not use any 'model' information
- they can't be configured with config() directives

View File

@@ -7,6 +7,9 @@ from dbt.exceptions import raise_parsing_error
from dbt.logger import SECRET_ENV_PREFIX
SECRET_PLACEHOLDER = "$$$DBT_SECRET_START$$${}$$$DBT_SECRET_END$$$"
class SecretContext(BaseContext):
"""This context is used in profiles.yml + packages.yml. It can render secret
env vars that aren't usable elsewhere"""
@@ -18,21 +21,29 @@ class SecretContext(BaseContext):
If the default is None, raise an exception for an undefined variable.
In this context *only*, env_var will return the actual values of
env vars prefixed with DBT_ENV_SECRET_
In this context *only*, env_var will accept env vars prefixed with DBT_ENV_SECRET_.
It will return the name of the secret env var, wrapped in 'start' and 'end' identifiers.
The actual value will be subbed in later in SecretRenderer.render_value()
"""
return_value = None
if var in os.environ:
# if this is a 'secret' env var, just return the name of the env var
# instead of rendering the actual value here, to avoid any risk of
# Jinja manipulation. it will be subbed out later, in SecretRenderer.render_value
if var in os.environ and var.startswith(SECRET_ENV_PREFIX):
return SECRET_PLACEHOLDER.format(var)
elif var in os.environ:
return_value = os.environ[var]
elif default is not None:
return_value = default
if return_value is not None:
# do not save secret environment variables
# store env vars in the internal manifest to power partial parsing
# if it's a 'secret' env var, we shouldn't even get here
# but just to be safe — don't save secrets
if not var.startswith(SECRET_ENV_PREFIX):
self.env_vars[var] = return_value
# return the value even if it's a secret
return return_value
else:
msg = f"Env var required but not provided: '{var}'"

View File

@@ -104,7 +104,7 @@ class Connection(ExtensibleDbtClassMixin, Replaceable):
class LazyHandle:
"""Opener must be a callable that takes a Connection object and opens the
"""The opener must be a callable that takes a Connection object and opens the
connection, updating the handle on the Connection.
"""

View File

@@ -453,7 +453,7 @@ T = TypeVar("T", bound=GraphMemberNode)
def _update_into(dest: MutableMapping[str, T], new_item: T):
"""Update dest to overwrite whatever is at dest[new_item.unique_id] with
new_itme. There must be an existing value to overwrite, and they two nodes
new_item. There must be an existing value to overwrite, and the two nodes
must have the same original file path.
"""
unique_id = new_item.unique_id
@@ -1091,7 +1091,7 @@ AnyManifest = Union[Manifest, MacroManifest]
@dataclass
@schema_version("manifest", 4)
@schema_version("manifest", 5)
class WritableManifest(ArtifactMixin):
nodes: Mapping[UniqueID, ManifestNode] = field(
metadata=dict(description=("The nodes defined in the dbt project and its dependencies"))
@@ -1135,6 +1135,10 @@ class WritableManifest(ArtifactMixin):
)
)
@classmethod
def compatible_previous_versions(self):
return [("manifest", 4)]
def _check_duplicates(value: HasUniqueID, src: Mapping[str, HasUniqueID]):
if value.unique_id in src:

View File

@@ -335,6 +335,40 @@ class BaseConfig(AdditionalPropertiesAllowed, Replaceable):
@dataclass
class SourceConfig(BaseConfig):
enabled: bool = True
# to be implemented to complete CT-201
# quoting: Dict[str, Any] = field(
# default_factory=dict,
# metadata=MergeBehavior.Update.meta(),
# )
# freshness: Optional[Dict[str, Any]] = field(
# default=None,
# metadata=CompareBehavior.Exclude.meta(),
# )
# loader: Optional[str] = field(
# default=None,
# metadata=CompareBehavior.Exclude.meta(),
# )
# # TODO what type is this? docs say: "<column_name_or_expression>"
# loaded_at_field: Optional[str] = field(
# default=None,
# metadata=CompareBehavior.Exclude.meta(),
# )
# database: Optional[str] = field(
# default=None,
# metadata=CompareBehavior.Exclude.meta(),
# )
# schema: Optional[str] = field(
# default=None,
# metadata=CompareBehavior.Exclude.meta(),
# )
# meta: Dict[str, Any] = field(
# default_factory=dict,
# metadata=MergeBehavior.Update.meta(),
# )
# tags: Union[List[str], str] = field(
# default_factory=list_str,
# metadata=metas(ShowBehavior.Hide, MergeBehavior.Append, CompareBehavior.Exclude),
# )
@dataclass
@@ -389,7 +423,9 @@ class NodeConfig(NodeAndTestConfig):
metadata=MergeBehavior.Update.meta(),
)
full_refresh: Optional[bool] = None
unique_key: Optional[Union[str, List[str]]] = None
# 'unique_key' doesn't use 'Optional' because typing.get_type_hints was
# sometimes getting the Union order wrong, causing serialization failures.
unique_key: Union[str, List[str], None] = None
on_schema_change: Optional[str] = "ignore"
@classmethod
@@ -483,7 +519,8 @@ class SnapshotConfig(EmptySnapshotConfig):
target_schema: Optional[str] = None
target_database: Optional[str] = None
updated_at: Optional[str] = None
check_cols: Optional[Union[str, List[str]]] = None
# Not using Optional because of serialization issues with a Union of str and List[str]
check_cols: Union[str, List[str], None] = None
@classmethod
def validate(cls, data):

View File

@@ -242,6 +242,7 @@ class Quoting(dbtClassMixin, Mergeable):
@dataclass
class UnparsedSourceTableDefinition(HasColumnTests, HasTests):
config: Dict[str, Any] = field(default_factory=dict)
loaded_at_field: Optional[str] = None
identifier: Optional[str] = None
quoting: Quoting = field(default_factory=Quoting)
@@ -322,6 +323,7 @@ class SourcePatch(dbtClassMixin, Replaceable):
path: Path = field(
metadata=dict(description="The path to the patch-defining yml file"),
)
config: Dict[str, Any] = field(default_factory=dict)
description: Optional[str] = None
meta: Optional[Dict[str, Any]] = None
database: Optional[str] = None

View File

@@ -253,6 +253,7 @@ class UserConfig(ExtensibleDbtClassMixin, Replaceable, UserConfigContract):
use_experimental_parser: Optional[bool] = None
static_parser: Optional[bool] = None
indirect_selection: Optional[str] = None
cache_selected_only: Optional[bool] = None
@dataclass

View File

@@ -85,6 +85,7 @@ class RunStatus(StrEnum):
class TestStatus(StrEnum):
__test__ = False
Pass = NodeStatus.Pass
Error = NodeStatus.Error
Fail = NodeStatus.Fail

View File

@@ -1,20 +1,23 @@
from pathlib import Path
from .graph.manifest import WritableManifest
from .results import RunResultsArtifact
from .results import FreshnessExecutionResultArtifact
from typing import Optional
from dbt.exceptions import IncompatibleSchemaException
class PreviousState:
def __init__(self, path: Path):
def __init__(self, path: Path, current_path: Path):
self.path: Path = path
self.current_path: Path = current_path
self.manifest: Optional[WritableManifest] = None
self.results: Optional[RunResultsArtifact] = None
self.sources: Optional[FreshnessExecutionResultArtifact] = None
self.sources_current: Optional[FreshnessExecutionResultArtifact] = None
manifest_path = self.path / "manifest.json"
if manifest_path.exists() and manifest_path.is_file():
try:
# we want to bail with an error if schema versions don't match
self.manifest = WritableManifest.read_and_check_versions(str(manifest_path))
except IncompatibleSchemaException as exc:
exc.add_filename(str(manifest_path))
@@ -23,8 +26,27 @@ class PreviousState:
results_path = self.path / "run_results.json"
if results_path.exists() and results_path.is_file():
try:
# we want to bail with an error if schema versions don't match
self.results = RunResultsArtifact.read_and_check_versions(str(results_path))
except IncompatibleSchemaException as exc:
exc.add_filename(str(results_path))
raise
sources_path = self.path / "sources.json"
if sources_path.exists() and sources_path.is_file():
try:
self.sources = FreshnessExecutionResultArtifact.read_and_check_versions(
str(sources_path)
)
except IncompatibleSchemaException as exc:
exc.add_filename(str(sources_path))
raise
sources_current_path = self.current_path / "sources.json"
if sources_current_path.exists() and sources_current_path.is_file():
try:
self.sources_current = FreshnessExecutionResultArtifact.read_and_check_versions(
str(sources_current_path)
)
except IncompatibleSchemaException as exc:
exc.add_filename(str(sources_current_path))
raise

View File

@@ -201,6 +201,14 @@ class VersionedSchema(dbtClassMixin):
result["$id"] = str(cls.dbt_schema_version)
return result
@classmethod
def is_compatible_version(cls, schema_version):
compatible_versions = [str(cls.dbt_schema_version)]
if hasattr(cls, "compatible_previous_versions"):
for name, version in cls.compatible_previous_versions():
compatible_versions.append(str(SchemaVersion(name, version)))
return str(schema_version) in compatible_versions
@classmethod
def read_and_check_versions(cls, path: str):
try:
@@ -217,7 +225,7 @@ class VersionedSchema(dbtClassMixin):
if "metadata" in data and "dbt_schema_version" in data["metadata"]:
previous_schema_version = data["metadata"]["dbt_schema_version"]
# cls.dbt_schema_version is a SchemaVersion object
if str(cls.dbt_schema_version) != previous_schema_version:
if not cls.is_compatible_version(previous_schema_version):
raise IncompatibleSchemaException(
expected=str(cls.dbt_schema_version), found=previous_schema_version
)
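
In practice, this means the `WritableManifest` (now schema version 5, declaring version 4 as compatible above) will accept a previously written v4 manifest instead of raising `IncompatibleSchemaException`. A hedged sketch, assuming the artifact's `dbt_schema_version` strings follow the usual `https://schemas.getdbt.com/dbt/<name>/v<version>.json` form:

```python
from dbt.contracts.graph.manifest import WritableManifest  # import path shown earlier in this diff

# The URL format below is an assumption about how SchemaVersion stringifies.
print(WritableManifest.is_compatible_version("https://schemas.getdbt.com/dbt/manifest/v5.json"))  # True: current
print(WritableManifest.is_compatible_version("https://schemas.getdbt.com/dbt/manifest/v4.json"))  # True: declared compatible
print(WritableManifest.is_compatible_version("https://schemas.getdbt.com/dbt/manifest/v3.json"))  # False
```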

View File

@@ -35,7 +35,7 @@ class DateTimeSerialization(SerializationStrategy):
# jsonschemas for every class and the 'validate' method
# come from Hologram.
class dbtClassMixin(DataClassDictMixin, JsonSchemaMixin):
"""Mixin which adds methods to generate a JSON schema and
"""The Mixin adds methods to generate a JSON schema and
convert to and from JSON encodable dicts with validation
against the schema
"""

View File

@@ -64,7 +64,7 @@ class Event(metaclass=ABCMeta):
# in theory threads can change so we don't cache them.
def get_thread_name(self) -> str:
return threading.current_thread().getName()
return threading.current_thread().name
@classmethod
def get_invocation_id(cls) -> str:

View File

@@ -15,7 +15,7 @@ def format_fancy_output_line(
progress = ""
else:
progress = "{} of {} ".format(index, total)
prefix = "{progress}{message}".format(progress=progress, message=msg)
prefix = "{progress}{message} ".format(progress=progress, message=msg)
truncate_width = ui.printer_width() - 3
justified = prefix.ljust(ui.printer_width(), ".")

View File

@@ -1,4 +1,3 @@
import colorama
from colorama import Style
import dbt.events.functions as this # don't worry I hate it too.
from dbt.events.base_types import NoStdOut, Event, NoFile, ShowException, Cache
@@ -50,25 +49,6 @@ format_color = True
format_json = False
invocation_id: Optional[str] = None
# Colorama needs some help on windows because we're using logger.info
# instead of print(). If the Windows env doesn't have a TERM var set,
# then we should override the logging stream to use the colorama
# converter. If the TERM var is set (as with Git Bash), then it's safe
# to send escape characters and no log handler injection is needed.
colorama_stdout = sys.stdout
colorama_wrap = True
colorama.init(wrap=colorama_wrap)
if sys.platform == "win32" and not os.getenv("TERM"):
colorama_wrap = False
colorama_stdout = colorama.AnsiToWin32(sys.stdout).stream
elif sys.platform == "win32":
colorama_wrap = False
colorama.init(wrap=colorama_wrap)
def setup_event_logger(log_path, level_override=None):
# flags have been resolved, and log_path is known
@@ -205,8 +185,8 @@ def create_debug_text_log_line(e: T_Event) -> str:
scrubbed_msg: str = scrub_secrets(e.message(), env_secrets())
level: str = e.level_tag() if len(e.level_tag()) == 5 else f"{e.level_tag()} "
thread = ""
if threading.current_thread().getName():
thread_name = threading.current_thread().getName()
if threading.current_thread().name:
thread_name = threading.current_thread().name
thread_name = thread_name[:10]
thread_name = thread_name.ljust(10, " ")
thread = f" [{thread_name}]:"

View File

@@ -291,6 +291,25 @@ class GitProgressCheckedOutAt(DebugLevel):
return f" Checked out at {self.end_sha}."
@dataclass
class RegistryIndexProgressMakingGETRequest(DebugLevel):
url: str
code: str = "M022"
def message(self) -> str:
return f"Making package index registry request: GET {self.url}"
@dataclass
class RegistryIndexProgressGETResponse(DebugLevel):
url: str
resp_code: int
code: str = "M023"
def message(self) -> str:
return f"Response from registry index: GET {self.url} {self.resp_code}"
@dataclass
class RegistryProgressMakingGETRequest(DebugLevel):
url: str
@@ -310,6 +329,45 @@ class RegistryProgressGETResponse(DebugLevel):
return f"Response from registry: GET {self.url} {self.resp_code}"
@dataclass
class RegistryResponseUnexpectedType(DebugLevel):
response: str
code: str = "M024"
def message(self) -> str:
return f"Response was None: {self.response}"
@dataclass
class RegistryResponseMissingTopKeys(DebugLevel):
response: str
code: str = "M025"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response missing top level keys: {self.response}"
@dataclass
class RegistryResponseMissingNestedKeys(DebugLevel):
response: str
code: str = "M026"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response missing nested keys: {self.response}"
@dataclass
class RegistryResponseExtraNestedKeys(DebugLevel):
response: str
code: str = "M027"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response contained inconsistent keys: {self.response}"
# TODO this was actually `logger.exception(...)` not `logger.error(...)`
@dataclass
class SystemErrorRetrievingModTime(ErrorLevel):
@@ -2346,7 +2404,7 @@ class TrackingInitializeFailure(ShowException, DebugLevel):
class RetryExternalCall(DebugLevel):
attempt: int
max: int
code: str = "Z045"
code: str = "M020"
def message(self) -> str:
return f"Retrying external call. Attempt: {self.attempt} Max attempts: {self.max}"
@@ -2381,7 +2439,19 @@ class EventBufferFull(WarnLevel):
code: str = "Z048"
def message(self) -> str:
return "Internal event buffer full. Earliest events will be dropped (FIFO)."
return (
"Internal logging/event buffer full."
"Earliest logs/events will be dropped as new ones are fired (FIFO)."
)
@dataclass
class RecordRetryException(DebugLevel):
exc: Exception
code: str = "M021"
def message(self) -> str:
return f"External call exception: {self.exc}"
# since mypy doesn't run on every file we need to suggest to mypy that every
@@ -2413,6 +2483,14 @@ if 1 == 0:
GitNothingToDo(sha="")
GitProgressUpdatedCheckoutRange(start_sha="", end_sha="")
GitProgressCheckedOutAt(end_sha="")
RegistryIndexProgressMakingGETRequest(url="")
RegistryIndexProgressGETResponse(url="", resp_code=1234)
RegistryProgressMakingGETRequest(url="")
RegistryProgressGETResponse(url="", resp_code=1234)
RegistryResponseUnexpectedType(response=""),
RegistryResponseMissingTopKeys(response=""),
RegistryResponseMissingNestedKeys(response=""),
RegistryResponseExtraNestedKeys(response=""),
SystemErrorRetrievingModTime(path="")
SystemCouldNotWrite(path="", reason="", exc=Exception(""))
SystemExecutingCmd(cmd=[""])
@@ -2737,3 +2815,4 @@ if 1 == 0:
GeneralWarningMsg(msg="", log_fmt="")
GeneralWarningException(exc=Exception(""), log_fmt="")
EventBufferFull()
RecordRetryException(exc=Exception(""))

View File

@@ -383,10 +383,11 @@ class FailedToConnectException(DatabaseException):
class CommandError(RuntimeException):
def __init__(self, cwd, cmd, message="Error running command"):
cmd_scrubbed = list(scrub_secrets(cmd_txt, env_secrets()) for cmd_txt in cmd)
super().__init__(message)
self.cwd = cwd
self.cmd = cmd
self.args = (cwd, cmd, message)
self.cmd = cmd_scrubbed
self.args = (cwd, cmd_scrubbed, message)
def __str__(self):
if len(self.cmd) == 0:
@@ -411,9 +412,9 @@ class CommandResultError(CommandError):
def __init__(self, cwd, cmd, returncode, stdout, stderr, message="Got a non-zero returncode"):
super().__init__(cwd, cmd, message)
self.returncode = returncode
self.stdout = stdout
self.stderr = stderr
self.args = (cwd, cmd, returncode, stdout, stderr, message)
self.stdout = scrub_secrets(stdout.decode("utf-8"), env_secrets())
self.stderr = scrub_secrets(stderr.decode("utf-8"), env_secrets())
self.args = (cwd, self.cmd, returncode, self.stdout, self.stderr, message)
def __str__(self):
return "{} running: {}".format(self.msg, self.cmd)
@@ -501,7 +502,7 @@ def invalid_type_error(
def invalid_bool_error(got_value, macro_name) -> NoReturn:
"""Raise a CompilationException when an macro expects a boolean but gets some
"""Raise a CompilationException when a macro expects a boolean but gets some
other value.
"""
msg = (
@@ -704,7 +705,6 @@ def missing_materialization(model, adapter_type):
def bad_package_spec(repo, spec, error_message):
msg = "Error checking out spec='{}' for repo {}\n{}".format(spec, repo, error_message)
raise InternalException(scrub_secrets(msg, env_secrets()))
@@ -838,31 +838,47 @@ def raise_duplicate_macro_name(node_1, node_2, namespace) -> NoReturn:
def raise_duplicate_resource_name(node_1, node_2):
duped_name = node_1.name
node_type = NodeType(node_1.resource_type)
pluralized = (
node_type.pluralize()
if node_1.resource_type == node_2.resource_type
else "resources" # still raise if ref() collision, e.g. model + seed
)
if node_1.resource_type in NodeType.refable():
get_func = 'ref("{}")'.format(duped_name)
elif node_1.resource_type == NodeType.Source:
action = "looking for"
# duplicate 'ref' targets
if node_type in NodeType.refable():
formatted_name = f'ref("{duped_name}")'
# duplicate sources
elif node_type == NodeType.Source:
duped_name = node_1.get_full_source_name()
get_func = node_1.get_source_representation()
elif node_1.resource_type == NodeType.Documentation:
get_func = 'doc("{}")'.format(duped_name)
elif node_1.resource_type == NodeType.Test and "schema" in node_1.tags:
return
formatted_name = node_1.get_source_representation()
# duplicate docs blocks
elif node_type == NodeType.Documentation:
formatted_name = f'doc("{duped_name}")'
# duplicate generic tests
elif node_type == NodeType.Test and hasattr(node_1, "test_metadata"):
column_name = f'column "{node_1.column_name}" in ' if node_1.column_name else ""
model_name = node_1.file_key_name
duped_name = f'{node_1.name}" defined on {column_name}"{model_name}'
action = "running"
formatted_name = "tests"
# all other resource types
else:
get_func = '"{}"'.format(duped_name)
formatted_name = duped_name
# should this be raise_parsing_error instead?
raise_compiler_error(
'dbt found two resources with the name "{}". Since these resources '
"have the same name,\ndbt will be unable to find the correct resource "
"when {} is used. To fix this,\nchange the name of one of "
"these resources:\n- {} ({})\n- {} ({})".format(
duped_name,
get_func,
node_1.unique_id,
node_1.original_file_path,
node_2.unique_id,
node_2.original_file_path,
)
f"""
dbt found two {pluralized} with the name "{duped_name}".
Since these resources have the same name, dbt will be unable to find the correct resource
when {action} {formatted_name}.
To fix this, change the name of one of these resources:
- {node_1.unique_id} ({node_1.original_file_path})
- {node_2.unique_id} ({node_2.original_file_path})
""".strip()
)
@@ -966,11 +982,11 @@ def raise_duplicate_source_patch_name(patch_1, patch_2):
)
def raise_invalid_schema_yml_version(path, issue):
def raise_invalid_property_yml_version(path, issue):
raise_compiler_error(
"The schema file at {} is invalid because {}. Please consult the "
"documentation for more information on schema.yml syntax:\n\n"
"https://docs.getdbt.com/docs/schemayml-files".format(path, issue)
"The yml property file at {} is invalid because {}. Please consult the "
"documentation for more information on yml property file syntax:\n\n"
"https://docs.getdbt.com/reference/configs-and-properties".format(path, issue)
)
@@ -1048,7 +1064,7 @@ CONTEXT_EXPORTS = {
raise_dependency_error,
raise_duplicate_patch_name,
raise_duplicate_resource_name,
raise_invalid_schema_yml_version,
raise_invalid_property_yml_version,
raise_not_implemented,
relation_wrong_type,
]

View File

@@ -1,7 +1,9 @@
import os
# Do not import the os package because we expose this package in jinja
from os import name as os_name, path as os_path, getenv as os_getenv
import multiprocessing
from argparse import Namespace
if os.name != "nt":
if os_name != "nt":
# https://bugs.python.org/issue41567
import multiprocessing.popen_spawn_posix # type: ignore
from pathlib import Path
@@ -10,8 +12,8 @@ from typing import Optional
# PROFILES_DIR must be set before the other flags
# It also gets set in main.py and in set_from_args because the rpc server
# doesn't go through exactly the same main arg processing.
DEFAULT_PROFILES_DIR = os.path.join(os.path.expanduser("~"), ".dbt")
PROFILES_DIR = os.path.expanduser(os.getenv("DBT_PROFILES_DIR", DEFAULT_PROFILES_DIR))
DEFAULT_PROFILES_DIR = os_path.join(os_path.expanduser("~"), ".dbt")
PROFILES_DIR = os_path.expanduser(os_getenv("DBT_PROFILES_DIR", DEFAULT_PROFILES_DIR))
STRICT_MODE = False # Only here for backwards compatibility
FULL_REFRESH = False # subcommand
@@ -35,6 +37,18 @@ INDIRECT_SELECTION = None
LOG_CACHE_EVENTS = None
EVENT_BUFFER_SIZE = 100000
QUIET = None
NO_PRINT = None
CACHE_SELECTED_ONLY = None
_NON_BOOLEAN_FLAGS = [
"LOG_FORMAT",
"PRINTER_WIDTH",
"PROFILES_DIR",
"INDIRECT_SELECTION",
"EVENT_BUFFER_SIZE",
]
_NON_DBT_ENV_FLAGS = ["DO_NOT_TRACK"]
# Global CLI defaults. These flags are set from three places:
# CLI args, environment variables, and user_config (profiles.yml).
@@ -57,14 +71,16 @@ flag_defaults = {
"LOG_CACHE_EVENTS": False,
"EVENT_BUFFER_SIZE": 100000,
"QUIET": False,
"NO_PRINT": False,
"CACHE_SELECTED_ONLY": False,
}
def env_set_truthy(key: str) -> Optional[str]:
"""Return the value if it was set to a "truthy" string value, or None
"""Return the value if it was set to a "truthy" string value or None
otherwise.
"""
value = os.getenv(key)
value = os_getenv(key)
if not value or value.lower() in ("0", "false", "f"):
return None
return value
@@ -77,7 +93,7 @@ def env_set_bool(env_value):
def env_set_path(key: str) -> Optional[Path]:
value = os.getenv(key)
value = os_getenv(key)
if value is None:
return value
else:
@@ -106,7 +122,7 @@ def set_from_args(args, user_config):
global STRICT_MODE, FULL_REFRESH, WARN_ERROR, USE_EXPERIMENTAL_PARSER, STATIC_PARSER
global WRITE_JSON, PARTIAL_PARSE, USE_COLORS, STORE_FAILURES, PROFILES_DIR, DEBUG, LOG_FORMAT
global INDIRECT_SELECTION, VERSION_CHECK, FAIL_FAST, SEND_ANONYMOUS_USAGE_STATS
global PRINTER_WIDTH, WHICH, LOG_CACHE_EVENTS, EVENT_BUFFER_SIZE, QUIET
global PRINTER_WIDTH, WHICH, LOG_CACHE_EVENTS, EVENT_BUFFER_SIZE, QUIET, NO_PRINT, CACHE_SELECTED_ONLY
STRICT_MODE = False # backwards compatibility
# cli args without user_config or env var option
@@ -132,40 +148,69 @@ def set_from_args(args, user_config):
LOG_CACHE_EVENTS = get_flag_value("LOG_CACHE_EVENTS", args, user_config)
EVENT_BUFFER_SIZE = get_flag_value("EVENT_BUFFER_SIZE", args, user_config)
QUIET = get_flag_value("QUIET", args, user_config)
NO_PRINT = get_flag_value("NO_PRINT", args, user_config)
CACHE_SELECTED_ONLY = get_flag_value("CACHE_SELECTED_ONLY", args, user_config)
_set_overrides_from_env()
def _set_overrides_from_env():
global SEND_ANONYMOUS_USAGE_STATS
flag_value = _get_flag_value_from_env("DO_NOT_TRACK")
if flag_value is None:
return
SEND_ANONYMOUS_USAGE_STATS = not flag_value
def get_flag_value(flag, args, user_config):
lc_flag = flag.lower()
flag_value = getattr(args, lc_flag, None)
if flag_value is None:
# Environment variables use pattern 'DBT_{flag name}'
env_flag = f"DBT_{flag}"
env_value = os.getenv(env_flag)
if env_value is not None and env_value != "":
env_value = env_value.lower()
# non Boolean values
if flag in [
"LOG_FORMAT",
"PRINTER_WIDTH",
"PROFILES_DIR",
"INDIRECT_SELECTION",
"EVENT_BUFFER_SIZE",
]:
flag_value = env_value
else:
flag_value = env_set_bool(env_value)
elif user_config is not None and getattr(user_config, lc_flag, None) is not None:
flag_value = getattr(user_config, lc_flag)
else:
flag_value = flag_defaults[flag]
flag_value = _load_flag_value(flag, args, user_config)
if flag in ["PRINTER_WIDTH", "EVENT_BUFFER_SIZE"]: # must be ints
flag_value = int(flag_value)
if flag == "PROFILES_DIR":
flag_value = os.path.abspath(flag_value)
flag_value = os_path.abspath(flag_value)
return flag_value
def _load_flag_value(flag, args, user_config):
lc_flag = flag.lower()
flag_value = getattr(args, lc_flag, None)
if flag_value is not None:
return flag_value
flag_value = _get_flag_value_from_env(flag)
if flag_value is not None:
return flag_value
if user_config is not None and getattr(user_config, lc_flag, None) is not None:
return getattr(user_config, lc_flag)
return flag_defaults[flag]
def _get_flag_value_from_env(flag):
# Environment variables use pattern 'DBT_{flag name}'
env_flag = _get_env_flag(flag)
env_value = os_getenv(env_flag)
if env_value is None or env_value == "":
return None
env_value = env_value.lower()
if flag in _NON_BOOLEAN_FLAGS:
flag_value = env_value
else:
flag_value = env_set_bool(env_value)
return flag_value
def _get_env_flag(flag):
return flag if flag in _NON_DBT_ENV_FLAGS else f"DBT_{flag}"
def get_flag_dict():
return {
"use_experimental_parser": USE_EXPERIMENTAL_PARSER,
@@ -185,4 +230,20 @@ def get_flag_dict():
"log_cache_events": LOG_CACHE_EVENTS,
"event_buffer_size": EVENT_BUFFER_SIZE,
"quiet": QUIET,
"no_print": NO_PRINT,
"cache_selected_only": CACHE_SELECTED_ONLY,
}
# This is used by core/dbt/context/base.py to return a flag object
# in Jinja.
def get_flag_obj():
new_flags = Namespace()
for k, v in get_flag_dict().items():
setattr(new_flags, k.upper(), v)
# The following 3 are CLI arguments only, so they're not full-fledged flags,
# but we include them in the flags object for users.
setattr(new_flags, "FULL_REFRESH", FULL_REFRESH)
setattr(new_flags, "STORE_FAILURES", STORE_FAILURES)
setattr(new_flags, "WHICH", WHICH)
return new_flags

View File

@@ -17,6 +17,8 @@ from dbt.contracts.graph.compiled import GraphMemberNode
from dbt.contracts.graph.manifest import Manifest
from dbt.contracts.state import PreviousState
from dbt import selected_resources
def get_package_names(nodes):
return set([node.split(".")[1] for node in nodes])
@@ -269,6 +271,7 @@ class NodeSelector(MethodManager):
dependencies.
"""
selected_nodes = self.get_selected(spec)
selected_resources.set_selected_resources(selected_nodes)
new_graph = self.full_graph.get_subset_graph(selected_nodes)
# should we give a way here for consumers to mutate the graph?
return GraphQueue(new_graph.graph, self.manifest, selected_nodes)

View File

@@ -48,6 +48,7 @@ class MethodName(StrEnum):
Exposure = "exposure"
Metric = "metric"
Result = "result"
SourceStatus = "source_status"
def is_selected_node(fqn: List[str], node_selector: str):
@@ -414,20 +415,24 @@ class StateSelectorMethod(SelectorMethod):
return modified
def recursively_check_macros_modified(self, node, previous_macros):
def recursively_check_macros_modified(self, node, visited_macros):
# loop through all macros that this node depends on
for macro_uid in node.depends_on.macros:
# avoid infinite recursion if we've already seen this macro
if macro_uid in previous_macros:
if macro_uid in visited_macros:
continue
previous_macros.append(macro_uid)
visited_macros.append(macro_uid)
# is this macro one of the modified macros?
if macro_uid in self.modified_macros:
return True
# if not, and this macro depends on other macros, keep looping
macro_node = self.manifest.macros[macro_uid]
if len(macro_node.depends_on.macros) > 0:
return self.recursively_check_macros_modified(macro_node, previous_macros)
return self.recursively_check_macros_modified(macro_node, visited_macros)
# this macro hasn't been modified, but we haven't checked
# the other macros the node depends on, so keep looking
elif len(node.depends_on.macros) > len(visited_macros):
continue
else:
return False
@@ -440,8 +445,8 @@ class StateSelectorMethod(SelectorMethod):
return False
# recursively loop through upstream macros to see if any is modified
else:
previous_macros = []
return self.recursively_check_macros_modified(node, previous_macros)
visited_macros = []
return self.recursively_check_macros_modified(node, visited_macros)
# TODO check_modified_content and check_modified macro seem a bit redundant
def check_modified_content(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
@@ -522,6 +527,62 @@ class ResultSelectorMethod(SelectorMethod):
yield node
class SourceStatusSelectorMethod(SelectorMethod):
def search(self, included_nodes: Set[UniqueId], selector: str) -> Iterator[UniqueId]:
if self.previous_state is None or self.previous_state.sources is None:
raise InternalException(
"No previous state comparison freshness results in sources.json"
)
elif self.previous_state.sources_current is None:
raise InternalException(
"No current state comparison freshness results in sources.json"
)
current_state_sources = {
result.unique_id: getattr(result, "max_loaded_at", None)
for result in self.previous_state.sources_current.results
if hasattr(result, "max_loaded_at")
}
current_state_sources_runtime_error = {
result.unique_id
for result in self.previous_state.sources_current.results
if not hasattr(result, "max_loaded_at")
}
previous_state_sources = {
result.unique_id: getattr(result, "max_loaded_at", None)
for result in self.previous_state.sources.results
if hasattr(result, "max_loaded_at")
}
previous_state_sources_runtime_error = {
result.unique_id
for result in self.previous_state.sources_current.results
if not hasattr(result, "max_loaded_at")
}
matches = set()
if selector == "fresher":
for unique_id in current_state_sources:
if unique_id not in previous_state_sources:
matches.add(unique_id)
elif current_state_sources[unique_id] > previous_state_sources[unique_id]:
matches.add(unique_id)
for unique_id in matches:
if (
unique_id in previous_state_sources_runtime_error
or unique_id in current_state_sources_runtime_error
):
matches.remove(unique_id)
for node, real_node in self.all_nodes(included_nodes):
if node in matches:
yield node
class MethodManager:
SELECTOR_METHODS: Dict[MethodName, Type[SelectorMethod]] = {
MethodName.FQN: QualifiedNameSelectorMethod,
@@ -537,6 +598,7 @@ class MethodManager:
MethodName.Exposure: ExposureSelectorMethod,
MethodName.Metric: MetricSelectorMethod,
MethodName.Result: ResultSelectorMethod,
MethodName.SourceStatus: SourceStatusSelectorMethod,
}
def __init__(

View File

@@ -27,7 +27,7 @@ class IndirectSelection(StrEnum):
def _probably_path(value: str):
"""Decide if value is probably a path. Windows has two path separators, so
"""Decide if the value is probably a path. Windows has two path separators, so
we should check both sep ('\\') and altsep ('/') there.
"""
if os.path.sep in value:

View File

@@ -131,3 +131,10 @@ class Lazy(Generic[T]):
if self.memo is None:
self.memo = self._typed_eval_f()
return self.memo
# This class is used in to_target_dict, so that accesses to missing keys
# will return an empty string instead of Undefined
class DictDefaultEmptyStr(dict):
def __getitem__(self, key):
return dict.get(self, key, "")
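
A tiny usage sketch for clarity (values are illustrative; the import path matches the one added to `runtime.py` earlier in this diff):

```python
from dbt.helper_types import DictDefaultEmptyStr  # import path shown earlier in this diff

target = DictDefaultEmptyStr({"name": "dev", "threads": 4})
print(target["name"])    # -> dev
print(target["schema"])  # -> "" (missing keys fall back to an empty string, not a KeyError)
```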

View File

@@ -1 +1,15 @@
# Include README
# Include Module
The Include module is responsible for housing the default macro definitions, the starter project scaffold, and the HTML file used to generate the docs page.
# Directories
## `global_project`
Defines the default implementations of the Jinja2 macros for `dbt-core`, which can be overridden in each adapter repo to better fit that adapter plugin. To view adapter-specific Jinja2 changes, check the relevant adapter repo's [`adapters.sql`](https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/include/bigquery/macros/adapters.sql) file in its `include` directory, or its [`impl.py`](https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/adapters/bigquery/impl.py) file, e.g. BigQuery's `truncate_relation`.
## `starter_project`
Produces the default project after running the `dbt init` command for the CLI. `dbt-cloud` initializes the project by using [dbt-starter-project](https://github.com/dbt-labs/dbt-starter-project).
# Files
- `index.html`: a file generated from [dbt-docs](https://github.com/dbt-labs/dbt-docs) prior to new releases and copied into the `dbt-core` directory. It is used to render the docs page after running the `dbt docs generate` command.

View File

@@ -1,6 +1,8 @@
{% macro default__test_not_null(model, column_name) %}
select *
{% set column_list = '*' if should_store_failures() else column_name %}
select {{ column_list }}
from {{ model }}
where {{ column_name }} is null

View File

@@ -56,13 +56,26 @@
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}
{% if unique_key is not none %}
delete from {{ target }}
where ({{ unique_key }}) in (
select ({{ unique_key }})
from {{ source }}
);
{% endif %}
{% if unique_key %}
{% if unique_key is sequence and unique_key is not string %}
delete from {{ target }}
using {{ source }}
where (
{% for key in unique_key %}
{{ source }}.{{ key }} = {{ target }}.{{ key }}
{{ "and " if not loop.last }}
{% endfor %}
);
{% else %}
delete from {{ target }}
where (
{{ unique_key }}) in (
select ({{ unique_key }})
from {{ source }}
);
{% endif %}
{% endif %}
insert into {{ target }} ({{ dest_cols_csv }})
(

View File

@@ -22,6 +22,13 @@
{# no-op #}
{% endmacro %}
{% macro get_true_sql() %}
{{ adapter.dispatch('get_true_sql', 'dbt')() }}
{% endmacro %}
{% macro default__get_true_sql() %}
{{ return('TRUE') }}
{% endmacro %}
{% macro snapshot_staging_table(strategy, source_sql, target_relation) -%}
{{ adapter.dispatch('snapshot_staging_table', 'dbt')(strategy, source_sql, target_relation) }}

View File

@@ -6,10 +6,6 @@
{%- set strategy_name = config.get('strategy') -%}
{%- set unique_key = config.get('unique_key') %}
{% if not adapter.check_schema_exists(model.database, model.schema) %}
{% do create_schema(model.database, model.schema) %}
{% endif %}
{% set target_relation_exists, target_relation = get_or_create_relation(
database=model.database,
schema=model.schema,

View File

@@ -132,17 +132,7 @@
{% set check_cols_config = config['check_cols'] %}
{% set primary_key = config['unique_key'] %}
{% set invalidate_hard_deletes = config.get('invalidate_hard_deletes', false) %}
{% set select_current_time -%}
select {{ snapshot_get_time() }} as snapshot_start
{%- endset %}
{#-- don't access the column by name, to avoid dealing with casing issues on snowflake #}
{%- set now = run_query(select_current_time)[0][0] -%}
{% if now is none or now is undefined -%}
{%- do exceptions.raise_compiler_error('Could not get a snapshot start time from the database') -%}
{%- endif %}
{% set updated_at = config.get('updated_at', snapshot_string_as_time(now)) %}
{% set updated_at = config.get('updated_at', snapshot_get_time()) %}
{% set column_added = false %}
@@ -157,7 +147,7 @@
{%- set row_changed_expr -%}
(
{%- if column_added -%}
TRUE
{{ get_true_sql() }}
{%- else -%}
{%- for col in check_cols -%}
{{ snapshotted_rel }}.{{ col }} != {{ current_rel }}.{{ col }}

File diff suppressed because one or more lines are too long

View File

@@ -15,26 +15,16 @@ import colorama
import logbook
from dbt.dataclass_schema import dbtClassMixin
# Colorama needs some help on windows because we're using logger.info
# instead of print(). If the Windows env doesn't have a TERM var set,
# then we should override the logging stream to use the colorama
# converter. If the TERM var is set (as with Git Bash), then it's safe
# to send escape characters and no log handler injection is needed.
colorama_stdout = sys.stdout
colorama_wrap = True
colorama.init(wrap=colorama_wrap)
if sys.platform == "win32" and not os.getenv("TERM"):
colorama_wrap = False
colorama_stdout = colorama.AnsiToWin32(sys.stdout).stream
elif sys.platform == "win32":
colorama_wrap = False
colorama.init(wrap=colorama_wrap)
# Colorama is needed for colored logs on Windows because we're using logger.info
# instead of print(). If the Windows env doesn't have a TERM var set or it is set to "None"
# (i.e. in the case of Git Bash on Windows, which emulates Unix), then it's safe to initialize
# Colorama with wrapping turned on, which allows us to strip ANSI sequences from stdout.
# You can safely initialize Colorama for any OS and the coloring stays the same, except
# when piped to another process on Linux and MacOS, where the coloring is lost. To combat
# that, we will just initialize Colorama when needed on Windows using a non-Unix terminal.
if sys.platform == "win32" and (not os.getenv("TERM") or os.getenv("TERM") == "None"):
colorama.init(wrap=True)
STDOUT_LOG_FORMAT = "{record.message}"
DEBUG_LOG_FORMAT = (
@@ -464,7 +454,7 @@ class DelayedFileHandler(logbook.RotatingFileHandler, FormatterMixin):
class LogManager(logbook.NestedSetup):
def __init__(self, stdout=colorama_stdout, stderr=sys.stderr):
def __init__(self, stdout=sys.stdout, stderr=sys.stderr):
self.stdout = stdout
self.stderr = stderr
self._null_handler = logbook.NullHandler()
@@ -632,7 +622,7 @@ def list_handler(
lst: Optional[List[LogMessage]],
level=logbook.NOTSET,
) -> ContextManager:
"""Return a context manager that temporarly attaches a list to the logger."""
"""Return a context manager that temporarily attaches a list to the logger."""
return ListLogHandler(lst=lst, level=level, bubble=True)

View File

@@ -5,6 +5,7 @@ import argparse
import os.path
import sys
import traceback
import warnings
from contextlib import contextmanager
from pathlib import Path
@@ -46,7 +47,7 @@ from dbt.exceptions import InternalException, NotImplementedException, FailedToC
class DBTVersion(argparse.Action):
"""This is very very similar to the builtin argparse._Version action,
"""This is very similar to the built-in argparse._Version action,
except it just calls dbt.version.get_version_information().
"""
@@ -118,6 +119,9 @@ class DBTArgumentParser(argparse.ArgumentParser):
def main(args=None):
# Logbook warnings are ignored so we don't have to fork logbook to support python 3.10.
# This _only_ works for regular cli invocations.
warnings.filterwarnings("ignore", category=DeprecationWarning, module="logbook")
if args is None:
args = sys.argv[1:]
with log_manager.applicationbound():
@@ -1084,6 +1088,36 @@ def parse_args(args, cls=DBTArgumentParser):
""",
)
p.add_argument(
"--no-print",
action="store_true",
default=None,
help="""
Suppress all {{ print() }} macro calls.
""",
)
schema_cache_flag = p.add_mutually_exclusive_group()
schema_cache_flag.add_argument(
"--cache-selected-only",
action="store_const",
const=True,
default=None,
dest="cache_selected_only",
help="""
Pre-cache only the database objects relevant to the selected resources.
""",
)
schema_cache_flag.add_argument(
"--no-cache-selected-only",
action="store_const",
const=False,
dest="cache_selected_only",
help="""
Pre-cache all database objects related to the project.
""",
)
subs = p.add_subparsers(title="Available sub-commands")
base_subparser = _build_base_subparser()

View File

@@ -25,9 +25,13 @@ from dbt.exceptions import raise_compiler_error, raise_parsing_error
from dbt.parser.search import FileBlock
def get_nice_generic_test_name(
def synthesize_generic_test_names(
test_type: str, test_name: str, args: Dict[str, Any]
) -> Tuple[str, str]:
# Using the type, name, and arguments to this generic test, synthesize a (hopefully) unique name
# Will not be unique if multiple tests have same name + arguments, and only configs differ
# Returns a shorter version (hashed/truncated, for the compiled file)
# as well as the full name (for the unique_id + FQN)
flat_args = []
for arg_name in sorted(args):
# the model is already embedded in the name, so skip it
@@ -263,13 +267,25 @@ class TestBuilder(Generic[Testable]):
if self.namespace is not None:
self.package_name = self.namespace
compiled_name, fqn_name = self.get_test_name()
self.compiled_name: str = compiled_name
self.fqn_name: str = fqn_name
# If the user has provided a custom name for this generic test, use it
# Then delete the "name" argument to avoid passing it into the test macro
# Otherwise, use an auto-generated name synthesized from test inputs
self.compiled_name: str = ""
self.fqn_name: str = ""
# use hashed name as alias if too long
if compiled_name != fqn_name and "alias" not in self.config:
self.config["alias"] = compiled_name
if "name" in self.args:
# Assign the user-defined name here, which will be checked for uniqueness later
# we will raise an error if two tests have the same name for the same model + column combo
self.compiled_name = self.args["name"]
self.fqn_name = self.args["name"]
del self.args["name"]
else:
short_name, full_name = self.get_synthetic_test_names()
self.compiled_name = short_name
self.fqn_name = full_name
# use hashed name as alias if full name is too long
if short_name != full_name and "alias" not in self.config:
self.config["alias"] = short_name
def _bad_type(self) -> TypeError:
return TypeError('invalid target type "{}"'.format(type(self.target)))
@@ -281,13 +297,23 @@ class TestBuilder(Generic[Testable]):
"test must be dict or str, got {} (value {})".format(type(test), test)
)
test = list(test.items())
if len(test) != 1:
raise_parsing_error(
"test definition dictionary must have exactly one key, got"
" {} instead ({} keys)".format(test, len(test))
)
test_name, test_args = test[0]
# If the test is a dictionary with top-level keys, the test name is "test_name"
# and the rest are arguments
# {'name': 'my_favorite_test', 'test_name': 'unique', 'config': {'where': '1=1'}}
if "test_name" in test.keys():
test_name = test.pop("test_name")
test_args = test
# If the test is a nested dictionary with one top-level key, the test name
# is the dict name, and nested keys are arguments
# {'unique': {'name': 'my_favorite_test', 'config': {'where': '1=1'}}}
else:
test = list(test.items())
if len(test) != 1:
raise_parsing_error(
"test definition dictionary must have exactly one key, got"
" {} instead ({} keys)".format(test, len(test))
)
test_name, test_args = test[0]
if not isinstance(test_args, dict):
raise_parsing_error(
@@ -401,7 +427,8 @@ class TestBuilder(Generic[Testable]):
macro_name = "{}.{}".format(self.namespace, macro_name)
return macro_name
def get_test_name(self) -> Tuple[str, str]:
def get_synthetic_test_names(self) -> Tuple[str, str]:
# Returns two names: shorter (for the compiled file), full (for the unique_id + FQN)
if isinstance(self.target, UnparsedNodeUpdate):
name = self.name
elif isinstance(self.target, UnpatchedSourceDefinition):
@@ -410,7 +437,7 @@ class TestBuilder(Generic[Testable]):
raise self._bad_type()
if self.namespace is not None:
name = "{}_{}".format(self.namespace, name)
return get_nice_generic_test_name(name, self.target.name, self.args)
return synthesize_generic_test_names(name, self.target.name, self.args)
def construct_config(self) -> str:
configs = ",".join(
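Per the comments in the hunk above, a generic test entry can now be written either as a flat dict keyed by `test_name` or as a single-key nested dict. A simplified, standalone restatement of that branching (plain dicts only, not dbt's parser objects):

```
# Two equivalent ways to declare the same generic test, per the comments above.
flat_form = {"name": "my_favorite_test", "test_name": "unique", "config": {"where": "1=1"}}
nested_form = {"unique": {"name": "my_favorite_test", "config": {"where": "1=1"}}}

def normalize(test):
    # flat form: pop the test name, everything else is arguments
    if "test_name" in test:
        test = dict(test)  # avoid mutating the caller's dict
        return test.pop("test_name"), test
    # nested form: exactly one top-level key, whose value is the arguments
    [(test_name, test_args)] = test.items()
    return test_name, test_args

assert normalize(flat_form)[0] == normalize(nested_form)[0] == "unique"
```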

View File

@@ -465,6 +465,8 @@ class ManifestLoader:
else:
dct = block.file.dict_from_yaml
parser.parse_file(block, dct=dct)
# We come out of here with an UnpatchedSourceDefinition containing configs at the source level,
# not at the table level (as would be expected)
else:
parser.parse_file(block)
project_parsed_path_count += 1
@@ -659,7 +661,8 @@ class ManifestLoader:
reparse_reason = ReparseReason.file_not_found
# this event is only fired if a full reparse is needed
dbt.tracking.track_partial_parser({"full_reparse_reason": reparse_reason})
if dbt.tracking.active_user is not None: # no active_user if doing load_macros
dbt.tracking.track_partial_parser({"full_reparse_reason": reparse_reason})
return None
@@ -777,9 +780,13 @@ class ManifestLoader:
cls,
root_config: RuntimeConfig,
macro_hook: Callable[[Manifest], Any],
base_macros_only=False,
) -> Manifest:
with PARSING_STATE:
projects = root_config.load_dependencies()
# base_only/base_macros_only: for testing only,
# allows loading macros without running 'dbt deps' first
projects = root_config.load_dependencies(base_only=base_macros_only)
# This creates a loader object, including result,
# and then throws it away, returning only the
# manifest
@@ -1035,7 +1042,7 @@ def _process_docs_for_metrics(context: Dict[str, Any], metric: ParsedMetric) ->
def _process_refs_for_exposure(manifest: Manifest, current_project: str, exposure: ParsedExposure):
"""Given a manifest and a exposure in that manifest, process its refs"""
"""Given a manifest and exposure in that manifest, process its refs"""
for ref in exposure.refs:
target_model: Optional[Union[Disabled, ManifestNode]] = None
target_model_name: str

View File

@@ -291,10 +291,10 @@ class PartialParsing:
if self.already_scheduled_for_parsing(old_source_file):
return
# These files only have one node.
unique_id = None
# These files only have one node except for snapshots
unique_ids = []
if old_source_file.nodes:
unique_id = old_source_file.nodes[0]
unique_ids = old_source_file.nodes
else:
# It's not clear when this would actually happen.
# Logging in case there are other associated errors.
@@ -305,7 +305,7 @@ class PartialParsing:
self.deleted_manifest.files[file_id] = old_source_file
self.saved_files[file_id] = deepcopy(new_source_file)
self.add_to_pp_files(new_source_file)
if unique_id:
for unique_id in unique_ids:
self.remove_node_in_saved(new_source_file, unique_id)
def remove_node_in_saved(self, source_file, unique_id):
@@ -379,7 +379,7 @@ class PartialParsing:
if not source_file.nodes:
fire_event(PartialParsingMissingNodes(file_id=source_file.file_id))
return
# There is generally only 1 node for SQL files, except for macros
# There is generally only 1 node for SQL files, except for macros and snapshots
for unique_id in source_file.nodes:
self.remove_node_in_saved(source_file, unique_id)
self.schedule_referencing_nodes_for_parsing(unique_id)
@@ -573,6 +573,10 @@ class PartialParsing:
# doc source_file.nodes
self.schedule_nodes_for_parsing(source_file.nodes)
source_file.nodes = []
# Remove the file object
self.deleted_manifest.files[source_file.file_id] = self.saved_manifest.files.pop(
source_file.file_id
)
# Schema files -----------------------
# Changed schema files

View File

@@ -49,7 +49,7 @@ from dbt.exceptions import (
warn_invalid_patch,
validator_error_message,
JSONValidationException,
raise_invalid_schema_yml_version,
raise_invalid_property_yml_version,
ValidationException,
ParsingException,
raise_duplicate_patch_name,
@@ -94,7 +94,10 @@ schema_file_keys = (
def error_context(
path: str, key: str, data: Any, cause: Union[str, ValidationException, JSONValidationException]
path: str,
key: str,
data: Any,
cause: Union[str, ValidationException, JSONValidationException],
) -> str:
"""Provide contextual information about an error while parsing"""
if isinstance(cause, str):
@@ -112,7 +115,8 @@ def yaml_from_file(source_file: SchemaSourceFile) -> Dict[str, Any]:
"""If loading the yaml fails, raise an exception."""
path = source_file.path.relative_path
try:
return load_yaml_text(source_file.contents)
# source_file.contents can sometimes be None
return load_yaml_text(source_file.contents or "")
except ValidationException as e:
reason = validator_error_message(e)
raise ParsingException(
@@ -544,15 +548,23 @@ class SchemaParser(SimpleParser[GenericTestBlock, ParsedGenericTestNode]):
def check_format_version(file_path, yaml_dct) -> None:
if "version" not in yaml_dct:
raise_invalid_schema_yml_version(file_path, "no version is specified")
raise_invalid_property_yml_version(
file_path, "the yml property file {} is missing a version tag".format(file_path)
)
version = yaml_dct["version"]
# if it's not an integer, the version is malformed, or not
# set. Either way, only 'version: 2' is supported.
if not isinstance(version, int):
raise_invalid_schema_yml_version(file_path, "the version is not an integer")
raise_invalid_property_yml_version(
file_path,
"its 'version:' tag must be an integer (e.g. version: 2)."
" {} is not an integer".format(version),
)
if version != 2:
raise_invalid_schema_yml_version(file_path, "version {} is not supported".format(version))
raise_invalid_property_yml_version(
file_path, "its 'version:' tag is set to {}. Only 2 is supported".format(version)
)
Parsed = TypeVar("Parsed", UnpatchedSourceDefinition, ParsedNodePatch, ParsedMacroPatch)
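A sketch of what the stricter check above accepts and rejects; the file path is hypothetical, and the exact exception type raised by `raise_invalid_property_yml_version` is not asserted here:

```
from dbt.parser.schemas import check_format_version

check_format_version("models/schema.yml", {"version": 2})  # passes: version 2 is the only supported value

# Each of the following would raise a parsing error with the newer, more descriptive message:
# check_format_version("models/schema.yml", {})                # missing a version tag
# check_format_version("models/schema.yml", {"version": "2"})  # version is not an integer
# check_format_version("models/schema.yml", {"version": 1})    # only version 2 is supported
```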

View File

@@ -1,6 +1,6 @@
import itertools
from pathlib import Path
from typing import Iterable, Dict, Optional, Set, List, Any
from typing import Iterable, Dict, Optional, Set, Any
from dbt.adapters.factory import get_adapter
from dbt.config import RuntimeConfig
from dbt.context.context_config import (
@@ -137,15 +137,13 @@ class SourcePatcher:
tags = sorted(set(itertools.chain(source.tags, table.tags)))
config = self._generate_source_config(
fqn=target.fqn,
target=target,
rendered=True,
project_name=target.package_name,
)
unrendered_config = self._generate_source_config(
fqn=target.fqn,
target=target,
rendered=False,
project_name=target.package_name,
)
if not isinstance(config, SourceConfig):
@@ -261,19 +259,29 @@ class SourcePatcher:
)
return node
def _generate_source_config(self, fqn: List[str], rendered: bool, project_name: str):
def _generate_source_config(self, target: UnpatchedSourceDefinition, rendered: bool):
generator: BaseContextConfigGenerator
if rendered:
generator = ContextConfigGenerator(self.root_project)
else:
generator = UnrenderedConfigGenerator(self.root_project)
# configs with precedence set
precedence_configs = dict()
# first apply source configs
precedence_configs.update(target.source.config)
# then override anything that is defined on source tables
# this is not quite complex enough for configs that can be set as top-level node keys, but
# it works while source configs can only include `enabled`.
precedence_configs.update(target.table.config)
return generator.calculate_node_config(
config_call_dict={},
fqn=fqn,
fqn=target.fqn,
resource_type=NodeType.Source,
project_name=project_name,
project_name=target.package_name,
base=False,
patch_config_dict=precedence_configs,
)
def _get_relation_name(self, node: ParsedSourceDefinition):
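The comments in `_generate_source_config` describe a plain dict-based precedence: source-level configs are applied first, then table-level configs override them. A toy illustration with hypothetical values:

```
source_config = {"enabled": True}   # set under the source: key in a yml file
table_config = {"enabled": False}   # set on one table within that source

precedence_configs = dict()
precedence_configs.update(source_config)  # apply source configs first
precedence_configs.update(table_config)   # table configs win on any conflict

assert precedence_configs == {"enabled": False}
```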

View File

@@ -0,0 +1,6 @@
SELECTED_RESOURCES = []
def set_selected_resources(selected_resources):
global SELECTED_RESOURCES
SELECTED_RESOURCES = list(selected_resources)
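The new module is just shared mutable state. A self-contained sketch of how it behaves (the caller and the unique_id below are hypothetical):

```
# restating the tiny module above so the example runs on its own
SELECTED_RESOURCES = []

def set_selected_resources(selected_resources):
    global SELECTED_RESOURCES
    SELECTED_RESOURCES = list(selected_resources)

# a task would call this once node selection has been resolved
set_selected_resources({"model.my_project.my_model"})
assert SELECTED_RESOURCES == ["model.my_project.my_model"]
```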

View File

@@ -41,7 +41,7 @@ CATALOG_FILENAME = "catalog.json"
def get_stripped_prefix(source: Dict[str, Any], prefix: str) -> Dict[str, Any]:
"""Go through source, extracting every key/value pair where the key starts
"""Go through the source, extracting every key/value pair where the key starts
with the given prefix.
"""
cut = len(prefix)

View File

@@ -436,8 +436,9 @@ class RunTask(CompileTask):
def before_run(self, adapter, selected_uids: AbstractSet[str]):
with adapter.connection_named("master"):
self.create_schemas(adapter, selected_uids)
self.populate_adapter_cache(adapter)
required_schemas = self.get_model_schemas(adapter, selected_uids)
self.create_schemas(adapter, required_schemas)
self.populate_adapter_cache(adapter, required_schemas)
self.defer_to_manifest(adapter, selected_uids)
self.safe_run_hooks(adapter, RunHookType.Start, {})

View File

@@ -1,6 +1,7 @@
import os
import time
import json
from pathlib import Path
from abc import abstractmethod
from concurrent.futures import as_completed
from datetime import datetime
@@ -50,6 +51,7 @@ from dbt.exceptions import (
from dbt.graph import GraphQueue, NodeSelector, SelectionSpec, parse_difference, Graph
from dbt.parser.manifest import ManifestLoader
import dbt.tracking
import dbt.exceptions
from dbt import flags
@@ -88,7 +90,12 @@ class ManifestTask(ConfiguredTask):
def _runtime_initialize(self):
self.load_manifest()
start_compile_manifest = time.perf_counter()
self.compile_manifest()
compile_time = time.perf_counter() - start_compile_manifest
if dbt.tracking.active_user is not None:
dbt.tracking.track_runnable_timing({"graph_compilation_elapsed": compile_time})
class GraphRunnableTask(ManifestTask):
@@ -110,7 +117,9 @@ class GraphRunnableTask(ManifestTask):
def set_previous_state(self):
if self.args.state is not None:
self.previous_state = PreviousState(self.args.state)
self.previous_state = PreviousState(
path=self.args.state, current_path=Path(self.config.target_path)
)
def index_offset(self, value: int) -> int:
return value
@@ -390,8 +399,17 @@ class GraphRunnableTask(ManifestTask):
for dep_node_id in self.graph.get_dependent_nodes(node_id):
self._skipped_children[dep_node_id] = cause
def populate_adapter_cache(self, adapter):
adapter.set_relations_cache(self.manifest)
def populate_adapter_cache(self, adapter, required_schemas: Set[BaseRelation] = None):
start_populate_cache = time.perf_counter()
if flags.CACHE_SELECTED_ONLY is True:
adapter.set_relations_cache(self.manifest, required_schemas=required_schemas)
else:
adapter.set_relations_cache(self.manifest)
cache_populate_time = time.perf_counter() - start_populate_cache
if dbt.tracking.active_user is not None:
dbt.tracking.track_runnable_timing(
{"adapter_cache_construction_elapsed": cache_populate_time}
)
def before_hooks(self, adapter):
pass
@@ -489,8 +507,7 @@ class GraphRunnableTask(ManifestTask):
return result
def create_schemas(self, adapter, selected_uids: Iterable[str]):
required_schemas = self.get_model_schemas(adapter, selected_uids)
def create_schemas(self, adapter, required_schemas: Set[BaseRelation]):
# we want the string form of the information schema database
required_databases: Set[BaseRelation] = set()
for required in required_schemas:
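Both `_runtime_initialize` and `populate_adapter_cache` above follow the same shape: time a step with `perf_counter`, then report it only when tracking has an active user. A distilled sketch of that pattern (the helper and its arguments are placeholders, not dbt APIs):

```
import time

def timed(operation, track, has_active_user):
    """Run `operation`, reporting its elapsed time only if tracking is enabled."""
    start = time.perf_counter()
    result = operation()
    elapsed = time.perf_counter() - start
    if has_active_user:  # mirrors `dbt.tracking.active_user is not None`
        track({"elapsed": elapsed})
    return result

# usage with stand-in callables
timed(lambda: sum(range(1_000_000)), track=print, has_active_user=True)
```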

View File

@@ -230,6 +230,8 @@ class TestTask(RunTask):
constraints are satisfied.
"""
__test__ = False
def raise_on_first_error(self):
return False

core/dbt/tests/fixtures/__init__.py vendored Normal file
View File

@@ -0,0 +1 @@
# dbt.tests.fixtures directory

core/dbt/tests/fixtures/project.py vendored Normal file
View File

@@ -0,0 +1,462 @@
import os
import pytest # type: ignore
import random
from argparse import Namespace
from datetime import datetime
import warnings
import yaml
import dbt.flags as flags
from dbt.config.runtime import RuntimeConfig
from dbt.adapters.factory import get_adapter, register_adapter, reset_adapters, get_adapter_by_type
from dbt.events.functions import setup_event_logger
from dbt.tests.util import (
write_file,
run_sql_with_adapter,
TestProcessingException,
get_connection,
)
# These are the fixtures that are used in dbt core functional tests
#
# The main functional test fixture is the 'project' fixture, which combines
# other fixtures, writes out a dbt project in a temporary directory, creates a temp
# schema in the testing database, and returns a `TestProjInfo` object that
# contains information from the other fixtures for convenience.
#
# The models, macros, seeds, snapshots, tests, and analysis fixtures all
# represent directories in a dbt project, and are all dictionaries with
# file name keys and file contents values.
#
# The other commonly used fixture is 'project_config_update'. Other
# occasionally used fixtures are 'profiles_config_update', 'packages',
# and 'selectors'.
#
# Most test cases have fairly small files which are best included in
# the test case file itself as string variables, to make it easy to
# understand what is happening in the test. Files which are used
# in multiple test case files can be included in a common file, such as
# files.py or fixtures.py. Large files, such as seed files, which would
# just clutter the test file can be pulled in from 'data' subdirectories
# in the test directory.
#
# Test logs are written in the 'logs' directory in the root of the repo.
# Every test case writes to a log directory with the same 'prefix' as the
# test's unique schema.
#
# These fixtures have "class" scope. Class scope fixtures can be used both
# in classes and in single test functions (which act as classes for this
# purpose). Pytest will collect all classes starting with 'Test', so if
# you have a class that you want to be subclassed, it's generally best to
# not start the class name with 'Test'. All standalone functions starting with
# 'test_' and methods in classes starting with 'test_' (in classes starting
# with 'Test') will be collected.
#
# Please see the pytest docs for further information:
# https://docs.pytest.org
# Used in constructing the unique_schema and logs_dir
@pytest.fixture(scope="class")
def prefix():
# create a directory name that will be unique per test session
_randint = random.randint(0, 9999)
_runtime_timedelta = datetime.utcnow() - datetime(1970, 1, 1, 0, 0, 0)
_runtime = (int(_runtime_timedelta.total_seconds() * 1e6)) + _runtime_timedelta.microseconds
prefix = f"test{_runtime}{_randint:04}"
return prefix
# Every test has a unique schema
@pytest.fixture(scope="class")
def unique_schema(request, prefix) -> str:
test_file = request.module.__name__
# We only want the last part of the name
test_file = test_file.split(".")[-1]
unique_schema = f"{prefix}_{test_file}"
return unique_schema
# Create a directory for the profile using tmpdir fixture
@pytest.fixture(scope="class")
def profiles_root(tmpdir_factory):
return tmpdir_factory.mktemp("profile")
# Create a directory for the project using tmpdir fixture
@pytest.fixture(scope="class")
def project_root(tmpdir_factory):
# tmpdir docs - https://docs.pytest.org/en/6.2.x/tmpdir.html
project_root = tmpdir_factory.mktemp("project")
print(f"\n=== Test project_root: {project_root}")
return project_root
# This is for data used by multiple tests, in the 'tests/data' directory
@pytest.fixture(scope="session")
def shared_data_dir(request):
return os.path.join(request.config.rootdir, "tests", "data")
# This is for data for a specific test directory, i.e. tests/basic/data
@pytest.fixture(scope="module")
def test_data_dir(request):
return os.path.join(request.fspath.dirname, "data")
# This contains the profile target information, for simplicity in setting
# up different profiles, particularly in the adapter repos.
# Note: because we load the profile to create the adapter, this
# fixture can't be used to test vars and env_vars or errors. The
# profile must be written out after the test starts.
@pytest.fixture(scope="class")
def dbt_profile_target():
return {
"type": "postgres",
"threads": 4,
"host": "localhost",
"port": int(os.getenv("POSTGRES_TEST_PORT", 5432)),
"user": os.getenv("POSTGRES_TEST_USER", "root"),
"pass": os.getenv("POSTGRES_TEST_PASS", "password"),
"dbname": os.getenv("POSTGRES_TEST_DATABASE", "dbt"),
}
# This fixture can be overridden in a project. The data provided in this
# fixture will be merged into the default project dictionary via a python 'update'.
@pytest.fixture(scope="class")
def profiles_config_update():
return {}
# The profile dictionary, used to write out profiles.yml. It will pull in updates
# from two separate sources, the 'profile_target' and 'profiles_config_update'.
# The second one is useful when using alternative targets, etc.
@pytest.fixture(scope="class")
def dbt_profile_data(unique_schema, dbt_profile_target, profiles_config_update):
profile = {
"config": {"send_anonymous_usage_stats": False},
"test": {
"outputs": {
"default": {},
},
"target": "default",
},
}
target = dbt_profile_target
target["schema"] = unique_schema
profile["test"]["outputs"]["default"] = target
if profiles_config_update:
profile.update(profiles_config_update)
return profile
# Write out the profile data as a yaml file
@pytest.fixture(scope="class")
def profiles_yml(profiles_root, dbt_profile_data):
os.environ["DBT_PROFILES_DIR"] = str(profiles_root)
write_file(yaml.safe_dump(dbt_profile_data), profiles_root, "profiles.yml")
yield dbt_profile_data
del os.environ["DBT_PROFILES_DIR"]
# Data used to update the dbt_project config data.
@pytest.fixture(scope="class")
def project_config_update():
return {}
# Combines the project_config_update dictionary with project_config defaults to
# produce a project_yml config and write it out as dbt_project.yml
@pytest.fixture(scope="class")
def dbt_project_yml(project_root, project_config_update, logs_dir):
project_config = {
"config-version": 2,
"name": "test",
"version": "0.1.0",
"profile": "test",
"log-path": logs_dir,
}
if project_config_update:
project_config.update(project_config_update)
write_file(yaml.safe_dump(project_config), project_root, "dbt_project.yml")
return project_config
# Fixture to provide packages as either yaml or dictionary
@pytest.fixture(scope="class")
def packages():
return {}
# Write out the packages.yml file
@pytest.fixture(scope="class")
def packages_yml(project_root, packages):
if packages:
if isinstance(packages, str):
data = packages
else:
data = yaml.safe_dump(packages)
write_file(data, project_root, "packages.yml")
# Fixture to provide selectors as either yaml or dictionary
@pytest.fixture(scope="class")
def selectors():
return {}
# Write out the selectors.yml file
@pytest.fixture(scope="class")
def selectors_yml(project_root, selectors):
if selectors:
if isinstance(selectors, str):
data = selectors
else:
data = yaml.safe_dump(selectors)
write_file(data, project_root, "selectors.yml")
# This creates an adapter that is used for running test setup, such as creating
# the test schema, and sql commands that are run in tests prior to the first
# dbt command. After a dbt command is run, the project.adapter property will
# return the current adapter (for this adapter type) from the adapter factory.
# The adapter produced by this fixture will contain the "base" macros (not including
# macros from dependencies).
#
# Anything used here must be actually working (dbt_project, profile, project and internal macros),
# otherwise this will fail. So to test errors in those areas, you need to copy the files
# into the project in the tests instead of putting them in the fixtures.
@pytest.fixture(scope="class")
def adapter(unique_schema, project_root, profiles_root, profiles_yml, dbt_project_yml):
# The profiles.yml and dbt_project.yml should already be written out
args = Namespace(
profiles_dir=str(profiles_root), project_dir=str(project_root), target=None, profile=None
)
flags.set_from_args(args, {})
runtime_config = RuntimeConfig.from_args(args)
register_adapter(runtime_config)
adapter = get_adapter(runtime_config)
# We only need the base macros, not macros from dependencies, and don't want
# to run 'dbt deps' here.
adapter.load_macro_manifest(base_macros_only=True)
yield adapter
adapter.cleanup_connections()
reset_adapters()
# Start at directory level.
def write_project_files(project_root, dir_name, file_dict):
path = project_root.mkdir(dir_name)
if file_dict:
write_project_files_recursively(path, file_dict)
# Write files out from file_dict. Can be nested directories...
def write_project_files_recursively(path, file_dict):
if type(file_dict) is not dict:
raise TestProcessingException(f"Error creating {path}. Did you forget the file extension?")
for name, value in file_dict.items():
if name.endswith(".sql") or name.endswith(".csv") or name.endswith(".md"):
write_file(value, path, name)
elif name.endswith(".yml") or name.endswith(".yaml"):
if isinstance(value, str):
data = value
else:
data = yaml.safe_dump(value)
write_file(data, path, name)
else:
write_project_files_recursively(path.mkdir(name), value)
# models, macros, seeds, snapshots, tests, analysis
# Provide a dictionary of file names to contents. Nested directories
# are handled by nested dictionaries.
# models directory
@pytest.fixture(scope="class")
def models():
return {}
# macros directory
@pytest.fixture(scope="class")
def macros():
return {}
# seeds directory
@pytest.fixture(scope="class")
def seeds():
return {}
# snapshots directory
@pytest.fixture(scope="class")
def snapshots():
return {}
# tests directory
@pytest.fixture(scope="class")
def tests():
return {}
# analysis directory
@pytest.fixture(scope="class")
def analysis():
return {}
# Write out the files provided by models, macros, snapshots, seeds, tests, analysis
@pytest.fixture(scope="class")
def project_files(project_root, models, macros, snapshots, seeds, tests, analysis):
write_project_files(project_root, "models", models)
write_project_files(project_root, "macros", macros)
write_project_files(project_root, "snapshots", snapshots)
write_project_files(project_root, "seeds", seeds)
write_project_files(project_root, "tests", tests)
write_project_files(project_root, "analysis", analysis)
# We have a separate logs dir for every test
@pytest.fixture(scope="class")
def logs_dir(request, prefix):
return os.path.join(request.config.rootdir, "logs", prefix)
# This fixture is for customizing tests that need overrides in adapter
# repos. Example in dbt.tests.adapter.basic.test_base.
@pytest.fixture(scope="class")
def test_config():
return {}
# This class is returned from the 'project' fixture, and contains information
# from the pytest fixtures that may be needed in the test functions, including
# a 'run_sql' method.
class TestProjInfo:
def __init__(
self,
project_root,
profiles_dir,
adapter_type,
test_dir,
shared_data_dir,
test_data_dir,
test_schema,
database,
test_config,
):
self.project_root = project_root
self.profiles_dir = profiles_dir
self.adapter_type = adapter_type
self.test_dir = test_dir
self.shared_data_dir = shared_data_dir
self.test_data_dir = test_data_dir
self.test_schema = test_schema
self.database = database
self.test_config = test_config
@property
def adapter(self):
# This returns the last created "adapter" from the adapter factory. Each
# dbt command will create a new one. This allows us to avoid patching the
# providers 'get_adapter' function.
return get_adapter_by_type(self.adapter_type)
# Run sql from a path
def run_sql_file(self, sql_path, fetch=None):
with open(sql_path, "r") as f:
statements = f.read().split(";")
for statement in statements:
self.run_sql(statement, fetch)
# Run sql from a string, using adapter saved at test startup
def run_sql(self, sql, fetch=None):
return run_sql_with_adapter(self.adapter, sql, fetch=fetch)
# Create the unique test schema. Used in test setup, so that we're
# ready for initial sql prior to a run_dbt command.
def create_test_schema(self):
with get_connection(self.adapter):
relation = self.adapter.Relation.create(
database=self.database, schema=self.test_schema
)
self.adapter.create_schema(relation)
# Drop the unique test schema, usually called in test cleanup
def drop_test_schema(self):
with get_connection(self.adapter):
relation = self.adapter.Relation.create(
database=self.database, schema=self.test_schema
)
self.adapter.drop_schema(relation)
# This returns a dictionary of table names to 'view' or 'table' values.
def get_tables_in_schema(self):
sql = """
select table_name,
case when table_type = 'BASE TABLE' then 'table'
when table_type = 'VIEW' then 'view'
else table_type
end as materialization
from information_schema.tables
where {}
order by table_name
"""
sql = sql.format("{} ilike '{}'".format("table_schema", self.test_schema))
result = self.run_sql(sql, fetch="all")
return {model_name: materialization for (model_name, materialization) in result}
# This is the main fixture that is used in all functional tests. It pulls in the other
# fixtures that are necessary to set up a dbt project, and saves some of the information
# in a TestProjInfo class, which it returns, so that individual test cases do not have
# to pull in the other fixtures individually to access their information.
@pytest.fixture(scope="class")
def project(
project_root,
profiles_root,
request,
unique_schema,
profiles_yml,
dbt_project_yml,
packages_yml,
selectors_yml,
adapter,
project_files,
shared_data_dir,
test_data_dir,
logs_dir,
test_config,
):
# Logbook warnings are ignored so we don't have to fork logbook to support python 3.10.
# This _only_ works for tests in `tests/` that use the project fixture.
warnings.filterwarnings("ignore", category=DeprecationWarning, module="logbook")
setup_event_logger(logs_dir)
orig_cwd = os.getcwd()
os.chdir(project_root)
# Return whatever is needed later in tests but can only come from fixtures, so we can keep
# the fixture arguments in each test's signature to a minimum.
project = TestProjInfo(
project_root=project_root,
profiles_dir=profiles_root,
adapter_type=adapter.type(),
test_dir=request.fspath.dirname,
shared_data_dir=shared_data_dir,
test_data_dir=test_data_dir,
test_schema=unique_schema,
database=adapter.config.credentials.database,
test_config=test_config,
)
project.drop_test_schema()
project.create_test_schema()
yield project
project.drop_test_schema()
os.chdir(orig_cwd)
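A minimal sketch of how these fixtures are typically consumed, assuming dbt-postgres is installed, the Postgres test environment variables above are set, and the fixtures are registered (for example via `pytest_plugins = ["dbt.tests.fixtures.project"]` in a conftest.py). The model name and contents are illustrative:

```
import pytest
from dbt.tests.util import run_dbt

my_model_sql = "select 1 as id"

class TestSingleModel:
    @pytest.fixture(scope="class")
    def models(self):
        # file name -> file contents, as described for the models fixture above
        return {"my_model.sql": my_model_sql}

    def test_run(self, project):
        results = run_dbt(["run"])
        assert len(results) == 1
```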

core/dbt/tests/util.py Normal file
View File

@@ -0,0 +1,432 @@
import os
import shutil
import yaml
import json
import warnings
from typing import List
from contextlib import contextmanager
from dbt.main import handle_and_check
from dbt.logger import log_manager
from dbt.contracts.graph.manifest import Manifest
from dbt.events.functions import fire_event, capture_stdout_logs, stop_capture_stdout_logs
from dbt.events.test_types import IntegrationTestDebug
# =============================================================================
# Test utilities
# run_dbt
# run_dbt_and_capture
# get_manifest
# copy_file
# rm_file
# write_file
# read_file
# get_artifact
# update_config_file
# write_config_file
# get_unique_ids_in_results
# check_result_nodes_by_name
# check_result_nodes_by_unique_id
# SQL related utilities that use the adapter
# run_sql_with_adapter
# relation_from_name
# check_relation_types (table/view)
# check_relations_equal
# check_relations_equal_with_relations
# check_table_does_exist
# check_table_does_not_exist
# get_relation_columns
# update_rows
# generate_update_clause
# =============================================================================
# 'run_dbt' is used in pytest tests to run dbt commands. It will return
# different objects depending on the command that is executed.
# For a run command (and most other commands) it will return a list
# of results. For the 'docs generate' command it returns a CatalogArtifact.
# The first parameter is a list of dbt command line arguments, such as
# run_dbt(["run", "--vars", "seed_name: base"])
# If the command is expected to fail, pass in expect_pass=False:
# run_dbt(["test"], expect_pass=False)
def run_dbt(args: List[str] = None, expect_pass=True):
# Ignore logbook warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, module="logbook")
# The logger will complain about already being initialized if
# we don't do this.
log_manager.reset_handlers()
if args is None:
args = ["run"]
print("\n\nInvoking dbt with {}".format(args))
res, success = handle_and_check(args)
assert success == expect_pass, "dbt exit state did not match expected"
return res
# Use this if you need to capture the command logs in a test
def run_dbt_and_capture(args: List[str] = None, expect_pass=True):
try:
stringbuf = capture_stdout_logs()
res = run_dbt(args, expect_pass=expect_pass)
stdout = stringbuf.getvalue()
finally:
stop_capture_stdout_logs()
return res, stdout
# Used in test cases to get the manifest from the partial parsing file
# Note: this uses an internal version of the manifest, and in the future
# parts of it will not be supported for external use.
def get_manifest(project_root):
path = os.path.join(project_root, "target", "partial_parse.msgpack")
if os.path.exists(path):
with open(path, "rb") as fp:
manifest_mp = fp.read()
manifest: Manifest = Manifest.from_msgpack(manifest_mp)
return manifest
else:
return None
# Used in tests to copy a file, usually from a data directory to the project directory
def copy_file(src_path, src, dest_path, dest) -> None:
# dest is a list, so that we can provide nested directories, like 'models' etc.
# copy files from the data_dir to appropriate project directory
shutil.copyfile(
os.path.join(src_path, src),
os.path.join(dest_path, *dest),
)
# Used in tests when you want to remove a file from the project directory
def rm_file(*paths) -> None:
# remove files from proj_path
os.remove(os.path.join(*paths))
# Used in tests to write out the string contents of a file to a
# file in the project directory.
# We need to explicitly use encoding="utf-8" because otherwise on
# Windows we'll get codepage 1252 and things might break
def write_file(contents, *paths):
with open(os.path.join(*paths), "w", encoding="utf-8") as fp:
fp.write(contents)
# Used in test utilities
def read_file(*paths):
contents = ""
with open(os.path.join(*paths), "r") as fp:
contents = fp.read()
return contents
# Get an artifact (usually from the target directory) such as
# manifest.json or catalog.json to use in a test
def get_artifact(*paths):
contents = read_file(*paths)
dct = json.loads(contents)
return dct
# For updating yaml config files
def update_config_file(updates, *paths):
current_yaml = read_file(*paths)
config = yaml.safe_load(current_yaml)
config.update(updates)
new_yaml = yaml.safe_dump(config)
write_file(new_yaml, *paths)
# Write new config file
def write_config_file(data, *paths):
if type(data) is dict:
data = yaml.safe_dump(data)
write_file(data, *paths)
# Get the unique_ids in dbt command results
def get_unique_ids_in_results(results):
unique_ids = []
for result in results:
unique_ids.append(result.node.unique_id)
return unique_ids
# Check the nodes in the results returned by a dbt run command
def check_result_nodes_by_name(results, names):
result_names = []
for result in results:
result_names.append(result.node.name)
assert set(names) == set(result_names)
# Check the nodes in the results returned by a dbt run command
def check_result_nodes_by_unique_id(results, unique_ids):
result_unique_ids = []
for result in results:
result_unique_ids.append(result.node.unique_id)
assert set(unique_ids) == set(result_unique_ids)
class TestProcessingException(Exception):
pass
# Testing utilities that use adapter code
# Uses:
# adapter.config.credentials
# adapter.quote
# adapter.run_sql_for_tests
def run_sql_with_adapter(adapter, sql, fetch=None):
if sql.strip() == "":
return
# substitute schema and database in sql
kwargs = {
"schema": adapter.config.credentials.schema,
"database": adapter.quote(adapter.config.credentials.database),
}
sql = sql.format(**kwargs)
msg = f'test connection "__test" executing: {sql}'
fire_event(IntegrationTestDebug(msg=msg))
with get_connection(adapter) as conn:
return adapter.run_sql_for_tests(sql, fetch, conn)
# Get a Relation object from the identifier (name of table/view).
# Uses the default database and schema. If you need a relation
# with a different schema, it should be constructed in the test.
# Uses:
# adapter.Relation
# adapter.config.credentials
# Relation.get_default_quote_policy
# Relation.get_default_include_policy
def relation_from_name(adapter, name: str):
"""reverse-engineer a relation from a given name and
the adapter. The relation name is split by the '.' character.
"""
# Different adapters have different Relation classes
cls = adapter.Relation
credentials = adapter.config.credentials
quote_policy = cls.get_default_quote_policy().to_dict()
include_policy = cls.get_default_include_policy().to_dict()
# Make sure we have database/schema/identifier parts, even if
# only identifier was supplied.
relation_parts = name.split(".")
if len(relation_parts) == 1:
relation_parts.insert(0, credentials.schema)
if len(relation_parts) == 2:
relation_parts.insert(0, credentials.database)
kwargs = {
"database": relation_parts[0],
"schema": relation_parts[1],
"identifier": relation_parts[2],
}
relation = cls.create(
include_policy=include_policy,
quote_policy=quote_policy,
**kwargs,
)
return relation
# Ensure that models with different materializations have the
# correct table/view.
# Uses:
# adapter.list_relations_without_caching
def check_relation_types(adapter, relation_to_type):
"""
Relation name to table/view
{
"base": "table",
"other": "view",
}
"""
expected_relation_values = {}
found_relations = []
schemas = set()
for key, value in relation_to_type.items():
relation = relation_from_name(adapter, key)
expected_relation_values[relation] = value
schemas.add(relation.without_identifier())
with get_connection(adapter):
for schema in schemas:
found_relations.extend(adapter.list_relations_without_caching(schema))
for key, value in relation_to_type.items():
for relation in found_relations:
# this might be too broad
if relation.identifier == key:
assert relation.type == value, (
f"Got an unexpected relation type of {relation.type} "
f"for relation {key}, expected {value}"
)
# Replaces assertTablesEqual. assertManyTablesEqual can be replaced
# by doing a separate call for each set of tables/relations.
# Wraps check_relations_equal_with_relations by creating relations
# from the list of names passed in.
def check_relations_equal(adapter, relation_names, compare_snapshot_cols=False):
if len(relation_names) < 2:
raise TestProcessingException(
"Not enough relations to compare",
)
relations = [relation_from_name(adapter, name) for name in relation_names]
return check_relations_equal_with_relations(
adapter, relations, compare_snapshot_cols=compare_snapshot_cols
)
# This can be used when checking relations in different schemas, by supplying
# a list of relations. Called by 'check_relations_equal'.
# Uses:
# adapter.get_columns_in_relation
# adapter.get_rows_different_sql
# adapter.execute
def check_relations_equal_with_relations(adapter, relations, compare_snapshot_cols=False):
with get_connection(adapter):
basis, compares = relations[0], relations[1:]
# Skip columns starting with "dbt_" because we don't want to
# compare those, since they are time sensitive
# (unless comparing "dbt_" snapshot columns is explicitly enabled)
column_names = [
c.name
for c in adapter.get_columns_in_relation(basis)
if not c.name.lower().startswith("dbt_") or compare_snapshot_cols
]
for relation in compares:
sql = adapter.get_rows_different_sql(basis, relation, column_names=column_names)
_, tbl = adapter.execute(sql, fetch=True)
num_rows = len(tbl)
assert (
num_rows == 1
), f"Invalid sql query from get_rows_different_sql: incorrect number of rows ({num_rows})"
num_cols = len(tbl[0])
assert (
num_cols == 2
), f"Invalid sql query from get_rows_different_sql: incorrect number of cols ({num_cols})"
row_count_difference = tbl[0][0]
assert (
row_count_difference == 0
), f"Got {row_count_difference} difference in row count betwen {basis} and {relation}"
rows_mismatched = tbl[0][1]
assert (
rows_mismatched == 0
), f"Got {rows_mismatched} different rows between {basis} and {relation}"
# Uses:
# adapter.update_column_sql
# adapter.execute
# adapter.commit_if_has_connection
def update_rows(adapter, update_rows_config):
"""
{
"name": "base",
"dst_col": "some_date"
"clause": {
"type": "add_timestamp",
"src_col": "some_date",
"where" "id > 10"
}
"""
for key in ["name", "dst_col", "clause"]:
if key not in update_rows_config:
raise TestProcessingException(f"Invalid update_rows: no {key}")
clause = update_rows_config["clause"]
clause = generate_update_clause(adapter, clause)
where = None
if "where" in update_rows_config:
where = update_rows_config["where"]
name = update_rows_config["name"]
dst_col = update_rows_config["dst_col"]
relation = relation_from_name(adapter, name)
with get_connection(adapter):
sql = adapter.update_column_sql(
dst_name=str(relation),
dst_column=dst_col,
clause=clause,
where_clause=where,
)
adapter.execute(sql, auto_begin=True)
adapter.commit_if_has_connection()
# This is called by the 'update_rows' function.
# Uses:
# adapter.timestamp_add_sql
# adapter.string_add_sql
def generate_update_clause(adapter, clause) -> str:
"""
Called by update_rows function. Expects the "clause" dictionary
documented in 'update_rows'.
"""
if "type" not in clause or clause["type"] not in ["add_timestamp", "add_string"]:
raise TestProcessingException("invalid update_rows clause: type missing or incorrect")
clause_type = clause["type"]
if clause_type == "add_timestamp":
if "src_col" not in clause:
raise TestProcessingException("Invalid update_rows clause: no src_col")
add_to = clause["src_col"]
kwargs = {k: v for k, v in clause.items() if k in ("interval", "number")}
with get_connection(adapter):
return adapter.timestamp_add_sql(add_to=add_to, **kwargs)
elif clause_type == "add_string":
for key in ["src_col", "value"]:
if key not in clause:
raise TestProcessingException(f"Invalid update_rows clause: no {key}")
src_col = clause["src_col"]
value = clause["value"]
location = clause.get("location", "append")
with get_connection(adapter):
return adapter.string_add_sql(src_col, value, location)
return ""
@contextmanager
def get_connection(adapter, name="_test"):
with adapter.connection_named(name):
conn = adapter.connections.get_thread_connection()
yield conn
# Uses:
# adapter.get_columns_in_relation
def get_relation_columns(adapter, name):
relation = relation_from_name(adapter, name)
with get_connection(adapter):
columns = adapter.get_columns_in_relation(relation)
return sorted(((c.name, c.dtype, c.char_size) for c in columns), key=lambda x: x[0])
def check_table_does_not_exist(adapter, name):
columns = get_relation_columns(adapter, name)
assert len(columns) == 0
def check_table_does_exist(adapter, name):
columns = get_relation_columns(adapter, name)
assert len(columns) > 0
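A sketch of how several of these helpers combine inside a single test body; the seed name, model name, and unique_id below are hypothetical:

```
from dbt.tests.util import (
    run_dbt,
    get_manifest,
    get_unique_ids_in_results,
    check_relations_equal,
)

def test_seed_and_run(project):
    run_dbt(["seed"])
    results = run_dbt(["run"])
    assert "model.test.my_model" in get_unique_ids_in_results(results)
    manifest = get_manifest(project.project_root)
    assert manifest is not None
    # compare the seeded table with the built model
    check_relations_equal(project.adapter, ["base", "my_model"])
```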

View File

@@ -44,6 +44,7 @@ LOAD_ALL_TIMING_SPEC = "iglu:com.dbt/load_all_timing/jsonschema/1-0-3"
RESOURCE_COUNTS = "iglu:com.dbt/resource_counts/jsonschema/1-0-0"
EXPERIMENTAL_PARSER = "iglu:com.dbt/experimental_parser/jsonschema/1-0-0"
PARTIAL_PARSER = "iglu:com.dbt/partial_parser/jsonschema/1-0-1"
RUNNABLE_TIMING = "iglu:com.dbt/runnable/jsonschema/1-0-0"
DBT_INVOCATION_ENV = "DBT_INVOCATION_ENV"
@@ -412,6 +413,18 @@ def track_partial_parser(options):
)
def track_runnable_timing(options):
context = [SelfDescribingJson(RUNNABLE_TIMING, options)]
assert active_user is not None, "Cannot track runnable info when active user is None"
track(
active_user,
category="dbt",
action="runnable_timing",
label=get_invocation_id(),
context=context,
)
def flush():
fire_event(FlushEvents())
try:

View File

@@ -17,7 +17,7 @@ from pathlib import PosixPath, WindowsPath
from contextlib import contextmanager
from dbt.exceptions import ConnectionException
from dbt.events.functions import fire_event
from dbt.events.types import RetryExternalCall
from dbt.events.types import RetryExternalCall, RecordRetryException
from dbt import flags
from enum import Enum
from typing_extensions import Protocol
@@ -211,7 +211,7 @@ def deep_map_render(func: Callable[[Any, Tuple[Union[str, int], ...]], Any], val
It maps the function func() onto each non-container value in 'value'
recursively, returning a new value. As long as func does not manipulate
value, then deep_map_render will also not manipulate it.
the value, then deep_map_render will also not manipulate it.
value should be a value returned by `yaml.safe_load` or `json.load` - the
only expected types are list, dict, native python number, str, NoneType,
@@ -319,7 +319,7 @@ def timestring() -> str:
class JSONEncoder(json.JSONEncoder):
"""A 'custom' json encoder that does normal json encoder things, but also
handles `Decimal`s. and `Undefined`s. Decimals can lose precision because
handles `Decimal`s and `Undefined`s. Decimals can lose precision because
they get converted to floats. Undefined's are serialized to an empty string
"""
@@ -394,7 +394,7 @@ def translate_aliases(
If recurse is True, perform this operation recursively.
:return: A dict containing all the values in kwargs referenced by their
:returns: A dict containing all the values in kwargs referenced by their
canonical key.
:raises: `AliasException`, if a canonical key is defined more than once.
"""
@@ -600,22 +600,22 @@ class MultiDict(Mapping[str, Any]):
def _connection_exception_retry(fn, max_attempts: int, attempt: int = 0):
"""Attempts to run a function that makes an external call, if the call fails
on a connection error, timeout or decompression issue, it will be tried up to 5 more times.
See https://github.com/dbt-labs/dbt-core/issues/4579 for context on this decompression issues
specifically.
on a Requests exception or decompression issue (ReadError), it will be tried
up to 5 more times. All exceptions that Requests explicitly raises inherit from
requests.exceptions.RequestException. See https://github.com/dbt-labs/dbt-core/issues/4579
for context on these decompression issues specifically.
"""
try:
return fn()
except (
requests.exceptions.ConnectionError,
requests.exceptions.Timeout,
requests.exceptions.ContentDecodingError,
requests.exceptions.RequestException,
ReadError,
) as exc:
if attempt <= max_attempts - 1:
fire_event(RecordRetryException(exc=exc))
fire_event(RetryExternalCall(attempt=attempt, max=max_attempts))
time.sleep(1)
_connection_exception_retry(fn, max_attempts, attempt + 1)
return _connection_exception_retry(fn, max_attempts, attempt + 1)
else:
raise ConnectionException("External connection exception occurred: " + str(exc))
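A sketch of the retry helper's contract as rewritten above: any `requests.exceptions.RequestException` or a decompression `ReadError` triggers up to `max_attempts` retries with a one second pause, and the recursive call's result is now returned to the original caller. Calling the private helper directly here is only for illustration:

```
import requests
from dbt.utils import _connection_exception_retry

def fetch_pypi_metadata():
    # any RequestException raised in here (connection errors, timeouts, bad
    # content decoding) is caught and the call is retried
    return requests.get("https://pypi.org/pypi/dbt-core/json", timeout=10).json()

data = _connection_exception_retry(fetch_pypi_metadata, max_attempts=5)
```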

View File

@@ -3,7 +3,7 @@ import importlib.util
import os
import glob
import json
from typing import Iterator
from typing import Iterator, List, Optional, Tuple
import requests
@@ -16,7 +16,34 @@ from dbt import flags
PYPI_VERSION_URL = "https://pypi.org/pypi/dbt-core/json"
def get_latest_version(version_url: str = PYPI_VERSION_URL):
def get_version_information() -> str:
flags.USE_COLORS = True if not flags.USE_COLORS else None
installed = get_installed_version()
latest = get_latest_version()
core_msg_lines, core_info_msg = _get_core_msg_lines(installed, latest)
core_msg = _format_core_msg(core_msg_lines)
plugin_version_msg = _get_plugins_msg(installed)
msg_lines = [core_msg]
if core_info_msg != "":
msg_lines.append(core_info_msg)
msg_lines.append(plugin_version_msg)
msg_lines.append("")
return "\n\n".join(msg_lines)
def get_installed_version() -> dbt.semver.VersionSpecifier:
return dbt.semver.VersionSpecifier.from_version_string(__version__)
def get_latest_version(
version_url: str = PYPI_VERSION_URL,
) -> Optional[dbt.semver.VersionSpecifier]:
try:
resp = requests.get(version_url)
data = resp.json()
@@ -27,81 +54,168 @@ def get_latest_version(version_url: str = PYPI_VERSION_URL):
return dbt.semver.VersionSpecifier.from_version_string(version_string)
def get_installed_version():
return dbt.semver.VersionSpecifier.from_version_string(__version__)
def _get_core_msg_lines(installed, latest) -> Tuple[List[List[str]], str]:
installed_s = installed.to_version_string(skip_matcher=True)
installed_line = ["installed", installed_s, ""]
update_info = ""
if latest is None:
update_info = (
" The latest version of dbt-core could not be determined!\n"
" Make sure that the following URL is accessible:\n"
f" {PYPI_VERSION_URL}"
)
return [installed_line], update_info
latest_s = latest.to_version_string(skip_matcher=True)
latest_line = ["latest", latest_s, green("Up to date!")]
if installed > latest:
latest_line[2] = green("Ahead of latest version!")
elif installed < latest:
latest_line[2] = yellow("Update available!")
update_info = (
" Your version of dbt-core is out of date!\n"
" You can find instructions for upgrading here:\n"
" https://docs.getdbt.com/docs/installation"
)
return [
installed_line,
latest_line,
], update_info
def _format_core_msg(lines: List[List[str]]) -> str:
msg = "Core:\n"
msg_lines = []
for name, version, update_msg in _pad_lines(lines, seperator=":"):
line_msg = f" - {name} {version}"
if update_msg != "":
line_msg += f" - {update_msg}"
msg_lines.append(line_msg)
return msg + "\n".join(msg_lines)
def _get_plugins_msg(installed: dbt.semver.VersionSpecifier) -> str:
msg_lines = ["Plugins:"]
plugins = []
display_update_msg = False
for name, version_s in _get_dbt_plugins_info():
compatability_msg, needs_update = _get_plugin_msg_info(name, version_s, installed)
if needs_update:
display_update_msg = True
plugins.append([name, version_s, compatability_msg])
for plugin in _pad_lines(plugins, seperator=":"):
msg_lines.append(_format_single_plugin(plugin, ""))
if display_update_msg:
update_msg = (
" At least one plugin is out of date or incompatible with dbt-core.\n"
" You can find instructions for upgrading here:\n"
" https://docs.getdbt.com/docs/installation"
)
msg_lines += ["", update_msg]
return "\n".join(msg_lines)
def _get_plugin_msg_info(
name: str, version_s: str, core: dbt.semver.VersionSpecifier
) -> Tuple[str, bool]:
plugin = dbt.semver.VersionSpecifier.from_version_string(version_s)
latest_plugin = get_latest_version(version_url=get_package_pypi_url(name))
needs_update = False
if plugin.major != core.major or plugin.minor != core.minor:
compatibility_msg = red("Not compatible!")
needs_update = True
return (compatibility_msg, needs_update)
if not latest_plugin:
compatibility_msg = yellow("Could not determine latest version")
return (compatibility_msg, needs_update)
if plugin < latest_plugin:
compatibility_msg = yellow("Update available!")
needs_update = True
elif plugin > latest_plugin:
compatibility_msg = green("Ahead of latest version!")
else:
compatibility_msg = green("Up to date!")
return (compatibility_msg, needs_update)
def _format_single_plugin(plugin: List[str], update_msg: str) -> str:
name, version_s, compatability_msg = plugin
msg = f" - {name} {version_s} - {compatability_msg}"
if update_msg != "":
msg += f"\n{update_msg}\n"
return msg
def _pad_lines(lines: List[List[str]], seperator: str = "") -> List[List[str]]:
if len(lines) == 0:
return []
# count the max line length for each column in the line
counter = [0] * len(lines[0])
for line in lines:
for i, item in enumerate(line):
counter[i] = max(counter[i], len(item))
result: List[List[str]] = []
for i, line in enumerate(lines):
# add another list to hold padded strings
if len(result) == i:
result.append([""] * len(line))
# iterate over columns in the line
for j, item in enumerate(line):
# the last column does not need padding
if j == len(line) - 1:
result[i][j] = item
continue
# if the following column has no length
# the string does not need padding
if counter[j + 1] == 0:
result[i][j] = item
continue
# only add the seperator to the first column
offset = 0
if j == 0 and seperator != "":
item += seperator
offset = len(seperator)
result[i][j] = item.ljust(counter[j] + offset)
return result
def get_package_pypi_url(package_name: str) -> str:
return f"https://pypi.org/pypi/dbt-{package_name}/json"
def get_version_information():
flags.USE_COLORS = True if not flags.USE_COLORS else None
installed = get_installed_version()
latest = get_latest_version()
installed_s = installed.to_version_string(skip_matcher=True)
if latest is None:
latest_s = "unknown"
else:
latest_s = latest.to_version_string(skip_matcher=True)
version_msg = "installed version: {}\n" " latest version: {}\n\n".format(
installed_s, latest_s
)
plugin_version_msg = "Plugins:\n"
for plugin_name, version in _get_dbt_plugins_info():
plugin_version = dbt.semver.VersionSpecifier.from_version_string(version)
latest_plugin_version = get_latest_version(version_url=get_package_pypi_url(plugin_name))
plugin_update_msg = ""
if installed == plugin_version or (
latest_plugin_version and plugin_version == latest_plugin_version
):
compatibility_msg = green("Up to date!")
else:
if latest_plugin_version:
if installed.major == plugin_version.major:
compatibility_msg = yellow("Update available!")
else:
compatibility_msg = red("Out of date!")
plugin_update_msg = (
" Your version of dbt-{} is out of date! "
"You can find instructions for upgrading here:\n"
" https://docs.getdbt.com/dbt-cli/install/overview\n\n"
).format(plugin_name)
else:
compatibility_msg = yellow("No PYPI version available")
plugin_version_msg += (" - {}: {} - {}\n" "{}").format(
plugin_name, version, compatibility_msg, plugin_update_msg
)
if latest is None:
return (
"{}The latest version of dbt could not be determined!\n"
"Make sure that the following URL is accessible:\n{}\n\n{}".format(
version_msg, PYPI_VERSION_URL, plugin_version_msg
)
)
if installed == latest:
return f"{version_msg}{green('Up to date!')}\n\n{plugin_version_msg}"
elif installed > latest:
return "{}Your version of dbt is ahead of the latest " "release!\n\n{}".format(
version_msg, plugin_version_msg
)
else:
return (
"{}Your version of dbt is out of date! "
"You can find instructions for upgrading here:\n"
"https://docs.getdbt.com/docs/installation\n\n{}".format(
version_msg, plugin_version_msg
)
)
def _get_dbt_plugins_info() -> Iterator[Tuple[str, str]]:
for plugin_name in _get_adapter_plugin_names():
if plugin_name == "core":
continue
try:
mod = importlib.import_module(f"dbt.adapters.{plugin_name}.__version__")
except ImportError:
# not an adapter
continue
yield plugin_name, mod.version # type: ignore
def _get_adapter_plugin_names() -> Iterator[str]:
@@ -120,17 +234,5 @@ def _get_adapter_plugin_names() -> Iterator[str]:
yield plugin_name
def _get_dbt_plugins_info():
for plugin_name in _get_adapter_plugin_names():
if plugin_name == "core":
continue
try:
mod = importlib.import_module(f"dbt.adapters.{plugin_name}.__version__")
except ImportError:
# not an adapter
continue
yield plugin_name, mod.version
__version__ = "1.0.1"
__version__ = "1.1.3"
installed = get_installed_version()
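A quick illustration of the `_pad_lines` helper above (a private function, so this is only for clarity, not a supported API): every column except the last is right-padded to its widest entry, and the separator is appended to the first column only.

```
from dbt.version import _pad_lines

lines = [
    ["installed", "1.1.3", ""],
    ["latest", "1.1.3", "Up to date!"],
]
padded = _pad_lines(lines, seperator=":")  # the parameter is spelled `seperator` in the source
assert padded[0][0] == "installed:"
assert padded[1][0] == "latest:   "  # padded to the widest name plus the separator
assert padded[0][2] == ""            # the last column is never padded
```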

View File

@@ -273,12 +273,12 @@ def parse_args(argv=None):
parser.add_argument("adapter")
parser.add_argument("--title-case", "-t", default=None)
parser.add_argument("--dependency", action="append")
parser.add_argument("--dbt-core-version", default="1.0.1")
parser.add_argument("--dbt-core-version", default="1.1.3")
parser.add_argument("--email")
parser.add_argument("--author")
parser.add_argument("--url")
parser.add_argument("--sql", action="store_true")
parser.add_argument("--package-version", default="1.0.1")
parser.add_argument("--package-version", default="1.1.3")
parser.add_argument("--project-version", default="1.0")
parser.add_argument("--no-dependency", action="store_false", dest="set_dependency")
parsed = parser.parse_args()

View File

@@ -25,7 +25,7 @@ with open(os.path.join(this_directory, "README.md")) as f:
package_name = "dbt-core"
package_version = "1.0.1"
package_version = "1.1.3"
description = """With dbt, data analysts and engineers can build analytics \
the way engineers build applications."""
@@ -52,20 +52,20 @@ setup(
],
install_requires=[
"Jinja2==2.11.3",
"MarkupSafe==2.0.1",
"MarkupSafe>=0.23,<2.1",
"agate>=1.6,<1.6.4",
"click>=7.0,<9",
"colorama>=0.3.9,<0.4.5",
"hologram==0.0.14",
"hologram>=0.0.14,<=0.0.15",
"isodate>=0.6,<0.7",
"logbook>=1.5,<1.6",
"mashumaro==2.9",
"minimal-snowplow-tracker==0.0.2",
"networkx>=2.3,<3",
"networkx>=2.3,<2.8.4",
"packaging>=20.9,<22.0",
"sqlparse>=0.2.3,<0.5",
"dbt-extractor==0.4.0",
"typing-extensions>=3.7.4,<3.11",
"dbt-extractor~=0.4.1",
"typing-extensions>=3.7.4",
"werkzeug>=1,<3",
# the following are all to match snowflake-connector-python
"requests<3.0.0",
@@ -82,6 +82,7 @@ setup(
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
],
python_requires=">=3.7.2",
)

View File

@@ -1,4 +1,4 @@
black==21.12b0
black==22.3.0
bumpversion
flake8
flaky
@@ -8,9 +8,11 @@ mypy==0.782
pip-tools
pre-commit
pytest
pytest-cov
pytest-csv
pytest-dotenv
pytest-logbook
pytest-csv
pytest-mock
pytest-xdist
pytz
tox>=3.13

View File

@@ -9,17 +9,17 @@ ARG build_for=linux/amd64
##
# base image (abstract)
##
FROM --platform=$build_for python:3.9.9-slim-bullseye as base
FROM --platform=$build_for python:3.10.3-slim-bullseye as base
# N.B. The refs updated automagically every release via bumpversion
# N.B. dbt-postgres is currently found in the core codebase so a value of dbt-core@<some_version> is correct
ARG dbt_core_ref=dbt-core@v1.0.1
ARG dbt_postgres_ref=dbt-core@v1.0.1
ARG dbt_redshift_ref=dbt-redshift@v1.0.0
ARG dbt_bigquery_ref=dbt-bigquery@v1.0.0
ARG dbt_snowflake_ref=dbt-snowflake@v1.0.0
ARG dbt_spark_ref=dbt-spark@v1.0.0
ARG dbt_core_ref=dbt-core@v1.1.3
ARG dbt_postgres_ref=dbt-core@v1.1.3
ARG dbt_redshift_ref=dbt-redshift@v1.1.0
ARG dbt_bigquery_ref=dbt-bigquery@v1.1.0
ARG dbt_snowflake_ref=dbt-snowflake@v1.1.0
ARG dbt_spark_ref=dbt-spark@v1.1.0
# special case args
ARG dbt_spark_version=all
ARG dbt_third_party

View File

@@ -17,6 +17,11 @@ In order to build a new image, run the following docker command.
```
docker build --tag <your_image_name> --target <target_name> <path/to/dockerfile>
```
---
> **Note:** Docker must be configured to use [BuildKit](https://docs.docker.com/develop/develop-images/build_enhancements/) in order for images to build properly!
---
By default the images will be populated with the most recent release of `dbt-core` and whatever database adapter you select. If you need to use a different version you can specify it by git ref using the `--build-arg` flag:
```
docker build --tag <your_image_name> \
@@ -32,7 +37,10 @@ valid arg names for versioning are:
* `dbt_snowflake_ref`
* `dbt_spark_ref`
> Note: Only overide a _single_ build arg for each build. Using multiple overides may lead to a non-functioning image.
---
>**NOTE:** Only override a _single_ build arg for each build. Using multiple overrides may lead to a non-functioning image.
---
If you wish to build an image with a third-party adapter, you can use the `dbt-third-party` target. This target requires you to provide a path to the adapter that can be processed by `pip`, using the `dbt_third_party` build arg:
```
@@ -101,6 +109,9 @@ docker run \
my-dbt \
ls
```
> Notes:
> * Bind-mount sources _must_ be an absolute path
> * You may need to make adjustments to the docker networking setting depending on the specifics of your data warehouse/database host.
---
**Notes:**
* Bind-mount sources _must_ be an absolute path
* You may need to make adjustments to the docker networking setting depending on the specifics of your data warehouse/database host.
---

View File

@@ -1,2 +1,3 @@
-e ./core
-e ./plugins/postgres
-e ./tests/adapter

View File

@@ -1,18 +1,118 @@
# Performance Regression Testing
This directory includes dbt project setups to test on and a test runner written in Rust which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually.
The github workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards.
## Adding a new dbt project
Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests.
## Adding a new dbt command
In `runner/src/measure.rs::measure` add a metric to the `metrics` Vec. The Github Action will handle recompilation if you don't have the rust toolchain installed.
## Attention!
PLEASE READ THIS README IN THE MAIN BRANCH
The performance runner is always pulled from main regardless of the version being modeled or sampled. If you are not in the main branch, this information may be stale.
## Description
This test suite samples the performance characteristics of individual commits against performance models for prior releases. Performance is measured on project-command pairs, which are assumed to conform to a normal distribution. The sampling and comparison are efficient enough to run against PRs.
This collection of projects and commands should expand over time, driven by user feedback about poorly performing projects, so that future versions are protected against regressions in those scenarios.
Here are all the components of the testing module:
- dbt project setups that are known performance bottlenecks, which you can find in `/performance/projects/`, and a runner written in Rust that runs specific dbt commands on each of the projects.
- Performance characteristics, called "baselines", from released dbt versions in `/performance/baselines/`. Each branch only has the baselines for its ancestors, because when we compare samples we compare against the latest baseline available in that branch (see the sketch after this list).
- A GitHub action for modeling the performance distribution for a new release: `/.github/workflows/model_performance.yml`.
- A GitHub action for sampling performance of dbt at your commit and comparing it against a previous release: `/.github/workflows/sample_performance.yml`.
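As a rough illustration of how a sample gets compared against a stored baseline, here is a minimal sketch that reads one baseline file back. The file name is hypothetical, and the `results`/`mean`/`stddev` fields are assumed to follow hyperfine's JSON export format; the real comparison lives in the Rust runner.
```python
# Minimal sketch: read one (hypothetical) baseline file produced by hyperfine's
# --export-json and pull out the modeled mean and standard deviation in seconds.
import json
from pathlib import Path

baseline_path = Path("performance/baselines/1.0.99/parse_2000_models.json")  # hypothetical name
baseline = json.loads(baseline_path.read_text())["results"][0]

mean, stddev = baseline["mean"], baseline["stddev"]
print(f"expected runtime: {mean:.2f}s ± {stddev:.4f}s")
```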
At this time, the biggest risk in the design of this project is how to account for the natural variation of GitHub Action runs. Typically, performance work is done on dedicated hardware to eliminate this factor. However, the variation introduced by the observation tools can be accounted for in the calculations if it can be measured.
## Adding Test Scenarios
A clear process for maintainers and community members to add new performance testing targets will exist after the next stage of the test suite is complete. For details, see #4768.
## Investigating Regressions
If your commit has failed one of the performance regression tests, it does not necessarily mean your commit has a performance regression. It does mean that the observed runtime was so much slower than the expected value that it is unlikely to be random noise. If it is not random noise, the failing commit contains the code that is causing the regression, but it may not be the commit that introduced that code: an earlier commit may have introduced it and still passed, due to natural variation in sampling. When investigating a performance regression, start with the failing commit and work your way backwards.
Here's an example of how this could happen:
```
Commit
A <- last release
B
C <- perf regression
D
E
F <- the first failing commit
```
- Commit A is measured to have an expected value for one performance metric of 30 seconds with a standard deviation of 0.5 seconds.
- Commit B doesn't introduce a performance regression and passes the performance regression tests.
- Commit C introduces a performance regression such that the new expected value of the metric is 32 seconds, with the standard deviation still at 0.5 seconds. We don't know this, because estimating the whole performance distribution is far too much work to run on every commit. Commit C passes the regression test because we happened to sample a value of 31 seconds, which is within the threshold for the original model. That sample is also only 2 standard deviations away from commit C's actual performance model, so while it won't be common, it is expected to happen sometimes.
- Commit D samples a value of 31.4 seconds and passes
- Commit E samples a value of 31.2 seconds and passes
- Commit F samples a value of 32.9 seconds and fails
Because these performance regression tests are non-deterministic, it will often be possible to rerun the test on a failing commit and get it to pass. The more often we do this, the farther down the commit history we push detection.
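To put a number on how easily that can happen, here is a minimal back-of-the-envelope sketch (not part of the test suite) using the example's figures: a baseline of 30s ± 0.5s and a true post-regression mean of 32s.
```python
# How often does a single sample from the regressed distribution still pass the
# one-sided 3-sigma gate that was fit to the old baseline?
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

baseline_mu, baseline_sigma = 30.0, 0.5       # model measured at the last release (commit A)
threshold = baseline_mu + 3 * baseline_sigma  # 31.5s: samples slower than this fail

regressed_mu = 32.0                           # true (unknown) mean after commit C

p_pass = normal_cdf(threshold, regressed_mu, baseline_sigma)
print(f"threshold = {threshold:.1f}s, P(a regressed commit still passes) ≈ {p_pass:.0%}")  # ~16%
```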
If your PR is against `main`, your commits will be compared against the latest baseline measurement found in `performance/baselines`. If this commit needs to be backported, that PR will be against the `.latest` branch and will also compare against the latest baseline measurement found in `performance/baselines` in that branch. These two versions may be the same or they may be different. For example, if the latest version of dbt is v1.99.0, the performance sample of your PR against main will compare against the baseline for v1.99.0. When those commits are backported to `1.98.latest`, they will be compared against the baseline for v1.98.6 (or whatever the latest is at that time). Even if the compared baseline is the same, a different sample is taken for each PR. In that case, even though it should be rare, it is possible for a performance regression to be detected in only one of the two PRs, even with the same baseline, due to variation in sampling.
## The Statistics
Particle physicists need to be confident in declaring new discoveries, snack manufacturers need to be sure each individual item is within the regulated margin of error for nutrition facts, and weight-rated climbing gear needs to be produced so you can trust your life to every unit that comes off the line. All of these use cases use the same kind of math to meet their needs: sigma-based p-values. This section will peel apart that math with the help of a physicist and walk through how we apply this approach to performance regression testing in this test suite.
You are likely familiar with forming a hypothesis of the form "A and B are correlated," which is known as _the research hypothesis_. Additionally, it follows that the hypothesis "A and B are not correlated" is relevant and is known as _the null hypothesis_. When looking at data, we commonly use a _p-value_ to determine the significance of the data. Formally, a _p-value_ is the probability of obtaining data at least as extreme as the ones observed, if the null hypothesis is true. To refine this definition, the experimental particle physicist [Dr. Tommaso Dorigo](https://userswww.pd.infn.it/~dorigo/#about) has an excellent [glossary](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) of these terms that helps clarify: "'Extreme' is quite tricky instead: it depends on what is your 'alternate hypothesis' of reference, and what kind of departure it would produce on the studied statistic derived from the data. So 'extreme' will mean 'departing from the typical values expected for the null hypothesis, toward the values expected from the alternate hypothesis.'" In the context of performance regression testing, our research hypothesis is that "after commit A, the codebase includes a performance regression," which means we expect the runtime of our measured processes to be _slower_, not faster, than the expected value.
Given this definition of p-value, we need to explicitly call out the common tendency to apply _probability inversion_ to our observations. To quote [Dr. Tommaso Dorigo](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) again, "If your ability on the long jump puts you in the 99.99% percentile, that does not mean that you are a kangaroo, and neither can one infer that the probability that you belong to the human race is 0.01%." Using our previously defined terms, the p-value is _not_ the probability that the null hypothesis _is true_.
This brings us to calculating sigma values. Sigma refers to the standard deviation of a statistical model, which is used as a measurement of how far away an observed value is from the expected value. When we say that we have a "3 sigma result" we are saying that if the null hypothesis is true, this is a particularly unlikely observation, not that the null hypothesis is false. Exactly how unlikely depends on what the expected values from our research hypothesis are. In the context of performance regression testing, if the null hypothesis is false, we are expecting the results to be _slower_ than the expected value, not _slower or faster_. Looking at the normal distribution below, we can see that we only care about one _half_ of the distribution: the half where the values are slower than the expected value. This means that when we're calculating the p-value we are not including both sides of the normal distribution.
![normal distribution](./images/normal.svg)
Because of this, the following table describes the significance of each sigma level for our _one-sided_ hypothesis:
| σ | p-value | scientific significance |
| --- | -------------- | ----------------------- |
| 1 σ | 1 in 6 | |
| 2 σ | 1 in 44 | |
| 3 σ | 1 in 741 | evidence |
| 4 σ | 1 in 31,574 | |
| 5 σ | 1 in 3,486,914 | discovery |
When detecting performance regressions that trigger alerts, block PRs, or delay releases we want to be conservative enough that detections are infrequently triggered by noise, but not so conservative as to miss most actual regressions. This test suite uses a 3 sigma standard so that only about 1 in every 700 runs is expected to fail the performance regression test suite due to expected variance in our measurements.
In practice, the number of performance regression failures due to random noise will be higher because we are not incorporating the variance of the tools we use to measure, namely GHA.
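For reference, the p-value column above can be reproduced from the one-sided tail of the standard normal distribution; here is a minimal sketch (the exact odds quoted for 5 σ vary slightly with rounding):
```python
# One-sided tail probability P(Z > sigma) for a standard normal Z, which is the
# p-value associated with each sigma level in the table above.
from math import erfc, sqrt

for sigma in range(1, 6):
    p = 0.5 * erfc(sigma / sqrt(2))
    print(f"{sigma} sigma: p ≈ {p:.3g} (about 1 in {round(1 / p):,})")
```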
### Concrete Example: Performance Regression Detection
The following example data was collected by running the code in this repository in Github Actions.
In dbt v1.0.3, we have the following mean and standard deviation when parsing a dbt project with 2000 models:
μ (mean): 41.22<br/>
σ (stddev): 0.2525<br/>
The 2-sided 3 sigma range can be calculated with these two values via:
x < μ - 3 σ or x > μ + 3 σ<br/>
x < 41.22 - 3 * 0.2525 or x > 41.22 + 3 * 0.2525 <br/>
x < 40.46 or x > 41.98<br/>
It follows that the 1-sided 3 sigma range for performance regressions is just:<br/>
x > 41.98
If, when we sample a single `dbt parse` of the same project with a commit slated to go into dbt v1.0.4, we observe a 42s parse time, then this observation is so unlikely in the absence of a code-induced performance regression that we should investigate whether there is a performance regression in any of the commits between this failure and the commit where the initial distribution was measured.
Observations with 3 sigma significance that are _not_ performance regressions could be due to observing unlikely values (roughly 1 in every 750 observations), or to variation in the instruments we use to take these measurements, such as GitHub Actions. At this time we do not measure the variation of these instruments and account for it in our calculations, which means failures due to random noise are more likely than they would be if we did.
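A minimal sketch of the check described above, using the measured mean and standard deviation (the function name and structure here are illustrative, not the runner's actual code):
```python
# One-sided 3-sigma regression check against the dbt v1.0.3 parse baseline
# (2000-model project): only samples slower than the threshold are flagged.
BASELINE_MEAN = 41.22     # seconds
BASELINE_STDDEV = 0.2525  # seconds

def is_suspected_regression(sample_seconds: float, n_sigma: float = 3.0) -> bool:
    threshold = BASELINE_MEAN + n_sigma * BASELINE_STDDEV  # 41.9775s
    return sample_seconds > threshold

print(is_suspected_regression(41.5))  # False: within expected variation
print(is_suspected_regression(42.0))  # True: about 3.1 sigma slower than the baseline mean
```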
### Concrete Example: Performance Modeling
Once a new dbt version is released (excluding pre-releases), the performance characteristics of that released version need to be measured. In this repository this measurement is referred to as a baseline.
After dbt v1.0.99 is released, a GitHub Action run from `main` (always the latest version of that action) takes the following steps:
- Checks out main for the latest performance runner
- pip installs dbt v1.0.99
- builds the runner if it's not already in the github actions cache
- uses the performance runner's `model` subcommand via `./runner model`.
- The `model` subcommand calls hyperfine to run all of the project-command pairs a large number of times (maybe 20 or so) and saves the hyperfine output to files in `performance/baselines/1.0.99/`, one file per command-project pair (see the sketch after this list).
- The action opens two PRs with these files: one against `main` and one against `1.0.latest` so that future PRs against these branches will detect regressions against the performance characteristics of dbt v1.0.99 instead of v1.0.98.
- The release driver for dbt v1.0.99 reviews and merges these PRs, which is the sole deliverable of the performance modeling work.
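A minimal Python stand-in for what the `model` subcommand does (the real runner is written in Rust; the project names, file names, and run count below are assumptions for illustration):
```python
# Illustrative only: time each project/command pair with hyperfine and write one
# JSON baseline file per pair under performance/baselines/<version>/.
import subprocess
from pathlib import Path

VERSION = "1.0.99"
PROJECTS = ["01_2000_simple_models"]  # hypothetical project directories
COMMANDS = ["dbt parse"]
RUNS = 20

baseline_dir = Path(f"performance/baselines/{VERSION}")
baseline_dir.mkdir(parents=True, exist_ok=True)

for project in PROJECTS:
    for command in COMMANDS:
        out_file = baseline_dir / f"{command.replace(' ', '_')}_{project}.json"
        subprocess.run(
            [
                "hyperfine",
                "--runs", str(RUNS),
                "--export-json", str(out_file),
                f"{command} --project-dir performance/projects/{project}",
            ],
            check=True,
        )
```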
## Future work
- add more projects to test different configurations that have been known bottlenecks
- add more dbt commands to measure
- possibly using the uploaded json artifacts to store these results so they can be graphed over time
- reading new metrics from a file so no one has to edit rust source to add them to the suite
- instead of building the rust every time, we could publish and pull down the latest version.
- instead of manually setting the baseline version of dbt to test, pull down the latest stable version as the baseline.
- pin commands to projects by reading commands from a file defined in the project.
- add a postgres warehouse to run `dbt compile` and `dbt run` commands
- Account for github action variation: Either measure it, or eliminate it. To measure it we could set up another action that periodically samples the same version of dbt and use a 7 day rolling variation. To eliminate it we could run the action using something like [act](https://github.com/nektos/act) on dedicated hardware.
- build in a git-bisect run to automatically identify the commits that caused a performance regression by modeling each commit's expected value for the failing metric. Running this automatically, or even providing a script to do this locally would be useful.

Some files were not shown because too many files have changed in this diff.