Compare commits

...

82 Commits

Author SHA1 Message Date
leahwicz
a5d529219b Update README.md 2022-05-20 11:58:01 -04:00
github-actions[bot]
016613552e Bumping version to 1.0.6 (#5177)
* Bumping version to 1.0.6

* Updating changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-27 15:18:01 -04:00
github-actions[bot]
0c228c5383 Bumping version to 1.0.6rc1 (#5165)
* Bumping version to 1.0.6rc1

* Changelog update

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-26 13:02:51 -04:00
Gerda Shank
6cdf373143 Backport 1.0 ct 540 use target context for selectors (#5160)
* Use yaml renderer (with target context) for rendering selectors

* Changie
2022-04-26 12:11:20 -04:00
Chenyu Li
b771d8b59e Even more scrubbing (#5152) (#5158)
* Even more scrubbing

* Changelog entry

* Even more

* remove redundant scrub

* remove redundant scrub

* fix encoding issue

* keep scrubbed log in args

Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
(cherry picked from commit ce0bcc08a6)

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2022-04-26 10:04:20 -06:00
Emily Rockman
de1c1a1b29 Backport 5137 5069 (#5147)
* move deprecation check outside package caching (#5069)

* move deprecation check outside package caching

* add changelog

* fix retry logic failures (#5137)

* fix retry logic failures

* changelog

* add tests to make sure data is getting where it needs to

* rename file

* remove duplicate file

* move unit test to old framework since new one doesn't exist here
2022-04-25 09:46:06 -05:00
github-actions[bot]
c7e5a6c6b3 Bumping version to 1.0.5 (#5115)
* Bumping version to 1.0.5

* Adding changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-20 12:05:00 -04:00
leahwicz
9f5688bf84 Flexibilize MarkupSafe pinned version (#5039) (#5110)
* Flexibilize MarkupSafe pinned version

The current `MarkupSafe` pinned version has been added in #4746 as a
temporary fix for #4745.

However, the current restrictive approach isn't compatible with other
libraries that could require an even older version of `MarkupSafe`, like
Airflow `2.2.2` [0], which requires `markupsafe>=1.1.1, <2.0`.

To avoid that issue, we can allow a greater range of supported
`MarkupSafe` versions. Considering the direct dependency `dbt-core` has
is `Jinja2==2.11.3`, we can use its pinning as the lower bound, which is
`MarkupSafe>=0.23` [1].

This fix should also be backported to `1.0.latest` for inclusion in
the next v1.0 patch.

[0] https://github.com/adamantike/airflow/blob/2.2.2/setup.cfg#L125
[1] https://github.com/pallets/jinja/blob/2.11.3/setup.py#L53

Co-authored-by: Michael Manganiello <adamantike@users.noreply.github.com>
2022-04-19 14:37:53 -04:00
Nathaniel May
4838411039 backport perf readme (#5042) 2022-04-18 12:05:59 -04:00
Emily Rockman
37344dd87c backporting (#5040) 2022-04-12 13:48:10 -05:00
github-actions[bot]
7202a1c78e Bumping version to 1.0.5rc3 (#5038)
* Bumping version to 1.0.5rc3

* Add Changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-12 11:05:57 -04:00
github-actions[bot]
8489e99854 cache after retrying instead of while retrying (#5028) (#5031)
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
2022-04-11 20:08:28 -05:00
github-actions[bot]
4a1d8a2986 Bumping version to 1.0.5rc2 (#5014)
* Bumping version to 1.0.5rc2

* Creating Changelog

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: Leah Antkiewicz <leah.antkiewicz@fishtownanalytics.com>
2022-04-08 10:56:15 -04:00
Emily Rockman
64ff87d7e4 Backport 4982 deps (#5007)
* resolve merge conflicts

* clean up missed conflict issue

* remove failing test with comment

* fix typo
2022-04-07 16:02:34 -05:00
leahwicz
5d0ebd502b Adding packages field to setup (#5010) 2022-04-07 16:48:01 -04:00
Jeremy Cohen
7aa7259b1a v1.0.4 changelog with one entry (#4941)
* v1.0.4 changelog with one entry

* Rm 1.0.4 change from 1.0.5

* Create 1.0.4 release notes

* Update 1.0.5-rc1.md
2022-03-23 20:44:28 +01:00
github-actions[bot]
7d1410acc9 Bumping version to 1.0.5rc1 (#4913)
* Bumping version to 1.0.5rc1

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-03-21 14:28:44 -04:00
github-actions[bot]
88fc45b156 Use cli_vars instead of context to create package and selector renderers (#4878) (#4886)
Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2022-03-21 14:14:09 -04:00
Nathaniel May
c6cde6ee2d use pep 0440 compatible release operator for dbt-extractor dependency and bump (#4892) 2022-03-21 12:01:31 -04:00
Gerda Shank
c8f3f22e15 Fix "dbt found two resources" error with multiple snapshot blocks in one file (#4773) (#4877)
* Fix handling of multiple snapshot blocks in partial parsing

* Update tests for partial parsing snapshots
2022-03-16 17:56:17 -04:00
Emily Rockman
2748e4b822 [Backport] 4865 dep retries (#4867)
* catch all requests exceptions to retry (#4865)
* catch all requests exceptions to retry

* add changelog

* fixed pre-1.1 serialization issues
2022-03-16 09:04:47 -05:00
Emily Rockman
7fca9ec2c9 Small changie fixes (#4857) (#4859)
* fix broken links, update GHA to not repost comment

* tweak GHA

* convert GHA used

* consolidate GHA

* fix PR numbers and pull comment as var

* fix name of workflow step

* changie merge to fix link at top of changelog

* add changelog yaml
# Conflicts:
#	CHANGELOG.md
2022-03-14 09:18:38 -05:00
Emily Rockman
ad3063a612 [Backport] automate changelog (#4840)
* Automate changelog (#4743)

* initial setup to use changie

* added `dbt-core` to version line

* fix formatting

* rename to be more accurate

* remove extra file

* add stub for contributing section

* updated docs for contributing and changelog

* first pass at changelog check

* Fix workflow name

* comment on handling failure

* add automatic contributors section via footer

* removed unused initialization

* add script to automate entire changelog creation and handle prereleases

* stub out README

* add changelog entry!

* no longer need to add contributors ourselves

* fixed formatting and excluded core team

* fix typo and collapse if statement

* updated to reflect automatic pre-release handling

Removed custom script in favor of built in pre-release functionality in new version of changie.

* update contributing doc

* pass at GHA

* fix path

* all changed files

* more GHA work

* continued GHA work

* try another approach

* testing

* adding comment via GHA

* added uses for GHA

* more debugging

* fixed formatting

* another comment attempt

* remove read permission

* add label check

* fix quotes

* checking label logic

* test forcing failure

* remove extra script tag

* removed logic for having changelog

* Revert "removed logic for having changelog"

This reverts commit 490bda8256.

* remove unused workflow section

* update header and readme

* update with current version of changelog

* add step failure for missing changelog file

* fix typos and formatting

* small tweaks per feedback

* Update so changelog ends up only with current version, not past

* update changelog to recent contents

* added the rest of our releases to previous release list

* clarifying the readme

* updated to reflect current changelog state

* updated so only 1.1 changes are on main
# Conflicts:
#	CHANGELOG.md

* updated to reflect current state of 1.0.latest

* convert backports to changie entries
2022-03-09 16:17:24 -06:00
leahwicz
5218438704 task init: support older click v7.0 (#4681) (#4817)
* task init: support older click v7.0

`dbt init` uses click for interactively setting up a project. The
version constraints currently ask for click >= 8, but v7.0 has nearly the
same prompt/confirm/echo API: v8's prompt added a feature that isn't used,
and confirm changed behavior when the default is None, but
confirm(..., default=None) is never called. Long story short, we can relax
the version constraint to allow installing with an older click library.

Ref: Issue #4566

* Update CHANGELOG.md

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>

Co-authored-by: Tristan Willy <twilly@users.noreply.github.com>
Co-authored-by: Chenyu Li <chenyulee777@gmail.com>
2022-03-09 16:23:35 -05:00
Nathaniel May
33d08f8faa add performance baseline for 1.0.3 (#4847) 2022-03-09 14:58:48 -05:00
Stu Kilgore
9ff2c8024c Fix macro modified from previous state (#4820) (#4834)
* Fix macro modified from previous state

Previously, if the first node selected by state:modified had multiple macro
dependencies, the first of which had not been changed, the rest of the
macro dependencies of the node would not be checked for changes. This
commit fixes this behavior, so the remainder of the macro dependencies
of the node will be checked as well.
2022-03-09 08:23:18 -06:00
Emily Rockman
75696a1797 [backport] updated index file to fix DAG errors for operations & work around null columns (#4763) (#4797)
* updated index file to fix DAG errors for operations

* update index file to reflect dbt-docs fixes

* add changelog
# Conflicts:
#	CHANGELOG.md
2022-02-28 11:25:05 -06:00
Gerda Shank
5b41b12779 [Backport] Fix bug causing empty node level meta, snapshot config errors (#4774)
* Do not overwrite node.meta with empty patch.meta

* Restore config_call_dict in snapshot node transform

* Test for snapshot with schema file config

* Test for meta in both toplevel node and node config
2022-02-23 16:09:07 -05:00
github-actions[bot]
27ed2f961b Bumping version to 1.0.3 (#4760)
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-02-21 14:20:18 -05:00
Gerda Shank
f2dcb6f23c [Backport] Fix bug accessing target in deps and clean commands (#4759)
* Create DictDefaultNone for to_target_dict in deps and clean commands

* Update test case to handle

* update CHANGELOG.md

* Switch to DictDefaultEmptyStr for to_target_dict
2022-02-21 14:09:46 -05:00
github-actions[bot]
77afe63c7c Bumping version to 1.0.2 (#4750)
* Bumping version to 1.0.2

* Update CHANGELOG.md

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2022-02-18 09:28:03 -05:00
Jeremy Cohen
ca7c4c147a Pin MarkupSafe==2.0.1 (#4746) (#4749) 2022-02-18 09:15:27 -05:00
Nathaniel May
4145834c5b fix test to use a secret username (#4683) 2022-02-04 15:07:11 -05:00
github-actions[bot]
aaeb94d683 Bumping version to 1.0.2rc1 manually removed docker requirements update (#4679)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2022-02-04 14:11:23 -05:00
Chenyu Li
a2662b2f83 Chenyu/backport 4565 (#4677)
* adapter compatibility messaging added. (#4565)

* adapter compatibility messaging added.

* edited plugin version compatibility message

* edited test version for plugin compatibility

* compare using only major and minor

* Add checking PYPI and update changelog

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>
Co-authored-by: ChenyuLi <chenyu.li@dbtlabs.com>

* fix changelog

* fix changelog

Co-authored-by: nkyuray <95860273+nkyuray@users.noreply.github.com>
2022-02-04 08:36:40 -05:00
Chenyu Li
056db408cf fix comparison for new model/body (#4631) (#4676)
* fix comparison for new model/body
2022-02-03 17:34:14 -05:00
Chenyu Li
bec6becd18 Validate project names in interactive dbt init (#4536) (#4675)
* Validate project names in interactive dbt init

- workflow: ask the user to provide a valid project name until they do.
- new integration tests
- supported scenarios:
  - dbt init
  - dbt init -s
  - dbt init [name]
  - dbt init [name] -s

* Update Changelog.md

* Add full URLs to CHANGELOG.md

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>

Co-authored-by: Chenyu Li <chenyulee777@gmail.com>

Co-authored-by: Amir Kadivar <amir@amirkdv.ca>
2022-02-03 17:23:35 -05:00
Nathaniel May
3be057b6a4 Avoid saving secrets in SecretContext (#4665) (#4672) 2022-02-03 15:47:46 -05:00
Nathaniel May
e2a6c25a6d Alternative Modified Backport of #4619 (#4660)
* adds new function fire_event_if
2022-02-02 15:20:22 -05:00
leahwicz
92b3fc470d Run check_if_can_write_profile before create_profile_using_project_profile_template [CT-67] [Backport 1.0.latest] (#4447) (#4658)
* Run check_if_can_write_profile before create_profile_using_project_profile_template

* Changelog

Co-authored-by: Ian Knox <81931810+iknox-fa@users.noreply.github.com>

Co-authored-by: Niall Woodward <niall@niallrees.com>
Co-authored-by: Ian Knox <81931810+iknox-fa@users.noreply.github.com>
2022-02-01 17:32:24 -05:00
Jeremy Cohen
1e9fe67393 Change InvalidRefInTestNode level to DEBUG (#4647) (#4655)
* Debug-level test depends on disabled

* Add PR link to Changelog
2022-02-01 18:18:55 +01:00
Gerda Shank
d9361259f4 [#4554] Don't require a profile for dbt deps and clean commands (#4610) (#4651) 2022-01-31 14:52:41 -05:00
Emily Rockman
7990974bd8 Retry after failure to download or failure to open files (#4609) (#4649)
* add retry logic, tests when extracting tarfile fails

* fixed bug with not catching empty responses

* specify compression type

* WIP test

* more testing work

* fixed up unit test

* add changelog

* Add more comments!

* clarify why we do the json() check for None
# Conflicts:
#	CHANGELOG.md
2022-01-31 11:32:29 -06:00
github-actions[bot]
544d3e7a3a Clarify "incompatible package version" error msg (#4587) (#4628)
* Clarify "incompatible package version" error msg

* Clarify error message when they shouldn't fall fwd

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
2022-01-27 14:34:15 -05:00
Emily Rockman
31962beb14 Rename data directory to seeds (#4589) (#4592)
* Rename data directory to seeds

* Update CHANGELOG.md
# Conflicts:
#	CHANGELOG.md

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
2022-01-20 08:57:26 -06:00
leahwicz
f6a0853901 Bumping version to 1.0.1 (#4543) (#4544)
* Bumping version to 1.0.1

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
2022-01-03 13:19:25 -05:00
leahwicz
336a3d4987 Bumping version to 1.0.1rc1 (#4517) (#4518)
* Bumping version to 1.0.1rc1

* Update CHANGELOG.md

Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
2021-12-20 14:59:54 -05:00
Gerda Shank
74dc5c49ae [#4523] Fix error with env_var in hook (#4526) 2021-12-20 14:49:36 -05:00
leahwicz
29fa687349 Fix bool coercion to 0/1 (#4512) (#4516)
* Fix bool coercion

* Fix unit test

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2021-12-20 10:02:14 -05:00
Emily Rockman
39d4e729c9 scrub message of secrets (#4507) (#4510)
* scrub message of secrets

* update changelog

* use new scrubbing and scrub more places using git

* fixed small miss of string conv and missing raise

* fix bug with cloning error

* resolving message issues

* better, more specific scrubbing
2021-12-19 10:08:36 -05:00
Gerda Shank
406bdcc89c [#4470 BACKPORT] Improve checking of schema version for pre-1.0.0 manifests (#4497) (#4503) 2021-12-17 15:37:00 -05:00
Emily Rockman
9702aa733f update log message to use adapter name (#4501) (#4502)
* update log message to use adapter name

* add changelog
2021-12-16 12:59:45 -06:00
Emily Rockman
44265716f9 [BACKPORT] compile new index file for docs (#4484) (#4500)
* compile new index file for docs

* Add changelog

* move changelog entries for docs changes
2021-12-16 10:22:55 -06:00
Gerda Shank
20b27fd3b6 [#4464] Check specifically for generic node type for some partial parsing actions (#4465) (#4494)
* [#4464] Check specifically for generic node type for some partial parsing actions

* Add check for existence of macro file in saved_files

* Check for existence of patch file in saved_files
2021-12-15 09:48:00 -05:00
Emily Rockman
76c2e182ba updated DepsStartPackageInstall event to use package name (#4482) (#4485)
* updated event to use package name

* add changelog
2021-12-14 15:27:08 -06:00
Matthew McKnight
791625ddf5 made change to test of str (#4463) (#4478)
* made change to test of str

* changelog update
2021-12-13 16:04:22 -06:00
Emily Rockman
1baa05a764 Fix dbt docs overview to working url (#4442) (#4460)
* Fix to working url

* add fix to changelog

Co-authored-by: Rebekka Moyson <remoyson@gmail.com>
2021-12-08 13:06:31 -06:00
Nathaniel May
1b47b53aff point latest version check to dbt-core package (#4434) (#4435) 2021-12-03 16:19:15 -05:00
leahwicz
ec1f609f3e Bumping version to 1.0.0 (#4431) (#4432)
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com>
2021-12-03 13:34:41 -05:00
Jeremy Cohen
b4ea003559 Changelog entries for rc3 -> final (#4389) (#4430)
* Changelog entries for rc3 -> final

* More updates

* Final entry

* Last fix, and the date

* These few, these happy few
2021-12-03 19:24:43 +01:00
Jeremy Cohen
23e1a9aa4f relax version specifier for dbt-extractor (#4427) (#4429)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-03 19:20:40 +01:00
Jeremy Cohen
9882d08a24 add new interop tests for black-box json log schema testing (#4327) (#4428)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-03 19:15:41 +01:00
leahwicz
79cc811a68 stringify generic exceptions (#4424) (#4425)
Co-authored-by: Ian Knox <81931810+iknox-fa@users.noreply.github.com>
2021-12-03 12:36:14 -05:00
leahwicz
c82572f745 Info vs debug text formatting (#4418) (#4421)
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2021-12-03 09:22:14 -05:00
leahwicz
42a38e4deb Sources aren't materialized (#4417) (#4420)
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2021-12-03 09:03:24 -05:00
leahwicz
ecf0ffe68c Add flag to main.py. Reinstantiate after flags (#4416) (#4419)
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
2021-12-03 08:54:48 -05:00
leahwicz
e9f26ef494 add node type codes to more events + more hook log data (#4378) (#4415)
* add node type codes to more events + more hook log

* minor fixes

* renames started/finished keys

* made process more clear

* fixed errors

* Put back report_node_data in freshness.py

Co-authored-by: Gerda Shank <gerda@dbtlabs.com>

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Gerda Shank <gerda@dbtlabs.com>
2021-12-02 19:31:20 -05:00
leahwicz
c77dc59af8 use reference keys instead of relations (#4410) (#4414)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-02 18:41:20 -05:00
leahwicz
a5ebe4ff59 Logging README (#4395) (#4413)
* WIP

* more README cleanup

* readme tweaks

* small tweaks

* wording updates

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
2021-12-02 18:12:28 -05:00
leahwicz
5c01f9006c user configurable event buffer size (#4411) (#4412)
Co-authored-by: Ian Knox <81931810+iknox-fa@users.noreply.github.com>
2021-12-02 18:05:57 -05:00
Jeremy Cohen
c92e1ed9f2 [Backport] #4388 + #4405 (#4408)
* A few final logging touch-ups (#4388)

* Rm unused events, per #4104

* More structured ConcurrencyLine

* Replace \n prefixes with EmptyLine

* Reimplement ui.warning_tag to centralize logic

* Use warning_tag for deprecations too

* Rm more unused event types

* Exclude EmptyLine from json logs

* loglines are not always created by events (#4406)

Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>

* Rollover + backup for dbt.log (#4405)

Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-02 17:51:08 -05:00
Emily Rockman
85dee41a9f update file name (#4402) (#4407)
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
2021-12-02 17:08:32 -05:00
leahwicz
a4456feff0 change json override strategy (#4396) (#4403)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-02 17:05:33 -05:00
leahwicz
8d27764b0f allow log_format to be set in profile configs (#4394) (#4401)
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
2021-12-02 16:49:41 -05:00
leahwicz
e56256d968 use rfc3339 format for log time stamps (#4384) (#4400)
Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-02 15:42:46 -05:00
leahwicz
86cb3ba6fa [#4354] Different output for console and file logs (#4379) (#4399)
* [#4354] Different output for console and file logs

* Tweak some log formats

* Change logging of thread names

Co-authored-by: Gerda Shank <gerda@fishtownanalytics.com>
2021-12-02 15:39:53 -05:00
leahwicz
4d0d2d0d6f Add Windows OS error suppressing for temp dir cleanups (#4380) (#4398)
Co-authored-by: Ian Knox <81931810+iknox-fa@users.noreply.github.com>
2021-12-02 15:33:10 -05:00
leahwicz
f8a3c27fb8 move event code up a level (#4381) (#4397)
move event code up a level plus minor fixes

Co-authored-by: Nathaniel May <nathaniel.may@fishtownanalytics.com>
2021-12-02 15:27:04 -05:00
leahwicz
30f05b0213 Fix release process (#4385) (#4393) 2021-12-02 12:33:41 -05:00
Jeremy Cohen
f1bebb3629 Tiny touchups for deps, clean (#4366) (#4387)
* Use actual profile name for log msg

* Raise clean dep warning iff configured path missing
2021-12-02 17:35:51 +01:00
Gerda Shank
e7a40345ad Make the stdout logger actually go to stdout (#4368) (#4376) 2021-12-01 11:13:24 -05:00
Emily Rockman
ba94b8212c only log events in cache.py when flag is set (#4371)
flag is --log-cache-events
2021-11-30 16:05:20 -06:00
105 changed files with 3253 additions and 4405 deletions


@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 1.0.0rc3
+current_version = 1.0.6
parse = (?P<major>\d+)
\.(?P<minor>\d+)
\.(?P<patch>\d+)

.changes/0.0.0.md Normal file

@@ -0,0 +1,15 @@
## Previous Releases
For information on prior major and minor releases, see their changelogs:
* [0.21](https://github.com/dbt-labs/dbt-core/blob/0.21.latest/CHANGELOG.md)
* [0.20](https://github.com/dbt-labs/dbt-core/blob/0.20.latest/CHANGELOG.md)
* [0.19](https://github.com/dbt-labs/dbt-core/blob/0.19.latest/CHANGELOG.md)
* [0.18](https://github.com/dbt-labs/dbt-core/blob/0.18.latest/CHANGELOG.md)
* [0.17](https://github.com/dbt-labs/dbt-core/blob/0.17.latest/CHANGELOG.md)
* [0.16](https://github.com/dbt-labs/dbt-core/blob/0.16.latest/CHANGELOG.md)
* [0.15](https://github.com/dbt-labs/dbt-core/blob/0.15.latest/CHANGELOG.md)
* [0.14](https://github.com/dbt-labs/dbt-core/blob/0.14.latest/CHANGELOG.md)
* [0.13](https://github.com/dbt-labs/dbt-core/blob/0.13.latest/CHANGELOG.md)
* [0.12](https://github.com/dbt-labs/dbt-core/blob/0.12.latest/CHANGELOG.md)
* [0.11 and earlier](https://github.com/dbt-labs/dbt-core/blob/0.11.latest/CHANGELOG.md)

.changes/1.0.3.md Normal file

@@ -0,0 +1,250 @@
## dbt-core 1.0.3 (February 21, 2022)
### Fixes
- Fix bug accessing target fields in deps and clean commands ([#4752](https://github.com/dbt-labs/dbt-core/issues/4752), [#4758](https://github.com/dbt-labs/dbt-core/issues/4758))
## dbt-core 1.0.2 (February 18, 2022)
### Dependencies
- Pin `MarkupSafe==2.0.1`. Deprecation of `soft_unicode` in `MarkupSafe==2.1.0` is not supported by `Jinja2==2.11`
## dbt-core 1.0.2rc1 (February 4, 2022)
### Fixes
- Projects created using `dbt init` now have the correct `seeds` directory created (instead of `data`) ([#4588](https://github.com/dbt-labs/dbt-core/issues/4588), [#4589](https://github.com/dbt-labs/dbt-core/pull/4589))
- Don't require a profile for dbt deps and clean commands ([#4554](https://github.com/dbt-labs/dbt-core/issues/4554), [#4610](https://github.com/dbt-labs/dbt-core/pull/4610))
- Select modified.body works correctly when a new model is added ([#4570](https://github.com/dbt-labs/dbt-core/issues/4570), [#4631](https://github.com/dbt-labs/dbt-core/pull/4631))
- Fix bug in retry logic for bad response from hub and when there is a bad git tarball download. ([#4577](https://github.com/dbt-labs/dbt-core/issues/4577), [#4579](https://github.com/dbt-labs/dbt-core/issues/4579), [#4609](https://github.com/dbt-labs/dbt-core/pull/4609))
- Restore previous log level (DEBUG) when a test depends on a disabled resource. Still WARN if the resource is missing ([#4594](https://github.com/dbt-labs/dbt-core/issues/4594), [#4647](https://github.com/dbt-labs/dbt-core/pull/4647))
- User wasn't asked for permission to overwrite a profile entry when running init inside an existing project ([#4375](https://github.com/dbt-labs/dbt-core/issues/4375), [#4447](https://github.com/dbt-labs/dbt-core/pull/4447))
- A change in secret environment variables won't trigger a full reparse ([#4650](https://github.com/dbt-labs/dbt-core/issues/4650), [#4665](https://github.com/dbt-labs/dbt-core/pull/4665))
- Adapter compatibility messaging added ([#4438](https://github.com/dbt-labs/dbt-core/pull/4438), [#4565](https://github.com/dbt-labs/dbt-core/pull/4565))
- Add project name validation to `dbt init` ([#4490](https://github.com/dbt-labs/dbt-core/issues/4490),[#4536](https://github.com/dbt-labs/dbt-core/pull/4536))
Contributors:
- [@NiallRees](https://github.com/NiallRees) ([#4447](https://github.com/dbt-labs/dbt-core/pull/4447))
- [@amirkdv](https://github.com/amirkdv) ([#4536](https://github.com/dbt-labs/dbt-core/pull/4536))
- [@nkyuray](https://github.com/nkyuray) ([#4565](https://github.com/dbt-labs/dbt-core/pull/4565))
## dbt-core 1.0.1 (January 03, 2022)
## dbt-core 1.0.1rc1 (December 20, 2021)
### Fixes
- Fix wrong url in the dbt docs overview homepage ([#4442](https://github.com/dbt-labs/dbt-core/pull/4442))
- Fix redefined status param of SQLQueryStatus to typecheck the string which passes on `._message` value of `AdapterResponse` or the `str` value sent by adapter plugin. ([#4463](https://github.com/dbt-labs/dbt-core/pull/4463#issuecomment-990174166))
- Fix `DepsStartPackageInstall` event to use package name instead of version number. ([#4482](https://github.com/dbt-labs/dbt-core/pull/4482))
- Reimplement log message to use adapter name instead of the object method. ([#4501](https://github.com/dbt-labs/dbt-core/pull/4501))
- Issue better error message for incompatible schemas ([#4470](https://github.com/dbt-labs/dbt-core/issues/4470), [#4497](https://github.com/dbt-labs/dbt-core/pull/4497))
- Remove secrets from error related to packages. ([#4507](https://github.com/dbt-labs/dbt-core/pull/4507))
- Prevent coercion of boolean values (`True`, `False`) to numeric values (`0`, `1`) in query results ([#4511](https://github.com/dbt-labs/dbt-core/issues/4511), [#4512](https://github.com/dbt-labs/dbt-core/pull/4512))
- Fix error with an env_var in a project hook ([#4523](https://github.com/dbt-labs/dbt-core/issues/4523), [#4524](https://github.com/dbt-labs/dbt-core/pull/4524))
### Docs
- Fix missing data on exposures in docs ([#4467](https://github.com/dbt-labs/dbt-core/issues/4467))
Contributors:
- [remoyson](https://github.com/remoyson) ([#4442](https://github.com/dbt-labs/dbt-core/pull/4442))
## dbt-core 1.0.0 (December 3, 2021)
### Fixes
- Configure the CLI logger destination to use stdout instead of stderr ([#4368](https://github.com/dbt-labs/dbt-core/pull/4368))
- Make the size of `EVENT_HISTORY` configurable, via `EVENT_BUFFER_SIZE` global config ([#4411](https://github.com/dbt-labs/dbt-core/pull/4411), [#4416](https://github.com/dbt-labs/dbt-core/pull/4416))
- Change type of `log_format` in `profiles.yml` user config to be string, not boolean ([#4394](https://github.com/dbt-labs/dbt-core/pull/4394))
### Under the hood
- Only log cache events if `LOG_CACHE_EVENTS` is enabled, and disable by default. This restores previous behavior ([#4369](https://github.com/dbt-labs/dbt-core/pull/4369))
- Move event codes to be a top-level attribute of JSON-formatted logs, rather than nested in `data` ([#4381](https://github.com/dbt-labs/dbt-core/pull/4381))
- Fix failing integration test on Windows ([#4380](https://github.com/dbt-labs/dbt-core/pull/4380))
- Clean up warning messages for `clean` + `deps` ([#4366](https://github.com/dbt-labs/dbt-core/pull/4366))
- Use RFC3339 timestamps for log messages ([#4384](https://github.com/dbt-labs/dbt-core/pull/4384))
- Different text output for console (info) and file (debug) logs ([#4379](https://github.com/dbt-labs/dbt-core/pull/4379), [#4418](https://github.com/dbt-labs/dbt-core/pull/4418))
- Remove unused events. More structured `ConcurrencyLine`. Replace `\n` message starts/ends with `EmptyLine` events, and exclude `EmptyLine` from JSON-formatted output ([#4388](https://github.com/dbt-labs/dbt-core/pull/4388))
- Update `events` module README ([#4395](https://github.com/dbt-labs/dbt-core/pull/4395))
- Rework approach to JSON serialization for events with non-standard properties ([#4396](https://github.com/dbt-labs/dbt-core/pull/4396))
- Update legacy logger file name to `dbt.log.legacy` ([#4402](https://github.com/dbt-labs/dbt-core/pull/4402))
- Rollover `dbt.log` at 10 MB, and keep up to 5 backups, restoring previous behavior ([#4405](https://github.com/dbt-labs/dbt-core/pull/4405))
- Use reference keys instead of full relation objects in cache events ([#4410](https://github.com/dbt-labs/dbt-core/pull/4410))
- Add `node_type` contextual info to more events ([#4378](https://github.com/dbt-labs/dbt-core/pull/4378))
- Make `materialized` config optional in `node_type` ([#4417](https://github.com/dbt-labs/dbt-core/pull/4417))
- Stringify exception in `GenericExceptionOnRun` to support JSON serialization ([#4424](https://github.com/dbt-labs/dbt-core/pull/4424))
- Add "interop" tests for machine consumption of structured log output ([#4327](https://github.com/dbt-labs/dbt-core/pull/4327))
- Relax version specifier for `dbt-extractor` to `~=0.4.0`, to support compiled wheels for additional architectures when available ([#4427](https://github.com/dbt-labs/dbt-core/pull/4427))
## dbt-core 1.0.0rc3 (November 30, 2021)
### Fixes
- Support partial parsing of env_vars in metrics ([#4293](https://github.com/dbt-labs/dbt-core/issues/4293), [#4322](https://github.com/dbt-labs/dbt-core/pull/4322))
- Fix typo in `UnparsedSourceDefinition.__post_serialize__` ([#3545](https://github.com/dbt-labs/dbt-core/issues/3545), [#4349](https://github.com/dbt-labs/dbt-core/pull/4349))
### Under the hood
- Change some CompilationExceptions to ParsingExceptions ([#4254](https://github.com/dbt-labs/dbt-core/issues/4254), [#4328](https://github.com/dbt-labs/dbt-core/pull/4328))
- Reorder logic for static parser sampling to speed up model parsing ([#4332](https://github.com/dbt-labs/dbt-core/pull/4332))
- Use more augmented assignment statements ([#4315](https://github.com/dbt-labs/dbt-core/issues/4315), [#4331](https://github.com/dbt-labs/dbt-core/pull/4331))
- Adjust logic when finding approximate matches for models and tests ([#3835](https://github.com/dbt-labs/dbt-core/issues/3835), [#4076](https://github.com/dbt-labs/dbt-core/pull/4076))
- Restore small previous behaviors for logging: JSON formatting for first few events; `WARN`-level stdout for `list` task; include tracking events in `dbt.log` ([#4341](https://github.com/dbt-labs/dbt-core/pull/4341))
Contributors:
- [@sarah-weatherbee](https://github.com/sarah-weatherbee) ([#4331](https://github.com/dbt-labs/dbt-core/pull/4331))
- [@emilieschario](https://github.com/emilieschario) ([#4076](https://github.com/dbt-labs/dbt-core/pull/4076))
- [@sneznaj](https://github.com/sneznaj) ([#4349](https://github.com/dbt-labs/dbt-core/pull/4349))
## dbt-core 1.0.0rc2 (November 22, 2021)
### Breaking changes
- Restrict secret env vars (prefixed `DBT_ENV_SECRET_`) to `profiles.yml` + `packages.yml` _only_. Raise an exception if a secret env var is used elsewhere ([#4310](https://github.com/dbt-labs/dbt-core/issues/4310), [#4311](https://github.com/dbt-labs/dbt-core/pull/4311))
- Reorder arguments to `config.get()` so that `default` is second ([#4273](https://github.com/dbt-labs/dbt-core/issues/4273), [#4297](https://github.com/dbt-labs/dbt-core/pull/4297))
### Features
- Avoid error when missing column in YAML description ([#4151](https://github.com/dbt-labs/dbt-core/issues/4151), [#4285](https://github.com/dbt-labs/dbt-core/pull/4285))
- Allow `--defer` flag to `dbt snapshot` ([#4110](https://github.com/dbt-labs/dbt-core/issues/4110), [#4296](https://github.com/dbt-labs/dbt-core/pull/4296))
- Install prerelease packages when `version` explicitly references a prerelease version, regardless of `install-prerelease` status ([#4243](https://github.com/dbt-labs/dbt-core/issues/4243), [#4295](https://github.com/dbt-labs/dbt-core/pull/4295))
- Add data attributes to json log messages ([#4301](https://github.com/dbt-labs/dbt-core/pull/4301))
- Add event codes to all log events ([#4319](https://github.com/dbt-labs/dbt-core/pull/4319))
### Fixes
- Fix serialization error with missing quotes in metrics model ref ([#4252](https://github.com/dbt-labs/dbt-core/issues/4252), [#4289](https://github.com/dbt-labs/dbt-core/pull/4289))
- Correct definition of 'created_at' in ParsedMetric nodes ([#4298](https://github.com/dbt-labs/dbt-core/issues/4298), [#4299](https://github.com/dbt-labs/dbt-core/pull/4299))
- Allow specifying default in Jinja config.get with default keyword ([#4273](https://github.com/dbt-labs/dbt-core/issues/4273), [#4297](https://github.com/dbt-labs/dbt-core/pull/4297))
### Under the hood
- Add --indirect-selection parameter to profiles.yml and builtin DBT_ env vars; stringified parameter to enable multi-modal use ([#3997](https://github.com/dbt-labs/dbt-core/issues/3997), [#4270](https://github.com/dbt-labs/dbt-core/pull/4270))
- Fix filesystem searcher test failure on Python 3.9 ([#3689](https://github.com/dbt-labs/dbt-core/issues/3689), [#4271](https://github.com/dbt-labs/dbt-core/pull/4271))
- Clean up deprecation warnings shown for `dbt_project.yml` config renames ([#4276](https://github.com/dbt-labs/dbt-core/issues/4276), [#4291](https://github.com/dbt-labs/dbt-core/pull/4291))
- Fix metrics count in compiled project stats ([#4290](https://github.com/dbt-labs/dbt-core/issues/4290), [#4292](https://github.com/dbt-labs/dbt-core/pull/4292))
- First pass at supporting more dbt tasks via python lib ([#4200](https://github.com/dbt-labs/dbt-core/pull/4200))
Contributors:
- [@kadero](https://github.com/kadero) ([#4285](https://github.com/dbt-labs/dbt-core/pull/4285), [#4296](https://github.com/dbt-labs/dbt-core/pull/4296))
- [@joellabes](https://github.com/joellabes) ([#4295](https://github.com/dbt-labs/dbt-core/pull/4295))
## dbt-core 1.0.0rc1 (November 10, 2021)
### Breaking changes
- Replace `greedy` flag/property for test selection with `indirect_selection: eager/cautious` flag/property. Set to `eager` by default. **Note:** This reverts test selection to its pre-v0.20 behavior by default. `dbt test -s my_model` _will_ select multi-parent tests, such as `relationships`, that depend on unselected resources. To achieve the behavior change in v0.20 + v0.21, set `--indirect-selection=cautious` on the CLI or `indirect_selection: cautious` in yaml selectors. ([#4082](https://github.com/dbt-labs/dbt-core/issues/4082), [#4104](https://github.com/dbt-labs/dbt-core/pull/4104))
- In v1.0.0, **`pip install dbt` will raise an explicit error.** Instead, please use `pip install dbt-<adapter>` (to use dbt with that database adapter), or `pip install dbt-core` (for core functionality). For parity with the previous behavior of `pip install dbt`, you can use: `pip install dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery` ([#4100](https://github.com/dbt-labs/dbt-core/issues/4100), [#4133](https://github.com/dbt-labs/dbt-core/pull/4133))
- Reorganize the `global_project` (macros) into smaller files with clearer names. Remove unused global macros: `column_list`, `column_list_for_create_table`, `incremental_upsert` ([#4154](https://github.com/dbt-labs/dbt-core/pull/4154))
- Introduce structured event interface, and begin conversion of all legacy logging ([#3359](https://github.com/dbt-labs/dbt-core/issues/3359), [#4055](https://github.com/dbt-labs/dbt-core/pull/4055))
- **This is a breaking change for adapter plugins, requiring a very simple migration.** See [`events` module README](core/dbt/events/README.md#adapter-maintainers) for details.
- If you maintain another kind of dbt-core plugin that makes heavy use of legacy logging, and you need time to cut over to the new event interface, you can re-enable the legacy logger via an environment variable shim, `DBT_ENABLE_LEGACY_LOGGER=True`. Be advised that we will remove this capability in a future version of dbt-core.
### Features
- Allow nullable `error_after` in source freshness ([#3874](https://github.com/dbt-labs/dbt-core/issues/3874), [#3955](https://github.com/dbt-labs/dbt-core/pull/3955))
- Add `metrics` nodes ([#4071](https://github.com/dbt-labs/dbt-core/issues/4071), [#4235](https://github.com/dbt-labs/dbt-core/pull/4235))
- Add support for `dbt init <project_name>`, and support for `skip_profile_setup` argument (`dbt init -s`) ([#4156](https://github.com/dbt-labs/dbt-core/issues/4156), [#4249](https://github.com/dbt-labs/dbt-core/pull/4249))
### Fixes
- Changes unit tests using `assertRaisesRegexp` to `assertRaisesRegex` ([#4132](https://github.com/dbt-labs/dbt-core/issues/4132), [#4136](https://github.com/dbt-labs/dbt-core/pull/4136))
- Allow retries when the answer from a `dbt deps` is `None` ([#4178](https://github.com/dbt-labs/dbt-core/issues/4178), [#4225](https://github.com/dbt-labs/dbt-core/pull/4225))
### Docs
- Fix non-alphabetical sort of Source Tables in source overview page ([docs#81](https://github.com/dbt-labs/dbt-docs/issues/81), [docs#218](https://github.com/dbt-labs/dbt-docs/pull/218))
- Add title tag to node elements in tree ([docs#202](https://github.com/dbt-labs/dbt-docs/issues/202), [docs#203](https://github.com/dbt-labs/dbt-docs/pull/203))
- Account for test rename: `schema` &rarr; `generic`, `data` &rarr; `singular`. Use `test_metadata` instead of `schema`/`data` tags to differentiate ([docs#216](https://github.com/dbt-labs/dbt-docs/issues/216), [docs#222](https://github.com/dbt-labs/dbt-docs/pull/222))
- Add `metrics` ([core#4235](https://github.com/dbt-labs/dbt-core/issues/4235), [docs#223](https://github.com/dbt-labs/dbt-docs/pull/223))
### Under the hood
- Bump artifact schema versions for 1.0.0: manifest v4, run results v4, sources v3. Notable changes: added `metrics` nodes; schema test + data test nodes are renamed to generic test + singular test nodes; freshness threshold default values ([#4191](https://github.com/dbt-labs/dbt-core/pull/4191))
- Speed up node selection by skipping `incorporate_indirect_nodes` if not needed ([#4213](https://github.com/dbt-labs/dbt-core/issues/4213), [#4214](https://github.com/dbt-labs/dbt-core/issues/4214))
- When `on_schema_change` is set, pass common columns as `dest_columns` in incremental merge macros ([#4144](https://github.com/dbt-labs/dbt-core/issues/4144), [#4170](https://github.com/dbt-labs/dbt-core/pull/4170))
- Clear adapters before registering in `lib` module config generation ([#4218](https://github.com/dbt-labs/dbt-core/pull/4218))
- Remove official support for python 3.6, which is reaching end of life on December 23, 2021 ([#4134](https://github.com/dbt-labs/dbt-core/issues/4134), [#4223](https://github.com/dbt-labs/dbt-core/pull/4223))
Contributors:
- [@kadero](https://github.com/kadero) ([#3955](https://github.com/dbt-labs/dbt-core/pull/3955), [#4249](https://github.com/dbt-labs/dbt-core/pull/4249))
- [@frankcash](https://github.com/frankcash) ([#4136](https://github.com/dbt-labs/dbt-core/pull/4136))
- [@Kayrnt](https://github.com/Kayrnt) ([#4170](https://github.com/dbt-labs/dbt-core/pull/4170))
- [@VersusFacit](https://github.com/VersusFacit) ([#4104](https://github.com/dbt-labs/dbt-core/pull/4104))
- [@joellabes](https://github.com/joellabes) ([#4104](https://github.com/dbt-labs/dbt-core/pull/4104))
- [@b-per](https://github.com/b-per) ([#4225](https://github.com/dbt-labs/dbt-core/pull/4225))
- [@salmonsd](https://github.com/salmonsd) ([docs#218](https://github.com/dbt-labs/dbt-docs/pull/218))
- [@miike](https://github.com/miike) ([docs#203](https://github.com/dbt-labs/dbt-docs/pull/203))
## dbt-core 1.0.0b2 (October 25, 2021)
### Breaking changes
- Enable `on-run-start` and `on-run-end` hooks for `dbt test`. Add `flags.WHICH` to execution context, representing current task ([#3463](https://github.com/dbt-labs/dbt-core/issues/3463), [#4004](https://github.com/dbt-labs/dbt-core/pull/4004))
### Features
- Normalize global CLI arguments/flags ([#2990](https://github.com/dbt-labs/dbt/issues/2990), [#3839](https://github.com/dbt-labs/dbt/pull/3839))
- Turns on the static parser by default and adds the flag `--no-static-parser` to disable it. ([#3377](https://github.com/dbt-labs/dbt/issues/3377), [#3939](https://github.com/dbt-labs/dbt/pull/3939))
- Generic test FQNs have changed to include the relative path, resource, and column (if applicable) where they are defined. This makes it easier to configure them from the `tests` block in `dbt_project.yml` ([#3259](https://github.com/dbt-labs/dbt/pull/3259), [#3880](https://github.com/dbt-labs/dbt/pull/3880))
- Turn on partial parsing by default ([#3867](https://github.com/dbt-labs/dbt/issues/3867), [#3989](https://github.com/dbt-labs/dbt/issues/3989))
- Add `result:<status>` selectors to automatically rerun failed tests and erroneous models. This makes it easier to rerun failed dbt jobs with a simple selector flag instead of restarting from the beginning or manually running the dbt models in scope. ([#3891](https://github.com/dbt-labs/dbt/issues/3891), [#4017](https://github.com/dbt-labs/dbt/pull/4017))
- `dbt init` is now interactive, generating profiles.yml when run inside existing project ([#3625](https://github.com/dbt-labs/dbt/pull/3625))
### Under the hood
- Fix intermittent errors in partial parsing tests ([#4060](https://github.com/dbt-labs/dbt-core/issues/4060), [#4068](https://github.com/dbt-labs/dbt-core/pull/4068))
- Make finding disabled nodes more consistent ([#4069](https://github.com/dbt-labs/dbt-core/issues/4069), [#4073](https://github.com/dbt-labs/dbt-core/pull/4073))
- Remove connection from `render_with_context` during parsing, thereby removing misleading log message ([#3137](https://github.com/dbt-labs/dbt-core/issues/3137), [#4062](https://github.com/dbt-labs/dbt-core/pull/4062))
- Wait for postgres docker container to be ready in `setup_db.sh`. ([#3876](https://github.com/dbt-labs/dbt-core/issues/3876), [#3908](https://github.com/dbt-labs/dbt-core/pull/3908))
- Prefer macros defined in the project over the ones in a package by default ([#4106](https://github.com/dbt-labs/dbt-core/issues/4106), [#4114](https://github.com/dbt-labs/dbt-core/pull/4114))
- Dependency updates ([#4079](https://github.com/dbt-labs/dbt-core/pull/4079), [#3532](https://github.com/dbt-labs/dbt-core/pull/3532))
- Schedule partial parsing for SQL files with env_var changes ([#3885](https://github.com/dbt-labs/dbt-core/issues/3885), [#4101](https://github.com/dbt-labs/dbt-core/pull/4101))
- Schedule partial parsing for schema files with env_var changes ([#3885](https://github.com/dbt-labs/dbt-core/issues/3885), [#4162](https://github.com/dbt-labs/dbt-core/pull/4162))
- Skip partial parsing when env_vars change in dbt_project or profile ([#3885](https://github.com/dbt-labs/dbt-core/issues/3885), [#4212](https://github.com/dbt-labs/dbt-core/pull/4212))
Contributors:
- [@sungchun12](https://github.com/sungchun12) ([#4017](https://github.com/dbt-labs/dbt/pull/4017))
- [@matt-winkler](https://github.com/matt-winkler) ([#4017](https://github.com/dbt-labs/dbt/pull/4017))
- [@NiallRees](https://github.com/NiallRees) ([#3625](https://github.com/dbt-labs/dbt/pull/3625))
- [@rvacaru](https://github.com/rvacaru) ([#3908](https://github.com/dbt-labs/dbt/pull/3908))
- [@JCZuurmond](https://github.com/jczuurmond) ([#4114](https://github.com/dbt-labs/dbt-core/pull/4114))
- [@ljhopkins2](https://github.com/ljhopkins2) ([#4079](https://github.com/dbt-labs/dbt-core/pull/4079))
## dbt-core 1.0.0b1 (October 11, 2021)
### Breaking changes
- The two type of test definitions are now "singular" and "generic" (instead of "data" and "schema", respectively). The `test_type:` selection method accepts `test_type:singular` and `test_type:generic`. (It will also accept `test_type:schema` and `test_type:data` for backwards compatibility) ([#3234](https://github.com/dbt-labs/dbt-core/issues/3234), [#3880](https://github.com/dbt-labs/dbt-core/pull/3880)). **Not backwards compatible:** The `--data` and `--schema` flags to `dbt test` are no longer supported, and tests no longer have the tags `'data'` and `'schema'` automatically applied.
- Deprecated the use of the `packages` arg `adapter.dispatch` in favor of the `macro_namespace` arg. ([#3895](https://github.com/dbt-labs/dbt-core/issues/3895))
### Features
- Normalize global CLI arguments/flags ([#2990](https://github.com/dbt-labs/dbt-core/issues/2990), [#3839](https://github.com/dbt-labs/dbt-core/pull/3839))
- Turns on the static parser by default and adds the flag `--no-static-parser` to disable it. ([#3377](https://github.com/dbt-labs/dbt-core/issues/3377), [#3939](https://github.com/dbt-labs/dbt-core/pull/3939))
- Generic test FQNs have changed to include the relative path, resource, and column (if applicable) where they are defined. This makes it easier to configure them from the `tests` block in `dbt_project.yml` ([#3259](https://github.com/dbt-labs/dbt-core/pull/3259), [#3880](https://github.com/dbt-labs/dbt-core/pull/3880))
- Turn on partial parsing by default ([#3867](https://github.com/dbt-labs/dbt-core/issues/3867), [#3989](https://github.com/dbt-labs/dbt-core/issues/3989))
- Generic test can now be added under a `generic` subfolder in the `test-paths` directory. ([#4052](https://github.com/dbt-labs/dbt-core/pull/4052))
### Fixes
- Add generic tests defined on sources to the manifest once, not twice ([#3347](https://github.com/dbt-labs/dbt/issues/3347), [#3880](https://github.com/dbt-labs/dbt/pull/3880))
- Skip partial parsing if certain macros have changed ([#3810](https://github.com/dbt-labs/dbt/issues/3810), [#3892](https://github.com/dbt-labs/dbt/pull/3892))
- Enable cataloging of unlogged Postgres tables ([#3961](https://github.com/dbt-labs/dbt/issues/3961), [#3993](https://github.com/dbt-labs/dbt/pull/3993))
- Fix multiple disabled nodes ([#4013](https://github.com/dbt-labs/dbt/issues/4013), [#4018](https://github.com/dbt-labs/dbt/pull/4018))
- Fix multiple partial parsing errors ([#3996](https://github.com/dbt-labs/dbt/issues/3996), [#4020](https://github.com/dbt-labs/dbt/pull/4020))
- Return an error instead of a warning when running with `--warn-error` and no models are selected ([#4006](https://github.com/dbt-labs/dbt/issues/4006), [#4019](https://github.com/dbt-labs/dbt/pull/4019))
- Fixed bug with `error_if` test option ([#4070](https://github.com/dbt-labs/dbt-core/pull/4070))
### Under the hood
- Enact deprecation for `materialization-return` and replace deprecation warning with an exception. ([#3896](https://github.com/dbt-labs/dbt-core/issues/3896))
- Build catalog for only relational, non-ephemeral nodes in the graph ([#3920](https://github.com/dbt-labs/dbt-core/issues/3920))
- Enact deprecation to remove the `release` arg from the `execute_macro` method. ([#3900](https://github.com/dbt-labs/dbt-core/issues/3900))
- Enact deprecation for default quoting to be True. Override for the `dbt-snowflake` adapter so it stays `False`. ([#3898](https://github.com/dbt-labs/dbt-core/issues/3898))
- Enact deprecation for object used as dictionaries when they should be dataclasses. Replace deprecation warning with an exception for the dunder methods of `__iter__` and `__len__` for all superclasses of FakeAPIObject. ([#3897](https://github.com/dbt-labs/dbt-core/issues/3897))
- Enact deprecation for `adapter-macro` and replace deprecation warning with an exception. ([#3901](https://github.com/dbt-labs/dbt-core/issues/3901))
- Add warning when trying to put a node under the wrong key, i.e. a seed under models in a `schema.yml` file. ([#3899](https://github.com/dbt-labs/dbt-core/issues/3899))
- Plugins for `redshift`, `snowflake`, and `bigquery` have moved to separate repos: [`dbt-redshift`](https://github.com/dbt-labs/dbt-redshift), [`dbt-snowflake`](https://github.com/dbt-labs/dbt-snowflake), [`dbt-bigquery`](https://github.com/dbt-labs/dbt-bigquery)
- Change the default dbt packages installation directory to `dbt_packages` from `dbt_modules`. Also rename `module-path` to `packages-install-path` to allow default overrides of package install directory. Deprecation warning added for projects using the old `dbt_modules` name without specifying a `packages-install-path`. ([#3523](https://github.com/dbt-labs/dbt-core/issues/3523))
- Update the default project paths to be `analysis-paths = ['analyses']` and `test-paths = ['tests']`. Also have starter project set `analysis-paths: ['analyses']` from now on. ([#2659](https://github.com/dbt-labs/dbt-core/issues/2659))
- Define the data type of `sources` as an array of arrays of string in the manifest artifacts. ([#3966](https://github.com/dbt-labs/dbt-core/issues/3966), [#3967](https://github.com/dbt-labs/dbt-core/pull/3967))
- Marked `source-paths` and `data-paths` as deprecated keys in `dbt_project.yml` in favor of `model-paths` and `seed-paths` respectively. ([#1607](https://github.com/dbt-labs/dbt-core/issues/1607))
- Surface git errors to `stdout` when cloning dbt packages from Github. ([#3167](https://github.com/dbt-labs/dbt-core/issues/3167))
Contributors:
- [@dave-connors-3](https://github.com/dave-connors-3) ([#3922](https://github.com/dbt-labs/dbt-core/pull/3922))
- [@kadero](https://github.com/kadero) ([#3953](https://github.com/dbt-labs/dbt-core/pull/3953))
- [@samlader](https://github.com/samlader) ([#3993](https://github.com/dbt-labs/dbt-core/pull/3993))
- [@yu-iskw](https://github.com/yu-iskw) ([#3967](https://github.com/dbt-labs/dbt-core/pull/3967))
- [@laxjesse](https://github.com/laxjesse) ([#4019](https://github.com/dbt-labs/dbt-core/pull/4019))
- [@gitznik](https://github.com/Gitznik) ([#4124](https://github.com/dbt-labs/dbt-core/pull/4124))

.changes/1.0.4.md Normal file

@@ -0,0 +1,3 @@
## dbt-core 1.0.4 - March 18, 2022
### Fixes
- Depend on new dbt-extractor version with fixed GitHub links to resolve Homebrew installation issues ([#4891](https://github.com/dbt-labs/dbt-core/issues/4891), [#4890](https://github.com/dbt-labs/dbt-core/pull/4890))

.changes/1.0.5.md Normal file

@@ -0,0 +1,20 @@
## dbt-core 1.0.5 - April 20, 2022
### Fixes
- Fix bug causing empty node level meta, snapshot config errors ([#4459](https://github.com/dbt-labs/dbt-core/issues/4459), [#4726](https://github.com/dbt-labs/dbt-core/pull/4726))
- Support click versions in the v7.x series ([#4566](https://github.com/dbt-labs/dbt-core/issues/4566), [#4681](https://github.com/dbt-labs/dbt-core/pull/4681))
- Fixed a bug where nodes that depend on multiple macros couldn't be selected using `-s state:modified` ([#4678](https://github.com/dbt-labs/dbt-core/issues/4678), [#4820](https://github.com/dbt-labs/dbt-core/pull/4820))
- Catch all Requests Exceptions on deps install to attempt retries. Also log the exceptions hit. ([#4849](https://github.com/dbt-labs/dbt-core/issues/4849), [#4865](https://github.com/dbt-labs/dbt-core/pull/4865))
- Fix partial parsing bug with multiple snapshot blocks ([#4771](https://github.com/dbt-labs/dbt-core/issues/4771), [#4773](https://github.com/dbt-labs/dbt-core/pull/4773))
- Use cli_vars instead of context to create package and selector renderers ([#4876](https://github.com/dbt-labs/dbt-core/issues/4876), [#4878](https://github.com/dbt-labs/dbt-core/pull/4878))
- Catch more cases to retry package retrieval for deps pointing to the hub. Also start to cache the package requests. ([#4849](https://github.com/dbt-labs/dbt-core/issues/4849), [#4982](https://github.com/dbt-labs/dbt-core/pull/4982))
- Relax minimum supported version of MarkupSafe ([#4745](https://github.com/dbt-labs/dbt-core/issues/4745), [#5039](https://github.com/dbt-labs/dbt-core/pull/5039))
### Under the Hood
- Automate changelog generation with changie ([#4652](https://github.com/dbt-labs/dbt-core/issues/4652), [#4743](https://github.com/dbt-labs/dbt-core/pull/4743))
- Fix broken links for changelog generation and tweak GHA to only post a comment once when changelog entry is missing ([#4848](https://github.com/dbt-labs/dbt-core/issues/4848), [#4857](https://github.com/dbt-labs/dbt-core/pull/4857))
### Docs
- Resolve errors related to operations preventing DAG from generating in the docs. Also patch a spark issue to allow search to filter accurately past the missing columns. ([#4578](https://github.com/dbt-labs/dbt-core/issues/4578), [#4763](https://github.com/dbt-labs/dbt-core/pull/4763))
- backporting performance regression testing readme ([#4904](https://github.com/dbt-labs/dbt-core/issues/4904), [#5042](https://github.com/dbt-labs/dbt-core/pull/5042))
### Contributors
- [@adamantike](https://github.com/adamantike) ([#5039](https://github.com/dbt-labs/dbt-core/pull/5039))
- [@twilly](https://github.com/twilly) ([#4681](https://github.com/dbt-labs/dbt-core/pull/4681))

.changes/1.0.6.md Normal file

@@ -0,0 +1,8 @@
## dbt-core 1.0.6 - April 27, 2022
### Fixes
- Use yaml renderer (with target context) for rendering selectors ([#5131](https://github.com/dbt-labs/dbt-core/issues/5131), [#5136](https://github.com/dbt-labs/dbt-core/pull/5136))
- Fix retry logic to return values after initial try ([#5023](https://github.com/dbt-labs/dbt-core/issues/5023), [#5137](https://github.com/dbt-labs/dbt-core/pull/5137))
- Scrub secret env vars from CommandError in exception stacktrace ([#5151](https://github.com/dbt-labs/dbt-core/issues/5151), [#5152](https://github.com/dbt-labs/dbt-core/pull/5152))
### Under the Hood
- Move package deprecation check outside of package cache ([#5068](https://github.com/dbt-labs/dbt-core/issues/5068), [#5069](https://github.com/dbt-labs/dbt-core/pull/5069))

.changes/README.md Normal file

@@ -0,0 +1,40 @@
# CHANGELOG Automation
We use [changie](https://changie.dev/) to automate `CHANGELOG` generation. For installation and format/command specifics, see the documentation.
### Quick Tour
- All new change entries get generated under `/.changes/unreleased` as a yaml file
- `header.tpl.md` contains the contents of the header for the entire CHANGELOG file
- `0.0.0.md` contains the contents of the footer for the entire CHANGELOG file. changie looks to be in the process of supporting a footer file the same as it supports a header file. Switch to that when available. For now, the 0.0.0 in the file name forces it to the bottom of the changelog no matter what version we are releasing.
- `.changie.yaml` contains the fields in a change, the format of a single change, as well as the format of the Contributors section for each version.
### Workflow
#### Daily workflow
Almost every code change we make associated with an issue will require a `CHANGELOG` entry. After you have created the PR in GitHub, run `changie new` and follow the command prompts to generate a yaml file with your change details. This only needs to be done once per PR.
The `changie new` command will ensure correct file format and file name. There is a one to one mapping of issues to changes. Multiple issues cannot be lumped into a single entry. If you make a mistake, the yaml file may be directly modified and saved as long as the format is preserved.
Note: If your PR has been cleared by the Core Team as not needing a changelog entry, the `Skip Changelog` label may be put on the PR to bypass the GitHub action that blocks PRs from being merged when they are missing a `CHANGELOG` entry.
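For orientation, `changie new` writes a small yaml file under `/.changes/unreleased`. A sketch of what one entry might look like, inferred from the kinds and custom fields defined in `.changie.yaml` below (the file name, timestamp, and values are illustrative, borrowed from a 1.0.6 entry):
```
# .changes/unreleased/Fixes-20220426-120000.yaml (illustrative name)
kind: Fixes
body: Fix retry logic to return values after initial try
time: 2022-04-26T12:00:00.000000-05:00
custom:
  Author: emmyoop
  Issue: 5023
  PR: 5137
```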
#### Prerelease Workflow
These commands batch up changes in `/.changes/unreleased` to be included in this prerelease and move those files to a directory named for the release version. The `--move-dir` will be created if it does not exist and is created in `/.changes`.
```
changie batch <version> --move-dir '<version>' --prerelease 'rc1'
changie merge
```
#### Final Release Workflow
These commands batch up changes in `/.changes/unreleased` as well as `/.changes/<version>` to be included in this final release and delete all prereleases. This rolls all prereleases up into a single final release. All `yaml` files in `/unreleased` and `<version>` will be deleted at this point.
```
changie batch <version> --include '<version>' --remove-prereleases
changie merge
```
### A Note on Manual Edits & Gotchas
- Changie generates markdown files in the `.changes` directory that are parsed together with the `changie merge` command. Every time `changie merge` is run, it regenerates the entire file. For this reason, any changes made directly to `CHANGELOG.md` will be overwritten on the next run of `changie merge`.
- If changes need to be made to the `CHANGELOG.md`, make them in the relevant `<version>.md` file located in the `/.changes` directory, then run `changie merge` to regenerate the `CHANGELOG.md`.
- Do not run `changie batch` again on released versions. Our final release workflow deletes all of the yaml files associated with individual changes. If for some reason modifications to the `CHANGELOG.md` are required after we've generated the final release `CHANGELOG.md`, the modifications need to be done manually to the `<version>.md` file in the `/.changes` directory.

6
.changes/header.tpl.md Executable file
View File

@@ -0,0 +1,6 @@
# dbt Core Changelog
- This file provides a full account of all changes to `dbt-core` and `dbt-postgres`
- Changes are listed under the (pre)release in which they first appear. Subsequent releases include changes from previous releases.
- "Breaking changes" listed under a version may require action from end users or external maintainers when upgrading to that version.
- Do not edit this file directly. This file is auto-generated using [changie](https://github.com/miniscruff/changie). For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#adding-changelog-entry)

60
.changie.yaml Executable file
View File

@@ -0,0 +1,60 @@
changesDir: .changes
unreleasedDir: unreleased
headerPath: header.tpl.md
versionHeaderPath: ""
changelogPath: CHANGELOG.md
versionExt: md
versionFormat: '## dbt-core {{.Version}} - {{.Time.Format "January 02, 2006"}}'
kindFormat: '### {{.Kind}}'
changeFormat: '- {{.Body}} ([#{{.Custom.Issue}}](https://github.com/dbt-labs/dbt-core/issues/{{.Custom.Issue}}), [#{{.Custom.PR}}](https://github.com/dbt-labs/dbt-core/pull/{{.Custom.PR}}))'
kinds:
- label: Fixes
- label: Features
- label: Under the Hood
- label: Breaking Changes
- label: Docs
- label: Dependencies
custom:
- key: Author
label: GitHub Username(s) (separated by a single space if multiple)
type: string
minLength: 3
- key: Issue
label: GitHub Issue Number
type: int
minLength: 4
- key: PR
label: GitHub Pull Request Number
type: int
minLength: 4
footerFormat: |
{{- $contributorDict := dict }}
{{- /* any names added to this list should be all lowercase for later matching purposes */}}
{{- $core_team := list "emmyoop" "nathaniel-may" "gshank" "leahwicz" "chenyulinx" "stu-k" "iknox-fa" "versusfacit" "mcknight-42" "jtcohen6" "dependabot" }}
{{- range $change := .Changes }}
{{- $authorList := splitList " " $change.Custom.Author }}
{{- /* loop through all authors for a PR */}}
{{- range $author := $authorList }}
{{- $authorLower := lower $author }}
{{- /* we only want to include non-core team contributors */}}
{{- if not (has $authorLower $core_team)}}
{{- $pr := $change.Custom.PR }}
{{- /* check if this contributor has other PRs associated with them already */}}
{{- if hasKey $contributorDict $author }}
{{- $prList := get $contributorDict $author }}
{{- $prList = append $prList $pr }}
{{- $contributorDict := set $contributorDict $author $prList }}
{{- else }}
{{- $prList := list $change.Custom.PR }}
{{- $contributorDict := set $contributorDict $author $prList }}
{{- end }}
{{- end}}
{{- end}}
{{- end }}
{{- /* no indentation here for formatting so the final markdown doesn't have unneeded indentations */}}
{{- if $contributorDict}}
### Contributors
{{- range $k,$v := $contributorDict }}
- [@{{$k}}](https://github.com/{{$k}}) ({{ range $index, $element := $v }}{{if $index}}, {{end}}[#{{$element}}](https://github.com/dbt-labs/dbt-core/pull/{{$element}}){{end}})
{{- end }}
{{- end }}

View File

@@ -18,4 +18,4 @@ resolves #
- [ ] I have signed the [CLA](https://docs.getdbt.com/docs/contributor-license-agreements)
- [ ] I have run this code in development and it appears to resolve the stated issue
- [ ] This PR includes tests, or tests are not required/relevant for this PR
- [ ] I have updated the `CHANGELOG.md` and added information about my change
- [ ] I have added information about my change to be included in the [CHANGELOG](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#Adding-CHANGELOG-Entry).

76
.github/workflows/changelog-check.yml vendored Normal file
View File

@@ -0,0 +1,76 @@
# **what?**
# Checks that a file has been committed under the /.changes directory
# as a new CHANGELOG entry. Cannot check for a specific filename as
# it is dynamically generated by change type and timestamp.
# This workflow should not require any secrets since it runs for PRs
# from forked repos.
# By default, secrets are not passed to workflows running from
# a forked repo.
# **why?**
# Ensure code changes get reflected in the CHANGELOG.
# **when?**
# This will run for all PRs going into main and *.latest.
name: Check Changelog Entry
on:
pull_request:
workflow_dispatch:
defaults:
run:
shell: bash
permissions:
contents: read
pull-requests: write
env:
changelog_comment: 'Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md#adding-changelog-entry).'
jobs:
changelog:
name: changelog
runs-on: ubuntu-latest
steps:
- name: Check if changelog file was added
# https://github.com/marketplace/actions/paths-changes-filter
# For each filter, it sets an output variable named by the filter to the text:
# 'true' - if any of changed files matches any of filter rules
# 'false' - if none of changed files matches any of filter rules
# also, returns:
# `changes` - JSON array with names of all filters matching any of the changed files
uses: dorny/paths-filter@v2
id: filter
with:
token: ${{ secrets.GITHUB_TOKEN }}
filters: |
changelog:
- added: '.changes/unreleased/**.yaml'
- name: Check if comment already exists
uses: peter-evans/find-comment@v1
id: changelog_comment
with:
issue-number: ${{ github.event.pull_request.number }}
comment-author: 'github-actions[bot]'
body-includes: ${{ env.changelog_comment }}
- name: Create PR comment if changelog entry is missing, required, and does not exist
if: |
steps.filter.outputs.changelog == 'false' &&
!contains( github.event.pull_request.labels.*.name, 'Skip Changelog') &&
steps.changelog_comment.outputs.comment-body == ''
uses: peter-evans/create-or-update-comment@v1
with:
issue-number: ${{ github.event.pull_request.number }}
body: ${{ env.changelog_comment }}
- name: Fail job if changelog entry is missing and required
if: |
steps.filter.outputs.changelog == 'false' &&
!contains( github.event.pull_request.labels.*.name, 'Skip Changelog')
uses: actions/github-script@v6
with:
script: core.setFailed('Changelog entry required to merge.')

View File

@@ -95,7 +95,9 @@ jobs:
- uses: actions/upload-artifact@v2
with:
name: dist
path: dist/
path: |
dist/
!dist/dbt-${{github.event.inputs.version_number}}.tar.gz
test-build:
name: verify packages

View File

@@ -0,0 +1,71 @@
# This Action makes a dbt run to sample json structured logs
# and checks that they conform to the currently documented schema.
#
# If this action fails it either means we have unintentionally deviated
# from our documented structured logging schema, or we need to bump the
# version of our structured logging and add new documentation to
# communicate these changes.
name: Structured Logging Schema Check
on:
push:
branches:
- "main"
- "*.latest"
- "releases/*"
pull_request:
workflow_dispatch:
permissions: read-all
jobs:
# run the performance measurements on the current or default branch
test-schema:
name: Test Log Schema
runs-on: ubuntu-latest
env:
# turns warnings into errors
RUSTFLAGS: "-D warnings"
# points tests to the log file
LOG_DIR: "/home/runner/work/dbt-core/dbt-core/logs"
# tells integration tests to output into json format
DBT_LOG_FORMAT: 'json'
steps:
- name: checkout dev
uses: actions/checkout@v2
with:
persist-credentials: false
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: "3.8"
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- name: install dbt
run: pip install -r dev-requirements.txt -r editable-requirements.txt
- name: Set up postgres
uses: ./.github/actions/setup-postgres-linux
- name: ls
run: ls
# integration tests generate a ton of logs in different files. the next step will find them all.
# we actually care if these pass, because the normal test run doesn't usually include many json log outputs
- name: Run integration tests
run: tox -e py38-postgres -- -nauto
# apply our schema tests to every log event from the previous step
# skips any output that isn't valid json
- uses: actions-rs/cargo@v1
with:
command: run
args: --manifest-path test/interop/log_parsing/Cargo.toml
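
The Rust crate above (`test/interop/log_parsing`) performs the real schema checks. As a rough Python illustration of the same idea — parse each line of the JSON logs, skip anything that isn't valid json, and assert the presence of required fields — consider the following sketch; the `REQUIRED_KEYS` set is an assumed subset of the documented schema, not the authoritative list:
```
import json
from pathlib import Path

# Assumed subset of the documented structured-log schema (illustration only).
REQUIRED_KEYS = {"code", "ts", "level", "invocation_id", "data"}

def check_log_file(path: Path) -> None:
    for line in path.read_text().splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any output that isn't valid json, as the workflow does
        missing = REQUIRED_KEYS - event.keys()
        assert not missing, f"{path}: event {event.get('code')} is missing {missing}"

for log_file in Path("logs").rglob("*.log"):
    check_log_file(log_file)
```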

3394
CHANGELOG.md Normal file → Executable file

File diff suppressed because it is too large

View File

@@ -226,6 +226,15 @@ python -m pytest test/unit/test_graph.py::GraphTest::test__dependency_list
```
> [Here](https://docs.pytest.org/en/reorganize-docs/new-docs/user/commandlineuseful.html)
> is a list of useful command-line options for `pytest` to use while developing.
## Adding CHANGELOG Entry
We use [changie](https://changie.dev) to generate `CHANGELOG` entries. Do not edit the `CHANGELOG.md` directly. Your modifications will be lost.
Follow the steps to [install `changie`](https://changie.dev/guide/installation/) for your system.
Once changie is installed and your PR is created, simply run `changie new` and changie will walk you through the process of creating a changelog entry. Commit the file that's created and your changelog entry is complete!
## Submitting a Pull Request
dbt Labs provides a CI environment to test changes to specific adapters, and periodic maintenance checks of `dbt-core`, through GitHub Actions. For example, if you submit a pull request to the `dbt-redshift` repo, GitHub will trigger automated code checks and tests against Redshift.

View File

@@ -10,6 +10,7 @@
</a>
</p>
**[dbt](https://www.getdbt.com/)** enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
![architecture](https://raw.githubusercontent.com/dbt-labs/dbt-core/6c6649f9129d5d108aa3b0526f634cd8f3a9d1ed/etc/dbt-arch.png)

View File

@@ -39,7 +39,7 @@ from dbt.adapters.base.relation import (
ComponentName, BaseRelation, InformationSchema, SchemaSearchMap
)
from dbt.adapters.base import Column as BaseColumn
from dbt.adapters.cache import RelationsCache
from dbt.adapters.cache import RelationsCache, _make_key
SeedModel = Union[ParsedSeedNode, CompiledSeedNode]
@@ -291,7 +291,7 @@ class BaseAdapter(metaclass=AdapterMeta):
if (database, schema) not in self.cache:
fire_event(
CacheMiss(
conn_name=self.nice_connection_name,
conn_name=self.nice_connection_name(),
database=database,
schema=schema
)
@@ -676,7 +676,11 @@ class BaseAdapter(metaclass=AdapterMeta):
relations = self.list_relations_without_caching(
schema_relation
)
fire_event(ListRelations(database=database, schema=schema, relations=relations))
fire_event(ListRelations(
database=database,
schema=schema,
relations=[_make_key(x) for x in relations]
))
return relations

View File

@@ -1,10 +1,10 @@
import threading
from collections import namedtuple
from copy import deepcopy
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple
from dbt.adapters.reference_keys import _make_key, _ReferenceKey
import dbt.exceptions
from dbt.events.functions import fire_event
from dbt.events.functions import fire_event, fire_event_if
from dbt.events.types import (
AddLink,
AddRelation,
@@ -20,20 +20,9 @@ from dbt.events.types import (
UncachedRelation,
UpdateReference
)
import dbt.flags as flags
from dbt.utils import lowercase
_ReferenceKey = namedtuple('_ReferenceKey', 'database schema identifier')
def _make_key(relation) -> _ReferenceKey:
"""Make _ReferenceKeys with lowercase values for the cache so we don't have
to keep track of quoting
"""
# databases and schemas can both be None
return _ReferenceKey(lowercase(relation.database),
lowercase(relation.schema),
lowercase(relation.identifier))
def dot_separated(key: _ReferenceKey) -> str:
"""Return the key in dot-separated string form.
@@ -334,12 +323,12 @@ class RelationsCache:
:param BaseRelation relation: The underlying relation.
"""
cached = _CachedRelation(relation)
fire_event(AddRelation(relation=cached))
fire_event(DumpBeforeAddGraph(dump=self.dump_graph()))
fire_event(AddRelation(relation=_make_key(cached)))
fire_event_if(flags.LOG_CACHE_EVENTS, lambda: DumpBeforeAddGraph(dump=self.dump_graph()))
with self.lock:
self._setdefault(cached)
fire_event(DumpAfterAddGraph(dump=self.dump_graph()))
fire_event_if(flags.LOG_CACHE_EVENTS, lambda: DumpAfterAddGraph(dump=self.dump_graph()))
def _remove_refs(self, keys):
"""Removes all references to all entries in keys. This does not
@@ -452,8 +441,10 @@ class RelationsCache:
old_key = _make_key(old)
new_key = _make_key(new)
fire_event(RenameSchema(old_key=old_key, new_key=new_key))
fire_event(DumpBeforeRenameSchema(dump=self.dump_graph()))
fire_event_if(
flags.LOG_CACHE_EVENTS,
lambda: DumpBeforeRenameSchema(dump=self.dump_graph())
)
with self.lock:
if self._check_rename_constraints(old_key, new_key):
@@ -461,7 +452,10 @@ class RelationsCache:
else:
self._setdefault(_CachedRelation(new))
fire_event(DumpAfterRenameSchema(dump=self.dump_graph()))
fire_event_if(
flags.LOG_CACHE_EVENTS,
lambda: DumpAfterRenameSchema(dump=self.dump_graph())
)
def get_relations(
self, database: Optional[str], schema: Optional[str]
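
The recurring change in this hunk swaps unconditional `fire_event(Dump...Graph(...))` calls for `fire_event_if` with a lambda, so the expensive `dump_graph()` payload is only built when cache logging is enabled. A minimal sketch of such a helper, as implied by the `fire_event_if` import at the top of the file:
```
from typing import Any, Callable

def fire_event_if(conditional: bool, lazy_e: Callable[[], Any]) -> None:
    # The event is wrapped in a zero-argument callable so its payload
    # (e.g. dump_graph()) is only computed when --log-cache-events is set.
    if conditional:
        fire_event(lazy_e())  # fire_event as imported from dbt.events.functions
```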

View File

@@ -0,0 +1,24 @@
# this module exists to resolve circular imports with the events module
from collections import namedtuple
from typing import Optional
_ReferenceKey = namedtuple('_ReferenceKey', 'database schema identifier')
def lowercase(value: Optional[str]) -> Optional[str]:
if value is None:
return None
else:
return value.lower()
def _make_key(relation) -> _ReferenceKey:
"""Make _ReferenceKeys with lowercase values for the cache so we don't have
to keep track of quoting
"""
# databases and schemas can both be None
return _ReferenceKey(lowercase(relation.database),
lowercase(relation.schema),
lowercase(relation.identifier))
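
Because `_make_key` reads only the `database`, `schema`, and `identifier` attributes, any relation-shaped object works. A quick illustration (the `FakeRelation` namedtuple is purely hypothetical):
```
from collections import namedtuple

FakeRelation = namedtuple("FakeRelation", "database schema identifier")

key = _make_key(FakeRelation("Analytics", None, "My_Table"))
print(key)  # _ReferenceKey(database='analytics', schema=None, identifier='my_table')
```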

View File

@@ -75,7 +75,8 @@ class SQLConnectionManager(BaseConnectionManager):
fire_event(
SQLQueryStatus(
status=str(self.get_response(cursor)), elapsed=round((time.time() - pre), 2)
status=str(self.get_response(cursor)),
elapsed=round((time.time() - pre), 2)
)
)

View File

@@ -5,6 +5,7 @@ import dbt.clients.agate_helper
from dbt.contracts.connection import Connection
import dbt.exceptions
from dbt.adapters.base import BaseAdapter, available
from dbt.adapters.cache import _make_key
from dbt.adapters.sql import SQLConnectionManager
from dbt.events.functions import fire_event
from dbt.events.types import ColTypeChange, SchemaCreation, SchemaDrop
@@ -182,7 +183,7 @@ class SQLAdapter(BaseAdapter):
def create_schema(self, relation: BaseRelation) -> None:
relation = relation.without_identifier()
fire_event(SchemaCreation(relation=relation))
fire_event(SchemaCreation(relation=_make_key(relation)))
kwargs = {
'relation': relation,
}
@@ -193,7 +194,7 @@ class SQLAdapter(BaseAdapter):
def drop_schema(self, relation: BaseRelation) -> None:
relation = relation.without_identifier()
fire_event(SchemaDrop(relation=relation))
fire_event(SchemaDrop(relation=_make_key(relation)))
kwargs = {
'relation': relation,
}

View File

@@ -13,6 +13,18 @@ from dbt.exceptions import RuntimeException
BOM = BOM_UTF8.decode('utf-8') # '\ufeff'
class Number(agate.data_types.Number):
# undo the change in https://github.com/wireservice/agate/pull/733
# i.e. do not cast True and False to numeric 1 and 0
def cast(self, d):
if type(d) == bool:
raise agate.exceptions.CastError(
'Do not cast True to 1 or False to 0.'
)
else:
return super().cast(d)
class ISODateTime(agate.data_types.DateTime):
def cast(self, d):
# this is agate.data_types.DateTime.cast with the "clever" bits removed
@@ -41,7 +53,7 @@ def build_type_tester(
) -> agate.TypeTester:
types = [
agate.data_types.Number(null_values=('null', '')),
Number(null_values=('null', '')),
agate.data_types.Date(null_values=('null', ''),
date_format='%Y-%m-%d'),
agate.data_types.DateTime(null_values=('null', ''),
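
With the subclassed `Number` registered in the type tester, boolean values now fail the numeric cast instead of silently becoming 1 and 0. A minimal sketch of the behavior, assuming the `Number` class above is importable:
```
import agate

n = Number(null_values=("null", ""))  # Number as defined in agate_helper above
print(n.cast("4.2"))  # Decimal('4.2'), handled by the parent class
try:
    n.cast(True)
except agate.exceptions.CastError as exc:
    print(exc)  # Do not cast True to 1 or False to 0.
```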

View File

@@ -4,11 +4,21 @@ import os.path
from dbt.clients.system import run_cmd, rmdir
from dbt.events.functions import fire_event
from dbt.events.types import (
GitSparseCheckoutSubdirectory, GitProgressCheckoutRevision,
GitProgressUpdatingExistingDependency, GitProgressPullingNewDependency,
GitNothingToDo, GitProgressUpdatedCheckoutRange, GitProgressCheckedOutAt
GitSparseCheckoutSubdirectory,
GitProgressCheckoutRevision,
GitProgressUpdatingExistingDependency,
GitProgressPullingNewDependency,
GitNothingToDo,
GitProgressUpdatedCheckoutRange,
GitProgressCheckedOutAt,
)
from dbt.exceptions import (
CommandResultError,
RuntimeException,
bad_package_spec,
raise_git_cloning_error,
raise_git_cloning_problem,
)
import dbt.exceptions
from packaging import version
@@ -18,23 +28,23 @@ def _is_commit(revision: str) -> bool:
def _raise_git_cloning_error(repo, revision, error):
stderr = error.stderr.decode('utf-8').strip()
if 'usage: git' in stderr:
stderr = stderr.split('\nusage: git')[0]
stderr = error.stderr.strip()
if "usage: git" in stderr:
stderr = stderr.split("\nusage: git")[0]
if re.match("fatal: destination path '(.+)' already exists", stderr):
raise error
raise_git_cloning_error(error)
dbt.exceptions.bad_package_spec(repo, revision, stderr)
bad_package_spec(repo, revision, stderr)
def clone(repo, cwd, dirname=None, remove_git_dir=False, revision=None, subdirectory=None):
has_revision = revision is not None
is_commit = _is_commit(revision or "")
clone_cmd = ['git', 'clone', '--depth', '1']
clone_cmd = ["git", "clone", "--depth", "1"]
if subdirectory:
fire_event(GitSparseCheckoutSubdirectory(subdir=subdirectory))
out, _ = run_cmd(cwd, ['git', '--version'], env={'LC_ALL': 'C'})
out, _ = run_cmd(cwd, ["git", "--version"], env={"LC_ALL": "C"})
git_version = version.parse(re.search(r"\d+\.\d+\.\d+", out.decode("utf-8")).group(0))
if not git_version >= version.parse("2.25.0"):
# 2.25.0 introduces --sparse
@@ -42,37 +52,37 @@ def clone(repo, cwd, dirname=None, remove_git_dir=False, revision=None, subdirec
"Please update your git version to pull a dbt package "
"from a subdirectory: your version is {}, >= 2.25.0 needed".format(git_version)
)
clone_cmd.extend(['--filter=blob:none', '--sparse'])
clone_cmd.extend(["--filter=blob:none", "--sparse"])
if has_revision and not is_commit:
clone_cmd.extend(['--branch', revision])
clone_cmd.extend(["--branch", revision])
clone_cmd.append(repo)
if dirname is not None:
clone_cmd.append(dirname)
try:
result = run_cmd(cwd, clone_cmd, env={'LC_ALL': 'C'})
except dbt.exceptions.CommandResultError as exc:
result = run_cmd(cwd, clone_cmd, env={"LC_ALL": "C"})
except CommandResultError as exc:
_raise_git_cloning_error(repo, revision, exc)
if subdirectory:
cwd_subdir = os.path.join(cwd, dirname or '')
clone_cmd_subdir = ['git', 'sparse-checkout', 'set', subdirectory]
cwd_subdir = os.path.join(cwd, dirname or "")
clone_cmd_subdir = ["git", "sparse-checkout", "set", subdirectory]
try:
run_cmd(cwd_subdir, clone_cmd_subdir)
except dbt.exceptions.CommandResultError as exc:
except CommandResultError as exc:
_raise_git_cloning_error(repo, revision, exc)
if remove_git_dir:
rmdir(os.path.join(dirname, '.git'))
rmdir(os.path.join(dirname, ".git"))
return result
def list_tags(cwd):
out, err = run_cmd(cwd, ['git', 'tag', '--list'], env={'LC_ALL': 'C'})
tags = out.decode('utf-8').strip().split("\n")
out, err = run_cmd(cwd, ["git", "tag", "--list"], env={"LC_ALL": "C"})
tags = out.decode("utf-8").strip().split("\n")
return tags
@@ -84,44 +94,44 @@ def _checkout(cwd, repo, revision):
if _is_commit(revision):
run_cmd(cwd, fetch_cmd + [revision])
else:
run_cmd(cwd, ['git', 'remote', 'set-branches', 'origin', revision])
run_cmd(cwd, ["git", "remote", "set-branches", "origin", revision])
run_cmd(cwd, fetch_cmd + ["--tags", revision])
if _is_commit(revision):
spec = revision
# Prefer tags to branches if one exists
elif revision in list_tags(cwd):
spec = 'tags/{}'.format(revision)
spec = "tags/{}".format(revision)
else:
spec = 'origin/{}'.format(revision)
spec = "origin/{}".format(revision)
out, err = run_cmd(cwd, ['git', 'reset', '--hard', spec],
env={'LC_ALL': 'C'})
out, err = run_cmd(cwd, ["git", "reset", "--hard", spec], env={"LC_ALL": "C"})
return out, err
def checkout(cwd, repo, revision=None):
if revision is None:
revision = 'HEAD'
revision = "HEAD"
try:
return _checkout(cwd, repo, revision)
except dbt.exceptions.CommandResultError as exc:
stderr = exc.stderr.decode('utf-8').strip()
dbt.exceptions.bad_package_spec(repo, revision, stderr)
except CommandResultError as exc:
stderr = exc.stderr.strip()
bad_package_spec(repo, revision, stderr)
def get_current_sha(cwd):
out, err = run_cmd(cwd, ['git', 'rev-parse', 'HEAD'], env={'LC_ALL': 'C'})
out, err = run_cmd(cwd, ["git", "rev-parse", "HEAD"], env={"LC_ALL": "C"})
return out.decode('utf-8')
return out.decode("utf-8")
def remove_remote(cwd):
return run_cmd(cwd, ['git', 'remote', 'rm', 'origin'], env={'LC_ALL': 'C'})
return run_cmd(cwd, ["git", "remote", "rm", "origin"], env={"LC_ALL": "C"})
def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
revision=None, subdirectory=None):
def clone_and_checkout(
repo, cwd, dirname=None, remove_git_dir=False, revision=None, subdirectory=None
):
exists = None
try:
_, err = clone(
@@ -131,14 +141,11 @@ def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
remove_git_dir=remove_git_dir,
subdirectory=subdirectory,
)
except dbt.exceptions.CommandResultError as exc:
err = exc.stderr.decode('utf-8')
except CommandResultError as exc:
err = exc.stderr
exists = re.match("fatal: destination path '(.+)' already exists", err)
if not exists:
print(
'\nSomething went wrong while cloning {}'.format(repo) +
'\nCheck the debug logs for more information')
raise
raise_git_cloning_problem(repo)
directory = None
start_sha = None
@@ -146,11 +153,9 @@ def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
directory = exists.group(1)
fire_event(GitProgressUpdatingExistingDependency(dir=directory))
else:
matches = re.match("Cloning into '(.+)'", err.decode('utf-8'))
matches = re.match("Cloning into '(.+)'", err.decode("utf-8"))
if matches is None:
raise dbt.exceptions.RuntimeException(
f'Error cloning {repo} - never saw "Cloning into ..." from git'
)
raise RuntimeException(f'Error cloning {repo} - never saw "Cloning into ..." from git')
directory = matches.group(1)
fire_event(GitProgressPullingNewDependency(dir=directory))
full_path = os.path.join(cwd, directory)
@@ -161,9 +166,9 @@ def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
if start_sha == end_sha:
fire_event(GitNothingToDo(sha=start_sha[:7]))
else:
fire_event(GitProgressUpdatedCheckoutRange(
start_sha=start_sha[:7], end_sha=end_sha[:7]
))
fire_event(
GitProgressUpdatedCheckoutRange(start_sha=start_sha[:7], end_sha=end_sha[:7])
)
else:
fire_event(GitProgressCheckedOutAt(end_sha=end_sha[:7]))
return os.path.join(directory, subdirectory or '')
return os.path.join(directory, subdirectory or "")

View File

@@ -1,9 +1,16 @@
import functools
from typing import Any, Dict, List
import requests
from dbt.events.functions import fire_event
from dbt.events.types import (
RegistryProgressMakingGETRequest,
RegistryProgressGETResponse
RegistryProgressGETResponse,
RegistryIndexProgressMakingGETRequest,
RegistryIndexProgressGETResponse,
RegistryResponseUnexpectedType,
RegistryResponseMissingTopKeys,
RegistryResponseMissingNestedKeys,
RegistryResponseExtraNestedKeys,
)
from dbt.utils import memoized, _connection_exception_retry as connection_exception_retry
from dbt import deprecations
@@ -15,51 +22,87 @@ else:
DEFAULT_REGISTRY_BASE_URL = 'https://hub.getdbt.com/'
def _get_url(url, registry_base_url=None):
def _get_url(name, registry_base_url=None):
if registry_base_url is None:
registry_base_url = DEFAULT_REGISTRY_BASE_URL
url = "api/v1/{}.json".format(name)
return '{}{}'.format(registry_base_url, url)
def _get_with_retries(path, registry_base_url=None):
get_fn = functools.partial(_get, path, registry_base_url)
def _get_with_retries(package_name, registry_base_url=None):
get_fn = functools.partial(_get, package_name, registry_base_url)
return connection_exception_retry(get_fn, 5)
def _get(path, registry_base_url=None):
url = _get_url(path, registry_base_url)
def _get(package_name, registry_base_url=None):
url = _get_url(package_name, registry_base_url)
fire_event(RegistryProgressMakingGETRequest(url=url))
# all exceptions from requests get caught in the retry logic so no need to wrap this here
resp = requests.get(url, timeout=30)
fire_event(RegistryProgressGETResponse(url=url, resp_code=resp.status_code))
resp.raise_for_status()
if resp is None:
raise requests.exceptions.ContentDecodingError(
'Request error: The response is None', response=resp
# The response should always be a dictionary. Anything else is unexpected, raise error.
# Raising this error will cause this function to retry (if called within _get_with_retries)
# and hopefully get a valid response. This seems to happen when there's an issue with the Hub.
# Since we control what we expect the HUB to return, this is safe.
# See https://github.com/dbt-labs/dbt-core/issues/4577
# and https://github.com/dbt-labs/dbt-core/issues/4849
response = resp.json()
if not isinstance(response, dict): # This will also catch Nonetype
error_msg = (
f"Request error: Expected a response type of <dict> but got {type(response)} instead"
)
return resp.json()
fire_event(RegistryResponseUnexpectedType(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
# check for expected top level keys
expected_keys = {"name", "versions"}
if not expected_keys.issubset(response):
error_msg = (
f"Request error: Expected the response to contain keys {expected_keys} "
f"but is missing {expected_keys.difference(set(response))}"
)
fire_event(RegistryResponseMissingTopKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
# check for the keys we need nested under each version
expected_version_keys = {"name", "packages", "downloads"}
all_keys = set().union(*(response["versions"][d] for d in response["versions"]))
if not expected_version_keys.issubset(all_keys):
error_msg = (
"Request error: Expected the response for the version to contain keys "
f"{expected_version_keys} but is missing {expected_version_keys.difference(all_keys)}"
)
fire_event(RegistryResponseMissingNestedKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
# all version responses should contain identical keys.
has_extra_keys = set().difference(*(response["versions"][d] for d in response["versions"]))
if has_extra_keys:
error_msg = (
"Request error: Keys for all versions do not match. Found extra key(s) "
f"of {has_extra_keys}."
)
fire_event(RegistryResponseExtraNestedKeys(response=response))
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
return response
def index(registry_base_url=None):
return _get_with_retries('api/v1/index.json', registry_base_url)
_get_cached = memoized(_get_with_retries)
index_cached = memoized(index)
def packages(registry_base_url=None):
return _get_with_retries('api/v1/packages.json', registry_base_url)
def package(name, registry_base_url=None):
response = _get_with_retries('api/v1/{}.json'.format(name), registry_base_url)
def package(package_name, registry_base_url=None) -> Dict[str, Any]:
# returns a dictionary of metadata for all versions of a package
response = _get_cached(package_name, registry_base_url)
# Either redirectnamespace or redirectname in the JSON response indicate a redirect
# redirectnamespace redirects based on package ownership
# redirectname redirects based on package name
# Both can be present at the same time, or neither. Fails gracefully to old name
if ('redirectnamespace' in response) or ('redirectname' in response):
if ("redirectnamespace" in response) or ("redirectname" in response):
if ('redirectnamespace' in response) and response['redirectnamespace'] is not None:
use_namespace = response['redirectnamespace']
@@ -72,15 +115,49 @@ def package(name, registry_base_url=None):
use_name = response['name']
new_nwo = use_namespace + "/" + use_name
deprecations.warn('package-redirect', old_name=name, new_name=new_nwo)
deprecations.warn("package-redirect", old_name=package_name, new_name=new_nwo)
return response["versions"]
def package_version(package_name, version, registry_base_url=None) -> Dict[str, Any]:
# returns the metadata of a specific version of a package
response = package(package_name, registry_base_url)
return response[version]
def get_available_versions(package_name) -> List["str"]:
# returns a list of all available versions of a package
response = package(package_name)
return list(response)
def _get_index(registry_base_url=None):
url = _get_url("index", registry_base_url)
fire_event(RegistryIndexProgressMakingGETRequest(url=url))
# all exceptions from requests get caught in the retry logic so no need to wrap this here
resp = requests.get(url, timeout=30)
fire_event(RegistryIndexProgressGETResponse(url=url, resp_code=resp.status_code))
resp.raise_for_status()
# The response should be a list. Anything else is unexpected, raise an error.
# Raising this error will cause this function to retry and hopefully get a valid response.
response = resp.json()
if not isinstance(response, list): # This will also catch Nonetype
error_msg = (
f"Request error: The response type of {type(response)} is not valid: {resp.text}"
)
raise requests.exceptions.ContentDecodingError(error_msg, response=resp)
return response
def package_version(name, version, registry_base_url=None):
return _get_with_retries('api/v1/{}/{}.json'.format(name, version), registry_base_url)
def index(registry_base_url=None) -> List[str]:
# this returns a list of all packages on the Hub
get_index_fn = functools.partial(_get_index, registry_base_url)
return connection_exception_retry(get_index_fn, 5)
def get_available_versions(name):
response = package(name)
return list(response['versions'])
index_cached = memoized(index)
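
Every request path in this module funnels through `_connection_exception_retry` (imported from `dbt.utils` as `connection_exception_retry`). A rough sketch of the pattern it implements — the exception list and sleep are illustrative, not the exact implementation:
```
import time
import requests

def connection_exception_retry(fn, max_attempts: int, attempt: int = 0):
    # Retry transient network failures, including the ContentDecodingError
    # raised above when the Hub returns a malformed payload.
    try:
        return fn()
    except (
        requests.exceptions.ConnectionError,
        requests.exceptions.Timeout,
        requests.exceptions.ContentDecodingError,
    ):
        if attempt + 1 >= max_attempts:
            raise
        time.sleep(1)
        return connection_exception_retry(fn, max_attempts, attempt + 1)
```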

View File

@@ -485,7 +485,7 @@ def untar_package(
) -> None:
tar_path = convert_path(tar_path)
tar_dir_name = None
with tarfile.open(tar_path, 'r') as tarball:
with tarfile.open(tar_path, 'r:gz') as tarball:
tarball.extractall(dest_dir)
tar_dir_name = os.path.commonprefix(tarball.getnames())
if rename_to:
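
The `'r'` → `'r:gz'` change makes `tarfile` insist on a valid gzip stream rather than auto-detecting the compression, so a truncated or corrupt download raises immediately (and, combined with the retry logic elsewhere in this diff, gets retried). For example:
```
import tarfile

with open("broken.tar.gz", "wb") as f:
    f.write(b"this is not gzip data")

try:
    tarfile.open("broken.tar.gz", "r:gz")
except tarfile.ReadError as exc:
    print(exc)  # e.g. "not a gzip file"
```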

View File

@@ -45,7 +45,7 @@ INVALID_VERSION_ERROR = """\
This version of dbt is not supported with the '{package}' package.
Installed version of dbt: {installed}
Required version of dbt for '{package}': {version_spec}
Check the requirements for the '{package}' package, or run dbt again with \
Check for a different version of the '{package}' package, or run dbt again with \
--no-version-check
"""
@@ -54,7 +54,7 @@ IMPOSSIBLE_VERSION_ERROR = """\
The package version requirement can never be satisfied for the '{package}'
package.
Required versions of dbt for '{package}': {version_spec}
Check the requirements for the '{package}' package, or run dbt again with \
Check for a different version of the '{package}' package, or run dbt again with \
--no-version-check
"""

View File

@@ -122,11 +122,9 @@ class DbtProjectYamlRenderer(BaseRenderer):
def name(self):
'Project config'
# Uses SecretRenderer
def get_package_renderer(self) -> BaseRenderer:
return PackageRenderer(self.context)
def get_selector_renderer(self) -> BaseRenderer:
return SelectorRenderer(self.context)
return PackageRenderer(self.ctx_obj.cli_vars)
def render_project(
self,
@@ -144,8 +142,7 @@ class DbtProjectYamlRenderer(BaseRenderer):
return package_renderer.render_data(packages)
def render_selectors(self, selectors: Dict[str, Any]):
selector_renderer = self.get_selector_renderer()
return selector_renderer.render_data(selectors)
return self.render_data(selectors)
def render_entry(self, value: Any, keypath: Keypath) -> Any:
result = super().render_entry(value, keypath)
@@ -176,20 +173,10 @@ class DbtProjectYamlRenderer(BaseRenderer):
return True
class SelectorRenderer(BaseRenderer):
@property
def name(self):
return 'Selector config'
class SecretRenderer(BaseRenderer):
def __init__(
self, cli_vars: Optional[Dict[str, Any]] = None
) -> None:
def __init__(self, cli_vars: Dict[str, Any] = {}) -> None:
# Generate contexts here because we want to save the context
# object in order to retrieve the env_vars.
if cli_vars is None:
cli_vars = {}
self.ctx_obj = SecretContext(cli_vars)
context = self.ctx_obj.to_dict()
super().__init__(context)

View File

@@ -1,7 +1,7 @@
import itertools
import os
from copy import deepcopy
from dataclasses import dataclass, fields
from dataclasses import dataclass
from pathlib import Path
from typing import (
Dict, Any, Optional, Mapping, Iterator, Iterable, Tuple, List, MutableSet,
@@ -13,20 +13,17 @@ from .project import Project
from .renderer import DbtProjectYamlRenderer, ProfileRenderer
from .utils import parse_cli_vars
from dbt import flags
from dbt import tracking
from dbt.adapters.factory import get_relation_class_by_name, get_include_paths
from dbt.helper_types import FQNPath, PathSet
from dbt.helper_types import FQNPath, PathSet, DictDefaultEmptyStr
from dbt.config.profile import read_user_config
from dbt.contracts.connection import AdapterRequiredConfig, Credentials
from dbt.contracts.graph.manifest import ManifestMetadata
from dbt.contracts.relation import ComponentName
from dbt.events.types import ProfileLoadError, ProfileNotFound
from dbt.events.functions import fire_event
from dbt.ui import warning_tag
from dbt.contracts.project import Configuration, UserConfig
from dbt.exceptions import (
RuntimeException,
DbtProfileError,
DbtProjectError,
validator_error_message,
warn_or_error,
@@ -191,6 +188,7 @@ class RuntimeConfig(Project, Profile, AdapterRequiredConfig):
profile_renderer: ProfileRenderer,
profile_name: Optional[str],
) -> Profile:
return Profile.render_from_args(
args, profile_renderer, profile_name
)
@@ -412,27 +410,18 @@ class UnsetCredentials(Credentials):
return ()
class UnsetConfig(UserConfig):
def __getattribute__(self, name):
if name in {f.name for f in fields(UserConfig)}:
raise AttributeError(
f"'UnsetConfig' object has no attribute {name}"
)
def __post_serialize__(self, dct):
return {}
# This is used by UnsetProfileConfig, for commands which do
# not require a profile, i.e. dbt deps and clean
class UnsetProfile(Profile):
def __init__(self):
self.credentials = UnsetCredentials()
self.user_config = UnsetConfig()
self.user_config = UserConfig() # This will be read in _get_rendered_profile
self.profile_name = ''
self.target_name = ''
self.threads = -1
def to_target_dict(self):
return {}
return DictDefaultEmptyStr({})
def __getattribute__(self, name):
if name in {'profile_name', 'target_name', 'threads'}:
@@ -443,6 +432,8 @@ class UnsetProfile(Profile):
return Profile.__getattribute__(self, name)
# This class is used by the dbt deps and clean commands, because they don't
# require a functioning profile.
@dataclass
class UnsetProfileConfig(RuntimeConfig):
"""This class acts a lot _like_ a RuntimeConfig, except if your profile is
@@ -469,7 +460,7 @@ class UnsetProfileConfig(RuntimeConfig):
def to_target_dict(self):
# re-override the poisoned profile behavior
return {}
return DictDefaultEmptyStr({})
@classmethod
def from_parts(
@@ -525,7 +516,7 @@ class UnsetProfileConfig(RuntimeConfig):
profile_env_vars=profile.profile_env_vars,
profile_name='',
target_name='',
user_config=UnsetConfig(),
user_config=UserConfig(),
threads=getattr(args, 'threads', 1),
credentials=UnsetCredentials(),
args=args,
@@ -540,17 +531,12 @@ class UnsetProfileConfig(RuntimeConfig):
profile_renderer: ProfileRenderer,
profile_name: Optional[str],
) -> Profile:
try:
profile = Profile.render_from_args(
args, profile_renderer, profile_name
)
except (DbtProjectError, DbtProfileError) as exc:
fire_event(ProfileLoadError(exc=exc))
fire_event(ProfileNotFound(profile_name=profile_name))
# return the poisoned form
profile = UnsetProfile()
# disable anonymous usage statistics
tracking.disable_tracking()
profile = UnsetProfile()
# The profile (for warehouse connection) is not needed, but we want
# to get the UserConfig, which is also in profiles.yml
user_config = read_user_config(flags.PROFILES_DIR)
profile.user_config = user_config
return profile
@classmethod
@@ -565,9 +551,6 @@ class UnsetProfileConfig(RuntimeConfig):
:raises ValidationException: If the cli variables are invalid.
"""
project, profile = cls.collect_parts(args)
if not isinstance(profile, UnsetProfile):
# if it's a real profile, return a real config
cls = RuntimeConfig
return cls.from_parts(
project=project,

View File

@@ -5,7 +5,7 @@ from dbt.clients.yaml_helper import ( # noqa: F401
)
from dbt.dataclass_schema import ValidationError
from .renderer import SelectorRenderer
from .renderer import BaseRenderer
from dbt.clients.system import (
load_file_contents,
@@ -60,8 +60,8 @@ class SelectorConfig(Dict[str, Dict[str, Union[SelectionSpec, bool]]]):
def render_from_dict(
cls,
data: Dict[str, Any],
renderer: SelectorRenderer,
) -> 'SelectorConfig':
renderer: BaseRenderer,
) -> "SelectorConfig":
try:
rendered = renderer.render_data(data)
except (ValidationError, RuntimeException) as exc:
@@ -73,8 +73,10 @@ class SelectorConfig(Dict[str, Dict[str, Union[SelectionSpec, bool]]]):
@classmethod
def from_path(
cls, path: Path, renderer: SelectorRenderer,
) -> 'SelectorConfig':
cls,
path: Path,
renderer: BaseRenderer,
) -> "SelectorConfig":
try:
data = load_yaml_text(load_file_contents(str(path)))
except (ValidationError, RuntimeException) as exc:

View File

@@ -1186,10 +1186,12 @@ class ProviderContext(ManifestContext):
# If this is compiling, do not save because it's irrelevant to parsing.
if self.model and not hasattr(self.model, 'compiled'):
self.manifest.env_vars[var] = return_value
source_file = self.manifest.files[self.model.file_id]
# Schema files should never get here
if source_file.parse_file_type != 'schema':
source_file.env_vars.append(var)
# hooks come from dbt_project.yml which doesn't have a real file_id
if self.model.file_id in self.manifest.files:
source_file = self.manifest.files[self.model.file_id]
# Schema files should never get here
if source_file.parse_file_type != 'schema':
source_file.env_vars.append(var)
return return_value
else:
msg = f"Env var required but not provided: '{var}'"

View File

@@ -4,6 +4,7 @@ from typing import Any, Dict, Optional
from .base import BaseContext, contextmember
from dbt.exceptions import raise_parsing_error
from dbt.logger import SECRET_ENV_PREFIX
class SecretContext(BaseContext):
@@ -27,7 +28,11 @@ class SecretContext(BaseContext):
return_value = default
if return_value is not None:
self.env_vars[var] = return_value
# do not save secret environment variables
if not var.startswith(SECRET_ENV_PREFIX):
self.env_vars[var] = return_value
# return the value even if it's a secret
return return_value
else:
msg = f"Env var required but not provided: '{var}'"

View File

@@ -153,7 +153,6 @@ class ParsedNodeMixins(dbtClassMixin):
self.created_at = time.time()
self.description = patch.description
self.columns = patch.columns
self.meta = patch.meta
self.docs = patch.docs
def get_materialization(self):
@@ -431,6 +430,10 @@ class ParsedSingularTestNode(ParsedNode):
# refactor the various configs.
config: TestConfig = field(default_factory=TestConfig) # type: ignore
@property
def test_node_type(self):
return 'singular'
@dataclass
class ParsedGenericTestNode(ParsedNode, HasTestMetadata):
@@ -452,6 +455,10 @@ class ParsedGenericTestNode(ParsedNode, HasTestMetadata):
True
)
@property
def test_node_type(self):
return 'generic'
@dataclass
class IntermediateSnapshotNode(ParsedNode):

View File

@@ -18,6 +18,18 @@ DEFAULT_SEND_ANONYMOUS_USAGE_STATS = True
class Name(ValidatedStringMixin):
ValidationRegex = r'^[^\d\W]\w*$'
@classmethod
def is_valid(cls, value: Any) -> bool:
if not isinstance(value, str):
return False
try:
cls.validate(value)
except ValidationError:
return False
return True
register_pattern(Name, r'^[^\d\W]\w*$')
@@ -231,7 +243,7 @@ class UserConfig(ExtensibleDbtClassMixin, Replaceable, UserConfigContract):
printer_width: Optional[int] = None
write_json: Optional[bool] = None
warn_error: Optional[bool] = None
log_format: Optional[bool] = None
log_format: Optional[str] = None
debug: Optional[bool] = None
version_check: Optional[bool] = None
fail_fast: Optional[bool] = None

View File

@@ -14,7 +14,8 @@ class PreviousState:
manifest_path = self.path / 'manifest.json'
if manifest_path.exists() and manifest_path.is_file():
try:
self.manifest = WritableManifest.read(str(manifest_path))
# we want to bail with an error if schema versions don't match
self.manifest = WritableManifest.read_and_check_versions(str(manifest_path))
except IncompatibleSchemaException as exc:
exc.add_filename(str(manifest_path))
raise
@@ -22,7 +23,8 @@ class PreviousState:
results_path = self.path / 'run_results.json'
if results_path.exists() and results_path.is_file():
try:
self.results = RunResultsArtifact.read(str(results_path))
# we want to bail with an error if schema versions don't match
self.results = RunResultsArtifact.read_and_check_versions(str(results_path))
except IncompatibleSchemaException as exc:
exc.add_filename(str(results_path))
raise

View File

@@ -9,6 +9,7 @@ from dbt.clients.system import write_json, read_json
from dbt.exceptions import (
InternalException,
RuntimeException,
IncompatibleSchemaException
)
from dbt.version import __version__
from dbt.events.functions import get_invocation_id
@@ -158,6 +159,8 @@ def get_metadata_env() -> Dict[str, str]:
}
# This is used in the ManifestMetadata, RunResultsMetadata, RunOperationResultMetadata,
# FreshnessMetadata, and CatalogMetadata classes
@dataclasses.dataclass
class BaseArtifactMetadata(dbtClassMixin):
dbt_schema_version: str
@@ -177,6 +180,17 @@ class BaseArtifactMetadata(dbtClassMixin):
return dct
# This is used as a class decorator to set the schema_version in the
# 'dbt_schema_version' class attribute. (It's copied into the metadata objects.)
# Name attributes of SchemaVersion in classes with the 'schema_version' decorator:
# manifest
# run-results
# run-operation-result
# sources
# catalog
# remote-compile-result
# remote-execution-result
# remote-run-result
def schema_version(name: str, version: int):
def inner(cls: Type[VersionedSchema]):
cls.dbt_schema_version = SchemaVersion(
@@ -187,6 +201,7 @@ def schema_version(name: str, version: int):
return inner
# This is used in the ArtifactMixin and RemoteResult classes
@dataclasses.dataclass
class VersionedSchema(dbtClassMixin):
dbt_schema_version: ClassVar[SchemaVersion]
@@ -198,6 +213,30 @@ class VersionedSchema(dbtClassMixin):
result['$id'] = str(cls.dbt_schema_version)
return result
@classmethod
def read_and_check_versions(cls, path: str):
try:
data = read_json(path)
except (EnvironmentError, ValueError) as exc:
raise RuntimeException(
f'Could not read {cls.__name__} at "{path}" as JSON: {exc}'
) from exc
# Check metadata version. There is a class variable 'dbt_schema_version', but
# that doesn't show up in artifacts, where it only exists in the 'metadata'
# dictionary.
if hasattr(cls, 'dbt_schema_version'):
if 'metadata' in data and 'dbt_schema_version' in data['metadata']:
previous_schema_version = data['metadata']['dbt_schema_version']
# cls.dbt_schema_version is a SchemaVersion object
if str(cls.dbt_schema_version) != previous_schema_version:
raise IncompatibleSchemaException(
expected=str(cls.dbt_schema_version),
found=previous_schema_version
)
return cls.from_dict(data) # type: ignore
T = TypeVar('T', bound='ArtifactMixin')
@@ -205,6 +244,8 @@ T = TypeVar('T', bound='ArtifactMixin')
# metadata should really be a Generic[T_M] where T_M is a TypeVar bound to
# BaseArtifactMetadata. Unfortunately this isn't possible due to a mypy issue:
# https://github.com/python/mypy/issues/7520
# This is used in the WritableManifest, RunResultsArtifact, RunOperationResultsArtifact,
# and CatalogArtifact
@dataclasses.dataclass(init=False)
class ArtifactMixin(VersionedSchema, Writable, Readable):
metadata: BaseArtifactMetadata
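
`PreviousState` above shows the intended call pattern for the new classmethod; from a caller's point of view, a hedged usage sketch:
```
from dbt.contracts.graph.manifest import WritableManifest
from dbt.exceptions import IncompatibleSchemaException

try:
    manifest = WritableManifest.read_and_check_versions("target/manifest.json")
except IncompatibleSchemaException as exc:
    # the artifact was written by a dbt version with a different schema
    print(f"cannot compare state: {exc}")
```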

View File

@@ -36,9 +36,9 @@ class DBTDeprecation:
if self.name not in active_deprecations:
desc = self.description.format(**kwargs)
msg = ui.line_wrap_message(
desc, prefix='* Deprecation Warning:\n\n'
desc, prefix='Deprecated functionality\n\n'
)
dbt.exceptions.warn_or_error(msg)
dbt.exceptions.warn_or_error(msg, log_fmt=ui.warning_tag('{}'))
self.track_deprecation_warn()
active_deprecations.add(self.name)
@@ -62,7 +62,7 @@ class PackageInstallPathDeprecation(DBTDeprecation):
class ConfigPathDeprecation(DBTDeprecation):
_description = '''\
The `{deprecated_path}` config has been deprecated in favor of `{exp_path}`.
The `{deprecated_path}` config has been renamed to `{exp_path}`.
Please update your `dbt_project.yml` configuration to reflect this change.
'''

View File

@@ -1,4 +1,5 @@
import os
import functools
from typing import List
from dbt import semver
@@ -14,6 +15,7 @@ from dbt.exceptions import (
DependencyException,
package_not_found,
)
from dbt.utils import _connection_exception_retry as connection_exception_retry
class RegistryPackageMixin:
@@ -68,9 +70,28 @@ class RegistryPinnedPackage(RegistryPackageMixin, PinnedPackage):
system.make_directory(os.path.dirname(tar_path))
download_url = metadata.downloads.tarball
system.download_with_retries(download_url, tar_path)
deps_path = project.packages_install_path
package_name = self.get_project_name(project, renderer)
download_untar_fn = functools.partial(
self.download_and_untar,
download_url,
tar_path,
deps_path,
package_name
)
connection_exception_retry(download_untar_fn, 5)
def download_and_untar(self, download_url, tar_path, deps_path, package_name):
"""
Sometimes the download of the files fails and we want to retry. Sometimes the
download appears successful but the file did not make it through as expected
(generally due to a github incident). Either way we want to retry downloading
and untarring to see if we can get a success. Call this within
`_connection_exception_retry`
"""
system.download(download_url, tar_path)
system.untar_package(tar_path, deps_path, package_name)

View File

@@ -6,7 +6,53 @@ The Events module is the implementation for structured logging. These events repr
The event module provides types that represent what is happening in dbt in `events.types`. These types are intended to represent an exhaustive list of all things happening within dbt that will need to be logged, streamed, or printed. To fire an event, `events.functions::fire_event` is the entry point to the module from everywhere in dbt.
# Adding a New Event
In `events.types` add a new class that represents the new event. This may be a simple class with no values, or it may be a dataclass with some values to construct downstream messaging. Only include the data necessary to construct this message within this class. You must extend all destinations (e.g. - if your log message belongs on the cli, extend `CliEventABC`) as well as the loglevel this event belongs to.
In `events.types` add a new class that represents the new event. Every event must be a dataclass with, at minimum, a code. You may also include other values used to construct downstream messaging. Only include the data necessary to construct this message within this class. You must extend all destinations (e.g. - if your log message belongs on the cli, extend `Cli`) as well as the loglevel this event belongs to. This system is designed to take full advantage of mypy, so running it will catch anything you may miss.
## Required for Every Event
- a string attribute `code`, that's unique across events
- assign a log level by extending `DebugLevel`, `InfoLevel`, `WarnLevel`, or `ErrorLevel`
- a `message()` method
- extend `File` and/or `Cli` based on where it should output
Example
```
@dataclass
class PartialParsingDeletedExposure(DebugLevel, Cli, File):
unique_id: str
code: str = "I049"
def message(self) -> str:
return f"Partial parsing: deleted exposure {self.unique_id}"
```
## Optional (based on your event)
- Events associated with node status changes must have `report_node_data` passed in and be extended with `NodeInfo`
- define `asdict` if your data is not serializable to json
Example
```
@dataclass
class SuperImportantNodeEvent(InfoLevel, File, NodeInfo):
node_name: str
run_result: RunResult
report_node_data: ParsedModelNode # may vary
code: str = "Q036"
def message(self) -> str:
return f"{self.node_name} had overly verbose result of {run_result}"
@classmethod
def asdict(cls, data: list) -> dict:
return dict((k, str(v)) for k, v in data)
```
All values other than `code` and `report_node_data` will be included in the `data` node of the json log output.
Once your event has been added, add a dummy call to your new event at the bottom of `types.py` and also add your new Event to the list `sample_values` in `test/unit/test_events.py`.
# Adapter Maintainers
To integrate existing log messages from adapters, you likely have a line of code like this in your adapter already:

View File

@@ -1,7 +1,6 @@
from abc import ABCMeta, abstractmethod, abstractproperty
from dataclasses import dataclass
from datetime import datetime
import json
import os
import threading
from typing import Any, Optional
@@ -38,6 +37,11 @@ class ErrorLevel():
return "error"
class Cache():
# Events with this class will only be logged when the `--log-cache-events` flag is passed
pass
@dataclass
class Node():
node_path: str
@@ -70,6 +74,7 @@ class Event(metaclass=ABCMeta):
# fields that should be on all events with their default implementations
log_version: int = 1
ts: Optional[datetime] = None # use getter for non-optional
ts_rfc3339: Optional[str] = None # use getter for non-optional
pid: Optional[int] = None # use getter for non-optional
node_info: Optional[Node]
@@ -91,32 +96,20 @@ class Event(metaclass=ABCMeta):
def message(self) -> str:
raise Exception("msg not implemented for Event")
# override this method to convert non-json serializable fields to json.
# for override examples, see existing concrete types.
#
# there is no type-level mechanism to have mypy enforce json serializability, so we just try
# to serialize and raise an exception at runtime when that fails. This safety mechanism
# only works if we have attempted to serialize every concrete event type in our tests.
def fields_to_json(self, field_value: Any) -> Any:
try:
json.dumps(field_value, sort_keys=True)
return field_value
except TypeError:
val_type = type(field_value).__name__
event_type = type(self).__name__
return Exception(
f"type {val_type} is not serializable to json."
f" First make sure that the call sites for {event_type} match the type hints"
f" and if they do, you can override Event::fields_to_json in {event_type} in"
" types.py to define your own serialization function to any valid json type"
)
# exactly one time stamp per concrete event
def get_ts(self) -> datetime:
if not self.ts:
self.ts = datetime.now()
self.ts = datetime.utcnow()
self.ts_rfc3339 = self.ts.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
return self.ts
# preformatted time stamp
def get_ts_rfc3339(self) -> str:
if not self.ts_rfc3339:
# get_ts() creates the formatted string too so all time logic is centralized
self.get_ts()
return self.ts_rfc3339 # type: ignore
# exactly one pid per concrete event
def get_pid(self) -> int:
if not self.pid:
@@ -132,6 +125,21 @@ class Event(metaclass=ABCMeta):
from dbt.events.functions import get_invocation_id
return get_invocation_id()
# default dict factory for all events. can override on concrete classes.
@classmethod
def asdict(cls, data: list) -> dict:
d = dict()
for k, v in data:
# stringify all exceptions
if isinstance(v, Exception) or isinstance(v, BaseException):
d[k] = str(v)
# skip all binary data
elif isinstance(v, bytes):
continue
else:
d[k] = v
return d
@dataclass # type: ignore
class NodeInfo(Event, metaclass=ABCMeta):
@@ -143,7 +151,7 @@ class NodeInfo(Event, metaclass=ABCMeta):
node_name=self.report_node_data.name,
unique_id=self.report_node_data.unique_id,
resource_type=self.report_node_data.resource_type.value,
materialized=self.report_node_data.config.materialized,
materialized=self.report_node_data.config.get('materialized'),
node_status=str(self.report_node_data._event_status.get('node_status')),
node_started_at=self.report_node_data._event_status.get("started_at"),
node_finished_at=self.report_node_data._event_status.get("finished_at")

View File

@@ -2,8 +2,8 @@
from colorama import Style
from datetime import datetime
import dbt.events.functions as this # don't worry I hate it too.
from dbt.events.base_types import Cli, Event, File, ShowException, NodeInfo
from dbt.events.types import EventBufferFull, T_Event
from dbt.events.base_types import Cli, Event, File, ShowException, NodeInfo, Cache
from dbt.events.types import EventBufferFull, T_Event, MainReportVersion, EmptyLine
import dbt.flags as flags
# TODO this will need to move eventually
from dbt.logger import SECRET_ENV_PREFIX, make_log_dir_if_missing, GLOBAL_LOGGER
@@ -13,19 +13,21 @@ from io import StringIO, TextIOWrapper
import logbook
import logging
from logging import Logger
import sys
from logging.handlers import RotatingFileHandler
import os
import uuid
import threading
from typing import Any, Callable, Dict, List, Optional, Union
import dataclasses
from collections import deque
# create the global event history buffer with a max size of 100k records
# create the global event history buffer with the default max size (10k)
# python 3.7 doesn't support type hints on globals, but mypy requires them. hence the ignore.
# TODO: make the maxlen something configurable from the command line via args(?)
# TODO the flags module has not yet been resolved when this is created
global EVENT_HISTORY
EVENT_HISTORY = deque(maxlen=100000) # type: ignore
EVENT_HISTORY = deque(maxlen=flags.EVENT_BUFFER_SIZE) # type: ignore
# create the global file logger with no configuration
global FILE_LOG
@@ -38,7 +40,7 @@ FILE_LOG.addHandler(null_handler)
global STDOUT_LOG
STDOUT_LOG = logging.getLogger('default_stdout')
STDOUT_LOG.setLevel(logging.INFO)
stdout_handler = logging.StreamHandler()
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.INFO)
STDOUT_LOG.addHandler(stdout_handler)
@@ -48,6 +50,10 @@ invocation_id: Optional[str] = None
def setup_event_logger(log_path, level_override=None):
# flags have been resolved, and log_path is known
global EVENT_HISTORY
EVENT_HISTORY = deque(maxlen=flags.EVENT_BUFFER_SIZE) # type: ignore
make_log_dir_if_missing(log_path)
this.format_json = flags.LOG_FORMAT == 'json'
# USE_COLORS can be None if the app just started and the cli flags
@@ -64,7 +70,7 @@ def setup_event_logger(log_path, level_override=None):
FORMAT = "%(message)s"
stdout_passthrough_formatter = logging.Formatter(fmt=FORMAT)
stdout_handler = logging.StreamHandler()
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setFormatter(stdout_passthrough_formatter)
stdout_handler.setLevel(level)
# clear existing stdout TextIOWrapper stream handlers
@@ -80,7 +86,12 @@ def setup_event_logger(log_path, level_override=None):
file_passthrough_formatter = logging.Formatter(fmt=FORMAT)
file_handler = RotatingFileHandler(filename=log_dest, encoding='utf8')
file_handler = RotatingFileHandler(
filename=log_dest,
encoding='utf8',
maxBytes=10 * 1024 * 1024, # 10 mb
backupCount=5
)
file_handler.setFormatter(file_passthrough_formatter)
file_handler.setLevel(logging.DEBUG) # always debug regardless of user input
this.FILE_LOG.handlers.clear()
@@ -130,17 +141,25 @@ def event_to_serializable_dict(
) -> Dict[str, Any]:
data = dict()
node_info = dict()
if hasattr(e, '__dataclass_fields__'):
for field, value in dataclasses.asdict(e).items(): # type: ignore[attr-defined]
_json_value = e.fields_to_json(value)
log_line = dict()
try:
log_line = dataclasses.asdict(e, dict_factory=type(e).asdict)
except AttributeError:
event_type = type(e).__name__
raise Exception( # TODO this may hang async threads
f"type {event_type} is not serializable to json."
f" First make sure that the call sites for {event_type} match the type hints"
f" and if they do, you can override the dataclass method `asdict` in {event_type} in"
" types.py to define your own serialization function to a dictionary of valid json"
" types"
)
if isinstance(e, NodeInfo):
node_info = dataclasses.asdict(e.get_node_info())
if isinstance(e, NodeInfo):
node_info = dataclasses.asdict(e.get_node_info())
if not isinstance(_json_value, Exception):
data[field] = _json_value
else:
data[field] = f"JSON_SERIALIZE_FAILED: {type(value).__name__, 'NA'}"
for field, value in log_line.items(): # type: ignore[attr-defined]
if field not in ["code", "report_node_data"]:
data[field] = value
event_dict = {
'type': 'log_line',
@@ -152,7 +171,8 @@ def event_to_serializable_dict(
'data': data,
'invocation_id': e.get_invocation_id(),
'thread_name': e.get_thread_name(),
'node_info': node_info
'node_info': node_info,
'code': e.code
}
return event_dict
@@ -161,35 +181,64 @@ def event_to_serializable_dict(
# translates an Event to a completely formatted text-based log line
# you have to specify which message you want. (e.g. e.message(), e.cli_msg(), e.file_msg())
# type hinting everything as strings so we don't get any unintentional string conversions via str()
def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
def create_info_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
color_tag: str = '' if this.format_color else Style.RESET_ALL
ts: str = e.get_ts().strftime("%H:%M:%S")
scrubbed_msg: str = scrub_secrets(msg_fn(e), env_secrets())
log_line: str = f"{color_tag}{ts} {scrubbed_msg}"
return log_line
def create_debug_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
log_line: str = ''
# Create a separator if this is the beginning of an invocation
if type(e) == MainReportVersion:
separator = 30 * '='
log_line = f'\n\n{separator} {e.get_ts()} | {get_invocation_id()} {separator}\n'
color_tag: str = '' if this.format_color else Style.RESET_ALL
ts: str = e.get_ts().strftime("%H:%M:%S.%f")
scrubbed_msg: str = scrub_secrets(msg_fn(e), env_secrets())
level: str = e.level_tag() if len(e.level_tag()) == 5 else f"{e.level_tag()} "
log_line: str = f"{color_tag}{ts} | [ {level} ] | {scrubbed_msg}"
thread = ''
if threading.current_thread().getName():
thread_name = threading.current_thread().getName()
thread_name = thread_name[:10]
thread_name = thread_name.ljust(10, ' ')
thread = f' [{thread_name}]:'
log_line = log_line + f"{color_tag}{ts} [{level}]{thread} {scrubbed_msg}"
return log_line
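To make the fixed-width layout concrete, here is a sketch of the padding rules used above: the level tag is padded to five characters and the thread name is truncated or padded to ten (the timestamp and message are illustrative):

import threading

level = "info"
level_tag = level if len(level) == 5 else f"{level} "         # pad to 5 chars
thread = threading.current_thread().name[:10].ljust(10, " ")  # fixed 10-char column
print(f"12:34:56.789012 [{level_tag}] [{thread}]: some message")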
# translates an Event to a completely formatted json log line
# you have to specify which message you want. (e.g. e.message(), e.cli_msg(), e.file_msg())
def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
values = event_to_serializable_dict(e, lambda dt: dt.isoformat(), lambda x: msg_fn(x))
def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> Optional[str]:
if type(e) == EmptyLine:
return None # will not be sent to logger
# use the preformatted timestamp string instead of formatting it here, to be extra careful about timezones
values = event_to_serializable_dict(e, lambda _: e.get_ts_rfc3339(), lambda x: msg_fn(x))
raw_log_line = json.dumps(values, sort_keys=True)
return scrub_secrets(raw_log_line, env_secrets())
# calls create_text_log_line() or create_json_log_line() according to logger config
def create_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
return (
create_json_log_line(e, msg_fn)
if this.format_json else
create_text_log_line(e, msg_fn)
)
# calls create_stdout_text_log_line() or create_json_log_line() according to logger config
def create_log_line(
e: T_Event,
msg_fn: Callable[[T_Event], str],
file_output=False
) -> Optional[str]:
if this.format_json:
return create_json_log_line(e, msg_fn) # json output, both console and file
elif file_output is True or flags.DEBUG:
return create_debug_text_log_line(e, msg_fn) # default file output
else:
return create_info_text_log_line(e, msg_fn) # console output
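A compact restatement of the dispatch above, under the assumption that the JSON format wins for both destinations, and the verbose debug text format is used for file output or when --debug is set:

def pick_format(format_json: bool, file_output: bool, debug: bool) -> str:
    # mirrors create_log_line: json beats everything; files and --debug
    # runs get the verbose text format; the console default stays terse
    if format_json:
        return "json"
    if file_output or debug:
        return "debug_text"
    return "info_text"

assert pick_format(True, True, False) == "json"
assert pick_format(False, True, False) == "debug_text"
assert pick_format(False, False, False) == "info_text"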
# allows for reuse of this obnoxious if/else tree.
# do not use for exceptions, it doesn't pass along exc_info, stack_info, or extra
def send_to_logger(l: Union[Logger, logbook.Logger], level_tag: str, log_line: str):
if not log_line:
return
if level_tag == 'test':
# TODO after implementing #3977 send to new test level
l.debug(log_line)
@@ -257,33 +306,46 @@ def send_exc_to_logger(
)
# an alternative to fire_event which only creates and logs the event value
# if the condition is met. Does nothing otherwise.
def fire_event_if(conditional: bool, lazy_e: Callable[[], Event]) -> None:
if conditional:
fire_event(lazy_e())
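The point of taking a thunk rather than an Event is that a potentially expensive event is never constructed when the condition is off. A self-contained sketch, with print standing in for fire_event:

from typing import Callable

def fire_event_if(conditional: bool, lazy_e: Callable[[], str]) -> None:
    # sketch of the function above
    if conditional:
        print(lazy_e())

built = {"count": 0}

def expensive_event() -> str:
    built["count"] += 1
    return "huge graph dump"

fire_event_if(False, expensive_event)
assert built["count"] == 0  # nothing was built; the thunk never ran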
# top-level method for accessing the new eventing system
# this is where all the side effects happen branched by event type
# (i.e. - mutating the event history, printing to stdout, logging
# to files, etc.)
def fire_event(e: Event) -> None:
# skip logs when `--log-cache-events` is not passed
if isinstance(e, Cache) and not flags.LOG_CACHE_EVENTS:
return
# if and only if the event history deque will be completely filled by this event
# fire warning that old events are now being dropped
global EVENT_HISTORY
if len(EVENT_HISTORY) == ((EVENT_HISTORY.maxlen or 100000) - 1):
if len(EVENT_HISTORY) == (flags.EVENT_BUFFER_SIZE - 1):
EVENT_HISTORY.append(e)
fire_event(EventBufferFull())
EVENT_HISTORY.append(e)
else:
EVENT_HISTORY.append(e)
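The guard above fires EventBufferFull once: the warning is emitted on the append that fills the buffer, and the recursive fire_event call lands in the else branch because the deque is already full. A toy version with a buffer size of 5:

from collections import deque

SIZE = 5
history = deque(maxlen=SIZE)

def record(event: str) -> None:
    if len(history) == SIZE - 1:
        history.append(event)
        record("EventBufferFull")  # recursion takes the else branch below
    else:
        history.append(event)

for n in range(10):
    record(f"event-{n}")
print(list(history))  # the last five events; everything older was dropped FIFO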
# backwards compatibility for plugins that require old logger (dbt-rpc)
if flags.ENABLE_LEGACY_LOGGER:
# using Event::message because the legacy logger didn't differentiate messages by
# destination
log_line = create_log_line(e, msg_fn=lambda x: x.message())
send_to_logger(GLOBAL_LOGGER, e.level_tag(), log_line)
if log_line:
send_to_logger(GLOBAL_LOGGER, e.level_tag(), log_line)
return # exit the function to avoid using the current logger as well
# always logs debug level regardless of user input
if isinstance(e, File):
log_line = create_log_line(e, msg_fn=lambda x: x.file_msg())
log_line = create_log_line(e, msg_fn=lambda x: x.file_msg(), file_output=True)
# doesn't send exceptions to exception logger
send_to_logger(FILE_LOG, level_tag=e.level_tag(), log_line=log_line)
if log_line:
send_to_logger(FILE_LOG, level_tag=e.level_tag(), log_line=log_line)
if isinstance(e, Cli):
# explicitly checking the debug flag here so that potentially expensive-to-construct
@@ -292,18 +354,19 @@ def fire_event(e: Event) -> None:
return # eat the message in case it was one of the expensive ones
log_line = create_log_line(e, msg_fn=lambda x: x.cli_msg())
if not isinstance(e, ShowException):
send_to_logger(STDOUT_LOG, level_tag=e.level_tag(), log_line=log_line)
# CliEventABC and ShowException
else:
send_exc_to_logger(
STDOUT_LOG,
level_tag=e.level_tag(),
log_line=log_line,
exc_info=e.exc_info,
stack_info=e.stack_info,
extra=e.extra
)
if log_line:
if not isinstance(e, ShowException):
send_to_logger(STDOUT_LOG, level_tag=e.level_tag(), log_line=log_line)
# CliEventABC and ShowException
else:
send_exc_to_logger(
STDOUT_LOG,
level_tag=e.level_tag(),
log_line=log_line,
exc_info=e.exc_info,
stack_info=e.stack_info,
extra=e.extra
)
def get_invocation_id() -> str:


@@ -1,16 +1,16 @@
import argparse
from dataclasses import dataclass
from dbt.adapters.reference_keys import _make_key, _ReferenceKey
from dbt.events.stubs import (
_CachedRelation,
BaseRelation,
ParsedModelNode,
ParsedHookNode,
_ReferenceKey,
ParsedModelNode,
RunResult
)
from dbt import ui
from dbt.events.base_types import (
Cli, Event, File, DebugLevel, InfoLevel, WarnLevel, ErrorLevel, ShowException, NodeInfo
Cli, Event, File, DebugLevel, InfoLevel, WarnLevel, ErrorLevel, ShowException, NodeInfo, Cache
)
from dbt.events.format import format_fancy_output_line, pluralize
from dbt.node_types import NodeType
@@ -115,14 +115,6 @@ class MainEncounteredError(ErrorLevel, Cli):
def message(self) -> str:
return f"Encountered an error:\n{str(self.e)}"
# overriding default json serialization for this event
def fields_to_json(self, val: Any) -> Any:
# equality on BaseException is not good enough of a comparison here
if isinstance(val, BaseException):
return str(val)
return val
@dataclass
class MainStackTrace(DebugLevel, Cli):
@@ -150,12 +142,9 @@ class MainReportArgs(DebugLevel, Cli, File):
def message(self):
return f"running dbt with arguments {str(self.args)}"
# overriding default json serialization for this event
def fields_to_json(self, val: Any) -> Any:
if isinstance(val, argparse.Namespace):
return str(val)
return val
@classmethod
def asdict(cls, data: list) -> dict:
return dict((k, str(v)) for k, v in data)
@dataclass
@@ -312,6 +301,25 @@ class GitProgressCheckedOutAt(DebugLevel, Cli, File):
return f" Checked out at {self.end_sha}."
@dataclass
class RegistryIndexProgressMakingGETRequest(DebugLevel, Cli, File):
url: str
code: str = "M022"
def message(self) -> str:
return f"Making package index registry request: GET {self.url}"
@dataclass
class RegistryIndexProgressGETResponse(DebugLevel, Cli, File):
url: str
resp_code: int
code: str = "M023"
def message(self) -> str:
return f"Response from registry index: GET {self.url} {self.resp_code}"
@dataclass
class RegistryProgressMakingGETRequest(DebugLevel, Cli, File):
url: str
@@ -331,6 +339,45 @@ class RegistryProgressGETResponse(DebugLevel, Cli, File):
return f"Response from registry: GET {self.url} {self.resp_code}"
@dataclass
class RegistryResponseUnexpectedType(DebugLevel, File):
response: str
code: str = "M024"
def message(self) -> str:
return f"Response was None: {self.response}"
@dataclass
class RegistryResponseMissingTopKeys(DebugLevel, File):
response: str
code: str = "M025"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response missing top level keys: {self.response}"
@dataclass
class RegistryResponseMissingNestedKeys(DebugLevel, File):
response: str
code: str = "M026"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response missing nested keys: {self.response}"
@dataclass
class RegistryResponseExtraNestedKeys(DebugLevel, File):
response: str
code: str = "M027"
def message(self) -> str:
# expected/actual keys logged in exception
return f"Response contained inconsistent keys: {self.response}"
# TODO this was actually `logger.exception(...)` not `logger.error(...)`
@dataclass
class SystemErrorRetrievingModTime(ErrorLevel, Cli, File):
@@ -354,13 +401,6 @@ class SystemCouldNotWrite(DebugLevel, Cli, File):
f"{self.reason}\nexception: {self.exc}"
)
# overriding default json serialization for this event
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class SystemExecutingCmd(DebugLevel, Cli, File):
@@ -397,40 +437,6 @@ class SystemReportReturnCode(DebugLevel, Cli, File):
def message(self) -> str:
return f"command return code={self.returncode}"
# TODO remove?? Not called outside of this file
@dataclass
class SelectorAlertUpto3UnusedNodes(InfoLevel, Cli, File):
node_names: List[str]
code: str = "I_NEED_A_CODE_5"
def message(self) -> str:
summary_nodes_str = ("\n - ").join(self.node_names[:3])
and_more_str = (
f"\n - and {len(self.node_names) - 3} more" if len(self.node_names) > 4 else ""
)
return (
f"\nSome tests were excluded because at least one parent is not selected. "
f"Use the --greedy flag to include them."
f"\n - {summary_nodes_str}{and_more_str}"
)
# TODO remove?? Not called outside of this file
@dataclass
class SelectorAlertAllUnusedNodes(DebugLevel, Cli, File):
node_names: List[str]
code: str = "I_NEED_A_CODE_6"
def message(self) -> str:
debug_nodes_str = ("\n - ").join(self.node_names)
return (
f"Full list of tests that were excluded:"
f"\n - {debug_nodes_str}"
)
@dataclass
class SelectorReportInvalidSelector(InfoLevel, Cli, File):
@@ -542,7 +548,7 @@ class Rollback(DebugLevel, Cli, File):
@dataclass
class CacheMiss(DebugLevel, Cli, File):
conn_name: Any # TODO mypy says this is `Callable[[], str]`?? ¯\_(ツ)_/¯
conn_name: str
database: Optional[str]
schema: str
code: str = "E013"
@@ -558,12 +564,20 @@ class CacheMiss(DebugLevel, Cli, File):
class ListRelations(DebugLevel, Cli, File):
database: Optional[str]
schema: str
relations: List[BaseRelation]
relations: List[_ReferenceKey]
code: str = "E014"
def message(self) -> str:
return f"with database={self.database}, schema={self.schema}, relations={self.relations}"
@classmethod
def asdict(cls, data: list) -> dict:
d = dict()
for k, v in data:
if type(v) == list:
d[k] = [str(x) for x in v]
else:
d[k] = str(v)  # keep non-list fields too, instead of silently dropping them
return d
@dataclass
class ConnectionUsed(DebugLevel, Cli, File):
@@ -587,7 +601,7 @@ class SQLQuery(DebugLevel, Cli, File):
@dataclass
class SQLQueryStatus(DebugLevel, Cli, File):
status: str # could include AdapterResponse if we resolve circular imports
status: str
elapsed: float
code: str = "E017"
@@ -617,7 +631,7 @@ class ColTypeChange(DebugLevel, Cli, File):
@dataclass
class SchemaCreation(DebugLevel, Cli, File):
relation: BaseRelation
relation: _ReferenceKey
code: str = "E020"
def message(self) -> str:
@@ -626,17 +640,21 @@ class SchemaCreation(DebugLevel, Cli, File):
@dataclass
class SchemaDrop(DebugLevel, Cli, File):
relation: BaseRelation
relation: _ReferenceKey
code: str = "E021"
def message(self) -> str:
return f'Dropping schema "{self.relation}".'
@classmethod
def asdict(cls, data: list) -> dict:
return dict((k, str(v)) for k, v in data)
# TODO pretty sure this is only ever called in dead code
# see: core/dbt/adapters/cache.py _add_link vs add_link
@dataclass
class UncachedRelation(DebugLevel, Cli, File):
class UncachedRelation(DebugLevel, Cli, File, Cache):
dep_key: _ReferenceKey
ref_key: _ReferenceKey
code: str = "E022"
@@ -650,7 +668,7 @@ class UncachedRelation(DebugLevel, Cli, File):
@dataclass
class AddLink(DebugLevel, Cli, File):
class AddLink(DebugLevel, Cli, File, Cache):
dep_key: _ReferenceKey
ref_key: _ReferenceKey
code: str = "E023"
@@ -660,23 +678,16 @@ class AddLink(DebugLevel, Cli, File):
@dataclass
class AddRelation(DebugLevel, Cli, File):
relation: _CachedRelation
class AddRelation(DebugLevel, Cli, File, Cache):
relation: _ReferenceKey
code: str = "E024"
def message(self) -> str:
return f"Adding relation: {str(self.relation)}"
# overriding default json serialization for this event
def fields_to_json(self, val: Any) -> Any:
if isinstance(val, _CachedRelation):
return str(val)
return val
@dataclass
class DropMissingRelation(DebugLevel, Cli, File):
class DropMissingRelation(DebugLevel, Cli, File, Cache):
relation: _ReferenceKey
code: str = "E025"
@@ -685,7 +696,7 @@ class DropMissingRelation(DebugLevel, Cli, File):
@dataclass
class DropCascade(DebugLevel, Cli, File):
class DropCascade(DebugLevel, Cli, File, Cache):
dropped: _ReferenceKey
consequences: Set[_ReferenceKey]
code: str = "E026"
@@ -693,9 +704,19 @@ class DropCascade(DebugLevel, Cli, File):
def message(self) -> str:
return f"drop {self.dropped} is cascading to {self.consequences}"
@classmethod
def asdict(cls, data: list) -> dict:
d = dict()
for k, v in data:
if isinstance(v, list):
d[k] = [str(x) for x in v]
else:
d[k] = str(v) # type: ignore
return d
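For context, dataclasses.asdict hands a dict_factory a list of (field_name, value) pairs, which is exactly what these asdict overrides receive. A self-contained sketch of the pattern (the Demo class is illustrative):

import dataclasses
from typing import List

@dataclasses.dataclass
class Demo:
    dropped: str
    consequences: List[int]

    @classmethod
    def asdict(cls, data: list) -> dict:
        # stringify everything, mapping over lists like the override above
        return {k: [str(x) for x in v] if isinstance(v, list) else str(v)
                for k, v in data}

d = Demo(dropped="a", consequences=[1, 2])
print(dataclasses.asdict(d, dict_factory=type(d).asdict))
# {'dropped': 'a', 'consequences': ['1', '2']}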
@dataclass
class DropRelation(DebugLevel, Cli, File):
class DropRelation(DebugLevel, Cli, File, Cache):
dropped: _ReferenceKey
code: str = "E027"
@@ -704,7 +725,7 @@ class DropRelation(DebugLevel, Cli, File):
@dataclass
class UpdateReference(DebugLevel, Cli, File):
class UpdateReference(DebugLevel, Cli, File, Cache):
old_key: _ReferenceKey
new_key: _ReferenceKey
cached_key: _ReferenceKey
@@ -716,7 +737,7 @@ class UpdateReference(DebugLevel, Cli, File):
@dataclass
class TemporaryRelation(DebugLevel, Cli, File):
class TemporaryRelation(DebugLevel, Cli, File, Cache):
key: _ReferenceKey
code: str = "E029"
@@ -725,7 +746,7 @@ class TemporaryRelation(DebugLevel, Cli, File):
@dataclass
class RenameSchema(DebugLevel, Cli, File):
class RenameSchema(DebugLevel, Cli, File, Cache):
old_key: _ReferenceKey
new_key: _ReferenceKey
code: str = "E030"
@@ -735,8 +756,8 @@ class RenameSchema(DebugLevel, Cli, File):
@dataclass
class DumpBeforeAddGraph(DebugLevel, Cli, File):
# large value. delay not necessary since every debug level message is logged anyway.
class DumpBeforeAddGraph(DebugLevel, Cli, File, Cache):
# large value. delay creation with fire_event_if.
dump: Dict[str, List[str]]
code: str = "E031"
@@ -745,8 +766,8 @@ class DumpBeforeAddGraph(DebugLevel, Cli, File):
@dataclass
class DumpAfterAddGraph(DebugLevel, Cli, File):
# large value. delay not necessary since every debug level message is logged anyway.
class DumpAfterAddGraph(DebugLevel, Cli, File, Cache):
# large value. delay creation with fire_event_if.
dump: Dict[str, List[str]]
code: str = "E032"
@@ -755,8 +776,8 @@ class DumpAfterAddGraph(DebugLevel, Cli, File):
@dataclass
class DumpBeforeRenameSchema(DebugLevel, Cli, File):
# large value. delay not necessary since every debug level message is logged anyway.
class DumpBeforeRenameSchema(DebugLevel, Cli, File, Cache):
# large value. delay creation with fire_event_if.
dump: Dict[str, List[str]]
code: str = "E033"
@@ -765,8 +786,8 @@ class DumpBeforeRenameSchema(DebugLevel, Cli, File):
@dataclass
class DumpAfterRenameSchema(DebugLevel, Cli, File):
# large value. delay not necessary since every debug level message is logged anyway.
class DumpAfterRenameSchema(DebugLevel, Cli, File, Cache):
# large value. delay creation with fire_event_if.
dump: Dict[str, List[str]]
code: str = "E034"
@@ -782,11 +803,9 @@ class AdapterImportError(InfoLevel, Cli, File):
def message(self) -> str:
return f"Error importing adapter: {self.exc}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val())
return val
@classmethod
def asdict(cls, data: list) -> dict:
return dict((k, str(v)) for k, v in data)
@dataclass
@@ -834,114 +853,12 @@ class MissingProfileTarget(InfoLevel, Cli, File):
return f"target not specified in profile '{self.profile_name}', using '{self.target_name}'"
@dataclass
class ProfileLoadError(ShowException, DebugLevel, Cli, File):
exc: Exception
code: str = "A006"
def message(self) -> str:
return f"Profile not loaded due to error: {self.exc}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class ProfileNotFound(InfoLevel, Cli, File):
profile_name: Optional[str]
code: str = "A007"
def message(self) -> str:
return f'No profile "{self.profile_name}" found, continuing with no target'
@dataclass
class InvalidVarsYAML(ErrorLevel, Cli, File):
code: str = "A008"
def message(self) -> str:
return "The YAML provided in the --vars argument is not valid.\n"
# TODO: Remove? (appears to be uncalled)
@dataclass
class CatchRunException(ShowException, DebugLevel, Cli, File):
build_path: Any
exc: Exception
code: str = "I_NEED_A_CODE_1"
def message(self) -> str:
INTERNAL_ERROR_STRING = """This is an error in dbt. Please try again. If the \
error persists, open an issue at https://github.com/dbt-labs/dbt-core
""".strip()
prefix = f'Internal error executing {self.build_path}'
error = "{prefix}\n{error}\n\n{note}".format(
prefix=ui.red(prefix),
error=str(self.exc).strip(),
note=INTERNAL_ERROR_STRING
)
return error
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
# TODO: Remove? (appears to be uncalled)
@dataclass
class HandleInternalException(ShowException, DebugLevel, Cli, File):
exc: Exception
code: str = "I_NEED_A_CODE_2"
def message(self) -> str:
return str(self.exc)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
# TODO: Remove? (appears to be uncalled)
@dataclass
class MessageHandleGenericException(ErrorLevel, Cli, File):
build_path: str
unique_id: str
exc: Exception
code: str = "I_NEED_A_CODE_3"
def message(self) -> str:
node_description = self.build_path
if node_description is None:
node_description = self.unique_id
prefix = "Unhandled error while executing {}".format(node_description)
return "{prefix}\n{error}".format(
prefix=ui.red(prefix),
error=str(self.exc).strip()
)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
# TODO: Remove? (appears to be uncalled)
@dataclass
class DetailsHandleGenericException(ShowException, DebugLevel, Cli, File):
code: str = "I_NEED_A_CODE_4"
def message(self) -> str:
return ''
return "The YAML provided in the --vars argument is not valid."
@dataclass
@@ -1110,12 +1027,6 @@ class ParsedFileLoadFailed(ShowException, DebugLevel, Cli, File):
def message(self) -> str:
return f"Failed to load parsed file from disk at {self.path}: {self.exc}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class PartialParseSaveFileNotFound(InfoLevel, Cli, File):
@@ -1313,12 +1224,12 @@ class InvalidDisabledSourceInTestNode(WarnLevel, Cli, File):
@dataclass
class InvalidRefInTestNode(WarnLevel, Cli, File):
class InvalidRefInTestNode(DebugLevel, Cli, File):
msg: str
code: str = "I051"
def message(self) -> str:
return ui.warning_tag(self.msg)
return self.msg
@dataclass
@@ -1329,12 +1240,6 @@ class RunningOperationCaughtError(ErrorLevel, Cli, File):
def message(self) -> str:
return f'Encountered an error while running operation: {self.exc}'
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class RunningOperationUncaughtError(ErrorLevel, Cli, File):
@@ -1344,12 +1249,6 @@ class RunningOperationUncaughtError(ErrorLevel, Cli, File):
def message(self) -> str:
return f'Encountered an error while running operation: {self.exc}'
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class DbtProjectError(ErrorLevel, Cli, File):
@@ -1367,12 +1266,6 @@ class DbtProjectErrorException(ErrorLevel, Cli, File):
def message(self) -> str:
return f" ERROR: {str(self.exc)}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class DbtProfileError(ErrorLevel, Cli, File):
@@ -1390,12 +1283,6 @@ class DbtProfileErrorException(ErrorLevel, Cli, File):
def message(self) -> str:
return f" ERROR: {str(self.exc)}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class ProfileListTitle(InfoLevel, Cli, File):
@@ -1443,12 +1330,6 @@ class CatchableExceptionOnRun(ShowException, DebugLevel, Cli, File):
def message(self) -> str:
return str(self.exc)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class InternalExceptionOnRun(DebugLevel, Cli, File):
@@ -1469,12 +1350,6 @@ the error persists, open an issue at https://github.com/dbt-labs/dbt-core
note=INTERNAL_ERROR_STRING
)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
# This prints the stack trace at the debug level while allowing just the nice exception message
# at the error level - or whatever other level chosen. Used in multiple places.
@@ -1488,9 +1363,9 @@ class PrintDebugStackTrace(ShowException, DebugLevel, Cli, File):
@dataclass
class GenericExceptionOnRun(ErrorLevel, Cli, File):
build_path: str
build_path: Optional[str]
unique_id: str
exc: Exception
exc: str # TODO: make this the actual exception once we have a better serialization strategy
code: str = "W004"
def message(self) -> str:
@@ -1503,12 +1378,6 @@ class GenericExceptionOnRun(ErrorLevel, Cli, File):
error=str(self.exc).strip()
)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class NodeConnectionReleaseError(ShowException, DebugLevel, Cli, File):
@@ -1520,12 +1389,6 @@ class NodeConnectionReleaseError(ShowException, DebugLevel, Cli, File):
return ('Error releasing connection for node {}: {!s}'
.format(self.node_name, self.exc))
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class CheckCleanPath(InfoLevel, Cli):
@@ -1591,11 +1454,11 @@ class DepsNoPackagesFound(InfoLevel, Cli, File):
@dataclass
class DepsStartPackageInstall(InfoLevel, Cli, File):
package: str
package_name: str
code: str = "M014"
def message(self) -> str:
return f"Installing {self.package}"
return f"Installing {self.package_name}"
@dataclass
@@ -1639,7 +1502,7 @@ class DepsNotifyUpdatesAvailable(InfoLevel, Cli, File):
code: str = "M019"
def message(self) -> str:
return ('\nUpdates available for packages: {} \
return ('Updates available for packages: {} \
\nUpdate your versions in packages.yml, then run dbt deps'.format(self.packages))
@@ -1756,7 +1619,7 @@ class ServingDocsExitInfo(InfoLevel, Cli, File):
code: str = "Z020"
def message(self) -> str:
return "Press Ctrl+C to exit.\n\n"
return "Press Ctrl+C to exit."
@dataclass
@@ -1807,7 +1670,7 @@ class StatsLine(InfoLevel, Cli, File):
code: str = "Z023"
def message(self) -> str:
stats_line = ("\nDone. PASS={pass} WARN={warn} ERROR={error} SKIP={skip} TOTAL={total}")
stats_line = ("Done. PASS={pass} WARN={warn} ERROR={error} SKIP={skip} TOTAL={total}")
return stats_line.format(**self.stats)
@@ -1846,12 +1709,6 @@ class SQlRunnerException(ShowException, DebugLevel, Cli, File):
def message(self) -> str:
return f"Got an exception: {self.exc}"
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class CheckNodeTestFailure(InfoLevel, Cli, File):
@@ -1910,7 +1767,7 @@ class PrintStartLine(InfoLevel, Cli, File, NodeInfo):
index: int
total: int
report_node_data: ParsedModelNode
code: str = "Z031"
code: str = "Q033"
def message(self) -> str:
msg = f"START {self.description}"
@@ -1928,8 +1785,8 @@ class PrintHookStartLine(InfoLevel, Cli, File, NodeInfo):
index: int
total: int
truncate: bool
report_node_data: Any # TODO use ParsedHookNode here
code: str = "Z032"
report_node_data: Any # TODO: resolve ParsedHookNode circular import
code: str = "Q032"
def message(self) -> str:
msg = f"START hook: {self.statement}"
@@ -1948,7 +1805,7 @@ class PrintHookEndLine(InfoLevel, Cli, File, NodeInfo):
total: int
execution_time: int
truncate: bool
report_node_data: Any # TODO use ParsedHookNode here
report_node_data: Any # TODO: resolve ParsedHookNode circular import
code: str = "Q007"
def message(self) -> str:
@@ -1969,7 +1826,7 @@ class SkippingDetails(InfoLevel, Cli, File, NodeInfo):
index: int
total: int
report_node_data: ParsedModelNode
code: str = "Z033"
code: str = "Q034"
def message(self) -> str:
if self.resource_type in NodeType.refable():
@@ -2084,7 +1941,7 @@ class PrintModelErrorResultLine(ErrorLevel, Cli, File, NodeInfo):
total: int
execution_time: int
report_node_data: ParsedModelNode
code: str = "Z035"
code: str = "Q035"
def message(self) -> str:
info = "ERROR creating"
@@ -2322,6 +2179,10 @@ class NodeFinished(DebugLevel, Cli, File, NodeInfo):
def message(self) -> str:
return f"Finished running node {self.unique_id}"
@classmethod
def asdict(cls, data: list) -> dict:
return dict((k, str(v)) for k, v in data)
@dataclass
class QueryCancelationUnsupported(InfoLevel, Cli, File):
@@ -2337,11 +2198,12 @@ class QueryCancelationUnsupported(InfoLevel, Cli, File):
@dataclass
class ConcurrencyLine(InfoLevel, Cli, File):
concurrency_line: str
num_threads: int
target_name: str
code: str = "Q026"
def message(self) -> str:
return self.concurrency_line
return f"Concurrency: {self.num_threads} threads (target='{self.target_name}')"
@dataclass
@@ -2577,7 +2439,7 @@ class TrackingInitializeFailure(ShowException, DebugLevel, Cli, File):
class RetryExternalCall(DebugLevel, Cli, File):
attempt: int
max: int
code: str = "Z045"
code: str = "M020"
def message(self) -> str:
return f"Retrying external call. Attempt: {self.attempt} Max attempts: {self.max}"
@@ -2606,12 +2468,6 @@ class GeneralWarningException(WarnLevel, Cli, File):
return self.log_fmt.format(str(self.exc))
return str(self.exc)
def fields_to_json(self, val: Any) -> Any:
if val == self.exc:
return str(val)
return val
@dataclass
class EventBufferFull(WarnLevel, Cli, File):
@@ -2621,6 +2477,15 @@ class EventBufferFull(WarnLevel, Cli, File):
return "Internal event buffer full. Earliest events will be dropped (FIFO)."
@dataclass
class RecordRetryException(DebugLevel, Cli, File):
exc: Exception
code: str = "M021"
def message(self) -> str:
return f"External call exception: {self.exc}"
# since mypy doesn't run on every file we need to suggest to mypy that every
# class gets instantiated. But we don't actually want to run this code.
# making the conditional `if False` causes mypy to skip it as dead code so
@@ -2650,6 +2515,14 @@ if 1 == 0:
GitNothingToDo(sha="")
GitProgressUpdatedCheckoutRange(start_sha="", end_sha="")
GitProgressCheckedOutAt(end_sha="")
RegistryIndexProgressMakingGETRequest(url="")
RegistryIndexProgressGETResponse(url="", resp_code=1234)
RegistryProgressMakingGETRequest(url="")
RegistryProgressGETResponse(url="", resp_code=1234)
RegistryResponseUnexpectedType(response="")
RegistryResponseMissingTopKeys(response="")
RegistryResponseMissingNestedKeys(response="")
RegistryResponseExtraNestedKeys(response="")
SystemErrorRetrievingModTime(path="")
SystemCouldNotWrite(path="", reason="", exc=Exception(""))
SystemExecutingCmd(cmd=[""])
@@ -2675,8 +2548,8 @@ if 1 == 0:
SQLQueryStatus(status="", elapsed=0.1)
SQLCommit(conn_name="")
ColTypeChange(orig_type="", new_type="", table="")
SchemaCreation(relation=BaseRelation())
SchemaDrop(relation=BaseRelation())
SchemaCreation(relation=_make_key(BaseRelation()))
SchemaDrop(relation=_make_key(BaseRelation()))
UncachedRelation(
dep_key=_ReferenceKey(database="", schema="", identifier=""),
ref_key=_ReferenceKey(database="", schema="", identifier=""),
@@ -2685,7 +2558,7 @@ if 1 == 0:
dep_key=_ReferenceKey(database="", schema="", identifier=""),
ref_key=_ReferenceKey(database="", schema="", identifier=""),
)
AddRelation(relation=_CachedRelation())
AddRelation(relation=_make_key(_CachedRelation()))
DropMissingRelation(relation=_ReferenceKey(database="", schema="", identifier=""))
DropCascade(
dropped=_ReferenceKey(database="", schema="", identifier=""),
@@ -2708,14 +2581,10 @@ if 1 == 0:
AdapterImportError(ModuleNotFoundError())
PluginLoadError()
SystemReportReturnCode(returncode=0)
SelectorAlertUpto3UnusedNodes(node_names=[])
SelectorAlertAllUnusedNodes(node_names=[])
NewConnectionOpening(connection_state='')
TimingInfoCollected()
MergedFromState(nbr_merged=0, sample=[])
MissingProfileTarget(profile_name='', target_name='')
ProfileLoadError(exc=Exception(''))
ProfileNotFound(profile_name='')
InvalidVarsYAML()
GenericTestFileParse(path='')
MacroFileParse(path='')
@@ -2755,8 +2624,6 @@ if 1 == 0:
PartialParsingDeletedExposure(unique_id='')
InvalidDisabledSourceInTestNode(msg='')
InvalidRefInTestNode(msg='')
MessageHandleGenericException(build_path='', unique_id='', exc=Exception(''))
DetailsHandleGenericException()
RunningOperationCaughtError(exc=Exception(''))
RunningOperationUncaughtError(exc=Exception(''))
DbtProjectError()
@@ -2769,7 +2636,7 @@ if 1 == 0:
ProfileHelpMessage()
CatchableExceptionOnRun(exc=Exception(''))
InternalExceptionOnRun(build_path='', exc=Exception(''))
GenericExceptionOnRun(build_path='', unique_id='', exc=Exception(''))
GenericExceptionOnRun(build_path='', unique_id='', exc='')
NodeConnectionReleaseError(node_name='', exc=Exception(''))
CheckCleanPath(path='')
ConfirmCleanPath(path='')
@@ -2777,7 +2644,7 @@ if 1 == 0:
FinishedCleanPaths()
OpenCommand(open_cmd='', profiles_dir='')
DepsNoPackagesFound()
DepsStartPackageInstall(package='')
DepsStartPackageInstall(package_name='')
DepsInstallInfo(version_name='')
DepsUpdateAvailable(version_latest='')
DepsListSubdirectory(subdirectory='')
@@ -2952,7 +2819,7 @@ if 1 == 0:
NodeStart(report_node_data=ParsedModelNode(), unique_id='')
NodeFinished(report_node_data=ParsedModelNode(), unique_id='', run_result=RunResult())
QueryCancelationUnsupported(type='')
ConcurrencyLine(concurrency_line='')
ConcurrencyLine(num_threads=0, target_name='')
NodeCompiling(report_node_data=ParsedModelNode(), unique_id='')
NodeExecuting(report_node_data=ParsedModelNode(), unique_id='')
StarterProjectPath(dir='')
@@ -2982,3 +2849,4 @@ if 1 == 0:
GeneralWarningMsg(msg='', log_fmt='')
GeneralWarningException(exc=Exception(''), log_fmt='')
EventBufferFull()
RecordRetryException(exc=Exception(""))

File diff suppressed because it is too large


@@ -33,6 +33,8 @@ SEND_ANONYMOUS_USAGE_STATS = None
PRINTER_WIDTH = 80
WHICH = None
INDIRECT_SELECTION = None
LOG_CACHE_EVENTS = None
EVENT_BUFFER_SIZE = 100000
# Global CLI defaults. These flags are set from three places:
# CLI args, environment variables, and user_config (profiles.yml).
@@ -51,7 +53,9 @@ flag_defaults = {
"FAIL_FAST": False,
"SEND_ANONYMOUS_USAGE_STATS": True,
"PRINTER_WIDTH": 80,
"INDIRECT_SELECTION": 'eager'
"INDIRECT_SELECTION": 'eager',
"LOG_CACHE_EVENTS": False,
"EVENT_BUFFER_SIZE": 100000
}
@@ -99,7 +103,7 @@ def set_from_args(args, user_config):
USE_EXPERIMENTAL_PARSER, STATIC_PARSER, WRITE_JSON, PARTIAL_PARSE, \
USE_COLORS, STORE_FAILURES, PROFILES_DIR, DEBUG, LOG_FORMAT, INDIRECT_SELECTION, \
VERSION_CHECK, FAIL_FAST, SEND_ANONYMOUS_USAGE_STATS, PRINTER_WIDTH, \
WHICH
WHICH, LOG_CACHE_EVENTS, EVENT_BUFFER_SIZE
STRICT_MODE = False # backwards compatibility
# cli args without user_config or env var option
@@ -122,6 +126,8 @@ def set_from_args(args, user_config):
SEND_ANONYMOUS_USAGE_STATS = get_flag_value('SEND_ANONYMOUS_USAGE_STATS', args, user_config)
PRINTER_WIDTH = get_flag_value('PRINTER_WIDTH', args, user_config)
INDIRECT_SELECTION = get_flag_value('INDIRECT_SELECTION', args, user_config)
LOG_CACHE_EVENTS = get_flag_value('LOG_CACHE_EVENTS', args, user_config)
EVENT_BUFFER_SIZE = get_flag_value('EVENT_BUFFER_SIZE', args, user_config)
def get_flag_value(flag, args, user_config):
@@ -134,7 +140,13 @@ def get_flag_value(flag, args, user_config):
if env_value is not None and env_value != '':
env_value = env_value.lower()
# non Boolean values
if flag in ['LOG_FORMAT', 'PRINTER_WIDTH', 'PROFILES_DIR', 'INDIRECT_SELECTION']:
if flag in [
'LOG_FORMAT',
'PRINTER_WIDTH',
'PROFILES_DIR',
'INDIRECT_SELECTION',
'EVENT_BUFFER_SIZE'
]:
flag_value = env_value
else:
flag_value = env_set_bool(env_value)
@@ -142,7 +154,7 @@ def get_flag_value(flag, args, user_config):
flag_value = getattr(user_config, lc_flag)
else:
flag_value = flag_defaults[flag]
if flag == 'PRINTER_WIDTH': # printer_width must be an int or it hangs
if flag in ['PRINTER_WIDTH', 'EVENT_BUFFER_SIZE']: # must be ints
flag_value = int(flag_value)
if flag == 'PROFILES_DIR':
flag_value = os.path.abspath(flag_value)
@@ -165,5 +177,7 @@ def get_flag_dict():
"fail_fast": FAIL_FAST,
"send_anonymous_usage_stats": SEND_ANONYMOUS_USAGE_STATS,
"printer_width": PRINTER_WIDTH,
"indirect_selection": INDIRECT_SELECTION
"indirect_selection": INDIRECT_SELECTION,
"log_cache_events": LOG_CACHE_EVENTS,
"event_buffer_size": EVENT_BUFFER_SIZE
}
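A minimal sketch of the precedence implemented by get_flag_value, assuming the usual DBT_-prefixed environment variables; the helper names here are illustrative, not the exact implementation:

import os

def resolve_flag(flag: str, cli_value, user_config: dict, defaults: dict):
    # precedence: CLI arg, then DBT_<FLAG> env var, then user_config, then default
    if cli_value is not None:
        value = cli_value
    elif os.getenv(f"DBT_{flag}"):
        value = os.getenv(f"DBT_{flag}")
    elif flag.lower() in user_config:
        value = user_config[flag.lower()]
    else:
        value = defaults[flag]
    if flag in ("PRINTER_WIDTH", "EVENT_BUFFER_SIZE"):
        value = int(value)  # must be ints, or downstream code breaks
    return value

print(resolve_flag("EVENT_BUFFER_SIZE", None, {}, {"EVENT_BUFFER_SIZE": 100000}))  # 100000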


@@ -1,7 +1,7 @@
import abc
from itertools import chain
from pathlib import Path
from typing import Set, List, Dict, Iterator, Tuple, Any, Union, Type, Optional
from typing import Set, List, Dict, Iterator, Tuple, Any, Union, Type, Optional, Callable
from dbt.dataclass_schema import StrEnum
@@ -449,20 +449,24 @@ class StateSelectorMethod(SelectorMethod):
return modified
def recursively_check_macros_modified(self, node, previous_macros):
def recursively_check_macros_modified(self, node, visited_macros):
# loop through all macros that this node depends on
for macro_uid in node.depends_on.macros:
# avoid infinite recursion if we've already seen this macro
if macro_uid in previous_macros:
if macro_uid in visited_macros:
continue
previous_macros.append(macro_uid)
visited_macros.append(macro_uid)
# is this macro one of the modified macros?
if macro_uid in self.modified_macros:
return True
# if not, and this macro depends on other macros, keep looping
macro_node = self.manifest.macros[macro_uid]
if len(macro_node.depends_on.macros) > 0:
return self.recursively_check_macros_modified(macro_node, previous_macros)
return self.recursively_check_macros_modified(macro_node, visited_macros)
# this macro hasn't been modified, but we haven't checked
# the other macros the node depends on, so keep looking
elif len(node.depends_on.macros) > len(visited_macros):
continue
else:
return False
@@ -475,45 +479,31 @@ class StateSelectorMethod(SelectorMethod):
return False
# recursively loop through upstream macros to see if any is modified
else:
previous_macros = []
return self.recursively_check_macros_modified(node, previous_macros)
visited_macros = []
return self.recursively_check_macros_modified(node, visited_macros)
def check_modified(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
# TODO: check_modified_content and check_modified_macros seem a bit redundant
def check_modified_content(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
different_contents = not new.same_contents(old) # type: ignore
upstream_macro_change = self.check_macros_modified(new)
return different_contents or upstream_macro_change
def check_modified_body(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
if hasattr(new, "same_body"):
return not new.same_body(old) # type: ignore
else:
return False
def check_modified_configs(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
if hasattr(new, "same_config"):
return not new.same_config(old) # type: ignore
else:
return False
def check_modified_persisted_descriptions(
self, old: Optional[SelectorTarget], new: SelectorTarget
) -> bool:
if hasattr(new, "same_persisted_description"):
return not new.same_persisted_description(old) # type: ignore
else:
return False
def check_modified_relation(
self, old: Optional[SelectorTarget], new: SelectorTarget
) -> bool:
if hasattr(new, "same_database_representation"):
return not new.same_database_representation(old) # type: ignore
else:
return False
def check_modified_macros(self, _, new: SelectorTarget) -> bool:
return self.check_macros_modified(new)
@staticmethod
def check_modified_factory(
compare_method: str
) -> Callable[[Optional[SelectorTarget], SelectorTarget], bool]:
# get a function that compares two selector targets based on the compare method provided
def check_modified_things(old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
if hasattr(new, compare_method):
# when old body does not exist or old and new are not the same
return not old or not getattr(new, compare_method)(old) # type: ignore
else:
return False
return check_modified_things
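A self-contained sketch of the factory above: one closure per same_* comparison method, treating a missing old version as modified (the Node class here is a stand-in, not dbt's):

from typing import Callable, Optional

class Node:
    def __init__(self, body: str):
        self.body = body

    def same_body(self, other: "Node") -> bool:
        return self.body == other.body

def check_modified_factory(method: str) -> Callable[[Optional[Node], Node], bool]:
    def check(old: Optional[Node], new: Node) -> bool:
        if hasattr(new, method):
            # modified when there is no old version, or the comparison fails
            return not old or not getattr(new, method)(old)
        return False
    return check

check_body = check_modified_factory("same_body")
print(check_body(Node("select 1"), Node("select 2")))  # True: body changed
print(check_body(None, Node("select 1")))              # True: no old version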
def check_new(self, old: Optional[SelectorTarget], new: SelectorTarget) -> bool:
return old is None
@@ -527,14 +517,21 @@ class StateSelectorMethod(SelectorMethod):
state_checks = {
# it's new if there is no old version
'new': lambda old, _: old is None,
'new':
lambda old, _: old is None,
# use methods defined above to compare properties of old + new
'modified': self.check_modified,
'modified.body': self.check_modified_body,
'modified.configs': self.check_modified_configs,
'modified.persisted_descriptions': self.check_modified_persisted_descriptions,
'modified.relation': self.check_modified_relation,
'modified.macros': self.check_modified_macros,
'modified':
self.check_modified_content,
'modified.body':
self.check_modified_factory('same_body'),
'modified.configs':
self.check_modified_factory('same_config'),
'modified.persisted_descriptions':
self.check_modified_factory('same_persisted_description'),
'modified.relation':
self.check_modified_factory('same_database_representation'),
'modified.macros':
self.check_modified_macros,
}
if selector in state_checks:
checker = state_checks[selector]


@@ -93,3 +93,10 @@ dbtClassMixin.register_field_encoders({
FQNPath = Tuple[str, ...]
PathSet = AbstractSet[FQNPath]
# This class is used in to_target_dict, so that accesses to missing keys
# will return an empty string instead of Undefined
class DictDefaultEmptyStr(dict):
def __getitem__(self, key):
return dict.get(self, key, "")
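Usage sketch: lookups behave like a normal dict for present keys and fall back to an empty string instead of raising KeyError (the keys here are illustrative):

class DictDefaultEmptyStr(dict):
    def __getitem__(self, key):
        return dict.get(self, key, "")

d = DictDefaultEmptyStr({"name": "orders"})
print(d["name"])    # 'orders'
print(d["schema"])  # '' rather than a KeyError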


@@ -35,7 +35,7 @@ Note that you can also right-click on models to interactively filter and explore
### More information
- [What is dbt](https://docs.getdbt.com/docs/overview)?
- [What is dbt](https://docs.getdbt.com/docs/introduction)?
- Read the [dbt viewpoint](https://docs.getdbt.com/docs/viewpoint)
- [Installation](https://docs.getdbt.com/docs/installation)
- Join the [dbt Community](https://www.getdbt.com/community/) for questions and discussion

File diff suppressed because one or more lines are too long


@@ -424,7 +424,7 @@ class DelayedFileHandler(logbook.RotatingFileHandler, FormatterMixin):
return
make_log_dir_if_missing(log_dir)
log_path = os.path.join(log_dir, 'dbt.log.old') # TODO hack for now
log_path = os.path.join(log_dir, 'dbt.log.legacy') # TODO hack for now
self._super_init(log_path)
self._replay_buffered()
self._log_path = log_path


@@ -221,24 +221,22 @@ def track_run(task):
def run_from_args(parsed):
log_cache_events(getattr(parsed, 'log_cache_events', False))
# we can now use the logger for stdout
# set log_format in the logger
# if 'list' task: set stdout to WARN instead of INFO
level_override = parsed.cls.pre_init_hook(parsed)
fire_event(MainReportVersion(v=str(dbt.version.installed)))
# this will convert DbtConfigErrors into RuntimeExceptions
# task could be any one of the task objects
task = parsed.cls.from_args(args=parsed)
fire_event(MainReportArgs(args=parsed))
# Set up logging
log_path = None
if task.config is not None:
log_path = getattr(task.config, 'log_path', None)
# we can finally set the file logger up
log_manager.set_path(log_path)
# if 'list' task: set stdout to WARN instead of INFO
level_override = parsed.cls.pre_init_hook(parsed)
setup_event_logger(log_path or 'logs', level_override)
fire_event(MainReportVersion(v=str(dbt.version.installed)))
fire_event(MainReportArgs(args=parsed))
if dbt.tracking.active_user is not None: # mypy appeasement, always true
fire_event(MainTrackingUserState(dbt.tracking.active_user.state()))
@@ -1078,6 +1076,14 @@ def parse_args(args, cls=DBTArgumentParser):
'''
)
p.add_argument(
'--event-buffer-size',
dest='event_buffer_size',
help='''
Sets the max number of events to buffer in EVENT_HISTORY
'''
)
subs = p.add_subparsers(title="Available sub-commands")
base_subparser = _build_base_subparser()


@@ -246,7 +246,7 @@ class ManifestLoader:
project_parser_files = self.partial_parser.get_parsing_files()
self.partially_parsing = True
self.manifest = self.saved_manifest
except Exception:
except Exception as exc:
# pp_files should still be the full set and manifest is new manifest,
# since get_parsing_files failed
fire_event(PartialParsingFullReparseBecauseOfError())
@@ -284,6 +284,9 @@ class ManifestLoader:
exc_info['full_reparse_reason'] = ReparseReason.exception
dbt.tracking.track_partial_parser(exc_info)
if os.environ.get('DBT_PP_TEST'):
raise exc
if self.manifest._parsing_info is None:
self.manifest._parsing_info = ParsingInfo()


@@ -272,10 +272,10 @@ class PartialParsing:
if self.already_scheduled_for_parsing(old_source_file):
return
# These files only have one node.
unique_id = None
# These files only have one node except for snapshots
unique_ids = []
if old_source_file.nodes:
unique_id = old_source_file.nodes[0]
unique_ids = old_source_file.nodes
else:
# It's not clear when this would actually happen.
# Logging in case there are other associated errors.
@@ -286,7 +286,7 @@ class PartialParsing:
self.deleted_manifest.files[file_id] = old_source_file
self.saved_files[file_id] = deepcopy(new_source_file)
self.add_to_pp_files(new_source_file)
if unique_id:
for unique_id in unique_ids:
self.remove_node_in_saved(new_source_file, unique_id)
def remove_node_in_saved(self, source_file, unique_id):
@@ -315,7 +315,7 @@ class PartialParsing:
if node.patch_path:
file_id = node.patch_path
# it might be changed... then what?
if file_id not in self.file_diff['deleted']:
if file_id not in self.file_diff['deleted'] and file_id in self.saved_files:
# schema_files should already be updated
schema_file = self.saved_files[file_id]
dict_key = parse_file_type_to_key[source_file.parse_file_type]
@@ -358,7 +358,7 @@ class PartialParsing:
if not source_file.nodes:
fire_event(PartialParsingMissingNodes(file_id=source_file.file_id))
return
# There is generally only 1 node for SQL files, except for macros
# There is generally only 1 node for SQL files, except for macros and snapshots
for unique_id in source_file.nodes:
self.remove_node_in_saved(source_file, unique_id)
self.schedule_referencing_nodes_for_parsing(unique_id)
@@ -375,7 +375,7 @@ class PartialParsing:
for unique_id in unique_ids:
if unique_id in self.saved_manifest.nodes:
node = self.saved_manifest.nodes[unique_id]
if node.resource_type == NodeType.Test:
if node.resource_type == NodeType.Test and node.test_node_type == 'generic':
# test nodes are handled separately. Must be removed from schema file
continue
file_id = node.file_id
@@ -435,7 +435,9 @@ class PartialParsing:
self.check_for_special_deleted_macros(source_file)
self.handle_macro_file_links(source_file, follow_references)
file_id = source_file.file_id
self.deleted_manifest.files[file_id] = self.saved_files.pop(file_id)
# It's not clear when this file_id would not exist in saved_files
if file_id in self.saved_files:
self.deleted_manifest.files[file_id] = self.saved_files.pop(file_id)
def check_for_special_deleted_macros(self, source_file):
for unique_id in source_file.macros:
@@ -498,7 +500,9 @@ class PartialParsing:
for unique_id in unique_ids:
if unique_id in self.saved_manifest.nodes:
node = self.saved_manifest.nodes[unique_id]
if node.resource_type == NodeType.Test:
# Both generic tests from yaml files and singular tests have NodeType.Test
# so check for generic test.
if node.resource_type == NodeType.Test and node.test_node_type == 'generic':
schema_file_id = node.file_id
schema_file = self.saved_manifest.files[schema_file_id]
(key, name) = schema_file.get_key_and_name_for_test(node.unique_id)
@@ -670,8 +674,8 @@ class PartialParsing:
continue
elem = self.get_schema_element(new_yaml_dict[dict_key], name)
if elem:
self.delete_schema_macro_patch(schema_file, macro)
self.merge_patch(schema_file, dict_key, macro)
self.delete_schema_macro_patch(schema_file, elem)
self.merge_patch(schema_file, dict_key, elem)
# exposures
dict_key = 'exposures'


@@ -960,10 +960,9 @@ class MacroPatchParser(NonSourceParser[UnparsedMacroUpdate, ParsedMacroPatch]):
unique_id = f'macro.{patch.package_name}.{patch.name}'
macro = self.manifest.macros.get(unique_id)
if not macro:
warn_or_error(
f'WARNING: Found patch for macro "{patch.name}" '
f'which was not found'
)
msg = f'Found patch for macro "{patch.name}" ' \
f'which was not found'
warn_or_error(msg, log_fmt=warning_tag('{}'))
return
if macro.patch_path:
package_name, existing_file_path = macro.patch_path.split('://')


@@ -63,8 +63,14 @@ class SnapshotParser(
def transform(self, node: IntermediateSnapshotNode) -> ParsedSnapshotNode:
try:
# The config_call_dict is not serialized, because normally it is not
# needed after parsing. But since the snapshot node does this extra
# to_dict/from_dict round trip, save and restore it to keep the model
# config when there is also a schema config.
config_call_dict = node.config_call_dict
dct = node.to_dict(omit_none=True)
parsed_node = ParsedSnapshotNode.from_dict(dct)
parsed_node.config_call_dict = config_call_dict
self.set_snapshot_attributes(parsed_node)
return parsed_node
except ValidationError as exc:


@@ -334,7 +334,7 @@ class BaseRunner(metaclass=ABCMeta):
GenericExceptionOnRun(
build_path=self.node.build_path,
unique_id=self.node.unique_id,
exc=e
exc=str(e) # TODO: unstring this when serialization is fixed
)
)
fire_event(PrintDebugStackTrace())


@@ -38,7 +38,7 @@ class CleanTask(BaseTask):
"""
move_to_nearest_project_dir(self.args)
if ('dbt_modules' in self.config.clean_targets and
self.config.packages_install_path != 'dbt_modules'):
self.config.packages_install_path not in self.config.clean_targets):
deprecations.warn('install-packages-path')
for path in self.config.clean_targets:
fire_event(CheckCleanPath(path=path))


@@ -10,7 +10,7 @@ from dbt.deps.resolver import resolve_packages
from dbt.events.functions import fire_event
from dbt.events.types import (
DepsNoPackagesFound, DepsStartPackageInstall, DepsUpdateAvailable, DepsUTD,
DepsInstallInfo, DepsListSubdirectory, DepsNotifyUpdatesAvailable
DepsInstallInfo, DepsListSubdirectory, DepsNotifyUpdatesAvailable, EmptyLine
)
from dbt.clients import system
@@ -63,7 +63,7 @@ class DepsTask(BaseTask):
source_type = package.source_type()
version = package.get_version()
fire_event(DepsStartPackageInstall(package=package))
fire_event(DepsStartPackageInstall(package_name=package_name))
package.install(self.config, renderer)
fire_event(DepsInstallInfo(version_name=package.nice_version_name()))
if source_type == 'hub':
@@ -81,6 +81,7 @@ class DepsTask(BaseTask):
source_type=source_type,
version=version)
if packages_to_upgrade:
fire_event(EmptyLine())
fire_event(DepsNotifyUpdatesAvailable(packages=packages_to_upgrade))
@classmethod


@@ -14,6 +14,8 @@ from dbt import flags
from dbt.version import _get_adapter_plugin_names
from dbt.adapters.factory import load_plugin, get_include_paths
from dbt.contracts.project import Name as ProjectName
from dbt.events.functions import fire_event
from dbt.events.types import (
StarterProjectPath, ConfigFolderDirectory, NoSampleProfileFound, ProfileWrittenWithSample,
@@ -48,7 +50,8 @@ Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:
Happy modeling!
"""
# https://click.palletsprojects.com/en/8.0.x/api/?highlight=float#types
# https://click.palletsprojects.com/en/8.0.x/api/#types
# click v7.0 has UNPROCESSED, STRING, INT, FLOAT, BOOL, and UUID available.
click_type_mapping = {
"string": click.STRING,
"int": click.INT,
@@ -269,6 +272,16 @@ class InitTask(BaseTask):
numeric_choice = click.prompt(prompt_msg, type=click.INT)
return available_adapters[numeric_choice - 1]
def get_valid_project_name(self) -> str:
"""Returns a valid project name, either from CLI arg or user prompt."""
name = self.args.project_name
while not ProjectName.is_valid(name):
if name:
click.echo(name + " is not a valid project name.")
name = click.prompt("Enter a name for your project (letters, digits, underscore)")
return name
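A hypothetical stand-in for ProjectName.is_valid, matching the prompt's "letters, digits, underscore" rule; the real validator lives in dbt.contracts.project, so this regex is an assumption:

import re

def is_valid_project_name(name) -> bool:
    # assumed rule: starts with a letter or underscore, then letters/digits/underscores
    return bool(name) and re.fullmatch(r"[a-zA-Z_][a-zA-Z0-9_]*", name) is not None

assert is_valid_project_name("jaffle_shop")
assert not is_valid_project_name("1st-project")
assert not is_valid_project_name(None)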
def run(self):
"""Entry point for the init task."""
profiles_dir = flags.PROFILES_DIR
@@ -285,6 +298,8 @@ class InitTask(BaseTask):
# just setup the user's profile.
fire_event(SettingUpProfile())
profile_name = self.get_profile_name_from_current_project()
if not self.check_if_can_write_profile(profile_name=profile_name):
return
# If a profile_template.yml exists in the project root, that effectively
# overrides the profile_template.yml for the given target.
profile_template_path = Path("profile_template.yml")
@@ -296,8 +311,6 @@ class InitTask(BaseTask):
return
except Exception:
fire_event(InvalidProfileTemplateYAML())
if not self.check_if_can_write_profile(profile_name=profile_name):
return
adapter = self.ask_for_adapter_choice()
self.create_profile_from_target(
adapter, profile_name=profile_name
@@ -306,11 +319,7 @@ class InitTask(BaseTask):
# When dbt init is run outside of an existing project,
# create a new project and set up the user's profile.
project_name = self.args.project_name
if project_name is None:
# If project name is not provided,
# ask the user which project name they'd like to use.
project_name = click.prompt("What is the desired project name?")
project_name = self.get_valid_project_name()
project_path = Path(project_name)
if project_path.exists():
fire_event(ProjectNameAlreadyExists(name=project_name))


@@ -65,6 +65,8 @@ def print_run_status_line(results) -> None:
stats[result_type] += 1
stats['total'] += 1
with TextOnly():
fire_event(EmptyLine())
fire_event(StatsLine(stats=stats))


@@ -11,7 +11,7 @@ from .printer import (
print_run_end_messages,
get_counts,
)
from datetime import datetime
from dbt import tracking
from dbt import utils
from dbt.adapters.base import BaseRelation
@@ -21,7 +21,7 @@ from dbt.contracts.graph.compiled import CompileResultNode
from dbt.contracts.graph.manifest import WritableManifest
from dbt.contracts.graph.model_config import Hook
from dbt.contracts.graph.parsed import ParsedHookNode
from dbt.contracts.results import NodeStatus, RunResult, RunStatus
from dbt.contracts.results import NodeStatus, RunResult, RunStatus, RunningStatus
from dbt.exceptions import (
CompilationException,
InternalException,
@@ -342,6 +342,8 @@ class RunTask(CompileTask):
finishctx = TimestampNamed('node_finished_at')
for idx, hook in enumerate(ordered_hooks, start=1):
hook._event_status['started_at'] = datetime.utcnow().isoformat()
hook._event_status['node_status'] = RunningStatus.Started
sql = self.get_hook_sql(adapter, hook, idx, num_hooks,
extra_context)
@@ -360,19 +362,21 @@ class RunTask(CompileTask):
)
)
status = 'OK'
with Timer() as timer:
if len(sql.strip()) > 0:
status, _ = adapter.execute(sql, auto_begin=False,
fetch=False)
self.ran_hooks.append(hook)
response, _ = adapter.execute(sql, auto_begin=False, fetch=False)
status = response._message
else:
status = 'OK'
self.ran_hooks.append(hook)
hook._event_status['finished_at'] = datetime.utcnow().isoformat()
with finishctx, DbtModelState({'node_status': 'passed'}):
hook._event_status['node_status'] = RunStatus.Success
fire_event(
PrintHookEndLine(
statement=hook_text,
status=str(status),
status=status,
index=idx,
total=num_hooks,
execution_time=timer.elapsed,
@@ -380,6 +384,11 @@ class RunTask(CompileTask):
report_node_data=hook
)
)
# `_event_status` dict is only used for logging. Make sure
# it gets deleted when we're done with it
del hook._event_status["started_at"]
del hook._event_status["finished_at"]
del hook._event_status["node_status"]
self._total_executed += len(ordered_hooks)


@@ -56,6 +56,7 @@ from dbt.parser.manifest import ManifestLoader
import dbt.exceptions
from dbt import flags
import dbt.utils
from dbt.ui import warning_tag
RESULT_FILE_NAME = 'run_results.json'
MANIFEST_FILE_NAME = 'manifest.json'
@@ -208,7 +209,7 @@ class GraphRunnableTask(ManifestTask):
with RUNNING_STATE, uid_context:
startctx = TimestampNamed('node_started_at')
index = self.index_offset(runner.node_index)
runner.node._event_status['dbt_internal__started_at'] = datetime.utcnow().isoformat()
runner.node._event_status['started_at'] = datetime.utcnow().isoformat()
runner.node._event_status['node_status'] = RunningStatus.Started
extended_metadata = ModelMetadata(runner.node, index)
@@ -224,8 +225,7 @@ class GraphRunnableTask(ManifestTask):
result = runner.run_with_hooks(self.manifest)
status = runner.get_result_status(result)
runner.node._event_status['node_status'] = result.status
runner.node._event_status['dbt_internal__finished_at'] = \
datetime.utcnow().isoformat()
runner.node._event_status['finished_at'] = datetime.utcnow().isoformat()
finally:
finishctx = TimestampNamed('finished_at')
with finishctx, DbtModelState(status):
@@ -238,8 +238,8 @@ class GraphRunnableTask(ManifestTask):
)
# `_event_status` dict is only used for logging. Make sure
# it gets deleted when we're done with it
del runner.node._event_status["dbt_internal__started_at"]
del runner.node._event_status["dbt_internal__finished_at"]
del runner.node._event_status["started_at"]
del runner.node._event_status["finished_at"]
del runner.node._event_status["node_status"]
fail_fast = flags.FAIL_FAST
@@ -359,7 +359,7 @@ class GraphRunnableTask(ManifestTask):
adapter = get_adapter(self.config)
if not adapter.is_cancelable():
fire_event(QueryCancelationUnsupported(type=adapter.type))
fire_event(QueryCancelationUnsupported(type=adapter.type()))
else:
with adapter.connection_named('master'):
for conn_name in adapter.cancel_open_connections():
@@ -377,10 +377,8 @@ class GraphRunnableTask(ManifestTask):
num_threads = self.config.threads
target_name = self.config.target_name
text = "Concurrency: {} threads (target='{}')"
concurrency_line = text.format(num_threads, target_name)
with NodeCount(self.num_nodes):
fire_event(ConcurrencyLine(concurrency_line=concurrency_line))
fire_event(ConcurrencyLine(num_threads=num_threads, target_name=target_name))
with TextOnly():
fire_event(EmptyLine())
@@ -461,8 +459,11 @@ class GraphRunnableTask(ManifestTask):
)
if len(self._flattened_nodes) == 0:
warn_or_error("\nWARNING: Nothing to do. Try checking your model "
"configs and model specification args")
with TextOnly():
fire_event(EmptyLine())
msg = "Nothing to do. Try checking your model " \
"configs and model specification args"
warn_or_error(msg, log_fmt=warning_tag('{}'))
result = self.get_result(
results=[],
generated_at=datetime.utcnow(),


@@ -6,7 +6,7 @@ from dbt.include.global_project import DOCS_INDEX_FILE_PATH
from http.server import SimpleHTTPRequestHandler
from socketserver import TCPServer
from dbt.events.functions import fire_event
from dbt.events.types import ServingDocsPort, ServingDocsAccessInfo, ServingDocsExitInfo
from dbt.events.types import ServingDocsPort, ServingDocsAccessInfo, ServingDocsExitInfo, EmptyLine
from dbt.task.base import ConfiguredTask
@@ -22,6 +22,8 @@ class ServeTask(ConfiguredTask):
fire_event(ServingDocsPort(address=address, port=port))
fire_event(ServingDocsAccessInfo(port=port))
fire_event(EmptyLine())
fire_event(EmptyLine())
fire_event(ServingDocsExitInfo())
# mypy doesn't think SimpleHTTPRequestHandler is ok here, but it is


@@ -66,6 +66,4 @@ def line_wrap_message(
def warning_tag(msg: str) -> str:
# no longer needed, since new logging includes colorized log level
# return f'[{yellow("WARNING")}]: {msg}'
return msg
return f'[{yellow("WARNING")}]: {msg}'


@@ -10,12 +10,13 @@ import jinja2
import json
import os
import requests
from tarfile import ReadError
import time
from contextlib import contextmanager
from dbt.exceptions import ConnectionException
from dbt.events.functions import fire_event
from dbt.events.types import RetryExternalCall
from dbt.events.types import RetryExternalCall, RecordRetryException
from enum import Enum
from typing_extensions import Protocol
from typing import (
@@ -598,18 +599,21 @@ class MultiDict(Mapping[str, Any]):
def _connection_exception_retry(fn, max_attempts: int, attempt: int = 0):
"""Attempts to run a function that makes an external call, if the call fails
on a connection error or timeout, it will be tried up to 5 more times.
on a Requests exception or decompression issue (ReadError), it will be tried
up to 5 more times. All exceptions that Requests explicitly raises inherit from
requests.exceptions.RequestException. See https://github.com/dbt-labs/dbt-core/issues/4579
for context on this decompression issue specifically.
"""
try:
return fn()
except (
requests.exceptions.ConnectionError,
requests.exceptions.Timeout,
requests.exceptions.ContentDecodingError,
requests.exceptions.RequestException,
ReadError,
) as exc:
if attempt <= max_attempts - 1:
fire_event(RecordRetryException(exc=exc))
fire_event(RetryExternalCall(attempt=attempt, max=max_attempts))
time.sleep(1)
_connection_exception_retry(fn, max_attempts, attempt + 1)
return _connection_exception_retry(fn, max_attempts, attempt + 1)
else:
raise ConnectionException('External connection exception occurred: ' + str(exc))
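Since the helper re-invokes a zero-argument callable, callers bind any arguments first; a minimal usage sketch against the signature above (the `download_package` function and URL are illustrative, not part of dbt-core):

```python
import functools
import requests
from dbt.utils import _connection_exception_retry

def download_package(url: str) -> bytes:
    # Illustrative stand-in for any flaky external call.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.content

# Bind arguments up front so the helper can re-invoke the call with no args.
# Any requests.exceptions.RequestException or tarfile.ReadError triggers a
# retry (up to 5 more attempts) before ConnectionException is raised.
fn_to_retry = functools.partial(download_package, "https://example.com/pkg.tar.gz")
content = _connection_exception_retry(fn_to_retry, max_attempts=5)
```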


@@ -10,13 +10,15 @@ import requests
import dbt.exceptions
import dbt.semver
from dbt.ui import green, red, yellow
from dbt import flags
PYPI_VERSION_URL = 'https://pypi.org/pypi/dbt/json'
PYPI_VERSION_URL = 'https://pypi.org/pypi/dbt-core/json'
def get_latest_version():
def get_latest_version(version_url: str = PYPI_VERSION_URL):
try:
resp = requests.get(PYPI_VERSION_URL)
resp = requests.get(version_url)
data = resp.json()
version_string = data['info']['version']
except (json.JSONDecodeError, KeyError, requests.RequestException):
@@ -29,7 +31,13 @@ def get_installed_version():
return dbt.semver.VersionSpecifier.from_version_string(__version__)
def get_package_pypi_url(package_name: str) -> str:
return f'https://pypi.org/pypi/dbt-{package_name}/json'
def get_version_information():
flags.USE_COLORS = True if not flags.USE_COLORS else None
installed = get_installed_version()
latest = get_latest_version()
@@ -44,16 +52,40 @@ def get_version_information():
plugin_version_msg = "Plugins:\n"
for plugin_name, version in _get_dbt_plugins_info():
plugin_version_msg += ' - {plugin_name}: {version}\n'.format(
plugin_name=plugin_name, version=version
)
plugin_version = dbt.semver.VersionSpecifier.from_version_string(version)
latest_plugin_version = get_latest_version(version_url=get_package_pypi_url(plugin_name))
plugin_update_msg = ''
if installed == plugin_version or (
latest_plugin_version and plugin_version == latest_plugin_version
):
compatibility_msg = green('Up to date!')
else:
if latest_plugin_version:
if installed.major == plugin_version.major:
compatibility_msg = yellow('Update available!')
else:
compatibility_msg = red('Out of date!')
plugin_update_msg = (
" Your version of dbt-{} is out of date! "
"You can find instructions for upgrading here:\n"
" https://docs.getdbt.com/dbt-cli/install/overview\n\n"
).format(plugin_name)
else:
compatibility_msg = yellow('No PYPI version available')
plugin_version_msg += (
" - {}: {} - {}\n"
"{}"
).format(plugin_name, version, compatibility_msg, plugin_update_msg)
if latest is None:
return ("{}The latest version of dbt could not be determined!\n"
"Make sure that the following URL is accessible:\n{}\n\n{}"
.format(version_msg, PYPI_VERSION_URL, plugin_version_msg))
.format(version_msg, PYPI_VERSION_URL, plugin_version_msg)
)
if installed == latest:
return "{}Up to date!\n\n{}".format(version_msg, plugin_version_msg)
return f"{version_msg}{green('Up to date!')}\n\n{plugin_version_msg}"
elif installed > latest:
return ("{}Your version of dbt is ahead of the latest "
@@ -91,10 +123,10 @@ def _get_dbt_plugins_info():
f'dbt.adapters.{plugin_name}.__version__'
)
except ImportError:
# not an adpater
# not an adapter
continue
yield plugin_name, mod.version
__version__ = '1.0.0rc3'
__version__ = '1.0.6'
installed = get_installed_version()


@@ -284,12 +284,12 @@ def parse_args(argv=None):
parser.add_argument('adapter')
parser.add_argument('--title-case', '-t', default=None)
parser.add_argument('--dependency', action='append')
parser.add_argument('--dbt-core-version', default='1.0.0rc3')
parser.add_argument('--dbt-core-version', default='1.0.6')
parser.add_argument('--email')
parser.add_argument('--author')
parser.add_argument('--url')
parser.add_argument('--sql', action='store_true')
parser.add_argument('--package-version', default='1.0.0rc3')
parser.add_argument('--package-version', default='1.0.6')
parser.add_argument('--project-version', default='1.0')
parser.add_argument(
'--no-dependency', action='store_false', dest='set_dependency'


@@ -25,7 +25,7 @@ with open(os.path.join(this_directory, 'README.md')) as f:
package_name = "dbt-core"
package_version = "1.0.0rc3"
package_version = "1.0.6"
description = """With dbt, data analysts and engineers can build analytics \
the way engineers build applications."""
@@ -52,8 +52,9 @@ setup(
],
install_requires=[
'Jinja2==2.11.3',
'MarkupSafe>=0.23,<2.1',
'agate>=1.6,<1.6.4',
'click>=8,<9',
'click>=7.0,<9',
'colorama>=0.3.9,<0.4.5',
'hologram==0.0.14',
'isodate>=0.6,<0.7',
@@ -63,7 +64,7 @@ setup(
'networkx>=2.3,<3',
'packaging>=20.9,<22.0',
'sqlparse>=0.2.3,<0.5',
'dbt-extractor==0.4.0',
'dbt-extractor~=0.4.1',
'typing-extensions>=3.7.4,<3.11',
'werkzeug>=1,<3',
# the following are all to match snowflake-connector-python


@@ -1,19 +1,19 @@
agate==1.6.3
attrs==21.2.0
Babel==2.9.1
attrs==21.4.0
Babel==2.10.1
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.8
click==8.0.3
charset-normalizer==2.0.12
click==8.1.2
colorama==0.4.4
dbt-core==1.0.0rc3
dbt-extractor==0.4.0
dbt-postgres==1.0.0rc3
dbt-core==1.0.6
dbt-extractor==0.4.1
dbt-postgres==1.0.6
future==0.18.2
hologram==0.0.14
idna==3.3
importlib-metadata==4.8.2
isodate==0.6.0
importlib-metadata==4.11.3
isodate==0.6.1
Jinja2==2.11.3
jsonschema==3.1.1
leather==0.3.4
@@ -22,23 +22,23 @@ MarkupSafe==2.0.1
mashumaro==2.9
minimal-snowplow-tracker==0.0.2
msgpack==1.0.3
networkx==2.6.3
networkx==2.8
packaging==21.3
parsedatetime==2.4
psycopg2-binary==2.9.2
psycopg2-binary==2.9.3
pycparser==2.21
pyparsing==3.0.6
pyrsistent==0.18.0
pyparsing==3.0.8
pyrsistent==0.18.1
python-dateutil==2.8.2
python-slugify==5.0.2
python-slugify==6.1.2
pytimeparse==1.1.8
pytz==2021.3
pytz==2022.1
PyYAML==6.0
requests==2.26.0
requests==2.27.1
six==1.16.0
sqlparse==0.4.2
text-unidecode==1.3
typing-extensions==3.10.0.2
urllib3==1.26.7
Werkzeug==2.0.2
zipp==3.6.0
urllib3==1.26.9
Werkzeug==2.1.1
zipp==3.8.0


@@ -1,18 +1,118 @@
# Performance Regression Testing
This directory includes dbt project setups to test on and a test runner, written in Rust, which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually.
The GitHub workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards.
## Attention!
## Adding a new dbt project
Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests.
PLEASE READ THIS README IN THE MAIN BRANCH
The performance runner is always pulled from main regardless of the version being modeled or sampled. If you are not in the main branch, this information may be stale.
## Adding a new dbt command
In `runner/src/measure.rs::measure`, add a metric to the `metrics` Vec. The GitHub Action will handle recompilation if you don't have the Rust toolchain installed.
## Description
This test suite samples the performance characteristics of individual commits against performance models for prior releases. Performance is measured in project-command pairs, which are assumed to conform to a normal distribution. The sampling and comparison are efficient enough to run against PRs.
This collection of projects and commands should expand over time, reflecting user feedback about poorly performing projects, so that future versions are protected against poor performance in those scenarios.
Here are all the components of the testing module:
- dbt project setups that are known performance bottlenecks which you can find in `/performance/projects/`, and a runner written in Rust that runs specific dbt commands on each of the projects.
- Performance characteristics called "baselines" from released dbt versions in `/performance/baselines/`. Each branch will only have the baselines for its ancestors because when we compare samples, we compare against the latest baseline available in the branch.
- A GitHub action for modeling the performance distribution for a new release: `/.github/workflows/model_performance.yml`.
- A GitHub action for sampling performance of dbt at your commit and comparing it against a previous release: `/.github/workflows/sample_performance.yml`.
At this time, the biggest risk in the design of this project is how to account for the natural variation of GitHub Action runs. Typically, performance work is done on dedicated hardware to eliminate this factor. However, there are ways to incorporate the variation of our observation tools if it can be measured.
## Adding Test Scenarios
A clear process for maintainers and community members to add new performance testing targets will exist after the next stage of the test suite is complete. For details, see #4768.
## Investigating Regressions
If your commit has failed one of the performance regression tests, it does not necessarily mean your commit has a performance regression; it means the observed runtime value was so much slower than the expected value that it was unlikely to be random noise. If it is not random noise, this commit contains the code that is causing the regression, but it may not be the commit that introduced that code: the offending code may have landed in an earlier commit that passed due to natural variation in sampling. When investigating a performance regression, start with the failing commit and work your way backwards.
Here's an example of how this could happen:
```
Commit
A <- last release
B
C <- perf regression
D
E
F <- the first failing commit
```
- Commit A is measured to have an expected value for one performance metric of 30 seconds with a standard deviation of 0.5 seconds.
- Commit B doesn't introduce a performance regression and passes the performance regression tests.
- Commit C introduces a performance regression such that the new expected value of the metric is 32 seconds, with the standard deviation still at 0.5 seconds. We don't know this, because estimating the whole performance distribution on every commit would be far too much work. Commit C passes the performance regression test because we happened to sample a value of 31 seconds, which is within our threshold for the original model. That sample is also only 2 standard deviations away from commit C's actual performance model, so while it won't be a common situation, it is expected to happen sometimes.
- Commit D samples a value of 31.4 seconds and passes
- Commit E samples a value of 31.2 seconds and passes
- Commit F samples a value of 32.9 seconds and fails
Because these performance regression tests are non-deterministic, it is frequently going to be possible to rerun the test on a failing commit and get it to pass. The more often we do this, the farther down the commit history we will be punting detection.
If your PR is against `main`, your commits will be compared against the latest baseline measurement found in `performance/baselines`. If a commit needs to be backported, that PR will be against the `.latest` branch and will also compare against the latest baseline measurement found in `performance/baselines` in that branch. These two versions may be the same or they may be different. For example, if the latest version of dbt is v1.99.0, the performance sample of your PR against main will compare against the baseline for v1.99.0. When those commits are backported to `1.98.latest`, they will be compared against the baseline for v1.98.6 (or whatever the latest is at that time). Even if the compared baseline is the same, a different sample is taken for each PR. So even though it should be rare, it is possible for a performance regression to be detected in one of the two PRs but not the other, with the same baseline, due to variation in sampling.
## The Statistics
Particle physicists need to be confident in declaring new discoveries, snack manufacturers need to be sure each individual item is within the regulated margin of error for nutrition facts, and weight-rated climbing gear needs to be produced so you can trust your life to every unit that comes off the line. All of these use cases use the same kind of math to meet their needs: sigma-based p-values. This section will peel apart that math with the help of a physicist and walk through how we apply this approach to performance regression testing in this test suite.
You are likely familiar with forming a hypothesis of the form "A and B are correlated", which is known as _the research hypothesis_. Additionally, it follows that the hypothesis "A and B are not correlated" is relevant and is known as _the null hypothesis_. When looking at data, we commonly use a _p-value_ to determine the significance of the data. Formally, a _p-value_ is the probability of obtaining data at least as extreme as the ones observed, if the null hypothesis is true. To refine this definition, the experimental particle physicist [Dr. Tommaso Dorigo](https://userswww.pd.infn.it/~dorigo/#about) has an excellent [glossary](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) of these terms that helps clarify: "'Extreme' is quite tricky instead: it depends on what is your 'alternate hypothesis' of reference, and what kind of departure it would produce on the studied statistic derived from the data. So 'extreme' will mean 'departing from the typical values expected for the null hypothesis, toward the values expected from the alternate hypothesis.'" In the context of performance regression testing, our research hypothesis is that "after commit A, the codebase includes a performance regression", which means we expect the runtime of our measured processes to be _slower_ than the expected value, not faster.
Given this definition of p-value, we need to explicitly call out the common tendency to apply _probability inversion_ to our observations. To quote [Dr. Tommaso Dorigo](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) again, "If your ability on the long jump puts you in the 99.99% percentile, that does not mean that you are a kangaroo, and neither can one infer that the probability that you belong to the human race is 0.01%." Using our previously defined terms, the p-value is _not_ the probability that the null hypothesis _is true_.
This brings us to calculating sigma values. Sigma refers to the standard deviation of a statistical model, which is used as a measurement of how far away an observed value is from the expected value. When we say that we have a "3 sigma result", we are saying that if the null hypothesis is true, this is a particularly unlikely observation, not that the null hypothesis is false. Exactly how unlikely depends on what the expected values from our research hypothesis are. In the context of performance regression testing, if the null hypothesis is false, we are expecting the results to be _slower_ than the expected value, not _slower or faster_. Looking at the normal distribution below, we can see that we only care about one _half_ of the distribution: the half where the values are slower than the expected value. This means that when we're calculating the p-value we are not including both sides of the normal distribution.
![normal distribution](./images/normal.svg)
Because of this, the following table describes the significance of each sigma level for our _one-sided_ hypothesis:
| σ | p-value | scientific significance |
| --- | -------------- | ----------------------- |
| 1 σ | 1 in 6 | |
| 2 σ | 1 in 44 | |
| 3 σ | 1 in 741 | evidence |
| 4 σ | 1 in 31,574 | |
| 5 σ | 1 in 3,486,914 | discovery |
When detecting performance regressions that trigger alerts, block PRs, or delay releases, we want to be conservative enough that detections are infrequently triggered by noise, but not so conservative as to miss most actual regressions. This test suite uses a 3 sigma standard, so only about 1 in every 700 runs is expected to fail the performance regression test suite due to expected variance in our measurements.
In practice, the number of performance regression failures due to random noise will be higher because we are not incorporating the variance of the tools we use to measure, namely GHA.
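The p-value column above can be reproduced with the Python standard library; a quick sketch (the 5 σ row differs slightly from the table depending on the precision of the tail probability used):

```python
from statistics import NormalDist

# One-sided p-value: the probability of an observation at least `sigma`
# standard deviations slower than the mean, assuming the null hypothesis.
for sigma in range(1, 6):
    p = 1 - NormalDist().cdf(sigma)
    print(f"{sigma} sigma: p = {p:.3g} (about 1 in {round(1 / p):,})")
```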
### Concrete Example: Performance Regression Detection
The following example data was collected by running the code in this repository in GitHub Actions.
In dbt v1.0.3, we have the following mean and standard deviation when parsing a dbt project with 2000 models:
μ (mean): 41.22
σ (stddev): 0.2525

The 2-sided 3 sigma range can be calculated from these two values via:

x < μ - 3σ or x > μ + 3σ
x < 41.22 - 3 * 0.2525 or x > 41.22 + 3 * 0.2525
x < 40.46 or x > 41.98

It follows that the 1-sided 3 sigma range for performance regressions is just:

x > 41.98
If, when we sample a single `dbt parse` of the same project with a commit slated to go into dbt v1.0.4, we observe a 42s parse time, then this observation is so unlikely in the absence of a code-induced performance regression that we should investigate whether any of the commits between this failure and the commit where the initial distribution was measured introduced one.
Observations with 3 sigma significance that are _not_ performance regressions could be due to observing unlikely values (roughly 1 in every 750 observations), or to variation in the instruments we use to take these measurements, such as GitHub Actions. At this time we do not measure the variation of those instruments or account for it in our calculations, which means failures due to random noise are more likely than they would be if we did.
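Reduced to code, the detection rule in this example is a one-sided threshold check; a minimal sketch using the v1.0.3 baseline numbers above:

```python
# One-sided 3 sigma regression check, using the example baseline above.
MEAN = 41.22      # seconds: dbt v1.0.3 parse of a 2000-model project
STDDEV = 0.2525

def is_regression(observed_seconds: float, n_sigma: float = 3.0) -> bool:
    # Only slower-than-expected observations fail; faster ones never do.
    return observed_seconds > MEAN + n_sigma * STDDEV

print(is_regression(42.0))   # True: 42.0 > 41.98, so investigate
print(is_regression(41.5))   # False: within expected variation
```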
### Concrete Example: Performance Modeling
Once a new dbt version is released (excluding pre-releases), the performance characteristics of that released version need to be measured. In this repository this measurement is referred to as a baseline.
After dbt v1.0.99 is released, a GitHub action running from `main` (so the latest version of that action is used) takes the following steps:
- Checks out main for the latest performance runner
- pip installs dbt v1.0.99
- builds the runner if it's not already in the GitHub Actions cache
- uses the performance runner's `model` subcommand via `./runner model`.
- The `model` subcommand calls hyperfine to run all of the project-command pairs a large number of times (maybe 20 or so) and saves the hyperfine output to files in `performance/baselines/1.0.99/`, one file per command-project pair.
- The action opens two PRs with these files: one against `main` and one against `1.0.latest` so that future PRs against these branches will detect regressions against the performance characteristics of dbt v1.0.99 instead of v1.0.98.
- The release driver for dbt v1.0.99 reviews and merges these PRs, which is the sole deliverable of the performance modeling work.
## Future work
- add more projects to test different configurations that have been known bottlenecks
- add more dbt commands to measure
- possibly using the uploaded json artifacts to store these results so they can be graphed over time
- reading new metrics from a file so no one has to edit rust source to add them to the suite
- instead of building the rust every time, we could publish and pull down the latest version.
- instead of manually setting the baseline version of dbt to test, pull down the latest stable version as the baseline.
- pin commands to projects by reading commands from a file defined in the project.
- add a postgres warehouse to run `dbt compile` and `dbt run` commands
- add more projects to test different configurations that have been known performance bottlenecks
- Account for github action variation: Either measure it, or eliminate it. To measure it we could set up another action that periodically samples the same version of dbt and use a 7 day rolling variation. To eliminate it we could run the action using something like [act](https://github.com/nektos/act) on dedicated hardware.
- build in a git-bisect run to automatically identify the commits that caused a performance regression by modeling each commit's expected value for the failing metric. Running this automatically, or even providing a script to do this locally, would be useful.


@@ -0,0 +1,40 @@
{
"version": "1.0.3",
"metric": {
"name": "parse",
"project_name": "01_2000_simple_models"
},
"ts": "2022-03-04T00:02:52.657727515Z",
"measurement": {
"command": "dbt parse --no-version-check --profiles-dir ../../project_config/",
"mean": 41.224566760615,
"stddev": 0.252468634424254,
"median": 41.182836243915,
"user": 40.70073678499999,
"system": 0.61185062,
"min": 40.89372129691501,
"max": 41.68176405591501,
"times": [
41.397582801915,
41.618822256915,
41.374914350915,
41.68176405591501,
41.255119986915,
41.528348636915,
41.238762892915,
40.950121934915,
41.388716648915,
41.62938069991501,
41.139914502915,
41.114225200915,
41.045012222915,
41.01039839391501,
40.915296414915,
41.006528646915,
40.89372129691501,
40.951454721915,
41.125491559915,
41.225757984915
]
}
}
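Assuming baseline files keep this shape, their summary statistics can be sanity-checked against the raw `times` with the standard library (the file path below is illustrative; `statistics.stdev` is the sample standard deviation, which appears to match the stored value):

```python
import json
import statistics

# Illustrative path: point this at any baseline file of the shape above.
with open("performance/baselines/1.0.3/parse_01_2000_simple_models.json") as f:
    baseline = json.load(f)

times = baseline["measurement"]["times"]
print(statistics.mean(times))    # ~41.2246, matches "mean"
print(statistics.stdev(times))   # ~0.2525, matches "stddev"
print(min(times), max(times))    # match "min" / "max"
```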


@@ -1 +1 @@
version = '1.0.0rc3'
version = '1.0.6'


@@ -41,7 +41,7 @@ def _dbt_psycopg2_name():
package_name = "dbt-postgres"
package_version = "1.0.0rc3"
package_version = "1.0.6"
description = """The postgres adpter plugin for dbt (data build tool)"""
this_directory = os.path.abspath(os.path.dirname(__file__))


@@ -5,7 +5,7 @@ import sys
if 'sdist' not in sys.argv:
print('')
print('As of v1.0.0, `pip install dbt` is no longer supported.')
print('As of v1.0.6, `pip install dbt` is no longer supported.')
print('Instead, please use one of the following.')
print('')
print('**To use dbt with your specific database, platform, or query engine:**')
@@ -50,7 +50,7 @@ with open(os.path.join(this_directory, 'README.md')) as f:
package_name = "dbt"
package_version = "1.0.0rc3"
package_version = "1.0.6"
description = """With dbt, data analysts and engineers can build analytics \
the way engineers build applications."""
@@ -81,4 +81,5 @@ setup(
'Programming Language :: Python :: 3.9',
],
python_requires=">=3.7",
packages=[]
)


@@ -3,3 +3,6 @@ snapshots:
- name: snapshot_actual
tests:
- mutually_exclusive_ranges
config:
meta:
owner: 'a_owner'


@@ -246,3 +246,58 @@ class TestSimpleDependencyNoProfile(TestSimpleDependency):
with tempfile.TemporaryDirectory() as tmpdir:
result = self.run_dbt(["clean", "--profiles-dir", tmpdir])
return result
class TestSimpleDependencyBadProfile(DBTIntegrationTest):
@property
def schema(self):
return "simple_dependency_006"
@property
def models(self):
return "models"
@property
def project_config(self):
return {
'config-version': 2,
'models': {
'+any_config': "{{ target.name }}",
'+enabled': "{{ target.name in ['redshift', 'postgres'] | as_bool }}"
}
}
def postgres_profile(self):
# Need to set the environment variable here initially because
# the unittest setup does a load_config.
os.environ['PROFILE_TEST_HOST'] = self.database_host
return {
'config': {
'send_anonymous_usage_stats': False
},
'test': {
'outputs': {
'default2': {
'type': 'postgres',
'threads': 4,
'host': "{{ env_var('PROFILE_TEST_HOST') }}",
'port': 5432,
'user': 'root',
'pass': 'password',
'dbname': 'dbt',
'schema': self.unique_schema()
},
},
'target': 'default2'
}
}
@use_profile('postgres')
def test_postgres_deps_bad_profile(self):
del os.environ['PROFILE_TEST_HOST']
self.run_dbt(["deps"])
@use_profile('postgres')
def test_postgres_clean_bad_profile(self):
del os.environ['PROFILE_TEST_HOST']
self.run_dbt(["clean"])


@@ -43,7 +43,7 @@ class TestConfigPathDeprecation(BaseTestDeprecations):
with self.assertRaises(dbt.exceptions.CompilationException) as exc:
self.run_dbt(['--warn-error', 'debug'])
exc_str = ' '.join(str(exc.exception).split()) # flatten all whitespace
expected = "The `data-paths` config has been deprecated"
expected = "The `data-paths` config has been renamed"
assert expected in exc_str
@@ -116,11 +116,16 @@ class TestPackageRedirectDeprecation(BaseTestDeprecations):
expected = {'package-redirect'}
self.assertEqual(expected, deprecations.active_deprecations)
@use_profile('postgres')
def test_postgres_package_redirect_fail(self):
self.assertEqual(deprecations.active_deprecations, set())
with self.assertRaises(dbt.exceptions.CompilationException) as exc:
self.run_dbt(['--warn-error', 'deps'])
exc_str = ' '.join(str(exc.exception).split()) # flatten all whitespace
expected = "The `fishtown-analytics/dbt_utils` package is deprecated in favor of `dbt-labs/dbt_utils`"
assert expected in exc_str
# this test fails as a result of the caching added in
# https://github.com/dbt-labs/dbt-core/pull/4982
# This seems to be a testing issue though. Everything works when tested locally
# and the CompilationException gets raised. Since we're refactoring these tests anyway
# I won't rewrite this one
# @use_profile('postgres')
# def test_postgres_package_redirect_fail(self):
# self.assertEqual(deprecations.active_deprecations, set())
# with self.assertRaises(dbt.exceptions.CompilationException) as exc:
# self.run_dbt(['--warn-error', 'deps'])
# exc_str = ' '.join(str(exc.exception).split()) # flatten all whitespace
# expected = "The `fishtown-analytics/dbt_utils` package is deprecated in favor of `dbt-labs/dbt_utils`"
# assert expected in exc_str


@@ -221,6 +221,36 @@ class TestAllowSecretProfilePackage(DBTIntegrationTest):
self.assertFalse("first_dependency" in log_output)
class TestCloneFailSecretScrubbed(DBTIntegrationTest):
def setUp(self):
os.environ[SECRET_ENV_PREFIX + "GIT_TOKEN"] = "abc123"
DBTIntegrationTest.setUp(self)
@property
def packages_config(self):
return {
"packages": [
{"git": "https://fakeuser:{{ env_var('DBT_ENV_SECRET_GIT_TOKEN') }}@github.com/dbt-labs/fake-repo.git"},
]
}
@property
def schema(self):
return "context_vars_013"
@property
def models(self):
return "models"
@use_profile('postgres')
def test_postgres_fail_clone_with_scrubbing(self):
with self.assertRaises(dbt.exceptions.InternalException) as exc:
_, log_output = self.run_dbt_and_capture(['deps'])
assert "abc123" not in str(exc.exception)
class TestEmitWarning(DBTIntegrationTest):
@property
def schema(self):


@@ -1,4 +1,5 @@
from test.integration.base import DBTIntegrationTest, use_profile
import os
class TestPrePostRunHooks(DBTIntegrationTest):
@@ -22,6 +23,7 @@ class TestPrePostRunHooks(DBTIntegrationTest):
'run_started_at',
'invocation_id'
]
os.environ['TERM_TEST'] = 'TESTING'
@property
def schema(self):
@@ -41,6 +43,7 @@ class TestPrePostRunHooks(DBTIntegrationTest):
"{{ custom_run_hook('start', target, run_started_at, invocation_id) }}",
"create table {{ target.schema }}.start_hook_order_test ( id int )",
"drop table {{ target.schema }}.start_hook_order_test",
"{{ log(env_var('TERM_TEST'), info=True) }}",
],
"on-run-end": [
"{{ custom_run_hook('end', target, run_started_at, invocation_id) }}",


@@ -41,9 +41,17 @@ def temporary_working_directory() -> str:
out : str
The temporary working directory.
"""
with tempfile.TemporaryDirectory() as tmpdir:
with change_working_directory(tmpdir):
yield tmpdir
# N.B.: suppressing the OSError is necessary for older (pre-3.10) versions of Python
# which do not support `ignore_cleanup_errors` in tempfile::TemporaryDirectory.
# See: https://github.com/python/cpython/pull/24793
#
# In our case the cleanup is redundant since Windows handles clearing
# AppData/Local/Temp at the OS level anyway.
with contextlib.suppress(OSError):
with tempfile.TemporaryDirectory() as tmpdir:
with change_working_directory(tmpdir):
yield tmpdir
def get_custom_profiles_config(database_host, custom_schema):


@@ -34,15 +34,3 @@ class TestStatements(DBTIntegrationTest):
self.assertEqual(len(results), 1)
self.assertTablesEqual("statement_actual", "statement_expected")
@use_profile("presto")
def test_presto_statements(self):
self.use_default_project({"seed-paths": [self.dir("seed")]})
results = self.run_dbt(["seed"])
self.assertEqual(len(results), 2)
results = self.run_dbt()
self.assertEqual(len(results), 1)
self.assertTablesEqual("statement_actual", "statement_expected")


@@ -65,6 +65,9 @@ class TestSchemaFileConfigs(DBTIntegrationTest):
manifest = get_manifest()
model_id = 'model.test.model'
model_node = manifest.nodes[model_id]
meta_expected = {'company': 'NuMade', 'project': 'test', 'team': 'Core Team', 'owner': 'Julie Smith', 'my_attr': 'TESTING'}
self.assertEqual(model_node.meta, meta_expected)
self.assertEqual(model_node.config.meta, meta_expected)
model_tags = ['tag_1_in_model', 'tag_2_in_model', 'tag_in_project', 'tag_in_schema']
model_node_tags = model_node.tags.copy()
model_node_tags.sort()


@@ -327,7 +327,7 @@ test:
]
self.run_dbt(['init'])
manager.assert_has_calls([
call.prompt('What is the desired project name?'),
call.prompt("Enter a name for your project (letters, digits, underscore)"),
call.prompt("Which database would you like to use?\n[1] postgres\n\n(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)\n\nEnter a number", type=click.INT),
call.prompt('host (hostname for the instance)', default=None, hide_input=False, type=None),
call.prompt('port', default=5432, hide_input=False, type=click.INT),
@@ -532,6 +532,48 @@ models:
+materialized: view
"""
@use_profile('postgres')
@mock.patch('click.confirm')
@mock.patch('click.prompt')
def test_postgres_init_invalid_project_name_cli(self, mock_prompt, mock_confirm):
manager = Mock()
manager.attach_mock(mock_prompt, 'prompt')
manager.attach_mock(mock_confirm, 'confirm')
os.remove('dbt_project.yml')
invalid_name = 'name-with-hyphen'
valid_name = self.get_project_name()
manager.prompt.side_effect = [
valid_name
]
self.run_dbt(['init', invalid_name, '-s'])
manager.assert_has_calls([
call.prompt("Enter a name for your project (letters, digits, underscore)"),
])
@use_profile('postgres')
@mock.patch('click.confirm')
@mock.patch('click.prompt')
def test_postgres_init_invalid_project_name_prompt(self, mock_prompt, mock_confirm):
manager = Mock()
manager.attach_mock(mock_prompt, 'prompt')
manager.attach_mock(mock_confirm, 'confirm')
os.remove('dbt_project.yml')
invalid_name = 'name-with-hyphen'
valid_name = self.get_project_name()
manager.prompt.side_effect = [
invalid_name, valid_name
]
self.run_dbt(['init', '-s'])
manager.assert_has_calls([
call.prompt("Enter a name for your project (letters, digits, underscore)"),
call.prompt("Enter a name for your project (letters, digits, underscore)"),
])
@use_profile('postgres')
@mock.patch('click.confirm')
@mock.patch('click.prompt')
@@ -546,20 +588,12 @@ models:
project_name = self.get_project_name()
manager.prompt.side_effect = [
project_name,
1,
'localhost',
5432,
'test_username',
'test_password',
'test_db',
'test_schema',
4,
]
# provide project name through the init command
self.run_dbt(['init', '-s'])
manager.assert_has_calls([
call.prompt('What is the desired project name?')
call.prompt("Enter a name for your project (letters, digits, underscore)")
])
with open(os.path.join(self.test_root_dir, project_name, 'dbt_project.yml'), 'r') as f:

View File

@@ -0,0 +1,6 @@
{
"metadata": {
"dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v3.json",
"dbt_version": "0.21.1"
}
}

View File

@@ -6,7 +6,7 @@ import string
import pytest
from dbt.exceptions import CompilationException
from dbt.exceptions import CompilationException, IncompatibleSchemaException
class TestModifiedState(DBTIntegrationTest):
@@ -36,7 +36,7 @@ class TestModifiedState(DBTIntegrationTest):
for entry in os.listdir(self.test_original_source_path):
src = os.path.join(self.test_original_source_path, entry)
tst = os.path.join(self.test_root_dir, entry)
if entry in {'models', 'seeds', 'macros'}:
if entry in {'models', 'seeds', 'macros', 'previous_state'}:
shutil.copytree(src, tst)
elif os.path.isdir(entry) or entry.endswith('.sql'):
os.symlink(src, tst)
@@ -202,3 +202,10 @@ class TestModifiedState(DBTIntegrationTest):
results, stdout = self.run_dbt_and_capture(['run', '--models', '+state:modified', '--state', './state'])
assert len(results) == 1
assert results[0].node.name == 'view_model'
@use_profile('postgres')
def test_postgres_previous_version_manifest(self):
# This tests that a different schema version in the file throws an error
with self.assertRaises(IncompatibleSchemaException) as exc:
results = self.run_dbt(['ls', '-s', 'state:modified', '--state', './previous_state'])
self.assertEqual(exc.CODE, 10014)


@@ -1,2 +1,2 @@
select
* from {{ ref('customers') }} where customer_id > 100
* from {{ ref('customers') }} where first_name = '{{ macro_something() }}'


@@ -13,3 +13,17 @@ select * from {{ ref('orders') }}
{% endsnapshot %}
{% snapshot orders2_snapshot %}
{{
config(
target_schema=schema,
strategy='check',
unique_key='id',
check_cols=['order_date'],
)
}}
select * from {{ ref('orders') }}
{% endsnapshot %}


@@ -1,3 +1,4 @@
- add a comment
{% snapshot orders_snapshot %}
{{
@@ -8,7 +9,22 @@
check_cols=['status'],
)
}}
select * from {{ ref('orders') }}
{% endsnapshot %}
{% snapshot orders2_snapshot %}
{{
config(
target_schema=schema,
strategy='check',
unique_key='id',
check_cols=['order_date'],
)
}}
select * from {{ ref('orders') }}
{% endsnapshot %}


@@ -0,0 +1,5 @@
{% macro macro_something() %}
{% do return('macro_something') %}
{% endmacro %}


@@ -0,0 +1,5 @@
{% macro macro_something() %}
{% do return('some_name') %}
{% endmacro %}


@@ -46,6 +46,7 @@ class BasePPTest(DBTIntegrationTest):
os.mkdir(os.path.join(self.test_root_dir, 'macros'))
os.mkdir(os.path.join(self.test_root_dir, 'analyses'))
os.mkdir(os.path.join(self.test_root_dir, 'snapshots'))
os.environ['DBT_PP_TEST'] = 'true'
@@ -332,6 +333,7 @@ class TestSources(BasePPTest):
results = self.run_dbt(["--partial-parse", "run"])
# Add a data test
self.copy_file('test-files/test-macro.sql', 'macros/test-macro.sql')
self.copy_file('test-files/my_test.sql', 'tests/my_test.sql')
results = self.run_dbt(["--partial-parse", "test"])
manifest = get_manifest()
@@ -339,6 +341,11 @@ class TestSources(BasePPTest):
test_id = 'test.test.my_test'
self.assertIn(test_id, manifest.nodes)
# Change macro that data test depends on
self.copy_file('test-files/test-macro2.sql', 'macros/test-macro.sql')
results = self.run_dbt(["--partial-parse", "test"])
manifest = get_manifest()
# Add an analysis
self.copy_file('test-files/my_analysis.sql', 'analyses/my_analysis.sql')
results = self.run_dbt(["--partial-parse", "run"])
@@ -496,10 +503,12 @@ class TestSnapshots(BasePPTest):
manifest = get_manifest()
snapshot_id = 'snapshot.test.orders_snapshot'
self.assertIn(snapshot_id, manifest.nodes)
snapshot2_id = 'snapshot.test.orders2_snapshot'
self.assertIn(snapshot2_id, manifest.nodes)
# run snapshot
results = self.run_dbt(["--partial-parse", "snapshot"])
self.assertEqual(len(results), 1)
self.assertEqual(len(results), 2)
# modify snapshot
self.copy_file('test-files/snapshot2.sql', 'snapshots/snapshot.sql')


@@ -37,6 +37,7 @@ class TestDocs(DBTIntegrationTest):
os.mkdir(os.path.join(self.test_root_dir, 'macros'))
os.mkdir(os.path.join(self.test_root_dir, 'analyses'))
os.mkdir(os.path.join(self.test_root_dir, 'snapshots'))
os.environ['DBT_PP_TEST'] = 'true'
@use_profile('postgres')


@@ -45,6 +45,7 @@ class BasePPTest(DBTIntegrationTest):
os.mkdir(os.path.join(self.test_root_dir, 'macros'))
os.mkdir(os.path.join(self.test_root_dir, 'analyses'))
os.mkdir(os.path.join(self.test_root_dir, 'snapshots'))
os.environ['DBT_PP_TEST'] = 'true'


@@ -2,6 +2,7 @@ from dbt.exceptions import CompilationException, ParsingException
from dbt.contracts.graph.manifest import Manifest
from dbt.contracts.files import ParseFileType
from dbt.contracts.results import TestStatus
from dbt.logger import SECRET_ENV_PREFIX
from dbt.parser.partial import special_override_macros
from test.integration.base import DBTIntegrationTest, use_profile, normalize, get_manifest
import shutil
@@ -41,7 +42,7 @@ class BasePPTest(DBTIntegrationTest):
os.mkdir(os.path.join(self.test_root_dir, 'tests'))
os.mkdir(os.path.join(self.test_root_dir, 'macros'))
os.mkdir(os.path.join(self.test_root_dir, 'seeds'))
os.environ['DBT_PP_TEST'] = 'true'
class EnvVarTest(BasePPTest):
@@ -300,6 +301,7 @@ class ProjectEnvVarTest(BasePPTest):
# cleanup
del os.environ['ENV_VAR_NAME']
class ProfileEnvVarTest(BasePPTest):
@property
@@ -352,3 +354,63 @@ class ProfileEnvVarTest(BasePPTest):
manifest = get_manifest()
self.assertNotEqual(env_vars_checksum, manifest.state_check.profile_env_vars_hash.checksum)
class ProfileSecretEnvVarTest(BasePPTest):
@property
def profile_config(self):
# Need to set these here because the base integration test class
# calls 'load_config' before the tests are run.
# Note: only the specified profile is rendered, so there's no
# point in setting env_vars in unused profiles.
# user is secret and password is not. Postgres on macOS doesn't care if the password
# changes, so we have to change the user. Related: https://github.com/dbt-labs/dbt-core/pull/4250
os.environ[SECRET_ENV_PREFIX + 'USER'] = 'root'
os.environ['ENV_VAR_PASS'] = 'password'
return {
'config': {
'send_anonymous_usage_stats': False
},
'test': {
'outputs': {
'dev': {
'type': 'postgres',
'threads': 1,
'host': self.database_host,
'port': 5432,
'user': "root",
'pass': "password",
'user': "{{ env_var('DBT_ENV_SECRET_USER') }}",
'pass': "{{ env_var('ENV_VAR_PASS') }}",
'dbname': 'dbt',
'schema': self.unique_schema()
},
},
'target': 'dev'
}
}
@use_profile('postgres')
def test_postgres_profile_secret_env_vars(self):
# Initial run
os.environ[SECRET_ENV_PREFIX + 'USER'] = 'root'
os.environ['ENV_VAR_PASS'] = 'password'
self.setup_directories()
self.copy_file('test-files/model_one.sql', 'models/model_one.sql')
results = self.run_dbt(["run"])
manifest = get_manifest()
env_vars_checksum = manifest.state_check.profile_env_vars_hash.checksum
# Change a secret var, it shouldn't register because we shouldn't save secrets.
os.environ[SECRET_ENV_PREFIX + 'USER'] = 'boop'
# this dbt run is going to fail because the password isn't actually the right one,
# but that doesn't matter because we just want to see if the manifest has included
# the secret in the hash of environment variables.
(results, log_output) = self.run_dbt_and_capture(["run"], expect_pass=False)
# I020 is the event code for "env vars used in profiles.yml have changed"
self.assertFalse('I020' in log_output)
manifest = get_manifest()
self.assertEqual(env_vars_checksum, manifest.state_check.profile_env_vars_hash.checksum)

test/interop/log_parsing/Cargo.lock (generated, 204 lines)

@@ -0,0 +1,204 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
[[package]]
name = "autocfg"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cdb031dd78e28731d87d56cc8ffef4a8f36ca26c38fe2de700543e627f8a464a"
[[package]]
name = "chrono"
version = "0.4.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "670ad68c9088c2a963aaa298cb369688cf3f9465ce5e2d4ca10e6e0098a1ce73"
dependencies = [
"libc",
"num-integer",
"num-traits",
"serde",
"time",
"winapi",
]
[[package]]
name = "itoa"
version = "0.4.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b71991ff56294aa922b450139ee08b3bfc70982c6b2c7562771375cf73542dd4"
[[package]]
name = "libc"
version = "0.2.108"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8521a1b57e76b1ec69af7599e75e38e7b7fad6610f037db8c79b127201b5d119"
[[package]]
name = "log_parsing"
version = "0.1.0"
dependencies = [
"chrono",
"serde",
"serde_json",
"walkdir",
]
[[package]]
name = "num-integer"
version = "0.1.44"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d2cc698a63b549a70bc047073d2949cce27cd1c7b0a4a862d08a8031bc2801db"
dependencies = [
"autocfg",
"num-traits",
]
[[package]]
name = "num-traits"
version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9a64b1ec5cda2586e284722486d802acf1f7dbdc623e2bfc57e65ca1cd099290"
dependencies = [
"autocfg",
]
[[package]]
name = "proc-macro2"
version = "1.0.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba508cc11742c0dc5c1659771673afbab7a0efab23aa17e854cbab0837ed0b43"
dependencies = [
"unicode-xid",
]
[[package]]
name = "quote"
version = "1.0.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "38bc8cc6a5f2e3655e0899c1b848643b2562f853f114bfec7be120678e3ace05"
dependencies = [
"proc-macro2",
]
[[package]]
name = "ryu"
version = "1.0.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3c9613b5a66ab9ba26415184cfc41156594925a9cf3a2057e57f31ff145f6568"
[[package]]
name = "same-file"
version = "1.0.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502"
dependencies = [
"winapi-util",
]
[[package]]
name = "serde"
version = "1.0.130"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f12d06de37cf59146fbdecab66aa99f9fe4f78722e3607577a5375d66bd0c913"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.130"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d7bc1a1ab1961464eae040d96713baa5a724a8152c1222492465b54322ec508b"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "serde_json"
version = "1.0.72"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d0ffa0837f2dfa6fb90868c2b5468cad482e175f7dad97e7421951e663f2b527"
dependencies = [
"itoa",
"ryu",
"serde",
]
[[package]]
name = "syn"
version = "1.0.82"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8daf5dd0bb60cbd4137b1b587d2fc0ae729bc07cf01cd70b36a1ed5ade3b9d59"
dependencies = [
"proc-macro2",
"quote",
"unicode-xid",
]
[[package]]
name = "time"
version = "0.1.44"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6db9e6914ab8b1ae1c260a4ae7a49b6c5611b40328a735b21862567685e73255"
dependencies = [
"libc",
"wasi",
"winapi",
]
[[package]]
name = "unicode-xid"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ccb82d61f80a663efe1f787a51b16b5a51e3314d6ac365b08639f52387b33f3"
[[package]]
name = "walkdir"
version = "2.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "808cf2735cd4b6866113f648b791c6adc5714537bc222d9347bb203386ffda56"
dependencies = [
"same-file",
"winapi",
"winapi-util",
]
[[package]]
name = "wasi"
version = "0.10.0+wasi-snapshot-preview1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1a143597ca7c7793eff794def352d41792a93c481eb1042423ff7ff72ba2c31f"
[[package]]
name = "winapi"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",
]
[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
[[package]]
name = "winapi-util"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "70ec6ce85bb158151cae5e5c87f95a8e97d2c0c4b001223f33a334e3ce5de178"
dependencies = [
"winapi",
]
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"


@@ -0,0 +1,10 @@
[package]
name = "log_parsing"
version = "0.1.0"
edition = "2018"
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0" }
chrono = { version = "0.4", features = ["serde"] }
walkdir = "2"


@@ -0,0 +1,260 @@
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::env;
use std::error::Error;
use std::fs::File;
use std::io::{self, BufRead};
use walkdir::WalkDir;
// Applies schema tests to file input
// if these fail, we either have a problem in dbt that needs to be resolved
// or we have changed our interface and the log_version should be bumped in dbt,
// modeled appropriately here, and new docs published for the new log_version.
fn main() -> Result<(), Box<dyn Error>> {
let log_name = "dbt.log";
let path = env::var("LOG_DIR").expect("must pass absolute log path to tests with env var `LOG_DIR=/logs/live/here/`");
println!("Looking for files named `{}` in {}", log_name, path);
let lines: Vec<String> = get_input(&path, log_name)?;
println!("collected {} log lines.", lines.len());
println!("");
println!("testing type-level schema compliance by deserializing each line...");
let log_lines: Vec<LogLine> = deserialized_input(&lines)
.map_err(|e| format!("schema test failure: json doesn't match type definition\n{}", e))?;
println!("Done.");
println!("");
println!("because we skip non-json log lines, there are {} collected values to test.", log_lines.len());
println!("");
// make sure when we read a string in then output it back to a string the two strings
// contain all the same key-value pairs.
println!("testing serialization loop to make sure all key-value pairs are accounted for");
test_deserialize_serialize_is_unchanged(&lines);
println!("Done.");
println!("");
// make sure each log_line contains the values we expect
println!("testing that the field values in each log line are expected");
for log_line in log_lines {
log_line.value_test()
}
println!("Done.");
Ok(())
}
// each nested type of LogLine should define its own value_test function
// that asserts values are within an expected set of values when possible.
trait ValueTest {
fn value_test(&self) -> ();
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
struct LogLine {
log_version: isize,
r#type: String,
code: String,
#[serde(with = "custom_date_format")]
ts: DateTime<Utc>,
pid: isize,
msg: String,
level: String,
invocation_id: String,
thread_name: String,
data: serde_json::Value, // TODO be more specific
node_info: serde_json::Value, // TODO be more specific
}
impl ValueTest for LogLine {
fn value_test(&self){
assert_eq!(
self.log_version, 1,
"The log version changed. Be sure this was intentional."
);
assert_eq!(
self.r#type,
"log_line".to_owned(),
"The type value has changed. If this is intentional, bump the log version"
);
assert!(
["debug", "info", "warn", "error"]
.iter()
.any(|level| **level == self.level),
"log level had unexpected value {}",
self.level
);
}
}
// logs output timestamps like this: "2021-11-30T12:31:04.312814Z"
// which is almost the default format, except for the fractional seconds.
// parsing the date with "%Y-%m-%dT%H:%M:%S%.6f" requires this
// boilerplate-looking module.
mod custom_date_format {
use chrono::{NaiveDateTime, DateTime, Utc};
use serde::{self, Deserialize, Deserializer, Serializer};
const FORMAT: &'static str = "%Y-%m-%dT%H:%M:%S%.6fZ";
pub fn serialize<S>(date: &DateTime<Utc>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let s = format!("{}", date.format(FORMAT));
serializer.serialize_str(&s)
}
pub fn deserialize<'de, D>(deserializer: D) -> Result<DateTime<Utc>, D::Error>
where
D: Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
Ok(DateTime::<Utc>::from_utc(NaiveDateTime::parse_from_str(&s, FORMAT).map_err(serde::de::Error::custom)?, Utc))
}
}
// finds all files in any subdirectory of this path with this name. returns the contents
// of each file line by line as one continuous structure. No distinction between files.
fn get_input(path: &str, file_name: &str) -> Result<Vec<String>, String> {
WalkDir::new(path)
.follow_links(true)
.into_iter()
// filters out all the exceptions encountered on this walk silently
.filter_map(|e| e.ok())
// walks through each file and returns the contents if the filename matches
.filter_map(|e| {
let f_name = e.file_name().to_string_lossy();
if f_name.ends_with(file_name) {
let contents = File::open(e.path())
.map_err(|e| format!("Something went wrong opening the log file {}\n{}", f_name, e))
.and_then(|file| {
io::BufReader::new(file)
.lines()
.map(|l| {
l.map_err(|e| format!("Something went wrong reading lines of the log file {}\n{}", f_name, e))
})
.collect::<Result<Vec<String>, String>>()
});
Some(contents)
} else {
None
}
})
.collect::<Result<Vec<Vec<String>>, String>>()
.map(|vv| vv.concat())
}
// attempts to deserialize the strings into LogLines. If the string isn't valid
// json it skips it instead of failing. This is so that any tests that generate
// non-json logs won't break the schema test.
fn deserialized_input(log_lines: &[String]) -> serde_json::Result<Vec<LogLine>> {
log_lines
.into_iter()
// if the log line isn't valid json format, toss it
.filter(|log_line| serde_json::from_str::<serde_json::Value>(log_line).is_ok())
// attempt to deserialize into our LogLine type
.map(|log_line| serde_json::from_str::<LogLine>(log_line))
.collect()
}
// turn a String into a LogLine and back into a String returning both Strings so
// they can be compared
fn deserialize_serialize_loop(
log_lines: &[String],
) -> serde_json::Result<Vec<(String, String)>> {
log_lines
.into_iter()
.map(|log_line| {
serde_json::from_str::<LogLine>(log_line).and_then(|parsed| {
serde_json::to_string(&parsed).map(|json| (log_line.clone(), json))
})
})
.collect()
}
// make sure when we read a string in then output it back to a string the two strings
// contain all the same key-value pairs.
fn test_deserialize_serialize_is_unchanged(lines: &[String]) {
let objects: Result<Vec<(serde_json::Value, serde_json::Value)>, serde_json::Error> =
deserialize_serialize_loop(lines).and_then(|v| {
v.into_iter()
.map(|(s0, s1)| {
serde_json::from_str::<serde_json::Value>(&s0).and_then(|s0v| {
serde_json::from_str::<serde_json::Value>(&s1).map(|s1v| (s0v, s1v))
})
})
.collect()
});
match objects {
Err(e) => assert!(false, "{}", e),
Ok(v) => {
for pair in v {
match pair {
(
serde_json::Value::Object(original),
serde_json::Value::Object(looped),
) => {
// looping through each key of each json value gives us meaningful failure messages
// instead of "this big string" != "this other big string"
for (key, value) in original.clone() {
let looped_val = looped.get(&key);
assert_eq!(
looped_val,
Some(&value),
"original key value ({}, {}) expected in re-serialized result",
key,
value
)
}
for (key, value) in looped.clone() {
let original_val = original.get(&key);
assert_eq!(
original_val,
Some(&value),
"looped key value ({}, {}) not found in original result",
key,
value
)
}
}
_ => assert!(false, "not comparing json objects"),
}
}
}
}
}
#[cfg(test)]
mod tests {
use crate::*;
const LOG_LINE: &str = r#"{"code": "Z023", "data": {"stats": {"error": 0, "pass": 3, "skip": 0, "total": 3, "warn": 0}}, "invocation_id": "f1e1557c-4f9d-4053-bb50-572cbbf2ca64", "level": "info", "log_version": 1, "msg": "Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3", "node_info": {}, "pid": 75854, "thread_name": "MainThread", "ts": "2021-12-03T01:32:38.334601Z", "type": "log_line"}"#;
#[test]
fn test_basic_loop() {
assert!(deserialize_serialize_loop(&[LOG_LINE.to_owned()]).is_ok())
}
#[test]
fn test_values() {
assert!(deserialized_input(&[LOG_LINE.to_owned()]).map(|v| {
v.into_iter().map(|ll| ll.value_test())
}).is_ok())
}
#[test]
fn test_values_loop() {
test_deserialize_serialize_is_unchanged(&[LOG_LINE.to_owned()]);
}
}


@@ -154,3 +154,28 @@ class TestAgateHelper(unittest.TestCase):
for i, row in enumerate(tbl):
self.assertEqual(list(row), expected[i])
def test_nocast_bool_01(self):
# True and False values should not be cast to 1 and 0, and vice versa
# See: https://github.com/dbt-labs/dbt-core/issues/4511
column_names = ['a', 'b']
result_set = [
{'a': True, 'b': 1},
{'a': False, 'b': 0},
]
tbl = agate_helper.table_from_data_flat(data=result_set, column_names=column_names)
self.assertEqual(len(tbl), len(result_set))
assert isinstance(tbl.column_types[0], agate.data_types.Boolean)
assert isinstance(tbl.column_types[1], agate.data_types.Number)
expected = [
[True, Decimal(1)],
[False, Decimal(0)],
]
for i, row in enumerate(tbl):
self.assertEqual(list(row), expected[i])


@@ -0,0 +1,59 @@
import functools
import pytest
from requests.exceptions import RequestException
from dbt.exceptions import ConnectionException
from dbt.utils import _connection_exception_retry
def no_retry_fn():
return "success"
class TestNoRetries:
def test_no_retry(self):
fn_to_retry = functools.partial(no_retry_fn)
result = _connection_exception_retry(fn_to_retry, 3)
expected = "success"
assert result == expected
def no_success_fn():
raise RequestException("You'll never pass")
return "failure"
class TestMaxRetries:
def test_no_retry(self):
fn_to_retry = functools.partial(no_success_fn)
with pytest.raises(ConnectionException):
_connection_exception_retry(fn_to_retry, 3)
def single_retry_fn():
global counter
if counter == 0:
counter += 1
raise RequestException("You won't pass this one time")
elif counter == 1:
counter += 1
return "success on 2"
return "How did we get here?"
class TestSingleRetry:
def test_no_retry(self):
global counter
counter = 0
fn_to_retry = functools.partial(single_retry_fn)
result = _connection_exception_retry(fn_to_retry, 3)
expected = "success on 2"
# We need to test the return value here, not just that it did not throw an error.
# If the value is not being passed it causes cryptic errors
assert result == expected
assert counter == 2


@@ -1,4 +1,5 @@
import requests
import tarfile
import unittest
from dbt.exceptions import ConnectionException
@@ -11,21 +12,21 @@ class TestCoreDbtUtils(unittest.TestCase):
connection_exception_retry(lambda: Counter._add(), 5)
self.assertEqual(1, counter)
def test_connection_exception_retry_success_requests_exception(self):
Counter._reset()
connection_exception_retry(lambda: Counter._add_with_requests_exception(), 5)
self.assertEqual(2, counter) # 2 = original attempt returned None, plus 1 retry
def test_connection_exception_retry_max(self):
Counter._reset()
with self.assertRaises(ConnectionException):
connection_exception_retry(lambda: Counter._add_with_exception(), 5)
self.assertEqual(6, counter) # 6 = original attempt plus 5 retries
def test_connection_exception_retry_success(self):
def test_connection_exception_retry_success_failed_untar(self):
Counter._reset()
connection_exception_retry(lambda: Counter._add_with_limited_exception(), 5)
self.assertEqual(2, counter) # 2 = original attempt plus 1 retry
def test_connection_exception_retry_success_none_response(self):
Counter._reset()
connection_exception_retry(lambda: Counter._add_with_none_exception(), 5)
self.assertEqual(2, counter) # 2 = original attempt returned None, plus 1 retry
connection_exception_retry(lambda: Counter._add_with_untar_exception(), 5)
self.assertEqual(2, counter) # 2 = original attempt returned ReadError, plus 1 retry
counter:int = 0
@@ -33,20 +34,23 @@ class Counter():
def _add():
global counter
counter+=1
# All exceptions that Requests explicitly raises inherit from
# requests.exceptions.RequestException, so we make sure to raise that plus one
# exception that inherits from it, for sanity
def _add_with_requests_exception():
global counter
counter+=1
if counter < 2:
raise requests.exceptions.RequestException
def _add_with_exception():
global counter
counter+=1
raise requests.exceptions.ConnectionError
def _add_with_limited_exception():
def _add_with_untar_exception():
global counter
counter+=1
if counter < 2:
raise requests.exceptions.ConnectionError
def _add_with_none_exception():
global counter
counter+=1
if counter < 2:
raise requests.exceptions.ContentDecodingError
raise tarfile.ReadError
def _reset():
global counter
counter = 0

Some files were not shown because too many files have changed in this diff.