Compare commits

...

49 Commits

Author SHA1 Message Date
Nathaniel May
cdb78d0270 Revert "Add Performance Regression Testing [Rust]" 2021-08-11 10:45:31 -04:00
Nathaniel May
1a984601ee Merge pull request #3602 from dbt-labs/performance-regression-testing
Add Performance Regression Testing [Rust]
2021-08-11 10:44:51 -04:00
Jeremy Cohen
454168204c Add build RPC method (#3674)
* Add build RPC method

* Add rpc test, some required flags

* Fix flake8

* PR feedback

* Update changelog [skip ci]

* Do not skip CI when rebasing
2021-08-10 10:51:43 -04:00
Drew Banin
43642956a2 Serialize Undefined values to JSON for rpc requests (#3687)
* (#3464) Serialize Undefined values to JSON for rpc requests

* Update changelog, fix typo
2021-08-09 21:26:09 -04:00
leahwicz
e7b8488be8 Remove converter.py since not used anymore (#3699) 2021-08-05 15:27:56 -04:00
Jeremy Cohen
0efaaf7daf Fix typo [skip ci] 2021-08-04 09:50:11 -04:00
Drew Banin
9ae7d68260 Merge pull request #3686 from dbt-labs/fix/cleanup-audit-integration-tests
Fix: Drop audit schema tests in tearDown for test suite
2021-08-03 19:54:36 -04:00
Github Build Bot
45fe76eef4 Merge remote-tracking branch 'origin/releases/0.21.0b1' into develop 2021-08-03 18:09:56 +00:00
Github Build Bot
ea772ae419 Release dbt v0.21.0b1 2021-08-03 17:30:32 +00:00
Drew Banin
c68fca7937 Fix: Drop audit schema tests in tearDown for test suite 2021-08-03 13:24:54 -04:00
Jeremy Cohen
159e79ee6b Update changelog in advance of v0.21.0b1 (#3678)
* Fixup Changelog

* More updates [skip ci]
2021-08-02 20:08:22 -04:00
leahwicz
57783bb5f6 Adding issue templates for different release types (#3644)
Co-authored-by: Kyle Wigley <kyle@fishtownanalytics.com>
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
2021-08-02 12:50:49 -04:00
Nathaniel May
d73ee588e5 Merge pull request #3637 from dbt-labs/experimental-parser-fix
make experimental parser respect config merge behavior
2021-08-02 10:03:42 -04:00
Nathaniel May
40089d710b experimental parser respects config merge behavior 2021-08-02 09:38:30 -04:00
Jeremy Cohen
6ec61950eb Handle exception from tracker.flush() (#3661) 2021-08-02 08:25:41 -04:00
Gerda Shank
72c831a80a Merge pull request #3659 from dbt-labs/pp_internal_macro_processing
[#3636] Check for unique_ids when recursively removing macros
2021-07-30 15:34:14 -04:00
Gerda Shank
929931a26a Merge pull request #3654 from dbt-labs/change_config_call_handling
Switch from config_call list to config_call_dict dictionary
2021-07-30 14:08:30 -04:00
Gerda Shank
577e2438c1 [#3636] Check for unique_ids when recursively removing macros 2021-07-30 14:01:40 -04:00
Kyle Wigley
2679792199 Add tracking event for full re-parse reasoning (#3652)
* add tracking event for full reparse reason

* update changelog
2021-07-30 09:39:09 -04:00
Kyle Wigley
2adf982991 update links to dbt repo (#3521) 2021-07-30 08:46:58 -04:00
Gerda Shank
1fb4a7f428 Switch from config_call list to config_call_dict dictionary 2021-07-29 18:46:59 -04:00
Kyle Wigley
30e72bc5e2 Use SchemaParser render context to render test configs (#3646)
* use available context when rendering test configs

* add test

* update changelog
2021-07-29 12:59:48 -04:00
Jeremy Cohen
35645a7233 Include dbt-docs changes for 0.20.1-rc1 (#3643) 2021-07-29 09:56:04 -04:00
Gerda Shank
d583c8d737 Merge pull request #3632 from dbt-labs/pp_delete_schema_macro_patch
[#3627] Improve findability of macro_patches, schedule right macro file for processing
2021-07-28 17:49:27 -04:00
Gerda Shank
a83f00c594 [#3627] Improve findability of macro_patches, schedule right macro file
for processing
2021-07-28 17:27:42 -04:00
Daniele Frigo
c448702c1b Use old_relation for renaming in default materializations (#3547)
* table and view materializations should rename from old_relation to manage changes from view to table and reverse

* edited changelog

* edited changelog

* Update CHANGELOG.md

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>
2021-07-28 06:59:27 -04:00
Niall Woodward
558a6a03ac Fix PR link in changelog (#3639)
Fix a typo introduced in https://github.com/dbt-labs/dbt/pull/3624
2021-07-28 06:51:45 -04:00
Niall Woodward
52ec7907d3 dbt deps prerelease install bugs + add install-prerelease parameter to packages.yml (#3624)
* Fix dbt deps prerelease install bugs

* Add install-prerelease parameter to hub packages in packages.yml
2021-07-27 21:59:46 -04:00
Jeremy Cohen
792f39a888 Snowflake: no transactions, except for DML (#3510)
* Rm Snowflake txnal logic. Explicit for DML

* Be less clever. Update create_or_replace_view()

* Seed DML as well

* Changelog entry

* Fix unit test

* One semicolon can change the world
2021-07-27 18:13:35 -04:00
Gerda Shank
16264f58c1 Merge pull request #3621 from dbt-labs/pp_macro_link_processing_error
[#3584] Partial parsing: handle source tests when changing test macro
2021-07-27 16:59:26 -04:00
Nathaniel May
2317c0c3c8 Merge pull request #3630 from dbt-labs/nate-3568
fix awkward exception being raised by a yml file with all comments
2021-07-27 16:50:56 -04:00
Gerda Shank
3c09ab9736 [#3584] Partial parsing: handle source tests when changing test macro 2021-07-27 16:34:23 -04:00
Gerda Shank
f10dc0e1b3 Merge pull request #3618 from dbt-labs/pp_yaml_version
[#3567] Fix partial parsing error with version key if previous file is empty
2021-07-27 16:30:06 -04:00
leahwicz
634bc41d8a Secret scrubbing for env variables (#3617)
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
2021-07-27 16:06:10 -04:00
Gerda Shank
d7ea3648c6 [#3567] Fix partial parsing error with version key if previous file is
empty
2021-07-27 15:38:52 -04:00
Gerda Shank
e5c8e19ff2 Merge pull request #3619 from dbt-labs/model_config_iterator
[#3573] Put back config iterator for backwards compatibility
2021-07-27 15:34:51 -04:00
Nathaniel May
93cf1f085f handle None return value from yaml loading 2021-07-27 10:59:27 -04:00
Gerda Shank
a84f824a44 [#3573] Put back config iterator for backwards compatibility 2021-07-26 17:56:35 -04:00
Kyle Wigley
9c58f3465b Fix flaky test related to tracking events (#3604)
* skip all tracking event testing

* Turn off tracking in tests that hits model parsing code path
fix other random test that fails because global tracking.current_user exists but is null

* pytest did not respect skip mark

* fix gh actions
2021-07-26 16:55:16 -04:00
Gerda Shank
0e3778132b Merge pull request #3620 from dbt-labs/pp_already_removed_node
If SQL file already scheduled for parsing, don't reprocess
2021-07-26 15:49:10 -04:00
Jeremy Cohen
72722635f2 Fix error handling in dbt build (#3608)
* RunTask -> BuildTask

* Add test, changelog entry
2021-07-25 22:15:13 -04:00
Gerda Shank
a4c7c7fc55 If SQL file already scheduled for parsing, don't reprocess 2021-07-24 15:43:54 -04:00
Nathaniel May
2bad73eead Merge pull request #3610 from dbt-labs/derp-fix
fixing typo in test
2021-07-23 13:14:55 -04:00
Nathaniel May
67c194dcd1 fixing typo in test 2021-07-22 09:53:26 -04:00
matt-winkler
bd7010678a Feature: on_schema_change for incremental models (#3387)
* detect and act on schema changes

* update incremental helpers code

* update changelog

* fix error in diff_columns from testing

* abstract code a bit further

* address matching names vs. data types

* Update CHANGELOG.md

Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>

* updates from Jeremy's feedback

* multi-column add / remove with full_refresh

* simple changes from JC's feedback

* updated for snowflake

* reorganize postgres code

* reorganize approach

* updated full refresh trigger logic

* fixed unintentional wipe behavior

* catch final else condition

* remove WHERE string replace

* touch ups

* port core to snowflake

* added bigquery code

* updated impacted unit tests

* updates from linting tests

* updates from linting again

* snowflake updates from further testing

* fix logging

* clean up incremental logic

* updated for bigquery

* update postgres with new strategy

* update nodeconfig

* starting integration tests

* integration test for ignore case

* add test for append_new_columns

* add integration test for sync

* remove extra tests

* add unique key and snowflake test

* move incremental integration test dir

* update integration tests

* update integration tests

* Suggestions for #3387 (#3558)

* PR feedback: rationalize macros + logging, fix + expand tests

* Rm alter_column_types, always true for sync_all_columns

* update logging and integration test on sync

* update integration tests

* test fix SF integration tests

Co-authored-by: Matt Winkler <matt.winkler@fishtownanalytics.com>

* rename integration test folder

* Update core/dbt/include/global_project/macros/materializations/incremental/incremental.sql

Accept Jeremy's suggested change

Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>

* Update changelog [skip ci]

Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
2021-07-21 15:49:19 -04:00
leahwicz
9f716b31b3 Moving unit tests into separate workflow (#3588)
* Moving unit tests into separate workflow

* Fixing CircleCI error
2021-07-21 12:35:04 -04:00
Kyle Wigley
3dd486d8fa Source freshness task node selection and cli command parity (#3554)
* cli: add selection args for source freshness command

* rename command to `source freshness` and maintain alias to old command

* update and add tests for source freshness command and node selection

* update changelog, add comments

* fix formatting

* update changelog
2021-07-21 10:31:40 -04:00
Jeremy Cohen
33217891ca Refactor relationships test to support where config (#3583)
* Rewrite relationships with CTEs

* Update changelog PR num [skip ci]
2021-07-20 19:28:09 -04:00
dependabot[bot]
1d37c4e555 Update snowflake-connector-python[secure-local-storage] requirement (#3594)
Updates the requirements on [snowflake-connector-python[secure-local-storage]](https://github.com/snowflakedb/snowflake-connector-python) to permit the latest version.
- [Release notes](https://github.com/snowflakedb/snowflake-connector-python/releases)
- [Commits](https://github.com/snowflakedb/snowflake-connector-python/commits)

---
updated-dependencies:
- dependency-name: snowflake-connector-python[secure-local-storage]
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-20 14:01:35 -04:00
165 changed files with 3319 additions and 2724 deletions

View File

@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.21.0a1
current_version = 0.21.0b1
parse = (?P<major>\d+)
\.(?P<minor>\d+)
\.(?P<patch>\d+)
@@ -47,3 +47,4 @@ first_value = 1
[bumpversion:file:plugins/snowflake/dbt/adapters/snowflake/__version__.py]
[bumpversion:file:plugins/bigquery/dbt/adapters/bigquery/__version__.py]

View File

@@ -1,22 +1,12 @@
version: 2.1
jobs:
unit:
build-wheels:
docker: &test_only
- image: fishtownanalytics/test-container:12
environment:
DBT_INVOCATION_ENV: circle
DOCKER_TEST_DATABASE_HOST: "database"
TOX_PARALLEL_NO_SPINNER: 1
steps:
- checkout
- run: tox -p -e py36,py37,py38
lint:
docker: *test_only
steps:
- checkout
- run: tox -e mypy,flake8 -- -v
build-wheels:
docker: *test_only
steps:
- checkout
- run:
@@ -99,24 +89,12 @@ workflows:
version: 2
test-everything:
jobs:
- lint
- unit
- integration-postgres:
requires:
- unit
- integration-redshift:
requires:
- unit
- integration-bigquery:
requires:
- unit
- integration-snowflake:
requires:
- unit
- integration-postgres
- integration-redshift
- integration-bigquery
- integration-snowflake
- build-wheels:
requires:
- lint
- unit
- integration-postgres
- integration-redshift
- integration-bigquery

View File

@@ -0,0 +1,27 @@
---
name: Beta minor version release
about: Creates a tracking checklist of items for a Beta minor version release
title: "[Tracking] v#.##.#B# release "
labels: 'release'
assignees: ''
---
### Release Core
- [ ] [Engineering] Follow [dbt-release workflow](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#03ff37da697d4d8ba63d24fae1bfa817)
- [ ] [Engineering] Verify new release branch is created in the repo
- [ ] [Product] Finalize migration guide (next.docs.getdbt.com)
### Release Cloud
- [ ] [Engineering] Create a platform issue to update dbt Cloud and verify it is completed. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Engineering] Determine if schemas have changed. If so, generate new schemas and push to schemas.getdbt.com
### Announce
- [ ] [Product] Announce in dbt Slack
### Post-release
- [ ] [Engineering] [Bump plugin versions](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#f01854e8da3641179fbcbe505bdf515c) (dbt-spark + dbt-presto), add compatibility as needed
- [ ] [Spark](https://github.com/dbt-labs/dbt-spark)
- [ ] [Presto](https://github.com/dbt-labs/dbt-presto)
- [ ] [Engineering] Create a platform issue to update dbt-spark versions to dbt Cloud. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Engineering] Create an epic for the RC release

View File

@@ -0,0 +1,28 @@
---
name: Final minor version release
about: Creates a tracking checklist of items for a final minor version release
title: "[Tracking] v#.##.# final release "
labels: 'release'
assignees: ''
---
### Release Core
- [ ] [Engineering] Verify all necessary changes exist on the release branch
- [ ] [Engineering] Follow [dbt-release workflow](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#03ff37da697d4d8ba63d24fae1bfa817)
- [ ] [Product] Merge `next` into `current` for docs.getdbt.com
### Release Cloud
- [ ] [Engineering] Create a platform issue to update dbt Cloud and verify it is completed. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Engineering] Determine if schemas have changed. If so, generate new schemas and push to schemas.getdbt.com
### Announce
- [ ] [Product] Update discourse
- [ ] [Product] Announce in dbt Slack
### Post-release
- [ ] [Engineering] [Bump plugin versions](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#f01854e8da3641179fbcbe505bdf515c) (dbt-spark + dbt-presto), add compatibility as needed
- [ ] [Spark](https://github.com/dbt-labs/dbt-spark)
- [ ] [Presto](https://github.com/dbt-labs/dbt-presto)
- [ ] [Engineering] Create a platform issue to update dbt-spark versions to dbt Cloud. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Product] Release new version of dbt-utils with new dbt version compatibility. If there are breaking changes requiring a minor version, plan upgrades of other packages that depend on dbt-utils.

View File

@@ -1,29 +0,0 @@
---
name: Minor version release
about: Creates a tracking checklist of items for a minor version release
title: "[Tracking] v#.##.# release "
labels: ''
assignees: ''
---
### Release Core
- [ ] [Engineering] dbt-release workflow
- [ ] [Engineering] Create new protected `x.latest` branch
- [ ] [Product] Finalize migration guide (next.docs.getdbt.com)
### Release Cloud
- [ ] [Engineering] Create a platform issue to update dbt Cloud and verify it is completed
- [ ] [Engineering] Determine if schemas have changed. If so, generate new schemas and push to schemas.getdbt.com
### Announce
- [ ] [Product] Publish discourse
- [ ] [Product] Announce in dbt Slack
### Post-release
- [ ] [Engineering] [Bump plugin versions](https://www.notion.so/fishtownanalytics/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#59571f5bc1a040d9a8fd096e23d2c7db) (dbt-spark + dbt-presto), add compatibility as needed
- [ ] Spark
- [ ] Presto
- [ ] [Engineering] Create a platform issue to update dbt-spark versions to dbt Cloud
- [ ] [Product] Release new version of dbt-utils with new dbt version compatibility. If there are breaking changes requiring a minor version, plan upgrades of other packages that depend on dbt-utils.
- [ ] [Engineering] If this isn't a final release, create an epic for the next release

View File

@@ -0,0 +1,29 @@
---
name: RC minor version release
about: Creates a tracking checklist of items for a RC minor version release
title: "[Tracking] v#.##.#RC# release "
labels: 'release'
assignees: ''
---
### Release Core
- [ ] [Engineering] Verify all necessary changes exist on the release branch
- [ ] [Engineering] Follow [dbt-release workflow](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#03ff37da697d4d8ba63d24fae1bfa817)
- [ ] [Product] Update migration guide (next.docs.getdbt.com)
### Release Cloud
- [ ] [Engineering] Create a platform issue to update dbt Cloud and verify it is completed. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Engineering] Determine if schemas have changed. If so, generate new schemas and push to schemas.getdbt.com
### Announce
- [ ] [Product] Publish discourse
- [ ] [Product] Announce in dbt Slack
### Post-release
- [ ] [Engineering] [Bump plugin versions](https://www.notion.so/dbtlabs/Releasing-b97c5ea9a02949e79e81db3566bbc8ef#f01854e8da3641179fbcbe505bdf515c) (dbt-spark + dbt-presto), add compatibility as needed
- [ ] [Spark](https://github.com/dbt-labs/dbt-spark)
- [ ] [Presto](https://github.com/dbt-labs/dbt-presto)
- [ ] [Engineering] Create a platform issue to update dbt-spark versions to dbt Cloud. [Example issue](https://github.com/dbt-labs/dbt-cloud/issues/3481)
- [ ] [Product] Release new version of dbt-utils with new dbt version compatibility. If there are breaking changes requiring a minor version, plan upgrades of other packages that depend on dbt-utils.
- [ ] [Engineering] Create an epic for the final release

View File

@@ -1,181 +0,0 @@
name: Performance Regression Testing
# Schedule triggers
on:
# TODO this is just while developing
pull_request:
branches:
- 'develop'
- 'performance-regression-testing'
schedule:
# runs twice a day at 10:05am and 10:05pm
- cron: '5 10,22 * * *'
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
jobs:
# checks fmt of runner code
# purposefully not a dependency of any other job
# will block merging, but not prevent developing
fmt:
name: Cargo fmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add rustfmt
- uses: actions-rs/cargo@v1
with:
command: fmt
args: --manifest-path performance/runner/Cargo.toml --all -- --check
# runs any tests associated with the runner
# these tests make sure the runner logic is correct
test-runner:
name: Test Runner
runs-on: ubuntu-latest
env:
# turn warnings into errors
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: test
args: --manifest-path performance/runner/Cargo.toml
# build an optimized binary to be used as the runner in later steps
build-runner:
needs: [test-runner]
name: Build Runner
runs-on: ubuntu-latest
env:
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: build
args: --release --manifest-path performance/runner/Cargo.toml
- uses: actions/upload-artifact@v2
with:
name: runner
path: performance/runner/target/release/runner
# run the performance measurements on the current or default branch
measure-dev:
needs: [build-runner]
name: Measure Dev Branch
runs-on: ubuntu-latest
steps:
- name: checkout dev
uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
- name: install dbt
run: pip install -r dev-requirements.txt -r editable-requirements.txt
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run
run: ./runner measure -b dev -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: dev-results
path: performance/results/
# run the performance measurements on the release branch which we use
# as a performance baseline. This part takes by far the longest, so
# we do everything we can first so the job fails fast.
# -----
# we need to checkout dbt twice in this job: once for the baseline dbt
# version, and once to get the latest regression testing projects,
# metrics, and runner code from the develop or current branch so that
# the calculations match for both versions of dbt we are comparing.
measure-baseline:
needs: [build-runner]
name: Measure Baseline Branch
runs-on: ubuntu-latest
steps:
- name: checkout latest
uses: actions/checkout@v2
with:
ref: '0.20.latest'
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
- name: move repo up a level
run: mkdir ${{ github.workspace }}/../baseline/ && cp -r ${{ github.workspace }} ${{ github.workspace }}/../baseline
- name: "[debug] ls new dbt location"
run: ls ${{ github.workspace }}/../baseline/dbt/
# installation creates egg-links so we have to preserve source
- name: install dbt from new location
run: cd ${{ github.workspace }}/../baseline/dbt/ && pip install -r dev-requirements.txt -r editable-requirements.txt
# checkout the current branch to get all the target projects
# this deletes the old checked out code which is why we had to copy before
- name: checkout dev
uses: actions/checkout@v2
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run runner
run: ./runner measure -b baseline -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: baseline-results
path: performance/results/
# detect regressions on the output generated from measuring
# the two branches. Exits with non-zero code if a regression is detected.
calculate-regressions:
needs: [measure-dev, measure-baseline]
name: Compare Results
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
with:
name: dev-results
- uses: actions/download-artifact@v2
with:
name: baseline-results
- name: "[debug] ls result files"
run: ls
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run calculation
run: ./runner calculate -r ./
# always attempt to upload the results even if there were regressions found
- uses: actions/upload-artifact@v2
if: ${{ always() }}
with:
name: final-calculations
path: ./final_calculations.json
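The workflow removed above measures the same dbt commands on the dev and baseline branches with hyperfine, then has the Rust runner compare the two result sets and fail the job if a regression is detected. The sketch below is a rough Python illustration of that comparison step only; the result shape, metric names, and 5% threshold are assumptions, not the real runner's behavior.

```python
# Hypothetical sketch of the regression calculation: compare mean timings from
# the dev branch against the baseline branch and exit non-zero on a slowdown.
# The result shape and the 5% threshold are assumptions, not the runner's format.
import sys

THRESHOLD = 0.05  # flag anything more than 5% slower than baseline

def find_regressions(baseline: dict, dev: dict) -> list:
    regressions = []
    for metric, base_mean in baseline.items():
        dev_mean = dev.get(metric)
        if dev_mean is not None and dev_mean > base_mean * (1 + THRESHOLD):
            regressions.append((metric, base_mean, dev_mean))
    return regressions

if __name__ == "__main__":
    baseline = {"parse 01_dummy_project": 1.20, "parse 02_wide_models": 4.80}
    dev = {"parse 01_dummy_project": 1.22, "parse 02_wide_models": 5.90}
    found = find_regressions(baseline, dev)
    for metric, base_mean, dev_mean in found:
        print(f"regression: {metric} went from {base_mean:.2f}s to {dev_mean:.2f}s")
    sys.exit(1 if found else 0)  # non-zero exit fails the CI job
```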

View File

@@ -1,4 +1,4 @@
# This is a workflow to run our unit and integration tests for windows and mac
# This is a workflow to run our integration tests for windows and mac
name: dbt Tests
@@ -10,7 +10,7 @@ on:
- 'develop'
- '*.latest'
- 'releases/*'
pull_request_target:
pull_request:
branches:
- 'develop'
- '*.latest'
@@ -20,45 +20,9 @@ on:
workflow_dispatch:
jobs:
Linting:
runs-on: ubuntu-latest #no need to run on every OS
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
architecture: 'x64'
- name: 'Install dependencies'
run: python -m pip install --upgrade pip && pip install tox
- name: 'Linting'
run: tox -e mypy,flake8 -- -v
UnitTest:
strategy:
matrix:
os: [windows-latest, ubuntu-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
architecture: 'x64'
- name: 'Install dependencies'
run: python -m pip install --upgrade pip && pip install tox
- name: 'Run unit tests'
run: python -m tox -e py -- -v
PostgresIntegrationTest:
runs-on: 'windows-latest' #TODO: Add Mac support
environment: 'Postgres'
needs: UnitTest
steps:
- uses: actions/checkout@v2
- name: 'Install postgresql and set up database'
@@ -98,7 +62,6 @@ jobs:
os: [windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
environment: 'Snowflake'
needs: UnitTest
steps:
- uses: actions/checkout@v2
- name: Setup Python
@@ -132,7 +95,6 @@ jobs:
os: [windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
environment: 'Bigquery'
needs: UnitTest
steps:
- uses: actions/checkout@v2
- name: Setup Python
@@ -156,7 +118,6 @@ jobs:
os: [windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
environment: 'Redshift'
needs: UnitTest
steps:
- uses: actions/checkout@v2
- name: Setup Python

.github/workflows/unit_tests.yml (new file, 61 lines)
View File

@@ -0,0 +1,61 @@
# This is a workflow to run our linting and unit tests for windows, mac, and linux
name: Linting and Unit Tests
# Triggers
on:
# Trigger on commits to develop and releases branches
push:
branches:
- 'develop'
- '*.latest'
- 'releases/*'
pull_request: # Trigger for all PRs
workflow_dispatch: # Allow manual triggers
jobs:
Linting:
runs-on: ubuntu-latest #no need to run on every OS
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.6'
architecture: 'x64'
- name: 'Install dependencies'
run: python -m pip install --upgrade pip && pip install tox
- name: 'Linting'
run: tox -e mypy,flake8 -- -v
UnitTest:
strategy:
matrix:
os: [windows-latest, ubuntu-latest, macos-latest]
runs-on: ${{ matrix.os }}
needs: Linting
steps:
- uses: actions/checkout@v2
- name: Setup Python 3.6
uses: actions/setup-python@v2.2.2
with:
python-version: '3.6'
architecture: 'x64'
- name: Setup Python 3.7
uses: actions/setup-python@v2.2.2
with:
python-version: '3.7'
architecture: 'x64'
- name: Setup Python 3.8
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
architecture: 'x64'
- name: 'Install dependencies'
run: python -m pip install --upgrade pip && pip install tox
- name: 'Run unit tests'
run: tox -p -e py36,py37,py38

View File

@@ -1,36 +1,74 @@
## dbt 0.21.0 (Release TBD)
### Under the hood
- Add `build` RPC method, and a subset of flags for `build` task ([#3595](https://github.com/dbt-labs/dbt/issues/3595), [#3674](https://github.com/dbt-labs/dbt/pull/3674))
## dbt 0.21.0b1 (August 03, 2021)
### Breaking changes
- Add full node selection to source freshness command and align selection syntax with other tasks (`dbt source freshness --select source_name` --> `dbt source freshness --select source:source_name`) and rename `dbt source snapshot-freshness` -> `dbt source freshness`. ([#2987](https://github.com/dbt-labs/dbt/issues/2987), [#3554](https://github.com/dbt-labs/dbt/pull/3554))
- **dbt-snowflake:** Turn off transactions and turn on `autocommit` by default. Explicitly specify `begin` and `commit` for DML statements in incremental and snapshot materializations. Note that this may affect user-space code that depends on transactions.
### Features
- Add `dbt build` command to run models, tests, seeds, and snapshots in DAG order. ([#2743] (https://github.com/dbt-labs/dbt/issues/2743), [#3490] (https://github.com/dbt-labs/dbt/issues/3490))
- Add `dbt build` command to run models, tests, seeds, and snapshots in DAG order. ([#2743](https://github.com/dbt-labs/dbt/issues/2743), [#3490](https://github.com/dbt-labs/dbt/issues/3490), [#3608](https://github.com/dbt-labs/dbt/issues/3608))
- Introduce `on_schema_change` config to detect and handle schema changes on incremental models ([#1132](https://github.com/fishtown-analytics/dbt/issues/1132), [#3387](https://github.com/fishtown-analytics/dbt/issues/3387))
### Fixes
- Fix docs generation for cross-db sources in REDSHIFT RA3 node ([#3236](https://github.com/fishtown-analytics/dbt/issues/3236), [#3408](https://github.com/fishtown-analytics/dbt/pull/3408))
- Fix type coercion issues when fetching query result sets ([#2984](https://github.com/fishtown-analytics/dbt/issues/2984), [#3499](https://github.com/fishtown-analytics/dbt/pull/3499))
- Handle whitespace after a plus sign on the project config ([#3526](https://github.com/dbt-labs/dbt/pull/3526))
- Fix table and view materialization issue when switching from one to the other ([#2161](https://github.com/dbt-labs/dbt/issues/2161), [#3547](https://github.com/dbt-labs/dbt/pull/3547))
- Fix for RPC requests that raise a RecursionError when serializing Undefined values as JSON ([#3464](https://github.com/dbt-labs/dbt/issues/3464), [#3687](https://github.com/dbt-labs/dbt/pull/3687))
### Under the hood
- Add performance regression testing [#3602](https://github.com/dbt-labs/dbt/pull/3602)
- Improve default view and table materialization performance by checking relational cache before attempting to drop temp relations ([#3112](https://github.com/fishtown-analytics/dbt/issues/3112), [#3468](https://github.com/fishtown-analytics/dbt/pull/3468))
- Add optional `sslcert`, `sslkey`, and `sslrootcert` profile arguments to the Postgres connector. ([#3472](https://github.com/fishtown-analytics/dbt/pull/3472), [#3473](https://github.com/fishtown-analytics/dbt/pull/3473))
- Move the example project used by `dbt init` into `dbt` repository, to avoid cloning an external repo ([#3005](https://github.com/fishtown-analytics/dbt/pull/3005), [#3474](https://github.com/fishtown-analytics/dbt/pull/3474), [#3536](https://github.com/fishtown-analytics/dbt/pull/3536))
- Better interaction between `dbt init` and adapters. Avoid raising errors while initializing a project ([#2814](https://github.com/fishtown-analytics/dbt/pull/2814), [#3483](https://github.com/fishtown-analytics/dbt/pull/3483))
- Update `create_adapter_plugins` script to include latest accessories, and stay up to date with latest dbt-core version ([#3002](https://github.com/fishtown-analytics/dbt/issues/3002), [#3509](https://github.com/fishtown-analytics/dbt/pull/3509))
- Scrub environment secrets from logs and console output ([#3617](https://github.com/dbt-labs/dbt/pull/3617))
### Dependencies
- Require `werkzeug>=1`
- Require `werkzeug>=1` ([#3590](https://github.com/dbt-labs/dbt/pull/3590))
Contributors:
- [@kostek-pl](https://github.com/kostek-pl) ([#3236](https://github.com/fishtown-analytics/dbt/pull/3408))
- [@matt-winkler](https://github.com/matt-winkler) ([#3387](https://github.com/dbt-labs/dbt/pull/3387))
- [@tconbeer](https://github.com/tconbeer) ([#3468](https://github.com/fishtown-analytics/dbt/pull/3468))
- [@JLDLaughlin](https://github.com/JLDLaughlin) ([#3473](https://github.com/fishtown-analytics/dbt/pull/3473))
- [@jmriego](https://github.com/jmriego) ([#3526](https://github.com/dbt-labs/dbt/pull/3526))
- [@danielefrigo](https://github.com/danielefrigo) ([#3547](https://github.com/dbt-labs/dbt/pull/3547))
## dbt 0.20.1 (Release TBD)
### Fixes
- Fix `store_failures` config when defined as a modifier for `unique` and `not_null` tests ([#3575](https://github.com/fishtown-analytics/dbt/issues/3575), [#3577](https://github.com/fishtown-analytics/dbt/pull/3577))
### Features
- Adds `install-prerelease` parameter to hub packages in `packages.yml`. When set to `True`, allows prerelease packages to be installed; the parameter defaults to `False`.
### Fixes
- Fix config merge behavior with experimental parser ([#3640](https://github.com/dbt-labs/dbt/pull/3640), [#3637](https://github.com/dbt-labs/dbt/pull/3637))
- Fix `store_failures` config when defined as a modifier for `unique` and `not_null` tests ([#3575](https://github.com/fishtown-analytics/dbt/issues/3575), [#3577](https://github.com/fishtown-analytics/dbt/pull/3577))
- Fix `where` config with `relationships` test by refactoring test SQL. Note: The default `relationships` test now includes CTEs, and may need reimplementing on adapters that don't support CTEs nested inside subqueries. ([#3579](https://github.com/fishtown-analytics/dbt/issues/3579), [#3583](https://github.com/fishtown-analytics/dbt/pull/3583))
- Fix `dbt deps` version comparison logic which was causing incorrect pre-release package versions to be installed. ([#3578](https://github.com/dbt-labs/dbt/issues/3578), [#3609](https://github.com/dbt-labs/dbt/issues/3609))
- Fix exception on yml files with all comments ([#3568](https://github.com/dbt-labs/dbt/issues/3568), [#3630](https://github.com/dbt-labs/dbt/issues/3630))
- Partial parsing: don't reprocess SQL file already scheduled ([#3589](https://github.com/dbt-labs/dbt/issues/3589), [#3620](https://github.com/dbt-labs/dbt/pull/3620))
- Handle iterator functions in model config ([#3573](https://github.com/dbt-labs/dbt/issues/3573), [#3619](https://github.com/dbt-labs/dbt/issues/3619))
- Partial parsing: fix error after changing empty yaml file ([#3567](https://github.com/dbt-labs/dbt/issues/3567), [#3618](https://github.com/dbt-labs/dbt/pull/3618))
- Partial parsing: handle source tests when changing test macro ([#3584](https://github.com/dbt-labs/dbt/issues/3584), [#3620](https://github.com/dbt-labs/dbt/pull/3620))
- Partial parsing: schedule new macro file for parsing when macro patching ([#3627](https://github.com/dbt-labs/dbt/issues/3627), [#3627](https://github.com/dbt-labs/dbt/pull/3627))
- Use `SchemaParser`'s render context to render test configs in order to support `var()` configured at the project level and passed in from the cli ([#3564](https://github.com/dbt-labs/dbt/issues/3564), [#3646](https://github.com/dbt-labs/dbt/pull/3646))
- Partial parsing: check unique_ids when recursively removing macros ([#3636](https://github.com/dbt-labs/dbt/issues/3636), [#3659](https://github.com/dbt-labs/dbt/issues/3659))
### Docs
- Fix docs site crash if `relationships` test has one dependency instead of two ([docs#207](https://github.com/dbt-labs/dbt-docs/issues/207), ([docs#208](https://github.com/dbt-labs/dbt-docs/issues/208)))
### Under the hood
- Handle exceptions from anonymous usage tracking for users of `dbt-snowflake` on Apple M1 chips ([#3162](https://github.com/dbt-labs/dbt/issues/3162), [#3661](https://github.com/dbt-labs/dbt/issues/3661))
- Add tracking for determine why `dbt` needs to re-parse entire project when partial parsing is enabled ([#3572](https://github.com/dbt-labs/dbt/issues/3572), [#3652](https://github.com/dbt-labs/dbt/pull/3652))
Contributors:
- [@NiallRees](https://github.com/NiallRees) ([#3624](https://github.com/dbt-labs/dbt/pull/3624))
## dbt 0.20.0 (July 12, 2021)
@@ -112,7 +150,7 @@ Contributors:
- Use shutil.which so Windows can pick up git.bat as a git executable ([#3035](https://github.com/fishtown-analytics/dbt/issues/3035), [#3134](https://github.com/fishtown-analytics/dbt/issues/3134))
- Add `ssh-client` and update `git` version (using buster backports) in Docker image ([#3337](https://github.com/fishtown-analytics/dbt/issues/3337), [#3338](https://github.com/fishtown-analytics/dbt/pull/3338))
- Add `tags` and `meta` properties to the exposure resource schema. ([#3404](https://github.com/fishtown-analytics/dbt/issues/3404), [#3405](https://github.com/fishtown-analytics/dbt/pull/3405))
- Update test sub-query alias ([#3398](https://github.com/fishtown-analytics/dbt/issues/3398), [#3414](https://github.com/fishtown-analytics/dbt/pull/3414))
- Update test sub-query alias ([#3398](https://github.com/fishtown-analytics/dbt/issues/3398), [#3414](https://github.com/fishtown-analytics/dbt/pull/3414))
- Bump schema versions for run results and manifest artifacts ([#3422](https://github.com/fishtown-analytics/dbt/issues/3422), [#3421](https://github.com/fishtown-analytics/dbt/pull/3421))
- Add deprecation warning for using `packages` argument with `adapter.dispatch` ([#3419](https://github.com/fishtown-analytics/dbt/issues/3419), [#3420](https://github.com/fishtown-analytics/dbt/pull/3420))
@@ -1700,7 +1738,7 @@ Full installation instructions for macOS, Windows, and Linux can be found [here]
#### macOS Installation Instructions
```bash
brew update
brew tap fishtown-analytics/dbt
brew tap dbt-labs/dbt
brew install dbt
```

View File

@@ -24,7 +24,7 @@ Please note that all contributors to `dbt` must sign the [Contributor License Ag
### Defining the problem
If you have an idea for a new feature or if you've discovered a bug in `dbt`, the first step is to open an issue. Please check the list of [open issues](https://github.com/fishtown-analytics/dbt/issues) before creating a new one. If you find a relevant issue, please add a comment to the open issue instead of creating a new one. There are hundreds of open issues in this repository and it can be hard to know where to look for a relevant open issue. **The `dbt` maintainers are always happy to point contributors in the right direction**, so please err on the side of documenting your idea in a new issue if you are unsure where a problem statement belongs.
If you have an idea for a new feature or if you've discovered a bug in `dbt`, the first step is to open an issue. Please check the list of [open issues](https://github.com/dbt-labs/dbt/issues) before creating a new one. If you find a relevant issue, please add a comment to the open issue instead of creating a new one. There are hundreds of open issues in this repository and it can be hard to know where to look for a relevant open issue. **The `dbt` maintainers are always happy to point contributors in the right direction**, so please err on the side of documenting your idea in a new issue if you are unsure where a problem statement belongs.
> **Note:** All community-contributed Pull Requests _must_ be associated with an open issue. If you submit a Pull Request that does not pertain to an open issue, you will be asked to create an issue describing the problem before the Pull Request can be reviewed.
@@ -36,7 +36,7 @@ After you open an issue, a `dbt` maintainer will follow up by commenting on your
If an issue is appropriately well scoped and describes a beneficial change to the `dbt` codebase, then anyone may submit a Pull Request to implement the functionality described in the issue. See the sections below on how to do this.
The `dbt` maintainers will add a `good first issue` label if an issue is suitable for a first-time contributor. This label often means that the required code change is small, limited to one database adapter, or a net-new addition that does not impact existing functionality. You can see the list of currently open issues on the [Contribute](https://github.com/fishtown-analytics/dbt/contribute) page.
The `dbt` maintainers will add a `good first issue` label if an issue is suitable for a first-time contributor. This label often means that the required code change is small, limited to one database adapter, or a net-new addition that does not impact existing functionality. You can see the list of currently open issues on the [Contribute](https://github.com/dbt-labs/dbt/contribute) page.
Here's a good workflow:
- Comment on the open issue, expressing your interest in contributing the required code change
@@ -52,15 +52,15 @@ The `dbt` maintainers use labels to categorize open issues. Some labels indicate
| tag | description |
| --- | ----------- |
| [triage](https://github.com/fishtown-analytics/dbt/labels/triage) | This is a new issue which has not yet been reviewed by a `dbt` maintainer. This label is removed when a maintainer reviews and responds to the issue. |
| [bug](https://github.com/fishtown-analytics/dbt/labels/bug) | This issue represents a defect or regression in `dbt` |
| [enhancement](https://github.com/fishtown-analytics/dbt/labels/enhancement) | This issue represents net-new functionality in `dbt` |
| [good first issue](https://github.com/fishtown-analytics/dbt/labels/good%20first%20issue) | This issue does not require deep knowledge of the `dbt` codebase to implement. This issue is appropriate for a first-time contributor. |
| [help wanted](https://github.com/fishtown-analytics/`dbt`/labels/help%20wanted) / [discussion](https://github.com/fishtown-analytics/dbt/labels/discussion) | Conversation around this issue in ongoing, and there isn't yet a clear path forward. Input from community members is most welcome. |
| [duplicate](https://github.com/fishtown-analytics/dbt/issues/duplicate) | This issue is functionally identical to another open issue. The `dbt` maintainers will close this issue and encourage community members to focus conversation on the other one. |
| [snoozed](https://github.com/fishtown-analytics/dbt/labels/snoozed) | This issue describes a good idea, but one which will probably not be addressed in a six-month time horizon. The `dbt` maintainers will revist these issues periodically and re-prioritize them accordingly. |
| [stale](https://github.com/fishtown-analytics/dbt/labels/stale) | This is an old issue which has not recently been updated. Stale issues will periodically be closed by `dbt` maintainers, but they can be re-opened if the discussion is restarted. |
| [wontfix](https://github.com/fishtown-analytics/dbt/labels/wontfix) | This issue does not require a code change in the `dbt` repository, or the maintainers are unwilling/unable to merge a Pull Request which implements the behavior described in the issue. |
| [triage](https://github.com/dbt-labs/dbt/labels/triage) | This is a new issue which has not yet been reviewed by a `dbt` maintainer. This label is removed when a maintainer reviews and responds to the issue. |
| [bug](https://github.com/dbt-labs/dbt/labels/bug) | This issue represents a defect or regression in `dbt` |
| [enhancement](https://github.com/dbt-labs/dbt/labels/enhancement) | This issue represents net-new functionality in `dbt` |
| [good first issue](https://github.com/dbt-labs/dbt/labels/good%20first%20issue) | This issue does not require deep knowledge of the `dbt` codebase to implement. This issue is appropriate for a first-time contributor. |
| [help wanted](https://github.com/dbt-labs/dbt/labels/help%20wanted) / [discussion](https://github.com/dbt-labs/dbt/labels/discussion) | Conversation around this issue is ongoing, and there isn't yet a clear path forward. Input from community members is most welcome. |
| [duplicate](https://github.com/dbt-labs/dbt/issues/duplicate) | This issue is functionally identical to another open issue. The `dbt` maintainers will close this issue and encourage community members to focus conversation on the other one. |
| [snoozed](https://github.com/dbt-labs/dbt/labels/snoozed) | This issue describes a good idea, but one which will probably not be addressed in a six-month time horizon. The `dbt` maintainers will revisit these issues periodically and re-prioritize them accordingly. |
| [stale](https://github.com/dbt-labs/dbt/labels/stale) | This is an old issue which has not recently been updated. Stale issues will periodically be closed by `dbt` maintainers, but they can be re-opened if the discussion is restarted. |
| [wontfix](https://github.com/dbt-labs/dbt/labels/wontfix) | This issue does not require a code change in the `dbt` repository, or the maintainers are unwilling/unable to merge a Pull Request which implements the behavior described in the issue. |
#### Branching Strategy
@@ -78,17 +78,17 @@ You will need `git` in order to download and modify the `dbt` source code. On ma
### External contributors
If you are not a member of the `fishtown-analytics` GitHub organization, you can contribute to `dbt` by forking the `dbt` repository. For a detailed overview on forking, check out the [GitHub docs on forking](https://help.github.com/en/articles/fork-a-repo). In short, you will need to:
If you are not a member of the `dbt-labs` GitHub organization, you can contribute to `dbt` by forking the `dbt` repository. For a detailed overview on forking, check out the [GitHub docs on forking](https://help.github.com/en/articles/fork-a-repo). In short, you will need to:
1. fork the `dbt` repository
2. clone your fork locally
3. check out a new branch for your proposed changes
4. push changes to your fork
5. open a pull request against `fishtown-analytics/dbt` from your forked repository
5. open a pull request against `dbt-labs/dbt` from your forked repository
### Core contributors
If you are a member of the `fishtown-analytics` GitHub organization, you will have push access to the `dbt` repo. Rather than forking `dbt` to make your changes, just clone the repository, check out a new branch, and push directly to that branch.
If you are a member of the `dbt-labs` GitHub organization, you will have push access to the `dbt` repo. Rather than forking `dbt` to make your changes, just clone the repository, check out a new branch, and push directly to that branch.
## Setting up an environment
@@ -155,7 +155,7 @@ Configure your [profile](https://docs.getdbt.com/docs/configure-your-profile) as
Getting the `dbt` integration tests set up in your local environment will be very helpful as you start to make changes to your local version of `dbt`. The section that follows outlines some helpful tips for setting up the test environment.
Since `dbt` works with a number of different databases, you will need to supply credentials for one or more of these databases in your test environment. Most organizations don't have access to each of a BigQuery, Redshift, Snowflake, and Postgres database, so it's likely that you will be unable to run every integration test locally. Fortunately, Fishtown Analytics provides a CI environment with access to sandboxed Redshift, Snowflake, BigQuery, and Postgres databases. See the section on [_Submitting a Pull Request_](#submitting-a-pull-request) below for more information on this CI setup.
Since `dbt` works with a number of different databases, you will need to supply credentials for one or more of these databases in your test environment. Most organizations don't have access to each of a BigQuery, Redshift, Snowflake, and Postgres database, so it's likely that you will be unable to run every integration test locally. Fortunately, dbt Labs provides a CI environment with access to sandboxed Redshift, Snowflake, BigQuery, and Postgres databases. See the section on [_Submitting a Pull Request_](#submitting-a-pull-request) below for more information on this CI setup.
### Initial setup
@@ -224,7 +224,7 @@ python -m pytest test/unit/test_graph.py::GraphTest::test__dependency_list
> is a list of useful command-line options for `pytest` to use while developing.
## Submitting a Pull Request
Fishtown Analytics provides a sandboxed Redshift, Snowflake, and BigQuery database for use in a CI environment. When pull requests are submitted to the `fishtown-analytics/dbt` repo, GitHub will trigger automated tests in CircleCI and Azure Pipelines.
dbt Labs provides a sandboxed Redshift, Snowflake, and BigQuery database for use in a CI environment. When pull requests are submitted to the `dbt-labs/dbt` repo, GitHub will trigger automated tests in CircleCI and Azure Pipelines.
A `dbt` maintainer will review your PR. They may suggest code revision for style or clarity, or request that you add unit or integration test(s). These are good things! We believe that, with a little bit of help, anyone can contribute high-quality code.

View File

@@ -1,73 +0,0 @@
#!/usr/bin/env python
import json
import yaml
import sys
import argparse
from datetime import datetime, timezone
import dbt.clients.registry as registry
def yaml_type(fname):
with open(fname) as f:
return yaml.load(f)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument("--project", type=yaml_type, default="dbt_project.yml")
parser.add_argument("--namespace", required=True)
return parser.parse_args()
def get_full_name(args):
return "{}/{}".format(args.namespace, args.project["name"])
def init_project_in_packages(args, packages):
full_name = get_full_name(args)
if full_name not in packages:
packages[full_name] = {
"name": args.project["name"],
"namespace": args.namespace,
"latest": args.project["version"],
"assets": {},
"versions": {},
}
return packages[full_name]
def add_version_to_package(args, project_json):
project_json["versions"][args.project["version"]] = {
"id": "{}/{}".format(get_full_name(args), args.project["version"]),
"name": args.project["name"],
"version": args.project["version"],
"description": "",
"published_at": datetime.now(timezone.utc).astimezone().isoformat(),
"packages": args.project.get("packages") or [],
"works_with": [],
"_source": {
"type": "github",
"url": "",
"readme": "",
},
"downloads": {
"tarball": "",
"format": "tgz",
"sha1": "",
},
}
def main():
args = parse_args()
packages = registry.packages()
project_json = init_project_in_packages(args, packages)
if args.project["version"] in project_json["versions"]:
raise Exception("Version {} already in packages JSON"
.format(args.project["version"]),
file=sys.stderr)
add_version_to_package(args, project_json)
print(json.dumps(packages, indent=2))
if __name__ == "__main__":
main()

View File

@@ -513,7 +513,7 @@ class BaseAdapter(metaclass=AdapterMeta):
def get_columns_in_relation(
self, relation: BaseRelation
) -> List[BaseColumn]:
"""Get a list of the columns in the given Relation."""
"""Get a list of the columns in the given Relation. """
raise NotImplementedException(
'`get_columns_in_relation` is not implemented for this adapter!'
)

View File

@@ -1,5 +1,5 @@
import dbt.exceptions
from typing import Any, Dict, Optional
import yaml
import yaml.scanner
@@ -56,7 +56,7 @@ def contextualized_yaml_error(raw_contents, error):
raw_error=error)
def safe_load(contents):
def safe_load(contents) -> Optional[Dict[str, Any]]:
return yaml.load(contents, Loader=SafeLoader)
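This change types `safe_load` as returning `Optional[Dict[str, Any]]`, since a YAML file containing only comments parses to `None` (the source of the awkward exception addressed in #3568/#3630). A minimal sketch of the guard callers need, assuming PyYAML is installed:

```python
# Sketch: yaml.safe_load returns None for an empty or comments-only file,
# so callers must handle that case instead of assuming a dict comes back.
from typing import Any, Dict, Optional
import yaml  # PyYAML

def load_schema_yaml(contents: str) -> Optional[Dict[str, Any]]:
    return yaml.safe_load(contents)

if __name__ == "__main__":
    parsed = load_schema_yaml("# this schema.yml only contains comments\n")
    if parsed is None:
        print("nothing to parse in this yaml file")  # graceful handling
    else:
        print(parsed.get("version"))
```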

View File

@@ -120,7 +120,7 @@ class BaseContextConfigGenerator(Generic[T]):
def calculate_node_config(
self,
config_calls: List[Dict[str, Any]],
config_call_dict: Dict[str, Any],
fqn: List[str],
resource_type: NodeType,
project_name: str,
@@ -134,8 +134,9 @@ class BaseContextConfigGenerator(Generic[T]):
for fqn_config in project_configs:
result = self._update_from_config(result, fqn_config)
for config_call in config_calls:
result = self._update_from_config(result, config_call)
# config_calls are created in the 'experimental' model parser and
# the ParseConfigObject (via add_config_call)
result = self._update_from_config(result, config_call_dict)
if own_config.project_name != self._active_project.project_name:
for fqn_config in self._active_project_configs(fqn, resource_type):
@@ -147,7 +148,7 @@ class BaseContextConfigGenerator(Generic[T]):
@abstractmethod
def calculate_node_config_dict(
self,
config_calls: List[Dict[str, Any]],
config_call_dict: Dict[str, Any],
fqn: List[str],
resource_type: NodeType,
project_name: str,
@@ -186,14 +187,14 @@ class ContextConfigGenerator(BaseContextConfigGenerator[C]):
def calculate_node_config_dict(
self,
config_calls: List[Dict[str, Any]],
config_call_dict: Dict[str, Any],
fqn: List[str],
resource_type: NodeType,
project_name: str,
base: bool,
) -> Dict[str, Any]:
config = self.calculate_node_config(
config_calls=config_calls,
config_call_dict=config_call_dict,
fqn=fqn,
resource_type=resource_type,
project_name=project_name,
@@ -209,14 +210,14 @@ class UnrenderedConfigGenerator(BaseContextConfigGenerator[Dict[str, Any]]):
def calculate_node_config_dict(
self,
config_calls: List[Dict[str, Any]],
config_call_dict: Dict[str, Any],
fqn: List[str],
resource_type: NodeType,
project_name: str,
base: bool,
) -> Dict[str, Any]:
return self.calculate_node_config(
config_calls=config_calls,
config_call_dict=config_call_dict,
fqn=fqn,
resource_type=resource_type,
project_name=project_name,
@@ -251,14 +252,32 @@ class ContextConfig:
resource_type: NodeType,
project_name: str,
) -> None:
self._config_calls: List[Dict[str, Any]] = []
self._config_call_dict: Dict[str, Any] = {}
self._active_project = active_project
self._fqn = fqn
self._resource_type = resource_type
self._project_name = project_name
def update_in_model_config(self, opts: Dict[str, Any]) -> None:
self._config_calls.append(opts)
def add_config_call(self, opts: Dict[str, Any]) -> None:
dct = self._config_call_dict
self._add_config_call(dct, opts)
@classmethod
def _add_config_call(cls, config_call_dict, opts: Dict[str, Any]) -> None:
for k, v in opts.items():
# MergeBehavior for post-hook and pre-hook is to collect all
# values, instead of overwriting
if k in BaseConfig.mergebehavior['append']:
if not isinstance(v, list):
v = [v]
if k in BaseConfig.mergebehavior['update'] and not isinstance(v, dict):
raise InternalException(f'expected dict, got {v}')
if k in config_call_dict and isinstance(config_call_dict[k], list):
config_call_dict[k].extend(v)
elif k in config_call_dict and isinstance(config_call_dict[k], dict):
config_call_dict[k].update(v)
else:
config_call_dict[k] = v
def build_config_dict(
self,
@@ -272,7 +291,7 @@ class ContextConfig:
src = UnrenderedConfigGenerator(self._active_project)
return src.calculate_node_config_dict(
config_calls=self._config_calls,
config_call_dict=self._config_call_dict,
fqn=self._fqn,
resource_type=self._resource_type,
project_name=self._project_name,

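The new `add_config_call` path above folds repeated `config(...)` calls into a single `config_call_dict`: keys with append behavior (hooks, `tags`) accumulate, keys with update behavior (`quoting`, `column_types`) are dict-merged, and everything else is clobbered by the latest call. A standalone simplification of that merge rule, not the dbt class itself:

```python
# Simplified illustration of the config_call_dict merge rules: "append" keys
# (hooks, tags) accumulate into lists, existing dict values (quoting,
# column_types) are updated in place, and any other key is clobbered.
APPEND_KEYS = {"pre-hook", "pre_hook", "post-hook", "post_hook", "tags"}

def add_config_call(config_call_dict: dict, opts: dict) -> None:
    for key, value in opts.items():
        if key in APPEND_KEYS and not isinstance(value, list):
            value = [value]
        if key in config_call_dict and isinstance(config_call_dict[key], list):
            config_call_dict[key].extend(value)
        elif key in config_call_dict and isinstance(config_call_dict[key], dict):
            config_call_dict[key].update(value)
        else:
            config_call_dict[key] = value

if __name__ == "__main__":
    merged: dict = {}
    add_config_call(merged, {"materialized": "view", "tags": "nightly"})
    add_config_call(merged, {"materialized": "table", "tags": ["finance"],
                             "quoting": {"identifier": True}})
    print(merged)
    # {'materialized': 'table', 'tags': ['nightly', 'finance'],
    #  'quoting': {'identifier': True}}
```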
View File

@@ -279,7 +279,7 @@ class Config(Protocol):
...
# `config` implementations
# Implementation of "config(..)" calls in models
class ParseConfigObject(Config):
def __init__(self, model, context_config: Optional[ContextConfig]):
self.model = model
@@ -316,7 +316,7 @@ class ParseConfigObject(Config):
raise RuntimeException(
'At parse time, did not receive a context config'
)
self.context_config.update_in_model_config(opts)
self.context_config.add_config_call(opts)
return ''
def set(self, name, value):

View File

@@ -220,7 +220,7 @@ class SchemaSourceFile(BaseSourceFile):
# node patches contain models, seeds, snapshots, analyses
ndp: List[str] = field(default_factory=list)
# any macro patches in this file by macro unique_id.
mcp: List[str] = field(default_factory=list)
mcp: Dict[str, str] = field(default_factory=dict)
# any source patches in this file. The entries are package, name pairs
# Patches are only against external sources. Sources can be
# created too, but those are in 'sources'

View File

@@ -759,7 +759,7 @@ class Manifest(MacroMethods, DataClassMessagePackMixin, dbtClassMixin):
if macro.patch_path:
package_name, existing_file_path = macro.patch_path.split('://')
raise_duplicate_macro_patch_name(patch, existing_file_path)
source_file.macro_patches.append(unique_id)
source_file.macro_patches[patch.name] = unique_id
macro.patch(patch)
def add_source_patch(

View File

@@ -2,13 +2,13 @@ from dataclasses import field, Field, dataclass
from enum import Enum
from itertools import chain
from typing import (
Any, List, Optional, Dict, Union, Type, TypeVar
Any, List, Optional, Dict, Union, Type, TypeVar, Callable
)
from dbt.dataclass_schema import (
dbtClassMixin, ValidationError, register_pattern,
)
from dbt.contracts.graph.unparsed import AdditionalPropertiesAllowed
from dbt.exceptions import InternalException
from dbt.exceptions import InternalException, CompilationException
from dbt.contracts.util import Replaceable, list_str
from dbt import hooks
from dbt.node_types import NodeType
@@ -204,6 +204,34 @@ class BaseConfig(
else:
self._extra[key] = value
def __delitem__(self, key):
if hasattr(self, key):
msg = (
'Error, tried to delete config key "{}": Cannot delete '
'built-in keys'
).format(key)
raise CompilationException(msg)
else:
del self._extra[key]
def _content_iterator(self, include_condition: Callable[[Field], bool]):
seen = set()
for fld, _ in self._get_fields():
seen.add(fld.name)
if include_condition(fld):
yield fld.name
for key in self._extra:
if key not in seen:
seen.add(key)
yield key
def __iter__(self):
yield from self._content_iterator(include_condition=lambda f: True)
def __len__(self):
return len(self._get_fields()) + len(self._extra)
@staticmethod
def compare_key(
unrendered: Dict[str, Any],
@@ -239,8 +267,14 @@ class BaseConfig(
return False
return True
# This is used in 'add_config_call' to create the combined config_call_dict.
mergebehavior = {
"append": ['pre-hook', 'pre_hook', 'post-hook', 'post_hook', 'tags'],
"update": ['quoting', 'column_types'],
}
@classmethod
def _extract_dict(
def _merge_dicts(
cls, src: Dict[str, Any], data: Dict[str, Any]
) -> Dict[str, Any]:
"""Find all the items in data that match a target_field on this class,
@@ -286,10 +320,10 @@ class BaseConfig(
adapter_config_cls = get_config_class_by_name(adapter_type)
self_merged = self._extract_dict(dct, data)
self_merged = self._merge_dicts(dct, data)
dct.update(self_merged)
adapter_merged = adapter_config_cls._extract_dict(dct, data)
adapter_merged = adapter_config_cls._merge_dicts(dct, data)
dct.update(adapter_merged)
# any remaining fields must be "clobber"
@@ -322,6 +356,8 @@ class SourceConfig(BaseConfig):
@dataclass
class NodeConfig(BaseConfig):
# Note: if any new fields are added with MergeBehavior, also update the
# 'mergebehavior' dictionary
enabled: bool = True
materialized: str = 'view'
persist_docs: Dict[str, Any] = field(default_factory=dict)
@@ -369,6 +405,7 @@ class NodeConfig(BaseConfig):
CompareBehavior.Exclude),
)
full_refresh: Optional[bool] = None
on_schema_change: Optional[str] = 'ignore'
@classmethod
def __pre_deserialize__(cls, data):
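Taken together, the `mergebehavior` mapping and the `_merge_dicts` rename above describe how successive config() calls fold into one `config_call_dict`: list-like keys append, dict-like keys update, and everything else clobbers. A minimal standalone sketch of that folding logic, in the spirit of `ContextConfig._add_config_call` (the helper below is illustrative, not dbt's exact implementation):

from typing import Any, Dict

# stand-in for BaseConfig.mergebehavior from the diff above
MERGEBEHAVIOR = {
    "append": ['pre-hook', 'pre_hook', 'post-hook', 'post_hook', 'tags'],
    "update": ['quoting', 'column_types'],
}

def add_config_call(config_call_dict: Dict[str, Any], opts: Dict[str, Any]) -> None:
    # fold one config(...) call into the accumulated dict (illustrative only)
    for key, value in opts.items():
        if key in MERGEBEHAVIOR["append"]:
            # hooks and tags accumulate across calls
            if not isinstance(value, list):
                value = [value]
            config_call_dict.setdefault(key, []).extend(value)
        elif key in MERGEBEHAVIOR["update"] and isinstance(value, dict):
            # dict-valued configs are shallow-merged
            config_call_dict.setdefault(key, {}).update(value)
        else:
            # everything else is "clobber": the last call wins
            config_call_dict[key] = value

calls: Dict[str, Any] = {}
add_config_call(calls, {"tags": "nightly", "materialized": "view"})
add_config_call(calls, {"tags": ["core"], "materialized": "table"})
assert calls == {"tags": ["nightly", "core"], "materialized": "table"}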

View File

@@ -83,6 +83,7 @@ class GitPackage(Package):
class RegistryPackage(Package):
package: str
version: Union[RawVersion, List[RawVersion]]
install_prerelease: Optional[bool] = False
def get_versions(self) -> List[str]:
if isinstance(self.version, list):

View File

@@ -116,6 +116,16 @@ class RPCDocsGenerateParameters(RPCParameters):
state: Optional[str] = None
@dataclass
class RPCBuildParameters(RPCParameters):
threads: Optional[int] = None
models: Union[None, str, List[str]] = None
exclude: Union[None, str, List[str]] = None
selector: Optional[str] = None
state: Optional[str] = None
defer: Optional[bool] = None
@dataclass
class RPCCliParameters(RPCParameters):
cli: str
@@ -186,6 +196,8 @@ class RPCRunOperationParameters(RPCParameters):
class RPCSourceFreshnessParameters(RPCParameters):
threads: Optional[int] = None
select: Union[None, str, List[str]] = None
exclude: Union[None, str, List[str]] = None
selector: Optional[str] = None
@dataclass
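With `RPCBuildParameters` in place, an rpc client can invoke the new `build` method; a hedged sketch of a JSON-RPC 2.0 request whose params mirror the dataclass fields above (the envelope keys are an assumption about the rpc server's usual request shape, not taken from this diff):

import json

request = {
    "jsonrpc": "2.0",
    "method": "build",
    "id": "example-1",
    "params": {
        "models": ["my_model+"],  # same selection syntax as --models
        "exclude": None,
        "selector": None,
        "state": None,            # artifact path when deferring
        "defer": False,
        "threads": 4,
    },
}
print(json.dumps(request, indent=2))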

View File

@@ -71,10 +71,14 @@ class RegistryUnpinnedPackage(
RegistryPackageMixin, UnpinnedPackage[RegistryPinnedPackage]
):
def __init__(
self, package: str, versions: List[semver.VersionSpecifier]
self,
package: str,
versions: List[semver.VersionSpecifier],
install_prerelease: bool
) -> None:
super().__init__(package)
self.versions = versions
self.install_prerelease = install_prerelease
def _check_in_index(self):
index = registry.index_cached()
@@ -91,13 +95,18 @@ class RegistryUnpinnedPackage(
semver.VersionSpecifier.from_version_string(v)
for v in raw_version
]
return cls(package=contract.package, versions=versions)
return cls(
package=contract.package,
versions=versions,
install_prerelease=contract.install_prerelease
)
def incorporate(
self, other: 'RegistryUnpinnedPackage'
) -> 'RegistryUnpinnedPackage':
return RegistryUnpinnedPackage(
package=self.package,
install_prerelease=self.install_prerelease,
versions=self.versions + other.versions,
)
@@ -111,12 +120,16 @@ class RegistryUnpinnedPackage(
raise DependencyException(new_msg) from e
available = registry.get_available_versions(self.package)
installable = semver.filter_installable(
available,
self.install_prerelease
)
# for now, pick a version and then recurse. later on,
# we'll probably want to traverse multiple options
# so we can match packages. not going to make a difference
# right now.
target = semver.resolve_to_specific_version(range_, available)
target = semver.resolve_to_specific_version(range_, installable)
if not target:
package_version_not_found(self.package, range_, available)
package_version_not_found(self.package, range_, installable)
return RegistryPinnedPackage(package=self.package, version=target)

View File

@@ -710,7 +710,7 @@ def system_error(operation_name):
raise_compiler_error(
"dbt encountered an error when attempting to {}. "
"If this error persists, please create an issue at: \n\n"
"https://github.com/fishtown-analytics/dbt"
"https://github.com/dbt-labs/dbt"
.format(operation_name))

View File

@@ -311,3 +311,34 @@
{{ config.set('sql_header', caller()) }}
{%- endmacro %}
{% macro alter_relation_add_remove_columns(relation, add_columns = none, remove_columns = none) -%}
{{ return(adapter.dispatch('alter_relation_add_remove_columns')(relation, add_columns, remove_columns)) }}
{% endmacro %}
{% macro default__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}
{% if add_columns is none %}
{% set add_columns = [] %}
{% endif %}
{% if remove_columns is none %}
{% set remove_columns = [] %}
{% endif %}
{% set sql -%}
alter {{ relation.type }} {{ relation }}
{% for column in add_columns %}
add column {{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}
{% endfor %}{{ ',' if remove_columns | length > 0 }}
{% for column in remove_columns %}
drop column {{ column.name }}{{ ',' if not loop.last }}
{% endfor %}
{%- endset -%}
{% do run_query(sql) %}
{% endmacro %}

View File

@@ -79,7 +79,7 @@
(
select {{ dest_cols_csv }}
from {{ source }}
);
)
{%- endmacro %}

View File

@@ -1,5 +1,6 @@
{% macro incremental_upsert(tmp_relation, target_relation, unique_key=none, statement_name="main") %}
{%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%}
{%- set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') -%}

View File

@@ -5,6 +5,10 @@
{% set target_relation = this.incorporate(type='table') %}
{% set existing_relation = load_relation(this) %}
{% set tmp_relation = make_temp_relation(target_relation) %}
{%- set full_refresh_mode = (should_full_refresh()) -%}
{% set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') %}
{% set tmp_identifier = model['name'] + '__dbt_tmp' %}
{% set backup_identifier = model['name'] + "__dbt_backup" %}
@@ -28,9 +32,16 @@
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{% set to_drop = [] %}
{# -- first check whether we want to full refresh for source view or config reasons #}
{% set trigger_full_refresh = (full_refresh_mode or existing_relation.is_view) %}
{% if existing_relation is none %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% elif existing_relation.is_view or should_full_refresh() %}
{% elif trigger_full_refresh %}
{#-- Make sure the backup doesn't exist so we don't encounter issues with the rename below #}
{% set tmp_identifier = model['name'] + '__dbt_tmp' %}
{% set backup_identifier = model['name'] + '__dbt_backup' %}
{% set intermediate_relation = existing_relation.incorporate(path={"identifier": tmp_identifier}) %}
{% set backup_relation = existing_relation.incorporate(path={"identifier": backup_identifier}) %}
@@ -38,12 +49,13 @@
{% set need_swap = true %}
{% do to_drop.append(backup_relation) %}
{% else %}
{% set tmp_relation = make_temp_relation(target_relation) %}
{% do run_query(create_table_as(True, tmp_relation, sql)) %}
{% do adapter.expand_target_column_types(
{% do run_query(create_table_as(True, tmp_relation, sql)) %}
{% do adapter.expand_target_column_types(
from_relation=tmp_relation,
to_relation=target_relation) %}
{% set build_sql = incremental_upsert(tmp_relation, target_relation, unique_key=unique_key) %}
{% do process_schema_changes(on_schema_change, tmp_relation, existing_relation) %}
{% set build_sql = incremental_upsert(tmp_relation, target_relation, unique_key=unique_key) %}
{% endif %}
{% call statement("main") %}

View File

@@ -0,0 +1,164 @@
{% macro incremental_validate_on_schema_change(on_schema_change, default='ignore') %}
{% if on_schema_change not in ['sync_all_columns', 'append_new_columns', 'fail', 'ignore'] %}
{% set log_message = 'Invalid value for on_schema_change (%s) specified. Setting default value of %s.' % (on_schema_change, default) %}
{% do log(log_message) %}
{{ return(default) }}
{% else %}
{{ return(on_schema_change) }}
{% endif %}
{% endmacro %}
{% macro diff_columns(source_columns, target_columns) %}
{% set result = [] %}
{% set source_names = source_columns | map(attribute = 'column') | list %}
{% set target_names = target_columns | map(attribute = 'column') | list %}
{# -- check whether the name attribute exists in the target; this does not perform a data type check #}
{% for sc in source_columns %}
{% if sc.name not in target_names %}
{{ result.append(sc) }}
{% endif %}
{% endfor %}
{{ return(result) }}
{% endmacro %}
{% macro diff_column_data_types(source_columns, target_columns) %}
{% set result = [] %}
{% for sc in source_columns %}
{% set tc = target_columns | selectattr("name", "equalto", sc.name) | list | first %}
{% if tc %}
{% if sc.data_type != tc.data_type %}
{{ result.append( { 'column_name': tc.name, 'new_type': sc.data_type } ) }}
{% endif %}
{% endif %}
{% endfor %}
{{ return(result) }}
{% endmacro %}
{% macro check_for_schema_changes(source_relation, target_relation) %}
{% set schema_changed = False %}
{%- set source_columns = adapter.get_columns_in_relation(source_relation) -%}
{%- set target_columns = adapter.get_columns_in_relation(target_relation) -%}
{%- set source_not_in_target = diff_columns(source_columns, target_columns) -%}
{%- set target_not_in_source = diff_columns(target_columns, source_columns) -%}
{% set new_target_types = diff_column_data_types(source_columns, target_columns) %}
{% if source_not_in_target != [] %}
{% set schema_changed = True %}
{% elif target_not_in_source != [] or new_target_types != [] %}
{% set schema_changed = True %}
{% elif new_target_types != [] %}
{% set schema_changed = True %}
{% endif %}
{% set changes_dict = {
'schema_changed': schema_changed,
'source_not_in_target': source_not_in_target,
'target_not_in_source': target_not_in_source,
'new_target_types': new_target_types
} %}
{% set msg %}
In {{ target_relation }}:
Schema changed: {{ schema_changed }}
Source columns not in target: {{ source_not_in_target }}
Target columns not in source: {{ target_not_in_source }}
New column types: {{ new_target_types }}
{% endset %}
{% do log(msg) %}
{{ return(changes_dict) }}
{% endmacro %}
{% macro sync_column_schemas(on_schema_change, target_relation, schema_changes_dict) %}
{%- set add_to_target_arr = schema_changes_dict['source_not_in_target'] -%}
{%- if on_schema_change == 'append_new_columns'-%}
{%- if add_to_target_arr | length > 0 -%}
{%- do alter_relation_add_remove_columns(target_relation, add_to_target_arr, none) -%}
{%- endif -%}
{% elif on_schema_change == 'sync_all_columns' %}
{%- set remove_from_target_arr = schema_changes_dict['target_not_in_source'] -%}
{%- set new_target_types = schema_changes_dict['new_target_types'] -%}
{% if add_to_target_arr | length > 0 or remove_from_target_arr | length > 0 %}
{%- do alter_relation_add_remove_columns(target_relation, add_to_target_arr, remove_from_target_arr) -%}
{% endif %}
{% if new_target_types != [] %}
{% for ntt in new_target_types %}
{% set column_name = ntt['column_name'] %}
{% set new_type = ntt['new_type'] %}
{% do alter_column_type(target_relation, column_name, new_type) %}
{% endfor %}
{% endif %}
{% endif %}
{% set schema_change_message %}
In {{ target_relation }}:
Schema change approach: {{ on_schema_change }}
Columns added: {{ add_to_target_arr }}
Columns removed: {{ remove_from_target_arr }}
Data types changed: {{ new_target_types }}
{% endset %}
{% do log(schema_change_message) %}
{% endmacro %}
{% macro process_schema_changes(on_schema_change, source_relation, target_relation) %}
{% if on_schema_change != 'ignore' %}
{% set schema_changes_dict = check_for_schema_changes(source_relation, target_relation) %}
{% if schema_changes_dict['schema_changed'] %}
{% if on_schema_change == 'fail' %}
{% set fail_msg %}
The source and target schemas on this incremental model are out of sync!
They can be reconciled in several ways:
- Set the `on_schema_change` config to either append_new_columns or sync_all_columns, depending on your situation.
- Re-run the incremental model with `full_refresh: True` to update the target schema.
- Update the schema manually and re-run the process.
{% endset %}
{% do exceptions.raise_compiler_error(fail_msg) %}
{# -- unless we ignore, run the sync operation per the config #}
{% else %}
{% do sync_column_schemas(on_schema_change, target_relation, schema_changes_dict) %}
{% endif %}
{% endif %}
{% endif %}
{% endmacro %}
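The schema-diff macros above are set-style comparisons over column metadata. For clarity, here is the same logic as a small Python sketch (the `Column` dataclass is a simplified stand-in for the adapter column objects the macros actually receive):

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Column:
    # simplified stand-in for dbt's adapter Column objects
    name: str
    data_type: str

def diff_columns(source: List[Column], target: List[Column]) -> List[Column]:
    # columns present in source but missing (by name) from target
    target_names = {c.name for c in target}
    return [c for c in source if c.name not in target_names]

def diff_column_data_types(source: List[Column], target: List[Column]) -> List[Dict[str, str]]:
    # columns whose data type changed; mirrors the macro's result shape
    target_by_name = {c.name: c for c in target}
    return [
        {'column_name': c.name, 'new_type': c.data_type}
        for c in source
        if c.name in target_by_name and target_by_name[c.name].data_type != c.data_type
    ]

src = [Column('id', 'integer'), Column('amount', 'numeric')]
tgt = [Column('id', 'bigint')]
assert [c.name for c in diff_columns(src, tgt)] == ['amount']
assert diff_column_data_types(src, tgt) == [{'column_name': 'id', 'new_type': 'integer'}]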

View File

@@ -21,7 +21,6 @@
and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
then insert ({{ insert_cols_csv }})
values ({{ insert_cols_csv }})
;
{% endmacro %}

View File

@@ -48,7 +48,7 @@
-- cleanup
{% if old_relation is not none %}
{{ adapter.rename_relation(target_relation, backup_relation) }}
{{ adapter.rename_relation(old_relation, backup_relation) }}
{% endif %}
{{ adapter.rename_relation(intermediate_relation, target_relation) }}

View File

@@ -4,6 +4,7 @@
{% endmacro %}
{% macro default__handle_existing_table(full_refresh, old_relation) %}
{{ log("Dropping relation " ~ old_relation ~ " because it is of type " ~ old_relation.type) }}
{{ adapter.drop_relation(old_relation) }}
{% endmacro %}
@@ -19,7 +20,7 @@
*/
#}
{% macro create_or_replace_view(run_outside_transaction_hooks=True) %}
{% macro create_or_replace_view() %}
{%- set identifier = model['alias'] -%}
{%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
@@ -30,13 +31,7 @@
identifier=identifier, schema=schema, database=database,
type='view') -%}
{% if run_outside_transaction_hooks %}
-- no transactions on BigQuery
{{ run_hooks(pre_hooks, inside_transaction=False) }}
{% endif %}
-- `BEGIN` happens here on Snowflake
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{{ run_hooks(pre_hooks) }}
-- If there's a table with the same name and we weren't told to full refresh,
-- that's an error. If we were told to full refresh, drop it. This behavior differs
@@ -50,14 +45,7 @@
{{ create_view_as(target_relation, sql) }}
{%- endcall %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
{{ adapter.commit() }}
{% if run_outside_transaction_hooks %}
-- No transactions on BigQuery
{{ run_hooks(post_hooks, inside_transaction=False) }}
{% endif %}
{{ run_hooks(post_hooks) }}
{{ return({'relations': [target_relation]}) }}

View File

@@ -54,7 +54,7 @@
-- cleanup
-- move the existing view out of the way
{% if old_relation is not none %}
{{ adapter.rename_relation(target_relation, backup_relation) }}
{{ adapter.rename_relation(old_relation, backup_relation) }}
{% endif %}
{{ adapter.rename_relation(intermediate_relation, target_relation) }}

View File

@@ -1,16 +1,23 @@
{% macro default__test_relationships(model, column_name, to, field) %}
with child as (
select * from {{ model }}
where {{ column_name }} is not null
),
parent as (
select * from {{ to }}
)
select
child.{{ column_name }}
from {{ model }} as child
left join {{ to }} as parent
from child
left join parent
on child.{{ column_name }} = parent.{{ field }}
where child.{{ column_name }} is not null
and parent.{{ field }} is null
where parent.{{ field }} is null
{% endmacro %}

File diff suppressed because one or more lines are too long

View File

@@ -43,6 +43,15 @@ DEBUG_LOG_FORMAT = (
'{record.message}'
)
SECRET_ENV_PREFIX = 'DBT_ENV_SECRET_'
def get_secret_env() -> List[str]:
return [
v for k, v in os.environ.items()
if k.startswith(SECRET_ENV_PREFIX)
]
ExceptionInformation = str
@@ -333,6 +342,12 @@ class TimestampNamed(logbook.Processor):
record.extra[self.name] = datetime.utcnow().isoformat()
class ScrubSecrets(logbook.Processor):
def process(self, record):
for secret in get_secret_env():
record.message = record.message.replace(secret, "*****")
logger = logbook.Logger('dbt')
# provide this for the cache, disabled by default
CACHE_LOGGER = logbook.Logger('dbt.cache')
@@ -473,7 +488,8 @@ class LogManager(logbook.NestedSetup):
self._file_handler = DelayedFileHandler()
self._relevel_processor = Relevel(allowed=['dbt', 'werkzeug'])
self._state_processor = DbtProcessState('internal')
# keep track of wheter we've already entered to decide if we should
self._scrub_processor = ScrubSecrets()
# keep track of whether we've already entered to decide if we should
# be actually pushing. This allows us to log in main() and also
# support entering dbt execution via handle_and_check.
self._stack_depth = 0
@@ -483,6 +499,7 @@ class LogManager(logbook.NestedSetup):
self._file_handler,
self._relevel_processor,
self._state_processor,
self._scrub_processor
])
def push_application(self):
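The new `ScrubSecrets` processor masks the value of any environment variable prefixed with `DBT_ENV_SECRET_` before a record is emitted. The core idea, as a standalone sketch (only `SECRET_ENV_PREFIX` and the replacement logic come from the diff; the rest is scaffolding):

import os

SECRET_ENV_PREFIX = 'DBT_ENV_SECRET_'

def get_secret_env():
    return [
        v for k, v in os.environ.items()
        if k.startswith(SECRET_ENV_PREFIX)
    ]

def scrub(message: str) -> str:
    # mirrors ScrubSecrets.process: replace each secret value with a mask
    for secret in get_secret_env():
        message = message.replace(secret, "*****")
    return message

os.environ['DBT_ENV_SECRET_TOKEN'] = 'hunter2'
assert scrub("connecting with token hunter2") == "connecting with token *****"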

View File

@@ -41,6 +41,7 @@ class DBTVersion(argparse.Action):
"""This is very very similar to the builtin argparse._Version action,
except it just calls dbt.version.get_version_information().
"""
def __init__(self,
option_strings,
version=None,
@@ -755,23 +756,14 @@ def _build_test_subparser(subparsers, base_subparser):
return sub
def _build_source_snapshot_freshness_subparser(subparsers, base_subparser):
def _build_source_freshness_subparser(subparsers, base_subparser):
sub = subparsers.add_parser(
'snapshot-freshness',
'freshness',
parents=[base_subparser],
help='''
Snapshots the current freshness of the project's sources
''',
)
sub.add_argument(
'-s',
'--select',
required=False,
nargs='+',
help='''
Specify the sources to snapshot freshness
''',
dest='selected'
aliases=['snapshot-freshness'],
)
sub.add_argument(
'-o',
@@ -792,9 +784,16 @@ def _build_source_snapshot_freshness_subparser(subparsers, base_subparser):
)
sub.set_defaults(
cls=freshness_task.FreshnessTask,
which='snapshot-freshness',
rpc_method='snapshot-freshness',
which='source-freshness',
rpc_method='source-freshness',
)
_add_select_argument(
sub,
dest='select',
metavar='SELECTOR',
required=False,
)
_add_common_selector_arguments(sub)
return sub
@@ -1073,18 +1072,18 @@ def parse_args(args, cls=DBTArgumentParser):
seed_sub = _build_seed_subparser(subs, base_subparser)
# --threads, --no-version-check
_add_common_arguments(run_sub, compile_sub, generate_sub, test_sub,
rpc_sub, seed_sub, parse_sub)
rpc_sub, seed_sub, parse_sub, build_sub)
# --models, --exclude
# list_sub sets up its own arguments.
_add_selection_arguments(build_sub, run_sub, compile_sub, generate_sub, test_sub)
_add_selection_arguments(snapshot_sub, seed_sub, models_name='select')
# --defer
_add_defer_argument(run_sub, test_sub)
_add_defer_argument(run_sub, test_sub, build_sub)
# --full-refresh
_add_table_mutability_arguments(run_sub, compile_sub)
_add_table_mutability_arguments(run_sub, compile_sub, build_sub)
_build_docs_serve_subparser(docs_subs, base_subparser)
_build_source_snapshot_freshness_subparser(source_subs, base_subparser)
_build_source_freshness_subparser(source_subs, base_subparser)
_build_run_operation_subparser(subs, base_subparser)
if len(args) == 0:

View File

@@ -2,7 +2,7 @@ from dataclasses import dataclass
from dataclasses import field
import os
from typing import (
Dict, Optional, Mapping, Callable, Any, List, Type, Union
Dict, Optional, Mapping, Callable, Any, List, Type, Union, Tuple
)
import time
@@ -59,13 +59,23 @@ from dbt.parser.sources import SourcePatcher
from dbt.ui import warning_tag
from dbt.version import __version__
from dbt.dataclass_schema import dbtClassMixin
from dbt.dataclass_schema import StrEnum, dbtClassMixin
PARTIAL_PARSE_FILE_NAME = 'partial_parse.msgpack'
PARSING_STATE = DbtProcessState('parsing')
DEFAULT_PARTIAL_PARSE = False
class ReparseReason(StrEnum):
version_mismatch = '01_version_mismatch'
file_not_found = '02_file_not_found'
vars_changed = '03_vars_changed'
profile_changed = '04_profile_changed'
deps_changed = '05_deps_changed'
project_config_changed = '06_project_config_changed'
load_file_failure = '07_load_file_failure'
# Part of saved performance info
@dataclass
class ParserInfo(dbtClassMixin):
@@ -379,10 +389,10 @@ class ManifestLoader:
if not self.partially_parsing and HookParser in parser_types:
hook_parser = HookParser(project, self.manifest, self.root_project)
path = hook_parser.get_path()
file_block = FileBlock(
load_source_file(path, ParseFileType.Hook, project.project_name)
)
hook_parser.parse_file(file_block)
file = load_source_file(path, ParseFileType.Hook, project.project_name)
if file:
file_block = FileBlock(file)
hook_parser.parse_file(file_block)
# Store the performance info
elapsed = time.perf_counter() - start_timer
@@ -441,24 +451,28 @@ class ManifestLoader:
except Exception:
raise
def matching_parse_results(self, manifest: Manifest) -> bool:
def is_partial_parsable(self, manifest: Manifest) -> Tuple[bool, Optional[str]]:
"""Compare the global hashes of the read-in parse results' values to
the known ones, and return if it is ok to re-use the results.
"""
valid = True
reparse_reason = None
if manifest.metadata.dbt_version != __version__:
logger.info("Unable to do partial parsing because of a dbt version mismatch")
return False # If the version is wrong, the other checks might not work
# If the version is wrong, the other checks might not work
return False, ReparseReason.version_mismatch
if self.manifest.state_check.vars_hash != manifest.state_check.vars_hash:
logger.info("Unable to do partial parsing because config vars, "
"config profile, or config target have changed")
valid = False
reparse_reason = ReparseReason.vars_changed
if self.manifest.state_check.profile_hash != manifest.state_check.profile_hash:
# Note: This should be made more granular. We shouldn't need to invalidate
# partial parsing if a non-used profile section has changed.
logger.info("Unable to do partial parsing because profile has changed")
valid = False
reparse_reason = ReparseReason.profile_changed
missing_keys = {
k for k in self.manifest.state_check.project_hashes
@@ -467,6 +481,7 @@ class ManifestLoader:
if missing_keys:
logger.info("Unable to do partial parsing because a project dependency has been added")
valid = False
reparse_reason = ReparseReason.deps_changed
for key, new_value in self.manifest.state_check.project_hashes.items():
if key in manifest.state_check.project_hashes:
@@ -475,7 +490,8 @@ class ManifestLoader:
logger.info("Unable to do partial parsing because "
"a project config has changed")
valid = False
return valid
reparse_reason = ReparseReason.project_config_changed
return valid, reparse_reason
def _partial_parse_enabled(self):
# if the CLI is set, follow that
@@ -494,6 +510,8 @@ class ManifestLoader:
path = os.path.join(self.root_project.target_path,
PARTIAL_PARSE_FILE_NAME)
reparse_reason = None
if os.path.exists(path):
try:
with open(path, 'rb') as fp:
@@ -502,7 +520,8 @@ class ManifestLoader:
# keep this check inside the try/except in case something about
# the file has changed in weird ways, perhaps due to being a
# different version of dbt
if self.matching_parse_results(manifest):
is_partial_parseable, reparse_reason = self.is_partial_parsable(manifest)
if is_partial_parseable:
return manifest
except Exception as exc:
logger.debug(
@@ -510,8 +529,13 @@ class ManifestLoader:
.format(path, exc),
exc_info=True
)
reparse_reason = ReparseReason.load_file_failure
else:
logger.info(f"Unable to do partial parsing because {path} not found")
reparse_reason = ReparseReason.file_not_found
# this event is only fired if a full reparse is needed
dbt.tracking.track_partial_parser({'full_reparse_reason': reparse_reason})
return None
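Because `ReparseReason` is a `StrEnum`, the tracked payload value serializes as the plain prefixed string. A tiny runnable sketch using the stdlib equivalent (assuming dbt's `StrEnum` behaves like the `(str, Enum)` mixin shown here):

from enum import Enum

class ReparseReason(str, Enum):
    # mirror of the diff's StrEnum; members compare equal to plain strings
    version_mismatch = '01_version_mismatch'
    file_not_found = '02_file_not_found'
    vars_changed = '03_vars_changed'
    profile_changed = '04_profile_changed'
    deps_changed = '05_deps_changed'
    project_config_changed = '06_project_config_changed'
    load_file_failure = '07_load_file_failure'

# the numeric prefix keeps reasons sorted consistently in analytics tooling
payload = {'full_reparse_reason': ReparseReason.file_not_found}
assert payload['full_reparse_reason'] == '02_file_not_found'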

View File

@@ -7,9 +7,8 @@ from dbt.parser.search import FileBlock
import dbt.tracking as tracking
from dbt import utils
from dbt_extractor import ExtractionError, py_extract_from_source # type: ignore
import itertools
import random
from typing import Any, Dict, List, Tuple
from typing import Any, Dict, List
class ModelParser(SimpleSQLParser[ParsedModelNode]):
@@ -40,9 +39,9 @@ class ModelParser(SimpleSQLParser[ParsedModelNode]):
experimentally_parsed: Dict[str, List[Any]] = py_extract_from_source(node.raw_sql)
# second config format
config_calls: List[Dict[str, str]] = []
config_call_dict: Dict[str, Any] = {}
for c in experimentally_parsed['configs']:
config_calls.append({c[0]: c[1]})
ContextConfig._add_config_call(config_call_dict, {c[0]: c[1]})
# format sources TODO change extractor to match this type
source_calls: List[List[str]] = []
@@ -64,22 +63,15 @@ class ModelParser(SimpleSQLParser[ParsedModelNode]):
if isinstance(experimentally_parsed, Exception):
result += ["01_experimental_parser_cannot_parse"]
else:
# rearrange existing configs to match:
real_configs: List[Tuple[str, Any]] = list(
itertools.chain.from_iterable(
map(lambda x: x.items(), config._config_calls)
)
)
# look for false positive configs
for c in experimentally_parsed['configs']:
if c not in real_configs:
for k in config_call_dict.keys():
if k not in config._config_call_dict:
result += ["02_false_positive_config_value"]
break
# look for missed configs
for c in real_configs:
if c not in experimentally_parsed['configs']:
for k in config._config_call_dict.keys():
if k not in config_call_dict:
result += ["03_missed_config_value"]
break
@@ -127,7 +119,7 @@ class ModelParser(SimpleSQLParser[ParsedModelNode]):
# since it doesn't need python jinja, fit the refs, sources, and configs
# into the node. Down the line the rest of the node will be updated with
# this information. (e.g. depends_on etc.)
config._config_calls = config_calls
config._config_call_dict = config_call_dict
# this uses the updated config to set all the right things in the node.
# if there are hooks present, it WILL render jinja. Will need to change
@@ -138,11 +130,12 @@ class ModelParser(SimpleSQLParser[ParsedModelNode]):
# values from yaml files are in there already
node.unrendered_config.update(dict(experimentally_parsed['configs']))
# set refs, sources, and configs on the node object
# set refs and sources on the node object
node.refs += experimentally_parsed['refs']
node.sources += experimentally_parsed['sources']
for configv in experimentally_parsed['configs']:
node.config[configv[0]] = configv[1]
# configs don't need to be merged into the node
# setting them in config._config_call_dict is sufficient
self.manifest._parsing_info.static_analysis_parsed_path_count += 1

View File

@@ -147,6 +147,18 @@ class PartialParsing:
file_id not in self.file_diff['deleted']):
self.project_parser_files[project_name][parser_name].append(file_id)
def already_scheduled_for_parsing(self, source_file):
file_id = source_file.file_id
project_name = source_file.project_name
if project_name not in self.project_parser_files:
return False
parser_name = parse_file_type_to_parser[source_file.parse_file_type]
if parser_name not in self.project_parser_files[project_name]:
return False
if file_id not in self.project_parser_files[project_name][parser_name]:
return False
return True
# Add new files, including schema files
def add_to_saved(self, file_id):
# add file object to saved manifest.files
@@ -211,6 +223,9 @@ class PartialParsing:
# Updated schema files should have been processed already.
def update_mssat_in_saved(self, new_source_file, old_source_file):
if self.already_scheduled_for_parsing(old_source_file):
return
# These files only have one node.
unique_id = old_source_file.nodes[0]
@@ -251,12 +266,16 @@ class PartialParsing:
schema_file.node_patches.remove(unique_id)
def update_macro_in_saved(self, new_source_file, old_source_file):
if self.already_scheduled_for_parsing(old_source_file):
return
self.handle_macro_file_links(old_source_file, follow_references=True)
file_id = new_source_file.file_id
self.saved_files[file_id] = new_source_file
self.add_to_pp_files(new_source_file)
def update_doc_in_saved(self, new_source_file, old_source_file):
if self.already_scheduled_for_parsing(old_source_file):
return
self.delete_doc_node(old_source_file)
self.saved_files[new_source_file.file_id] = new_source_file
self.add_to_pp_files(new_source_file)
@@ -343,7 +362,8 @@ class PartialParsing:
for unique_id in macros:
if unique_id not in self.saved_manifest.macros:
# This happens when a macro has already been removed
source_file.macros.remove(unique_id)
if unique_id in source_file.macros:
source_file.macros.remove(unique_id)
continue
base_macro = self.saved_manifest.macros.pop(unique_id)
@@ -369,7 +389,9 @@ class PartialParsing:
macro_patch = self.get_schema_element(macro_patches, base_macro.name)
self.delete_schema_macro_patch(schema_file, macro_patch)
self.merge_patch(schema_file, 'macros', macro_patch)
source_file.macros.remove(unique_id)
# The macro may have already been removed by handling macro children
if unique_id in source_file.macros:
source_file.macros.remove(unique_id)
# similar to schedule_nodes_for_parsing but doesn't do sources and exposures
# and handles schema tests
@@ -385,12 +407,21 @@ class PartialParsing:
patch_list = []
if key in schema_file.dict_from_yaml:
patch_list = schema_file.dict_from_yaml[key]
node_patch = self.get_schema_element(patch_list, name)
if node_patch:
self.delete_schema_mssa_links(schema_file, key, node_patch)
self.merge_patch(schema_file, key, node_patch)
if unique_id in schema_file.node_patches:
schema_file.node_patches.remove(unique_id)
patch = self.get_schema_element(patch_list, name)
if patch:
if key in ['models', 'seeds', 'snapshots']:
self.delete_schema_mssa_links(schema_file, key, patch)
self.merge_patch(schema_file, key, patch)
if unique_id in schema_file.node_patches:
schema_file.node_patches.remove(unique_id)
elif key == 'sources':
# re-schedule source
if 'overrides' in patch:
# This is a source patch; need to re-parse orig source
self.remove_source_override_target(patch)
self.delete_schema_source(schema_file, patch)
self.remove_tests(schema_file, 'sources', patch['name'])
self.merge_patch(schema_file, 'sources', patch)
else:
file_id = node.file_id
if file_id in self.saved_files and file_id not in self.file_diff['deleted']:
@@ -426,7 +457,13 @@ class PartialParsing:
new_schema_file = self.new_files[file_id]
saved_yaml_dict = saved_schema_file.dict_from_yaml
new_yaml_dict = new_schema_file.dict_from_yaml
saved_schema_file.pp_dict = {"version": saved_yaml_dict['version']}
if 'version' in new_yaml_dict:
# despite the fact that this goes in the saved_schema_file, it
# should represent the new yaml dictionary, and should produce
# an error if the updated yaml file doesn't have a version
saved_schema_file.pp_dict = {"version": new_yaml_dict['version']}
else:
saved_schema_file.pp_dict = {}
self.handle_schema_file_changes(saved_schema_file, saved_yaml_dict, new_yaml_dict)
# copy from new schema_file to saved_schema_file to preserve references
@@ -634,19 +671,17 @@ class PartialParsing:
def delete_schema_macro_patch(self, schema_file, macro):
# This is just macro patches that need to be reapplied
for unique_id in schema_file.macro_patches:
parts = unique_id.split('.')
macro_name = parts[-1]
if macro_name == macro['name']:
macro_unique_id = unique_id
break
macro_unique_id = None
if macro['name'] in schema_file.macro_patches:
macro_unique_id = schema_file.macro_patches[macro['name']]
del schema_file.macro_patches[macro['name']]
if macro_unique_id and macro_unique_id in self.saved_manifest.macros:
macro = self.saved_manifest.macros.pop(macro_unique_id)
self.deleted_manifest.macros[macro_unique_id] = macro
macro_file_id = macro.file_id
self.add_to_pp_files(self.saved_files[macro_file_id])
if macro_unique_id in schema_file.macro_patches:
schema_file.macro_patches.remove(macro_unique_id)
if macro_file_id in self.new_files:
self.saved_files[macro_file_id] = self.new_files[macro_file_id]
self.add_to_pp_files(self.saved_files[macro_file_id])
# exposures are created only from schema files, so just delete
# the exposure.

View File

@@ -6,12 +6,13 @@ from dbt.contracts.files import (
from dbt.parser.schemas import yaml_from_file, schema_file_keys, check_format_version
from dbt.exceptions import CompilationException
from dbt.parser.search import FilesystemSearcher
from typing import Optional
# This loads the files contents and creates the SourceFile object
def load_source_file(
path: FilePath, parse_file_type: ParseFileType,
project_name: str) -> AnySourceFile:
project_name: str) -> Optional[AnySourceFile]:
file_contents = load_file_contents(path.absolute_path, strip=False)
checksum = FileHash.from_contents(file_contents)
sf_cls = SchemaSourceFile if parse_file_type == ParseFileType.Schema else SourceFile
@@ -20,8 +21,11 @@ def load_source_file(
source_file.contents = file_contents.strip()
if parse_file_type == ParseFileType.Schema and source_file.contents:
dfy = yaml_from_file(source_file)
validate_yaml(source_file.path.original_file_path, dfy)
source_file.dfy = dfy
if dfy:
validate_yaml(source_file.path.original_file_path, dfy)
source_file.dfy = dfy
else:
source_file = None
return source_file
@@ -76,8 +80,10 @@ def get_source_files(project, paths, extension, parse_file_type):
if parse_file_type == ParseFileType.Seed:
fb_list.append(load_seed_source_file(fp, project.project_name))
else:
fb_list.append(load_source_file(
fp, parse_file_type, project.project_name))
file = load_source_file(fp, parse_file_type, project.project_name)
# only append the file if it has contents. Added to fix #3568
if file:
fb_list.append(file)
return fb_list

View File

@@ -171,15 +171,15 @@ class SchemaParser(SimpleParser[SchemaTestBlock, ParsedSchemaTestNode]):
self.project.config_version == 2
)
if all_v_2:
ctx = generate_schema_yml(
self.render_ctx = generate_schema_yml(
self.root_project, self.project.project_name
)
else:
ctx = generate_target_context(
self.render_ctx = generate_target_context(
self.root_project, self.root_project.cli_vars
)
self.raw_renderer = SchemaYamlRenderer(ctx)
self.raw_renderer = SchemaYamlRenderer(self.render_ctx)
internal_package_names = get_adapter_package_names(
self.root_project.credentials.type
@@ -287,17 +287,13 @@ class SchemaParser(SimpleParser[SchemaTestBlock, ParsedSchemaTestNode]):
tags: List[str],
column_name: Optional[str],
) -> ParsedSchemaTestNode:
render_ctx = generate_target_context(
self.root_project, self.root_project.cli_vars
)
try:
builder = TestBuilder(
test=test,
target=target,
column_name=column_name,
package_name=target.package_name,
render_ctx=render_ctx,
render_ctx=self.render_ctx,
)
except CompilationException as exc:
context = _trimmed(str(target))

View File

@@ -286,7 +286,7 @@ class SourcePatcher:
)
return generator.calculate_node_config(
config_calls=[],
config_call_dict={},
fqn=fqn,
resource_type=NodeType.Source,
project_name=project_name,

View File

@@ -1,5 +1,8 @@
from dataclasses import dataclass
import re
from typing import List
from packaging import version as packaging_version
from dbt.exceptions import VersionsNotCompatibleException
import dbt.utils
@@ -125,12 +128,26 @@ class VersionSpecifier(VersionSpecification):
if self.is_unbounded or other.is_unbounded:
return 0
for key in ['major', 'minor', 'patch']:
comparison = int(getattr(self, key)) - int(getattr(other, key))
if comparison > 0:
for key in ['major', 'minor', 'patch', 'prerelease']:
(a, b) = (getattr(self, key), getattr(other, key))
if key == 'prerelease':
if a is None and b is None:
continue
if a is None:
if self.matcher == Matchers.LESS_THAN:
# If 'a' is not a pre-release but 'b' is, and 'b' must be
# less than 'a', return -1 to prevent installing a
# pre-release whose base version is greater than the
# specified non-pre-release maximum.
return -1
# Otherwise, stable releases are considered greater than
# pre-release
return 1
if b is None:
return -1
if packaging_version.parse(a) > packaging_version.parse(b):
return 1
elif comparison < 0:
elif packaging_version.parse(a) < packaging_version.parse(b):
return -1
equal = ((self.matcher == Matchers.GREATER_THAN_OR_EQUAL and
@@ -408,10 +425,23 @@ def resolve_to_specific_version(requested_range, available_versions):
version = VersionSpecifier.from_version_string(version_string)
if(versions_compatible(version,
requested_range.start,
requested_range.end) and
requested_range.start, requested_range.end) and
(max_version is None or max_version.compare(version) < 0)):
max_version = version
max_version_string = version_string
return max_version_string
def filter_installable(
versions: List[str],
install_prerelease: bool
) -> List[str]:
if install_prerelease:
return versions
installable = []
for version_string in versions:
version = VersionSpecifier.from_version_string(version_string)
if not version.prerelease:
installable.append(version_string)
return installable
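Together, the pre-release-aware comparison and `filter_installable` mean pre-release versions only become candidates when `install_prerelease` is set. A short usage sketch (the version strings are invented for illustration):

from dbt import semver

available = ['0.20.0', '0.21.0b1', '0.21.0']

# by default, pre-releases are filtered out of the candidate list...
assert semver.filter_installable(available, install_prerelease=False) == ['0.20.0', '0.21.0']

# ...and opting in keeps them installable
assert semver.filter_installable(available, install_prerelease=True) == available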

View File

@@ -158,7 +158,7 @@ class ConfiguredTask(BaseTask):
INTERNAL_ERROR_STRING = """This is an error in dbt. Please try again. If \
the error persists, open an issue at https://github.com/fishtown-analytics/dbt
the error persists, open an issue at https://github.com/dbt-labs/dbt
""".strip()

View File

@@ -1,6 +1,4 @@
from .compile import CompileTask
from .run import ModelRunner as run_model_runner
from .run import RunTask, ModelRunner as run_model_runner
from .snapshot import SnapshotRunner as snapshot_model_runner
from .seed import SeedRunner as seed_runner
from .test import TestRunner as test_runner
@@ -10,7 +8,7 @@ from dbt.exceptions import InternalException
from dbt.node_types import NodeType
class BuildTask(CompileTask):
class BuildTask(RunTask):
"""The Build task processes all assets of a given process and attempts to 'build'
them in an opinionated fashion. Every resource type outlined in RUNNER_MAP
will be processed by the mapped runner class.

View File

@@ -19,7 +19,7 @@ from dbt.exceptions import RuntimeException, InternalException
from dbt.logger import print_timestamped_line
from dbt.node_types import NodeType
from dbt.graph import NodeSelector, SelectionSpec, parse_difference
from dbt.graph import ResourceTypeSelector, SelectionSpec, parse_difference
from dbt.contracts.graph.parsed import ParsedSourceDefinition
@@ -117,7 +117,7 @@ class FreshnessRunner(BaseRunner):
return self.node
class FreshnessSelector(NodeSelector):
class FreshnessSelector(ResourceTypeSelector):
def node_is_match(self, node):
if not super().node_is_match(node):
return False
@@ -137,11 +137,16 @@ class FreshnessTask(GraphRunnableTask):
return False
def get_selection_spec(self) -> SelectionSpec:
include = [
'source:{}'.format(s)
for s in (self.args.selected or ['*'])
]
spec = parse_difference(include, None)
"""Generates a selection spec from task arguments to use when
processing graph. A SelectionSpec describes what nodes to select
when creating queue from graph of nodes.
"""
if self.args.selector_name:
# use pre-defined selector (--selector) to create selection spec
spec = self.config.get_selector(self.args.selector_name)
else:
# use --select and --exclude args to create selection spec
spec = parse_difference(self.args.select, self.args.exclude)
return spec
def get_node_selector(self):
@@ -153,6 +158,7 @@ class FreshnessTask(GraphRunnableTask):
graph=self.graph,
manifest=self.manifest,
previous_state=self.previous_state,
resource_types=[NodeType.Source]
)
def get_runner_type(self, _):
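`FreshnessTask` now builds its spec through the same machinery as other tasks, so `--select`, `--exclude`, and `--selector` all work for source freshness. A hedged sketch of the `--select`/`--exclude` path (the selection strings are invented):

from dbt.graph import parse_difference

# equivalent of:
#   dbt source freshness --select source:raw.orders --exclude source:raw.events
spec = parse_difference(['source:raw.orders'], ['source:raw.events'])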

View File

@@ -87,9 +87,12 @@ def print_hook_end_line(
def print_skip_line(
model, schema: str, relation: str, index: int, num_models: int
node, schema: str, relation: str, index: int, num_models: int
) -> None:
msg = 'SKIP relation {}.{}'.format(schema, relation)
if node.resource_type in NodeType.refable():
msg = f'SKIP relation {schema}.{relation}'
else:
msg = f'SKIP {node.resource_type} {node.name}'
print_fancy_output_line(
msg, ui.yellow('SKIP'), logger.info, index, num_models)

View File

@@ -21,6 +21,7 @@ from dbt.contracts.rpc import (
RPCSnapshotParameters,
RPCSourceFreshnessParameters,
RPCListParameters,
RPCBuildParameters,
)
from dbt.exceptions import RuntimeException
from dbt.rpc.method import (
@@ -37,6 +38,7 @@ from dbt.task.seed import SeedTask
from dbt.task.snapshot import SnapshotTask
from dbt.task.test import TestTask
from dbt.task.list import ListTask
from dbt.task.build import BuildTask
from .base import RPCTask
from .cli import HasCLI
@@ -228,15 +230,24 @@ class RemoteSourceFreshnessTask(
RPCCommandTask[RPCSourceFreshnessParameters],
FreshnessTask
):
METHOD_NAME = 'snapshot-freshness'
METHOD_NAME = 'source-freshness'
def set_args(self, params: RPCSourceFreshnessParameters) -> None:
self.args.selected = self._listify(params.select)
self.args.select = self._listify(params.select)
self.args.exclude = self._listify(params.exclude)
self.args.selector_name = params.selector
if params.threads is not None:
self.args.threads = params.threads
self.args.output = None
class RemoteSourceSnapshotFreshnessTask(
RemoteSourceFreshnessTask
):
""" Deprecated task method name, aliases to `source-freshness` """
METHOD_NAME = 'snapshot-freshness'
# this is a weird and special method.
class GetManifest(
RemoteManifestMethod[GetManifestParameters, GetManifestResult]
@@ -296,3 +307,22 @@ class RemoteListTask(
output=[json.loads(x) for x in results],
logs=None
)
class RemoteBuildProjectTask(RPCCommandTask[RPCBuildParameters], BuildTask):
METHOD_NAME = 'build'
def set_args(self, params: RPCBuildParameters) -> None:
self.args.models = self._listify(params.models)
self.args.exclude = self._listify(params.exclude)
self.args.selector_name = params.selector
if params.threads is not None:
self.args.threads = params.threads
if params.defer is None:
self.args.defer = flags.DEFER_MODE
else:
self.args.defer = params.defer
self.args.state = state_path(params.state)
self.set_previous_state()

View File

@@ -31,6 +31,7 @@ DEPRECATION_WARN_SPEC = 'iglu:com.dbt/deprecation_warn/jsonschema/1-0-0'
LOAD_ALL_TIMING_SPEC = 'iglu:com.dbt/load_all_timing/jsonschema/1-0-3'
RESOURCE_COUNTS = 'iglu:com.dbt/resource_counts/jsonschema/1-0-0'
EXPERIMENTAL_PARSER = 'iglu:com.dbt/experimental_parser/jsonschema/1-0-0'
PARTIAL_PARSER = 'iglu:com.dbt/partial_parser/jsonschema/1-0-0'
DBT_INVOCATION_ENV = 'DBT_INVOCATION_ENV'
@@ -131,7 +132,7 @@ class User:
# will change in every dbt invocation until the user points to a
# profile dir file which contains a valid profiles.yml file.
#
# See: https://github.com/fishtown-analytics/dbt/issues/1645
# See: https://github.com/dbt-labs/dbt/issues/1645
user = {"id": str(uuid.uuid4())}
@@ -426,7 +427,7 @@ def track_invalid_invocation(
def track_experimental_parser_sample(options):
context = [SelfDescribingJson(EXPERIMENTAL_PARSER, options)]
assert active_user is not None, \
'Cannot track project loading time when active user is None'
'Cannot track experimental parser info when active user is None'
track(
active_user,
@@ -437,9 +438,28 @@ def track_experimental_parser_sample(options):
)
def track_partial_parser(options):
context = [SelfDescribingJson(PARTIAL_PARSER, options)]
assert active_user is not None, \
'Cannot track partial parser info when active user is None'
track(
active_user,
category='dbt',
action='partial_parser',
label=active_user.invocation_id,
context=context
)
def flush():
logger.debug("Flushing usage events")
tracker.flush()
try:
tracker.flush()
except Exception:
logger.debug(
"An error was encountered while trying to flush usage events"
)
def disable_tracking():

View File

@@ -6,6 +6,7 @@ import decimal
import functools
import hashlib
import itertools
import jinja2
import json
import os
from contextlib import contextmanager
@@ -306,14 +307,16 @@ def timestring() -> str:
class JSONEncoder(json.JSONEncoder):
"""A 'custom' json encoder that does normal json encoder things, but also
handles `Decimal`s. Naturally, this can lose precision because they get
converted to floats.
handles `Decimal`s and `Undefined`s. Decimals can lose precision because
they get converted to floats. `Undefined`s are serialized to an empty string.
"""
def default(self, obj):
if isinstance(obj, DECIMALS):
return float(obj)
if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
return obj.isoformat()
if isinstance(obj, jinja2.Undefined):
return ""
if hasattr(obj, 'to_dict'):
# if we have a to_dict we should try to serialize the result of
# that!
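With this change, stray Jinja `Undefined` values no longer break JSON serialization of rpc request payloads. A quick illustration (assumes `dbt.utils.JSONEncoder` is importable as in the diff):

import decimal
import json

import jinja2
from dbt.utils import JSONEncoder

payload = {
    "amount": decimal.Decimal("1.50"),  # encoded as a float; may lose precision
    "missing": jinja2.Undefined(),      # encoded as an empty string
}
assert json.loads(json.dumps(payload, cls=JSONEncoder)) == {"amount": 1.5, "missing": ""}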

View File

@@ -96,5 +96,5 @@ def _get_dbt_plugins_info():
yield plugin_name, mod.version
__version__ = '0.21.0a1'
__version__ = '0.21.0b1'
installed = get_installed_version()

View File

@@ -284,12 +284,12 @@ def parse_args(argv=None):
parser.add_argument('adapter')
parser.add_argument('--title-case', '-t', default=None)
parser.add_argument('--dependency', action='append')
parser.add_argument('--dbt-core-version', default='0.21.0a1')
parser.add_argument('--dbt-core-version', default='0.21.0b1')
parser.add_argument('--email')
parser.add_argument('--author')
parser.add_argument('--url')
parser.add_argument('--sql', action='store_true')
parser.add_argument('--package-version', default='0.21.0a1')
parser.add_argument('--package-version', default='0.21.0b1')
parser.add_argument('--project-version', default='1.0')
parser.add_argument(
'--no-dependency', action='store_false', dest='set_dependency'

View File

@@ -24,7 +24,7 @@ def read(fname):
package_name = "dbt-core"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """dbt (data build tool) is a command line tool that helps \
analysts and engineers transform data in their warehouse more effectively"""
@@ -34,9 +34,9 @@ setup(
version=package_version,
description=description,
long_description=description,
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=find_namespace_packages(include=['dbt', 'dbt.*']),
include_package_data = True,
test_suite='test',
@@ -63,7 +63,7 @@ setup(
'networkx>=2.3,<3',
'packaging~=20.9',
'sqlparse>=0.2.3,<0.4',
'dbt-extractor==0.2.0',
'dbt-extractor==0.4.0',
'typing-extensions>=3.7.4,<3.11',
'werkzeug>=1,<3',
# the following are all to match snowflake-connector-python

View File

@@ -0,0 +1,75 @@
agate==1.6.1
asn1crypto==1.4.0
attrs==21.2.0
azure-common==1.1.27
azure-core==1.16.0
azure-storage-blob==12.8.1
Babel==2.9.1
boto3==1.18.12
botocore==1.21.12
cachetools==4.2.2
certifi==2021.5.30
cffi==1.14.6
chardet==4.0.0
charset-normalizer==2.0.4
colorama==0.4.4
cryptography==3.4.7
google-api-core==1.31.1
google-auth==1.34.0
google-cloud-bigquery==2.23.2
google-cloud-core==1.7.2
google-crc32c==1.1.2
google-resumable-media==1.3.3
googleapis-common-protos==1.53.0
grpcio==1.39.0
hologram==0.0.14
idna==3.2
importlib-metadata==4.6.3
isodate==0.6.0
jeepney==0.7.1
Jinja2==2.11.3
jmespath==0.10.0
json-rpc==1.13.0
jsonschema==3.1.1
keyring==21.8.0
leather==0.3.3
Logbook==1.5.3
MarkupSafe==2.0.1
mashumaro==2.5
minimal-snowplow-tracker==0.0.2
msgpack==1.0.2
msrest==0.6.21
networkx==2.6.2
oauthlib==3.1.1
oscrypto==1.2.1
packaging==20.9
parsedatetime==2.6
proto-plus==1.19.0
protobuf==3.17.3
psycopg2-binary==2.9.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pycryptodomex==3.10.1
PyJWT==2.1.0
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
python-slugify==5.0.2
pytimeparse==1.1.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
s3transfer==0.5.0
SecretStorage==3.3.1
six==1.16.0
snowflake-connector-python==2.5.1
sqlparse==0.3.1
text-unidecode==1.3
typing-extensions==3.10.0.0
urllib3==1.26.6
Werkzeug==2.0.1
zipp==3.5.0

View File

@@ -1,18 +0,0 @@
# Performance Regression Testing
This directory includes dbt project setups to test against and a test runner, written in Rust, which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually.
The GitHub workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards.
## Adding a new dbt project
Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests.
## Adding a new dbt command
In `runner/src/measure.rs::measure`, add a metric to the `metrics` Vec. The GitHub Action will handle recompilation if you don't have the Rust toolchain installed.
## Future work
- add more projects to test different configurations known to be bottlenecks
- add more dbt commands to measure
- possibly use the uploaded JSON artifacts to store these results so they can be graphed over time
- read new metrics from a file so no one has to edit Rust source to add them to the suite
- instead of building the Rust runner every time, publish it and pull down the latest version
- instead of manually setting the baseline version of dbt to test, pull down the latest stable version as the baseline

View File

@@ -1 +0,0 @@
id: 5d0c160e-f817-4b77-bce3-ffb2e37f0c9b

View File

@@ -1,12 +0,0 @@
default:
target: dev
outputs:
dev:
type: postgres
host: localhost
user: dummy
password: dummy_password
port: 5432
dbname: dummy
schema: dummy
threads: 4

View File

@@ -1,38 +0,0 @@
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'my_new_package'
version: 1.0.0
config-version: 2
# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'default'
# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_modules"
# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
# In this example config, we tell dbt to build all models in the example/ directory
# as views (the default). These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
my_new_package:
# Applies to all files under models/example/
example:
materialized: view

View File

@@ -1 +0,0 @@
select 1 as id

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_0
version: 2

View File

@@ -1,3 +0,0 @@
select 1 as id
union all
select * from {{ ref('node_0') }}

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_1
version: 2

View File

@@ -1,3 +0,0 @@
select 1 as id
union all
select * from {{ ref('node_0') }}

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_2
version: 2

View File

@@ -1,38 +0,0 @@
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'my_new_package'
version: 1.0.0
config-version: 2
# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'default'
# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_modules"
# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
# In this example config, we tell dbt to build all models in the example/ directory
# as views (the default). These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
my_new_package:
# Applies to all files under models/example/
example:
materialized: view

View File

@@ -1 +0,0 @@
select 1 as id

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_0
version: 2

View File

@@ -1,3 +0,0 @@
select 1 as id
union all
select * from {{ ref('node_0') }}

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_1
version: 2

View File

@@ -1,3 +0,0 @@
select 1 as id
union all
select * from {{ ref('node_0') }}

View File

@@ -1,11 +0,0 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_2
version: 2

View File

@@ -1,5 +0,0 @@
# all files here are generated results
*
# except this one
!.gitignore

View File

@@ -1,2 +0,0 @@
target/
projects/*/logs

View File

@@ -1,307 +0,0 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
[[package]]
name = "ansi_term"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ee49baf6cb617b853aa8d93bf420db2383fab46d314482ca2803b40d5fde979b"
dependencies = [
"winapi",
]
[[package]]
name = "atty"
version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9b39be18770d11421cdb1b9947a45dd3f37e93092cbf377614828a319d5fee8"
dependencies = [
"hermit-abi",
"libc",
"winapi",
]
[[package]]
name = "bitflags"
version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
[[package]]
name = "clap"
version = "2.33.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "37e58ac78573c40708d45522f0d80fa2f01cc4f9b4e2bf749807255454312002"
dependencies = [
"ansi_term",
"atty",
"bitflags",
"strsim",
"textwrap",
"unicode-width",
"vec_map",
]
[[package]]
name = "either"
version = "1.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e78d4f1cc4ae33bbfc157ed5d5a5ef3bc29227303d595861deb238fcec4e9457"
[[package]]
name = "heck"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6d621efb26863f0e9924c6ac577e8275e5e6b77455db64ffa6c65c904e9e132c"
dependencies = [
"unicode-segmentation",
]
[[package]]
name = "hermit-abi"
version = "0.1.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "62b467343b94ba476dcb2500d242dadbb39557df889310ac77c5d99100aaac33"
dependencies = [
"libc",
]
[[package]]
name = "itertools"
version = "0.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "69ddb889f9d0d08a67338271fa9b62996bc788c7796a5c18cf057420aaed5eaf"
dependencies = [
"either",
]
[[package]]
name = "itoa"
version = "0.4.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dd25036021b0de88a0aff6b850051563c6516d0bf53f8638938edbb9de732736"
[[package]]
name = "lazy_static"
version = "1.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "libc"
version = "0.2.98"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "320cfe77175da3a483efed4bc0adc1968ca050b098ce4f2f1c13a56626128790"
[[package]]
name = "proc-macro-error"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da25490ff9892aab3fcf7c36f08cfb902dd3e71ca0f9f9517bea02a73a5ce38c"
dependencies = [
"proc-macro-error-attr",
"proc-macro2",
"quote",
"syn",
"version_check",
]
[[package]]
name = "proc-macro-error-attr"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1be40180e52ecc98ad80b184934baf3d0d29f979574e439af5a55274b35f869"
dependencies = [
"proc-macro2",
"quote",
"version_check",
]
[[package]]
name = "proc-macro2"
version = "1.0.28"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c7ed8b8c7b886ea3ed7dde405212185f423ab44682667c8c6dd14aa1d9f6612"
dependencies = [
"unicode-xid",
]
[[package]]
name = "quote"
version = "1.0.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3d0b9745dc2debf507c8422de05d7226cc1f0644216dfdfead988f9b1ab32a7"
dependencies = [
"proc-macro2",
]
[[package]]
name = "runner"
version = "0.1.0"
dependencies = [
"itertools",
"serde",
"serde_json",
"structopt",
"thiserror",
]
[[package]]
name = "ryu"
version = "1.0.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "71d301d4193d031abdd79ff7e3dd721168a9572ef3fe51a1517aba235bd8f86e"
[[package]]
name = "serde"
version = "1.0.127"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f03b9878abf6d14e6779d3f24f07b2cfa90352cfec4acc5aab8f1ac7f146fae8"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.127"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a024926d3432516606328597e0f224a51355a493b49fdd67e9209187cbe55ecc"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "serde_json"
version = "1.0.66"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "336b10da19a12ad094b59d870ebde26a45402e5b470add4b5fd03c5048a32127"
dependencies = [
"itoa",
"ryu",
"serde",
]
[[package]]
name = "strsim"
version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
[[package]]
name = "structopt"
version = "0.3.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "69b041cdcb67226aca307e6e7be44c8806423d83e018bd662360a93dabce4d71"
dependencies = [
"clap",
"lazy_static",
"structopt-derive",
]
[[package]]
name = "structopt-derive"
version = "0.4.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7813934aecf5f51a54775e00068c237de98489463968231a51746bbbc03f9c10"
dependencies = [
"heck",
"proc-macro-error",
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "syn"
version = "1.0.74"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1873d832550d4588c3dbc20f01361ab00bfe741048f71e3fecf145a7cc18b29c"
dependencies = [
"proc-macro2",
"quote",
"unicode-xid",
]
[[package]]
name = "textwrap"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d326610f408c7a4eb6f51c37c330e496b08506c9457c9d34287ecc38809fb060"
dependencies = [
"unicode-width",
]
[[package]]
name = "thiserror"
version = "1.0.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "93119e4feac1cbe6c798c34d3a53ea0026b0b1de6a120deef895137c0529bfe2"
dependencies = [
"thiserror-impl",
]
[[package]]
name = "thiserror-impl"
version = "1.0.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "060d69a0afe7796bf42e9e2ff91f5ee691fb15c53d38b4b62a9a53eb23164745"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "unicode-segmentation"
version = "1.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8895849a949e7845e06bd6dc1aa51731a103c42707010a5b591c0038fb73385b"
[[package]]
name = "unicode-width"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9337591893a19b88d8d87f2cec1e73fad5cdfd10e5a6f349f498ad6ea2ffb1e3"
[[package]]
name = "unicode-xid"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ccb82d61f80a663efe1f787a51b16b5a51e3314d6ac365b08639f52387b33f3"
[[package]]
name = "vec_map"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1bddf1187be692e79c5ffeab891132dfb0f236ed36a43c7ed39f1165ee20191"
[[package]]
name = "version_check"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5fecdca9a5291cc2b8dcf7dc02453fee791a280f3743cb0905f8822ae463b3fe"
[[package]]
name = "winapi"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",
]
[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"


@@ -1,11 +0,0 @@
[package]
name = "runner"
version = "0.1.0"
edition = "2018"
[dependencies]
itertools = "0.10.1"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
structopt = "0.3"
thiserror = "1.0.26"


@@ -1,269 +0,0 @@
use crate::exceptions::{CalculateError, IOError};
use itertools::Itertools;
use serde::{Deserialize, Serialize};
use std::fs;
use std::fs::DirEntry;
use std::path::{Path, PathBuf};
// This type exactly matches the type of array elements
// from hyperfine's output. Deriving `Serialize` and `Deserialize`
// gives us read and write capabilities via serde_json.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct Measurement {
pub command: String,
pub mean: f64,
pub stddev: f64,
pub median: f64,
pub user: f64,
pub system: f64,
pub min: f64,
pub max: f64,
pub times: Vec<f64>,
}
// This type exactly matches the type of hyperfine's output.
// Deriving `Serialize` and `Deserialize` gives us read and
// write capabilities via serde_json.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct Measurements {
pub results: Vec<Measurement>,
}
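// Illustrative only (not part of the original source): the field names above
// mirror the keys in hyperfine's `--export-json` output, so a results file
// deserializes directly into `Measurements`:
#[cfg(test)]
mod measurements_deserialize_example {
    use super::*;

    #[test]
    fn deserializes_hyperfine_style_json() {
        let contents = r#"{"results": [{"command": "dbt parse",
            "mean": 1.0, "stddev": 0.1, "median": 1.0, "user": 0.8,
            "system": 0.2, "min": 0.9, "max": 1.1, "times": [0.9, 1.0, 1.1]}]}"#;
        let parsed: Measurements = serde_json::from_str(contents).unwrap();
        assert_eq!(parsed.results[0].command, "dbt parse");
    }
}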
// Output data from a comparison between runs on the baseline
// and dev branches.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Data {
pub threshold: f64,
pub difference: f64,
pub baseline: f64,
pub dev: f64,
}
// The full output from a comparison between runs on the baseline
// and dev branches.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Calculation {
pub metric: String,
pub regression: bool,
pub data: Data,
}
// A type to describe which measurement we are working with. This
// information is parsed from the filename of hyperfine's output.
#[derive(Debug, Clone, PartialEq)]
pub struct MeasurementGroup {
pub version: String,
pub run: String,
pub measurement: Measurement,
}
// Given two measurements, return all the calculations. Each calculation is
// flagged as either a regression or not.
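// For example, with the 5% median threshold below and illustrative numbers:
// dev.median = 1.06 against baseline.median = 1.00 gives a ratio of 1.06,
// which exceeds 1.05, so the "median_<metric>" calculation is flagged as a
// regression. (The unit test at the bottom of this file exercises this case.)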
fn calculate(metric: &str, dev: &Measurement, baseline: &Measurement) -> Vec<Calculation> {
let median_threshold = 1.05; // 5% regression threshold
let median_difference = dev.median / baseline.median;
let stddev_threshold = 1.20; // 20% regression threshold
let stddev_difference = dev.stddev / baseline.stddev;
vec![
Calculation {
metric: ["median", metric].join("_"),
regression: median_difference > median_threshold,
data: Data {
threshold: median_threshold,
difference: median_difference,
baseline: baseline.median,
dev: dev.median,
},
},
Calculation {
metric: ["stddev", metric].join("_"),
regression: stddev_difference > stddev_threshold,
data: Data {
threshold: stddev_threshold,
difference: stddev_difference,
baseline: baseline.stddev,
dev: dev.stddev,
},
},
]
}
// Given a directory, read all files in the directory and return each
// filename with the deserialized json contents of that file.
fn measurements_from_files(
results_directory: &Path,
) -> Result<Vec<(PathBuf, Measurements)>, CalculateError> {
fs::read_dir(results_directory)
.or_else(|e| Err(IOError::ReadErr(results_directory.to_path_buf(), Some(e))))
.or_else(|e| Err(CalculateError::CalculateIOError(e)))?
.into_iter()
.map(|entry| {
let ent: DirEntry = entry
.or_else(|e| Err(IOError::ReadErr(results_directory.to_path_buf(), Some(e))))
.or_else(|e| Err(CalculateError::CalculateIOError(e)))?;
Ok(ent.path())
})
.collect::<Result<Vec<PathBuf>, CalculateError>>()?
.iter()
.filter(|path| {
path.extension()
.and_then(|ext| ext.to_str())
.map_or(false, |ext| ext.ends_with("json"))
})
.map(|path| {
fs::read_to_string(path)
.or_else(|e| Err(IOError::BadFileContentsErr(path.clone(), Some(e))))
.or_else(|e| Err(CalculateError::CalculateIOError(e)))
.and_then(|contents| {
serde_json::from_str::<Measurements>(&contents)
.or_else(|e| Err(CalculateError::BadJSONErr(path.clone(), Some(e))))
})
.map(|m| (path.clone(), m))
})
.collect()
}
// Given a list of filename-measurement pairs, detect any regressions by grouping
// measurements together by filename.
fn calculate_regressions(
measurements: &[(&PathBuf, &Measurement)],
) -> Result<Vec<Calculation>, CalculateError> {
/*
Strategy of this function body:
1. [Measurement] -> [MeasurementGroup]
2. Sort the MeasurementGroups
3. Group the MeasurementGroups by "run"
4. Call `calculate` with the two resulting Measurements as input
*/
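    // For example (hypothetical filename): "dev_parse_myproject.json" parses into
    // version "dev" and run "parse_myproject.json"; the run keeps the file
    // extension, which is harmless because both branches produce the same suffix.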
let mut measurement_groups: Vec<MeasurementGroup> = measurements
.iter()
.map(|(p, m)| {
p.file_name()
.ok_or_else(|| IOError::MissingFilenameErr(p.to_path_buf()))
.and_then(|name| {
name.to_str()
.ok_or_else(|| IOError::FilenameNotUnicodeErr(p.to_path_buf()))
})
.map(|name| {
let parts: Vec<&str> = name.split("_").collect();
MeasurementGroup {
version: parts[0].to_owned(),
run: parts[1..].join("_"),
measurement: (*m).clone(),
}
})
})
.collect::<Result<Vec<MeasurementGroup>, IOError>>()
.or_else(|e| Err(CalculateError::CalculateIOError(e)))?;
measurement_groups.sort_by(|x, y| (&x.run, &x.version).cmp(&(&y.run, &y.version)));
// rebind immutably so the sorted groups cannot be mutated further
let sorted_measurement_groups = measurement_groups;
let calculations: Vec<Calculation> = sorted_measurement_groups
.iter()
.group_by(|x| &x.run)
.into_iter()
.map(|(_, g)| {
let mut groups: Vec<&MeasurementGroup> = g.collect();
groups.sort_by(|x, y| x.version.cmp(&y.version));
match groups.len() {
2 => {
let dev = &groups[1];
let baseline = &groups[0];
if dev.version == "dev" && baseline.version == "baseline" {
Ok(calculate(&dev.run, &dev.measurement, &baseline.measurement))
} else {
Err(CalculateError::BadBranchNameErr(
baseline.version.clone(),
dev.version.clone(),
))
}
}
i => {
let gs: Vec<MeasurementGroup> = groups.into_iter().map(|x| x.clone()).collect();
Err(CalculateError::BadGroupSizeErr(i, gs))
}
}
})
.collect::<Result<Vec<Vec<Calculation>>, CalculateError>>()?
.concat();
Ok(calculations)
}
// Top-level function. Given a path for the result directory, call the above
// functions to compare and collect calculations. Calculations include both
// metrics that fall within the threshold and regressions.
pub fn regressions(results_directory: &PathBuf) -> Result<Vec<Calculation>, CalculateError> {
measurements_from_files(Path::new(&results_directory)).and_then(|v| {
// exit early with an Err if there are no results to process
if v.is_empty() {
Err(CalculateError::NoResultsErr(results_directory.clone()))
// we expect two runs for each project-metric pairing: one for each branch, baseline
// and dev. An odd result count is unexpected.
} else if v.len() % 2 == 1 {
Err(CalculateError::OddResultsCountErr(
v.len(),
results_directory.clone(),
))
} else {
// otherwise, we can do our comparisons
let measurements = v
.iter()
// the way we're running these, the files will each contain exactly one measurement, hence `results[0]`
.map(|(p, ms)| (p, &ms.results[0]))
.collect::<Vec<(&PathBuf, &Measurement)>>();
calculate_regressions(&measurements[..])
}
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn detects_5_percent_regression() {
let dev = Measurement {
command: "some command".to_owned(),
mean: 1.06,
stddev: 1.06,
median: 1.06,
user: 1.06,
system: 1.06,
min: 1.06,
max: 1.06,
times: vec![],
};
let baseline = Measurement {
command: "some command".to_owned(),
mean: 1.00,
stddev: 1.00,
median: 1.00,
user: 1.00,
system: 1.00,
min: 1.00,
max: 1.00,
times: vec![],
};
let calculations = calculate("test_metric", &dev, &baseline);
let regressions: Vec<&Calculation> =
calculations.iter().filter(|calc| calc.regression).collect();
// expect one regression for median
println!("{:#?}", regressions);
assert_eq!(regressions.len(), 1);
assert_eq!(regressions[0].metric, "median_test_metric");
}
}


@@ -1,155 +0,0 @@
use crate::calculate::*;
use std::io;
#[cfg(test)]
use std::path::Path;
use std::path::PathBuf;
use thiserror::Error;
// Custom IO Error messages for the IO errors we encounter.
// New constructors should be added to wrap any new IO errors.
// The desired output of these errors is tested below.
#[derive(Debug, Error)]
pub enum IOError {
#[error("ReadErr: The file cannot be read.\nFilepath: {}\nOriginating Exception: {}", .0.to_string_lossy().into_owned(), .1.as_ref().map_or("None".to_owned(), |e| format!("{}", e)))]
ReadErr(PathBuf, Option<io::Error>),
#[error("MissingFilenameErr: The path provided does not specify a file.\nFilepath: {}", .0.to_string_lossy().into_owned())]
MissingFilenameErr(PathBuf),
#[error("FilenameNotUnicodeErr: The filename is not expressible in unicode. Consider renaming the file.\nFilepath: {}", .0.to_string_lossy().into_owned())]
FilenameNotUnicodeErr(PathBuf),
#[error("BadFileContentsErr: Check that the file exists and is readable.\nFilepath: {}\nOriginating Exception: {}", .0.to_string_lossy().into_owned(), .1.as_ref().map_or("None".to_owned(), |e| format!("{}", e)))]
BadFileContentsErr(PathBuf, Option<io::Error>),
#[error("CommandErr: System command failed to run.\nOriginating Exception: {}", .0.as_ref().map_or("None".to_owned(), |e| format!("{}", e)))]
CommandErr(Option<io::Error>),
}
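// For example (illustrative): formatting
//     IOError::MissingFilenameErr(PathBuf::from("dummy/path/no_file/"))
// produces the message:
//     MissingFilenameErr: The path provided does not specify a file.
//     Filepath: dummy/path/no_file/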
// Custom Error messages for the error states we could encounter
// during calculation, and are not prevented at compile time. New
// constructors should be added for any new error situations that
// come up. The desired output of these errors is tested below.
#[derive(Debug, Error)]
pub enum CalculateError {
#[error("BadJSONErr: JSON in file cannot be deserialized as expected.\nFilepath: {}\nOriginating Exception: {}", .0.to_string_lossy().into_owned(), .1.as_ref().map_or("None".to_owned(), |e| format!("{}", e)))]
BadJSONErr(PathBuf, Option<serde_json::Error>),
#[error("{}", .0)]
CalculateIOError(IOError),
#[error("NoResultsErr: The results directory has no json files in it.\nFilepath: {}", .0.to_string_lossy().into_owned())]
NoResultsErr(PathBuf),
#[error("OddResultsCountErr: The results directory has an odd number of results in it. Expected an even number.\nFile Count: {}\nFilepath: {}", .0, .1.to_string_lossy().into_owned())]
OddResultsCountErr(usize, PathBuf),
#[error("BadGroupSizeErr: Expected two results per group, one for each branch-project pair.\nCount: {}\nGroup: {:?}", .0, .1.into_iter().map(|group| (&group.version[..], &group.run[..])).collect::<Vec<(&str, &str)>>())]
BadGroupSizeErr(usize, Vec<MeasurementGroup>),
#[error("BadBranchNameErr: Branch names must be 'baseline' and 'dev'.\nFound: {}, {}", .0, .1)]
BadBranchNameErr(String, String),
}
// Tests for exceptions
#[cfg(test)]
mod tests {
use super::*;
// Tests the output of IO error messages. There should be at least one per enum constructor.
#[test]
fn test_io_error_messages() {
let pairs = vec![
(
IOError::ReadErr(Path::new("dummy/path/file.json").to_path_buf(), None),
r#"ReadErr: The file cannot be read.
Filepath: dummy/path/file.json
Originating Exception: None"#,
),
(
IOError::MissingFilenameErr(Path::new("dummy/path/no_file/").to_path_buf()),
r#"MissingFilenameErr: The path provided does not specify a file.
Filepath: dummy/path/no_file/"#,
),
(
IOError::FilenameNotUnicodeErr(Path::new("dummy/path/no_file/").to_path_buf()),
r#"FilenameNotUnicodeErr: The filename is not expressible in unicode. Consider renaming the file.
Filepath: dummy/path/no_file/"#,
),
(
IOError::BadFileContentsErr(
Path::new("dummy/path/filenotexist.json").to_path_buf(),
None,
),
r#"BadFileContentsErr: Check that the file exists and is readable.
Filepath: dummy/path/filenotexist.json
Originating Exception: None"#,
),
(
IOError::CommandErr(None),
r#"CommandErr: System command failed to run.
Originating Exception: None"#,
),
];
for (err, msg) in pairs {
assert_eq!(format!("{}", err), msg)
}
}
// Tests the output of calculate error messages. There should be at least one per enum constructor.
#[test]
fn test_calculate_error_messages() {
let pairs = vec![
(
CalculateError::BadJSONErr(Path::new("dummy/path/file.json").to_path_buf(), None),
r#"BadJSONErr: JSON in file cannot be deserialized as expected.
Filepath: dummy/path/file.json
Originating Exception: None"#,
),
(
CalculateError::BadJSONErr(Path::new("dummy/path/file.json").to_path_buf(), None),
r#"BadJSONErr: JSON in file cannot be deserialized as expected.
Filepath: dummy/path/file.json
Originating Exception: None"#,
),
(
CalculateError::NoResultsErr(Path::new("dummy/path/no_file/").to_path_buf()),
r#"NoResultsErr: The results directory has no json files in it.
Filepath: dummy/path/no_file/"#,
),
(
CalculateError::OddResultsCountErr(
3,
Path::new("dummy/path/no_file/").to_path_buf(),
),
r#"OddResultsCountErr: The results directory has an odd number of results in it. Expected an even number.
File Count: 3
Filepath: dummy/path/no_file/"#,
),
(
CalculateError::BadGroupSizeErr(
1,
vec![MeasurementGroup {
version: "dev".to_owned(),
run: "some command".to_owned(),
measurement: Measurement {
command: "some command".to_owned(),
mean: 1.0,
stddev: 1.0,
median: 1.0,
user: 1.0,
system: 1.0,
min: 1.0,
max: 1.0,
times: vec![1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.1],
},
}],
),
r#"BadGroupSizeErr: Expected two results per group, one for each branch-project pair.
Count: 1
Group: [("dev", "some command")]"#,
),
(
CalculateError::BadBranchNameErr("boop".to_owned(), "noop".to_owned()),
r#"BadBranchNameErr: Branch names must be 'baseline' and 'dev'.
Found: boop, noop"#,
),
];
for (err, msg) in pairs {
assert_eq!(format!("{}", err), msg)
}
}
}


@@ -1,119 +0,0 @@
extern crate structopt;
mod calculate;
mod exceptions;
mod measure;
use crate::calculate::Calculation;
use crate::exceptions::CalculateError;
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;
use structopt::StructOpt;
// This type defines the commandline interface and is generated
// by `derive(StructOpt)`
#[derive(Clone, Debug, StructOpt)]
#[structopt(name = "performance", about = "performance regression testing runner")]
enum Opt {
#[structopt(name = "measure")]
Measure {
#[structopt(parse(from_os_str))]
#[structopt(short)]
projects_dir: PathBuf,
#[structopt(short)]
branch_name: String,
},
#[structopt(name = "calculate")]
Calculate {
#[structopt(parse(from_os_str))]
#[structopt(short)]
results_dir: PathBuf,
},
}
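// Example invocations (paths are illustrative):
//     runner measure -p performance/projects/ -b baseline
//     runner calculate -r performance/results/
// StructOpt derives the short flags -p, -b, and -r from the field names above.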
// enables proper usage of exit() in main.
// https://doc.rust-lang.org/std/process/fn.exit.html#examples
//
// This is where all the printing should happen. Exiting happens
// in main, and module functions should only return values.
fn run_app() -> Result<i32, CalculateError> {
// match what the user inputs from the cli
match Opt::from_args() {
// measure subcommand
Opt::Measure {
projects_dir,
branch_name,
} => {
// if there are any nonzero exit codes from the hyperfine runs,
// return the first one. otherwise return zero.
measure::measure(&projects_dir, &branch_name)
.or_else(|e| Err(CalculateError::CalculateIOError(e)))?
.iter()
.map(|status| status.code())
.flatten()
.filter(|code| *code != 0)
.collect::<Vec<i32>>()
.get(0)
.map_or(Ok(0), |x| {
println!("Main: a child process exited with a nonzero status code.");
Ok(*x)
})
}
// calculate subcommand
Opt::Calculate { results_dir } => {
// get all the calculations or gracefully show the user an exception
let calculations = calculate::regressions(&results_dir)?;
// print all calculations to stdout so they can be easily debugged
// via CI.
println!(":: All Calculations ::\n");
for c in &calculations {
println!("{:#?}\n", c);
}
// indented json string representation of the calculations array
let json_calcs = serde_json::to_string_pretty(&calculations)
.expect("Main: Failed to serialize calculations to json");
// create the empty destination file, and write the json string
let outfile = &mut results_dir.into_os_string();
outfile.push("/final_calculations.json");
let mut f = File::create(outfile).expect("Main: Unable to create file");
f.write_all(json_calcs.as_bytes())
.expect("Main: Unable to write data");
// filter for regressions
let regressions: Vec<&Calculation> =
calculations.iter().filter(|c| c.regression).collect();
// return a non-zero exit code if there are regressions
match regressions[..] {
[] => {
println!("congrats! no regressions :)");
Ok(0)
}
_ => {
// print all calculations to stdout so they can be easily
// debugged via CI.
println!(":: Regressions Found ::\n");
for r in regressions {
println!("{:#?}\n", r);
}
Ok(1)
}
}
}
}
}
fn main() {
std::process::exit(match run_app() {
Ok(code) => code,
Err(err) => {
eprintln!("{}", err);
1
}
});
}


@@ -1,89 +0,0 @@
use crate::exceptions::IOError;
use std::fs;
use std::path::PathBuf;
use std::process::{Command, ExitStatus};
// `Metric` defines a dbt command that we want to measure on both the
// baseline and dev branches.
#[derive(Debug, Clone)]
struct Metric<'a> {
name: &'a str,
prepare: &'a str,
cmd: &'a str,
}
impl Metric<'_> {
// Returns the proper filename for the hyperfine output for this metric.
fn outfile(&self, project: &str, branch: &str) -> String {
[branch, "_", self.name, "_", project, ".json"].join("")
}
}
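// For example, given the "parse" metric defined in `measure` below,
// outfile("myproject", "dev") returns "dev_parse_myproject.json".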
// Calls hyperfine via a system command and returns the exit status of each hyperfine run.
pub fn measure<'a>(
projects_directory: &PathBuf,
dbt_branch: &str,
) -> Result<Vec<ExitStatus>, IOError> {
/*
Strategy of this function body:
1. Read all directory names in `projects_directory`
2. Pair `n` projects with `m` metrics for a total of n*m pairs
3. Run hyperfine on each project-metric pair
*/
// To add a new metric to the test suite, simply define it in this list:
// TODO: This could be read from a config file in a future version.
let metrics: Vec<Metric> = vec![Metric {
name: "parse",
prepare: "rm -rf target/",
cmd: "dbt parse --no-version-check",
}];
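    // A hypothetical second metric would follow the same shape, e.g.:
    //     Metric {
    //         name: "compile",
    //         prepare: "rm -rf target/",
    //         cmd: "dbt compile --no-version-check",
    //     },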
fs::read_dir(projects_directory)
.or_else(|e| Err(IOError::ReadErr(projects_directory.to_path_buf(), Some(e))))?
.map(|entry| {
let path = entry
.or_else(|e| Err(IOError::ReadErr(projects_directory.to_path_buf(), Some(e))))?
.path();
let project_name: String = path
.file_name()
.ok_or_else(|| IOError::MissingFilenameErr(path.clone().to_path_buf()))
.and_then(|x| {
x.to_str()
.ok_or_else(|| IOError::FilenameNotUnicodeErr(path.clone().to_path_buf()))
})?
.to_owned();
// each project-metric pair we will run
let pairs = metrics
.iter()
.map(|metric| (path.clone(), project_name.clone(), metric))
.collect::<Vec<(PathBuf, String, &Metric<'a>)>>();
Ok(pairs)
})
.collect::<Result<Vec<Vec<(PathBuf, String, &Metric<'a>)>>, IOError>>()?
.concat()
.iter()
// run hyperfine on each pairing
.map(|(path, project_name, metric)| {
Command::new("hyperfine")
.current_dir(path)
// warms filesystem caches by running the command first without counting it.
// alternatively we could clear them before each run
.arg("--warmup")
.arg("1")
.arg("--prepare")
.arg(metric.prepare)
.arg([metric.cmd, " --profiles-dir ", "../../project_config/"].join(""))
.arg("--export-json")
.arg(["../../results/", &metric.outfile(project_name, dbt_branch)].join(""))
// this prevents hyperfine from capturing dbt's output.
// Noisy, but good for debugging when tests fail.
.arg("--show-output")
.status() // use spawn() here instead for more information
.or_else(|e| Err(IOError::CommandErr(Some(e))))
})
.collect()
}


@@ -1 +1 @@
version = '0.21.0a1'
version = '0.21.0b1'


@@ -128,6 +128,38 @@
{% do adapter.rename_relation(from_relation, to_relation) %}
{% endmacro %}
{% macro bigquery__alter_relation_add_columns(relation, add_columns) %}
{% set sql -%}
alter {{ relation.type }} {{ relation }}
{% for column in add_columns %}
add column {{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}
{% endfor %}
{%- endset -%}
{{ return(run_query(sql)) }}
{% endmacro %}
{% macro bigquery__alter_relation_drop_columns(relation, drop_columns) %}
{% set sql -%}
alter {{ relation.type }} {{ relation }}
{% for column in drop_columns %}
drop column {{ column.name }}{{ ',' if not loop.last }}
{% endfor %}
{%- endset -%}
{{ return(run_query(sql)) }}
{% endmacro %}
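{#-- For illustration (relation and column names are hypothetical), the
  add-columns macro above renders SQL along the lines of:
    alter table `my-project`.`my_dataset`.`my_table`
    add column new_col string,
    add column extra_col int64
--#}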
{% macro bigquery__alter_column_type(relation, column_name, new_column_type) -%}
{#
Changing a column's data type using a query requires you to scan the entire table.


@@ -15,7 +15,9 @@
{% endmacro %}
{% macro bq_insert_overwrite(tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns) %}
{% macro bq_insert_overwrite(
tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists
) %}
{% if partitions is not none and partitions != [] %} {# static #}
@@ -52,8 +54,13 @@
where {{ partition_by.field }} is not null
);
-- 1. create a temp table
{{ create_table_as(True, tmp_relation, sql) }}
{# have we already created the temp table to check for schema changes? #}
{% if not tmp_relation_exists %}
-- 1. create a temp table
{{ create_table_as(True, tmp_relation, sql) }}
{% else %}
-- 1. temp table already exists, we used it to check for schema changes
{% endif %}
-- 2. define partitions to update
set (dbt_partitions_for_replacement) = (
@@ -77,6 +84,44 @@
{% endmacro %}
{% macro bq_generate_incremental_build_sql(
strategy, tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists
) %}
{#-- if partitioned, use BQ scripting to get the range of partition values to be updated --#}
{% if strategy == 'insert_overwrite' %}
{% set missing_partition_msg -%}
The 'insert_overwrite' strategy requires the `partition_by` config.
{%- endset %}
{% if partition_by is none %}
{% do exceptions.raise_compiler_error(missing_partition_msg) %}
{% endif %}
{% set build_sql = bq_insert_overwrite(
tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists
) %}
{% else %} {# strategy == 'merge' #}
{%- set source_sql -%}
{%- if tmp_relation_exists -%}
(
select * from {{ tmp_relation }}
)
{%- else -%} {#-- wrap sql in parens to make it a subquery --#}
(
{{sql}}
)
{%- endif -%}
{%- endset -%}
{% set build_sql = get_merge_sql(target_relation, source_sql, unique_key, dest_columns) %}
{% endif %}
{{ return(build_sql) }}
{% endmacro %}
{% materialization incremental, adapter='bigquery' -%}
{%- set unique_key = config.get('unique_key') -%}
@@ -94,14 +139,18 @@
{%- set partitions = config.get('partitions', none) -%}
{%- set cluster_by = config.get('cluster_by', none) -%}
{% set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') %}
{{ run_hooks(pre_hooks) }}
{% if existing_relation is none %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% elif existing_relation.is_view %}
{#-- There's no way to atomically replace a view with a table on BQ --#}
{{ adapter.drop_relation(existing_relation) }}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{#-- If the partition/cluster config has changed, then we must drop and recreate --#}
{% if not adapter.is_replaceable(existing_relation, partition_by, cluster_by) %}
@@ -109,39 +158,19 @@
{{ adapter.drop_relation(existing_relation) }}
{% endif %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% else %}
{% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}
{#-- if partitioned, use BQ scripting to get the range of partition values to be updated --#}
{% if strategy == 'insert_overwrite' %}
{% set missing_partition_msg -%}
The 'insert_overwrite' strategy requires the `partition_by` config.
{%- endset %}
{% if partition_by is none %}
{% do exceptions.raise_compiler_error(missing_partition_msg) %}
{% endif %}
{% set build_sql = bq_insert_overwrite(
tmp_relation,
target_relation,
sql,
unique_key,
partition_by,
partitions,
dest_columns) %}
{% else %}
{#-- wrap sql in parens to make it a subquery --#}
{%- set source_sql -%}
(
{{sql}}
)
{%- endset -%}
{% set build_sql = get_merge_sql(target_relation, source_sql, unique_key, dest_columns) %}
{% endif %}
{% set tmp_relation_exists = false %}
{% if on_schema_change != 'ignore' %} {# Check first, since otherwise we may not build a temp table #}
{% do run_query(create_table_as(True, tmp_relation, sql)) %}
{% set tmp_relation_exists = true %}
{% do process_schema_changes(on_schema_change, tmp_relation, existing_relation) %}
{% endif %}
{% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}
{% set build_sql = bq_generate_incremental_build_sql(
strategy, tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists
) %}
{% endif %}


@@ -9,7 +9,7 @@
{% materialization view, adapter='bigquery' -%}
{% set to_return = create_or_replace_view(run_outside_transaction_hooks=False) %}
{% set to_return = create_or_replace_view() %}
{% set target_relation = this.incorporate(type='view') %}
{% do persist_docs(target_relation, model) %}


@@ -20,7 +20,7 @@ except ImportError:
package_name = "dbt-bigquery"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """The bigquery adapter plugin for dbt (data build tool)"""
this_directory = os.path.abspath(os.path.dirname(__file__))
@@ -33,9 +33,9 @@ setup(
description=description,
long_description=long_description,
long_description_content_type='text/markdown',
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=find_namespace_packages(include=['dbt', 'dbt.*']),
package_data={
'dbt': [


@@ -1 +1 @@
version = '0.21.0a1'
version = '0.21.0b1'


@@ -41,7 +41,7 @@ def _dbt_psycopg2_name():
package_name = "dbt-postgres"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """The postgres adpter plugin for dbt (data build tool)"""
this_directory = os.path.abspath(os.path.dirname(__file__))
@@ -56,9 +56,9 @@ setup(
description=description,
long_description=description,
long_description_content_type='text/markdown',
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=find_namespace_packages(include=['dbt', 'dbt.*']),
package_data={
'dbt': [


@@ -1 +1 @@
version = '0.21.0a1'
version = '0.21.0b1'


@@ -255,3 +255,29 @@
{% do return(postgres__alter_column_comment(relation, column_dict)) %}
{% endmacro %}
{% macro redshift__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}
{% if add_columns %}
{% for column in add_columns %}
{% set sql -%}
alter {{ relation.type }} {{ relation }} add column {{ column.name }} {{ column.data_type }}
{% endset %}
{% do run_query(sql) %}
{% endfor %}
{% endif %}
{% if remove_columns %}
{% for column in remove_columns %}
{% set sql -%}
alter {{ relation.type }} {{ relation }} drop column {{ column.name }}
{% endset %}
{% do run_query(sql) %}
{% endfor %}
{% endif %}
{% endmacro %}


@@ -20,7 +20,7 @@ except ImportError:
package_name = "dbt-redshift"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """The redshift adapter plugin for dbt (data build tool)"""
this_directory = os.path.abspath(os.path.dirname(__file__))
@@ -33,9 +33,9 @@ setup(
description=description,
long_description=description,
long_description_content_type='text/markdown',
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=find_namespace_packages(include=['dbt', 'dbt.*']),
package_data={
'dbt': [


@@ -1 +1 @@
version = '0.21.0a1'
version = '0.21.0b1'


@@ -224,7 +224,7 @@ class SnowflakeConnectionManager(SQLConnectionManager):
schema=creds.schema,
warehouse=creds.warehouse,
role=creds.role,
autocommit=False,
autocommit=True,
client_session_keep_alive=creds.client_session_keep_alive,
application='dbt',
**creds.auth_args()
@@ -275,6 +275,23 @@ class SnowflakeConnectionManager(SQLConnectionManager):
code=code
)
# disable transactional logic by default on Snowflake
# except for DML statements where explicitly defined
def add_begin_query(self, *args, **kwargs):
pass
def add_commit_query(self, *args, **kwargs):
pass
def begin(self):
pass
def commit(self):
pass
def clear_transaction(self):
pass
@classmethod
def _split_queries(cls, sql):
"Splits sql statements at semicolons into discrete queries"
@@ -352,15 +369,3 @@ class SnowflakeConnectionManager(SQLConnectionManager):
)
return connection, cursor
@classmethod
def _rollback_handle(cls, connection):
"""On snowflake, rolling back the handle of an aborted session raises
an exception.
"""
try:
connection.handle.rollback()
except snowflake.connector.errors.ProgrammingError as e:
msg = str(e)
if 'Session no longer exists' not in msg:
raise


@@ -191,3 +191,61 @@
{% endif %}
{% endif %}
{% endmacro %}
{% macro snowflake__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}
{% if add_columns %}
{% set sql -%}
alter {{ relation.type }} {{ relation }} add column
{% for column in add_columns %}
{{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}
{% endfor %}
{%- endset -%}
{% do run_query(sql) %}
{% endif %}
{% if remove_columns %}
{% set sql -%}
alter {{ relation.type }} {{ relation }} drop column
{% for column in remove_columns %}
{{ column.name }}{{ ',' if not loop.last }}
{% endfor %}
{%- endset -%}
{% do run_query(sql) %}
{% endif %}
{% endmacro %}
{% macro snowflake_dml_explicit_transaction(dml) %}
{#
Use this macro to wrap all INSERT, MERGE, UPDATE, DELETE, and TRUNCATE
statements before passing them into run_query(), or calling in the 'main' statement
of a materialization
#}
{% set dml_transaction -%}
begin;
{{ dml }};
commit;
{%- endset %}
{% do return(dml_transaction) %}
{% endmacro %}
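{# For example, snowflake_dml_explicit_transaction("delete from my_table")
   returns a string of the form "begin; delete from my_table; commit;", so the
   DML runs in its own explicit transaction; snowflake__truncate_relation
   below is one call site. #}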
{% macro snowflake__truncate_relation(relation) -%}
{% set truncate_dml %}
truncate table {{ relation }}
{% endset %}
{% call statement('truncate_relation') -%}
{{ snowflake_dml_explicit_transaction(truncate_dml) }}
{%- endcall %}
{% endmacro %}


@@ -25,7 +25,7 @@
{% endmacro %}
{% materialization incremental, adapter='snowflake' -%}
{% set original_query_tag = set_query_tag() %}
{%- set unique_key = config.get('unique_key') -%}
@@ -37,41 +37,38 @@
{#-- Validate early so we don't run SQL if the strategy is invalid --#}
{% set strategy = dbt_snowflake_validate_get_incremental_strategy(config) -%}
{% set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') %}
-- setup
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{{ run_hooks(pre_hooks) }}
{% if existing_relation is none %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% elif existing_relation.is_view %}
{#-- Can't overwrite a view with a table - we must drop --#}
{{ log("Dropping relation " ~ target_relation ~ " because it is a view and this model is a table.") }}
{% do adapter.drop_relation(existing_relation) %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{% set build_sql = create_table_as(False, target_relation, sql) %}
{% else %}
{% do run_query(create_table_as(True, tmp_relation, sql)) %}
{% do adapter.expand_target_column_types(
from_relation=tmp_relation,
to_relation=target_relation) %}
{% set dest_columns = adapter.get_columns_in_relation(target_relation) %}
{% do process_schema_changes(on_schema_change, tmp_relation, existing_relation) %}
{% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}
{% set build_sql = dbt_snowflake_get_incremental_sql(strategy, tmp_relation, target_relation, unique_key, dest_columns) %}
{% endif %}
{%- call statement('main') -%}
{{ build_sql }}
{%- endcall -%}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{{ adapter.commit() }}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ run_hooks(post_hooks) }}
{% set target_relation = target_relation.incorporate(type='table') %}
{% do persist_docs(target_relation, model) %}
@@ -80,4 +77,4 @@
{{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
{%- endmaterialization %}


@@ -9,6 +9,7 @@
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute='name')) -%}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set dml -%}
{%- if unique_key is none -%}
{{ sql_header if sql_header is not none }}
@@ -17,12 +18,27 @@
(
select {{ dest_cols_csv }}
from {{ source_sql }}
);
)
{%- else -%}
{{ default__get_merge_sql(target, source_sql, unique_key, dest_columns, predicates) }}
{%- endif -%}
{%- endset -%}
{% do return(snowflake_dml_explicit_transaction(dml)) %}
{% endmacro %}
{% macro snowflake__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) %}
{% set dml = default__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) %}
{% do return(snowflake_dml_explicit_transaction(dml)) %}
{% endmacro %}
{% macro snowflake__snapshot_merge_sql(target, source, insert_cols) %}
{% set dml = default__snapshot_merge_sql(target, source, insert_cols) %}
{% do return(snowflake_dml_explicit_transaction(dml)) %}
{% endmacro %}


@@ -0,0 +1,36 @@
{% macro snowflake__load_csv_rows(model, agate_table) %}
{% set cols_sql = get_seed_column_quoted_csv(model, agate_table.column_names) %}
{% set bindings = [] %}
{% set statements = [] %}
{% for chunk in agate_table.rows | batch(batch_size) %}
{% set bindings = [] %}
{% for row in chunk %}
{% do bindings.extend(row) %}
{% endfor %}
{% set sql %}
insert into {{ this.render() }} ({{ cols_sql }}) values
{% for row in chunk -%}
({%- for column in agate_table.column_names -%}
%s
{%- if not loop.last%},{%- endif %}
{%- endfor -%})
{%- if not loop.last%},{%- endif %}
{%- endfor %}
{% endset %}
{% do adapter.add_query('BEGIN', auto_begin=False) %}
{% do adapter.add_query(sql, bindings=bindings, abridge_sql_log=True) %}
{% do adapter.add_query('COMMIT', auto_begin=False) %}
{% if loop.index0 == 0 %}
{% do statements.append(sql) %}
{% endif %}
{% endfor %}
{# Return SQL so we can render it out into the compiled files #}
{{ return(statements[0]) }}
{% endmacro %}


@@ -9,10 +9,7 @@
schema=schema,
database=database, type='table') -%}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{{ run_hooks(pre_hooks) }}
{#-- Drop the relation if it was a view to "convert" it in a table. This may lead to
-- downtime, but it should be a relatively infrequent occurrence #}
@@ -26,12 +23,7 @@
{{ create_table_as(false, target_relation, sql) }}
{%- endcall %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{{ adapter.commit() }}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ run_hooks(post_hooks) }}
{% do persist_docs(target_relation, model) %}


@@ -20,7 +20,7 @@ except ImportError:
package_name = "dbt-snowflake"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """The snowflake adapter plugin for dbt (data build tool)"""
this_directory = os.path.abspath(os.path.dirname(__file__))
@@ -33,9 +33,9 @@ setup(
description=description,
long_description=description,
long_description_content_type='text/markdown',
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=find_namespace_packages(include=['dbt', 'dbt.*']),
package_data={
'dbt': [
@@ -47,7 +47,7 @@ setup(
},
install_requires=[
'dbt-core=={}'.format(package_version),
'snowflake-connector-python[secure-local-storage]~=2.4.1',
'snowflake-connector-python[secure-local-storage]>=2.4.1,<2.6.0',
'requests<3.0.0',
'cryptography>=3.2,<4',
],


@@ -24,7 +24,7 @@ with open(os.path.join(this_directory, 'README.md')) as f:
package_name = "dbt"
package_version = "0.21.0a1"
package_version = "0.21.0b1"
description = """With dbt, data analysts and engineers can build analytics \
the way engineers build applications."""
@@ -37,9 +37,9 @@ setup(
long_description=long_description,
long_description_content_type='text/markdown',
author="Fishtown Analytics",
author_email="info@fishtownanalytics.com",
url="https://github.com/fishtown-analytics/dbt",
author="dbt Labs",
author_email="info@dbtlabs.com",
url="https://github.com/dbt-labs/dbt",
packages=[],
install_requires=[
'dbt-core=={}'.format(package_version),


@@ -12,3 +12,11 @@ models:
- fail_calc
- where: # test override + weird quoting
where: "\"favorite_color\" = 'red'"
columns:
- name: id
tests:
# relationships with where
- relationships:
to: ref('table_copy') # itself
field: id
where: 1=1

Some files were not shown because too many files have changed in this diff.