dbt-adapters updated the incremental_strategy validation of incremental models such that
the validation now _always_ happens when an incremental model is executed. A test in dbt-core
`TestMicrobatchCustomUserStrategyEnvVarTrueInvalid` was previously set to _expect_ buggy behavior
where an incremental model would succeed on its "first"/"refresh" run even if it had an invalid
incremental strategy. Thus we needed to update this test in dbt-core to expect the now-correct
behavior of validating the incremental strategy at execution time.
* [Tidy-First]: Fix `timings` object for hooks and macros, and make types of timings explicit
* cast literal to str
* change test
* change jsonschema to enum
* Discard changes to schemas/dbt/manifest/v12.json
* nits
---------
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
* Add `order_by` and `limit` fields to saved queries.
* Update JSON schema
* Add change log for #10531.
* Check order by / limit in saved-query parsing test.
* Add test that checks microbatch models are all operating with the same `current_time`
* Set an `invocated_at` on the `RuntimeConfig` and plumb to `MicrobatchBuilder`
* Add changie doc
* Rename `invocated_at` to `invoked_at`
* Simplify conditional logic for setting MicrobatchBuilder.batch_current_time
* Rename `batch_current_time` to `default_end_time` for MicrobatchBuilder
* Begin testing that microbatch execution times are being tracked and set
* Begin tracking the execution time of batches for microbatch models
* Add changie doc
* Additional assertions in microbatch testing
* Validate that `event_time_start` is before `event_time_end` when passed from CLI
Sometimes CLI options have restrictions based on other CLI options. This is the case
for `--event-time-start` and `--event-time-end`. Unfortunately, click doesn't provide
a good way to validate this, at least none that I found. Additionally, I'm not sure
whether we've had anything like this previously. In any case, I couldn't find a
centralized validation area for such occurrences, so I've added one,
`validate_option_interactions`. Long term, if more validations are added, we should
add this wrapper to each CLI command. For now I've only added it to the commands that
support `event_time_start` and `event_time_end`, specifically `build` and `run` (see the sketch below).
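For illustration, a wrapper of this sort could look roughly like the following. This is a minimal sketch, not the actual dbt-core implementation (the real validation was later moved into `flags.py`), and the command/option wiring here is assumed for the example:

```python
import functools

import click


def validate_option_interactions(func):
    """Reject option combinations that click alone cannot validate."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = kwargs.get("event_time_start")
        end = kwargs.get("event_time_end")
        # --event-time-start must come strictly before --event-time-end
        if start is not None and end is not None and start >= end:
            raise click.BadParameter(
                "--event-time-start must be before --event-time-end"
            )
        return func(*args, **kwargs)

    return wrapper


@click.command()
@click.option("--event-time-start", type=click.DateTime())
@click.option("--event-time-end", type=click.DateTime())
@validate_option_interactions
def run(event_time_start, event_time_end):
    click.echo("event-time window validated")


if __name__ == "__main__":
    run()
```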
* Add changie doc
* If `--event-time-end` is not specified, ensure `--event-time-start` is before the current time
* Fixup error message about event_time_start and event_time_end
* Move logic to validate `event_time` cli flags to `flags.py`
* Update validation of `--event-time-start` against `datetime.now` to use UTC
* When retrying microbatch models, propagate prior successful state
* Changie doc for microbatch dbt retry fixes
* Fix test_manifest unit tests for batch_info key
* Add functional test for when a microbatch model has multiple retries
* Add comment about when batch_info will be something other than None
* Add tests to check how microbatch models respect `full_refresh` model configs
* Fix `_is_incremental` to properly respect `full_refresh` model config
In dbt-core, it is generally expected that values passed via CLI flags take
precedence over model-level configs. However, `full_refresh` on a model is an
exception to this rule, wherein the model config takes precedence. This
config exists specifically to _prevent_ accidental full refreshes of large
incremental models, as doing so can be costly. **_It is actually best
practice_** to set `full_refresh=False` on incremental models.
Prior to this commit, the above was not happening for microbatch models. The
CLI flag `--full-refresh` was taking precedence over the model config
`full_refresh`. That meant that if `--full-refresh` was supplied, the
microbatch model **_would full refresh_** even if `full_refresh=False` was
set on the model. This commit solves that problem.
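As a rough illustration of the precedence rule (a hypothetical helper, not the actual `_is_incremental` code):

```python
from typing import Optional


def should_full_refresh(cli_full_refresh: bool, model_full_refresh: Optional[bool]) -> bool:
    """An explicit model-level full_refresh config wins over the --full-refresh
    CLI flag; when the model leaves it unset, the CLI flag decides."""
    if model_full_refresh is not None:
        return model_full_refresh
    return cli_full_refresh


# --full-refresh was passed, but the model pins full_refresh=False, so no full refresh happens
assert should_full_refresh(cli_full_refresh=True, model_full_refresh=False) is False
```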
* Add changie doc for microbatch `full_refresh` config handling
* Add `PartialSuccess` status type and use it for microbatch models with mixed results
* Handle `PartialSuccess` in `interpret_run_result`
* Add `BatchResults` object to `BaseResult` and begin tracking during microbatch runs
* Ensure batch_results being propagated to `run_results` artifact
* Move `batch_results` from `BaseResult` class to `RunResult` class
* Move `BatchResults` and `BatchType` to separate artifacts file to avoid circular imports
In our next commit we're going to modify `dbt/contracts/graph/nodes.py` to import
`BatchType` as part of our work to implement dbt retry for microbatch model nodes.
Unfortunately, that import in `nodes.py` creates a circular dependency because
`dbt/artifacts/schemas/results.py` imports from `nodes.py` and `dbt/artifacts/schemas/run/v5/run.py`
imports from that `results.py`. Thus the new import creates a circular import. This
_shouldn't_ be necessary, as nothing in artifacts should import from the rest of dbt-core.
However, we currently do. We should fix that, but it is out of scope for this segment of work.
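For illustration, a standalone artifacts module along these lines breaks the cycle because it imports nothing from the rest of dbt-core. The module path and exact field shapes below are assumptions for the sketch, not the actual file:

```python
# Illustrative standalone module, e.g. dbt/artifacts/schemas/batch_results.py (assumed path)
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# A batch is identified by the event-time window it covers: (start, end).
BatchType = Tuple[datetime, datetime]


@dataclass
class BatchResults:
    successful: List[BatchType] = field(default_factory=list)
    failed: List[BatchType] = field(default_factory=list)
```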
* Add `PartialSuccess` as a retry-able status, and use batches to retry microbatch models
* Fix BatchType type so that the first datetime is no longer Optional
* Ensure `PartialSuccess` causes skipping of downstream nodes
* Alter `PartialSuccess` status to be considered an error in `interpret_run_result`
* Update schemas and test artifacts to include new batch_results run results key
* Add functional test to check that 'dbt retry' retries 'PartialSuccess' models
* Update partition failure test to assert downstream models are skipped
* Improve `success`/`error`/`partial success` messaging for microbatch models
* Include `PartialSuccess` in status that `--fail-fast` counts as a failure
* Update `LogModelResult` to handle partial successes
* Update `EndOfRunSummary` to handle partial successes
* Cleanup TODO comment
* Raise a DbtInternalError if we get a batch run result without `batch_results`
* When running a microbatch model with supplied batches, force non full-refresh behavior
This is necessary because of retry. Say on the initial run the microbatch model
succeeds on 97% of its batches, and on retry it runs the remaining 3%. If the retry
of the microbatch model executes in full-refresh mode, it _might_ blow away the
97% of work that has already been done. This edge case appears to be adapter-specific.
* Only pass batches to retry for microbatch model when there was a PartialSuccess
In the previous commit we made it so that retries of microbatch models wouldn't
run in full refresh mode when the microbatch model to retry has batches already
specified from the prior run. This is only problematic when the run being retried
was a full refresh AND all the batches for a given microbatch model failed. In
that case WE DO want to do a full refresh for the given microbatch model. To better
outline the problem, consider the following:
* a microbatch model had a `begin` of `2020-01-01` and had been running that way for a while
* the `begin` config changed to `2024-01-01` and `dbt run --full-refresh` was run
* every batch for the microbatch model failed
* on `dbt retry`, the relation is found to already exist, and the now out-of-range data (2020-01-01 through 2023-12-31) is never purged
To avoid this, all we have to do is ONLY pass the batch information for partially successful microbatch
models. Note: microbatch models only have a partially successful status IFF they have both
successful and failed batches.
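A simplified sketch of the resulting rule (a hypothetical helper, not the actual retry code):

```python
from datetime import datetime
from typing import List, Optional, Tuple

BatchType = Tuple[datetime, datetime]


def batches_to_retry(
    succeeded: List[BatchType], failed: List[BatchType]
) -> Optional[List[BatchType]]:
    """Only a partially successful microbatch model (some batches succeeded AND
    some failed) hands its failed batches to `dbt retry`. Returning None means
    the retry plans its batches from scratch, so a run whose batches all failed
    under --full-refresh can still be fully refreshed on retry."""
    if succeeded and failed:
        return failed
    return None
```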
* Fix test_manifest unit tests to know about model 'batches' key
* Add some console output assertions to microbatch functional tests
* add batch_results: None to expected_run_results
* Add changie doc for microbatch retry functionality
* maintain protoc version 5.26.1
* Cleanup extraneous comment in LogModelResult
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Test case for `merge_exclude_columns`
* Update expected output for `merge_exclude_columns`
* Skip TestMergeExcludeColumns test
* Enable this test since PostgreSQL 15+ is available in CI now
* Undo modification to expected output
* Remove duplicated constructor for `ResourceTypeSelector`
* Add type annotation for `ResourceTypeSelector`
* Standardize on constructor for `ResourceTypeSelector` where `include_empty_nodes=True`
* Changelog entry