6940 Commits

Author SHA1 Message Date
Michelle Ark
1b7d9b5704 [Tidy first] move microbatch compilation to .compile method (#11063) 2024-11-27 19:08:36 -05:00
Quigley Malcolm
c3d87b89fb Add batch context object to microbatch jinja context (#11031)
* Add `batch_id` to jinja context of microbatch batches

* Add changie doc

* Update `format_batch_start` to assume `batch_start` is always provided

* Add "runtime only" property `batch_context` to `ModelNode`

By it being "runtime only" we mean that it doesn't exist on the artifact
and thus won't be written out to the manifest artifact.

* Begin populating `batch_context` during materialization execution for microbatch batches

* Fix circular import

* Fixup MicrobatchBuilder.batch_id property method

* Ensure MicrobatchModelRunner doesn't double compile batches

We were compiling the node for each batch _twice_. Besides making microbatch
models more expensive than they needed to be, double compiling wasn't
causing any issue. However the first compilation was happening _before_ we
had added the batch context information to the model node for the batch. As a
result, models that tried to access the `batch_context` information on the
model blew up, which was undesirable. As such, we now skip the first
compilation, in the same way that SavedQuery nodes skip compilation.

* Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape

This is weird, but necessary, I apologize. Mashumaro handles the
dictification of this class via a compile time generated `to_dict`
method based off of the _typing_ of the class. By default `datetime`
types are converted to strings. We don't want that, we want them to
stay datetimes.

* Update tests to check for `batch_context`

* Update `resolve_event_time_filter` to use new `batch_context`

* Stop testing for batchless compiled code for microbatch models

In 45daec72f4 we stopped an extra compilation
that was happening per batch prior to the batch_context being loaded. Stopping
this extra compilation means that compiled sql for the microbatch model without
the event time filter / batch context is no longer produced. We have discussed
this and _believe_ it is okay given that this is a new node type that has not
hit GA yet.

* Rename `ModelNode.batch_context` to `ModelNode.batch`

* Rename `build_batch_context` to `build_jinja_context_for_batch`

The name `build_batch_context` was confusing as
1) We have a `BatchContext` object, which the method was not building
2) The method builds the jinja context for the batch
As such it felt appropriate to rename the method to more accurately
communicate what it does.

* Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`

This rename was to make it more clear that the jinja context for a
batch was being checked, as a batch_context has a slightly different
connotation.

* Update changie doc
2024-11-27 16:06:41 -06:00
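
The `__post_serialize__` note in the commit above (c3d87b89fb) can be illustrated with a minimal sketch. It does not use Mashumaro: `StringifyingDictMixin` is a hypothetical stand-in for a generated `to_dict` that turns `datetime` values into strings, and the `BatchContext` field names are assumed for illustration only.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Dict


class StringifyingDictMixin:
    """Hypothetical stand-in for a generated to_dict: datetimes become ISO strings."""

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            out[f.name] = value.isoformat() if isinstance(value, datetime) else value
        # Give the subclass a chance to adjust the serialized dict.
        return self.__post_serialize__(out)

    def __post_serialize__(self, dct: Dict[str, Any]) -> Dict[str, Any]:
        return dct


@dataclass
class BatchContext(StringifyingDictMixin):
    # Field names are assumptions, not taken from the commit.
    id: str
    event_time_start: datetime
    event_time_end: datetime

    def __post_serialize__(self, dct: Dict[str, Any]) -> Dict[str, Any]:
        # Put the original datetime objects back in place of the strings.
        dct["event_time_start"] = self.event_time_start
        dct["event_time_end"] = self.event_time_end
        return dct


batch = BatchContext("20241127", datetime(2024, 11, 27), datetime(2024, 11, 28))
serialized = batch.to_dict()
```

With the hook in place, `serialized["event_time_start"]` is still a `datetime`, while other fields pass through unchanged.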
Quigley Malcolm
0f084e16ca Rename internal batch_info variable to previous_batch_results (#11056)
* Rename `batch_info` to `previous_batch_results`

* Exclude `previous_batch_results` from serialization of model node to avoid jinja context bloat

* Drop `previous_batch_results` key from `test_manifest.py` unit tests

In 4050e377ec01c2f14dd9600fe704ddb34adb66fa we began excluding
`previous_batch_results` from the serialized representation of the
ModelNode. As such, we no longer need to check for it in `test_manifest.py`.
2024-11-27 10:46:45 -06:00
Apoorv Mehrotra
3464be7f70 Fixes dbt retry does not respect --threads (#10591) 2024-11-26 11:21:46 -08:00
Gerda Shank
407f6caa1c Pin mashumaro to <3.15 (#11046) 2024-11-25 10:49:17 -05:00
Peter Webb
ad575ec699 Add New Config Properties and Schema for Snapshot Hard Deletes (#10972)
* Add changelog entry.

* Update schemas and test fixtures for new snapshot meta-column

* Add back comment.
2024-11-21 18:15:30 -05:00
Kshitij Aranke
f582ac2488 Fix #11012: Catch DbtRuntimeError for hooks (#11023) 2024-11-21 22:27:45 +00:00
Gerda Shank
f5f0735d00 Bump libpq-dev to 13.18-0+deb11u1 in docker/Dockerfile (#11029) 2024-11-21 17:24:53 -05:00
FishtownBuildBot
3abf575fa6 Cleanup main after cutting new 1.9.latest branch (#11027)
* Clean up changelog on main

* Bumping version to 1.10.0a1

* Code quality cleanup

* add 1.8,1.9 link

---------

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
2024-11-21 15:54:56 -06:00
Michelle Ark
a42303c3af make microbatch models skippable (#11020) 2024-11-21 12:40:37 -05:00
Jeremy Cohen
6fccfe84ea Fix plural of "partial success" (#11002) 2024-11-21 11:33:45 -05:00
Michelle Ark
fd6ec71dab Microbatch parallelism (#10958) 2024-11-21 00:31:47 -05:00
Gerda Shank
ae957599e1 Fix restrict-access to not restrict within same package (#11014) 2024-11-20 19:05:54 -05:00
Gerda Shank
f080346227 Use protobuf >=5.0,<=6.0 (#10969) 2024-11-19 17:37:19 -05:00
Doug Beatty
2a75dd4683 Parseable JSON and text output in quiet mode for dbt show and dbt compile (#9958)
* Allow `dbt show` and `dbt compile` to output JSON without extra logs

* Add `quiet` attribute for ShowNode and CompiledNode messages

* Output of protoc compiler

* Utilize the `quiet` attribute for ShowNode and CompiledNode

* Reuse the `dbt list` approach when the `--quiet` flag is used

* Use PrintEvent to get to stdout even if the logger is set to ERROR

* Functional tests for quiet compile

* Functional tests for quiet show

* Fire event same way regardless if LOG_FORMAT is json or not

* Switch back to firing ShowNode and CompiledNode events

* Make `--inline-direct` to be quiet-compatible

* Temporarily change to dev branch for dbt-common

* Remove extraneous newline

* Functional test for `--quiet` for `--inline-direct` flag

* Update changelog entry

* Update `core_types_pb2.py`

* Restore the original branch in `dev-requirements.txt`

---------

Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
2024-11-18 21:37:44 -07:00
Michelle Ark
945539e3ae add index.html to .gitignore (#11008) 2024-11-15 17:31:08 -05:00
bruno messias
84230ce333 fix: override materialization python models (#8538) 2024-11-14 10:31:23 -08:00
Michelle Ark
35c09203ad fire GenericExceptionOnRun for batch-level exception (#11003) 2024-11-14 12:50:16 -05:00
bruno messias
1625eb059a fix: unit tests with versioned refs (#10889) 2024-11-14 11:41:45 -05:00
Kshitij Aranke
2c43af897d Fix #10988: Validate manifest has group_map during group_lookup init (#10995) 2024-11-14 10:59:34 -05:00
Quigley Malcolm
6e1f64f8b4 Bump minimum dbt-adapters requirement to 1.9.0 (#10998)
This is needed for dbt-core + dbt-adapters to work properly in regards to
the microbatch project_flag/behavior flag `require_batched_execution_for_custom_microbatch_strategy`
2024-11-13 13:19:01 -06:00
Michelle Ark
e9a2b548cb fix deprecation firing for microbatch model w custom strategy (#10989) 2024-11-13 13:52:11 -05:00
Michelle Ark
89caa33fb4 Replace environment variable with a project flag to gate microbatch functionality (#10799)
* first pass: replace os env with project flag

* Fix `TestMicrobatchMultipleRetries` to not use `os.env`

* Turn off microbatch project flag for `TestMicrobatchCustomUserStrategyDefault` as it was prior to a9df50f

* Update `BaseMicrobatchTest` to turn on microbatch via project flags

* Add changie doc

* Fix functional tests after merging in main

* Add function that determines whether the new microbatch functionality should be used

The new microbatch functionality is, unfortunately, potentially dangerous. That is,
it adds a new materialization strategy `microbatch` which an end user could have
defined as a custom strategy previously. Additionally we added config keys to nodes,
and as `config` is just a Dict[str, Any], it could contain anything, thus meaning
people could already be using the configs we're adding for different purposes. Thus
we need some intelligent gating. Specifically something that adheres to the following:

cms = Custom Microbatch Strategy
abms = Adapter Builtin Microbatch Strategy
bf = Behavior flag
umb = Use Microbatch Batching
t/f/e = True/False/Error

| cms | abms | bf | umb |
| --- | ---- | -- | --- |
| t   | t    | t  | t   |
| f   | t    | t  | t   |
| t   | f    | t  | t   |
| f   | f    | t  | e   |
| t   | t    | f  | f   |
| f   | t    | f  | t   |
| t   | f    | f  | f   |
| f   | f    | f  | e   |

(The above table assumes that there is a microbatch model present in the project)

In order to achieve this we need to check that either the microbatch behavior
flag is set to true OR the microbatch materialization being used is the _root_ microbatch
materialization (i.e. not custom). The function we added in this commit,
`use_microbatch_batches`, does just that.

* Gate microbatch functionality by `use_microbatch_batches` manifest function

* Rename microbatch behavior flag to `require_batched_execution_for_custom_microbatch_strategy`

* Extract logic of `find_macro_by_name` to `find_macro_candidate_by_name`

In 0349968c615444de05360509ddeaf6d75d41d826 I had done this for the function
`find_materialization_macro_by_name`, but that wasn't the right function to
do it to, and will be reverted shortly. `find_materialization_macro_by_name`
is used for finding the general materialization macro, whereas `find_macro_by_name`
is more general. For the work we're doing, we need to find the microbatch
macro, which is not a materialization macro.

* Use `find_macro_candidate_by_name` to find the microbatch macro

* Fix microbatch macro locality check to search for `core` locality instead of `root`

Previously we were checking for a locality of `root`. However, a locality
of `root` means it was provided by a `package`. We want to check for locality
of `core` which basically means `builtin via dbt-core/adapters`. There is
another locality `imported` which I believe means it comes from another
package.

* Move the evaluation of `use_microbatch_batches` to the last position in boolean checks

The method `use_microbatch_batches` is always invoked to evaluate an `if`
statement. In most instances, it is part of a logic chain (i.e. there are
multiple things being evaluated in the `if` statement). In `if` statements
where there are multiple things being evaluated, `use_microbatch_batches`
should come _last_ (or as late as possible). This is because it is likely
the most costly thing to evaluate in the logic chain, and thus any short
circuits via other evaluations in the if statement failing (and thus avoiding
invoking `use_microbatch_batches`) is desirable.

* Drop behavior flag setting for BaseMicrobatchTest tests

* Rename 'env_var' to 'project_flag' in test_microbatch.py

* Update microbatch tests to assert when we are/aren't running with batches

* Update `test_resolve_event_time_filter` to use `use_microbatch_batches`

* Fire deprecation warning for custom microbatch macros

* Add microbatch deprecation events to test_events.py
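
The gating table for `use_microbatch_batches` in this commit can be written out as a small sketch. The function name, parameters, and exception type below are hypothetical and do not reflect the real manifest method; the sketch only encodes the rule "behavior flag is true OR the microbatch strategy in use is not custom", with an error when no microbatch strategy exists at all.

```python
def use_microbatch_batches_sketch(
    custom_strategy: bool, builtin_strategy: bool, behavior_flag: bool
) -> bool:
    """Return whether batched execution applies, per the table in the commit message."""
    if not custom_strategy and not builtin_strategy:
        # Neither a custom nor an adapter builtin strategy exists: the "e" rows.
        raise LookupError("no microbatch strategy available")
    if behavior_flag:
        return True
    # Flag off: only batch when the strategy in use is the builtin one.
    return not custom_strategy


# (cms, abms, bf) -> umb, copied from the commit's table; "e" means error.
TABLE = {
    (True, True, True): True,
    (False, True, True): True,
    (True, False, True): True,
    (False, False, True): "e",
    (True, True, False): False,
    (False, True, False): True,
    (True, False, False): False,
    (False, False, False): "e",
}

results = {}
for row in TABLE:
    try:
        results[row] = use_microbatch_batches_sketch(*row)
    except LookupError:
        results[row] = "e"
```

Comparing `results` with `TABLE` row by row is a quick way to confirm the sketch agrees with the documented behavior.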

---------

Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
2024-11-11 08:49:17 -06:00
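
The ordering advice in the commit above (89caa33fb4), putting `use_microbatch_batches` last in a boolean chain, relies on Python's short-circuit evaluation of `and`. A minimal sketch, where the stub and the other two conditions are hypothetical placeholders:

```python
calls = {"expensive": 0}


def use_microbatch_batches_stub() -> bool:
    """Hypothetical stand-in for the costly check; counts how often it runs."""
    calls["expensive"] += 1
    return True


def should_run_in_batches(is_incremental: bool, strategy: str) -> bool:
    # Cheap checks first; the costly call only runs if both of them pass.
    return (
        is_incremental
        and strategy == "microbatch"
        and use_microbatch_batches_stub()
    )


first = should_run_in_batches(False, "microbatch")  # stops at the first operand
second = should_run_in_batches(True, "merge")       # stops at the second operand
calls_before_match = calls["expensive"]
third = should_run_in_batches(True, "microbatch")   # reaches the costly check
```

Only the third call reaches the stub, so the expensive check is paid for once rather than three times.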
Michelle Ark
30b8a92e38 [Fix] assert resolver.model is ModelNode prior to resolving event_time_filter (#10975) 2024-11-06 16:02:41 -05:00
FishtownBuildBot
b95f7a7f2c [Automated] Merged prep-release/1.9.0b4_11711043647 into target main during release process 2024-11-06 15:37:57 -05:00
Michelle Ark
e451a371e6 Ensure inferred primary_key is a List[str] (#10984) 2024-11-06 15:31:54 -05:00
Tim Sturge
81067d4fc4 Support disabling unit tests (#10831) 2024-11-06 15:20:35 -05:00
Github Build Bot
3198ce4809 Bumping version to 1.9.0b4 and generate changelog v1.9.0b4 2024-11-06 20:08:59 +00:00
Emily Rockman
0c51985c83 upgrade macos version (#10974)
* upgrade to macos-latest

* force link
2024-11-06 11:56:08 -06:00
Devon Fulcher
e26af57989 Behavior change cumulative type param (#10909)
* Behavior change for mf timespine without yaml config

* Flipping behavior flag causes parse error

* Added more tests

* Appending just one error
2024-11-05 14:22:56 -08:00
Gerda Shank
bdf28d7eff Support --empty option for 'snapshot' command (#10962) 2024-11-01 13:47:28 -04:00
Quigley Malcolm
289d2dd932 Ensure KeyboardInterrupt halts microbatch model execution (#10879) 2024-10-31 13:35:44 -05:00
Devon Fulcher
8a17a0d7e7 Behavior change for mf timespine without yaml configuration (#10857) 2024-10-31 11:40:39 -04:00
Quigley Malcolm
8c6bec4fb5 Emit ArtifactWritten event when artifacts are written (#10940)
* Add new `ArtifactWritten` event

* Emit ArtifactWritten event whenever an artifact is written

* Get artifact_type from class name for ArtifactWritten event

* Add changie docs

* Add test to check that ArtifactWritten events are being emitted

* Regenerate core_types_pb2.py using correct protobuf version

* Regen core_types_pb2 again, using a more correct protoc version
2024-10-30 15:05:50 -05:00
Quigley Malcolm
7f5abdc565 Ensure run results artifact get written during "after run hooks" (#10941)
* Add unit tests to check how `safe_run_hooks` handles exceptions

* Improve exception handling in `get_execution_status`

Previously in `get_execution_status` if a non `DbtRuntimeError` exception was
raised, the finally would be entered, but the `status`/`message` would not be
set, and thus a `status not defined` exception would get raised on attempting
to return. Tangentially, there is another issue where somehow the `node_status`
is becoming `None`. In all my playing with `get_execution_status` I found that
trying to return an undefined variable in the `finally` caused an undefined
variable exception. However, if in some python version, it instead just handed
back `None`, then this fix should also solve that.

* Add changie docs

* Ensure run_results get written if KeyboardInterrupt happens during end run hooks
2024-10-30 14:53:09 -05:00
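
The failure mode described for `get_execution_status` in the commit above (7f5abdc565) can be reproduced in isolation. This is a sketch under assumptions: `RuntimeErrorStandIn` is a hypothetical substitute for `DbtRuntimeError`, the function bodies are simplified, and the "safe" variant shows one possible fix rather than the change actually made.

```python
class RuntimeErrorStandIn(Exception):
    """Hypothetical substitute for DbtRuntimeError."""


def status_unsafe(action):
    try:
        action()
        status, message = "success", "OK"
    except RuntimeErrorStandIn as exc:
        status, message = "error", str(exc)
    finally:
        # If any other exception type was raised, `status` was never assigned,
        # so evaluating it here raises UnboundLocalError.
        return status, message


def status_safe(action):
    try:
        action()
        status, message = "success", "OK"
    except RuntimeErrorStandIn as exc:
        status, message = "error", str(exc)
    except Exception as exc:
        # Catch-all branch so both names are always bound before returning.
        status, message = "error", f"unexpected: {exc}"
    return status, message


def boom():
    raise ValueError("not a runtime error")


try:
    status_unsafe(boom)
    unsafe_outcome = "returned"
except UnboundLocalError:
    unsafe_outcome = "unbound"

safe_outcome = status_safe(boom)
```

The unsafe variant replaces the original `ValueError` with an `UnboundLocalError`, while the safe variant reports an error status.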
FishtownBuildBot
f714e84282 [Automated] Merged prep-release/1.9.0b3_11600026512 into target main during release process 2024-10-30 15:39:06 -04:00
Github Build Bot
7f92c6e003 Bumping version to 1.9.0b3 and generate changelog v1.9.0b3 2024-10-30 19:12:00 +00:00
Quigley Malcolm
8de0229a04 Bump dbt adapters minor minimum to 1.8.0 (#10947)
* Bump minimum dbt-adapters to 1.8.0

In https://github.com/dbt-labs/dbt-core/pull/10859 we started using the
`get_adapter_run_info` method provided by `dbt-adapters`. However that
function is only available in dbt-adapters >= 1.8.0. Thus 1.8.0 is our
new minimum for dbt-adapters.

* Add changie doc
2024-10-30 14:07:56 -05:00
Quigley Malcolm
dd77210756 Update microbatch end_time to the batch_size ceiling (#10883)
* Add function to MicrobatchBuilder to get ceiling of timestamp by batch_size

* Update `MicrobatchBuilder.build_end_time` to use `ceiling_timestamp`

* fix TestMicrobatchBuilder.test_build_end_time by specifying a BatchSize + asserting actual is a ceiling timestamp

* Add changie

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
2024-10-29 17:26:28 -05:00
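
The idea of a timestamp ceiling by `batch_size`, as named in the commit above (dd77210756), can be sketched as follows. The function names, the set of batch sizes, and the choice to leave a timestamp unchanged when it already sits on a boundary are assumptions for illustration, not the `MicrobatchBuilder` implementation.

```python
from datetime import datetime, timedelta


def truncate_timestamp(ts: datetime, batch_size: str) -> datetime:
    """Floor a timestamp to the start of its hour, day, month, or year."""
    if batch_size == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if batch_size == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if batch_size == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if batch_size == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported batch size: {batch_size}")


def ceiling_timestamp(ts: datetime, batch_size: str) -> datetime:
    """Round a timestamp up to the next batch boundary (no-op on a boundary)."""
    floor = truncate_timestamp(ts, batch_size)
    if floor == ts:
        return ts
    if batch_size == "hour":
        return floor + timedelta(hours=1)
    if batch_size == "day":
        return floor + timedelta(days=1)
    if batch_size == "month":
        # Roll over into January of the next year after December.
        if floor.month == 12:
            return floor.replace(year=floor.year + 1, month=1)
        return floor.replace(month=floor.month + 1)
    return floor.replace(year=floor.year + 1)


end_time = ceiling_timestamp(datetime(2024, 10, 29, 17, 26, 28), "day")
```

Here `end_time` is midnight at the start of 2024-10-30, so a batch window ending at `end_time` covers the whole of the current day.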
Quigley Malcolm
8df5c96f3d Make --event-time-start and --event-time-end mutually required (#10878)
* Stop validating that `--event-time-start` is before "current" time

In the next commit we'll be adding a validation that requires that `--event-time-start`
and `--event-time-end` are mutually required. That is, whenever one is specified,
the other is required. In that world, `--event-time-start` will never need to be compared
against the "current" time, because it'll never be run in conjunction with the "current"
time.

* Validate that `--event-time-start` and `--event-time-end` are mutually present

* Add changie doc for validation changes

* Alter functional microbatch tests to work with updated `event_time_start/end` reqs

We made it such that when `event_time_start` is specified, `event_time_end` must also
be specified (and vice versa). This broke numerous tests, in a few different ways:

1. There were tests that used `--event-time-start` without `--event-time-end` but
were using event_time_start essentially as the `begin` time for models being initially
built or full refreshed. These tests could simply drop the `--event-time-start` and
instead rely on the `begin` value.

2. There was a test that was trying to load a subset of the data _excluding_ some
data which would be captured by using `begin`. In this test we added an appropriate
`--event-time-end` as the `--event-time-start` was necessary to satisfy what the
test was testing.

3. There was a test which was trying to ensure that two microbatch models would be
given the same "current" time. Because we wanted to ensure the "current" time code
path was used, we couldn't add `--event-time-end` to resolve the problem, thus we
needed to remove the `--event-time-start` that was being used. However, this led to
the test being incredibly slow. This was resolved by switching the relevant microbatch
models from having `batch_size`s of `day` to instead have `year`. This solution should
be good enough for roughly ~40 years? We'll figure out a better solution then, so see ya
in 2064. Assuming I haven't died before my 70th birthday, feel free to ping me to get
this taken care of.

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
2024-10-29 15:31:19 -05:00
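
The "mutually required" rule from the commit above (8df5c96f3d) comes down to rejecting the case where exactly one of the two values is present. A minimal sketch, with a hypothetical function name and error type rather than dbt's actual CLI validation:

```python
from datetime import datetime
from typing import Optional


def validate_event_time_window(
    event_time_start: Optional[datetime], event_time_end: Optional[datetime]
) -> None:
    """Raise if exactly one of the two bounds was supplied."""
    if (event_time_start is None) != (event_time_end is None):
        raise ValueError(
            "--event-time-start and --event-time-end must be provided together"
        )


validate_event_time_window(None, None)  # neither given: fine
validate_event_time_window(datetime(2024, 10, 1), datetime(2024, 10, 29))  # both given: fine

try:
    validate_event_time_window(datetime(2024, 10, 1), None)
    lone_start_rejected = False
except ValueError:
    lone_start_rejected = True
```

Comparing the two `is None` results with `!=` acts as an exclusive-or, so the check stays a single condition.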
Michelle Ark
6b5db1796f raise MicrobatchModelNoEventTimeInputs warning when no microbatch input has event_time config (#10929) 2024-10-29 11:20:44 -04:00
Michelle Ark
3224589fe7 restore dev-requirements for dbt-adapters@main (#10930) 2024-10-28 17:56:30 -04:00
Michelle Ark
b71ceb3166 Microbatch: store model context var as dict, not ModelNode (#10917) 2024-10-28 17:26:49 -04:00
Mila Page
4d4b05effc Add adapter telemetry to snowplow event. (#10859)
* Add adapter telemetry to snowplow event.

* Temporary dev branch switch.

* Set tracking for overrideable adapter method.

* Do safer adapter ref.

* Improve comment.

* Code review comments.

* Don't call the asdict on a dict.

* Bump ci to pull in fix from base adapter.

* Add unit tests for coverage.

* Update field name from base adapter/schema change.

* remove breakpoint.
2024-10-28 14:21:42 -07:00
Michelle Ark
316ecfca28 Fix: Source quoting ignores global configuration (#10905) 2024-10-25 10:33:21 -04:00
Quigley Malcolm
d07bfda9df Change microbatch lookback default from 0 to 1 (#10876)
* Change `lookback` default from `0` to `1`

* Regen jsonschema manifest v12 to include `lookback` default change

* Regen saved state of v12 manifest for functional artifact testing

* Add changie doc for lookback default change
2024-10-24 17:16:32 -05:00
Doug Beatty
8ae689c674 Fix regression when an exposure references a deprecated model (#10915)
* Avoid a KeyError if `child_unique_id` is not found in the dictionary

* Changelog entry

* Functional test when an exposure references a deprecated model
2024-10-24 12:13:56 -06:00
Gerda Shank
bdb79e8626 Partial parse yaml snapshots (#10907) 2024-10-23 14:16:33 -04:00
Gerda Shank
f7b7935a97 Support multiple unique keys in snapshots (#10795) 2024-10-22 14:47:51 -04:00
Peter Webb
3d96b4e36c Loosen Type in TimingInfo (#10897) 2024-10-21 19:01:15 -04:00