* Add `batch_id` to jinja context of microbatch batches
* Add changie doc
* Update `format_batch_start` to assume `batch_start` is always provided
* Add "runtime only" property `batch_context` to `ModelNode`
By "runtime only" we mean that the property doesn't exist on the artifact schema and thus won't be written out to the manifest artifact.
* Begin populating `batch_context` during materialization execution for microbatch batches
* Fix circular import
* Fixup MicrobatchBuilder.batch_id property method
* Ensure MicrobatchModelRunner doesn't double compile batches
We were compiling the node for each batch _twice_. Aside from making microbatch
models more expensive than they needed to be, the double compilation wasn't
causing any issues on its own. However, the first compilation happened _before_ we
had added the batch context information to the model node for the batch. This
caused models that try to access the `batch_context` information on the
model to blow up, which was undesirable. As such, we now skip
the first compilation, similar to how SavedQuery nodes skip compilation.
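A minimal sketch of the shape of that skip; the class structure and the `batch_idx` attribute are assumptions rather than dbt's exact implementation:

```python
from dbt.task.run import ModelRunner  # assumed import path


class MicrobatchModelRunner(ModelRunner):
    def compile(self, manifest):
        # Sketch only: when this runner represents the top-level microbatch
        # node (no batch selected yet), skip compilation entirely, much like
        # SavedQuery nodes do. Each batch compiles later, after the batch
        # context has been attached to the node.
        if self.batch_idx is None:
            return self.node
        return super().compile(manifest)
```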
* Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape
This is weird, but necessary, I apologize. Mashumaro handles the
dictification of this class via a compile-time generated `to_dict`
method based off of the _typing_ of the class. By default `datetime`
types are converted to strings. We don't want that, we want them to
stay datetimes.
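A sketch of what that hook looks like, shown against plain mashumaro rather than dbt's own mixin; the field names are assumptions based on the surrounding commits:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

from mashumaro import DataClassDictMixin


@dataclass
class BatchContext(DataClassDictMixin):
    # Field names are assumptions for illustration.
    id: str
    event_time_start: datetime
    event_time_end: datetime

    def __post_serialize__(self, d: Dict[str, Any]) -> Dict[str, Any]:
        # Mashumaro's generated to_dict() renders datetimes as ISO strings;
        # put the original datetime objects back so consumers of the dict
        # (e.g. the batch's jinja context) still see real datetimes.
        d["event_time_start"] = self.event_time_start
        d["event_time_end"] = self.event_time_end
        return d
```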
* Update tests to check for `batch_context`
* Update `resolve_event_time_filter` to use new `batch_context`
* Stop testing for batchless compiled code for microbatch models
In 45daec72f4 we stopped an extra compilation
that was happening per batch prior to the batch_context being loaded. Stopping
this extra compilation means that compiled sql for the microbatch model without
the event time filter / batch context is no longer produced. We have discussed
this and _believe_ it is okay given that this is a new node type that has not
hit GA yet.
* Rename `ModelNode.batch_context` to `ModelNode.batch`
* Rename `build_batch_context` to `build_jinja_context_for_batch`
The name `build_batch_context` was confusing as
1) We have a `BatchContext` object, which the method was not building
2) The method builds the jinja context for the batch
As such it felt appropriate to rename the method to more accurately
communicate what it does.
* Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`
This rename was to make it more clear that the jinja context for a
batch was being checked, as a batch_context has a slightly different
connotation.
* Update changie doc
* Rename `batch_info` to `previous_batch_results`
* Exclude `previous_batch_results` from serialization of model node to avoid jinja context bloat
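The exclusion can be expressed with the same serialization hook; a sketch under the assumption that the field lives directly on the node class (again against plain mashumaro):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

from mashumaro import DataClassDictMixin


@dataclass
class ModelNodeSketch(DataClassDictMixin):
    # Illustrative stand-in for ModelNode; only the relevant field is shown.
    name: str
    previous_batch_results: Optional[Any] = None

    def __post_serialize__(self, d: Dict[str, Any]) -> Dict[str, Any]:
        # Drop the runtime-only batch results so they never bloat the jinja
        # context (or the serialized node) when the node is dictified.
        d.pop("previous_batch_results", None)
        return d
```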
* Drop `previous_batch_results` key from `test_manifest.py` unit tests
In 4050e377ec01c2f14dd9600fe704ddb34adb66fa we began excluding
`previous_batch_results` from the serialized representation of the
ModelNode. As such, we no longer need to check for it in `test_manifest.py`.
* Clean up changelog on main
* Bumping version to 1.10.0a1
* Code quality cleanup
* add 1.8,1.9 link
---------
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
* Allow `dbt show` and `dbt compile` to output JSON without extra logs
* Add `quiet` attribute for ShowNode and CompiledNode messages
* Output of protoc compiler
* Utilize the `quiet` attribute for ShowNode and CompiledNode
* Reuse the `dbt list` approach when the `--quiet` flag is used
* Use PrintEvent to get to stdout even if the logger is set to ERROR
* Functional tests for quiet compile
* Functional tests for quiet show
* Fire event same way regardless if LOG_FORMAT is json or not
* Switch back to firing ShowNode and CompiledNode events
* Make `--inline-direct` to be quiet-compatible
* Temporarily change to dev branch for dbt-common
* Remove extraneous newline
* Functional test for `--quiet` for `--inline-direct` flag
* Update changelog entry
* Update `core_types_pb2.py`
* Restore the original branch in `dev-requirements.txt`
---------
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
This is needed for dbt-core + dbt-adapters to work properly with regard to
the microbatch project flag / behavior flag `require_batched_execution_for_custom_microbatch_strategy`.
* first pass: replace os env with project flag
* Fix `TestMicrobatchMultipleRetries` to not use `os.env`
* Turn off microbatch project flag for `TestMicrobatchCustomUserStrategyDefault` as it was prior to a9df50f
* Update `BaseMicrobatchTest` to turn on microbatch via project flags
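A sketch of what that opt-in can look like in a functional test, using dbt's standard `project_config_update` pytest fixture; the flag name comes from the commits in this group:

```python
import pytest


class BaseMicrobatchTest:
    @pytest.fixture(scope="class")
    def project_config_update(self):
        # Turn the microbatch behavior flag on via dbt_project.yml flags
        # instead of the old environment variable.
        return {
            "flags": {
                "require_batched_execution_for_custom_microbatch_strategy": True,
            }
        }
```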
* Add changie doc
* Fix functional tests after merging in main
* Add function that determines whether the new microbatch functionality should be used
The new microbatch functionality is, unfortunately, potentially dangerous. That is,
it adds a new materialization strategy `microbatch` which an end user could previously have
defined as a custom strategy. Additionally, we added config keys to nodes,
and as `config` is just a Dict[str, Any], it could contain anything, meaning
people could already be using the configs we're adding for different purposes. Thus
we need some intelligent gating. Specifically, something that adheres to the following:
cms = Custom Microbatch Strategy
abms = Adapter Builtin Microbatch Strategy
bf = Behavior flag
umb = Use Microbatch Batching
t/f/e = True/False/Error
| cms | abms | bf | umb |
| t | t | t | t |
| f | t | t | t |
| t | f | t | t |
| f | f | t | e |
| t | t | f | f |
| f | t | f | t |
| t | f | f | f |
| f | f | f | e |
(The above table assumes that there is a microbatch model present in the project)
In order to achieve this we need to check that either the microbatch behavior
flag is set to true OR the microbatch materialization being used is the _root_ microbatch
materialization (i.e. not custom). The function we added in this commit,
`use_microbatch_batches`, does just that.
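A standalone sketch of the gating logic encoded in the table above; names are illustrative, and the real check lives on the manifest and inspects the project's macros rather than taking booleans:

```python
def use_microbatch_batches(
    has_custom_microbatch_macro: bool,
    has_builtin_microbatch_macro: bool,
    behavior_flag_enabled: bool,
) -> bool:
    # No microbatch materialization exists at all: nothing can run (the "e" rows).
    if not (has_custom_microbatch_macro or has_builtin_microbatch_macro):
        raise RuntimeError("No 'microbatch' materialization macro was found.")
    # Behavior flag on: always use batched execution, even with a custom macro.
    if behavior_flag_enabled:
        return True
    # Flag off: batched execution only when the built-in (non-custom)
    # materialization is the one that will actually run.
    return not has_custom_microbatch_macro and has_builtin_microbatch_macro
```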
* Gate microbatch functionality by `use_microbatch_batches` manifest function
* Rename microbatch behavior flag to `require_batched_execution_for_custom_microbatch_strategy`
* Extract logic of `find_macro_by_name` to `find_macro_candidate_by_name`
In 0349968c615444de05360509ddeaf6d75d41d826 I had done this for the function
`find_materialization_macro_by_name`, but that wasn't the right function to
do it to, and will be reverted shortly. `find_materialization_macro_by_name`
is used for finding the general materialization macro, whereas `find_macro_by_name`
is more general. For the work we're doing, we need to find the microbatch
macro, which is not a materialization macro.
* Use `find_macro_candidate_by_name` to find the microbatch macro
* Fix microbatch macro locality check to search for `core` locality instead of `root`
Previously we were checking for a locality of `root`. However, a locality
of `root` means it was provided by a `package`. We want to check for a locality
of `core`, which basically means `builtin via dbt-core/adapters`. There is
another locality, `imported`, which I believe means it comes from another
package.
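A rough sketch of the corrected check; the macro name, the enum, and the import path are all assumptions:

```python
from dbt.contracts.graph.manifest import Locality  # assumed import path


def uses_builtin_microbatch_materialization(manifest, project_name: str) -> bool:
    # Look up the macro candidate that would actually run, then require that
    # it was provided by dbt-core/adapters ("core" locality) rather than
    # having "root" or "imported" locality.
    candidate = manifest.find_macro_candidate_by_name(
        name="microbatch_strategy_macro",  # hypothetical macro name
        root_project_name=project_name,
        package=None,
    )
    return candidate is not None and candidate.locality == Locality.Core
```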
* Move the evaluation of `use_microbatch_batches` to the last position in boolean checks
The method `use_microbatch_batches` is always invoked to evaluate an `if`
statement. In most instances, it is part of a logic chain (i.e. there are
multiple things being evaluated in the `if` statement). In `if` statements
where multiple things are being evaluated, `use_microbatch_batches`
should come _last_ (or as late as possible). This is because it is likely
the most costly thing to evaluate in the logic chain, so short-circuiting
via other evaluations in the `if` statement failing (and thus avoiding
invoking `use_microbatch_batches`) is desirable.
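For illustration only (the condition contents are hypothetical), the cheap attribute checks go first so short-circuit evaluation can avoid the expensive call entirely:

```python
def should_run_batched(node, manifest, project_name: str) -> bool:
    # Cheap attribute checks first; the comparatively expensive
    # use_microbatch_batches() lookup only runs when they all pass.
    return (
        node.config.materialized == "incremental"
        and node.config.incremental_strategy == "microbatch"
        and manifest.use_microbatch_batches(project_name)
    )
```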
* Drop behavior flag setting for BaseMicrobatchTest tests
* Rename 'env_var' to 'project_flag' in test_microbatch.py
* Update microbatch tests to assert when we are/aren't running with batches
* Update `test_resolve_event_time_filter` to use `use_microbatch_batches`
* Fire deprecation warning for custom microbatch macros
* Add microbatch deprecation events to test_events.py
---------
Co-authored-by: Quigley Malcolm <quigley.malcolm@dbtlabs.com>
* Add new `ArtifactWritten` event
* Emit ArtifactWritten event whenever an artifact is written
* Get artifact_type from class name for ArtifactWritten event
* Add changie docs
* Add test to check that ArtifactWritten events are being emitted
* Regenerate core_types_pb2.py using correct protobuf version
* Regen core_types_pb2 again, using a more correct protoc version
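A sketch of the emission pattern; the event name and `artifact_type` field follow the commit messages above, while the write helper, `artifact_path` field, and import paths are assumptions:

```python
from dbt_common.events.functions import fire_event  # assumed import path
from dbt.events.types import ArtifactWritten  # event added by these commits


def write_artifact(artifact, path: str) -> None:
    artifact.write(path)  # hypothetical write helper
    # Fire the event right after the file lands on disk, deriving
    # artifact_type from the artifact's class name.
    fire_event(
        ArtifactWritten(
            artifact_type=type(artifact).__name__,
            artifact_path=path,  # assumed field name
        )
    )
```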
* Add unit tests to check how `safe_run_hooks` handles exceptions
* Improve exception handling in `get_execution_status`
Previously in `get_execution_status`, if an exception other than `DbtRuntimeError` was
raised, the `finally` block would be entered, but `status`/`message` would not have been
set, and thus a `status not defined` exception would be raised when attempting
to return. Tangentially, there is another issue where somehow the `node_status`
becomes `None`. In all my experimenting with `get_execution_status` I found that
trying to return an undefined variable in the `finally` block raised an undefined-variable
exception. However, if some Python version instead just hands
back `None`, then this fix should also solve that.
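A hedged sketch of the hardened control flow (import paths and the adapter call are assumptions, not the verbatim dbt code): every path assigns `status` and `message`, so the return can never hit an unbound local, and unexpected exception types surface as an error status instead of escaping mid-hook.

```python
from typing import Tuple

from dbt_common.exceptions import DbtRuntimeError
from dbt.artifacts.schemas.results import RunStatus  # assumed import path


def get_execution_status(sql: str, adapter) -> Tuple[RunStatus, str]:
    if not sql.strip():
        return RunStatus.Success, "OK"
    try:
        response, _ = adapter.execute(sql, auto_begin=False, fetch=False)
        status = RunStatus.Success
        message = response._message
    except (KeyboardInterrupt, SystemExit):
        # Let interrupts propagate so run_results can still be written upstream.
        raise
    except DbtRuntimeError as exc:
        status = RunStatus.Error
        message = exc.msg
    except Exception as exc:
        # Previously uncaught: would leave status/message unset.
        status = RunStatus.Error
        message = str(exc)
    return status, message
```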
* Add changie docs
* Ensure run_results get written if KeyboardInterrupt happens during end run hooks
* Bump minimum dbt-adapters to 1.8.0
In https://github.com/dbt-labs/dbt-core/pull/10859 we started using the
`get_adapter_run_info` method provided by `dbt-adapters`. However that
function is only available in dbt-adapters >= 1.8.0. Thus 1.8.0 is our
new minimum for dbt-adapters.
* Add changie doc
* Add function to MicrobatchBuilder to get ceiling of timestamp by batch_size
* Update `MicrobatchBuilder.build_end_time` to use `ceiling_timestamp`
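An illustrative, standalone version of the "ceiling" idea (not dbt's implementation; the batch-size strings are assumptions): truncate to the batch-size grain, then bump by one grain if truncation changed the value.

```python
from datetime import datetime, timedelta


def ceiling_timestamp(timestamp: datetime, batch_size: str) -> datetime:
    truncations = {
        "hour": timestamp.replace(minute=0, second=0, microsecond=0),
        "day": timestamp.replace(hour=0, minute=0, second=0, microsecond=0),
        "month": timestamp.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
        "year": timestamp.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    }
    truncated = truncations[batch_size]
    if truncated == timestamp:
        # Already exactly on a batch boundary: the ceiling is the timestamp itself.
        return timestamp
    if batch_size == "hour":
        return truncated + timedelta(hours=1)
    if batch_size == "day":
        return truncated + timedelta(days=1)
    if batch_size == "month":
        if truncated.month == 12:
            return truncated.replace(year=truncated.year + 1, month=1)
        return truncated.replace(month=truncated.month + 1)
    return truncated.replace(year=truncated.year + 1)


# Example: ceiling_timestamp(datetime(2024, 10, 1, 12, 30), "day")
# -> datetime(2024, 10, 2, 0, 0)
```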
* fix TestMicrobatchBuilder.test_build_end_time by specifying a BatchSize + asserting actual is a ceiling timestamp
* Add changie
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Stop validating that `--event-time-start` is before "current" time
In the next commit we'll add a validation making `--event-time-start`
and `--event-time-end` mutually required. That is, whenever one is specified,
the other must be specified as well. In that world, `--event-time-start` never needs to be compared
against the "current" time, because it will never be run in conjunction with the "current"
time.
* Validate that `--event-time-start` and `--event-time-end` are mutually present
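A minimal sketch of the mutual-presence check; the function name, exception type, and message are illustrative:

```python
from datetime import datetime
from typing import Optional


def validate_event_time_flags(
    event_time_start: Optional[datetime],
    event_time_end: Optional[datetime],
) -> None:
    # Exactly one of the two being set is the invalid case.
    if (event_time_start is None) != (event_time_end is None):
        raise ValueError(
            "When --event-time-start or --event-time-end is specified, "
            "the other must be specified as well."
        )
```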
* Add changie doc for validation changes
* Alter functional microbatch tests to work with updated `event_time_start/end` reqs
We made it such that when `event_time_start` is specified, `event_time_end` must also
be specified (and vice versa). This broke numerous tests, in a few different ways:
1. There were tests that used `--event-time-start` without `--event-time-end` but
were using event_time_start essentially as the `begin` time for models being initially
built or full refreshed. These tests could simply drop the `--event-time-start` and
instead rely on the `begin` value.
2. There was a test that was trying to load a subset of the data _excluding_ some
data which would be captured by using `begin`. In this test we added an appropriate
`--event-time-end` as the `--event-time-start` was necessary to satisfy what the
test was testing.
3. There was a test which was trying to ensure that two microbatch models would be
given the same "current" time. Because we wanted to ensure the "current" time code
path was used, we couldn't add `--event-time-end` to resolve the problem, thus we
needed to remove the `--event-time-start` that was being used. However, this led to
the test being incredibly slow. This was resolved by switching the relevant microbatch
models from having `batch_size`s of `day` to instead have `year`. This solution should
be good enough for roughly ~40 years? We'll figure out a better solution then, so see ya
in 2064. Assuming I haven't died before my 70th birthday, feel free to ping me to get
this taken care of.
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Add adapter telemetry to snowplow event.
* Temporary dev branch switch.
* Set tracking for overrideable adapter method.
* Do safer adapter ref.
* Improve comment.
* Code review comments.
* Don't call the asdict on a dict.
* Bump ci to pull in fix from base adapter.
* Add unit tests for coverage.
* Update field name from base adapter/schema change.
* remove breakpoint.
* Change `lookback` default from `0` to `1`
* Regen jsonschema manifest v12 to include `lookback` default change
* Regen saved state of v12 manifest for functional artifact testing
* Add changie doc for lookback default change
* Avoid a KeyError if `child_unique_id` is not found in the dictionary
* Changelog entry
* Functional test when an exposure references a deprecated model
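A minimal illustration of the defensive lookup the fix implies (names are hypothetical):

```python
from typing import Any, Dict, Optional


def resolve_child(nodes_by_id: Dict[str, Any], child_unique_id: str) -> Optional[Any]:
    # dict.get() returns None instead of raising KeyError when the child id
    # has no entry, e.g. an exposure referencing a deprecated model.
    return nodes_by_id.get(child_unique_id)
```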