* Add `FunctionType` enum
* Add `type` property to `Function` resource
* Add `type` property to `ParsedFunctionPatch` and `UnparsedFunctionUpdate`
* Begin populating a function's `type` during patch parsing
* Regenerate v12 manifest to include function `type` property
* Add changie doc
* Begin testing that function node `type` property is settable and accessible
* Move comment about triggering the PathEncoder back to its proper place
* Allow for the defining of basic SQL UDFs (#11957)
* Add initial definition of the `Function` resource
* Add FunctionNode definition to graph contracts
* Add test which checks whether basic UDFs can be parsed
This test fails right now, which is intentional. This is test-driven
development: now I do the work to make the test pass :)
* Add basic function sql parser for UDFs, and plumb it through parsing code paths
* Begin populating `functions` in the ref lookup
* Begin patching `function` nodes with their yaml definitions
Of note, presently `arguments` and `return_type` aren't populating properly.
It's likely that we'll have to do additional work on the FunctionPatchParser
to get this _fully_ working.
* Increase responsibility of FunctionPatchParser to handle entire `parse_patch` of function nodes
* Fix testing suite to accommodate addition of new `function` node
* Add changie doc for new `function` node type
* Minor refactoring of `NodePatchParser.parse_patch` to reduce code duplication in `FunctionPatchParser`
* Ability to list and select function nodes (#11968)
* Begin listing `function` nodes in `list` command
* Add ability to run `list` specifying the `function` resource type
* Function nodes support selection via name, file path, and resource type
* Add changie doc
* Core handles lifecycle of function nodes (#12008)
* Add basic test to check that UDFs get created in data warehouse
* Add functions to the runner map of \ operation
* Add basic stub of `FunctionRunner` modeled after `SeedRunner`
* Begin using `FunctionRunner` for running `function` nodes
* Add stubbing of things to implement on `FunctionRunner`
* Initial implementation of execution of function nodes
This is largely a copy of the model node execution (in run.py), but
with some abstractions into helper methods to make the body of the
`execute` function easier to follow. Of note, right now this appears to
get the incorrect macro from the adapter, likely because the node's
materialization config is being set to `view` by default for some reason.
* Ensure parsed function nodes get the correct materialization type
* Begin generating context for `function` materialization macro
* Stub out adapter response in node result as it was causing some failures
* Correct the adapter response in the run result for functions
* Begin logging `LogFunctionResult` event for completed function nodes
* Add changie doc
* Temp update dev reqs to point at branch of dbt-adapters
* Add test `LogFunctionResult` event to serialization test
* Add `function` nodes to the `WritableManifest`
* Fix tests
* Remove no longer relevant `TODO`s from `function.py`
* Add a new macro `function()` to the jinja context for using functions (#12031)
* Update function tests to look for `functions` under `manifest.functions`
* Begin storing function nodes in `Manifest.functions` instead of `Manifest.nodes`
* Ensure function nodes are still included in nodes to run during `build`
* Add ability to lookup functions on the manifest
* Update patch parsing of function YAML files now that functions live on `Manifest.functions`
* Mark function nodes as no longer refable
* Ensure function nodes are still selectable
* Add `function` macro!
* Ensure function nodes are correctly linked in the DAG
* Update jinja context tests to expect `function` macro to exist
* Fix unit tests in test suite to expect function nodes
* Add changie doc
* regen v12.json jsonschema
* Fix test `TestVerifyArtifacts::test_run_and_generate`
* Fix test `TestVerifyArtifactsReferences::test_references`
* Fix test `TestVerifyArtifactsVersions::test_versions`
* Regen manifest artifact for `TestPreviousVersionState::test_compare_state_current`
* Update `_iterate_selected_nodes` to support function nodes
* Process a node's functions so they get added to its `depends_on`
* Take functions into account for state modified
* Regen data for `TestModifiedStateSchemaEvolution::test_modified_state_schema_evolution` test
* Default `functions` property on `WritableManifest` to a dict
I'm not sure this is actually how we want to do this. However, without
the default the `WritableManifest` will break when loading older manifests
that don't have `functions`. The alternative would be to bump the schema
version (v12 -> v13) and add an upgrade step in `upgrade_manifest.py`
(the default is sketched below).
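A minimal sketch of the backward-compatible default described above, assuming `WritableManifest` is a plain dataclass; all other fields are elided and illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WritableManifest:
    # ...existing manifest fields elided...
    # Defaulting to an empty dict means manifests written before `functions`
    # existed still deserialize cleanly instead of failing on the missing key.
    functions: Dict[str, Any] = field(default_factory=dict)
```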
* Update UDF tests to use a more general purpose function
* Add tests ensuring UDFs can be used in models and `--inline` queries
* Correct `ParseFunctionResolver` so that the name isn't added twice to the function args spec
* Drop `functions` from `Exposure` and `Metric` definitions
* Regen v12 manifest schema
* Remove unnecessary string interpolation
* Point dev reqs back to dbt-adapters@main
* Empty commit
* Reapply "Add `doc_blocks` to manifest for nodes and columns (#11224)" (#11283)
This reverts commit 55e0df181f.
* Expand doc_blocks backcompat test
* Refactor to method, add docstring
* Change `lookback` default from `0` to `1`
* Regen jsonschema manifest v12 to include `lookback` default change
* Regen saved state of v12 manifest for functional artifact testing
* Add changie doc for lookback default change
* [Tidy-First]: Fix `timings` object for hooks and macros, and make types of timings explicit
* cast literal to str
* change test
* change jsonschema to enum
* Discard changes to schemas/dbt/manifest/v12.json
* nits
---------
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
* Add `order_by` and `limit` fields to saved queries.
* Update JSON schema
* Add change log for #10531.
* Check order by / limit in saved-query parsing test.
* Add `PartialSuccess` status type and use it for microbatch models with mixed results
* Handle `PartialSuccess` in `interpret_run_result`
* Add `BatchResults` object to `BaseResult` and begin tracking during microbatch runs
* Ensure batch_results being propagated to `run_results` artifact
* Move `batch_results` from `BaseResult` class to `RunResult` class
* Move `BatchResults` and `BatchType` to separate artifacts file to avoid circular imports
In our next commit we modify `dbt/contracts/graph/nodes.py` to import
`BatchType` as part of implementing dbt retry for microbatch model nodes.
Unfortunately, that import in `nodes.py` creates a circular dependency, because
`dbt/artifacts/schemas/results.py` imports from `nodes.py` and `dbt/artifacts/schemas/run/v5/run.py`
imports from that `results.py`. Moving these _shouldn't_ be necessary, as nothing
in artifacts should import from the rest of dbt-core; however, we do. We should fix
that, but it's out of scope for this segment of work. (The assumed shape of the batch
bookkeeping is sketched below.)
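For orientation, a minimal sketch of the batch bookkeeping described in the commits above; the field names mirror the commit messages, but the exact shapes are assumptions rather than the actual dbt-core definitions:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# A batch is assumed to be identified by its (start, end) window.
BatchType = Tuple[datetime, datetime]


@dataclass
class BatchResults:
    # Which batch windows succeeded and which failed during a microbatch run.
    successful: List[BatchType] = field(default_factory=list)
    failed: List[BatchType] = field(default_factory=list)
```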
* Add `PartialSuccess` as a retry-able status, and use batches to retry microbatch models
* Fix BatchType type so that the first datetime is no longer Optional
* Ensure `PartialSuccess` causes skipping of downstream nodes
* Alter `PartialSuccess` status to be considered an error in `interpret_run_result`
* Update schemas and test artifacts to include new batch_results run results key
* Add functional test to check that 'dbt retry' retries 'PartialSuccess' models
* Update partition failure test to assert downstream models are skipped
* Improve `success`/`error`/`partial success` messaging for microbatch models
* Include `PartialSuccess` in status that `--fail-fast` counts as a failure
* Update `LogModelResult` to handle partial successes
* Update `EndOfRunSummary` to handle partial successes
* Cleanup TODO comment
* Raise a DbtInternalError if we get a batch run result without `batch_results`
* When running a microbatch model with supplied batches, force non full-refresh behavior
This is necessary because of retry. Say on the initial run the microbatch model
succeeds on 97% of its batches, and on retry it does the last 3%. If the retry
of the microbatch model executes in full-refresh mode, it _might_ blow away the
97% of work that has already been done. This edge case appears to be adapter
specific (see the sketch below).
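A hedged sketch of that guard; the function and parameter names are illustrative, not the actual dbt-core implementation:

```python
def resolve_full_refresh(configured_full_refresh: bool, supplied_batches: list) -> bool:
    # If the node arrives with batches handed over from a prior run (i.e. a
    # retry), suppress full-refresh so the already-successful batches are not
    # blown away by rebuilding the relation from scratch.
    if supplied_batches:
        return False
    return configured_full_refresh
```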
* Only pass batches to retry for microbatch model when there was a PartialSuccess
In the previous commit we made retries of microbatch models not run in full-refresh
mode when the model to retry already has batches specified from the prior run. This
is only problematic when the run being retried was a full refresh AND every batch for
a given microbatch model failed; in that case we DO want a full refresh for that model.
To outline the problem, consider the following:
* a microbatch model had a `begin` of `2020-01-01` and had been running that way for a while
* the `begin` config changed to `2024-01-01` and `dbt run --full-refresh` was run
* every batch for the microbatch model failed
* on `dbt retry` the relation is assumed to exist, and the now out-of-range data (2020-01-01 through 2023-12-31) is never purged
To avoid this, we only pass batch information for partially successful microbatch
models. Note: microbatch models have a partially successful status if and only if they
have both successful and failed batches (sketched below).
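A sketch of the resulting retry rule, again with assumed names and status values rather than the real dbt-core code:

```python
from typing import List, Optional


def batches_for_retry(status: str, failed_batches: List) -> Optional[List]:
    # Only hand prior batch state to the retried node when the previous run was
    # a partial success (some batches passed, some failed). A fully failed model
    # gets no pre-supplied batches, so a retried --full-refresh run can still
    # rebuild the relation from scratch.
    if status == "partial success":
        return failed_batches
    return None
```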
* Fix test_manifest unit tests to know about model 'batches' key
* Add some console output assertions to microbatch functional tests
* add batch_results: None to expected_run_results
* Add changie doc for microbatch retry functionality
* maintain protoc version 5.26.1
* Cleanup extraneous comment in LogModelResult
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Initial implementation of unit testing (from pr #2911)
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* 8295 unit testing artifacts (#8477)
* unit test config: tags & meta (#8565)
* Add additional functional test for unit testing selection, artifacts, etc (#8639)
* Enable inline csv format in unit testing (#8743)
* Support unit testing incremental models (#8891)
* update unit test key: unit -> unit-tests (#8988)
* convert to use unit test name at top level key (#8966)
* csv file fixtures (#9044)
* Unit test support for `state:modified` and `--defer` (#9032)
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Allow use of sources as unit testing inputs (#9059)
* Use daff for diff formatting in unit testing (#8984)
* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified
* Move unit testing to test and build commands (#9108)
* Enable unit testing in non-root packages (#9184)
* convert test to data_test (#9201)
* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)
* In build command run unit tests before models (#9273)
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
* Drop `all_refs=True` from jsonschema-ization build process
Passing `all_refs=True` makes everything a ref, even the top-level schema.
In jsonschema land, the produced artifact is then not a full schema but a
fragment meant to be included in a schema. When `$id` is passed in, jsonschema
tools blow up because `$id` is for identifying a schema, which we explicitly
weren't creating. The alternative was to drop the inclusion of `$id`. However,
we are intending to create a schema, and having an `$id` is recommended best
practice. Since we intend a full schema, not a fragment, it seemed best to
actually create the full schema.
* Explicitly produce jsonschemas using the DRAFT_2020_12 dialect
Previously we were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It seems wise to begin specifying this explicitly. First, it
is the closest available mashumaro dialect to what we produced pre-1.7.
Second, if mashumaro changes its default for whatever reason (say a new
dialect is added, and mashumaro moves to it), we don't want to inherit
that automatically.
* Bump manifest version to v12
Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?
* Begin including schema dialect specification in produced jsonschema
In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.
and
> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.
Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.
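Taken together, the jsonschema changes above roughly amount to a mashumaro invocation like the following; this is a sketch with a placeholder dataclass, and the exact call site, helper names, and `$id` value in dbt-core may differ:

```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.dialects import DRAFT_2020_12


@dataclass
class ExampleArtifact:
    # Stand-in for an artifact dataclass such as WritableManifest.
    metadata: str = ""


schema = build_json_schema(
    ExampleArtifact,
    dialect=DRAFT_2020_12,   # pin the dialect instead of relying on the library default
    with_dialect_uri=True,   # include the "$schema" keyword in the output
).to_dict()

# A standalone schema (i.e. not built with all_refs=True) can legitimately carry an $id.
schema["$id"] = "https://schemas.getdbt.com/dbt/manifest/v12.json"
```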
* Create manifest v12 jsonschema specification
* Add change documentation for jsonschema schema production fix
* Bump run-results version to v6
* Generate new v6 run-results jsonschema
* Regenerate catalog v1 and sources v3 with fixed jsonschema production
* Update tests to handle bumped versions of manifest and run-results
* Bump to dbt-semantic-interfaces 0.3.0b1
* Update import path of `WhereFilterParser` from `dbt-semantic-interfaces`
In 0.3.x of `dbt-semantic-interfaces` the `WhereFilterParser` moved to be
grouped in with a bunch of new adjacent code. As such, we needed to correct
our import path for it.
* Create basic `SavedQuery` node type based on `SavedQuery` protocol from DSI
* Add ability to add SavedQueries to the manifest
* Define unparsed SavedQuery node
* Begin parsing saved_query objects to manifest
* Skip jinja rendering of `SavedQuery.where` property
* Begin propagating `SavedQueries` on the manifest to the semantic manifest
* Add tests for basic saved query parsing
* Add custom pluralization handling of SavedQuery node type
* Add a config subclass to SavedQuery node
* Move the SavedQuery node to nodes.py
Unfortunately things are a bit too intertwined currently for SavedQuery
to live in its own file. We need to add the SavedQuery node to
GraphMemberNode, and with SavedQuery in its own file, importing it would
have caused a circular dependency. We'll need to come back and split
things up as a cleanup portion of this work.
* Add basic plumbing of saved query configs to projects
* Add basic lookup utility for saved queries, SavedQueryLookup
* Handle disabled SavedQuery nodes in parsing and lookups
* Add SavedQuery nodes to grouping process
Our grouping logic seems to be in a weird spot. We appear to be moving
toward setting a node's `group` in its `config`; however, all of the logic
around grouping still looks at the top-level `group` property on a node.
To get grouping plumbed through, I've added `group` as a top-level property
of the `SavedQuery` node and populated it from the config's group value
(sketched below).
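A rough sketch of that workaround, with illustrative class and field names rather than the real node definitions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SavedQueryConfig:
    group: Optional[str] = None


@dataclass
class SavedQuery:
    name: str
    config: SavedQueryConfig = field(default_factory=SavedQueryConfig)
    # Mirrored from config so existing grouping logic, which reads the
    # top-level `group` attribute, also works for SavedQuery nodes.
    group: Optional[str] = None


sq = SavedQuery(name="my_saved_query", config=SavedQueryConfig(group="finance"))
sq.group = sq.config.group  # populate the top-level group from the config value
```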
* Plumb through saved query in a lot more places
I don't like making scattershot commits like this, but a lot of this
commit was written around 4am: things were broken and I wanted them
unbroken. I mostly searched for `semantic_models` and added the equivalent
`saved_queries` handling. Some of it supports writing out the manifest,
some helps with node selection; it's a lot of miscellaneous stuff that I
don't fully understand yet.
* Add `depends_on` to `SavedQuery` nodes and populate from `metrics` property
* Add partial parsing support to SavedQuery nodes
* Add `docs` support for SavedQuery descriptions
* Support selector methods for SavedQuery nodes
* Add `refs` property to SavedQuery node
We don't actually append anything to `refs` for SavedQuery nodes currently,
and I'm not sure anything needs to be appended. Regardless, we access the
`refs` property throughout the codebase while iterating over nodes, so it
seems wise to support the attribute rather than accidentally blow something
up because it doesn't exist.
* Support `saved_queries` when upgrading from manifests <= v10 (and regenerate v11)
* Add changie doc for saved query node support
* Pin to dbt-semantic-interfaces 0.3.0b1 for saved query work
We're going to release DSI 0.3.0, and if this PR automatically pulls that
in, things will break. The things that need fixing should be handled
separately from this PR. After releasing DSI 0.3.0 I'm going to create
a branch off of/on top of this one and open a stacked PR with the
associated changes.
* Bump supported DSI version to 0.3.x
* Switch metric filters and saved query where to use the new WhereFilterIntersection
* Update schema yaml readers to create WhereFilterInterfaces
* Expand metric filters and saved query where property to handle both str and list of strs
* Update tests which were broken by where filter changes
* Regenerate v11 manifest
* Fixup: Update `SavedQueryLookup.perform_lookup` to operate on saved queries
I missed this when I was copy and pasting 🤦