84 Commits

Author SHA1 Message Date
Quigley Malcolm
15722264aa Correct Function Node Property Names (#12065)
* Fix function node property names

`return_type` -> `returns`
`return_type.type` -> `returns.data_type`
`arguments[x].type` -> `arguments[x].data_type`

* Add changie doc
2025-10-02 13:46:57 -05:00
Quigley Malcolm
8c929c337e Add type property to function nodes (#12057)
* Add `FunctionType` enum

* Add `type` property to `Function` resource

* Add `type` property to `ParsedFunctionPatch` and `UnparsedFunctionUpdate`

* Begin populating a function's `type` during patch parsing

* Regnerate v12 manifest to include function `type` property

* Add changie doc

* Begin testing that function node `type` property is setable and accessible

* Move comment about triggering the PathEncoder back to its proper place
2025-09-26 15:29:08 -05:00
Quigley Malcolm
538de17f78 Initial Implementation of UDFs (#12054)
* Allow for the defining of basic SQL UDFs (#11957)

* Add initial definiton of the `Function` resource

* Add FunctionNode definition to graph contracts

* Add test which checks whether basic UDFs can be parsed

This test fails right now, which is intentional. This is test driven
development. Now I do work to maket the test pass :)

* Add basic function sql parser for UDFs, and plumb it through parsing code paths

* Begin populating `functions` in the ref lookup

* Begin patching `function` nodes with their yaml definitions

Of note, presently `arguments` and `return_type` aren't populating properly.
It's likely that we'll have to do additional work to the FunctionPatchParser
to get this _fully_ working.

* Increase responsibility of FunctionPatchParser to handle entire `parse_patch` of function nodes

* Fix testing suite to accomodate addition of new `function` node

* Add changie doc for new `function` node type

* Minor refactoring of `NodePatchParser.parse_patch` to reduce code duplication in `FunctionPatchParser`

* Ability to list and select function nodes (#11968)

* Begin listing `function` nodes in `list` command

* Add ability to run `list` specifying the `function` resource type

* Function nodes are support selection via: name, file path, and resource type

* Add changie doc

* Core handles lifecycle of function nodes (#12008)

* Add basic test to check that UDFs get created in data warehouse

* Add functions to the runner map of \ operation

* Add basic stub of `FunctionRunner` modeled after `SeedRunner`

* Begin using `FunctionRunner` for running `function` nodes

* Add stubbing of things to implement on `FunctionRunner`

* Initial implementation of execution of function nodes

This is largely a copy of the execution of model nodes (in run.py) but
with some abstractions into helper methods to make the body of the
`execute` function easier to follow. Of note, right now this appears to
be getting the incorrect macro from the adapter. This is likely because
for some reason the node's materialization config is being set to `view`
by default.

* Ensure parsed function nodes get the correct materialization type

* Begin generating context for `function` materialization macro

* Stub out adapter response in node result as it was causing some failures

* Correct the adapter response in the run result for functions

* Begin logging `LogFunctionResult` event for completed function nodes

* Add changie doc

* Temp update dev reqs to point at branch of dbt-adapters

* Add test `LogFunctionResult` event to serialization test

* Add `function` nodes to the `WritableManifest`

* Fix tests

* Remove no longer relevant `TODO`s from `function.py`

* Add a new macro `function()` to the jinja context for using functions (#12031)

* Update function tests to look for `functions` under `manifest.functions`

* Begin storing funciton nodes in `Manifest.functions` instead of `Manifest.nodes`

* Ensure function nodes are still included in nodes to run during `build`

* Add ability to lookup functions on the manifest

* Update patch parsing of function YAML files now that functions live on `Manifest.functions`

* Mark function nodes as no longer refable

* Ensure function nodes are still selectable

* Add `function` macro!

* Ensure functions nodes are correctly linked in the DAG

* Update jinja context tests to expect `function` macro to exist

* Fix unit tests in test suite to expect function nodes

* Add changie doc

* regen v12.json jsonschema

* Fix test `TestVerifyArtifacts::test_run_and_generate`

* Fix test `TestVerifyArtifactsReferences::test_references`

* Fix test `TestVerifyArtifactsVersions::test_versions`

* Regen manifest artifact for `TestPreviousVersionState::test_compare_state_current`

* Update `_iterate_selected_nodes` to support function nodes

* Ensure we process node functions to ensure they get added to the `depends_on`

* Take functions into account for state modified

* Regen data for `TestModifiedStateSchemaEvolution::test_modified_state_schema_evolution` test

* Default `functions` property on `WritableManifest` to a dict

I'm not sure if this is actually how we want to do this. However, without
doing this the `WritableManifest` will break on loading of older manifests
that don't have `functions`. The alternative to this would be to bump
the schema version (v12 -> v13) and create an upgrade in `upgrade_manifest.py`.

* Update UDF tests to use a more general purpose function

* Add tests ensuring UDFs can be used in models and `--inline` queries

* Correct `ParseFunctionResolver` so that the name isn't added twice to the function args spec

* Drop `functions` from `Exposure` and `Metric` definitions

* Regen v12 manifest schema

* Remove unnecessary string interpolation

* Point dev reqs back to dbt-adapters@main

* Empty commit
2025-09-26 13:41:45 -05:00
Michelle Ark
2f842055f0 Add run_started_at to manifest.json metadata (#12047) 2025-09-25 11:56:12 -04:00
Michelle Ark
972eb23d03 add config to columns (#11671) 2025-05-26 21:06:09 -04:00
Ani Venkateshwaran
0db83d0abd adding quoting to manifest metadata (#11666) 2025-05-23 13:51:19 -07:00
Michelle Ark
4a8f9c181c Support config.meta and description on groups + add to happy path testing (#11649) 2025-05-22 20:09:05 -04:00
Gerda Shank
e60b41d9fa Add invocation_started_at (#11291) 2025-02-18 11:32:04 -05:00
Kshitij Aranke
f29836fcf3 Round 2: Add doc_blocks to manifest for nodes and columns (#11294)
* Reapply "Add `doc_blocks` to manifest for nodes and columns (#11224)" (#11283)

This reverts commit 55e0df181f.

* Expand doc_blocks backcompat test

* Refactor to method, add docstring
2025-02-11 16:01:16 +00:00
Kshitij Aranke
55e0df181f Revert "Add doc_blocks to manifest for nodes and columns (#11224)" (#11283)
This reverts commit d71f309c1e.
2025-02-07 17:12:06 +00:00
Chenyu Li
c0423707b0 loosen validation for freshness (#11253) 2025-01-28 14:20:36 -08:00
Kshitij Aranke
d71f309c1e Add doc_blocks to manifest for nodes and columns (#11224) 2025-01-27 19:49:02 +00:00
Chenyu Li
a8702b8374 add model freshness for adaptive job (#11170) 2025-01-07 10:02:52 -08:00
Chenyu Li
f2222d2621 Custom SQL for get source maxLoadedAt (#11163) 2024-12-19 11:49:07 -08:00
Patrick Yost
97ffc37405 Add tags to SavedQueries (#10987) 2024-12-19 10:18:50 -08:00
Chenyu Li
4b1f1c4029 add allow additional property for Model and SourceDefinition (#11138) 2024-12-15 23:30:48 -08:00
William Deng
ff6745c795 Update core to support DSI 0.8.3 (#10990)
Co-authored-by: Courtney Holcomb <courtneyeholcomb@gmail.com>
2024-12-05 09:48:33 -08:00
Peter Webb
ad575ec699 Add New Config Properties and Schema for Snapshot Hard Deletes (#10972)
* Add changelog entry.

* Update schemas and test fixtures for new snapshot meta-column

* Add back comment.
2024-11-21 18:15:30 -05:00
Michelle Ark
fd6ec71dab Microbatch parallelism (#10958) 2024-11-21 00:31:47 -05:00
Tim Sturge
81067d4fc4 Support disabling unit tests (#10831) 2024-11-06 15:20:35 -05:00
Quigley Malcolm
d07bfda9df Change microbatch lookback default from 0 to 1 (#10876)
* Change `lookback` default from `0` to `1`

* Regen jsonschema manifest v12 to include `lookback` default change

* Regen saved state of v12 manifest for functional artifact testing

* Add changie doc for lookback default change
2024-10-24 17:16:32 -05:00
Gerda Shank
f7b7935a97 Support multiple unique keys in snapshots (#10795) 2024-10-22 14:47:51 -04:00
Peter Webb
3d96b4e36c Loosen Type in TimingInfo (#10897) 2024-10-21 19:01:15 -04:00
Kshitij Aranke
ba6c7baf1d [Tidy-First]: Fix timings object for hooks and macros, and make types of timings explicit (#10882)
* [Tidy-First]: Fix `timings` object for hooks and macros, and make types of timings explicit

* cast literal to str

* change test

* change jsonschema to enum

* Discard changes to schemas/dbt/manifest/v12.json

* nits

---------

Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
2024-10-18 17:28:58 -04:00
Paul Yang
8be063502b Add order_by and limit fields to saved queries (#10532)
* Add `order_by` and `limit` fields to saved queries.

* Update JSON schema

* Add change log for #10531.

* Check order by / limit in saved-query parsing test.
2024-10-17 10:54:30 -07:00
Gerda Shank
c7d8693f70 Enable setting datetime value for dbt_valid_to when the record is current (#10780) 2024-10-10 18:41:03 -04:00
Kshitij Aranke
6b9c1da1ae Revert "state:modified vars, behind "state_modified_compare_vars" behaviour flag" (#10793) (#10813) 2024-10-02 21:00:48 +01:00
Michelle Ark
a86e2b4ffc [state:modified] store unrendered_database and unrendered_schema on source definition for state:modified comparisons (#10675) 2024-09-30 17:50:33 +02:00
Michelle Ark
d1857b39ca state:modified vars, behind "state_modified_compare_vars" behaviour flag (#10793) 2024-09-30 16:32:37 +02:00
Quigley Malcolm
1fd4d2eae6 Enable retry support for Microbatch models (#10751)
* Add `PartialSuccess` status type and use it for microbatch models with mixed results

* Handle `PartialSuccess` in `interpret_run_result`

* Add `BatchResults` object to `BaseResult` and begin tracking during microbatch runs

* Ensure batch_results being propagated to `run_results` artifact

* Move `batch_results` from `BaseResult` class to `RunResult` class

* Move `BatchResults` and `BatchType` to separate arifacts file to avoid circular imports

In our next commit we're gonna modify `dbt/contracts/graph/nodes.py` to import the
`BatchType` as part of our work to implement dbt retry for microbatch model nodes.
Unfortunately, the import in `nodes.py` creates a circular dependency because
`dbt/artifacts/schemas/results.py` imports from `nodes.py` and `dbt/artifacts/schemas/run/v5/run.py`
imports from that `results.py`. Thus the new import creates a circular import. Now this
_shouldn't_ be necessary as nothing in artifacts should import from the rest of dbt-core.
However, we do. We should fix this, but this is also out of scope for this segement of work.

* Add `PartialSuccess` as a retry-able status, and use batches to retry microbatch models

* Fix BatchType type so that the first datetime is no longer Optional

* Ensure `PartialSuccess` causes skipping of downstream nodes

* Alter `PartialSuccess` status to be considered an error in `interpret_run_result`

* Update schemas and test artifacts to include new batch_results run results key

* Add functional test to check that 'dbt retry' retries 'PartialSuccess' models

* Update partition failure test to assert downstream models are skipped

* Improve `success`/`error`/`partial success` messaging for microbatch models

* Include `PartialSuccess` in status that `--fail-fast` counts as a failure

* Update `LogModelResult` to handle partial successes

* Update `EndOfRunSummary` to handle partial successes

* Cleanup TODO comment

* Raise a DbtInternalError if we get a batch run result without `batch_results`

* When running a microbatch model with supplied batches, force non full-refresh behavior

This is necessary because of retry. Say on the initial run the microbatch model
succeeds on 97% of it's batches. Then on retry it does the last 3%. If the retry
of the microbatch model executes in full refresh mode it _might_ blow away the
97% of work that has been done. This edge case seems to be adapter specific.

* Only pass batches to retry for microbatch model when there was a PartialSuccess

In the previous commit we made it so that retries of microbatch models wouldn't
run in full refresh mode when the microbatch model to retry has batches already
specified from the prior run. This is only problematic when the run being retried
was a full refresh AND all the batches for a given microbatch model failed. In
that case WE DO want to do a full refresh for the given microbatch model. To better
outline the problem, consider the following:

* a microbatch model had a begin of `2020-01-01` and has been running this way for awhile
* the begin config has changed to `2024-01-01` and  dbt run --full-refresh gets run
* every batch for an microbatch model fails
* on dbt retry the the relation is said to exist, and the now out of range data (2020-01-01 through 2023-12-31) is never purged

To avoid this, all we have to do is ONLY pass the batch information for partially successful microbatch
models. Note: microbatch models only have a partially successful status IFF they have both
successful and failed batches.

* Fix test_manifest unit tests to know about model 'batches' key

* Add some console output assertions to microbatch functional tests

* add batch_results: None to expected_run_results

* Add changie doc for microbatch retry functionality

* maintain protoc version 5.26.1

* Cleanup extraneous comment in LogModelResult

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
2024-09-26 08:45:47 -05:00
Gerda Shank
db694731c9 Allow configuration of snapshot column names (#10608) 2024-09-20 19:31:05 -04:00
Courtney Holcomb
c6b8f7e595 Add custom granularities to YAML spec (#10664)
* Add custom granularities to YAML spec

* Changelog

* Add tests

* Remove unneeded duplicate classes
2024-09-17 13:02:56 -05:00
Michelle Ark
cc8541c05f Microbatch: event_time ref + source filtering (#10594) 2024-09-12 18:16:04 -04:00
aliceliu
3c55806203 Fix state:modified check for exports (#10565) 2024-08-23 15:22:38 -04:00
Courtney Holcomb
0a160fc27a Support time spine configs for sub-daily granularity (#10483) 2024-07-29 13:39:39 -04:00
Courtney Holcomb
f9c2b9398f Remove newlines from JSON schema files (#10486) 2024-07-26 13:36:13 -04:00
Courtney Holcomb
4c7d922a6d Add Metric.time_granularity to metric spec (#10378) 2024-07-16 13:35:20 -04:00
Courtney Holcomb
a94027acea Add CumulativeTypeParams & sub-daily granularities to semantic manifest (#10350) 2024-07-01 14:54:23 -07:00
dave-connors-3
8fe7d652ab Add primary_key to manifest (#10096) 2024-05-10 15:14:20 -04:00
Gerda Shank
b5a0c4c228 Unit testing feature branch pull request (#8411)
* Initial implementation of unit testing (from pr #2911)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* 8295 unit testing artifacts (#8477)

* unit test config: tags & meta (#8565)

* Add additional functional test for unit testing selection, artifacts, etc (#8639)

* Enable inline csv format in unit testing (#8743)

* Support unit testing incremental models (#8891)

* update unit test key: unit -> unit-tests (#8988)


* convert to use unit test name at top level key (#8966)

* csv file fixtures (#9044)

* Unit test support for `state:modified` and `--defer` (#9032)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* Allow use of sources as unit testing inputs (#9059)

* Use daff for diff formatting in unit testing (#8984)

* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)

Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified

* Move unit testing to test and build commands (#9108)

* Enable unit testing in non-root packages (#9184)

* convert test to data_test (#9201)

* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)

* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
2024-01-16 17:37:43 -05:00
William Deng
1740df534b Update parser to support conversion metrics (#9173)
* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog
2023-12-07 10:09:20 -08:00
Quigley Malcolm
0ab954e1af Fix ensuring we produce valid jsonschema artifacts for manifest, catalog, sources, and run-results (#9155)
* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Bump manifest version to v12

Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Create manifest v12 jsonschema specification

* Add change documentation for jsonschema schema production fix

* Bump run-results version to v6

* Generate new v6 run-results jsonschema

* Regenerate catalog v1 and sources v3 with fixed jsonschema production

* Update tests to handle bumped versions of manifest and run-results
2023-12-05 17:36:43 -08:00
Quigley Malcolm
ac972948b8 DSI 0.4.0 and Saved Query Exports (#8950) 2023-10-31 18:34:41 -07:00
Quigley Malcolm
af0cbcb6a5 Add SavedQuery nodes (#8798)
* Bump to dbt-semantic-interfaces 0.3.0b1

* Update import path of `WhereFilterParser` from `dbt-semantic-interfaces`

In 0.3.x of `dbt-semantic-intefaces` the location of the WhereFilterParser
moved to be grouped in with a bunch of new adjacent code. As such,
we needed to correct our import path of it.

* Create basic `SavedQuery` node type based on `SavedQuery` protocol from DSI

* Add ability to add SavedQueries to the manifest

* Define unparsed SavedQuery node

* Begin parsing saved_query objects to manifest

* Skip jinja rendering of `SavedQuery.where` property

* Begin propagating `SavedQueries` on the manifest to the semantic manifest

* Add tests for basic saved query parsing

* Add custom pluralization handling of SavedQuery node type

* Add a config subclass to SavedQuery node

* Move the SavedQuery node to nodes.py

Unfortunately things are a bit too intertwined currently for SavedQuery
to be in it's own file. We need to add the SavedQuery node to the
GraphMemberNode, unfortunately with SavedQuery in it's own file,
importing it would have caused a circular dependency. We'll need
to separately come in and split things up as a cleanup portion of
work.

* Add basic plumbing of saved query configs to projects

* Add basic lookup utility for saved queries, SavedQueryLookup

* Handle disabled SavedQuery nodes in parsing and lookups

* Add SavedQuery nodes to grouping process

Our grouping logic seems to be in a weird spot. It seems liek we're
moving to setting the `group` for a node in the node's `config` however,
all of the logic around grouping is still focused on the top level `group`
property on a nodes. To get group stuff plumbed I've thus added `group`
as a top level property of the `SavedQuery` node, and populated it from
the config group value.

* Plumb through saved query in a lot more places

I don't like making scatter shot commits like this. However, a lot
of this commit was written ~4am, soooo yea. Things were broken, I wanted
things to be unbroken. I mostly searched for `semantic_models` and added
the equivalent necessary `saved_queries`. Some stuff is in support of
writing out the manifest, some stuff helps with node selection, it's a
lot of miscelaneous stuff that I don't fully understand.

* Add `depends_on` to `SavedQuery` nodes and populate from `metrics` property

* Add partial parsing support to SavedQuery nodes

* Add `docs` support for SavedQuery descriptions

* Support selctor methods for SavedQuery nodes

* Add `refs` property to SavedQuery node

We don't actually append anything to `refs` for SavedQuery nodes currently.
I'm not sure if anything needs to be appended to them. Regardless, we
access the `refs` property throughout the codebase while iterating over
nodes. It seems wise to support this attribute as to not accidently blow
something up with it not existing.

* Support `saved_queries` when upgrading from manifests <= v10 (and regenerate v11)

* Add changie doc for saved query node support

* Pin to dbt-semantic-interfaces 0.3.0b1 for saved query work

We're gonna release DSI 0.3.0, and if this PR automatically pulls that
in things will break. But the things that need fixing should be handled
separately from this PR. After releasing DSI 0.3.0 I'm going to create
a branch off/ontop of this one, and open a stacked PR with the associated
changes.

* Bump supported DSI version to 0.3.x

* Switch metric filters and saved query where to use ne WhereFilterIntersection

* Update schema yaml readers to create WhereFilterInterfaces

* Expand metric filters and saved query where property to handle both str and list of strs

* Update tests which were broken by where filter changes

* Regeneate v11 manifest

* Fixup: Update `SavedQueryLookup.perform_lookup` to operate on saved queries

I missed this when I was copy and pasting 🤦
2023-10-11 15:54:11 -07:00
Emily Rockman
bf10a29f06 update v10 manifest on main (#8834)
* update manifest

* add changelog
2023-10-11 14:52:01 -05:00
Gerda Shank
4391dc1a63 Type aliasing for model contract column data_type (#8592) 2023-10-10 11:43:26 -04:00
Emily Rockman
c6ff3abecd remove top level meta attribute (#8766) 2023-10-04 13:23:09 -05:00
Emily Rockman
eac13e3bd3 Add meta to SemanticModels (#8754)
* WIP

* changelog
2023-10-03 13:08:37 -05:00
Emily Rockman
46ee3f3d9c rebuild manifest missed fields (#8755)
* rebuild manifest missed fields

* changelogs
2023-10-02 09:38:50 -05:00
Gerda Shank
f5baeeea1c Allow setting access in config in addition to properties (#8635) 2023-09-14 11:09:41 -04:00