* Add `FunctionType` enum
* Add `type` property to `Function` resource
* Add `type` property to `ParsedFunctionPatch` and `UnparsedFunctionUpdate`
* Begin populating a function's `type` during patch parsing
* Regenerate v12 manifest to include function `type` property
* Add changie doc
* Begin testing that function node `type` property is settable and accessible
* Move comment about triggering the PathEncoder back to its proper place
* Allow for the defining of basic SQL UDFs (#11957)
* Add initial definition of the `Function` resource
* Add FunctionNode definition to graph contracts
* Add test which checks whether basic UDFs can be parsed
This test fails right now, which is intentional. This is test-driven
development: now I do the work to make the test pass :)
* Add basic function sql parser for UDFs, and plumb it through parsing code paths
* Begin populating `functions` in the ref lookup
* Begin patching `function` nodes with their yaml definitions
Of note, presently `arguments` and `return_type` aren't populating properly.
It's likely that we'll have to do additional work on the FunctionPatchParser
to get this _fully_ working.
* Increase responsibility of FunctionPatchParser to handle entire `parse_patch` of function nodes
* Fix testing suite to accommodate addition of new `function` node
* Add changie doc for new `function` node type
* Minor refactoring of `NodePatchParser.parse_patch` to reduce code duplication in `FunctionPatchParser`
* Ability to list and select function nodes (#11968)
* Begin listing `function` nodes in `list` command
* Add ability to run `list` specifying the `function` resource type
* Function nodes support selection via name, file path, and resource type
* Add changie doc
* Core handles lifecycle of function nodes (#12008)
* Add basic test to check that UDFs get created in data warehouse
* Add functions to the runner map of \ operation
* Add basic stub of `FunctionRunner` modeled after `SeedRunner`
* Begin using `FunctionRunner` for running `function` nodes
* Add stubbing of things to implement on `FunctionRunner`
* Initial implementation of execution of function nodes
This is largely a copy of the model node execution (in run.py), but
with some abstractions into helper methods to make the body of the
`execute` function easier to follow. Of note, right now this appears to
get the incorrect macro from the adapter, likely because the node's
materialization config is being set to `view` by default for some reason.
* Ensure parsed function nodes get the correct materialization type
* Begin generating context for `function` materialization macro
* Stub out adapter response in node result as it was causing some failures
* Correct the adapter response in the run result for functions
* Begin logging `LogFunctionResult` event for completed function nodes
* Add changie doc
* Temp update dev reqs to point at branch of dbt-adapters
* Add test `LogFunctionResult` event to serialization test
* Add `function` nodes to the `WritableManifest`
* Fix tests
* Remove no longer relevant `TODO`s from `function.py`
* Add a new macro `function()` to the jinja context for using functions (#12031)
* Update function tests to look for `functions` under `manifest.functions`
* Begin storing function nodes in `Manifest.functions` instead of `Manifest.nodes`
* Ensure function nodes are still included in nodes to run during `build`
* Add ability to lookup functions on the manifest
* Update patch parsing of function YAML files now that functions live on `Manifest.functions`
* Mark function nodes as no longer refable
* Ensure function nodes are still selectable
* Add `function` macro!
* Ensure function nodes are correctly linked in the DAG
* Update jinja context tests to expect `function` macro to exist
* Fix unit tests in test suite to expect function nodes
* Add changie doc
* regen v12.json jsonschema
* Fix test `TestVerifyArtifacts::test_run_and_generate`
* Fix test `TestVerifyArtifactsReferences::test_references`
* Fix test `TestVerifyArtifactsVersions::test_versions`
* Regen manifest artifact for `TestPreviousVersionState::test_compare_state_current`
* Update `_iterate_selected_nodes` to support function nodes
* Process a node's functions so they get added to its `depends_on`
* Take functions into account for state modified
* Regen data for `TestModifiedStateSchemaEvolution::test_modified_state_schema_evolution` test
* Default `functions` property on `WritableManifest` to a dict
I'm not sure this is actually how we want to do this. However, without
the default the `WritableManifest` will break when loading older manifests
that don't have `functions`. The alternative would be to bump the schema
version (v12 -> v13) and add an upgrade step in `upgrade_manifest.py`
(the default is sketched below).
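A minimal sketch of the backward-compatible default described above, assuming `WritableManifest` is a plain dataclass; all other fields are elided and illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WritableManifest:
    # ...existing manifest fields elided...
    # Defaulting to an empty dict means manifests written before `functions`
    # existed still deserialize cleanly instead of failing on the missing key.
    functions: Dict[str, Any] = field(default_factory=dict)
```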
* Update UDF tests to use a more general purpose function
* Add tests ensuring UDFs can be used in models and `--inline` queries
* Correct `ParseFunctionResolver` so that the name isn't added twice to the function args spec
* Drop `functions` from `Exposure` and `Metric` definitions
* Regen v12 manifest schema
* Remove unnecessary string interpolation
* Point dev reqs back to dbt-adapters@main
* Empty commit
* Reapply "Add `doc_blocks` to manifest for nodes and columns (#11224)" (#11283)
This reverts commit 55e0df181f.
* Expand doc_blocks backcompat test
* Refactor to method, add docstring
* Change `lookback` default from `0` to `1`
* Regen jsonschema manifest v12 to include `lookback` default change
* Regen saved state of v12 manifest for functional artifact testing
* Add changie doc for lookback default change
* [Tidy-First]: Fix `timings` object for hooks and macros, and make types of timings explicit
* cast literal to str
* change test
* change jsonschema to enum
* Discard changes to schemas/dbt/manifest/v12.json
* nits
---------
Co-authored-by: Chenyu Li <chenyu.li@dbtlabs.com>
* Add `order_by` and `limit` fields to saved queries.
* Update JSON schema
* Add change log for #10531.
* Check order by / limit in saved-query parsing test.
* Add `PartialSuccess` status type and use it for microbatch models with mixed results
* Handle `PartialSuccess` in `interpret_run_result`
* Add `BatchResults` object to `BaseResult` and begin tracking during microbatch runs
* Ensure batch_results being propagated to `run_results` artifact
* Move `batch_results` from `BaseResult` class to `RunResult` class
* Move `BatchResults` and `BatchType` to separate artifacts file to avoid circular imports
In our next commit we modify `dbt/contracts/graph/nodes.py` to import
`BatchType` as part of implementing dbt retry for microbatch model nodes.
Unfortunately, that import in `nodes.py` creates a circular dependency, because
`dbt/artifacts/schemas/results.py` imports from `nodes.py` and `dbt/artifacts/schemas/run/v5/run.py`
imports from that `results.py`. Moving these _shouldn't_ be necessary, as nothing
in artifacts should import from the rest of dbt-core; however, we do. We should fix
that, but it's out of scope for this segment of work. (The assumed shape of the batch
bookkeeping is sketched below.)
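For orientation, a minimal sketch of the batch bookkeeping described in the commits above; the field names mirror the commit messages, but the exact shapes are assumptions rather than the actual dbt-core definitions:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# A batch is assumed to be identified by its (start, end) window.
BatchType = Tuple[datetime, datetime]


@dataclass
class BatchResults:
    # Which batch windows succeeded and which failed during a microbatch run.
    successful: List[BatchType] = field(default_factory=list)
    failed: List[BatchType] = field(default_factory=list)
```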
* Add `PartialSuccess` as a retry-able status, and use batches to retry microbatch models
* Fix BatchType type so that the first datetime is no longer Optional
* Ensure `PartialSuccess` causes skipping of downstream nodes
* Alter `PartialSuccess` status to be considered an error in `interpret_run_result`
* Update schemas and test artifacts to include new batch_results run results key
* Add functional test to check that 'dbt retry' retries 'PartialSuccess' models
* Update partition failure test to assert downstream models are skipped
* Improve `success`/`error`/`partial success` messaging for microbatch models
* Include `PartialSuccess` in status that `--fail-fast` counts as a failure
* Update `LogModelResult` to handle partial successes
* Update `EndOfRunSummary` to handle partial successes
* Cleanup TODO comment
* Raise a DbtInternalError if we get a batch run result without `batch_results`
* When running a microbatch model with supplied batches, force non full-refresh behavior
This is necessary because of retry. Say on the initial run the microbatch model
succeeds on 97% of its batches, and on retry it does the last 3%. If the retry
of the microbatch model executes in full-refresh mode, it _might_ blow away the
97% of work that has already been done. This edge case appears to be adapter
specific (see the sketch below).
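A hedged sketch of that guard; the function and parameter names are illustrative, not the actual dbt-core implementation:

```python
def resolve_full_refresh(configured_full_refresh: bool, supplied_batches: list) -> bool:
    # If the node arrives with batches handed over from a prior run (i.e. a
    # retry), suppress full-refresh so the already-successful batches are not
    # blown away by rebuilding the relation from scratch.
    if supplied_batches:
        return False
    return configured_full_refresh
```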
* Only pass batches to retry for microbatch model when there was a PartialSuccess
In the previous commit we made retries of microbatch models not run in full-refresh
mode when the model to retry already has batches specified from the prior run. This
is only problematic when the run being retried was a full refresh AND every batch for
a given microbatch model failed; in that case we DO want a full refresh for that model.
To outline the problem, consider the following:
* a microbatch model had a `begin` of `2020-01-01` and had been running that way for a while
* the `begin` config changed to `2024-01-01` and `dbt run --full-refresh` was run
* every batch for the microbatch model failed
* on `dbt retry` the relation is assumed to exist, and the now out-of-range data (2020-01-01 through 2023-12-31) is never purged
To avoid this, we only pass batch information for partially successful microbatch
models. Note: microbatch models have a partially successful status if and only if they
have both successful and failed batches (sketched below).
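A sketch of the resulting retry rule, again with assumed names and status values rather than the real dbt-core code:

```python
from typing import List, Optional


def batches_for_retry(status: str, failed_batches: List) -> Optional[List]:
    # Only hand prior batch state to the retried node when the previous run was
    # a partial success (some batches passed, some failed). A fully failed model
    # gets no pre-supplied batches, so a retried --full-refresh run can still
    # rebuild the relation from scratch.
    if status == "partial success":
        return failed_batches
    return None
```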
* Fix test_manifest unit tests to know about model 'batches' key
* Add some console output assertions to microbatch functional tests
* add batch_results: None to expected_run_results
* Add changie doc for microbatch retry functionality
* maintain protoc version 5.26.1
* Cleanup extraneous comment in LogModelResult
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Initial implementation of unit testing (from pr #2911)
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* 8295 unit testing artifacts (#8477)
* unit test config: tags & meta (#8565)
* Add additional functional test for unit testing selection, artifacts, etc (#8639)
* Enable inline csv format in unit testing (#8743)
* Support unit testing incremental models (#8891)
* update unit test key: unit -> unit-tests (#8988)
* convert to use unit test name at top level key (#8966)
* csv file fixtures (#9044)
* Unit test support for `state:modified` and `--defer` (#9032)
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
* Allow use of sources as unit testing inputs (#9059)
* Use daff for diff formatting in unit testing (#8984)
* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified
* Move unit testing to test and build commands (#9108)
* Enable unit testing in non-root packages (#9184)
* convert test to data_test (#9201)
* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)
* In build command run unit tests before models (#9273)
---------
Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
* Drop `all_refs=True` from jsonschema-ization build process
Passing `all_refs=True` makes everything a ref, even the top-level schema.
In jsonschema land, the produced artifact is then not a full schema but a
fragment meant to be included in a schema. When `$id` is passed in, jsonschema
tools blow up because `$id` is for identifying a schema, which we explicitly
weren't creating. The alternative was to drop the inclusion of `$id`. However,
we are intending to create a schema, and having an `$id` is recommended best
practice. Since we intend a full schema, not a fragment, it seemed best to
actually create the full schema.
* Explicitly produce jsonschemas using the DRAFT_2020_12 dialect
Previously we were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It seems wise to begin specifying this explicitly. First, it
is the closest available mashumaro dialect to what we produced pre-1.7.
Second, if mashumaro changes its default for whatever reason (say a new
dialect is added, and mashumaro moves to it), we don't want to inherit
that automatically.
* Bump manifest version to v12
Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?
* Begin including schema dialect specification in produced jsonschema
In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.
and
> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.
Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.
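Taken together, the jsonschema changes above roughly amount to a mashumaro invocation like the following; this is a sketch with a placeholder dataclass, and the exact call site, helper names, and `$id` value in dbt-core may differ:

```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.dialects import DRAFT_2020_12


@dataclass
class ExampleArtifact:
    # Stand-in for an artifact dataclass such as WritableManifest.
    metadata: str = ""


schema = build_json_schema(
    ExampleArtifact,
    dialect=DRAFT_2020_12,   # pin the dialect instead of relying on the library default
    with_dialect_uri=True,   # include the "$schema" keyword in the output
).to_dict()

# A standalone schema (i.e. not built with all_refs=True) can legitimately carry an $id.
schema["$id"] = "https://schemas.getdbt.com/dbt/manifest/v12.json"
```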
* Create manifest v12 jsonschema specification
* Add change documentation for jsonschema schema production fix
* Bump run-results version to v6
* Generate new v6 run-results jsonschema
* Regenerate catalog v1 and sources v3 with fixed jsonschema production
* Update tests to handle bumped versions of manifest and run-results
* Bump to dbt-semantic-interfaces 0.3.0b1
* Update import path of `WhereFilterParser` from `dbt-semantic-interfaces`
In 0.3.x of `dbt-semantic-interfaces` the `WhereFilterParser` moved to be
grouped in with a bunch of new adjacent code. As such, we needed to correct
our import path for it.
* Create basic `SavedQuery` node type based on `SavedQuery` protocol from DSI
* Add ability to add SavedQueries to the manifest
* Define unparsed SavedQuery node
* Begin parsing saved_query objects to manifest
* Skip jinja rendering of `SavedQuery.where` property
* Begin propagating `SavedQueries` on the manifest to the semantic manifest
* Add tests for basic saved query parsing
* Add custom pluralization handling of SavedQuery node type
* Add a config subclass to SavedQuery node
* Move the SavedQuery node to nodes.py
Unfortunately things are a bit too intertwined currently for SavedQuery
to live in its own file. We need to add the SavedQuery node to
GraphMemberNode, and with SavedQuery in its own file, importing it would
have caused a circular dependency. We'll need to come back and split
things up as a cleanup portion of this work.
* Add basic plumbing of saved query configs to projects
* Add basic lookup utility for saved queries, SavedQueryLookup
* Handle disabled SavedQuery nodes in parsing and lookups
* Add SavedQuery nodes to grouping process
Our grouping logic seems to be in a weird spot. We appear to be moving
toward setting a node's `group` in its `config`; however, all of the logic
around grouping still looks at the top-level `group` property on a node.
To get grouping plumbed through, I've added `group` as a top-level property
of the `SavedQuery` node and populated it from the config's group value
(sketched below).
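A rough sketch of that workaround, with illustrative class and field names rather than the real node definitions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SavedQueryConfig:
    group: Optional[str] = None


@dataclass
class SavedQuery:
    name: str
    config: SavedQueryConfig = field(default_factory=SavedQueryConfig)
    # Mirrored from config so existing grouping logic, which reads the
    # top-level `group` attribute, also works for SavedQuery nodes.
    group: Optional[str] = None


sq = SavedQuery(name="my_saved_query", config=SavedQueryConfig(group="finance"))
sq.group = sq.config.group  # populate the top-level group from the config value
```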
* Plumb through saved query in a lot more places
I don't like making scattershot commits like this, but a lot of this
commit was written around 4am: things were broken and I wanted them
unbroken. I mostly searched for `semantic_models` and added the equivalent
`saved_queries` handling. Some of it supports writing out the manifest,
some helps with node selection; it's a lot of miscellaneous stuff that I
don't fully understand yet.
* Add `depends_on` to `SavedQuery` nodes and populate from `metrics` property
* Add partial parsing support to SavedQuery nodes
* Add `docs` support for SavedQuery descriptions
* Support selector methods for SavedQuery nodes
* Add `refs` property to SavedQuery node
We don't actually append anything to `refs` for SavedQuery nodes currently,
and I'm not sure anything needs to be appended. Regardless, we access the
`refs` property throughout the codebase while iterating over nodes, so it
seems wise to support the attribute rather than accidentally blow something
up because it doesn't exist.
* Support `saved_queries` when upgrading from manifests <= v10 (and regenerate v11)
* Add changie doc for saved query node support
* Pin to dbt-semantic-interfaces 0.3.0b1 for saved query work
We're going to release DSI 0.3.0, and if this PR automatically pulls that
in, things will break. The things that need fixing should be handled
separately from this PR. After releasing DSI 0.3.0 I'm going to create
a branch off of/on top of this one and open a stacked PR with the
associated changes.
* Bump supported DSI version to 0.3.x
* Switch metric filters and saved query where to use the new WhereFilterIntersection
* Update schema yaml readers to create WhereFilterInterfaces
* Expand metric filters and saved query where property to handle both str and list of strs
* Update tests which were broken by where filter changes
* Regenerate v11 manifest
* Fixup: Update `SavedQueryLookup.perform_lookup` to operate on saved queries
I missed this when I was copy and pasting 🤦