An issue preventing the use of default_value on inputs has been resolved. Previously, a defensive error that did not take default_value in to account was thrown.
[dagster-aws] Fixed issue where re-emitting log records from the pyspark_step_launcher would occasionally cause a failure.
[dagit] The asset catalog now displays entries for materialized assets when only a subset of repositories were selected. Previously, it only showed the software-defined assets unless all repositories were selected in Dagit.
Fixed an invariant check in the databricks step launcher that was causing failures when setting the local_dagster_job_package_path config option (Thanks Iswariya Manivannan!)
Software-defined assets, which offer a declarative approach to data orchestration on top of Dagster’s core job/op/graph APIs, have matured significantly. Improvements include partitioned assets, a revamped asset details page in Dagit, a cross-repository asset graph view in Dagit, Dagster types on assets, structured metadata on assets, and the ability to materialize ad-hoc selections of assets without defining jobs. Users can expect the APIs to only undergo minor changes before being declared fully stable in Dagster’s next major release. For more information, view the software-defined assets concepts page here.
We’ve made it easier to define a set of software-defined assets where each Dagster asset maps to a dbt model. All of the dependency information between the dbt models will be reflected in the Dagster asset graph, while still running your dbt project in a single step.
Dagit has a new homepage, dubbed the “factory floor” view, that provides an overview of recent runs of all the jobs. From it, you can monitor the status of each job’s latest run or quickly re-execute a job. The new timeline view reports the status of all recent runs in a convenient gantt chart.
You can now write schedules and sensors that default to running as soon as they are loaded in your workspace, without needing to be started manually in Dagit. For example, you can create a sensor like this:
from dagster import sensor, DefaultSensorStatus
@sensor(job=my_job, default_status=DefaultSensorStatus.RUNNING)defmy_running_sensor():...
As soon as schedules or sensors with the default_status field set to RUNNING are included in the workspace loaded by your Dagster Daemon, they will begin creating ticks and submitting runs.
Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as my_graph.my_op. This is supported in both Dagit and Python APIs.
Dagster Types can now have attached metadata. This allows TableSchema objects to be attached to Dagster Types via TableSchemaMetadata. A Dagster Type with a TableSchema will have the schema rendered in Dagit.
A new Pandera integration (dagster-pandera) allows you to use Pandera’s dataframe validation library to wrap dataframe schemas in Dagster types. This provides two main benefits: (1) Pandera’s rich schema validation can be used for runtime data validation of Pandas dataframes in Dagster ops/assets; (2) Pandera schema information is displayed in Dagit using a new TableSchema API for representing arbitrary table schemas.
The new AssetObservation event enables recording metadata about an asset without indicating that the asset has been updated.
AssetMaterializations, ExpectationResults, and AssetObservations can be logged via the context of an op using the OpExecutionContext.log_event method. Output metadata can also be logged using the OpExecutionContext.add_output_metadata method. Previously, Dagster expected these events to be yielded within the body of an op, which caused lint errors for many users, made it difficult to add mypy types to ops, and also forced usage of the verbose Output API. Here’s an example of the new invocations:
from dagster import op, AssetMaterialization
@opdefthe_op(context):
context.log_event(AssetMaterialization(...))
context.add_output_metadata({"foo":"bar"})...
A new Airbyte integration (dagster-airbyte) allows you to kick off and monitor Airbyte syncs from within Dagster. The original contribution from @airbytehq’s own @marcosmarxm includes a resource implementation as well as a pre-built op for this purpose, and we’ve extended this library to support software-defined asset use cases as well. Regardless of which interface you use, Dagster will automatically capture the Airbyte log output (in the compute logs for the relevant steps) and track the created tables over time (via AssetMaterializations).
The ECSRunLauncher (introduced in Dagster 0.11.15) is no longer considered experimental. You can bootstrap your own Dagster deployment on ECS using our docker compose example or you can use it in conjunction with a managed Dagster Cloud deployment. Since its introduction, we’ve added the ability to customize Fargate container memory and CPU, mount secrets from AWS SecretsManager, and run with a variety of AWS networking configurations. Join us in #dagster-ecs in Slack!
[Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).
The Dagster Daemon now uses the same workspace.yaml file as Dagit to locate your Dagster code. You should ensure that if you make any changes to your workspace.yaml file, they are included in both Dagit’s copy and the Dagster Daemon’s copy. When you make changes to the workspace.yaml file, you don’t need to restart either Dagit or the Dagster Daemon - in Dagit, you can reload the workspace from the Workspace tab, and the Dagster Daemon will periodically check the workspace.yaml file for changes every 60 seconds. If you are using the Dagster Helm chart, no changes are required to include the workspace in the Dagster Daemon.
Dagster’s metadata API has undergone a signficant overhaul. Changes include:
To reflect the fact that metadata can be specified on definitions in addition to events, the following names are changing. The old names are deprecated, and will function as aliases for the new names until 0.15.0:
The metadata_entries keyword argument to events and Dagster types is deprecated. Instead, users should use the metadata keyword argument, which takes a dictionary mapping string labels to MetadataValues.
Arbitrary metadata on In/InputDefinition and Out/OutputDefinition is deprecated. In 0.15.0, metadata passed for these classes will need to be resolvable to MetadataValue (i.e. function like metadata everywhere else in Dagster).
The description attribute of EventMetadataEntry is deprecated.
The static API of EventMetadataEntry (e.g. EventMetadataEntry.text) is deprecated. In 0.15.0, users should avoid constructing EventMetadataEntry objects directly, instead utilizing the metadata dictionary keyword argument, which maps string labels to MetadataValues.
In previous releases, it was possible to supply either an AssetKey, or a function that produced an AssetKey from an OutputContext as the asset_key argument to an Out/OutputDefinition. The latter behavior makes it impossible to gain information about these relationships without running a job, and has been deprecated. However, we still support supplying a static AssetKey as an argument.
We have renamed many of the core APIs that interact with ScheduleStorage, which keeps track of sensor/schedule state and ticks. The old term for the generic schedule/sensor “job” has been replaced by the term “instigator” in order to avoid confusion with the execution API introduced in 0.12.0. If you have implemented your own schedule storage, you may need to change your method signatures appropriately.
Dagit is now powered by Starlette instead of Flask. If you have implemented a custom run coordinator, you may need to make the following change:
Dagit no longer allows non-software-defined asset materializations to be be graphed or grouped by partition. This feature could render in incorrect / incomplete ways because no partition space was defined for the asset.
Dagit’s “Jobs” sidebar now collapses by default on Instance, Job, and Asset pages. To show the left sidebar, click the “hamburger” icon in the upper left.
“Step Execution Time” is no longer graphed on the asset details page in Dagit, which significantly improves page load time. To view this graph, go to the asset graph for the job, uncheck “View as Asset Graph” and click the step to view its details.
The “experimental asset UI” feature flag has been removed from Dagit, this feature is shipped in 0.14.0!
The Dagster Daemon now requires a workspace.yaml file, much like Dagit.
Ellipsis (“...”) is now an invalid substring of a partition key. This is because Dagit accepts an ellipsis to specify partition ranges.
[Helm] The Dagster Helm chart now only supported Kubernetes clusters above version 1.18.
In Dagit, the Asset Catalog now offers a third display mode - a global graph of your software-defined assets.
The Asset Catalog now allows you to filter by repository to see a subset of your assets, and offers a “View in Asset Graph” button for quickly seeing software-defined assets in context.
The Asset page in Dagit has been split into two tabs, “Activity” and “Definition”.
Dagit now displays a warning on the Asset page if the most recent run including the asset’s step key failed without yielding a materialization, making it easier to jump to error logs.
Dagit now gives you the option to view jobs with software-defined assets as an Asset Graph (default) or as an Op Graph, and displays asset \<-> op relationships more prominently when a single op yields multiple assets.
You can now include your assets in a repository with the use of an AssetGroup. Each repository can only have one AssetGroup, and it can provide a jumping off point for creating the jobs you plan on using from your assets.
AssetGroup.build_job supports a selection syntax similar to that found in op selection.
Asset Observations:
You can now yield AssetObservations to log metadata about a particular asset from beyond its materialization site. AssetObservations appear on the asset details page alongside materializations and numerical metadata is graphed. For assets with software-defined partitions, materialized and observed metadata about each partition is rolled up and presented together. For more information, view the docs page here.
Added an asset_observations_for_node method to ExecuteInProcessResult for fetching the AssetObservations from an in-process execution.
Dagster Types with an attached TableSchemaMetadataValue now render the schema in Dagit UI.
[dagster-pandera] New integration library dagster-pandera provides runtime validation from the Pandera dataframe validation library and renders table schema information in Dagit.
OpExecutionContext.log_event provides a way to log AssetMaterializations, ExpectationResults, and AssetObservations from the body of an op without having to yield anything. Likewise, you can use OpExecutionContext.add_output_metadata to attach metadata to an output without having to explicitly use the Output object.
OutputContext.log_event provides a way to log AssetMaterializations from within the handle_output method of an IO manager without yielding. Likewise, output metadata can be added using OutputContext.add_output_metadata.
[dagster-dbt] The load_assets_from_dbt_project function now returns a set of assets that map to a single dbt run command (rather than compiling each dbt model into a separate step). It also supports a new node_info_to_asset_key argument which allows you to customize the asset key that will be used for each dbt node.
[dagster-airbyte] The dagster-airbyte integration now collects the Airbyte log output for each run as compute logs, and generates AssetMaterializations for each table that Airbyte updates or creates.
[dagster-airbyte] The dagster-airbyte integration now supports the creation of software-defined assets, with the build_airbyte_assets function.
[dagster-fivetran] The dagster-fivetran integration now supports the creation of software-defined assets with the build_fivetran_assets function.
The multiprocess executor now supports choosing between spawn or forkserver for how its subprocesses are created. When using forkserver we attempt to intelligently preload modules to reduce the per-op overhead.
[Helm] Labels can now be set on the Dagit and daemon deployments.
[Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).
Fixed an issue where long job names were truncated prematurely in the Jobs page in Dagit.
Fixed an issue where loading the sensor timeline would sometimes load slowly or fail with a timeout error.
Fixed an issue where the first time a run_status_sensor executed, it would sometimes run very slowly or time out.
Fixed an issue where Launchpad mistakenly defaulted with invalid subset error in Dagit.
Multi-component asset keys can now be used in the asset graph filter bar.
Increased the storage query statement timeout to better handle more complex batch queries.
Added fallback support for older versions of sqlite to service top-level repository views in Dagit (e.g. the top-level jobs, schedule, and sensor pages).
Images in the documentation now enlarge when clicked.
New example in examples/bollinger demonstrates dagster-pandera and TableSchema , and software-defined assets in the context of analyzing stock price data.
[dagit] Various performance improvements for asset graph views.
[dagster-aws] The EcsRunLauncher can now override the secrets_tag parameter to None, which will cause it to not look for any secrets to be included in the tasks for the run. This can be useful in situations where the run launcher does not have permissions to query AWS Secretsmanager.
[dagster-mysql] For instances using MySQL for their run storage, runs created using dagster versions 0.13.17 / 0.13.18 might display an incorrect timestamp for its start time on the Runs page. Running the dagster instance migrate CLI command should resolve the issue.
Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as "my_graph.my_op".
The error message raised on failed Dagster type check on an output now includes the description provided on the TypeCheck object.
The dagster asset wipe CLI command now takes a --noprompt option.
Added the new Map config type, used to represent mappings between arbitrary scalar keys and typed values. For more information, see the Map ConfigType docs.
The dagster-daemon process now creates fewer log entries when no actions are taken (for example, if the run queue is empty)
[dagster-k8s] When upgrading the Dagster helm chart, the old dagster-daemon pod will now spin down completely before the new dagster-daemon pod is started.
[dagster-k8s] A flag can now be set in the Dagster helm chart to control whether the Kubernetes Jobs and Pods created by the K8sRunLauncher should fail if the Dagster run fails. To enable this flag, set the `failPodOnRunFailure key to true in the run launcher portion of the Helm chart.
[dagster-dbt] Fixed compatibility issues with dbt 1.0. The schema and data arguments on the DbtCliResource.test function no longer need to be set to False to avoid errors, and the dbt output will be no longer be displayed in json format in the event logs.
Dagster Types can now have metadata entries attached to them.
DagsterGraphQLClient now supports submitting runs with op/solid sub-selections.
[dagit] The Asset Catalog view will now include information from both AssetMaterializations and AssetObservation events for each asset.
[dagit][software-defined-assets] A warning will now be displayed if you attempt to backfill partitions of an asset whose upstream dependencies are missing.
When Dagit fails to load a list of ops, the error message used the legacy term “solids”. Now it uses “ops”.
Runs created using dagster versions 0.13.15 / 0.13.16 / 0.13.17 might display an incorrect timestamp for its start time on the Runs page. This would only happen if you had run a schema migration (using one of those versions) with the dagster instance migrate CLI command. Running the dagster instance reindex command should run a data migration that resolves this issue.
When attempting to invoke run status sensors or run failure sensors, it will now incur an error. Run status/failure sensor invocation is not yet supported.
[dagster-k8s] Fixed a bug in the sanitization of K8s label values with uppercase characters and underscores
[software-defined-assets] The namespace argument of the @asset decorator now accepts a list of strings in addition to a single string.
[memoization] Added a missing space to the error thrown when trying to use memoization without a persistent Dagster instance.
[metadata] Two new metadata types, TableSchemaMetadataEntryData and TableMetadataEntryData allow you to emit metadata representing the schema / contents of a table, to be displayed in Dagit.