Changelog#

1.3.3 (core) / 0.19.3 (libraries)#

New#

load_assets_from_package_module and the other core load_assets_from_ methods now accept a source_key_prefix argument, which allows applying a key prefix to all the source assets that are loaded.
OpExecutionContext now has an asset_partitions_time_window_for_input method.
RunFailureSensorContext now has a get_step_failure_events method.

The Pythonic resource system now supports a set of lifecycle hooks which can be used to manage setup and teardown:


class MyAPIClientResource(ConfigurableResource):
    api_key: str
    _internal_client: MyAPIClient = PrivateAttr()

    def setup_for_execution(self, context):
        self._internal_client = MyAPIClient(self.api_key)

    def get_all_items(self):
        return self._internal_client.items.get()

Added support for specifying input and output config on ConfigurableIOManager.
QueuedRunCoordinator and SubmitRunContext are now exposed as public dagster exports.
[ui] Downstream cross-location dependencies of all source assets are now visible on the asset graph. Previously these dependencies were only displayed if the source asset was defined as a regular asset.
[ui] A new filtering experience is available on the Runs page after enabling feature flag “Experimental Runs table view with filtering”.
[dagster-aws] Allow the S3 compute log manager to specify a show_url_only: true config option, which will display a URL to the S3 file in dagit, instead of the contents of the log file.
[dagster-aws] PickledObjectS3IOManager now fully supports loading partitioned inputs.
[dagster-azure] PickedObjectADLS2IOManager now fully supports loading partitioned inputs.
[dagster-gcp] New GCSResource and ConfigurablePickledObjectGCSIOManager follow the Pythonic resource system. The existing gcs_resource and gcs_pickle_io_manager remain supported.
[dagster-gcp] New BigQueryResource follows the Pythonic resource system. The existing bigquery_resource remains supported.
[dagster-gcp] PickledObjectGCSIOManager now fully supports loading partitioned inputs.
[dagster-postgres] The event watching implementation has been moved from listen/notify based to the polling watcher used by MySQL and SQLite.
[dagster-slack] Add monitor_all_repositories to make_slack_on_run_failure_sensor, thanks @danielgafni!
[dagster-snowflake] New SnowflakeResource follows the Pythonic resource system. The existing snowflake_resource remains supported.

Bugfixes#

Multi-asset sensor context methods for partitions now work when partitioned source assets are targeted.
Previously, the asset backfill page would incorrectly display negative counts for assets with upstream failures. This has been fixed.
In cases where there is an asset which is upstream of one asset produced by a subsettable multi-asset, but downstream of another, Dagster will automatically subset the multi-asset to resolve the underlying cycle in the op dependency graph. In some cases, this process could omit some of the op dependencies, resulting in incorrect execution order. This has been fixed.
Fixed an issue with AssetMetadataValue.value that would cause an infinite recursion error.
Fixed an issue where observable source assets would show up in the asset graph of jobs that did not include them.
Fixed an issue with directly invoking an op or asset with a Pythonic config object with a discriminated union did not work properly.
Fixes a bug where sensors attached to jobs that rely on resources from Definitions were not provided with the required resource definition.

Dagster Cloud#

volumes and volumeMounts values have been added to the agent helm chart.

Experimental#

[dagster-airbyte] load_assets_from_airbyte_instance and load_assets_from_airbyte_project now take a connection_to_auto_materialize_policy_fn for setting AutoMaterializePolicys on Airbyte assets
[dagster-airbyte] Introduced experimental support for Airbyte Cloud. See the using Dagster with Airbyte Cloud docs for more information.

Documentation#

Ever wanted to know more about the files in Dagster projects, including where to put them in your project? Check out the new Dagster project files reference for more info!
We’ve made some improvements to the sidenav / information architecture of our docs!
- The Guides section now contains several new categories, including Working with data assets and Working with tasks
- The Community section is now under About
The Backfills concepts page now includes instructions on how to launch backfills that target ranges of partitions in a single run.

1.3.2 (core) / 0.19.2 (libraries)#

New#

Added performance improvements for yielding time-partitioned run requests.
The asset backfill page now displays targeted assets in topological order.
Replicas can now be specified on Hybrid ECS and K8s agents. In ECS, use the NumReplicas parameter on the agent template in CloudFormation, or the dagsterCloudAgent.replicas field in Helm.
Zero-downtime agent updates can now be configured for the ECS agent. Just set the enableZeroDowntimeDeploys parameter to true in the CloudFormation stack for your agent.
The AssetsDefinition.from_graph, as well as the@graph_asset and @graph_multi_asset decorators now support specifying AutoMaterializePolicys.
[dagstermill] Pythonic resource variant of the dagstermill I/O manager is now available.
[dagster-duckdb] New DuckDBResource for connecting to and querying DuckDB databases.
[ui] Sensor / Schedule overview pages now allow you to select and start/stop multiple sensors/schedules at once.
[ui] Performance improvements to global search for big workspaces.

Bugfixes#

async def ops/assets no longer prematurely finalize async generators during execution.
In some cases, the AutoMaterialize Daemon (and the build_asset_reconciliation_sensor) could incorrectly launch new runs for partitions that already had an in-progress run. This has been fixed.

Breaking Changes#

Yielding run requests for experimental dynamic partitions via run_request_for_partition now throws an error. Instead, users should yield directly instantiated run requests via RunRequest(partition_key=...).
graph_asset and graph_multi_asset now support specifying resource_defs directly (thanks @kmontag42)!

Community Contributions#

A new node_info_to_auto_materialize_policy_fn param added to load_assets_from_dbt_* functions. (thanks @askvinni)!
Added partition_key field to RunStatusSensorContext (thanks @pdstrnadJC)!

Experimental#

For multi-partitioned assets with a time dimension, the auto-materialize policy now only kicks off materializations for the latest time partition window. Previously, all partitions would be targeted.
Added performance improvements to the multi-asset sensor context’s latest_materialization_records_by_key method.
The GraphQL API for launching a backfill no longer errors when the backfill targets assets instead of a job and the allPartitions argument is provided.

Documentation#

Fixed a few typos in various guides.
Fixed a formatting issue in the Automating pipelines guide that was causing a 404.

1.3.1 (core) / 0.19.1 (libraries)#

New#

Performance improvements when evaluating time-partitioned run requests within sensors and schedules.
[ui] Performance improvements when loading the asset catalog and launchpad for deployments with many time-partitioned assets.

Bugfixes#

Fixed an issue where loading a Definitions object that included sensors attached to multiple jobs would raise an error.
Fixed a bug in which Pythonic resources would produce underlying resource values that would fail reference equality checks. This would lead to a conflicting resource version error when using the same Pythonic resource in multiple places.

1.3.0 (core) / 0.19.0 (libraries) "Smooth Operator"#

Major Changes since 1.2.0 (core) / 0.18.0 (libraries)#

Core#

Auto-materialize policies replace the asset reconciliation sensor - We significantly renovated the APIs used for specifying which assets are scheduled declaratively. Compared to build_asset_reconciliation_sensors , AutoMaterializePolicy works across code locations, as well as allow you to customize the conditions under which each asset is auto-materialized. [docs]
Asset backfill page - A new page in the UI for monitoring asset backfills shows the progress of each asset in the backfill.
Clearer labels for tracking changes to data and code - Instead of the opaque “stale” indicator, Dagster’s UI now indicates whether code, upstream data, or dependencies have changed. When assets are in violation of their FreshnessPolicys, Dagster’s UI now marks them as “overdue” instead of “late”.
Auto-materialization and observable source assets - Assets downstream of an observable source asset now use the source asset observations to determine whether upstream data has changed and assets need to be materialized.
Pythonic Config and Resources - The set of APIs introduced in 1.2 is no longer experimental [community memo]. Examples, integrations, and documentation have largely ported to the new APIs. Existing resources and config APIs will continue to be supported for the foreseeable future. Check out migration guide to learn how to incrementally adopt the new APIs.

Docs#

Improved run concurrency docs - You asked (in support), and we answered! This new guide is a one-stop-shop for understanding and implementing run concurrency, whether you’re on Dagster Cloud or deploying to your own infrastructure.
Additions to the Intro to Assets tutorial - We’ve added two new sections to the assets tutorial, focused on scheduling and I/O. While we’re close to wrapping things up for the tutorial revamp, we still have a few topics to cover - stay tuned!
New guide about building machine learning pipelines - Many of our users learn best by example - this guide is one way we’re expanding our library of examples. In this guide, we walk you through building a simple machine learning pipeline using Dagster.
Re-organized Dagster Cloud docs - We overhauled how the Dagster Cloud docs are organized, bringing them more in line with the UI.

Since 1.2.7 (core) / 0.18.7 (libraries)#

New#

Long-running runs can now be terminated after going over a set runtime. See the run termination docs to learn more.
Adds a performance improvement to partition status caching for multi-partitioned assets containing a time dimension.
[ui] Asset groups are now included in global search.
[ui] Assets in the asset catalog have richer status information that matches what is displayed on the asset graph.
[dagster-aws] New AthenaClientResource, ECRPublicResource, RedshiftClientResource, S3Resource, S3FileManagerResource, ConfigurablePickledObjectS3IOManager, SecretsManagerResource follow Pythonic resource system. The existing APIs remain supported.
[dagster-datadog] New DatadogResource follows Pythonic resource system. The existing datadog_resource remains supported.
[dagster-ge] New GEContextResource follows Pythonic resource system. The existing ge_context_resource remains supported.
[dagster-github] New GithubResource follows Pythonic resource system. The existing github_resource remains supported.
[dagster-msteams] New MSTeamsResource follows Pythonic resource system. The existing msteams_resource remains supported.
[dagster-slack] New SlackResource follows Pythonic resource system. The existing slack_resource remains supported.

Bugfixes#

Fixed an issue where using pdb.set_trace no longer worked when running Dagster locally using dagster dev or dagit.
Fixed a regression where passing custom metadata on @asset or Out caused an error to be thrown.
Fixed a regression where certain states of the asset graph would cause GQL errors.
[ui] Fixed a bug where assets downstream of source assets would sometimes incorrectly display a “New data” (previously “stale”) tag for assets with materializations generated from ops (as opposed to SDA materializations).
[ui] Fixed a bug where URLs for code locations named pipelines or jobs could lead to blank pages.
[ui] When configuring a partition-mapped asset backfill, helpful context no longer appears nested within the “warnings” section
[ui] For observable source assets,the asset sidebar now shows a “latest observation” instead of a “latest materialization”

Breaking Changes#

By default, resources defined on Definitions are now automatically bound to jobs. This will only result in a change in behavior if you a) have a job with no "io_manager" defined in its resource_defs and b) have supplied an IOManager with key "io_manager" to the resource_defs argument of your Definitions. Prior to 1.3.0, this would result in the job using the default filesystem-based IOManager for the key "io_manager". In 1.3.0, this will result in the "io_manager" supplied to your Definitions being used instead. The BindResourcesToJobs wrapper, introduced in 1.2 to simulate this behavior, no longer has any effect.
[dagster-celery-k8s] The default kubernetes namespace for run pods when using the Dagster Helm chart with the CeleryK8sRunLauncher is now the same namespace as the Helm chart, instead of the default namespace. To restore the previous behavior, you can set the celeryK8sRunLauncher.jobNamespace field to the string default.
[dagster-snowflake-pandas] Due to a longstanding issue storing Pandas Timestamps in Snowflake tables, the SnowflakePandasIOManager has historically converted all timestamp data to strings before storing it in Snowflake. Now, it will instead ensure that timestamp data has a timezone, and if not, attach the UTC timezone. This allows the timestamp data to be stored as timestamps in Snowflake. If you have been storing timestamp data using the SnowflakePandasIOManager you can set the store_timestamps_as_strings=True configuration to continue storing timestamps as strings. For more information, and instructions for migrating Snowflake tables to use timestamp types, see the Migration Guide.

Changes to experimental APIs

Pythonic Resources and Config
- Enabled passing RunConfig to many APIs which previously would only accept a config dictionary.
- Enabled passing raw Python objects as resources to many APIs which previously would only accept ResourceDefinition.
- Added the ability to pass execution config when constructing a RunConfig object.
- Introduced more clear error messages when trying to mutate state on a Pythonic config or resource object.
- Improved direct invocation experience for assets, ops, schedules and sensors using Pythonic config and resources. Config and resources can now be passed directly as args or kwargs.
The minutes_late and previous_minutes_late properties on the experimental FreshnesPolicySensorContext have been renamed to minutes_overdue and previous_minutes_overdue, respectively.

Removal of deprecated APIs

[previously deprecated, 0.15.0] metadata_entries arguments to event constructors have been removed. While MetadataEntry still exists and will only be removed in 2.0, it is no longer passable to any Dagster public API — users should always pass a dictionary of metadata values instead.

Experimental#

Adds a performance improvement to the multi-asset sensor context’s latest_materialization_records_by_key function.

Documentation#

The Google BigQuery tutorial and reference pages have been updated to use the new BigQueryPandasIOManager and BigQueryPySparkIOManager.
The Snowflake tutorial and reference pages have been updated to use the new SnowflakePandasIOManager and SnowflakePySparkIOManager.

Dagster Cloud#

Previously, when deprovisioning an agent, code location servers were cleaned up in serial. Now, they’re cleaned up in parallel.

1.2.7 (core) / 0.18.7 (libraries)#

New#

Resource access (via both required_resource_keys and Pythonic resources) are now supported in observable source assets.
[ui] The asset graph now shows how many partitions of each asset are currently materializing, and blue bands appear on the partition health bar.
[ui] Added a new page to monitor an asset backfill.
[ui] Performance improvement for Runs page for runs that materialize large numbers of assets.
[ui] Performance improvements for Run timeline and left navigation for users with large numbers of jobs or assets.
[ui] In the run timeline, consolidate “Ad hoc materializations” rows into a single row.
[dagster-aws] The EcsRunLauncher now allows you to customize volumes and mount points for the launched ECS task. See the API docs for more information.
[dagster-duckdb, dagster-duckdb-pandas, dagster-duckdb-pyspark] New DuckDBPandasIOManager and DuckDBPySparkIOManager follow Pythonic resource system. The existing duckdb_pandas_io_manager and duckdb_pyspark_io_manager remain supported.
[dagster-gcp, dagster-gcp-pandas, dagster-gcp-pyspark] New BigQueryPandasIOManager and BigQueryPySparkIOManager follow Pythonic resource system. The existing bigquery_pandas_io_manager and bigquery_pyspark_io_manager remain supported.
[dagster-gcp] The BigQuery resource now accepts authentication credentials as configuration. If you pass GCP authentication credentials to gcp_crentials , a temporary file to store the credentials will be created and the GOOGLE_APPLICATION_CREDENTIALS environment variable will be set to the temporary file. When the BigQuery resource is garbage collected, the environment variable will be unset and the temporary file deleted.
[dagster-snowflake, dagster-snowflake-pandas, dagster-snowflake-pyspark] New SnowflakePandasIOManager and SnowflakePySparkIOManager follow Pythonic resource system. The existing snowflake_pandas_io_manager and snowflake_pyspark_io_manager remain supported.

Bugfixes#

Fixed an issue where dagster dev would periodically emit a harmless but annoying warning every few minutes about a gRPC server being shut down.
Fixed a schedule evaluation error that occurred when schedules returned a RunRequest(partition_key=...) object.
Fixed a bug that caused errors in the asset reconciliation sensor when the event log includes asset materializations with partitions that aren’t part of the asset’s PartitionsDefinition.
Fixed a bug that caused errors in the asset reconciliation sensor when a partitioned asset is removed.
Fixed an issue where run_request_for_partition would incorrectly raise an error for a job with a DynamicPartitionsDefinition that was defined with a function.
Fixed an issue where defining a partitioned job with unpartitioned assets via define_asset_job would raise an error.
Fixed a bug where source asset observations could not be launched from dagit when the asset graph contained partitioned assets.
Fixed a bug that caused __ASSET_JOB has no op named ... errors when using automatic run retries.
[ui] The asset partition health bar now correctly renders partial failed partitions of multi-dimensional assets in a striped red color.
[ui] Fixed an issue where steps that were skipped due to an upstream dependency failure were incorrectly listed as “Preparing” in the right-hand column of the runs timeline.
[ui] Fixed markdown base64 image embeds.
[ui] Guard against localStorage quota errors when storing launchpad config tabs.
[dagster-aws] Fixed an issue where the EcsRunLauncher would fail to launch runs if the use_current_ecs_task_config field was set to False but no task_definition field was set.
[dagster-k8s] Fixed an issue introduced in 1.2.6 where older versions of the kubernetes Python package were unable to import the package.

Community Contributions#

The EcsRunLauncher now allows you to set a capacity provider strategy and customize the ephemeral storage used for launched ECS tasks. See the docs for details. Thanks AranVinkItility!
Fixed an issue where freshness policies were not being correctly applied to assets with key prefixes defined via AssetsDefinition.from_op. Thanks @tghanken for the fix!
Added the minimum_interval_seconds parameter to enable customizing the evaluation interval on the slack run failure sensor, thanks @ldnicolasmay!
Fixed a docs example and updated references, thanks @NicolasPA!

Experimental#

The Resource annotation for Pythonic resource inputs has been renamed to ResourceParam in preparation for the release of the feature in 1.3.
When invoking ops and assets that request resources via parameters directly, resources can now be specified as arguments.
Improved various error messages related to Pythonic config and resources.
If the Resources Dagit feature flag is enabled, they will now show up in the overview page and search.

Documentation#

Learn how to limit concurrency in your data pipelines with our new guide!
Need some help managing a run queue? Check out the new customizing run queue priority guide.
New tutorial section that adds I/O managers to the tutorial project.

You are viewing an unreleased or outdated version of the documentation

Changelog#

1.3.3 (core) / 0.19.3 (libraries)#

New#

Bugfixes#

Dagster Cloud#

Experimental#

Documentation#

1.3.2 (core) / 0.19.2 (libraries)#

New#

Bugfixes#

Breaking Changes#

Community Contributions#

Experimental#

Documentation#

1.3.1 (core) / 0.19.1 (libraries)#

New#

Bugfixes#

1.3.0 (core) / 0.19.0 (libraries) "Smooth Operator"#

Major Changes since 1.2.0 (core) / 0.18.0 (libraries)#

Core#

Docs#

Since 1.2.7 (core) / 0.18.7 (libraries)#

New#

Bugfixes#

Breaking Changes#

Experimental#

Documentation#

Dagster Cloud#

1.2.7 (core) / 0.18.7 (libraries)#

New#

Bugfixes#

Community Contributions#

Experimental#

Documentation#