[dagster] Sensors can now specify a minimum_interval_seconds argument, which determines the minimum amount of time between sensor evaluations.
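A sensor that evaluates at most once every 30 seconds might look like the following sketch (the pipeline name and the sensor body are hypothetical, not part of the API):

```python
from dagster import RunRequest, sensor

# "my_pipeline" and the sensor body below are hypothetical examples.
@sensor(pipeline_name="my_pipeline", minimum_interval_seconds=30)
def my_sensor(context):
    # Inspect external state here; yield a RunRequest to launch a run.
    yield RunRequest(run_key=None, run_config={})
```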
[dagit] After manually reloading the current repository, users will now be prompted to regenerate preset-based or partition-set based run configs in the Playground view. This helps ensure that the generated run config is up to date when launching new runs. The prompt does not occur when the repository is automatically reloaded.
Bugfixes
Updated the -n/--max_workers default value for the dagster api grpc command to be None. When set to None, the gRPC server will use the default number of workers which is based on the CPU count. If you were previously setting this value to 1, we recommend removing the argument or increasing the number.
Fixed issue loading the schedule tick history graph for new schedules that have not been turned on.
In Dagit, newly launched runs will open in the current tab instead of a new tab.
Dagit bugfixes and improvements, including changes to loading state spinners.
When a user specifies both an intermediate storage and an IO manager for a particular output, we no longer silently ignore the IO manager.
[dagster-azure] Fixed the example for listing files on ADL (thanks @ericct!)
New
[dagstermill] Users can now specify custom tags & descriptions for notebook solids.
[dagster-pagerduty / dagster-slack] Added built-in hook integrations to create pagerduty/slack alerts when solids fail.
[dagit] Added ability to preview runs for upcoming schedule ticks.
Bugfixes
Fixed an issue where run start times and end times were displayed in the wrong timezone in Dagit when using Postgres storage.
Schedules that were unable to execute because a partition could not be found will now display the name of the missing partition on their "Last tick" entry.
Improved timing information display for queued and canceled runs within the Runs table view and on individual Run pages in Dagit.
Improvements to the tick history view for schedules and sensors.
Fixed formatting issues on the Dagit instance configuration page.
Miscellaneous Dagit bugfixes and improvements.
The dagster pipeline launch command will now respect run concurrency limits if they are applied on your instance.
Fixed an issue where re-executing a run created by a sensor would cause the daemon to stop executing any additional runs from that sensor.
Sensor runs with invalid run configuration will no longer create a failed run - instead, an error will appear on the page for the sensor, allowing you to fix the configuration issue.
General dagstermill housekeeping: test refactoring and type annotations, as well as repinning ipykernel to solve #3401.
Documentation
Improved dagster-dbt example.
Added examples to demonstrate experimental features, including Memoized Development and Dynamic Graph.
Added a PR template and guidance on how to pick an issue for first-time contributors.
A native scheduler with support for exactly-once, fault-tolerant, timezone-aware scheduling.
A new Dagster daemon process has been added to manage your schedules and sensors with a
reconciliation loop, ensuring that all runs are executed exactly once, even if the Dagster daemon
experiences occasional failure. See the
Migration Guide for
instructions on moving from SystemCronScheduler or K8sScheduler to the new scheduler.
First-class sensors, built on the new Dagster daemon, allow you to instigate runs based on
changes in external state - for example, files on S3 or assets materialized by other Dagster
pipelines. See the Sensors Overview
for more information.
Dagster now supports pipeline run queueing. You can apply instance-level run concurrency
limits and prioritization rules by adding the QueuedRunCoordinator to your Dagster instance. See
the Run Concurrency Overview
for more information.
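An illustrative instance configuration enabling the QueuedRunCoordinator (the concurrency limits shown are hypothetical values, not defaults):

```yaml
# dagster.yaml (instance configuration)
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 25
    tag_concurrency_limits:
      - key: "database"
        value: "redshift"
        limit: 4
```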
The IOManager abstraction provides a new, streamlined primitive for granular control over where
and how solid outputs are stored and loaded. This is intended to replace the (deprecated)
intermediate/system storage abstractions. See the
IO Manager Overview for more
information.
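As a sketch of the abstraction, an IO manager that pickles outputs to local files might look like this (the class name, storage paths, and pickling scheme are hypothetical illustrations, not part of the API):

```python
import os
import pickle

from dagster import IOManager, io_manager

class LocalPickleIOManager(IOManager):
    """Hypothetical example: persists each solid output as a pickle file."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def handle_output(self, context, obj):
        # Called when a solid produces an output: decide where/how to store it.
        path = os.path.join(self.base_dir, context.step_key, context.name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        # Called when a downstream solid consumes the upstream output.
        upstream = context.upstream_output
        path = os.path.join(self.base_dir, upstream.step_key, upstream.name)
        with open(path, "rb") as f:
            return pickle.load(f)

@io_manager
def local_pickle_io_manager(init_context):
    return LocalPickleIOManager(base_dir="/tmp/dagster_outputs")
```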
A new Partitions page in Dagit lets you view your pipeline runs organized by partition.
You can also launch backfills from Dagit and monitor them from this page.
A new Instance Status page in Dagit lets you monitor the health of your Dagster instance,
with repository location information, daemon statuses, instance-level schedule and sensor
information, and linkable instance configuration.
Resources can now declare their dependencies on other resources via the
required_resource_keys parameter on @resource.
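A minimal sketch of a resource depending on another resource (both resource bodies are hypothetical):

```python
from dagster import resource

@resource
def credentials(init_context):
    # Hypothetical resource providing connection credentials.
    return {"user": "example", "password": "example"}

# Declaring the dependency makes `credentials` available on init_context.resources.
@resource(required_resource_keys={"credentials"})
def database_client(init_context):
    creds = init_context.resources.credentials
    return {"client_for": creds["user"]}  # stand-in for a real client object
```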
Our support for deploying on Kubernetes is now mature and battle-tested. Our Helm chart is
now easier to configure and deploy, and we’ve made big investments in observability and
reliability. You can view Kubernetes interactions in the structured event log and use Dagit to
help you understand what’s happening in your deployment. The defaults in the Helm chart will
give you graceful degradation and failure recovery right out of the box.
Experimental support for dynamic orchestration with the new DynamicOutputDefinition API.
Dagster can now map the downstream dependencies over a dynamic output at runtime.
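A sketch of the experimental API (the solids and the file list are hypothetical; the API may change while experimental):

```python
from dagster import pipeline, solid
from dagster.experimental import DynamicOutput, DynamicOutputDefinition

@solid(output_defs=[DynamicOutputDefinition()])
def list_files(_):
    # The file list is a hypothetical stand-in for values discovered at runtime.
    for filename in ["a.txt", "b.txt"]:
        yield DynamicOutput(value=filename, mapping_key=filename.replace(".", "_"))

@solid
def process_file(_, filename):
    return len(filename)

@pipeline
def dynamic_pipeline():
    # `map` fans process_file out over each dynamic output at runtime.
    list_files().map(process_file)
```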
We’ve dropped support for Python 2.7, based on community usage and enthusiasm for Python 3-native
public APIs.
Removal of deprecated APIs
These APIs were marked for deprecation with warnings in the 0.9.0 release, and have been removed in
the 0.10.0 release.
The decorator input_hydration_config has been removed. Use the dagster_type_loader decorator
instead.
The decorator output_materialization_config has been removed. Use dagster_type_materializer
instead.
The system storage subsystem has been removed. This includes SystemStorageDefinition,
@system_storage, and default_system_storage_defs. Use the new IOManagers API instead. See
the IO Manager Overview for more
information.
The config_field argument on decorators and definitions classes has been removed and replaced
with config_schema. This is a drop-in rename.
The argument step_keys_to_execute to the functions reexecute_pipeline and
reexecute_pipeline_iterator has been removed. Use the step_selection argument to select
subsets for execution instead.
Repositories can no longer be loaded using the legacy repository key in your workspace.yaml;
use load_from instead. See the
Workspaces Overview for
documentation about how to define a workspace.
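An illustrative workspace.yaml using load_from (the file and module names are hypothetical):

```yaml
# workspace.yaml: the legacy `repository:` key is replaced by `load_from:`.
load_from:
  - python_file: repos.py
  - python_module: my_package.repos
```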
Breaking API Changes
SolidExecutionResult.compute_output_event_dict has been renamed to
SolidExecutionResult.compute_output_events_dict. A solid execution result is returned from
methods such as result_for_solid. Any call sites will need to be updated.
The .compute suffix is no longer applied to step keys. Step keys that were previously named
my_solid.compute will now be named my_solid. If you are using any API method that takes a
step_selection argument, you will need to update the step keys accordingly.
The pipeline_def property has been removed from the InitResourceContext passed to functions
decorated with @resource.
Dagstermill
If you are using define_dagstermill_solid with the output_notebook parameter set to True,
you will now need to provide a file manager resource (subclass of
dagster.core.storage.FileManager) on your pipeline mode under the resource key "file_manager",
e.g.:
from dagster import ModeDefinition, local_file_manager, pipeline
from dagstermill import define_dagstermill_solid

my_dagstermill_solid = define_dagstermill_solid(
    "my_dagstermill_solid", output_notebook=True, ...
)

@pipeline(mode_defs=[ModeDefinition(resource_defs={"file_manager": local_file_manager})])
def my_dagstermill_pipeline():
    my_dagstermill_solid(...)
Helm Chart
The schema for the scheduler values in the Helm chart has changed. Instead of a simple toggle
on/off, we now require an explicit scheduler.type to specify usage of the
DagsterDaemonScheduler, K8sScheduler, or otherwise. If your specified scheduler.type has
required config, these fields must be specified under scheduler.config.
snake_case fields have been changed to camelCase. Please update your values.yaml as follows:
pipeline_run → pipelineRun
dagster_home → dagsterHome
env_secrets → envSecrets
env_config_maps → envConfigMaps
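An illustrative values.yaml fragment using the new camelCase names (the specific values, and the exact nesting of fields, are hypothetical; consult your chart version's values schema):

```yaml
dagsterHome: /opt/dagster/dagster_home
pipelineRun:
  image:
    repository: my-org/my-user-code-image
    tag: latest
  envSecrets:
    - name: my-secret
  envConfigMaps:
    - name: my-config-map
```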
The Helm values celery and k8sRunLauncher have now been consolidated under the Helm value
runLauncher for simplicity. Use the field runLauncher.type to specify usage of the
K8sRunLauncher, CeleryK8sRunLauncher, or otherwise. By default, the K8sRunLauncher is
enabled.
All Celery message brokers (i.e. RabbitMQ and Redis) are disabled by default. If you are using
the CeleryK8sRunLauncher, you should explicitly enable your message broker of choice.
Event log messages streamed to stdout and stderr have been streamlined to be a single line
per event.
Experimental support for memoization and versioning lets you execute pipelines incrementally,
selecting which solids need to be rerun based on runtime criteria and versioning their outputs
with configurable identifiers that capture their upstream dependencies.
To set up memoized step selection, users can provide a MemoizableIOManager, whose has_output
function decides whether a given solid output needs to be computed or already exists. To execute
a pipeline with memoized step selection, users can supply the dagster/is_memoized_run run tag
to execute_pipeline.
To set the version on a solid or resource, users can supply the version field on the definition.
To access the derived version for a step output, users can access the version field on the
OutputContext passed to the handle_output and load_input methods of IOManager and the
has_output method of MemoizableIOManager.
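Conceptually, the derived version of a step output changes whenever the solid's own version or any upstream version changes. A stand-alone sketch of that idea (this is not the Dagster implementation; the hashing scheme here is a made-up illustration):

```python
import hashlib

def derive_version(solid_version, upstream_versions):
    """Combine a solid's declared version with its upstream output versions.

    Any change to the solid's version or to any upstream version yields a
    different derived version, invalidating memoized results downstream.
    The hash is order-independent over upstream versions.
    """
    h = hashlib.sha256()
    h.update(solid_version.encode())
    for v in sorted(upstream_versions):
        h.update(v.encode())
    return h.hexdigest()

# Bumping the solid's version changes the derived version:
unchanged = derive_version("2", ["abc"])
bumped = derive_version("3", ["abc"])
```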
Schedules that are executed using the new DagsterDaemonScheduler can now execute in any
timezone by adding an execution_timezone parameter to the schedule. Daylight Saving Time
transitions are also supported. See the
Schedules Overview for
more information and examples.
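For example, a daily schedule pinned to US Central time might look like this sketch (the pipeline name, start date, and run config shape are hypothetical):

```python
from datetime import datetime

from dagster import daily_schedule

@daily_schedule(
    pipeline_name="my_pipeline",
    start_date=datetime(2021, 1, 1),
    execution_timezone="US/Central",
)
def my_central_time_schedule(date):
    # Returns run config for the scheduled date; shape is illustrative.
    return {"solids": {"my_solid": {"config": {"date": date.strftime("%Y-%m-%d")}}}}
```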
Countdown and refresh buttons have been added for pages with regular polling queries (e.g. Runs,
Schedules).
Confirmation and progress dialogs are now presented when performing run terminations and
deletions. Additionally, hanging/orphaned runs can now be forced to terminate by selecting
"Force termination immediately" in the run termination dialog.
The Runs page now shows counts for "Queued" and "In progress" tabs, and individual run pages
show timing, tags, and configuration metadata.
The backfill experience has been improved: you can now view progress and terminate an entire
backfill from the partition set page. Additionally, errors related to backfills are now surfaced
more clearly.
Shortcut hints are no longer displayed when attempting to use the screen capture command.
The asset page has been revamped to include a table of events and enable organizing events by
partition. Asset key escaping issues in other views have been fixed as well.
Miscellaneous bug fixes, frontend performance tweaks, and other improvements are also included.
Added a new dagster-docker library with a DockerRunLauncher that launches each run in its own
Docker container. (See Deploying with Docker docs
for an example.)
Added support for AWS Athena. (Thanks @jmsanders!)
Added mocks for AWS S3, Athena, and Cloudwatch in tests. (Thanks @jmsanders!)
Allow setting of S3 endpoint through env variables. (Thanks @marksteve!)
Various bug fixes and new features for the Azure, Databricks, and Dask integrations.
Added a create_databricks_job_solid for creating solids that launch Databricks jobs.