default_value in Field no longer accepts native instances of Python enums. Instead,
the underlying string representation in the config system must be used.
default_value in Field no longer accepts callables.
The dagster_aws imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
modules.
The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
now use any boto3 S3 API directly on an S3 resource, e.g.
context.resources.s3.list_objects_v2. (#2292)
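As a rough illustration of the enum change, the sketch below keeps a default as a config string and resolves it to a Python enum member only when it is used. Color, CONFIG_TO_PYTHON, and resolve are hypothetical names, not Dagster APIs, and the assumption that the config string equals the member name will not hold for every config enum.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    BLUE = 2


# Hypothetical mapping from config strings to Python enum members.
# Assumption: the config string is the member name.
CONFIG_TO_PYTHON = {member.name: member for member in Color}

# The default is written as the config string, not as Color.RED.
default_value = "RED"


def resolve(config_value):
    """Translate a config string into the corresponding enum member."""
    return CONFIG_TO_PYTHON[config_value]
```

Calling resolve(default_value) yields the Color.RED member, while the default itself stays a plain string.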
New
New Playground view in dagit showing an interactive config map
Improved storage and UI for showing schedule attempts
Added the ability to set default values in InputDefinition
Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
Added ability to specify pipeline run tags using the CLI
Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid
as context.pdb
Added PresetDefinition.with_additional_config to allow for config overrides
Added resource name to log messages generated during resource initialization
Added grouping tags for runs that have been retried / reexecuted.
Bugfix
Fixed a bug where date range partitions with a specified end date were clipping the last day
Fixed an issue where some schedule attempts that failed to start would be marked running forever.
Fixed the @weekly partitioned schedule decorator
Fixed timezone inconsistencies between the runs view and the schedules view
Integers are now accepted as valid values for Float config fields
Fixed an issue when executing dagstermill solids with config that contained quote characters.
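The date range partition fix above concerns an off-by-one at the end of the range. A minimal sketch of the intended arithmetic, assuming the end date is meant to be included, might look like the following. daily_partitions is a hypothetical helper and not the Dagster implementation.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Return every date from start through end, including end itself."""
    days = (end - start).days
    # range(days + 1) is what keeps the last day; range(days) would clip it.
    return [start + timedelta(days=offset) for offset in range(days + 1)]


partitions = daily_partitions(date(2020, 1, 1), date(2020, 1, 3))
```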
dagstermill
The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel
flag.
dagster-dbt
dbt_solid now has a Nothing input to allow for sequencing
dagster-k8s
Added get_celery_engine_config to select the Celery engine, leveraging Celery infrastructure
Documentation
Improvements to the airline and bay bikes demos
Improvements to our dask deployment docs (Thanks jswaney!!)
Added the IntSource type, which lets integers be set from environment variables in config.
You may now set tags on pipeline definitions. These tags are applied in the following cases:
Loading in the playground view in Dagit will pre-populate the tag container.
Loading partition sets from the preset/config picker will pre-populate the tag container with
the union of pipeline tags and partition tags, with partition tags taking precedence.
Executing from the CLI will generate runs with the pipeline tags.
Executing programmatically using the execute_pipeline api will create a run with the union
of pipeline tags and RunConfig tags, with RunConfig tags taking precedence.
Scheduled runs (both launched and executed) will have the union of pipeline tags and the
tags returned by the schedule tags function, with the schedule tags taking precedence.
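The precedence rules in the cases above all follow the same pattern: take the union of two tag sets and let the more specific one win on conflicting keys. A small sketch of that pattern with plain dictionaries follows. merge_tags and the sample tags are hypothetical and only illustrate the rule, not how Dagster implements it.

```python
def merge_tags(pipeline_tags, override_tags):
    """Union two tag dicts; keys in override_tags win on conflict."""
    merged = dict(pipeline_tags)  # copy so the pipeline tags are not mutated
    merged.update(override_tags)
    return merged


pipeline_tags = {"team": "data", "priority": "low"}
run_tags = {"priority": "high", "owner": "alice"}

run = merge_tags(pipeline_tags, run_tags)
```

Here run keeps "team" from the pipeline, gains "owner" from the run, and takes "priority" from the run tags because they take precedence.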
Output materialization configs may now yield multiple Materializations, and the tutorial has
been updated to reflect this.
We now export the SolidExecutionContext in the public API so that users can correctly type hint
solid compute functions.
Dagit
Pipeline run tags are now preserved when resuming/retrying from Dagit.
Scheduled run stats are now grouped by partition.
A "preparing" section has been added to the execution viewer. This shows steps that are in
the process of starting execution.
Markers emitted by the underlying execution engines are now visualized in the Dagit execution
timeline.
Bugfix
Resume/retry now works as expected in the presence of solids that yield optional outputs.
Fixed an issue where dagster-celery workers were failing to start in the presence of config
values that were None.
Fixed an issue with attempting to set threads_per_worker on Dask distributed clusters.
dagster-postgres
All postgres config may now be set using environment variables in config.
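A minimal sketch of how an environment-backed config value might be resolved, assuming the value is given either literally or as a mapping with an env key that names the variable. resolve_source and the variable names are hypothetical and are not part of Dagster.

```python
import os


def resolve_source(value, coerce=str, environ=None):
    """Resolve a config value that is either a literal or {"env": NAME}."""
    environ = os.environ if environ is None else environ
    if isinstance(value, dict):
        name = value["env"]
        if name not in environ:
            raise KeyError("environment variable {} is not set".format(name))
        return coerce(environ[name])
    return coerce(value)


# A fake environment keeps the example independent of the real one.
fake_env = {"PG_PASSWORD": "s3cret", "PG_PORT": "5433"}

password = resolve_source({"env": "PG_PASSWORD"}, environ=fake_env)
port = resolve_source({"env": "PG_PORT"}, coerce=int, environ=fake_env)
```

Passing coerce=int mirrors the idea of an integer-typed source, since environment variables are always strings.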
dagster-aws
The s3_resource now exposes a list_objects_v2 method corresponding to the underlying boto3
API. (Thanks, @basilvetas!)
Added the redshift_resource to access Redshift databases.
dagster-k8s
The K8sRunLauncher config now includes the load_kubeconfig and kubeconfig_file options.
Documentation
Fixes and improvements.
Dependencies
dagster-airflow no longer pins its werkzeug dependency.
Community
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.
We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.
If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:
telemetry:
  enabled: true
Thanks to @basilvetas and @hspak for their contributions!
It is now possible to use Postgres to back schedule storage by configuring
dagster_postgres.PostgresScheduleStorage on the instance.
Added the execute_pipeline_with_mode API to allow executing a pipeline in test with a specific
mode without having to specify RunConfig.
Experimental support for retries in the Celery executor.
It is now possible to set run-level priorities for backfills run using the Celery executor by
passing --celery-base-priority to dagster pipeline backfill.
Added the @weekly schedule decorator.
Deprecations
The dagster-ge library has been removed from this release due to drift from the underlying
Great Expectations implementation.
dagster-pandas
PandasColumn now includes an is_optional flag, replacing the previous
ColumnExistsConstraint.
You can now pass the ignore_missing_values flag to PandasColumn in order to apply column
constraints only to the non-missing rows in a column.
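To make the ignore_missing_values idea concrete, the sketch below applies a constraint only to the non-missing entries of a column. It uses plain Python lists rather than pandas, and check_column, is_missing, and is_positive are hypothetical helpers, not the dagster-pandas implementation.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_positive(value):
    """Example constraint: the value must be a number greater than zero."""
    return value is not None and value > 0


def check_column(values, constraint, ignore_missing_values=False):
    """Return True if every checked value satisfies the constraint."""
    if ignore_missing_values:
        values = [v for v in values if not is_missing(v)]
    return all(constraint(v) for v in values)


column = [1.0, float("nan"), 3.0, None]
strict = check_column(column, is_positive)
lenient = check_column(column, is_positive, ignore_missing_values=True)
```

With the flag off, the missing entries fail the constraint; with it on, only 1.0 and 3.0 are checked.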
dagster-k8s
The Helm chart now includes provision for an Ingress and for multiple Celery queues.
It is now possible to configure a Dagit instance to disable executing pipeline runs in a local
subprocess.
Resource initialization, teardown, and associated failure states now emit structured events
visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
consolidated and rationalized.
Added support for the Redis queue provider in the dagster-k8s Helm chart.
Added support for external PostgreSQL in the dagster-k8s Helm chart.
Bugfix
Fixed an issue with inaccurate timings on some resource initializations.
Fixed an issue that could cause the multiprocess engine to spin forever.
Fixed an issue with default value resolution when a config value was set using SourceString.
Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
Fixed an issue where the CLI command dagster schedule up would fail in certain scenarios
with the SystemCronScheduler.
Pandas
Column constraints can now be configured to permit NaN values.
Dagstermill
Removed a spurious dependency on sklearn.
Docs
Improvements and fixes to docs.
Restored dagster.readthedocs.io.
Experimental
Added an initial implementation of solid retries, triggered by raising a RetryRequested exception.
This API is experimental and likely to change.
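The general shape of exception-driven retries can be sketched as follows. This is not the Dagster engine: the RetryRequested class here is a local stand-in defined for the example, and run_with_retries and max_retries are hypothetical names.

```python
class RetryRequested(Exception):
    """Local stand-in for the exception a solid raises to ask for a retry."""


def run_with_retries(compute_fn, max_retries=2):
    """Call compute_fn, re-running it each time it raises RetryRequested.

    Returns (result, retries_used). Re-raises once max_retries is exhausted.
    """
    retries_used = 0
    while True:
        try:
            return compute_fn(), retries_used
        except RetryRequested:
            if retries_used >= max_retries:
                raise
            retries_used += 1


calls = {"count": 0}


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryRequested()
    return "ok"


outcome = run_with_retries(flaky)
```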
Other
Renamed property runtime_type to dagster_type in definitions. The following are deprecated
and will be removed in a future version.
InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
CompositeSolidDefinition.all_runtime_types is deprecated. Use
CompositeSolidDefinition.all_dagster_types instead.
SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types
instead.
PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type
instead.
PipelineDefinition.runtime_type_named is deprecated. Use
PipelineDefinition.dagster_type_named instead.
PipelineDefinition.all_runtime_types is deprecated. Use
PipelineDefinition.all_dagster_types instead.
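A rename with a deprecated alias, as in the list above, is commonly implemented by keeping the old property and having it warn before delegating to the new one. The sketch below shows that pattern on a hypothetical InputDefinitionSketch class; it is an illustration of the technique, not the Dagster source.

```python
import warnings


class InputDefinitionSketch:
    """Hypothetical stand-in showing a renamed property with a deprecated alias."""

    def __init__(self, dagster_type):
        self._dagster_type = dagster_type

    @property
    def dagster_type(self):
        """The new, preferred accessor."""
        return self._dagster_type

    @property
    def runtime_type(self):
        """The old accessor: still works, but emits a DeprecationWarning."""
        warnings.warn(
            "runtime_type is deprecated; use dagster_type instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._dagster_type
```

Callers that still read runtime_type get the same value plus a warning, which gives them time to migrate before the alias is removed.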
dagster.readthedocs.io is currently stale due to availability issues.
New
Improvements to S3 Resource. (Thanks @dwallace0723!)
Better error messages in Dagit.
Better font/styling support in Dagit.
Changed OutputDefinition to take an is_required argument rather than is_optional. This keeps it
consistent with the changes to Field in 0.7.1 and avoids confusion
with Python's typing and Dagster's definition of Optional, which indicates None-ability
rather than existence. is_optional is deprecated and will be removed in a future version.
Added support for Flower in dagster-k8s.
Added support for environment variable config in dagster-snowflake.
Bugfixes
Improved performance in Dagit waterfall view.
Fixed bug when executing solids downstream of a skipped solid.
Improved navigation experience for pipelines in Dagit.
Fixes for the dagster-aws CLI tool.
Fixed an issue starting Dagit without DAGSTER_HOME set on Windows.
Fixed pipeline subset execution in partition-based schedules.