Changelog#

0.12.12#

Community Contributions#

  • [dagster-msteams] Introduced a new integration with Microsoft Teams, which includes a connection resource and support for sending messages to Microsoft Teams. See details in the API Docs (thanks @iswariyam!).
  • Fixed a mistake in the sensors docs (thanks @vitorbaptista!).

Bugfixes#

  • Fixed a bug that caused run status sensors to sometimes repeatedly fire alerts.
  • Fixed a bug that caused the emr_pyspark_step_launcher to fail when stderr included non-Log4J-formatted lines.
  • Fixed a bug that caused applyLimitPerUniqueValue config on the QueuedRunCoordinator to fail Helm schema validation.
  • [dagster-shell] Fixed an issue where a failure while executing a shell command sometimes didn’t raise a clear explanation for the failure.
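
The emr_pyspark_step_launcher fix above concerns stderr that mixes Log4J-formatted lines with other output. As a rough illustration of the general technique (not the launcher's actual code), the sketch below parses lines that match an assumed Log4J layout and passes everything else through instead of failing. The regex, the parse_stderr_line helper, and the sample lines are all hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "21/09/01 12:00:00 INFO SparkContext: message".
# The real layout depends on the cluster's Log4J configuration.
LOG4J_LINE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[^:]+): "
    r"(?P<message>.*)$"
)


class ParsedLine(NamedTuple):
    level: Optional[str]  # None when the line is not Log4J-formatted
    logger: Optional[str]
    message: str


def parse_stderr_line(line: str) -> ParsedLine:
    """Parse one stderr line, falling back to the raw text on no match."""
    match = LOG4J_LINE.match(line)
    if match is None:
        return ParsedLine(level=None, logger=None, message=line)
    return ParsedLine(match["level"], match["logger"], match["message"])


lines = [
    "21/09/01 12:00:00 INFO SparkContext: Running Spark",
    "Traceback (most recent call last):",
]
parsed = [parse_stderr_line(line) for line in lines]
```

Lines that do not match keep their full text in message, so a Python traceback or shell error in stderr is still surfaced rather than causing a parse failure.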

Experimental#

  • Added experimental @asset decorator and build_assets_job APIs to construct asset-based jobs, along with Dagit support.
  • Added load_assets_from_dbt_project and load_assets_from_dbt_manifest, which enable constructing asset-based jobs from dbt models.

0.12.11#

Community Contributions#

  • [helm] The ingress now supports TLS (thanks @cpmoser!)
  • [helm] Fixed an issue where dagit could not be configured with an empty workspace (thanks @yamrzou!)

New#

  • [dagstermill] You can now control IO for output notebooks more precisely by specifying output_notebook_name in define_dagstermill_solid and providing your own IO manager via the "output_notebook_io_manager" resource key.

  • We've deprecated the output_notebook argument in define_dagstermill_solid in favor of output_notebook_name.

  • Previously, the output notebook functionality required the "file_manager" resource and resulted in a FileHandle output. Now, when output_notebook_name is specified, it requires the "output_notebook_io_manager" resource and results in a bytes output.

  • You can now customize your own "output_notebook_io_manager" by extending OutputNotebookIOManager. A built-in local_output_notebook_io_manager is provided for handling local output notebook materialization.

  • See the detailed migration guide in https://github.com/dagster-io/dagster/pull/4490.

  • Dagit fonts have been updated.

Bugfixes#

  • Fixed a bug where log messages of the form context.log.info("foo %s", "bar") would not get formatted as expected.
  • Fixed a bug that caused the QueuedRunCoordinator’s tag_concurrency_limits to not be respected in some cases.
  • When loading a Run with a large volume of logs in Dagit, a loading state is shown while logs are retrieved, clarifying the loading experience and improving render performance of the Gantt chart.
  • Using solid selection with pipelines containing dynamic outputs no longer causes unexpected errors.
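
The log-formatting fix above follows the convention of Python's standard logging module, where extra positional arguments are interpolated into the message with %-style formatting. The snippet below shows that convention using only the standard library. It does not use Dagster's context.log, and the logger name and ListHandler class are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() applies record.args to record.msg ("foo %s" % "bar").
        self.messages.append(record.getMessage())


logger = logging.getLogger("changelog_example")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.info("foo %s", "bar")
```

After the call, handler.messages holds the single interpolated string "foo bar", which is the formatting the bugfix restores for context.log calls of the same form.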

Experimental#

  • You can now set tags on a graph by passing a dictionary to the tags argument of the @graph decorator or the GraphDefinition constructor. These tags will be set on any runs of jobs that are built by invoking to_job on the graph.
  • You can now set separate images per solid when using the k8s_job_executor or celery_k8s_job_executor. Use the key image inside the container_config block of the k8s solid tag.
  • You can now target multiple jobs with a single sensor, by using the jobs argument. Each RunRequest emitted from a multi-job sensor’s evaluation function must specify a job_name.
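
For the per-solid image setting above, the tag value is a nested mapping. The sketch below builds such a mapping as plain Python data. It assumes the tag key is "dagster-k8s/config" (the changelog only says "the k8s solid tag"), and the k8s_image_tags helper and the image name are hypothetical.

```python
from typing import Dict


def k8s_image_tags(image: str) -> Dict[str, dict]:
    """Build solid tags that request a specific container image."""
    return {
        # Assumed tag key; check the dagster-k8s docs for your version.
        "dagster-k8s/config": {
            "container_config": {"image": image},
        }
    }


tags = k8s_image_tags("my-registry/my-image:1.0")
```

A mapping like this would presumably be passed as the tags argument of the solid or op that should run in its own image.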

0.12.10#

Community Contributions#

  • [helm] The KubernetesRunLauncher image pull policy is now configurable in a separate field (thanks @yamrzou!).
  • The dagster-github package is now usable for GitHub Enterprise users (thanks @metinsenturk!). A hostname can now be provided via config to the dagster-github resource with the key github_hostname:

    import os

    from dagster import execute_pipeline

    # github_pipeline is assumed to be defined elsewhere.
    execute_pipeline(
        github_pipeline,
        run_config={"resources": {"github": {"config": {
            "github_app_id": os.getenv("GITHUB_APP_ID"),
            "github_app_private_rsa_key": os.getenv("GITHUB_PRIVATE_KEY"),
            "github_installation_id": os.getenv("GITHUB_INSTALLATION_ID"),
            "github_hostname": os.getenv("GITHUB_HOSTNAME"),
        }}}},
    )

New#

  • Added a database index over the event log to improve the performance of pipeline_failure_sensor and run_status_sensor queries. To take advantage of these performance gains, run a schema migration with the CLI command: dagster instance migrate.

Bugfixes#

  • Performance improvements have been made to allow dagit to more gracefully load a run that has a large number of events.
  • Fixed an issue where DockerRunLauncher would raise an exception when no networks were specified in its configuration.

Breaking Changes#

  • dagster-slack has migrated off of the deprecated slackclient and now uses slack_sdk (https://slack.dev/python-slack-sdk/v3-migration/).

Experimental#

  • OpDefinition, the replacement for SolidDefinition and the type produced by the @op decorator, is now part of the public API.
  • The daily_partitioned_config, hourly_partitioned_config, weekly_partitioned_config, and monthly_partitioned_config now accept an end_offset parameter, which allows extending the set of partitions so that the last partition ends after the current time.
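
To make the effect of end_offset above concrete, here is a simplified model of daily partition windows written with the standard library only. It is a sketch of the described behavior, not Dagster's implementation: the daily_windows function is hypothetical, it ignores time zones, and it only handles non-negative offsets.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def daily_windows(start: datetime, now: datetime, end_offset: int = 0) -> List[Window]:
    """Return daily [start, end) windows; simplified model of end_offset."""
    one_day = timedelta(days=1)
    windows = []
    window_start = start
    # Base behavior: include only windows that have fully elapsed.
    while window_start + one_day <= now:
        windows.append((window_start, window_start + one_day))
        window_start += one_day
    # end_offset appends that many additional windows past the last one.
    for _ in range(end_offset):
        windows.append((window_start, window_start + one_day))
        window_start += one_day
    return windows


start = datetime(2021, 9, 1)
now = datetime(2021, 9, 3, 12)  # midday on September 3
default = daily_windows(start, now)
extended = daily_windows(start, now, end_offset=1)
```

In this model the default set stops at the window ending at midnight on September 3, while end_offset=1 adds the September 3 window, whose end falls after the current time.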

0.12.9#

Community Contributions#

  • A service account can now be specified via Kubernetes tag configuration (thanks @skirino!).

New#

  • Previously in Dagit, when a repository location had an error when reloaded, the user could end up on an empty page with no context about the error. Now, we immediately show a dialog with the error and stack trace, with a button to try reloading the location again when the error is fixed.

  • Dagster is now compatible with Python’s logging module. In your config YAML file, you can configure log handlers and formatters that apply to the entire Dagster instance. Configuration instructions and examples are detailed in the docs: https://docs.dagster.io/concepts/logging/python-logging

  • [helm] The timeout of database statements sent to the Dagster instance can now be configured using .dagit.dbStatementTimeout.

  • The QueuedRunCoordinator now supports setting separate limits for each unique value of a given tag key. In the example below, 5 runs with the tag (backfill: first) could run concurrently with 5 other runs with the tag (backfill: second).

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: backfill
        value:
          applyLimitPerUniqueValue: True
        limit: 5
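
The per-value limit in the configuration above can be pictured as a counter kept separately for each tag value. The following is a minimal sketch of that idea in plain Python, not the QueuedRunCoordinator's actual logic. The can_launch function and the sample run tags are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def can_launch(
    candidate_tags: Dict[str, str],
    in_progress: Iterable[Dict[str, str]],
    key: str,
    limit: int,
) -> bool:
    """Return True if the candidate's value of `key` is still under `limit`."""
    value = candidate_tags.get(key)
    if value is None:
        return True  # runs without the tag are not subject to this limit
    # Count in-progress runs separately for each unique value of the tag.
    counts = Counter(tags[key] for tags in in_progress if key in tags)
    return counts[value] < limit


running = [{"backfill": "first"}] * 5 + [{"backfill": "second"}] * 4
first_ok = can_launch({"backfill": "first"}, running, "backfill", 5)
second_ok = can_launch({"backfill": "second"}, running, "backfill", 5)
```

With five "first" runs and four "second" runs in progress, another "first" run has to wait while another "second" run may start, because each value is counted against its own limit of 5.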

Bugfixes#

  • Previously, when specifying hooks on a pipeline, resource-to-resource dependencies on those hooks would not be resolved. This is now fixed, so resources with dependencies on other resources can be used with hooks.
  • When viewing a run in Dagit, the run status panel to the right of the Gantt chart did not always allow scrolling behavior. The entire panel is now scrollable, and sections of the panel are collapsible.
  • Previously, attempting to directly invoke a solid with Nothing inputs would fail. Now, the defined behavior is that Nothing inputs should not be provided to an invocation, and the invocation will not error.
  • Skip and fan-in behavior during execution now works correctly when solids with dynamic outputs are skipped. Previously, solids downstream of a dynamic output would never execute.
  • [helm] Fixed an issue where the image tag wasn’t set when running an instance migration job via .migrate.enabled=True.

0.12.8#

New#

  • Added instance on RunStatusSensorContext for accessing the Dagster instance from within run status sensors.

  • The inputs of a Dagstermill solid are now loaded the same way all other inputs are loaded in the framework. This allows rerunning output notebooks with properly loaded inputs outside of the Dagster context. Previously, the IO handling depended on a temporary marshal directory.

  • Previously, the Dagit CLI could not target a bare graph in a file, like so:

    from dagster import op, graph
    
    @op
    def my_op():
        pass
    
    @graph
    def my_graph():
        my_op()
    

    This has been remedied. Now, a file foo.py containing just a graph can be targeted by the dagit CLI: dagit -f foo.py.

  • When a solid, pipeline, schedule, etc. description or event metadata entry contains a markdown-formatted table, that table is now rendered in Dagit with better spacing between elements.

  • The hacker-news example now includes instructions on how to deploy the repository in a Kubernetes cluster using the Dagster Helm chart.

  • [dagster-dbt] The dbt_cli_resource now supports the dbt source snapshot-freshness command (thanks @emilyhawkins-drizly!)

  • [helm] Labels are now configurable on user code deployments.

Bugfixes

  • Dagit’s dependency on graphql-ws is now pinned to < 0.4.0 to avoid a breaking change in its latest release. We expect to remove this dependency entirely in a future Dagster release.
  • Execution steps downstream of a solid that emits multiple dynamic outputs now correctly resolve without error.
  • In Dagit, when repositories are loaded asynchronously, pipelines/jobs now appear immediately in the left navigation.
  • Pipeline/job descriptions with markdown are now rendered correctly in Dagit, and styling is improved for markdown-based tables.
  • The Dagit favicon now updates correctly during navigation to and from Run pages.
  • In Dagit, navigating to assets with keys that contain slashes would sometimes fail due to a lack of URL encoding. This has been fixed.
  • When viewing the Runs list on a smaller viewport, tooltips on run tags no longer flash.
  • Dragging the split panel view in the Solid/Op explorer in Dagit would sometimes leave a broken rendered state. This has been fixed.
  • Dagstermill notebook previews now work with remote user code deployments.
  • [dagster-shell] When a pipeline run fails, subprocesses spawned from dagster-shell utilities will now be properly terminated.
  • Fixed an issue associated with using EventMetadata.asset and EventMetadata.pipeline_run in AssetMaterialization metadata. (Thanks @ymrzkrrs and @drewsonne!)
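
The asset-key navigation fix above comes down to percent-encoding slashes before a key is placed in a URL path segment. Dagit's front end is not written in Python, so the snippet below only illustrates the general technique with the standard library. The asset key shown is hypothetical.

```python
from urllib.parse import quote, unquote

asset_key_path = "warehouse/daily/users"  # hypothetical key containing slashes

# safe="" makes quote() encode "/" too; by default "/" is left as-is.
encoded = quote(asset_key_path, safe="")
decoded = unquote(encoded)
```

Here encoded is "warehouse%2Fdaily%2Fusers", which can sit in a single path segment, and decoding it recovers the original key.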

Breaking Changes

  • Dagstermill solids now require a shared-memory IO manager, e.g. fs_io_manager, which allows data to be passed out of the Jupyter process boundary.

Community Contributions

  • [helm] Added missing documentation to fields in the Dagster User Deployments subchart (thanks @jrouly!)

Documentation