Long-running conversations often involve multiple participating systems. The more participants there are and the more they need to communicate, the more important it is to ensure loose temporal coupling. Loose temporal coupling has several dimensions:
- partial availability: do not require all participating applications to be available at the same time. Instead, make it possible to start and stop systems independently of one another.
- independent scalability: do not require all participants to process work at the same rate.
- local failure: a failure in one system should not fail the whole conversation. Instead, make it possible to suspend processing and repair failures locally. Meanwhile, allow the parts of the conversation that are not affected by the failure to continue.
Using workflow to implement long-running conversations makes it easy to ensure these properties. The reason is inherent in the way workflow works: because communication among the different systems is routed through the workflow engine, the engine acts as a buffer between these systems.
As such, it can
- wait until a previously unavailable system becomes available,
- throttle processing based on back pressure,
- in case of failure, wait until the problem is repaired. Meanwhile, other work continues.
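The buffering behaviour listed above can be illustrated with a minimal in-memory sketch. It is not how any particular workflow engine is implemented: the `BufferingEngine` class, its `deliver` callback, and the choice of `ConnectionError` for "system unavailable" and `ValueError` for "this item needs repair" are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class BufferingEngine:
    """Hypothetical toy engine that buffers work for one downstream system."""

    deliver: Callable[[str], None]  # assumed callback into the downstream system
    pending: Deque[str] = field(default_factory=deque)
    incidents: List[str] = field(default_factory=list)

    def submit(self, item: str) -> None:
        # Producers hand work to the engine and never call downstream directly.
        self.pending.append(item)

    def run_once(self, max_items: int) -> int:
        """Try to deliver at most max_items; return how many succeeded."""
        delivered = 0
        for _ in range(min(max_items, len(self.pending))):
            item = self.pending.popleft()
            try:
                self.deliver(item)
                delivered += 1
            except ConnectionError:
                # Downstream is unavailable: keep the item and stop this cycle.
                self.pending.appendleft(item)
                break
            except ValueError:
                # Item-specific failure: park it for repair, keep processing others.
                self.incidents.append(item)
        return delivered
```

In this sketch `max_items` plays the role of a simple throttle: the downstream system is never handed more work per cycle than the caller allows, and items that arrive faster simply stay in `pending`. Parked items in `incidents` stand in for what a real engine would surface as incidents to be repaired.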
The workflow engine can take advantage of well-known patterns such as periodically retrying failed requests, circuit breakers, and polling consumers. Workflow can also degrade gracefully by skipping optional steps.
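Two of these patterns, retrying and the circuit breaker, can be sketched in a few lines. This is a simplified illustration rather than the mechanism of any specific engine: the names `CircuitBreaker`, `CircuitOpenError`, and `retry`, as well as the threshold and timing parameters, are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Reset period elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(fn: Callable[[], T], attempts: int = 3, delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times, pausing `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying immediately against an open circuit is pointless
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    assert last_error is not None
    raise last_error
```

A caller would combine the two as `retry(lambda: breaker.call(request))`, where `request` is whatever function talks to the remote system. The `clock` and `sleep` parameters are injected only so the behaviour can be exercised without real waiting.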
Most importantly, when BPMN is used to model workflows, incidents can be visualized on a graphical diagram. This enables programmers and business people to analyse and assess incidents together. They can then prioritize incidents according to business impact and jointly evaluate resolution strategies.