Having explored specialization of state diagrams in some detail, we now turn to dataflow diagrams. Dataflow diagrams are intended to show the functionality of a system: the various processes, and the flows of information and material that link them to each other, to inventories (data stores), and to various agents external to the system. A dataflow diagram (DFD) consists of a collection of processes, stores, and terminators linked by flows. A simple example taken from Yourdon (1989, p. 141) is given in figure 5.8. This discussion follows the approach taken by Yourdon (1989, ch. 9), to which the interested reader is directed for a more detailed exposition.
Figure 5.8: Example of a dataflow diagram: Order processing
Processes, shown as circles in the DFD, are the component actions or subprocesses which together constitute the overall process or system being represented in the diagram. Stores, represented by pairs of parallel lines in the DFD, are repositories of the data or material carried in the flows. Terminators, shown as rectangles in the DFD, represent the actors, external to the system being modeled, that interact with the various system processes. Flows, shown as arrows in the DFD, represent the movement of information or material between processes, terminators, and stores.
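The four component types can be captured in a small data structure. The following is a minimal sketch, not part of Yourdon's notation: the names `Flow`, `DFD`, and `is_well_formed` are illustrative assumptions, and the example encodes only the Receive order fragment of figure 5.8 that is described in the text, not the whole diagram.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """A named arrow from one component to another (hypothetical encoding)."""
    name: str
    source: str
    target: str


@dataclass
class DFD:
    """A dataflow diagram: processes, stores, and terminators linked by flows."""
    processes: set = field(default_factory=set)
    stores: set = field(default_factory=set)
    terminators: set = field(default_factory=set)
    flows: set = field(default_factory=set)

    def nodes(self):
        """All components that may serve as the endpoint of a flow."""
        return self.processes | self.stores | self.terminators

    def is_well_formed(self):
        """Every flow must start and end at a declared component."""
        nodes = self.nodes()
        return all(f.source in nodes and f.target in nodes for f in self.flows)


# Only the part of figure 5.8 that the text describes explicitly.
fragment = DFD(
    processes={"Receive order"},
    stores={"Orders"},
    terminators={"Customer"},
    flows={
        Flow("order", "Customer", "Receive order"),
        Flow("order details", "Receive order", "Orders"),
        Flow("invalid order", "Receive order", "Customer"),
    },
)
```

Representing stores and terminators only as flow endpoints anticipates the way they enter into executions in the discussion that follows.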
Before discussing specialization of dataflow diagrams, we must be more precise about the set of behaviors described by a dataflow diagram. While the DFD approach as usually presented does not specify what such a "DFD behavior" would look like, it seems reasonable to describe it as a sequence of processes and flows.[10] An immediate consequence of this approach under maximal execution set semantics is that executions of a particular DFD may only include processes and flows contained in that DFD. Note that terminators and stores are implicitly included in executions as the endpoints of flows.
A dataflow diagram does more, however, than simply list what processes and flows may occur in an instance. It also says something about the relationship between those flows and processes. For example, for each segment of a process that occurs in a DFD instance, one would expect some and possibly all of the flows into and out of that process to also occur.
In attempting to state these constraints precisely, one must take a position on certain questions about how a DFD is to be interpreted. For example, in the present discussion we will assume that in general, all flows into or out of a store or terminator may occur independently of each other.[11] We will also assume that each process instance must be accompanied by at least one inflow and one outflow, but that (without extending the dataflow representation) one cannot, in general, say more about which flows accompany a process execution without appealing to the semantics of the domain being modeled. For example, in figure 5.8, any instance of Ship books must involve all three flows: an incoming shipping memo and books, which are transformed into an outgoing shipment to the customer. However, in the same diagram the flow of an order into Receive order may result in a flow of order details into the Orders store or the flow of an invalid order back to the customer, but (presumably) not both. This latter issue appears to represent a fundamental ambiguity in the dataflow representation: it would seem that there is no domain independent interpretation of a DFD that permits a consistent definition of its class membership.
Since we have no domain independent interpretation of a DFD as defining a class, specialization cannot be extended to DFDs in a domain independent fashion. That is, in general, one cannot determine whether one DFD is a valid specialization of another without explicating which flows are mandatory and which are optional, and under what circumstances, information that is not captured in the DFD itself.
These ambiguities in the dataflow diagramming technique are well known and resolutions have been proposed (France 1992). We can proceed without such extensions, however, by limiting ourselves to transformations that neither add flows to nor delete flows from a process component. These transformations will then be specializing under any interpretation of process flows, because such flows are left intact under the transformation. Interestingly, even under this constraint we obtain a set of transformations that is rich enough to be useful, as will be illustrated in section 5.6.
We are now in a position to specify what executions are in the maximal execution set of a dataflow diagram. We can then identify transformations which result in a restriction on the maximal execution set and thus (as argued in section 5.2) result in a specialization. The maximal execution set of a dataflow diagram includes all sequences of processes and flows that satisfy the following constraints:
1. All processes and flows in the sequence appear as components of the dataflow diagram.
2. Each input flow or output flow to a process that appears in the sequence must be associated with at least one instance of that process in the sequence.
3. Each process that appears in the sequence must have at least one associated input flow and one associated output flow.
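The three constraints above can be turned into a membership test. The sketch below rests on simplifying assumptions that are not in the text: an execution is encoded as a list of component names, process and flow names are distinct, and a flow counts as "associated" with a process instance whenever both occur somewhere in the same sequence. The function name `in_maximal_execution_set` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    source: str
    target: str


def in_maximal_execution_set(sequence, processes, flows):
    """Check a sequence of component names against the three constraints.

    Assumption: "associated" is approximated by co-occurrence in the sequence.
    """
    flow_by_name = {f.name: f for f in flows}
    seen = set(sequence)

    # First constraint: only components of the diagram may appear.
    if not seen <= (set(processes) | set(flow_by_name)):
        return False

    seen_flows = [flow_by_name[x] for x in seen if x in flow_by_name]
    seen_procs = {x for x in seen if x in processes}

    # Second constraint: a flow into or out of a process requires an
    # instance of that process (stores and terminators are exempt).
    for f in seen_flows:
        for endpoint in (f.source, f.target):
            if endpoint in processes and endpoint not in seen_procs:
                return False

    # Third constraint: each process needs at least one inflow and one outflow.
    for p in seen_procs:
        has_inflow = any(f.target == p for f in seen_flows)
        has_outflow = any(f.source == p for f in seen_flows)
        if not (has_inflow and has_outflow):
            return False
    return True


processes = {"Receive order"}
flows = {
    Flow("order", "Customer", "Receive order"),
    Flow("order details", "Receive order", "Orders"),
    Flow("invalid order", "Receive order", "Customer"),
}
ok = in_maximal_execution_set(
    ["order", "Receive order", "order details"], processes, flows
)
```

Under this reading, a sequence containing both order details and invalid order is also accepted. That is consistent with the earlier observation that the diagram alone cannot say which outflows are mutually exclusive.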
We can now give a definition of specialization for dataflow diagrams that follows directly from section 5.2. We can define a dataflow diagram D' to be a specialization of dataflow diagram D if and only if either:
1. The set of sequences permitted by D' is a subset of the set of sequences permitted by D.
2. Either D or D' can be refined such that condition 1 holds. (This essentially amounts to resolving differences in the granularity of the two process descriptions by decomposing process components.)
Having defined the relationship between a dataflow diagram and its execution set in terms of the constraints above, we are now in a position to identify a set of specializing transformations which operationalize the above definition. For this it will be useful to first introduce a set of refining/abstracting transformations, and this in turn requires a formal definition of the dataflow diagram and its attribute space. The formal definitions and analysis are given in appendixes F and G. In the discussion that follows we will summarize and briefly motivate the results.
For purposes of the current analysis of dataflow diagrams, we need only consider a single refinement (exhaustive process decomposition) and its corresponding abstraction (total process aggregation). Intuitively, we achieve exhaustive process decomposition by replacing a component process with a set of subprocesses (including a generic process, so that the decomposition is exhaustive) interconnected by all possible generic flows, with a copy of each "external" input and output flow linked in turn to each of the subprocesses. The presence of all possible flows and of the generic process ensures that the decomposed process represents a true refinement (i.e., it does not restrict the extension of the original dataflow diagram in any way). In practice, of course, decomposition of processes in dataflow diagrams does not include all possible flows and subprocesses, for such decomposition involves both a refinement and a specialization (restriction of extension) of the original dataflow diagram (which is also consistent with other decompositions). The exhaustive process decomposition is thus of primarily theoretical interest: a kind of refinement benchmark against which specializations that involve decomposition can be analyzed.
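Exhaustive process decomposition, as described informally above, can be sketched as a function on the process and flow sets. This is an illustration under stated assumptions rather than the formal construction of appendixes F and G: the function name, the label "generic flow", the naming of the generic subprocess, and the subprocess names in the example are all hypothetical, and "all possible generic flows" is read here as one flow for each ordered pair of distinct subprocesses.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Flow:
    name: str
    source: str
    target: str


def exhaustive_decomposition(processes, flows, target, subprocesses):
    """Replace `target` by its subprocesses plus a generic subprocess."""
    if target not in processes:
        raise ValueError(f"{target!r} is not a process of the diagram")

    generic = f"{target} (generic)"  # hypothetical name for the generic process
    parts = list(subprocesses) + [generic]
    new_processes = (set(processes) - {target}) | set(parts)

    new_flows = set()
    for f in flows:
        if target not in (f.source, f.target):
            new_flows.add(f)  # flows elsewhere in the diagram are untouched
            continue
        # Copy each external flow of the target to every subprocess.
        for part in parts:
            new_flows.add(Flow(
                f.name,
                part if f.source == target else f.source,
                part if f.target == target else f.target,
            ))

    # Interconnect the subprocesses by generic flows in both directions.
    for a, b in permutations(parts, 2):
        new_flows.add(Flow("generic flow", a, b))
    return new_processes, new_flows


flows = {
    Flow("order", "Customer", "Receive order"),
    Flow("order details", "Receive order", "Orders"),
    Flow("invalid order", "Receive order", "Customer"),
}
new_processes, new_flows = exhaustive_decomposition(
    {"Receive order"}, flows, "Receive order",
    ["Validate order", "Record order"],  # hypothetical subprocesses
)
```

With two named subprocesses and the generic one, each of the three external flows is copied three times and six generic flows are added. The rapid growth in the number of flows suggests why the construction serves as a benchmark rather than as a practical modeling step.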
As noted above, in developing a set of specializing transformations we will limit ourselves to transformations which preserve flows in and out of processes. Any such transformation must be specializing. The MES conditions involve only the relationship between flows and their associated processes, and that relationship is unaffected when flows in and out of processes are preserved. Every execution in the maximal execution set of the resulting dataflow diagram therefore satisfies the MES conditions for the original dataflow diagram as well. We can identify several useful specializing transformations which are consistent with this constraint:
Deletion (see figures 5.11, 5.12, and 5.13 in section 5.6). Note that this transformation preserves the constraint on flows since all remaining processes have all their flows intact (the only components with deleted flows are terminators and stores). The intuitive justification for this transformation is that we take stores and terminators to be asynchronous and thus an execution may be restricted to one side or the other of a boundary defined by these components.
Decomposition of a Process. Any process in a DFD can be decomposed into a lower-level DFD as long as the flows into and out of the decomposition are consistent with the flows in the top-level diagram. Note that this kind of decomposition is not exhaustive in the sense of exhaustive process decomposition (which we argued above is a refinement). This "nonexhaustive" decomposition can be thought of as a refinement (exhaustive process decomposition) composed with a specialization (deleting some subprocesses and decomposed flows). Note that our constraint that flows associated with a process are preserved is satisfied by the "flow consistency" aspect of this form of decomposition. That is, we require that for each flow into or out of the decomposed process, there be at least one identical flow into or out of one of the resulting subprocesses.
Specialization of a Component. If one specializes any component (terminator, store, process, or flow) of a dataflow diagram, the resulting diagram will be a specialization of the original diagram. Note that here again we preserve flows associated with each process. In particular, a specialized process can be thought of as a kind of subset of the original process, which is to say, we replace the original process with a subprocess, and this means that specialization of a process is nothing more than a kind of process decomposition. A similar argument might be made for the specialization of flows. Finally, under the semantics we are employing for dataflow diagrams, terminators and stores figure into the maximal execution set of a dataflow diagram only as endpoints of a flow, which is to say, they are essentially attributes of some flow, and this means that specialization of terminators and stores is a kind of flow specialization. As we have just noted, this flow specialization can be understood as a kind of flow decomposition. (See section 5.8 for fuller discussion of this issue.)
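The flow-preservation constraint behind these transformations can be checked mechanically. The sketch below is illustrative only: both function names are hypothetical, the diagram uses abstract placeholder components (T1, P1, S, P2, T2) rather than any figure from the text, and "identical flow" is read as a flow with the same name and the same external endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    source: str
    target: str


def process_flows(process, flows):
    """All flows into or out of the given process."""
    return {f for f in flows if process in (f.source, f.target)}


def deletion_preserves_flows(processes, flows, kept_processes, kept_flows):
    """True if every remaining process still has exactly its original flows."""
    if not set(kept_processes) <= set(processes):
        return False
    if not set(kept_flows) <= set(flows):
        return False
    return all(
        process_flows(p, kept_flows) == process_flows(p, flows)
        for p in kept_processes
    )


def decomposition_is_flow_consistent(target, flows, subprocesses, sub_flows):
    """Each flow of `target` must reappear on at least one subprocess."""
    subs = set(subprocesses)
    for f in flows:
        if f.target == target and not any(
            g.name == f.name and g.source == f.source and g.target in subs
            for g in sub_flows
        ):
            return False
        if f.source == target and not any(
            g.name == f.name and g.target == f.target and g.source in subs
            for g in sub_flows
        ):
            return False
    return True


# Placeholder diagram: T1 -a-> P1 -b-> S -c-> P2 -d-> T2
a, b = Flow("a", "T1", "P1"), Flow("b", "P1", "S")
c, d = Flow("c", "S", "P2"), Flow("d", "P2", "T2")
all_flows = {a, b, c, d}

# Deleting P2 together with its flows cuts the diagram at the store S.
valid_deletion = deletion_preserves_flows({"P1", "P2"}, all_flows, {"P1"}, {a, b})

# Decomposing P1 into P1a and P1b keeps its external flows a and b.
lower_level = {Flow("a", "T1", "P1a"), Flow("x", "P1a", "P1b"), Flow("b", "P1b", "S")}
consistent = decomposition_is_flow_consistent("P1", all_flows, ["P1a", "P1b"], lower_level)
```

In the deletion example only the store S loses a flow, which matches the observation that the components with deleted flows are terminators and stores.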