The general strategy of checking the type of a datum and calling an appropriate function is called dispatching on type. This is a powerful strategy for obtaining modularity in system design. On the other hand, implementing the dispatch as in section 2.4.2 has two significant weaknesses. One weakness is that the generic interface functions (real_part, imag_part, magnitude, and angle) must know about all the different representations. For instance, suppose we wanted to incorporate a new representation for complex numbers into our complex-number system. We would need to identify this new representation with a type, and then add a clause to each of the generic interface functions to check for the new type and apply the appropriate selector for that representation.
Another weakness of the technique is that even though the individual representations can be designed separately, we must guarantee that no two functions in the entire system have the same name. This is why Ben and Alyssa had to change the names of their original functions from section 2.4.1.
The issue underlying both of these weaknesses is that the technique for implementing generic interfaces is not additive. The person implementing the generic selector functions must modify those functions each time a new representation is installed, and the people interfacing the individual representations must modify their code to avoid name conflicts. In each of these cases, the changes that must be made to the code are straightforward, but they must be made nonetheless, and this is a source of inconvenience and error. This is not much of a problem for the complex-number system as it stands, but suppose there were not two but hundreds of different representations for complex numbers. And suppose that there were many generic selectors to be maintained in the abstract-data interface. Suppose, in fact, that no one programmer knew all the interface functions or all the representations. The problem is real and must be addressed in such programs as large-scale data-base-management systems.
What we need is a means for modularizing the system design even further. This is provided by the programming technique known as data-directed programming. To understand how data-directed programming works, begin with the observation that whenever we deal with a set of generic operations that are common to a set of different types we are, in effect, dealing with a two-dimensional table that contains the possible operations on one axis and the possible types on the other axis. The entries in the table are the functions that implement each operation for each type of argument presented. In the complex-number system developed in the previous section, the correspondence between operation name, data type, and actual function was spread out among the various conditional clauses in the generic interface functions. But the same information could have been organized in a table, as shown in figure 2.22.
Data-directed programming is the technique of designing programs to work with such a table directly. Previously, we implemented the mechanism that interfaces the complex-arithmetic code with the two representation packages as a set of functions that each perform an explicit dispatch on type. Here we will implement the interface as a single function that looks up the combination of the operation name and argument type in the table to find the correct function to apply, and then applies it to the contents of the argument. If we do this, then to add a new representation package to the system we need not change any existing functions; we need only add new entries to the table.
Here is how data-directed programming can be used in the complex-number system. Ben, who developed the rectangular representation, implements his code just as he did originally. He defines a collection of functions, or a package, and interfaces these to the rest of the system by adding entries to the table that tell the system how to operate on rectangular numbers. This is accomplished by calling the following function:
function install_rectangular_package() { function real_part(z) { return head(z); } function imag_part(z) { return tail(z); } function make_from_real_imag(x, y) { return pair(x, y); } function magnitude(z) { return math_sqrt(square(real_part(z)) + square(imag_part(z))); } function angle(z) { return math_atan(imag_part(z), real_part(z)); } function make_from_mag_ang(r, a) { return pair(r * math_cos(a), r * math_sin(a)); } // interface to the rest of the system function tag(x) { return attach_tag("rectangular", x); } put("real_part", list("rectangular"), real_part); put("imag_part", list("rectangular"), imag_part); put("magnitude", list("rectangular"), magnitude); put("angle", list("rectangular"), angle); put("make_from_real_imag", "rectangular", (x, y) => tag(make_from_real_imag(x, y))); put("make_from_mag_ang", "rectangular", (r, a) => tag(make_from_mag_ang(r, a))); return "done"; } install_rectangular_package();
function install_polar_package() { // internal functions function magnitude(z) { return head(z); } function angle(z) { return tail(z); } function make_from_mag_ang(r, a) { return pair(r, a); } function real_part(z) { return magnitude(z) * math_cos(angle(z)); } function imag_part(z) { return magnitude(z) * math_sin(angle(z)); } function make_from_real_imag(x, y) { return pair(math_sqrt(square(x) + square(y)), math_atan(y, x)); } // interface to the rest of the system function tag(x) { return attach_tag("polar", x); } put("real_part", list("polar"), real_part); put("imag_part", list("polar"), imag_part); put("magnitude", list("polar"), magnitude); put("angle", list("polar"), angle); put("make_from_real_imag", "polar", (x, y) => tag(make_from_real_imag(x, y))); put("make_from_mag_ang", "polar", (r, a) => tag(make_from_mag_ang(r, a))); return "done"; } install_polar_package();
Even though Ben and Alyssa both still use their original functions defined with the same names as each other's (e.g., real_part), these definitions are now internal to different functions (see section Cound not find label for sec:block-structure), so there is no name conflict.
The complex-arithmetic selectors access the table by means of a
general operation
function
called apply_generic, which
applies a generic operation to some arguments. The function apply_generic
looks in the table under the name of the operation and the types of the
arguments and applies the resulting
function
if one is present:[3]
function apply_generic(op, args) { const type_tags = map(type_tag, args); const fun = get(op, type_tags); return fun !== undefined ? apply(fun, map(contents, args)) : Error("No method for these types in apply_generic", list(op, type_tags)); }
function real_part(z) { return apply_generic("real_part", list(z)); } function imag_part(z) { return apply_generic("imag_part", list(z)); } function magnitude(z) { return apply_generic("magnitude", list(z)); } function angle(z) { return apply_generic("angle", list(z)); }
We can also extract from the table the constructors to be used by the programs external to the packages in making complex numbers from real and imaginary parts and from magnitudes and angles. As in section 2.4.2, we construct rectangular numbers whenever we have real and imaginary parts, and polar numbers whenever we have magnitudes and angles:
function make_from_real_imag(x, y) { return get("make_from_real_imag", "rectangular")(x, y); } function make_from_mag_ang(r, a) { return get("make_from_mag_ang", "polar")(r, a); }
function deriv(exp, variable) { return is_number(exp) ? 0 : is_variable(exp) ? (is_same_variable(exp, variable)) ? 1 : 0 : is_sum(exp) ? make_sum(deriv(addend(exp), variable), deriv(augend(exp), variable)) : is_product(exp) ? make_sum(make_product(multiplier(exp), deriv(multiplicand(exp), variable)), make_product(deriv(multiplier(exp), variable), multiplicand(exp))) // more rules can be added here : Error("unknown expression type in deriv", exp); }
deriv(list("*", list("*", "x", "y"), list("+", "x", 4)), "x"); // [ "+", // [["*", [["*", ["x", ["y", null]]], // [["+", [1, [0, null]]], null]]], // [["*", // [["+", // [["*", ["x", [0, null]]], // [["*", [1, ["y", null]]], null]]], // [["+", ["x", [4, null]]], null] ] ], // null ]]]
type tagof the datum is the algebraic operator symbol (such as +) and the operation being performed is deriv. We can transform this program into data-directed style by rewriting the basic derivative function as
function deriv(exp, variable) { return is_number(exp) ? 0 : is_variable(exp) ? (is_same_variable(exp, variable) ? 1 : 0) : get("deriv", operator(exp))(operands(exp), variable); } function operator(exp) { return head(exp); } function operands(exp) { return tail(exp); }
get(operator(exp), "deriv")(operands(exp), variable);
typekeys in the operator table. For numbers and variables, there aren't such obvious keys, although we could introduce names for those types of expressions, as well, if we change the way expressions are represented as lists.
function deriv_sum(operands, variable) { return make_sum(deriv(addend(operands), variable), deriv(augend(operands), variable)); } function deriv_product(operands, variable) { return make_sum(make_product(multiplier(operands), deriv(multiplicand(operands), variable)), make_product(deriv(multiplier( operands), variable), multiplicand(operands))); } function install_deriv() { put("deriv", "+", deriv_sum); put("deriv", "*", deriv_product); return "done"; } install_deriv();
function deriv_exponentiation(operands, variable) { const bas = base(operands); const exp = exponent(operands); return make_product(exp, make_product(make_exponentiation(bas, make_sum(exp, -1)), deriv(bas, variable))); } function install_exponentiation_extension() { put("deriv", "**", deriv_exponentiation); return "done"; } install_exponentiation_extension();
get(operator(exp), "deriv")(operands(exp), variable);
put("+", "deriv", deriv_sum); put("*", "deriv", deriv_product); put("**", "deriv", deriv_exponentiation);
function make_insatiable_file(division, file) { return pair(division, file); } function insatiable_file_division(insatiable_file) { return head(insatiable_file); } function insatiable_file_content(insatiable_file) { return tail(insatiable_file); } function get_record(employee_name, insatiable_file) { const the_division = insatiable_file_division(insatiable_file); const division_record = get("get_record", the_division) (employee_name, insatiable_file_content( insatiable_file); return record !== undefined ? attach_tag(the_division, division_record) : undefined; }
function make_insatiable_record(division, record) { return pair(division, record); } function insatiable_record_division(insatiable_record) { return head(insatiable_record); } function insatiable_record_content(insatiable_record) { return tail(insatiable_record); } function get_salary(insatiable_record) { const the_division = insatiable_record_division(insatiable_record); return get("get_salary", the_division) (insatiable_record_content); }
function find_employee_record(employee_name, personnel_files) { if (is_null(personell_files)) { return undefined; } else { const insatiable_record = get_record(employee_name, head(personell_files)); return insatiable_record !== undefined ? insatiable_record : find_employee_record(employee_name, tail(personell_files)); } }
destructiveoperation—similar to the extension of operations tables—in that the data structure is permanently and irrevocably modified; section 3.3 explains this concept in detail.
The key idea of data-directed programming is to handle generic operations in programs by dealing explicitly with operation-and-type tables, such as the table in figure 2.22. The style of programming we used in section 2.4.2 organized the required dispatching on type by having each operation take care of its own dispatching. In effect, this decomposes the operation-and-type table into rows, with each generic operation function representing a row of the table.
An alternative implementation strategy is to decompose the table into
columns and, instead of using intelligent operations
that dispatch
on data types, to work with intelligent data objects
that dispatch
on operation names. We can do this by arranging things so that a data
object, such as a rectangular number, is represented as a
function
that takes as input the required operation name and performs the
operation indicated. In such a discipline, make_from_real_imag
could be written as
function make_from_real_imag(x, y) { function dispatch(op) { return op === "real_part" ? x : op === "imag_part" ? y : op === "magnitude" ? math_sqrt(square(x) + square(y)) : op === "angle" ? math_atan(y, x) : Error("Unknown op in make_from_real_imag", op); } return dispatch; }
function apply_generic(op, arg) { return head(arg)(op); }
This style of programming is called message passing. The name
comes from the image that a data object is an entity that receives the
requested operation name as a message.
We have already seen an
example of message passing in section 2.1.3, where we saw
how pair, head, and tail could be defined with no data
objects but only
functions. Here we see that message passing is not
a mathematical trick but a useful technique for organizing systems
with generic operations. In the remainder of this chapter we will
continue to use data-directed programming, rather than message
passing, to discuss generic arithmetic operations. In chapter 3 we
will return to message passing, and we will see that it can be a
powerful tool for structuring simulation programs.
function make_from_mag_ang(r, a) { function dispatch(op) { return op === "real_part" ? r * math_cos(a) : op === "imag_part" ? r * math_sin(a) : op === "magnitude" ? r : op === "angle" ? a : Error("Unknown op in make_from_real_imag", op); } return dispatch; }
installlibraries for each new type. We can also have
installlibraries for new operations.
apply(sum_of_squares, list(1, 3))