http://www.spinellis.gr/pubs/jrnl/2001-SIGPLAN-access/html/phobia.html This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft using the following reference:
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Patision 76, GR-104 34 Athens, Greece
email: dds@aueb.gr
In the following sections we outline problems associated with this practice and propose a modest language extension to support an alternative programming style.
Consider a typical object definition:
    class Point {
    private:
        double x;    // X coordinate
        double y;    // Y coordinate
    public:
        // Return X coordinate
        double GetX(void) { return (x); }
        // Set X coordinate
        void SetX(double val) { x = val; }
        // ...
    };

Some code style formatting conventions dictate at least 13 lines of code for every object field; the style followed by the authors of Java would require 11 lines of code, while the compact style found in the canonical C++ description [Str97] would require two additional lines of code for every field. In C++, to make the code available for efficient inline expansion, the code is placed in a program area with a very high real-estate value: the class declaration. This is the place where programmers will look to find a class's interface and contractual obligations. The class declaration is also the code that will be included by all files that use this class. Regrettably, we populate this valuable real estate with trivia that both programmers and compilers could do without.

In large, carefully crafted systems, the number of accessor methods can be significant. As an example, Rose and Rose [RR00] calculated that accessor methods used to obtain the value of a field (getter methods) account for about one fourth of the total number of methods in the standard Java ``java.*'' package source classes.
To add insult to injury, the definition of these accessor methods forces us to abandon the assignment and field access syntax provided by the language in favour of less natural forms of expression. Thus, instead of writing:
    Point a;
    a.x = a.y = 0;
    screen.Moveto(a.x, a.y);

we have to write:

    Point a;
    a.SetX(0);
    a.SetY(0);
    screen.Moveto(a.GetX(), a.GetY());

Although in pure object-oriented languages, such as Smalltalk, the use of messages for accessing and modifying fields is natural, in languages supporting direct access to fields, such as C++ and Java, accessor methods just hinder code readability.
We believe that the amount of code bloat and the unnatural expression style that result from the definition and use of accessor methods justify a small language extension to provide the accessor method functionality in a more friendly fashion. A syntactic construct to express both a method and a field access in the same way is available in Ada, Eiffel, and Microsoft's C# [Lar01]. Components built around Microsoft's ActiveX technology [JGHJ96] also provide this functionality in the form of set and get methods that are then transparently mapped to field access in languages such as Visual Basic. In addition, the proposal in reference [RR00] integrates field access into the Java type system to eliminate the accessor methods used to enforce read-only access.
We propose an addition to the C++ language definition that transparently maps plain field access into accessor methods, when such methods are provided; a similar change could also be incorporated into Java when the language is extended to support some form of operator overloading. With the change we propose, fields are typically created as public. When implementation or interface changes dictate a different realisation, suitable accessor methods can be written to bind the new implementation to the rest of the system without other code modifications. In addition, object properties that are typically accessed through a get/set interface can be made to appear as fields, promoting a more consistent expression style.
    operator-function-id:
        operator operator
        operator. id-expression
By the above definition, the sequence operator.id-expression is now a legitimate operator-function-id that can be overloaded to create accessor methods for pseudo-fields. Following the declaration of such accessor methods, the sequence .id-expression can be legitimately applied to a postfix-expression as a shortcut for calling the accessor methods defined with the given operator-function-id name.
To provide read access to a given pseudo-field a of type T, a method with signature ``T operator.a()'' needs to be defined; correspondingly, for write access to the pseudo-field a of type T, a method with signature ``T operator.a(T)'' must be defined. To enforce read-only access, the second method is omitted; for write-only access, the first method is omitted and the return type of the second method becomes void. For reasons that have to do with the efficient implementation of pseudo-field pointers, no other overloading based on these method arguments is allowed. In the form they are given, they provide the default behaviour of a real object field.
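The three access modes can be mimicked in standard C++ with ordinary named methods. The sketch below is only an illustration under stated assumptions: the class `Counter`, its members, and the `_get`/`_set` naming are hypothetical stand-ins for the proposed ``T operator.a()'' and ``T operator.a(T)'' signatures, which no current compiler accepts.

```cpp
#include <cassert>

// Hypothetical class; each name_get/name_set pair stands in for the
// proposed operator.name() / operator.name(T) accessor methods.
class Counter {
public:
    // Read-write pseudo-field "step": both accessors are defined.
    int step_get() const { return step_; }
    int step_set(int v) { step_ = v; return step_; }

    // Read-only pseudo-field "total": the write accessor is omitted.
    int total_get() const { return total_; }

    // Write-only pseudo-field "seed": the read accessor is omitted and
    // the write accessor returns void.
    void seed_set(int v) { total_ = v; }

    // Ordinary method that adds the current step to the running total.
    void advance() { total_ += step_; }

private:
    int step_ = 1;
    int total_ = 0;
};

// Hand-written equivalent of: c.seed = 10; s = (c.step = 5); c.advance();
int demo_access_modes() {
    Counter c;
    c.seed_set(10);
    int s = c.step_set(5);   // the write accessor yields the stored value
    assert(s == 5);
    c.advance();
    return c.total_get() + s;
}
```

Because the read-write setter returns a value of type T, a chained assignment such as `a.x = a.y = 0` could plausibly be expressed as nested setter calls; that reading is an interpretation of the signatures above rather than something the text states.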
Following these extensions we can define accessor methods for pseudo-fields as in the following example:
    // Now defined in polar coordinates
    class Point {
    public:
        // Public since we can define pseudo-fields
        double angle;
        double distance;
        // Pseudo fields for Cartesian implementation
        // x pseudo-field
        double operator.x(double x);    // Set
        double operator.x();            // Get
        // y pseudo-field
        double operator.y(double y);    // Set
        double operator.y();            // Get
    };

    f()
    {
        Point a;

        // Will map to (a).operator.x(0)
        a.x = 0;
        a.y = 0;
        // Will map to (a).operator.x(), ...
        screen.Moveto(a.x, a.y);
        //...
    }
The semantics and the implementation of the new accessor methods are determined by performing the equivalent of the following source-to-source transformation:
    a.b = c     is transformed into     (a).operator.b(c)
    a.b         is transformed into     (a).operator.b()
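To make the transformation concrete, the following sketch applies it by hand in standard C++. It assumes the polar `Point` of the earlier example; the names `x_get`, `x_set`, `y_get`, `y_set`, and the helper `store` are hypothetical stand-ins for the proposed `operator.x` and `operator.y` methods, and the way each setter preserves the other coordinate is one possible choice, not something the proposal prescribes.

```cpp
#include <cassert>
#include <cmath>

// Polar-coordinate Point exposing Cartesian pseudo-fields through
// ordinary methods (stand-ins for operator.x / operator.y).
class Point {
public:
    double angle = 0.0;
    double distance = 0.0;

    double x_get() const { return distance * std::cos(angle); }
    double y_get() const { return distance * std::sin(angle); }

    // Setting one coordinate keeps the other one unchanged.
    double x_set(double x) { store(x, y_get()); return x; }
    double y_set(double y) { store(x_get(), y); return y; }

private:
    // Convert a Cartesian pair back to the polar representation.
    void store(double x, double y) {
        distance = std::hypot(x, y);
        angle = std::atan2(y, x);
    }
};

// Hand-transformed version of:
//     Point a; a.x = 3; a.y = 4; return a.x + a.y;
double demo_transformation() {
    Point a;
    (a).x_set(3.0);                  // a.x = 3
    (a).y_set(4.0);                  // a.y = 4
    // The polar representation now holds a distance of about 5.
    assert(a.distance > 4.9 && a.distance < 5.1);
    return (a).x_get() + (a).y_get();    // a.x + a.y
}
```

Client code written against the pseudo-fields would not change if `Point` later reverted to storing `x` and `y` directly; only the accessor bodies would.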
Pointers to fields can be accommodated at the expense of an additional level of memory indirection for reading or writing field values through pointers. Since the same field pointer can be used in both lvalue and other contexts, for a class C, a pointer ap to a pseudo-field a of type T must provide information for both accessor functions aget (pointer to T operator.a()) and aset (pointer to T operator.a(T)). In addition, the same pointer should also allow accessing fields that do not have any defined accessor methods. This second requirement can be trivially satisfied by having the compiler internally generate the accessor methods needed for all of the class's N fields. A two-dimensional table V[N][2] is then generated, with each row containing pointers to the respective field's aget and aset methods. A pointer to a field is the offset of that field's accessor method pointers within the table. Every time a field is accessed through that pointer as an lvalue, the pointer to aset is obtained from the second table column; in all other cases the pointer to aget is obtained from the first table column. Note that the field access overhead is only incurred when fields of classes containing overloaded accessor methods are accessed through field pointers.
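The table-based scheme can be approximated in standard C++ with pointers to member functions. This is a minimal sketch under several assumptions: the accessors are ordinary named methods rather than the proposed operators, the table covers only two of the class's fields, each row is a small struct (`AccessorRow`) because the getter and setter pointers have different types, and a field pointer (`FieldPtr`) is modelled as a row index. All of these names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

class Point {
public:
    double angle = 0.0;
    double distance = 0.0;

    // Trivial accessors of the kind a compiler would generate for a
    // real field that has no user-defined accessor methods.
    double distance_get() { return distance; }
    double distance_set(double v) { distance = v; return v; }

    // User-written accessors for the pseudo-field x.
    double x_get() { return distance * std::cos(angle); }
    double x_set(double x) {
        const double y = distance * std::sin(angle);
        distance = std::hypot(x, y);
        angle = std::atan2(y, x);
        return x;
    }
};

// One table row per field: getter in the first column, setter in the second.
struct AccessorRow {
    double (Point::*get)();
    double (Point::*set)(double);
};

const AccessorRow V[] = {
    { &Point::distance_get, &Point::distance_set },
    { &Point::x_get,        &Point::x_set        },
};
const std::size_t kFieldCount = sizeof(V) / sizeof(V[0]);

// A field pointer is modelled as the index of the field's row in V.
typedef std::size_t FieldPtr;
const FieldPtr kDistance = 0;
const FieldPtr kX = 1;

// Reading through a field pointer dispatches to the getter column.
double read_through(Point& p, FieldPtr f) {
    assert(f < kFieldCount);
    return (p.*V[f].get)();
}

// Writing through a field pointer dispatches to the setter column.
double write_through(Point& p, FieldPtr f, double value) {
    assert(f < kFieldCount);
    return (p.*V[f].set)(value);
}
```

A caller holding only a `FieldPtr` cannot tell whether it designates a real field or a pseudo-field: `write_through(p, kX, 5.0)` followed by `read_through(p, kDistance)` goes through the table in both cases, which is the extra level of indirection described above.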