http://www.spinellis.gr/pubs/jrnl/2001-SPANDE-VUFC/html/vufc.html This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft using the following reference:
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Patision 76, GR-104 34 Athens, Greece
email: dds@aueb.gr
September, 2001
A visual programming language can be informally defined as a programming language with a syntax that includes visual expressions such as diagrams, free-hand sketches, icons, or graphical manipulations [1]. Visual programming approaches aim towards easing the programming learning curve or enhancing programming productivity. Their adoption was based on a number of premises [2]:
However, empirical studies did not find visual programming inherently superior to text-based programs; the extent to which a given notation is suitable for expressing a particular task depends on the context in which the language is employed [3,4,5]. Early work on visual programming, although promising when applied to "toy" problems, ran into difficulties when the methods were tried on problems of realistic size. Two approaches for alleviating this problem were followed. A number of researchers applied visual programming languages to limited or domain-specific parts of software development such as GUI programming, the graphic depiction of data structure behaviour, or the combination of textually programmed units to build new programs [1]. Others proposed the design of visual programming languages based on formalisms used by existing, standard, component-based visual engineering languages such as the ITU-T MSC standard for message sequence charts, or the IEC-1131 standard for function block languages [6].
In our approach we capitalise on the strength of existing GUI-builder IDEs by crafting industry-standard software components that can be used within the GUI-designer of the IDE to perform data-flow-oriented visual programming. Although we demonstrate our method in the context of a specific domain (the visual composition of data-flow pipelines), the same underlying principles can be applied to different visual programming domains.
We define a software component as a unit of composition with contractually specified interfaces and explicit context dependencies only; one that can be deployed independently and is subject to third-party composition [7]. Components, in common with objects, encapsulate state, allow access to it through separately described interfaces, and support modular design based on separation of concerns. However, components differ from objects in a number of ways: they can be implemented in different languages, they are often packaged in binary containers, they can encapsulate multiple objects, and are typically more robustly packaged than objects [8]. Components that support visual composition implement a set of interfaces defined by the visual programming environment that supports their use. These interfaces allow the visual placement of components on forms, the handling of user-input events, and the persistent specification of their properties at program design time. Two widespread families of visual components are JavaBeans [9] and ActiveX controls [10].
Visual components are supported by a number of IDEs. Typical examples are Visual Basic (VB), Delphi, JBuilder, Latte, and Visual Café. Although ActiveX controls, JavaBeans, and their respective environments are marketed as "visual programming" technologies, in practice, their visual dimension is limited to the design and implementation of graphical user interfaces (GUIs). A number of systems such as Khoros/Cantata [11,12] and LabVIEW [13] support component-based visual programming - also referred to as coarse-grained visual programming - using specialised components and a corresponding dedicated development environment. Now, the availability of sophisticated GUI-builder IDEs and the respective component development frameworks provides us with an alternative approach for implementing coarse-grained visual programming: one based on widely deployed, industry-standard platforms.
In the following sections we present how specially-designed reflective components can be used in an industry-standard visual programming environment to visually specify sophisticated data transformation pipelines that interact with GUI elements. The remainder of this paper is structured as follows: Section * discusses our selection of components and their execution environment; Section * presents how the controls we have implemented support visual programming; in Section * we analyse the design and implementation of the visual controls, in Section * we present how the control creation process can be automated, in Section * we describe an exemplar application utilising visual programming controls, and in Section * we evaluate the approach we propose. Section * concludes the paper with directions for further work.
The elements of our approach: visual programming, component-based development, a graphical front-end to Unix tools, data-flow visual languages, and pipe and filter architectures have been extensively studied. See for example the references [2,14,1,15,6] (discussing visual programming and providing examples of specific approaches), [16,17,18,3] (discussing graphical Unix-tool front ends), [19,20,21,7,22] (discussing component-based development), [17,23,24,25,26,27] (discussing visual data-flow approaches), and [28,29] (discussing pipe and filter architectures). The main contributions of this paper are: the proposal to use standard GUI builders as visual programming environments by means of specially crafted reflective components, the idea that GUI builders can be used to configure sophisticated component interactions and deployment scenarios, and the demonstration of visual programming and GUI interfacing utilising the well-known Unix text processing tools encapsulated as ActiveX components.
A visual component (an ActiveX control or a JavaBean) is characterised by properties, events, and methods. Properties affect the appearance (e.g. the background colour), functionality, or state of the control or bean. Events are notifications sent by the controls or beans to let programs using them know that a specific action (such as a mouse click) has occurred. Methods are other functions that can be called from within the program that hosts the control or bean to perform some processing. By packaging Unix tools as visual components we gain a number of benefits:
We decided to package the Unix tools as ActiveX controls, based on our previous experience with packaging them as COM (Component Object Model) objects and the availability of framework support for efficient control implementation. ActiveX controls can be used in a number of environments on Microsoft-Intel platforms, such as the VB IDE and Internet Explorer. Although a JavaBeans-based implementation could have resulted in highly portable components, this did not apply in our case, since most Unix tools are implemented in C and would not benefit from Java's binary-level portability. Our approach, however, is not Microsoft-specific; the ideas we present can be employed in any environment that supports visual components with basic reflective capabilities.
Our approach is based on three types of visual components:
The way VUFC components are visualised on the IDE design canvas is shown in Figure 1. The T (pipe split) component and the controls on its left move data in the right-to-left direction; they are the only ones where the port assignment properties had to be manually specified.
We created the visual components in a two-step process. We first mined and packaged Unix tools, connectors, and glue as (non-visual) COM components. We then packaged the COM components as ActiveX controls. The architecture of the resulting components is illustrated in Figure 2.
Implementing COM components from scratch in C++ is not trivial. Every component, in addition to its custom functionality, must support registration, an interface for creating component instances called IClassFactory, object creation, reference counting, the QueryInterface method, and, possibly, dual interfaces for supporting its use through C++ and automation-based scripting languages. Fortunately, these tasks are supported by the Microsoft Foundation Classes (MFC), a large, monolithic application framework for programming in Microsoft Windows, and by the Active Template Library (ATL), a leaner set of template-based classes that specifically target the development of COM components. We implemented the mined components using ATL. By aggressively utilising C++ templates and multiple inheritance, ATL supports the development of COM components with brevity and minimal runtime overhead. A bare-bones ATL-based COM component can be implemented in fewer than 100 lines, most of them automatically generated by a "wizard"-type tool. We therefore found ATL to be ideal for implementing the large number of Unix-mined components and used it as a basis to automate the task. The full details of this operation are described in reference [32].
The second step of the visual component implementation involved the addition of visual component functionality to extend COM objects to fully fledged ActiveX controls. We initially attempted to provide this functionality by extending the ATL-based implementation of each component, but were quickly overwhelmed by the complexity of the task. We found that the addition of a single property to a control required non-trivial code and data additions to a number of C++, IDL (interface definition language), and header files. More importantly, more than 100 lines of inscrutable and unmaintainable code were needed to provide neighbour control enumeration, a functionality critical for our needs. We then experimented with the ActiveX control creation facilities of the VB IDE and found the operation reasonably streamlined and programmer-friendly. We retained the ATL-based components as a basis for our work, because their implementation was based on a number of system-level facilities (such as threads, pipes, non-blocking I/O, and handle cloning) that are not available in the VB environment. Our visual component architecture therefore involves wrapping each COM component within an ActiveX visual control, plus one additional co-ordinating component implemented from scratch in VB.
The stability of our pre-packaged COM components, in conjunction with the quick edit-run cycle, the easy access to component properties, and the high-level facilities for graphics programming provided by VB allowed us to implement the full functionality we required in a fraction of the time of the previous ATL-based aborted attempt.
The ability of the components to connect themselves together, depending on their location on the IDE design canvas, is based on their reflective capabilities, i.e. their ability to examine their own location and that of their neighbours.
The foundation for the concept of reflection is Smith's reflection hypothesis [33], which states that reflective programs can access, reason about, or alter their interpretation. In our case, ActiveX controls have properties, available both at design and at run time, that can be used to access their location and enumerate the other controls that exist on the same canvas. We exploit this reflective capability to link the visual representation of the controls with their operational behaviour.
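The idea of deriving connections from canvas positions can be illustrated with a minimal Python sketch. It is not the VB implementation described in this paper: the `Control` class, the `right_neighbour` and `plumb` functions, the `max_gap` threshold, and the adjacency rule itself (nearest control to the right that overlaps vertically) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Control:
    """Hypothetical stand-in for a visual control placed on a design canvas."""
    name: str
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def right_neighbour(control: Control, canvas: List[Control],
                    max_gap: int = 50) -> Optional[Control]:
    """Return the nearest control to the right that overlaps vertically.

    The adjacency rule and the max_gap threshold are assumptions.
    """
    candidates = [
        other for other in canvas
        if other is not control
        and other.left >= control.right              # lies to the right
        and other.left - control.right <= max_gap    # close enough
        and other.top < control.bottom               # vertical overlap
        and control.top < other.bottom
    ]
    return min(candidates, key=lambda other: other.left, default=None)


def plumb(canvas: List[Control]) -> List[Tuple[str, str]]:
    """Enumerate the canvas and return (source, sink) name pairs."""
    links = []
    for control in sorted(canvas, key=lambda c: c.left):
        neighbour = right_neighbour(control, canvas)
        if neighbour is not None:
            links.append((control.name, neighbour.name))
    return links


canvas = [
    Control("sort", left=120, top=10, width=80, height=40),
    Control("tr", left=10, top=10, width=80, height=40),
    Control("uniq", left=230, top=20, width=80, height=40),
    Control("label", left=10, top=200, width=80, height=40),  # not in the row
]
links = plumb(canvas)
print(links)
```

In this sketch each control inspects only geometry that the hosting environment already exposes, so the layout alone determines the data flow; the unrelated `label` control is ignored because it does not overlap vertically with any control to its right.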
All visual components support a number of common properties and methods and some component-specific properties as shown in Figure 3. The component-specific properties are used to provide details on how the control will perform its processing (e.g. MergeOnly, FoldCase, Reverse) and (in exceptional cases) to specify the visual flow of data (InPort, OutPort). The common properties (Height, Left, Width, Right, Enabled, Name, Visible) are provided and implemented by the VB IDE and are needed to support the visual dimension and programmability of the controls. These properties include the location, dimension, and visibility of each control. The common methods (Visualize(), Prepare(), Plumb(), Run(), Execution()) form an interface that all visual components in our framework must provide. They are used by the co-ordinating control to orchestrate the visualisation and interconnection of the visual control setup as a network of co-operating processes. The methods of this interface work as follows:
command "sort"
options {
    MergeOnly:bool:-m:False:Merge already sorted files, do not sort
    CheckOnly:bool:-c:False:Check if given files already sorted, do not sort
    Month:bool:-M:False:Compare (unknown) < `JAN' < ... < `DEC', imply IgnoreBlanks
    IgnoreBlanks:bool:-b:False:Ignore leading blanks in sort fields or keys
    TmpDirectory:string:-T:"":Use specified directory for temporary files, not $TMPDIR or /tmp
    FoldCase:bool:-f:False:Fold lower case to upper case characters in keys
    Alphanumeric:bool:-d:False:Consider only [a-zA-Z0-9 ] characters in keys
    ASCII:bool:-i:False:Consider only [\040-\0176] characters in keys
    NumericCompare:bool:-n:False:Compare according to string numerical value, imply IgnoreBlanks
    OutputFile:string:-o:"":Write result on file instead of standard output
    Reverse:bool:-r:False:Reverse the result of comparisons
    StableSort:bool:-s:False:Stabilize sort by disabling last resort comparison
    FieldSeparator:string:-t:"":Use separator instead of non- to whitespace transition
    Unique:bool:-u:False:Only output the first of an equal sequence (with CheckOnly check for strict ordering)
}
Although the plethora of available filter-style tools provided us with a rich selection of candidates for encapsulation as visual controls, it made us realise that the act of encapsulation is a labour-intensive and time-consuming task. Following an earlier experience in automating the packaging of Unix tools as COM objects [32], we defined a process and implemented support tools for automating the creation of filter-style VUFC controls. Specifically, for every filter that is to be converted into a control, one has to define the syntax and semantics of the tool's command-line options using a small domain-specific language [34]. An example of this description for the sort filter is depicted in Figure 4. For every filter command-line option (e.g. -r) one specifies:
A small compiler, implemented in Perl [35], compiles the declarative description of the filter interface into a VB control definition source file that implements the respective component (e.g. VUFCsort). The code contains:
The component also inherits (due to VB limitations, by source code reuse) and exposes as properties the common methods and properties of the standard VUFC interface. An example of how the properties of the automatically created VUFCsort component appear in the VB IDE can be seen in Figure 5. Connector and glue-type components still need to be written by hand, but the effort required to implement them is only a small part of the effort that would be required to repackage the large number of filter-style programs without an automated process.
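To make the role of the option descriptions more concrete, the following Python sketch parses entries written in the colon-separated form used for the sort filter and turns a set of property values into a command line. It is not the Perl compiler described above and it does not generate VB code. The reading of the five fields as property name, type, flag, default value, and help text is an assumption based on the sort example, and `Option`, `parse_option`, and `build_argv` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Option:
    """One option entry; the field meanings are an assumed reading."""
    name: str                  # property name, e.g. Reverse
    kind: str                  # "bool" or "string"
    flag: str                  # command-line flag, e.g. -r
    default: Union[bool, str]  # default property value
    help: str                  # descriptive text


def parse_option(line: str) -> Option:
    """Parse 'Name:type:flag:default:description' into an Option."""
    # Split on the first four colons only; the description may contain more.
    name, kind, flag, default, help_text = line.strip().split(":", 4)
    if kind == "bool":
        value: Union[bool, str] = default == "True"
    elif kind == "string":
        value = default.strip('"')
    else:
        raise ValueError(f"unsupported option type: {kind}")
    return Option(name, kind, flag, value, help_text)


def build_argv(command: str, options: List[Option],
               settings: Dict[str, Union[bool, str]]) -> List[str]:
    """Translate property values into an argument vector for the tool."""
    argv = [command]
    for opt in options:
        value = settings.get(opt.name, opt.default)
        if opt.kind == "bool":
            if value:
                argv.append(opt.flag)
        elif value != "":
            argv.extend([opt.flag, str(value)])
    return argv


SPEC = """\
Reverse:bool:-r:False:Reverse the result of comparisons
Unique:bool:-u:False:Only output the first of an equal sequence
FieldSeparator:string:-t:"":Use separator instead of non- to whitespace transition
"""

options = [parse_option(line) for line in SPEC.splitlines() if line.strip()]
argv = build_argv("sort", options, {"Reverse": True, "FieldSeparator": ","})
print(argv)
```

Properties left at their default values contribute nothing to the argument vector, which mirrors the way unset command-line options are simply omitted when a filter is invoked.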
To demonstrate the viability of our approach we have designed and implemented a simple GUI-based spell-checker based on a pipeline of ActiveX controls. Figure 6 depicts the UML component interaction diagram of the spell-checker. The text to be spell-checked is retrieved from the GUI edit box using a text box source glue component. It is transformed into a list of words using the translate component, which is a direct equivalent of the Unix tr command. The word list is then transformed into a sorted list of unique words using the sort and unique components, which correspond to the Unix sort and uniq commands. Finally, the sorted stream of words to be spell-checked and a dictionary of all acceptable words are processed by common - derived from the Unix comm command - which outputs the errors: a list of words contained in the first stream and not contained in the second one. This stream of misspelled words is cloned into two streams using T, the equivalent of the Unix tee command. One stream is sent, using the list box sink glue component, to a GUI list box. The other is passed to the wordcount component to count the number of words contained in it and then, via an edit box sink, to a GUI text box. It is important to note that the integration of GUI elements as parts of the pipeline and the cloning of a data stream cannot be implemented using the standard Unix linear pipeline system.
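The data flow of the spell-checker can be traced with a small Python sketch in which each stage is an ordinary function. This only imitates the behaviour of the components; it involves no ActiveX controls or Unix processes. The function names are hypothetical, and folding the text to lower case before comparison is an assumption that the description above does not make.

```python
import re
from typing import Iterable, List, Tuple


def translate(text: str) -> List[str]:
    """Stand-in for tr: break the text into one word per record.

    Lower-casing is an assumption made for this sketch.
    """
    return re.findall(r"[a-z]+", text.lower())


def sort_unique(words: Iterable[str]) -> List[str]:
    """Stand-in for the sort and unique stages."""
    return sorted(set(words))


def common(words: Iterable[str], dictionary: Iterable[str]) -> List[str]:
    """Stand-in for comm: keep words in the first stream only."""
    known = set(dictionary)
    return [word for word in words if word not in known]


def tee(stream: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Stand-in for the T component: clone a stream into two."""
    items = list(stream)
    return items, list(items)


def spell_check(text: str, dictionary: Iterable[str]) -> Tuple[List[str], int]:
    """Run the pipeline; return the list-box content and the error count."""
    errors = common(sort_unique(translate(text)), dictionary)
    to_list_box, to_word_count = tee(errors)
    return to_list_box, len(to_word_count)


dictionary = ["brown", "fox", "jumps", "quick", "the"]
misspelled, count = spell_check("The quick brwn fox jumpz. The fox!", dictionary)
print(misspelled, count)
```

The `tee` step is the point at which the design departs from a linear shell pipeline: the same stream of errors feeds both the list box sink and the word count.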
We implemented the GUI-based spelling checker using VUFC following the design we outlined. The implementation consisted of:
The implementation and resultant user-interface are depicted in Figure 7. The visual components and their interconnections need not be visible to the end-user; they are included in the figure to demonstrate the pronounced similarity between the design of the spelling checker and its visual implementation.
Compared to a spelling checker implemented using a linear pipeline in the Unix environment, our component-based implementation offers the following enhancements:
In addition, the application was implemented using a typed and modular language in a rich integrated development environment offering a purely graphical method for interconnecting the components, a syntax-aware editor, a sophisticated debug facility, a graphical interface builder, integrated help facilities, and source-code management. Third-party tools also provide support for profiling, automated source code examination, and browsing facilities. This level of support does not exist in Unix-based shell-programming approaches and is very difficult to obtain in visual programming environments implemented from scratch.
Modern IDEs provide a mature development environment that enhances programmer productivity in a number of ways. As an example, Visual Basic provides a sophisticated graphical editor, a rich mechanism for specifying component properties (categorised by function, alphabetically, or grouped on a form; all with two levels of help material), a syntax-aware program editor (with keyword colouring, real-time function argument prompting and checking, and multiple levels of undo), a debugger, and an integrated object browser. In addition, programmers (including those relying on visual programming components) profit from the facilities provided for interacting with the environment. Specifically, environments based on Java 2 or Microsoft's .net architecture offer classes supporting most common GUI elements, database connectivity, major networking protocols, email and HTTP transactions, multimedia data, localisation and internationalisation, object serialisation, directory services, and distributed computations.
Furthermore, major existing IDEs benefit from a large installed user base that in turn results in programmer familiarity with the environment, and a wide choice and availability of supplementary documentation, professional magazines, conferences and exhibitions, consulting services, and training courses. The large user base has also resulted in the blossoming of an add-on industry providing components and tools. An illustrative case is Component Source - a major commercial vendor of ActiveX components - that provides over 2300 components in its January 2001 CD. In addition, many design, source code control, and testing tools can be seamlessly integrated into mainstream IDEs.
Finally, visual components targeting industry-standard IDEs can be crafted using existing rich component frameworks and technologies. In our case we were able to use Unix components already encapsulated in a COM framework and experiment with two different technologies for creating ActiveX controls, one based on the C++ Active Template Library (ATL), and one based on the ability of Visual Basic to create ActiveX controls.
However, the approach we propose is not without problems. The most important potential problem comes from the clash with the underlying programming model supported by the IDE (e.g. Visual Basic in our case) that can make it difficult to express the semantics of the combined system or reason about the resulting artefact. In the specific application we described, the data-flow paradigm of the visual language is orthogonal to the native imperative / object-oriented paradigm supported by Visual Basic. Had we developed visual programming components for expressing imperative constructs such as assignments and loops, the interactions between the two could potentially become formidable sources of confusion.
In addition, the editor used for the GUI design is not optimised for visual programming. As an example, components with common semantic (as opposed to topological) properties cannot be selected and manipulated as a group while maintaining their visual connections, nor does the editor provide a way to draw arbitrary lines between components. At a deeper level, as the IDE is not designed for visual programming, notions such as modularity, encapsulation, and abstraction are not inherently available in a visual form. They can sometimes be accommodated by utilising GUI elements such as forms and frames, at the cost, however, of an unnatural expression style. Similarly, the connection between visual elements located on different forms cannot be expressed in a visual way, but has to rely on textual representations such as object name bindings.
Finally, the specific implementation we have demonstrated (the visual design of Unix-style pipelines) has some restrictions of its own that are not, however, limitations of the approach we propose. The data passed between the components consists of strings, while other visual programming systems like Khoros/Cantata, AVS/Express, and LabVIEW allow for complex structures to be passed between components. Although in theory pipelines can be used to pass binary data and complex structures, Unix systems customarily pass records delimited by newlines and delimit fields by whitespace or another special character. This practice is not overly restrictive: it has served many data processing tasks admirably [36]; furthermore, more complex data structures can always be expressed as character strings using XML. One additional restriction of our implementation might appear to be the absence of feedback loops, again a feature of the visual programming systems mentioned above. While the topologies supported by our implementation can express loops, most Unix filters only accept a single input source - a design choice influenced by the more restrictive linear topology of the pipelines that can be expressed in the common Unix shells - and cannot therefore be easily connected into a loop structure.
In its current form, VUFC can be used for production work, but it can also be extended and improved in a number of ways. Currently the user is responsible for the correct layout of the visual components. An interesting enhancement would be the provision of a verification step after the pipeline plumbing phase to detect obvious errors in port connections. This verification could be based on a type system for component ports [37,38], preferably providing edit-time feedback using a static type inference system [39]. A related improvement concerns the topologies that can be implemented. Some existing filters accept only files for input and output and not arbitrary data streams. We are experimenting with the use of named pipes to equip such filters with connectable streams.
The most significant future work will however involve the exploration and extension of the visual part of the functionality. As an example, the operation of the pipeline at runtime can be visualised by colouring the components to indicate their level of activity. Such a facility can be used for debugging or diagnostic purposes. In addition, more sophisticated connector components can be designed to provide greater expressive freedom to the visual designer. Passive connectors could be used to propagate port assignments across non-rectangular regions, while split connectors could be used for functionally dividing a complex design into multiple forms in a manner analogous to the current practice in multi-sheet electronic circuit diagrams. Finally, the application domain of our approach can be extended in a number of directions; pipelines are not the only programming artefact that can be efficiently expressed in a visual way. Component deployment in distributed applications is one obvious target; others surely exist and wait to be explored.