Special Edition Using Microsoft® Visual Studio for Enterprise Development



- 20 -
Clients, Servers, and Components: Design Strategies for Distributed Applications

by Larry Millett

View an application as a set of services provided by interacting components. This paradigm transcends two-tier, three-tier, or multitiered implementation architectures.
Although all applications must fulfill specific business requirements, distributed applications as a class have several common design objectives.
Identify common constraints that limit design choices in a distributed application.
Review the three basic strategies for concurrent processing: parallel, pipeline, and asynchronous.
Compare the traditional two-tier client/server approach with the three-tier approach. Consider when each model might be most appropriate.
Read about a strategy for designing distributed applications.

Every year on the Fourth of July, the Boston Pops Orchestra plays a free concert at the Esplanade, culminating in the 1812 Overture. Tchaikovsky's score includes church bells and cannons, rendered by a number of churches in Boston and an artillery detachment at Bunker Hill. Because of their distance from the Esplanade, the cannons must fire and the bells must ring several seconds before the orchestra reaches that point in the score. In fact, because the churches and guns are at varying distances, each is subject to a different delay. Gun crews and bell ringers watch the orchestra on television, and when the performance reaches a predetermined point, each plays their part. Listeners at the Esplanade hear bells ringing all together and cannons firing right on cue with the orchestra. You can consider the 1812 Overture a distributed application, with the orchestra, the artillery, and each belfry as components. As is typical with most distributed software, success depends greatly on component interaction. It all works well together because the constraints are simple (the tempo of the performance, the speed of sound, and the relative distances) and roles and interactions are well defined. The bells and cannons do not try to play in time or in close harmony with the orchestra; rather, they aim for subsecond accuracy in their timing. So long as each component performs well and the timing is correct, the bells and cannons produce a delightfully bombastic effect.

Like the 1812 Overture, a distributed application integrates the actions of many components. Distributed application design must focus not only on the details of individual parts, but also on making the distributed components work smoothly together in concert.

This chapter begins with a brief review of design objectives for a distributed system and a review of constraints that sometimes make the objectives difficult to achieve. The balance of the chapter presents design strategies.

Design Objectives for a Distributed Application

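A properly designed application fulfills a well-defined business need. In addition, the best distributed applications exhibit several desirable features: good performance, efficiency, scalability, security, fault tolerance, verifiability, and maintainability.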

Achieving all these properties for any application requires substantial application of talent, discipline, and experience throughout the development process. Achieving these properties in a distributed application requires careful and sophisticated design, with particular attention to the interactions between components.

The following sections discuss these design objectives in more detail.

Performance

Application performance is the number one criterion for system quality. System design, architecture, and implementation may be flawless, but if users spend too much time waiting for results, they will be dissatisfied. Applications may perform quite differently in production than in a development environment, due to different network configurations and loads.

It's important to identify specific criteria for application performance (usually in a requirements document). Without objective criteria, performance will be judged subjectively. Important criteria include response time (time to complete one operation), apparent response time (time a user must wait after executing an operation), and throughput (data processed per unit time). Ideally, separate performance criteria should address the best case (no other application active on the network, single user) and the worst case (very heavy network and application loads, many users). Performance criteria should always be stated for the common case (typical application and network loads).
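Objective criteria are only useful if test runs are measured against them the same way every time. The following Python sketch shows one way to reduce a run to the chapter's measures: response time (average and worst observed) and throughput. It is a sketch under stated assumptions, not a prescribed tool:

- Throughput is counted here as operations per second rather than data volume.
- The thresholds passed to `meets_requirement` are hypothetical values standing in for figures from a requirements document.

```python
from statistics import mean


def summarize(latencies_s, window_s):
    """Summarize one measurement run.

    latencies_s: time (seconds) each completed operation took.
    window_s:    length (seconds) of the measurement window.
    """
    return {
        # Response time: time to complete one operation (average and worst seen).
        "mean_response_s": mean(latencies_s),
        "max_response_s": max(latencies_s),
        # Throughput: operations completed per unit time across the window.
        "throughput_per_s": len(latencies_s) / window_s,
    }


def meets_requirement(summary, max_mean_response_s, min_throughput_per_s):
    """Compare a run with objective (hypothetical) criteria from a requirements document."""
    return (summary["mean_response_s"] <= max_mean_response_s
            and summary["throughput_per_s"] >= min_throughput_per_s)
```

Running the same summary under best-case, common-case, and worst-case loads gives one comparable set of numbers per scenario.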

Several studies have shown that people perceive response time under one-fifth of a second as instantaneous. Studies also show that response time over one second can have a negative effect on attitude and productivity. Sometimes, application constraints make subsecond response time simply unattainable; often, asynchronous processing can improve apparent response time.

Overall performance for a distributed application depends as much on component interactions as on individual component performance. Communication delays and resource sharing will require special attention.

Efficiency

A distributed application typically uses resources on several computers, as well as network resources. Each individual component must use local resources effectively; the application as a whole must use network resources frugally. Ultimately, inefficiency will manifest as poor performance or poor scalability.

It's important to pursue the right efficiencies. Processor time and memory are cheap and abundant; bandwidth from Europe to North America is typically scarce and expensive. The anticipated growth path for the application will motivate tradeoffs.

Scalability

For a successful application deployed in a successful business, processing demands (number of users, transaction volume, database size) tend to increase. However, this volume typically grows chaotically. A well-designed application grows in an orderly manner to accommodate disorderly growth in demand. The growth plan may include deploying additional instances of some components, adding additional database storage, or redistributing processing tasks.

The best applications also scale down for cost-effective use by organizations with smaller processing demands (and fewer resources). For example, an order processing system designed for a corporate headquarters might use SQL Server for data management; branch offices might use a scaled-down version with a FoxPro database.

It's important to understand the relationship between resource usage and processing volume. Ideally, resource requirements should increase linearly with volume: If volume is n and usage for a particular resource is r, you would like to have r = kn, for some constant k. If for some resources r = kn² or worse, scalability is at risk.


NOTE: For some applications, linear resource usage (r = kn) is inherently unattainable. In those cases, load distribution through parallel processing (discussed later in this chapter) will improve scalability. Suppose usage grows as the square of volume, and the processing load is divided among three instances handling loads a, b, and c. Total resource usage for the three instances, k(a² + b² + c²), will be less than resource usage for a single instance handling the whole load, k(a + b + c)².
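The arithmetic in the note can be checked directly. This Python sketch assumes usage grows as r = kn² and uses a hypothetical split of 100 units of load into 40, 30, and 30:

```python
def usage(load, k=1.0):
    """Resource usage for one instance whose usage grows quadratically: r = k * n**2."""
    return k * load ** 2


def single_instance(total_load, k=1.0):
    """One instance handles everything: k * (a + b + c)**2."""
    return usage(total_load, k)


def split_instances(loads, k=1.0):
    """Load divided among several instances: k * (a**2 + b**2 + c**2)."""
    return sum(usage(n, k) for n in loads)


loads = [40, 30, 30]  # hypothetical loads a, b, c
print(single_instance(sum(loads)))  # 10000.0
print(split_instances(loads))       # 3400.0
```

The saving comes from the cross terms (2ab + 2bc + 2ac) that disappear when the load is split. It holds for any positive loads, though coordination overhead (discussed under concurrent processing) eats into it in practice.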

Security

Because a distributed application crosses system boundaries, security becomes more important and more difficult. Generally, each component will execute under a local account context. Remote components, however, need to connect and be provided or denied services as appropriate.

When a distributed application runs entirely within a Windows NT network, Windows NT's integrated security is the best solution. More comprehensive security solutions may include certificate servers or the distributed security service of Microsoft Transaction Server.


ON THE WEB:You can find information on programming with the Microsoft Certificate Server at http://premium.microsoft.com/msdn/library/sdkdoc/appprog_8vjm.htm.

Fault Tolerance

Components in a distributed application must cope not only with local failures, but also with failures in remote components. The number of potential errors mounts up in a frighteningly exponential way. A well-designed distributed application must degrade gracefully in the event of errors; the best distributed applications automatically recover. A well-designed component manages the effects of a fault locally, rather than propagating the error to other parts of a system.

For efficient administration, a distributed system should pursue a consistent error notification strategy. For example, several components might share a common log file. The Windows NT Event Log is another excellent resource for error notification.
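The two ideas above (handle faults locally, and report them to one common place) can be sketched in Python. This is an illustration under assumptions, not the chapter's method:

- Python's `logging` module with a shared file in the temp directory stands in for a common log file or the Windows NT Event Log.
- `fetch_exchange_rate`, the remote call, and the cache are all hypothetical.

When the remote component fails, the caller logs the fault and falls back to its last known value instead of passing the error upstream.

```python
import logging
import os
import tempfile

# One shared destination for every component's error notifications
# (hypothetical file name; a production system might use the NT Event Log instead).
LOG_PATH = os.path.join(tempfile.gettempdir(), "distributed_app.log")
_handler = logging.FileHandler(LOG_PATH)
_handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
_app_logger = logging.getLogger("distributed_app")
_app_logger.addHandler(_handler)
_app_logger.setLevel(logging.WARNING)


class RemoteServiceError(Exception):
    """Stands in for any failure reported by a remote component."""


def fetch_exchange_rate(currency, remote_call, cached_rates):
    """Ask a (hypothetical) remote component for a rate; degrade to a cached value on failure.

    Returns (rate, source), where source is "live" or "cached".
    """
    log = logging.getLogger("distributed_app.pricing")  # child logger shares the handler
    try:
        rate = remote_call(currency)
        cached_rates[currency] = rate  # refresh the fallback copy
        return rate, "live"
    except RemoteServiceError as exc:
        # Handle the fault locally: record it in the common log and fall back.
        log.warning("rate service failed for %s: %s; using cached rate", currency, exc)
        if currency in cached_rates:
            return cached_rates[currency], "cached"
        raise  # no safe fallback exists, so the caller must decide
```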


TIP: Microsoft Visual Basic 5.0 (VB5) includes a new feature for easily adding entries to the Windows NT Application Log: the LogEvent method of the App object. However, App.LogEvent only works in a compiled application; in debug mode, it will appear to have no effect.
Search VB5 help for the LogEvent method of the App object.


NOTE: For more information about the Windows NT Event Logs, see Special Edition Using Microsoft BackOffice, Volume 1, from Que, ISBN 0-7897-1142-7, Chapter 5, "Checking the Logs," on p. 151.

Verifiability

Testing a distributed system is far more complex than testing a monolithic application. Effective quality assurance (QA) requires participation from the earliest stages of the development process. Effective testing requires a parallel test environment that matches as closely as possible the production environment.

The most important design features for verifiability are clear separation of services and well-documented component interfaces. Separation of services is discussed later in the section "Designing a Distributed Application." Microsoft's Visual Basic help files provide a good example of well-documented interfaces. For example, see the documentation for the TextBox control.


TIP: Quality assurance (QA) for distributed applications is more complex than for traditional applications. QA staff should participate in the application design process so that they can develop an effective application test plan.

Maintainability

Changes in business result in changes to software, and the pace is accelerating. In a well-designed application, a change to a business requirement results in a change to a single component. To achieve this goal, however, you must encapsulate business logic in middle-tier components.

A complete design must address initial system deployment, ongoing operation, and distribution of updates. As components become more widely distributed, update distribution becomes more difficult to coordinate.

A component-based design is one of the best approaches for long-term maintainability. See the discussion of "The Services Paradigm" later in this chapter.

Design Constraints for a Distributed Application

Constraints are factors in the application environment that limit design choices. It's important to address these constraints in the application design. The 1812 Overture operated under constraints of geographic distribution, subsecond timing, and the speed of sound. Distributed software must operate under a number of common constraints: platform, bandwidth, resource contention, availability, audience, and political considerations.

Take a look at each type of constraint in more detail.

Platform

Typically, components of a distributed application must run on a variety of existing hardware under a number of existing operating systems connected by a patchwork of networks. Each component must perform well locally and interact correctly with remote components.

Target platform constraints also include support (or lack thereof) for interprocess communication (IPC), remote procedure calls (RPC), and object request broker (ORB) services. IPC provides local communication between applications running on a single computer; RPC allows an application running on one computer to interact with an application running on another computer. ORB services include object allocation, deallocation, and invocation. For applications running on Windows NT 4.0 or later (or Windows 95 with an update), Microsoft's Component Object Model (COM) and Distributed COM (DCOM) provide all three services. In fact, DCOM provides a good degree of location transparency; in many cases, components can be deployed locally or remotely with no code changes required.
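Location transparency is easiest to see when the same client code works against a local object and a remote proxy. In this Python sketch, the standard library's XML-RPC modules stand in for DCOM. This is an analogy only; it is not how COM marshals calls. `convert_currency` and the port choice are hypothetical. `client_code` never learns whether its calls are executed in-process or marshaled over HTTP to a server.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def convert_currency(amount, rate):
    """Hypothetical business service: apply an exchange rate to an amount."""
    return round(amount * rate, 2)


class LocalService:
    """In-process implementation of the service (plain function call, no marshaling)."""

    def convert_currency(self, amount, rate):
        return convert_currency(amount, rate)


def start_server(host="127.0.0.1", port=0):
    """Expose the same service over XML-RPC; port 0 asks the OS for any free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(convert_currency, "convert_currency")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_service(server):
    """Build a client-side proxy that marshals each call as an HTTP request."""
    host, port = server.server_address[:2]
    return ServerProxy(f"http://{host}:{port}/")


def client_code(service):
    # The caller neither knows nor cares whether `service` is local or remote.
    return service.convert_currency(100.0, 0.9)
```

A typical session calls `start_server()` and then passes either `LocalService()` or `remote_service(server)` to `client_code`. Finally, it calls `server.shutdown()`.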

If some components must run on non-Windows platforms, some extra effort will be required. Basically, you will have three choices:


ON THE WEB:Detailed information on CORBA can be found on the Object Management Group's web site at www.omg.org.

Microsoft is working to expand the alternatives available for interaction with non-Windows components. This is one of the primary objectives for their Universal Data Access initiative. Technologies nearing release include Cedar (LU 6.2 interface to IMS and CICS transaction programs), OLE DB/DDM (access to VSAM and AS400 data sets) and Host Data Replicator (SQL Server replication to DB2).


ON THE WEB:For the latest information on Microsoft's Universal Data Access technologies, go to http://www.microsoft.com/data.

Bandwidth

A distributed application is, by definition, a communicating application. Good design must take into account the speed and reliability of available communications links. For different cases, available bandwidth may vary from a few bits per minute (voice interactive response) to a few megabits per second (function calls to an in-process DLL).

During application design, it's important to consider patterns of bandwidth availability on the target network. For example, a gigabit Ethernet backbone may connect servers, whereas European users connect through a congested 56K connection. Some client machines might have a permanent network connection; others might connect infrequently by dial-up connections. It's especially important to consider bandwidth constraints when partitioning an application into components (see "Designing a Distributed Application" later in this chapter).

Resource Contention

Sharing resources between processes is one of the fundamental problems in computing. At the most basic level, processes compete for processor time, memory, disk, and network resources. At a higher level, shared resources include rows in database tables and services provided by other components.

Because database servers provide effective resource-sharing mechanisms, some developers tend to discount this problem in application design. However, resource contention can be an obstacle to scalability, so every effort must be made to minimize resource sharing. Where sharing cannot be eliminated, try to make it nonexclusive.
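One widely used way to make shared access to data nonexclusive is optimistic concurrency control. This is a technique we are naming here, not one the chapter prescribes. The Python sketch below uses a hypothetical in-memory `OptimisticTable` in place of a real database. Readers never wait for one another. An update succeeds only if the row's version number is unchanged since the caller read it. The internal lock is held only for the instant of the compare-and-swap, never across a user's think time.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    value: int
    version: int = 0


class OptimisticTable:
    """Hypothetical in-memory table illustrating optimistic (nonexclusive) sharing."""

    def __init__(self):
        self._rows = {}
        self._swap_lock = threading.Lock()  # protects only the brief compare-and-swap

    def insert(self, key, value):
        with self._swap_lock:
            self._rows[key] = Row(value)

    def read(self, key):
        """Return (value, version); no lock is held after the read returns."""
        row = self._rows[key]
        return row.value, row.version

    def update(self, key, new_value, expected_version):
        """Write only if nobody else has changed the row since it was read."""
        with self._swap_lock:
            row = self._rows[key]
            if row.version != expected_version:
                return False  # conflict: the caller should re-read and retry
            self._rows[key] = Row(new_value, row.version + 1)
            return True
```

The trade-off is that a conflicting writer must re-read and retry. This works well when conflicts are rare, which is exactly the case in which exclusive locks waste the most capacity.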

Availability

An application that must be available 24 hours a day, 7 days a week, without interruptions requires much more thorough design than a batch job system. Not all applications require long periods of uninterrupted availability; those that do present several special design challenges. First, some method must be found to maintain the system: apply patches and updates, install new devices, defragment disk space, and so on. Second, you must ensure that all resources required by the system (databases, routers, electricity, and so forth) are available on the same uninterruptible basis.

High-availability applications require attention to infrastructure details (backup power supply, redundant network paths), hardware details (redundant storage, hot standby, failover switches), and application features (automatic restart, administrator alert). The most important element in design of a high-availability application, however, is a careful analysis of possible failures and recovery options.

Audience

The audience for an application includes its users, its support staff, and its administrators. Design decisions must take into account the needs of each. For example, users might be working in a high-volume call center where extra seconds waiting for data may cost the enterprise thousands of dollars. This case requires special attention to response time. Consider also users' overall comfort level working with computers: A cashier in a shoe store has different expectations than a developer.

IT support staff require accurate, up-to-date user documentation. Trainers may also need access to requirements and design documents to develop a curriculum.

System administrators need to understand how to install, maintain, and operate the application. For example, they might need to maintain a remote installation by dial-up connections. This case requires attention to performance with limited network bandwidth. Administrators are often neglected when gathering application requirements; remember that the success of an application depends in large part on successful administration.

An application has at least three audiences: end users, support staff, and system administrators. End users require an appropriate user interface and snappy performance. Support staff require accurate user documentation. System administrators require compliance with security policies, predictable behavior, accurate system documentation, and straightforward installation.

Political

A distributed system often crosses organizational boundaries as well as system boundaries. Different organizations have different priorities and different expectations for the application. For example, an application developed in Department A may need access to a database owned by Department B. The application may be a crash priority for Department A but an annoying distraction for Department B. The project manager might win Department B's cooperation by developing a reusable component for looking up supplier information.

Compromises may be required in the use of existing components, design of reusable new components, or use of an enterprise data model. For example, the proposed database schema for a new application might duplicate some information already available in an enterprise database. Database administrators might insist on a modified schema. This may seem a compromise for the application at hand, but can ensure data consistency for the enterprise.

Political give and take is a soft skill often disdained by developers. Usually, it's simply a matter of listening carefully to other parties' priorities and finding a way to make your own priorities consistent. By showing respect for other points of view, you can gain respect for your own. Once you've earned some trust and respect, you can usually bypass political gamesmanship.

The Services Paradigm

The Component Object Model (COM) is fundamental to Microsoft's strategies for operating systems and tools. COM applications consist of services provided by components. A service is a group of related functions (an interface) that implements a business requirement; a component is an executable unit of software (EXE or DLL) that implements one or more services. Software design in this model consists of defining services and packaging them into components. Software development consists of building components and integrating them into applications. In documentation for Visual Studio 97 (VS97), Microsoft uses the terms services model and component-based interchangeably.

The services model provides an approach for turning business requirements into software components. Central to this approach is the idea of a three-tiered framework of user services, business services, and data services.

The three-tiered framework does not imply any physical implementation; a three-tier application may consist of seven components installed on two computers. Three-tier architecture is a modeling framework, a way of thinking about the services in an application. Grouping all three types of services into a single component results in a monolithic application. The traditional two-tier client/server approach generally does imply a physical implementation: a database component and a desktop component.

The services paradigm is particularly well suited to the development of distributed applications and can result in several benefits: easier maintenance, language independence, reuse, and reduced complexity and risk.

Implementing a service in one component simplifies maintenance when business requirements change. Code changes may often be limited to a single component. So long as the original interface for a service does not change, developers may add new functions or change the implementation of existing functions without breaking other components.

Because COM is a binary standard, it is language-neutral. So long as a component implements its services as advertised, it matters very little whether a component is built with Visual Basic (VB), Visual C++ (VC), Delphi, or Symantec C++.

The services approach generally results in components of a good granularity for reuse, particularly in the business services layer. Separating common business policies from application-specific user interactions makes the business services components more generic and better suited to reuse. Effective reuse, however, requires careful component design and developer awareness (developers seldom reuse code they don't know about). Also, a thriving ActiveX control market makes available a wide variety of shrink-wrap components.

Numerous studies have documented that complexity grows exponentially with the size of an application. The services paradigm results in smaller modules with well-defined interfaces. The reduced complexity of each module results in more effective quality assurance. Component reuse also reduces project risk.

Concurrent Processing Strategies

Concurrency can improve performance in a distributed application by putting multiple processors to work on a problem. Coordinating the processes requires some additional overhead, however, and adds design complexity. Few applications benefit from a degree of concurrency greater than about four processes. Still, even limited concurrency, judiciously applied, can greatly improve overall performance.

Two concurrency strategies improve throughput: parallel processing and pipelines. In parallel processing, multiple instances of the same application attack a common task. A pipelined application divides a task into stages and processes the different stages concurrently. The two strategies are quite compatible, often combined in processor designs. Intel's Pentium, for example, implements a five-stage pipeline architecture with parallel integer processing pipelines. Although a typical instruction requires several clock ticks to execute, pipelining allows the chip to complete one instruction at each clock cycle. The parallel architecture actually allows the Pentium to complete two instructions per clock cycle, under ideal conditions.

Asynchronous processing can be a very effective way to improve apparent user response time. The key is to recognize that response time often means time for a user to complete a task, rather than time for the system to complete a task. The general approach is for the user process to generate a request and place it into a queue for a separate background process. This type of processing is asynchronous because processes run concurrently without synchronization.

Parallel Processing

The classic example of parallel processing is matrix multiplication: A separate process computes each row in the result matrix (see Figure 20.1). This technique, known as vector processing, is the basis of a number of supercomputer designs.

FIG. 20.1
Matrix multiplication with parallel processes.
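The row-per-task decomposition in Figure 20.1 can be sketched with Python's `concurrent.futures`. This is a small illustration, not the chapter's code, and the worker count is an arbitrary assumption. In CPython, threads do not run pure-Python arithmetic on several CPUs at once. `ProcessPoolExecutor` (with top-level, picklable functions) would be the choice for real parallel speedup, but the decomposition is the same.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, b):
    """Compute one row of A x B: each entry is the dot product of `row` with a column of B."""
    return [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]


def parallel_matmul(a, b, workers=4):
    """Multiply A (m x n) by B (n x p), submitting one independent task per result row."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so each row lands in its proper position.
        return list(pool.map(lambda row: row_times_matrix(row, b), a))
```

Each task reads shared inputs but writes only its own row, which is why this problem parallelizes so cleanly.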

Parallel processing can work well when a problem decomposes into a number of virtually identical independent tasks. Symmetric multiprocessing applies the same idea to the general problem of executing computer programs, dividing the work among several identical processors.

Parallel processing also works well for a problem that decomposes into a number of relatively independent tasks. Examples include background spell-checking in Microsoft Word and flight control systems where a number of parallel processes monitor different instruments.


NOTE: Concurrent processing adds overhead to application execution. When several processors are available (multiple CPUs in one machine, or multiple machines), the performance gain justifies the overhead. With a single processor, concurrent processing can still improve performance for I/O-intensive tasks.

In Visual Studio, Visual C++ provides the best support for multithreaded programming. However, concurrency can also be achieved by running multiple instances of an application, by using out-of-process ActiveX components, and by running components on multiple computers.


The chief design problems for parallel architectures are resource sharing and synchronization. Identical by definition, parallel tasks often require access to the same resources (memory locations, database rows, I/O streams). Coordinating use of these resources adds design complexity and execution overhead. As a side effect of resource sharing, execution times can vary for each parallel subtask. Synchronizing task completion and the start of each new task adds even more overhead and complexity.

Pipeline Processing

A pipeline works like an assembly line: Each stage brings a task nearer to completion and passes it to the next. Work proceeds concurrently on each stage. If the pipeline has n stages, and each stage takes one second, the first task will finish in n seconds. One additional task will complete every second thereafter. Figure 20.2 illustrates a pipelined application.

FIG. 20.2
A three-stage pipelined application in which throughput will be one task per second.
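A pipeline like the one in Figure 20.2 can be sketched in Python with one thread per stage and a queue between neighboring stages. This is a minimal illustration; the stage functions in any real use would be hypothetical processing steps. Each stage takes an item from its inbox, processes it, and passes it downstream, so different items occupy different stages at the same time. A sentinel value flows through the pipeline to shut each stage down in order.

```python
import queue
import threading

_DONE = object()  # sentinel that tells each stage to shut down


def stage(func, inbox, outbox):
    """Run one pipeline stage: take an item, transform it, pass it along."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Connect one thread per stage with queues and feed `items` through in order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (out := queues[-1].get()) is not _DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10, str])` passes each number through three stages and returns `["20", "30", "40"]`.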

The chief design problem for pipeline architectures is dividing tasks into stages that require approximately equal processing time. Consider an n-stage pipeline whose stages advance in lockstep, where the slowest stage takes t seconds. Each item takes nt seconds to pass through the pipeline, and the pipeline can complete at most one item every t seconds. In other words, if you define throughput as number of tasks completed per second, overall throughput for the pipeline is limited to the throughput of the slowest stage. Slower stages become bottlenecks and require additional synchronization to keep the pipeline operating smoothly. Often, parallel processing at one or more stages can improve throughput. Figure 20.3 illustrates a pipeline with one parallel stage. Stage 1 and Stage 3 each require one second per task; Stage 2 requires 2 seconds per task. Two parallel instances yield an effective throughput of one task per second for Stage 2.


NOTE: It's important to distinguish throughput from response time. For the pipeline in Figure 20.3, throughput (tasks completed per second) will be one task per second. However, response time (time to complete one task) will be four seconds.
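The distinction in the note reduces to two formulas, sketched below in Python under a simplifying assumption: tasks never wait in a queue between stages. Throughput is set by the slowest stage after dividing by its number of parallel instances. Response time is the sum of the stage times.

```python
def pipeline_metrics(stages):
    """stages: list of (seconds_per_task, parallel_instances), one entry per stage.

    Assumes no queuing delay between stages.
    Returns (throughput in tasks per second, response time in seconds).
    """
    throughput = min(instances / seconds for seconds, instances in stages)
    response_time = sum(seconds for seconds, _ in stages)
    return throughput, response_time


# Figure 20.3: Stage 1 = 1 s, Stage 2 = 2 s with two instances, Stage 3 = 1 s
print(pipeline_metrics([(1, 1), (2, 2), (1, 1)]))  # (1.0, 4)
```

Removing the second Stage 2 instance leaves the response time at four seconds but halves throughput to 0.5 tasks per second. This is why parallelizing the bottleneck stage pays off.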

One simple example of a parallel/pipeline architecture is a printer pool. Often, printing is the last stage in a series of document processing tasks. Printers are rather slow devices, and an efficient document processing pipeline might easily overwhelm a single printer. By attaching several printers in a pool so that jobs are arbitrarily assigned to the first available printer, you implement a parallel stage in a pipeline.

FIG. 20.3
A hybrid parallel/pipeline architecture.
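The printer pool described above is a parallel stage in which each job goes to whichever worker is free first. The Python sketch below models that with several worker threads reading from one shared queue. The printers and documents are hypothetical, and "printing" is just recording which printer took which job.

```python
import queue
import threading


def printer(name, jobs, log, log_lock):
    """One printer in the pool: repeatedly take the next waiting job."""
    while True:
        job = jobs.get()
        if job is None:  # shutdown signal
            return
        with log_lock:
            log.append((name, job))  # stands in for actually printing


def print_with_pool(documents, printer_count=3):
    """Submit documents to a shared queue served by `printer_count` printers."""
    jobs = queue.Queue()
    log, log_lock = [], threading.Lock()
    printers = [
        threading.Thread(target=printer, args=(f"printer-{i}", jobs, log, log_lock))
        for i in range(printer_count)
    ]
    for p in printers:
        p.start()
    for doc in documents:
        jobs.put(doc)  # the first available printer picks this up
    for _ in printers:
        jobs.put(None)  # one shutdown signal per printer
    for p in printers:
        p.join()
    return log
```

Which printer handles which document varies from run to run. Only the guarantee that every document is printed exactly once matters.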

Asynchronous Processing

The background printing feature of Microsoft Word is an excellent example of asynchronous processing. If a document prints at six pages per minute, a 30-page report would require five minutes. By spinning off the printing as a separate task, control returns to the user almost immediately.

The primary design problem for asynchronous processing is error handling. First, the background process has to notify the user process that an error has occurred. Second, and more problematic, by the time an error occurs, the state of the user process has usually changed significantly. In Microsoft Word, for example, a user might have made a substantial number of revisions since printing. It's generally impractical to return to the state at the time of the original request and retry. For a Word user, it's probably acceptable to just print again from the current state of the document.
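The background-printing pattern and its error-handling problem can be sketched with Python's `concurrent.futures`. This is an illustration, not how Word works internally. `print_document` and `PrintError` are hypothetical. The request is queued and control returns to the caller at once. Later, a completion callback notifies the user of success or failure, without trying to roll the user back to the state at the time of the request.

```python
from concurrent.futures import ThreadPoolExecutor


class PrintError(Exception):
    """Hypothetical failure raised by the background job."""


def print_document(pages):
    """Hypothetical slow background job."""
    if pages <= 0:
        raise PrintError("nothing to print")
    return f"printed {pages} pages"


def start_background_print(executor, pages, notify):
    """Queue the job and return immediately; report the outcome through `notify` later."""
    future = executor.submit(print_document, pages)

    def on_done(f):
        exc = f.exception()
        # The user has moved on by now, so just report; don't try to restore old state.
        notify(f"Print failed: {exc}" if exc else f.result())

    future.add_done_callback(on_done)
    return future  # the caller regains control right away
```

In use, a UI would pass a function that shows a status message as `notify`, and keep a single long-lived `ThreadPoolExecutor` as the background queue.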

For client/server applications, database access is an obvious opportunity for asynchronous processing. For example, a background query can retrieve needed data while a user works on a startup dialog. Avoid asynchronous database updates: If the update fails, it may be impossible to return the user to the pre-update context.

Client/Server Implementation Models

Distributed applications generally fall into two categories: cooperating peers or client/server. In the cooperating peers model, several instances of the same application cooperate to generate a result. In the client/server model, several distinct applications cooperate to generate a result. The client/server approach allows for efficient division of labor and has become the dominant model.

Within the client/server model, two approaches have become popular: the traditional (two-tier) model and the three-tier model.

The Traditional (Two-Tier) Client/Server Model

In a traditional client/server application, a database server implements data management functions, and a client application implements other functions. Business logic often is divided between the client application and database-hosted stored procedures and triggers. For example, a report might use a stored procedure to perform currency conversions on data access and retrieval based on a conversion date computed in the report program. The business logic for determining the conversion date is implemented in the client program; business logic for looking up and applying the exchange rate is implemented in a stored procedure.

A traditional client/server model can be implemented fairly quickly because it requires less up-front design. User response time may also be better than in more complex client/server models. The two-tier model is a good choice for an application with a small number of users and a clearly limited scope.

Two-tier applications usually don't scale well to large numbers of users. Implementation of substantial business logic in stored procedures results in database contention as the number of users increases. Also, the client almost always requires a high bandwidth connection to the database server, so two-tier is not a good choice for remote users. Update distribution can be tricky, as changes in the client component must be carefully coordinated with changes in the server component.

The Three-Tier Model

In the three-tier model, a database server implements data management functions, a mid-tier application implements common business logic, and a presentation component provides a user interface. When several applications access the same data, it makes sense to encapsulate related business logic in a separate component. For example, a human resources (HR) department will probably have many applications that access personnel data. For all these applications, business policy might state that all HR staff have access to basic information about employees (name, address, department, supervisor, and so forth), but only supervisors have access to compensation and benefits information, and only managers have access to disciplinary information. A single mid-tier application can retrieve data from the database, apply the access limitations, and pass it along to client applications. As access policies change, only the mid-tier component will require maintenance.
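The HR access policy above is the kind of rule a mid-tier component centralizes. In this Python sketch, the field names and role names are hypothetical. We also assume that managers inherit supervisors' access and that supervisors inherit basic HR staff access; the chapter does not spell this out. Client applications receive only the fields their caller's role permits, and a policy change touches only these tables.

```python
# Hypothetical policy: each role sees its own fields plus those of the roles below it.
FIELDS_BY_ROLE = {
    "hr_staff": {"name", "address", "department", "supervisor"},
    "supervisor": {"salary", "benefits"},
    "manager": {"disciplinary_notes"},
}
ROLE_ORDER = ["hr_staff", "supervisor", "manager"]  # assumed hierarchy, lowest first


def visible_fields(role):
    """Union of the fields granted to `role` and every role below it."""
    allowed = set()
    for r in ROLE_ORDER[: ROLE_ORDER.index(role) + 1]:
        allowed |= FIELDS_BY_ROLE[r]
    return allowed


def get_employee(record, role):
    """Mid-tier service: take a record from the data tier and strip fields the caller may not see."""
    allowed = visible_fields(role)
    return {field: value for field, value in record.items() if field in allowed}
```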

Business data is often subject to integrity rules that cannot be enforced properly by a database server. For example, the database server could enforce the rule that a new order can be created only for an existing customer (a referential integrity constraint). However, the server could not efficiently enforce the rule that a customer with a 90-day past due balance may not place new orders. Although a developer could write an insert trigger or stored procedure to implement the rule, this approach has several shortcomings. First, a trigger would result in longer execution times for inserts, resulting in increased database contention. Second, a trigger or stored procedure enforces the rule after the user has filled in all the blanks to create an order. A better approach would be to encapsulate customer information, including available credit, into a mid-tier component for access by the user interface component. Then the application can check customer credit when the user begins to create an order.
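A mid-tier customer component for the credit rule above might look like the following Python sketch. `CustomerService`, its invoice data, and the reading of "90-day past due" as "90 or more days past the due date" are assumptions. The point is when the check runs: the user interface calls `can_place_order` as the user starts an order, before any blanks are filled in.

```python
from datetime import date, timedelta


class CustomerService:
    """Hypothetical mid-tier component encapsulating customer credit rules."""

    PAST_DUE_LIMIT = timedelta(days=90)

    def __init__(self, open_invoices):
        # customer_id -> list of (due_date, unpaid_amount); in a real system this
        # data would be loaded from the database tier.
        self._open_invoices = open_invoices

    def can_place_order(self, customer_id, today):
        """Business rule: no new orders while any balance is 90 or more days past due."""
        for due, unpaid in self._open_invoices.get(customer_id, []):
            if unpaid > 0 and today - due >= self.PAST_DUE_LIMIT:
                return False
        return True
```

Keeping the rule in one component also means the insert path no longer needs a trigger for it, which avoids the extra database contention described above.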

Components that enforce business rules on data may require a high-bandwidth connection to the database server where that data resides. Database response time is typically by far the largest factor in response time for the business object. Where the business rules component in turn serves many other components, those components should be located together on the same server, or on a group of servers with high-bandwidth connections. Mid-tier components can also be used to overcome bandwidth limitations by caching data from the database. This approach works well with nonvolatile data.
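Caching of nonvolatile data in the middle tier might look like the following read-through cache, shown as a Python sketch. The `loader` argument stands for whatever function actually queries the database and is hypothetical. The expiry interval is an assumed policy for data that changes rarely.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable
from typing import Any


class ReadThroughCache:
    """Mid-tier cache for nonvolatile data such as lookup tables (a sketch)."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical: the function that queries the database
        self._ttl = ttl_seconds    # how long an entry is trusted before reloading
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # served from memory: no round trip to the database
        value = self._loader(key)  # first request or expired entry: go to the database
        self._entries[key] = (now, value)
        return value
```

The first request for each key pays the full database cost, and later requests are answered from memory. This behavior is the "slow first operation, faster subsequent operations" pattern described for mid-tier components.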

Sometimes the middle tier will include services that are not strictly business services (graphics processing, transaction management, numerical analysis). Such applications are sometimes called multitier. The same design principles apply, however, so this chapter will not discuss multitier applications separately.

Three-tier applications generally scale much better than two-tier applications. Implementing business logic in a middle-tier component rather than in triggers or stored procedures greatly reduces the number of database queries, and thus reduces database contention. User response time can suffer when database operations pass through a middle tier, but the penalty may be temporary: mid-tier initialization may slow the first operation, while subsequent operations may be much faster. Also, not all database operations need to pass through a middle tier. For simple lookups of data that is unlikely to be used again, direct access from the user interface to the database is appropriate.

Two-Tier Versus Three-Tier: An Example

Consider a sample application: consultant time-sheet submission. First, examine the components for a two-tier implementation. The database server might implement the following services:

The desktop component might implement the following services:

This division of labor capitalizes on the strengths of both the database server and the desktop platform. The design can be implemented fairly quickly by a single developer using Microsoft SQL Server and Visual Basic. Initial deployment should be relatively simple: install database components on the database server, and distribute an install kit to consultants for the desktop component. Subsequent updates, however, will require careful coordination of database changes with distribution of the modified desktop application.

Now suppose the business grows until one thousand consultants update their timesheets within a three-hour period every Monday morning. The implementation of range checking and business policies in triggers results in longer-running updates and inserts and substantial database contention. Contention causes failed database operations (timeouts), which lead to retries, additional contention, more failed operations, and ultimately unhappy consultants. A hardware upgrade might improve the situation, but it's an expensive solution for a problem that occurs only on Monday mornings.

Consider now a three-tier approach. Data services look very much the same, except for the omission of triggers:

The presentation services might include the following:

The presentation layer is quite simple and might be implemented as a Visual Basic application, or as a web application with scripting and ActiveX or Java components.

The business services layer might include the following services:

This architecture is substantially more complex than the two-tier implementation. Development will probably take longer simply due to the problems of integrating the separate components. Initial deployment requires installation of database components, distribution of user components, and installation of mid-tier components.

The three-tier model is more complex, but it's also substantially more scalable and flexible. Because triggers do not enforce business logic, inserts and updates are fast and efficient, and the database server can handle heavier loads with less contention.

Now suppose that the consulting company opens branch offices in four cities and that copies of the database are maintained in each city. As consultants enter hours, the entries must update both the local database and the central database at world headquarters. Either implementation would therefore require distributed transaction support across the databases. You'll also need integrated account management so that each consultant needs only a single login for all databases to which he or she has access.

The two-tier implementation would require the following changes to the desktop component:

The updated desktop component would have to be distributed to all consultants, and distribution would have to be carefully coordinated with database changes. This is a nontrivial problem when dealing with a thousand consultants in five cities. Also, database administrators must ensure that the same version of each trigger and stored procedure is installed on each database. As consultants enter hours, triggers fire separately on each database to enforce business logic. In fact, the 24-hours/day constraint is enforced separately in the desktop application and in insert triggers and update triggers on each database.

Now consider changes to the three-tier application. The presentation layer requires no changes. Mid-tier components require the same changes as the desktop application in the two-tier approach:

The updated mid-tier components must be distributed and installed on application servers at world headquarters and at each branch location, and rollout must be coordinated with database updates. However, coordinated rollout requires cooperation among only a handful of system administrators.
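Here is a minimal Python sketch of a mid-tier time-sheet service in which the 24-hours/day rule is checked once, before an entry is written to both the local and the central copy. The class names are hypothetical, and the in-memory stores stand in for real databases. The distributed transaction the text calls for is deliberately left out.

```python
from __future__ import annotations

from datetime import date

MAX_HOURS_PER_DAY = 24.0


class InMemoryStore:
    """Stand-in for one database: a branch copy or the headquarters copy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hours: dict[tuple[str, date], float] = {}


class TimeSheetService:
    """Hypothetical mid-tier component; the 24-hours/day rule is enforced here, once."""

    def __init__(self, local: InMemoryStore, central: InMemoryStore) -> None:
        self._local = local
        self._stores = [local, central]

    def record_hours(self, consultant: str, day: date, hours: float) -> None:
        if hours <= 0:
            raise ValueError("hours must be positive")
        key = (consultant, day)
        if self._local.hours.get(key, 0.0) + hours > MAX_HOURS_PER_DAY:
            raise ValueError(f"{consultant} would exceed 24 hours on {day}")
        # A production version would wrap these writes in one distributed transaction.
        for store in self._stores:
            store.hours[key] = store.hours.get(key, 0.0) + hours
```

In the two-tier design, the same rule would be re-checked by triggers on every database. Here a rejected entry never reaches either store.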

Although a two-tier implementation results in a simpler design and simpler initial deployment, the three-tier model offers superior long-term scalability and flexibility.

Designing a Distributed Application

The first step in designing any application is to identify the business objectives for the system, and the constraints under which it must operate: the requirements. One of the most effective approaches to requirements gathering is the use case approach, developed by Ivar Jacobson and others. To oversimplify a bit, a use case is a scenario that describes a business problem to be solved by software. Use cases are very effective tools for identifying the business objectives for a system. Unfortunately, this tool was omitted from Microsoft Visual Modeler. Use-case diagrams are included in Rational Rose 4.0 (Visual Modeler is a subset of the Rational Rose tool).


ON THE WEB: For information about Rational Rose 4.0 (including a free demo version), visit the Rational software web site at http://www.rational.com.

Once the initial requirements are clear, a design process should include the following steps:

Typically, this will be an iterative process: decisions in one stage will require changes to an earlier stage.

Model Business Objects and Their Interactions

Top-down structured analysis and design techniques are algorithm-oriented: they focus on the steps required to produce a result. These techniques work well for small to medium monolithic applications. Object-oriented (OO) approaches are interaction-oriented: they focus on interactions between components. Because interactions are the most critical aspect of a distributed system, OO techniques are by far the best choice for distributed system design.

To derive an object model from use cases, remember your high school grammar. Nouns from the use cases will show up as classes in the object model; verbs will show up as interactions between objects. Ideally, the object model should result in several diagrams: class diagrams showing classes and relations (specialization, aggregation, association), and sequence diagrams showing the sequence of interactions for each use case. Figure 20.4 shows a sample class diagram, which uses the Unified Modeling Language (UML) notation. Figure 20.5 shows a sample sequence diagram.

FIG. 20.4
This class diagram shows that Automobile, Truck, and Motorcycle are specializations of Vehicle; a Fleet is an aggregation of Vehicles; and a Vehicle is associated with a Road.

The object interactions identify the services for the service-based design approach. The objective for this stage of design is to determine which services the application will implement; the detailed definition of how each service is invoked belongs to the next stage.

The object model for the application will be a work in progress throughout the life of the application. As requirements change or become clarified through subsequent steps, the model will change. At some point in every project, it becomes necessary to "baseline" the object model: that is, to identify a specific version as the design for the current release.

Define Services and Interfaces

The object interactions are the services for the service-based design approach. The objective for this stage of design is to define in detail the ways in which the application will invoke services. The invocation details of a service comprise its interface. A detailed definition of an interface is basically a function definition. It should include the name of the function, a detailed definition of the formal parameters and possible return values, and any preconditions for invoking the service.

FIG. 20.5
This UML sequence diagram shows Object 1 sending Message 1 to Object 2, which in turn creates Object 3 and sends it Message 2. After Message 2 returns, Object 2 deletes Object 3, and returns Message 1.

An interface definition should be considered a contract for provision of a service. Early publishing of an interface definition allows developers to treat the service as a black box so that implementation of services can proceed in parallel. Once published, however, an interface should not change. This is quite difficult to achieve early in a project, but the alternative is chaos. If a service is used by a number of objects, the ripple effect will be widespread. That's why it's important to baseline the object model.
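The idea of an interface as a published contract can be sketched in Python with an abstract base class. The interface states the name, parameters, return value, and preconditions. A stub implementation then lets client developers work against the contract before the real service exists. `ITimeSheet`, `StubTimeSheet`, and the precondition values are hypothetical, and Python's `abc` module is only an analogy for a COM interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from datetime import date


class ITimeSheet(ABC):
    """Published interface: once released, these signatures should not change."""

    @abstractmethod
    def submit_hours(self, consultant_id: str, work_date: date, hours: float) -> int:
        """Record hours worked by one consultant on one day.

        Preconditions: consultant_id is non-empty and 0 < hours <= 24.
        Returns: a positive entry id.
        Raises: ValueError if a precondition is violated.
        """


class StubTimeSheet(ITimeSheet):
    """Throwaway implementation so client developers can start before the real one exists."""

    def __init__(self) -> None:
        self._next_id = 1

    def submit_hours(self, consultant_id: str, work_date: date, hours: float) -> int:
        if not consultant_id or not 0 < hours <= 24:
            raise ValueError("precondition violated")
        entry_id = self._next_id
        self._next_id += 1
        return entry_id


def enter_monday_hours(service: ITimeSheet) -> int:
    # Client code depends only on the contract and treats the service as a black box.
    return service.submit_hours("C042", date(2024, 1, 8), 7.5)
```

Swapping the stub for the production implementation requires no change to `enter_monday_hours`, as long as the published signature stays fixed.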

This stage of design is a good time to look for reuse opportunities. Obvious candidates include data management (SQL Server) and transaction management (MTS). You may find that important business logic has already been implemented in a prior project's mid-tier component. Often, a little research will identify a commercial product (ActiveX component or class library) that provides services closely matching project requirements.

Opportunities for software reuse generally fall into three categories:

Component reuse means binary reuse. An existing component will be incorporated into the new application without access to source code. Good examples include ActiveX controls and the database server. Use of a non-COM dynamic link library (DLL) is slightly less effective because it requires some source code (header files for C++; function DECLARE statements in VB), and because the interface is subject to change.
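The extra burden of reusing a non-COM DLL can be illustrated with Python's `ctypes` module, which plays roughly the role of VB's Declare statements. This sketch assumes a POSIX system where the C runtime library can be located. The C runtime stands in for a third-party DLL; `labs` is the standard C library function.

```python
import ctypes
import ctypes.util


def load_c_runtime() -> ctypes.CDLL:
    """Load the C runtime as a stand-in for a third-party, non-COM DLL (POSIX assumed)."""
    name = ctypes.util.find_library("c")
    # CDLL(None) exposes symbols already loaded into the process, including libc.
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libc = load_c_runtime()

# The caller must restate the function's signature, much as VB code needs a Declare
# statement. Nothing checks these declarations against the DLL, so if the DLL's
# interface changes, they silently become wrong.
libc.labs.argtypes = [ctypes.c_long]
libc.labs.restype = ctypes.c_long

print(libc.labs(-42))  # prints 42
```

A COM component, by contrast, carries its interface description in a type library, so the caller does not have to restate it.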

Class library reuse is structured source code reuse. An existing well-designed set of source code that addresses a specific problem domain can be incorporated into your project. Examples include the Microsoft Foundation Classes (MFC) and other shrink-wrap class libraries such as Rogue Wave's dbtools.h++. Often, an organization will have an internally developed class library addressing company-specific problem domains. A well-implemented and well-documented class library can save a lot of design and implementation time.

The cut-and-paste miracle occurs when a department manager remembers an application vaguely similar to the one under development, and deletes several weeks from the implementation schedule because the new project can just reuse most of that code. For small, complex functions that have been correctly implemented once before, the cut-and-paste miracle may save a few hours. Unfortunately, the time saved is usually offset by time spent searching through legacy code for the reusable bits, and time spent melding that code into the new application. The cut-and-paste miracle rarely works as well as expected.

Finally, pause to identify services implemented in the new application that might be useful in subsequent projects. Planning for subsequent reuse does require additional design and implementation time in the schedule.

Identify Relationships Among Business Objects and Services

Software engineers define coupling as relationships between software components and cohesion as relationships within a component. Generally, cohesion is good, and coupling is bad. A certain degree of coupling, however, is inevitable, and a reasonable goal is for cohesion to be substantially stronger than coupling.

Relationships among business objects and services take a variety of forms. An Invoice object, for example, presupposes a Customer object; a ComputeTotal service for the invoice may depend on a ComputeTaxes service. The best tool for identifying relationships is the object model. Relationships between objects will appear in the class diagram as aggregation, association, or specialization relations. Sequence diagrams will show dependencies between services.

An aggregation relation is a part-of relation. A taxicab may be part of a fleet; a fleet aggregates vehicles. A specialization relation is a type-of relation. A taxicab is a type of automobile, which is a type of vehicle, so a taxicab has a specialization relation to automobile and to vehicle. An association relation is a uses relation. A vehicle uses roads, so a vehicle has an association relation to roads. The sample class diagram in Figure 20.4 earlier in the chapter illustrates these relations.
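The three relations in Figure 20.4 map naturally onto code. The following Python sketch is one possible rendering, not the only one. Inheritance expresses specialization, a container field expresses aggregation, and a reference to objects the vehicle does not own expresses association. The attribute names (`vin`, `roads_used`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Road:
    name: str


@dataclass
class Vehicle:
    vin: str
    # Association ("uses"): a vehicle refers to roads but does not own them.
    roads_used: list[Road] = field(default_factory=list)

    def travel(self, road: Road) -> None:
        self.roads_used.append(road)


# Specialization ("type of"): subclasses inherit Vehicle's data and behavior.
class Automobile(Vehicle):
    pass


class Truck(Vehicle):
    pass


class Motorcycle(Vehicle):
    pass


class Taxicab(Automobile):  # a taxicab is a type of automobile, hence a vehicle
    pass


@dataclass
class Fleet:
    # Aggregation ("part of"): a fleet is made up of vehicles.
    vehicles: list[Vehicle] = field(default_factory=list)
```

The code also shows why specialization couples classes so tightly: `Taxicab` cannot be compiled or deployed without `Automobile` and `Vehicle`.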

Generally, specialization and aggregation relations show much stronger coupling between objects than association relations. Objects with a specialization relationship should always be implemented in the same component. Objects with an aggregation relation should be implemented in the same component, unless one or more objects add value as a reusable independent component.

Sequence diagrams show the strength of association relations. The more messages exchanged between a pair of objects, the stronger the relation. A sequence diagram shows how many types of messages are passed; design should also consider the frequency of each message type. Figure 20.5, earlier in the chapter, shows a sample sequence diagram.
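One way to combine the two measures is to tabulate, for each pair of objects, the number of distinct message types and the estimated total call rate. The following Python sketch does this. The input tuples and rates are hypothetical: the message types would come from the sequence diagrams, and the frequencies from usage estimates.

```python
from __future__ import annotations

from collections import defaultdict


def association_strength(messages: list[tuple[str, str, str, float]]
                         ) -> dict[frozenset[str], tuple[int, float]]:
    """For each object pair, return (distinct message types, total calls per hour).

    messages: (sender, receiver, message_type, calls_per_hour) rows.
    """
    types: dict[frozenset[str], set[str]] = defaultdict(set)
    rate: dict[frozenset[str], float] = defaultdict(float)
    for sender, receiver, msg_type, per_hour in messages:
        pair = frozenset((sender, receiver))  # direction does not matter for coupling
        types[pair].add(msg_type)
        rate[pair] += per_hour
    return {pair: (len(types[pair]), rate[pair]) for pair in types}
```

A pair can rank low on one measure and high on the other. For example, one message type called hundreds of times per hour may argue more strongly for co-location than several message types that are rarely used.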

Dependencies also show where requirements changes are likely to have ripple effects. Sometimes, a minor design change can isolate objects or services that are likely to change frequently.
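Ripple effects can be estimated mechanically once service dependencies are written down. This Python sketch inverts a "depends on" map and walks it breadth-first to find every service affected by a change. `ComputeTotal` and `ComputeTaxes` come from the invoice example above. The other service names are hypothetical.

```python
from __future__ import annotations

from collections import deque


def affected_by_change(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Return every service that directly or indirectly depends on `changed`."""
    # Invert "A depends on B" edges so we can walk from a changed service to its callers.
    callers: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, set()):
            if caller not in affected:  # visit each service once, even with cycles
                affected.add(caller)
                queue.append(caller)
    return affected


deps = {
    "ComputeTotal": {"ComputeTaxes"},
    "PrintInvoice": {"ComputeTotal"},
    "ComputeTaxes": {"LookupTaxRate"},
}
print(sorted(affected_by_change(deps, "LookupTaxRate")))
```

A service with a large affected set is a good candidate for the kind of isolating design change the text describes.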

Partition the Application into Components

Partitioning a distributed application is the most challenging part of the design process. Object boundaries, application tiers, dependencies, and deployment issues must all be considered and balanced. A few general guidelines apply:

FIG. 20.6
The Application Performance Explorer can validate your design.

One very effective strategy for partitioning a three-tier application is to classify the services provided by each object as presentation, business, or data. The object and layer boundaries give a good first cut at a partition. Each object will be partitioned into three subcomponents (assuming that it provides services in each tier). Then, within each tier, group subcomponents based on interactions and dependencies within that layer. This same approach works for a two-tier or multitier application.


TIP: Include closely related objects in one component to improve performance and maintainability. However, expect the strength of interactions to vary markedly across tiers. Two objects may be strongly related in the data layer but quite distinct in the presentation layer.
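The partitioning strategy above can be sketched as a small algorithm in Python. Each object's services in a tier form one subcomponent. Within each tier, subcomponents whose interaction strength reaches a threshold are merged using union-find. The service names, strengths, and threshold in the usage example are hypothetical, and a real partition would also weigh deployment and reuse concerns.

```python
from __future__ import annotations

TIERS = ("presentation", "business", "data")


def _find(parent: dict[str, str], x: str) -> str:
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def partition(services: dict[str, tuple[str, str]],
              interactions: dict[tuple[str, str, str], float],
              threshold: float) -> dict[str, list[set[str]]]:
    """First-cut partition into per-tier components.

    services: service name -> (owning object, tier)
    interactions: (tier, object_a, object_b) -> interaction strength in that tier
    """
    by_tier: dict[str, set[str]] = {tier: set() for tier in TIERS}
    for obj, tier in services.values():
        by_tier[tier].add(obj)  # one subcomponent per object per tier

    result: dict[str, list[set[str]]] = {}
    for tier, objects in by_tier.items():
        parent = {obj: obj for obj in objects}
        for (t, a, b), strength in interactions.items():
            if t == tier and strength >= threshold and a in parent and b in parent:
                parent[_find(parent, a)] = _find(parent, b)  # merge the two groups
        groups: dict[str, set[str]] = {}
        for obj in objects:
            groups.setdefault(_find(parent, obj), set()).add(obj)
        result[tier] = sorted(groups.values(), key=sorted)
    return result


services = {
    "ShowOrderForm": ("Order", "presentation"), "ShowCustomer": ("Customer", "presentation"),
    "ValidateOrder": ("Order", "business"), "CheckCredit": ("Customer", "business"),
    "SaveOrder": ("Order", "data"), "LoadCustomer": ("Customer", "data"),
}
interactions = {("business", "Order", "Customer"): 120.0,
                ("presentation", "Order", "Customer"): 2.0}
print(partition(services, interactions, threshold=50.0))
```

In this example, Order and Customer end up in one business component but remain separate in the presentation tier. That is the pattern the TIP warns about.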

Select an Implementation Model for the Application

After creating an object model and incorporating reusable components, select an implementation model. Obviously, reuse decisions will play a substantial role in this choice. The organization's level of experience with client/server computing will also play an important role: groups with less experience tend to develop primarily two-tier applications. Sometimes, an application will have very few business services and just works better as two-tier.

A traditional client/server application presents relatively few partitioning decisions. The database server will provide data management services, and a client application will implement business logic and the user interface. In some cases, business logic could be implemented through triggers and stored procedures, or in client code. Generally, stored procedures minimize the number of database calls, improve scalability, and simplify deployment.

For data used by more than one application, or data subject to significant business rules, the three-tier model supports encapsulation of important business logic. Usually, three-tier applications scale more effectively to large numbers of users. Reuse decisions may also motivate a decision for three-tier.

Target Components onto Platforms

In this final design stage, emphasis shifts from abstraction to implementation. Business requirements implemented in the object model must now be reconciled with constraints imposed by the computing environment. Sometimes, partitioning decisions will have to be revised.

Often, external factors dictate a particular platform for both the database server and the GUI. For instance, the department database server might run Sybase System 11 on a multiprocessor UNIX system, and target users might run Windows 3.11 on 486-66 machines. Beyond available hardware, platform constraints also include IPC/RPC mechanisms and hosting for other required services such as transaction management.

Presentation Layer Components  For most applications, the presentation layer platform is predefined: whatever sits on the user's desk today. Typically, this will span a range of performance levels and capabilities. Usually, management exerts substantial pressure to support all existing hardware; however, this may lead to unnecessary compromises in the application. It's always a mistake to hobble a new application to support old hardware. The issue is especially important if DCOM support is at stake. Probably the most important issue is adequate network bandwidth. A component that makes excessive demands on network resources will perform poorly and may be unstable.

It's generally a good idea to define a minimum platform and a recommended platform. The platform specification should include processor type and speed, memory, available disk space, operating system, and network bandwidth and protocols. As a rule of thumb, when a relatively small share of target users (fewer than 20%) have subminimum systems, the hardware upgrade costs will usually be less than the cost to reengineer the application. This is especially true for the transition from 16-bit Windows to 32-bit Windows. If a majority of users have equipment that does not meet the recommended platform, application performance is at risk.
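The minimum/recommended platform test and the 20% rule of thumb can be sketched as a quick inventory check in Python. The `Platform` fields are a simplification of the specification described above: the operating system is reduced to 16- versus 32-bit Windows, and processor type and protocols are omitted. Any thresholds you plug in are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    cpu_mhz: int
    memory_mb: int
    free_disk_mb: int
    windows_bits: int  # 16 for 16-bit Windows, 32 for 32-bit Windows
    bandwidth_kbps: int

    def meets(self, spec: Platform) -> bool:
        """True if this machine is at least as capable as `spec` in every dimension."""
        return (self.cpu_mhz >= spec.cpu_mhz
                and self.memory_mb >= spec.memory_mb
                and self.free_disk_mb >= spec.free_disk_mb
                and self.windows_bits >= spec.windows_bits
                and self.bandwidth_kbps >= spec.bandwidth_kbps)


def assess(users: list[Platform], minimum: Platform, recommended: Platform) -> dict:
    n = len(users)
    below_minimum = sum(not u.meets(minimum) for u in users) / n
    below_recommended = sum(not u.meets(recommended) for u in users) / n
    return {
        "share_below_minimum": below_minimum,
        # Rule of thumb from the text: under 20% subminimum, upgrading usually costs less.
        "upgrade_hardware": below_minimum < 0.20,
        # If most users fall short of the recommended platform, performance is at risk.
        "performance_at_risk": below_recommended > 0.5,
    }
```

Feeding in a real hardware inventory turns the rule of thumb into a number that can be put in front of management.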

Limit functionality in the presentation layer to session management (showing windows in the proper order, enabling and disabling controls) and very basic input validation. More complex validation belongs in the middle tier. This generally means that inputs are validated in groups rather than a field at a time. Identifying these validation groups is an important step in designing both the presentation layer and the middle tier.
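The split between per-field checks in the presentation layer and group validation in the middle tier might look like this Python sketch. A week of time-sheet entries is treated as one hypothetical validation group. The 24-hour rule comes from the time-sheet example, and the empty-sheet rule is an invented illustration.

```python
from __future__ import annotations


def presentation_check(field_text: str) -> float:
    """Presentation tier: very basic, one-field validation (is it a non-negative number?)."""
    value = float(field_text)  # raises ValueError for text such as "eight"
    if value < 0:
        raise ValueError("hours cannot be negative")
    return value


def validate_week(hours_by_day: dict[str, float]) -> list[str]:
    """Middle tier: validate the whole group of entries together and report every problem."""
    errors = [f"{day}: more than 24 hours" for day, hours in hours_by_day.items() if hours > 24]
    if sum(hours_by_day.values()) == 0:
        errors.append("time sheet is empty")
    return errors


week = {day: presentation_check(text) for day, text in [("Mon", "8"), ("Tue", "25"), ("Wed", "0")]}
print(validate_week(week))
```

Returning all errors for the group at once means one round trip to the middle tier per submission, rather than one per field.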

Business Services Layer  Components that enforce business rules on data require a high-bandwidth connection to the database server where that data resides. Database response time is typically by far the largest factor in response time for the business object. Where the business rules component in turn serves many presentation components, it may make sense to deploy it closer to the users. The business services layer offers deployment flexibility that may not be available for data services or presentation services. Consider whether concurrent processing (pipeline, parallel, or asynchronous) is appropriate. Remember that these three approaches are not mutually exclusive; they can be combined. Choosing a concurrency strategy may result in modifications to the object model.

When a problem decomposes functionally into a series of sequential tasks, consider a pipeline. Try to design the stages so that each requires about the same processing time. The design should include a state transition diagram depicting the state of a task as it passes through each stage of the pipeline. This diagram will often identify resource conflicts and processing bottlenecks.
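A pipeline can be sketched in Python with one thread per stage and queues between stages. The log records each state transition (task N finished stage S), which is the raw material for the state transition diagram. The stage names and functions in the usage example are hypothetical. Error handling is omitted: a stage function that raises would stall this sketch.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def run_stage(name, work, inbox, outbox, log):
    """One pipeline stage: take a task, process it, hand it to the next stage."""
    while True:
        task = inbox.get()
        if task is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        task_id, payload = task
        result = work(payload)
        log.append((task_id, name))  # state transition: task has completed this stage
        outbox.put((task_id, result))


def run_pipeline(items, stages):
    """Push items through (name, function) stages, each running on its own thread."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    log = []  # list.append is atomic in CPython, so the stages can share it
    threads = [
        threading.Thread(target=run_stage, args=(name, fn, queues[i], queues[i + 1], log))
        for i, (name, fn) in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for task_id, item in enumerate(items):
        queues[0].put((task_id, item))
    queues[0].put(SENTINEL)

    results = {}
    while (finished := queues[-1].get()) is not SENTINEL:
        task_id, value = finished
        results[task_id] = value
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(items))], log


outputs, transitions = run_pipeline(
    ["a", "b"], [("parse", str.upper), ("enrich", lambda s: s + "!")])
print(outputs)
```

If one stage is much slower than the others, its input queue grows. That growth is the bottleneck the state transition diagram is meant to expose.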

When a problem consists of a number of small uniform tasks, consider a parallel architecture. Parallel processing provides enhanced fault tolerance as well as enhanced throughput. Design should include a state transition diagram for the process.
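For many small, uniform tasks, a worker pool is the usual shape. This Python sketch uses `concurrent.futures` and resubmits a failed task to the pool. This is one simple way to get the fault tolerance the text mentions. The retry policy and the task function in the usage example are assumptions. For CPU-bound Python work, `ProcessPoolExecutor` can replace `ThreadPoolExecutor` without other changes, provided the task function is picklable.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_parallel(tasks, work, workers=4, retries=1):
    """Apply `work` to many small, uniform tasks using a pool of workers.

    A task that raises is resubmitted up to `retries` times, so one failing
    task does not halt the batch. Returns (results, failures) keyed by task index.
    """
    results, failures = {}, {}
    attempts = [0] * len(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(work, task): i for i, task in enumerate(tasks)}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                i = pending.pop(future)
                try:
                    results[i] = future.result()
                except Exception as exc:  # any worker failure counts as a failed attempt
                    attempts[i] += 1
                    if attempts[i] <= retries:
                        pending[pool.submit(work, tasks[i])] = i  # retry on any free worker
                    else:
                        failures[i] = exc
    return results, failures


squares, errors = run_parallel([1, 2, 3, 4], lambda x: x * x)
print(squares, errors)
```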

When a problem includes independent subtasks, consider asynchronous processing. Design should include a state transition diagram for the asynchronous task.
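An independent subtask, such as sending a confirmation after a time sheet is saved, can be started without making the user wait. This Python sketch uses `asyncio` and records the subtask's state (pending, running, done), which mirrors the state transition diagram the text recommends. The function names and the simulated delay are hypothetical.

```python
import asyncio


async def send_confirmation(consultant: str, state: dict) -> None:
    """Independent subtask; its state moves pending -> running -> done."""
    state[consultant] = "running"
    await asyncio.sleep(0.01)  # stands in for slow work such as sending e-mail
    state[consultant] = "done"


async def submit_timesheet(consultant: str, state: dict, background: set) -> str:
    state[consultant] = "pending"
    task = asyncio.create_task(send_confirmation(consultant, state))
    background.add(task)  # keep a reference so the task is not garbage-collected
    return f"saved {consultant}"  # reply to the user without waiting for the subtask


async def main() -> tuple:
    state, background = {}, set()
    reply = await submit_timesheet("kim", state, background)
    status_at_reply = state["kim"]  # the subtask has not started yet
    await asyncio.gather(*background)
    return reply, status_at_reply, state["kim"]


print(asyncio.run(main()))  # ('saved kim', 'pending', 'done')
```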

Sometimes, data services may be distributed across multiple databases. When data services require coordinated operations against two or more databases, the application will require distributed transaction support. Microsoft Transaction Server provides good support for distributed transactions, particularly when all databases are hosted on Microsoft SQL Server 6.5.
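Distributed transaction coordinators generally rely on the two-phase commit protocol. The following minimal Python sketch shows the idea. It is not MTS's API: `Participant` is a hypothetical stand-in for a resource manager such as one database. The durable logging and crash recovery that a real coordinator needs are omitted.

```python
class Participant:
    """Hypothetical resource manager (for example, one database) in a distributed transaction."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.state = "active"

    def prepare(self):
        # Phase 1 vote. A real resource manager would durably log its pending work here.
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes; otherwise roll back everywhere."""
    # Phase 1: collect votes. all() stops at the first "no", which is enough to abort.
    if all(p.prepare() for p in participants):
        # Phase 2: every participant promised it can commit, so tell them all to do so.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


branch, headquarters = Participant("Boston"), Participant("HQ")
print(two_phase_commit([branch, headquarters]), branch.state, headquarters.state)
```

In the branch-office scenario, the local and headquarters databases would be the two participants, so an entry lands in both or in neither.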

Data Services Layer  Most often, an existing database server will provide data management services. If the enterprise has been using the three-tier approach for some time, there should be some opportunity for reuse at both the data services and business services layers. At the data services layer, it's very important to maximize the amount of work performed per query. For example, inexperienced developers will sometimes write an application to query the database individually for each row in a result set. VS97 includes several data access object libraries (Remote Data Objects, ActiveX Data Objects, Data Access Objects) for more effective management of result sets. Most database servers support stored procedures, another way to maximize work per query. There is a tradeoff, however: stored procedures are often difficult to debug and don't fit well into most configuration management tools.


TIP: Use Data Access Objects (DAO) with desktop databases such as Paradox, FoxPro, and especially Access. Remote Data Objects (RDO) can provide much more efficient access to a database server on the local area network, with extended access to database errors and to return codes and output parameters from stored procedures. ActiveX Data Objects (ADO) can also provide efficient access to database servers, with fewer advanced features than RDO. All three libraries can reach ODBC data sources; Microsoft is nudging developers toward ADO.



ON THE WEB: For a comparison of DAO, RDO, and ADO, see http://www.microsoft.com/data/whatcom.htm.

The data services layer is often a hot spot for resource contention. Take care to implement services so as to minimize locking. Avoid cursors, use the minimum workable locking level, and avoid long-running transactions. In particular, no database transaction should require user input or confirmation. (The users might decide to go to lunch.)
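The rule that no transaction should wait on the user can be expressed directly in code: collect and confirm input first, then open a short transaction. This Python sketch uses `sqlite3` as a stand-in database. The `entry` table and the `confirm` callback, which represents a dialog box, are hypothetical.

```python
import sqlite3


def submit_week(conn: sqlite3.Connection, consultant_id: int, entries: list, confirm) -> bool:
    """Get the user's confirmation first, then run one short transaction."""
    # Anything that waits on a person happens before any locks are taken.
    if not confirm(entries):
        return False
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            "INSERT INTO entry (consultant_id, work_date, hours) VALUES (?, ?, ?)",
            [(consultant_id, work_date, hours) for work_date, hours in entries],
        )
    return True
```

If the user wanders off to lunch with the confirmation dialog open, no locks are held, because the transaction has not started yet.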

From Here...

In this chapter, you looked at the problem of designing a client/server application on a traditional network, beginning with Microsoft's strategic services paradigm. Then you looked at some of the factors that influence a design. After reviewing common design objectives and constraints, you learned about concurrent processing approaches and compared the two-tier and three-tier client/server application models. Finally, you saw all the pieces pulled together into a comprehensive design strategy.




© Copyright, Macmillan Computer Publishing. All rights reserved.