Special Edition Using Microsoft® Visual Studio for Enterprise Development



- 21 -
Creating Distributed Applications

by Timothy A. White

Gain an understanding of distributed applications and where they fit in the corporate application development environment.
Discover some of the key components that Microsoft has delivered for creating and integrating distributed applications.
Review strategies for coping with variable or limited network availability in a distributed environment.
Get an overview of some alternatives to the connected and disconnected client approaches using intranet and Internet applications.

The landscape of corporate America has been undergoing significant change, partly in response to the dramatic pace at which technology has evolved. Technology is enabling organizations to realize the potential of distributed business environments, pushing services and products to customers around the globe. Technology allows business to take place regardless of location. As this evolution occurs, businesses shift from the old data processing (host-based) paradigm to the newer client/server paradigm. With this shift, many of the legacy applications that have served them well are either rewritten or scrapped.

This chapter explores some of the available technologies that allow organizations to utilize these existing legacy assets and to combine them with the power of client/server computing. Many of the features of Visual Studio, as described in this book, can be coupled with the techniques discussed in this chapter to further leverage corporate assets. Not only does this save an organization money and time, but it also can uncover vast quantities of valuable data that has been hidden deep in the bowels of legacy systems. As you read this chapter, keep your mind open to new ideas. The overworked phrase "thinking out of the box" applies here. This chapter discusses distributed applications from two perspectives. The first perspective looks at internally distributed applications running on a variety of platforms, distributing tasks across the enterprise. The second perspective views remote user applications that free workers from the confines of the office and place them in the best possible position to create positive value for the organization. These methods are not mutually exclusive; in fact, many organizations will realize significant benefits from a combination of the two.

What Is a Distributed Application?

A distributed application is one that has multiple components running on different machines. In today's business environment, the creation of distributed applications is an evolutionary process. The boundaries of distributed applications are constantly being expanded and stretched as the world of technology changes to keep up with the business needs of corporate America. Corporate information technology (IT) staffs are looking for ways to not only balance budgets and deliver timely solutions, but also to leverage their existing legacy systems while developing new systems using newer, more flexible client/server technologies.

This chapter is not intended to provide step-by-step "how to" instructions on developing distributed applications, but to raise your level of awareness of the technologies and tools available for such endeavors. It also strives to point out some of the pitfalls in building and implementing distributed solutions.

Distributed applications reflect the distributed nature of many large organizations. Organizations that can take advantage of distributed environments, leveraging all their information assets, can gain a real advantage in the marketplace. Demographic data, managed properly and used to produce timely indications of changes or shifts in a business environment, can produce significant positive results. Many organizations have demographic data stored in a variety of places throughout the organization. Using older techniques of gathering this information and crunching it into usable formats can take weeks, months, or even years. With this long turnaround time, many opportunities that this data can create are lost due to poor responsiveness. In this area, distributed applications, and specifically the ability to link disparate data stores, can provide tangible results for many organizations.

Distributed applications take on many forms. Some are fairly easily understood, such as the two-tier client/server model. Others are far more complex, requiring the efforts of many developers and potentially years to build and implement. As the pace of technological evolution quickens, several challenges confront corporate IT staffs. Two of these challenges are the need to leverage legacy systems and the push towards departmentalized computing.

Legacy Integration

Legacy integration is not a new concept, but one that is growing in popularity. This is largely due to necessity, but also to the tools now available to developers and integrators. Literally billions of lines of legacy application code are in use today, most of it working fairly well. Many of these applications will still be in use well into the next century. The data stored on legacy systems also represents a potential competitive advantage for many organizations. By making better use of this resource, organizations can realize increased synergy and profits. With the evolution of many organizations, the legacy systems that once supported all their needs might no longer suffice. This is not to say that all the pieces of these legacy systems are fundamentally flawed, but rather that some key components need to be reengineered to meet changing business needs. So, given these business needs for new or advanced functionality, the question becomes: integrate or rebuild? Organizations looking for that something to put them over the top might find it in the form of increased efficiency in the use of their legacy systems and data. Legacy integration can also shorten the delivery time of key applications and make better use of resources.

It is fairly common to hear someone say, "We should just rewrite it in xyz," with xyz representing any number of client/server development tools. This sounds great because as a developer, you always want to use the latest and greatest development tools, but for the business this might have little or no positive financial impact. In other words, rewriting these applications can be a tough sell. The impact of rewriting a legacy system typically trickles through the entire systems development environment. Rewriting one system can have a monumental impact on other systems, which then must be changed or rewritten. In today's environment, leveraging the current investment is probably the more prudent course of action, unless there are significant financial incentives for rewriting applications.

In the eyes of many corporations, these legacy applications represent a tremendous investment--not only in hardware and software, but also in business logic that might not be easily defined or reengineered. It becomes difficult to justify rewriting something that is working, proven, and providing positive benefits to the business. Unless the business processes themselves are changing direction entirely, integrating with existing systems can ease the development effort. It does so not only by reducing costs and time associated with the development, but also by reducing the number of business processes that need to be defined or redefined--which can mean many hours spent in meetings discussing the actual and theoretical business processes.

The development of quality applications, regardless of tools and expertise, takes time. Quality solutions require you to take time for design, development, good business analysis, and definition of user and system requirements. Add to this the time to rewrite legacy code, and the deliverable time frame might outlast the useful life of the application. Today's business environment can change dramatically in a short period of time. If application development staffs are not responsive and flexible to the business needs, the business opportunities at hand might be missed. This alienates the business, and the perceived unresponsiveness does little for IT's standing within the organization.

One way to quicken the process of rewriting legacy applications is to talk to the developers of the original system. By picking their brains about what business processes drove key areas of application functionality, you can quickly discern needed functionality from unneeded functionality in a new system. Because many legacy systems are 10 to 20 years old or older, these individuals might no longer be an available resource. Project managers might also have a difficult time assembling a development team that is equally qualified to look at legacy code, reverse-engineer the business rules, and build client/server applications. Once again, the factor of timeliness comes into play: Is it worth the time to reverse-engineer the legacy process, or should it be integrated with the new applications or functionality? As stated earlier, with the rapidly changing business environment, the more prudent approach might be integration.

Many opportunities can exist in integrating with legacy applications, but many challenges will also be faced. Integration raises some of the more difficult questions. Problems such as dealing with disparate hardware, operating systems, and communication protocols come to mind immediately. There is also the question of various data storage mechanisms. How are data stores, such as VSAM and SQL Server, integrated? Another challenge is in building the right mix of developer talents. Both advanced technology and legacy systems experts will be required, and the proper mix will make the integration efforts a lot smoother. Not many IT shops have developers or systems specialists that have attempted these types of development efforts. Confidence, persistence, willingness to listen, openness to new tools and technologies, and sheer durability will go a long way toward conquering these challenges.

Departmentalized Applications

A departmentalized application is typically characterized by a small set of users. Hardware might be dedicated to a particular user community, and the application fulfills a single, departmentally specific purpose. This single purposefulness coincides with the corporate definition of a department, which typically is designed around a single business purpose or function. Departmental applications are typically more cost-effective to develop in the short term. There are, however, some inherent problems with this approach, primarily the sharing of data. These silos of data can increase the costs of departmental applications in the long term.

Departmentalized applications make sense for several reasons. Typically it is more cost-effective to develop applications that have a smaller scope and a smaller set of users. This allows for a smaller development team, which eliminates some management overhead. The user community is focused on one business goal--whatever it is that their department does. And it is easier to manage response time and scalability on the departmental level, not having to worry about scaling to extremely large user communities and so forth.

Departmentalization does come at a cost. Typically the information is not shared across departments, which takes away from the synergy and responsiveness of an organization. This is one reason for the use of corporate data warehouses and data marts, which provide the ability to gather information from multiple sources and store it centrally for all to access. The data warehouse or data mart concept works fine for a scenario in which the departmental data is used for analysis that is not "real-time" sensitive. In some situations, however, departments within an organization will demand real-time data from other departments.

You can overcome the challenge of distributed data by using distributed transaction processing. This approach consists of applications that make use of and manipulate multiple databases or data stores. Managing distributed transactions--making sure that they process successfully--is the central issue faced in this approach. You would not want to risk data integrity by updating one database and not the other. Another issue is the availability of either local or remote databases. You might not want to stop processing in the application because a distributed database is unavailable. These issues, as you will see later in this chapter, can be handled by using a reliable and durable transaction-processing architecture.

The following sections introduce you to an array of technologies that exist to help you build distributed and integrated applications. Examples are also presented for how to overcome many of the challenges mentioned here. The first section covers some of the technologies that are available for developing solutions for these situations.

Key Technologies

Microsoft has amassed an array of tools and technologies that allow you to build distributed applications and to integrate these applications with each other and with legacy systems. Some of these tools are used primarily for moving data to multiple data stores, whereas others are concerned with sharing data and processes between disparate environments. As mentioned earlier in this book, Microsoft tends to use catchy code names. Some of the tools described in this section are referred to by their code names, and some are listed with the production name and the code name.

The following technologies are discussed in this section:

The technologies presented here have a common goal. They strive to allow you, as the developer, to focus on business issues and worry less about the back-office issues. The role of the developer is to focus on the business issues and problems at hand, design solutions, and then select the proper tools for the implementation of these solutions. Microsoft has sought to provide you with a rich tool set for nearly all the situations that can arise during the implementation of these solutions. Some of these tools are designed to help you build distributed client/server applications that can be integrated with legacy applications. Others are designed for distributed applications that must make use of multiple databases or handle remote users.


ON THE WEB:The following technologies and tools are presented in cursory fashion. For more details on these and any of the Microsoft products, visit Microsoft's web site, www.microsoft.com. Be prepared to spend a significant amount of time locating information; a wealth of technical documentation is provided on each of these tools.
You will also find some hints on finding information regarding each tool or topic at the end of each topic's section of this chapter.

Replication

As an organization grows and its data becomes more specialized and departmentalized, it is often necessary for applications to share data. Organizations stand to gain competitive advantages through synergy and responsiveness. The sharing of data between departments is paramount in developing this synergy. For situations where real-time consistent data is not necessary and loosely synchronized data will suffice, replication provides you with a valuable tool.

Replication is a process by which data is propagated from one database to another, maintaining a consistent view. This propagation can be transaction based, scheduled, or on demand. Transaction-based replication is triggered by each database transaction, so changes reach the other databases in near real time. Its speed is limited by transaction size, transaction volume, and network traffic. Scheduled replication is more widely used in situations where the data is not as time-sensitive or transaction volume does not make it possible or practical to use transactional replication. One common use for scheduled replication is in data warehousing, where data can be gathered for decision support based on the previous day's events. On-demand replication is typically used for data security functions, such as taking copies of distributed databases for point-in-time disaster recovery.

Replication, as described by Microsoft, uses publishing databases, which publish data to subscribing databases. Publishing databases can publish data to one or many subscribers. Microsoft SQL Server will support replication by means of ODBC to Microsoft Access or Oracle databases, which can be very beneficial for organizations with disparate data stores. Subscribers to replication see a point-in-time consistent view of the publishing database. In other words, replication does not guarantee that at every instant in time all copies of a data element will be identical.
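To make the point-in-time behavior concrete, the following Python sketch models a publishing database that sends one copy of its data to every subscriber each time it publishes. The Publisher and Subscriber classes are hypothetical and purely in-memory; they illustrate the concept only and are not SQL Server's replication interfaces.

```python
import copy


class Publisher:
    """Hypothetical publishing database holding one table as a list of row dicts."""

    def __init__(self):
        self.rows = []
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def insert(self, row):
        # Local change only; subscribers do not see it until the next publish().
        self.rows.append(dict(row))

    def publish(self):
        # Take a single point-in-time copy and give the same copy to every subscriber.
        snapshot = copy.deepcopy(self.rows)
        for sub in self.subscribers:
            sub.apply_snapshot(snapshot)


class Subscriber:
    """Hypothetical subscribing database that holds the last snapshot it received."""

    def __init__(self):
        self.rows = []

    def apply_snapshot(self, snapshot):
        self.rows = copy.deepcopy(snapshot)


pub = Publisher()
branch_a, branch_b = Subscriber(), Subscriber()
pub.subscribe(branch_a)
pub.subscribe(branch_b)
pub.insert({"id": 1, "region": "East"})
pub.publish()                               # both subscribers now hold row 1
pub.insert({"id": 2, "region": "West"})     # not yet visible to either subscriber
```

Between publish() calls the subscribers lag the publisher, yet each always holds a consistent copy as of the last publication, which is the loosely synchronized trade-off described above.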

SQL Server replication supports partitioning of replicated data. Data might be horizontally partitioned, vertically partitioned, or a combination of the two. Horizontal partitioning publishes only specific rows of data to a subscribing database. Vertical partitioning publishes only specific columns of data to a subscribing database. Combination replication, involving both horizontal and vertical partitioning, replicates only selected columns from selected rows. Combination partitioning truly grants you flexibility in the implementation of replication and the solutions you are able to provide. This functionality is useful for systems that might need only specific pieces of data from a department.
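The three partitioning options amount to a row filter, a column list, or both. The sketch below uses a hypothetical partition() helper and a made-up customer table to show which data each style would send to a subscriber; SQL Server defines these as filters on a publication rather than as Python code.

```python
def partition(rows, row_filter=None, columns=None):
    """Return the part of a table a subscriber should receive.

    row_filter: predicate selecting rows (horizontal partitioning); None keeps all rows.
    columns: column names to keep (vertical partitioning); None keeps all columns.
    """
    selected = [r for r in rows if row_filter is None or row_filter(r)]
    if columns is None:
        return [dict(r) for r in selected]
    return [{c: r[c] for c in columns} for r in selected]


customers = [
    {"id": 1, "name": "Acme", "region": "East", "credit_limit": 5000},
    {"id": 2, "name": "Bolt", "region": "West", "credit_limit": 8000},
    {"id": 3, "name": "Core", "region": "East", "credit_limit": 2000},
]

# Horizontal: only the East region's rows, all columns.
east = partition(customers, row_filter=lambda r: r["region"] == "East")
# Vertical: every row, but without the credit information.
directory = partition(customers, columns=["id", "name"])
# Combination: selected columns from selected rows.
east_directory = partition(customers, lambda r: r["region"] == "East", ["id", "name"])
```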

For more information regarding Microsoft SQL Server replication, developers can refer to Special Edition Using Microsoft BackOffice, Volume 2 from Que, ISBN 0-7897-1130-3, and the documentation provided with the SQL Server software. For other RDBMSs, refer to the manufacturer's documentation for the specifics regarding replication.

The Host Data Replicator (Cakewalk)

Many companies that have existing host-based legacy environments use DB2. To answer the call for data transparency with these systems, Microsoft has developed the Host Data Replicator, code-named Cakewalk. The Host Data Replicator allows replication of database tables between DB2 and SQL Server in either direction. Replication is snapshot based: an entire table is refreshed with a snapshot from the source environment. The Host Data Replicator uses Microsoft SNA Server 3.0 for connectivity to the host environment, while all of the processing resides on the Windows NT server on which the Host Data Replicator is installed. The Host Data Replicator supports a variety of replication scenarios, types, and DB2 products.

The Host Data Replicator supports a myriad of replication scenarios that allow for flexibility and transparency of data between SQL Server and DB2 systems. Replication scenarios supported are horizontal, vertical, and combination partitioning, as discussed previously. The Host Data Replicator also supports the use of derived columns, allowing for the calculation of fields during replication. The Host Data Replicator also provides the ability to use Structured Query Language (SQL) to alter data before or after replication and the ability to manipulate the data type or column order of replicated data.
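The following sketch shows what a derived column, a data type change, and a new column order mean for a single replicated row. The DB2 column names and the transform() function are hypothetical, and doing the work in Python is purely illustrative; the Host Data Replicator itself expresses such changes in SQL.

```python
from decimal import Decimal


def transform(row):
    """Hypothetical per-row mapping applied while copying a DB2 row to SQL Server."""
    qty = int(row["QTY"])                   # data type change: character -> integer
    price = Decimal(row["UNIT_PRICE"])      # character -> exact decimal
    # The target row uses a different column order and gains a derived column.
    return {
        "order_id": int(row["ORDER_NO"]),
        "extended_price": qty * price,      # derived column, calculated during replication
        "quantity": qty,
        "unit_price": price,
    }


source_rows = [{"ORDER_NO": "0042", "QTY": "3", "UNIT_PRICE": "19.95"}]
target_rows = [transform(r) for r in source_rows]
```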

The Host Data Replicator supports the following replication features:

The Host Data Replicator provides for three replication time frames. The first is on-demand replication, which can be invoked through a programmatic interface to provide quasi-transaction-based replication. The next is scheduled replication, which is very useful if hour-, day-, or week-old data is sufficient for the business purposes it is intended to serve. The last is recurring scheduled replication, which is similar to scheduled replication but repeats, usually at a specific time or times of the day, over a given period of time.

The Host Data Replicator supports a variety of DB2 flavors. Some organizations might have multiple DB2 environments, and the transparency provided by the Host Data Replicator can be used with one or more of these DB2 environments and Microsoft SQL Server. This also allows the same set of application code to work with any of the supported DB2 platforms. By writing applications to use the Host Data Replicator, you can free the application from dependence on a specific DB2 platform. This allows the DB2 legacy side of the environment to scale without changes to the code that accesses it.

Host Data Replicator supports the following versions of DB2:

For IT shops that have an installed base of DB2, it has become much easier to share information with SQL Server databases. This will reduce development time for new client/server applications by allowing developers to share data between environments without spending time building middleware solutions.


ON THE WEB:For more information on the Host Data Replicator, visit the Microsoft Developer Network web site at www.microsoft.com/MSDN and search on Cakewalk or Host Data Replicator.

The Microsoft Transaction Server

The Microsoft Transaction Server (MTS) is a set of components that provide specific pieces of functionality for creating, deploying, and managing distributed applications. MTS allows for the packaging and distribution of application components and logic. A key component of MTS is the Distributed Transaction Coordinator (DTC) for managing distributed transactions.

MTS manages packages of components, which are either purchased or developed in-house. These components are pulled together to form application units. MTS also manages a shared pool of Open Database Connectivity (ODBC) data connections. These data connections can allow application access to a variety of data stores, ranging from SQL Server database tables to mainframe VSAM files.
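A shared pool pays off because opening a database connection is expensive: a fixed set of connections is opened once and lent to callers in turn. Here is a minimal sketch, assuming a hypothetical make_connection() factory in place of a real ODBC connection. MTS and the ODBC driver manager handle pooling for you, so this only illustrates the idea.

```python
import queue


class ConnectionPool:
    """Minimal shared pool: a fixed set of connections is opened once and reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()          # thread-safe FIFO of idle connections
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for the next caller instead of closing it.
        self._idle.put(conn)


opened = []


def make_connection():
    # Hypothetical stand-in for opening an ODBC connection.
    conn = {"id": len(opened)}
    opened.append(conn)
    return conn


pool = ConnectionPool(make_connection, size=2)
for _ in range(10):                         # ten requests, only two connections opened
    conn = pool.acquire()
    pool.release(conn)
```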

MTS can also be used to deploy mid-tier logic implemented in the form of ActiveX components. These ActiveX objects receive requests from client applications, apply business logic to the request, and then call the appropriate resource manager to fulfill the request. Collections of these ActiveX components form packages, which can be released to a Windows NT server to act as a linkage between clients and the shared resources they want to access. This functionality typically forms the middle tier in a three-tiered client/server system.

Using MTS packages has the following advantages:

As stated earlier, one component of MTS is the Distributed Transaction Coordinator (DTC). The DTC allows for preservation of the ACID (Atomicity, Consistency, Isolation, and Durability) properties of transactions. These transactions can be either managed by the DTC or resolved by an administrator in the case of problem transactions. DTC also provides an Application Programming Interface (API) based on the Component Object Model (COM) to aid in the development of distributed applications. This API enables C++ developers to create transaction objects. These objects can then be instantiated and processed using the transaction resource managers and transaction coordinators.

Atomicity of transactions means that either a transaction completes successfully or no part of the transaction completes successfully. Under DTC, transactions are all or nothing. DTC ensures this by using the two-phase commit (2PC).

In 2PC, an update statement is submitted to each SQL Server. Each server applies the update but does not commit it; in this prepared state, the server guarantees that it can commit when asked. Once the DTC is notified that both updates occurred and that both databases are prepared, it sends a commit instruction to both, and the databases commit the transaction. If either database fails to prepare, the DTC instructs both to roll back. Should a failure, such as a power outage, occur during this process, the DTC maintains a log of state information regarding the transaction. When both databases are back online, the log is checked to determine the transaction's last known state, and the databases either commit the prepared transaction or roll back the entire transaction. With this level of fault tolerance, you can keep your databases consistent and clean while using distributed transactions.
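The two-phase commit sequence can be sketched in a few lines of Python. Participant, two_phase_commit(), and the list used as a log are hypothetical, in-memory stand-ins; DTC's own COM interfaces and its durable log are not shown. Note that the decision is recorded before phase two begins, which is what lets a recovering system learn the outcome later.

```python
class Participant:
    """Hypothetical resource manager that can prepare, commit, or roll back."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare    # simulate a server that cannot prepare
        self.committed = {}
        self.pending = None

    def prepare(self, updates):
        # Phase 1: stage the updates without committing, then vote yes or no.
        if self.fail_prepare:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2 (commit): make the staged updates permanent.
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2 (abort): discard anything staged.
        self.pending = None


def two_phase_commit(work, log):
    """work maps each Participant to its updates; log records the decision."""
    votes = [p.prepare(updates) for p, updates in work.items()]
    if all(votes):
        log.append("commit")                # decision recorded before phase 2
        for p in work:
            p.commit()
        return True
    log.append("abort")
    for p in work:
        p.rollback()
    return False


server_a, server_b = Participant("A"), Participant("B")
log = []
ok = two_phase_commit({server_a: {"acct_1": -100}, server_b: {"acct_2": 100}}, log)
```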

At times a transaction is left in doubt: a participant has prepared it but has not learned whether to commit or roll back, for example because a server or network link failed before the decision arrived. DTC will resolve these in-doubt transactions by communicating with other DTC services involved in the transaction. Based on that communication, the transaction is either rolled back or committed. System administrators can also monitor the DTC for in-doubt transactions and resolve them manually by using SQL Enterprise Manager. Manually resolving in-doubt transactions does take some level of expertise, specifically in knowing how the results might affect the business processes that have been created or are using the transaction.
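Automatic resolution of an in-doubt transaction comes down to finding out what the coordinator decided. The sketch below assumes the common "presumed abort" rule, in which a transaction with no logged commit decision is rolled back; the function name and the dictionary standing in for the coordinator's log are hypothetical.

```python
def resolve_in_doubt(prepared, coordinator_log):
    """Decide the outcome of prepared-but-unfinished transactions after a failure.

    prepared: ids of transactions a participant prepared but never finished.
    coordinator_log: transaction id -> "commit" or "abort", as recorded by the
    coordinator. A missing entry means no decision was ever logged.
    """
    outcomes = {}
    for txid in prepared:
        decision = coordinator_log.get(txid)
        # Presumed abort: without a logged commit decision, roll back.
        outcomes[txid] = "commit" if decision == "commit" else "rollback"
    return outcomes


outcomes = resolve_in_doubt(["t1", "t2", "t3"], {"t1": "commit", "t2": "abort"})
```

When the coordinator's log cannot be reached at all, no automatic decision is possible, which is the situation where an administrator steps in.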

Another welcome feature of DTC is the COM-based API. This makes it much easier for you to develop applications that take full advantage of the services that DTC offers. It also frees you from the complexities of developing middleware solutions. This allows you to focus more time on the business processes at hand, which will lead to better applications for the business user. In the eyes of the business user, your development efforts will also seem more timely and responsive.

For more information regarding the Microsoft Transaction Server or the Distributed Transaction Coordinator, see Special Edition Using Microsoft BackOffice, Volume 2 from Que, ISBN 0-7897-1130-3, or the documentation provided with the SQL Server software.

The OLE DB/DDM Driver

The OLE DB/DDM driver, code-named Thor, gives you the ability to integrate legacy data with more current relational database systems, such as SQL Server. The OLE DB/DDM driver makes use of two data access methods: Microsoft's OLE DB and IBM's Distributed Data Management (DDM). OLE DB/DDM opens up the world of legacy data files, such as VSAM and OS/400 files, to you, the client/server application developer. This enables you to leverage a tremendous resource within many organizations: their legacy data.

OLE DB is a major component of Microsoft's Universal Data Access initiative. This allows for data stored in any form to be accessed through a common set of interfaces. This means that corporate data stored in spreadsheets, relational databases, flat files, or email systems is accessible to the application, greatly simplifying your job as the application developer. This is accomplished by the data stores exposing common interfaces to the data.

OLE DB partitions the database functionality into logical components and allows for these components to communicate by event processing. An OLE DB component can be created to present data in a tabular format while allowing for complex application logic to be processed within the component. OLE DB provides a COM-based API for developing robust database applications using any number of data stores and for a variety of platforms. Support for OLE DB falls on the shoulders of the data provider, such as Microsoft Excel, Microsoft Project, or an ODBC SQL-oriented data source. OLE DB resides above the data store and below the application, allowing the application developer to interface with the OLE DB APIs without worrying about the underlying data store. It is worth noting that OLE DB is not a replacement for ODBC, but rather allows OLE DB data consumers to utilize ODBC data providers.
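The idea of a common set of interfaces can be illustrated without OLE DB's actual COM interfaces. In the following sketch, which uses hypothetical class names, two different data stores expose the same columns() and rows() methods, and a consumer function is written once against that shared interface.

```python
import csv
import io
from typing import Dict, Iterator, List, Protocol


class RowsetProvider(Protocol):
    """Hypothetical common interface that every data provider exposes."""

    def columns(self) -> List[str]: ...

    def rows(self) -> Iterator[Dict[str, str]]: ...


class CsvProvider:
    """Provider over delimited text, standing in for a flat file."""

    def __init__(self, text: str):
        self._text = text

    def columns(self) -> List[str]:
        return next(csv.reader(io.StringIO(self._text)))

    def rows(self) -> Iterator[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self._text))


class MemoryTableProvider:
    """Provider over an in-memory table, standing in for a relational source."""

    def __init__(self, header: List[str], data: List[List[str]]):
        self._header = header
        self._data = data

    def columns(self) -> List[str]:
        return list(self._header)

    def rows(self) -> Iterator[Dict[str, str]]:
        for record in self._data:
            yield dict(zip(self._header, record))


def count_where(provider: RowsetProvider, column: str, value: str) -> int:
    # The consumer depends only on the interface, not on the underlying store.
    return sum(1 for row in provider.rows() if row[column] == value)


flat_file = CsvProvider("id,state\n1,OH\n2,TX\n3,OH\n")
table = MemoryTableProvider(["id", "state"], [["4", "OH"], ["5", "CA"]])
```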

IBM's Distributed Data Management (DDM) protocol is a standard access method to row-oriented legacy data files, such as VSAM. DDM is available for most host environments. The OLE DB/DDM driver requires no host-side software from Microsoft.

The OLE DB/DDM driver allows you to access VSAM data set members much as you would access files on a Windows NT server. Because you can not only view the VSAM data set member, but also have record-level I/O access, you can utilize this data without first performing costly conversions to SQL Server or other RDBMS formats. The OLE DB/DDM driver also allows access to both fixed and variable record length records, with full data set navigation. Other features of OLE DB/DDM are file locking, record locking, and record attribute preservation of VSAM files. OLE DB/DDM truly is a single solution for accessing multiple data storage types on multiple disparate platforms.
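Record-level access to a legacy file means reading fixed-length records and splitting them into fields, usually after translating them from EBCDIC. The sketch below assumes a hypothetical three-field record layout and uses Python's cp037 (EBCDIC US/Canada) codec. It does not use DDM or OLE DB; it only shows the kind of work the driver performs on your behalf.

```python
RECORD_LAYOUT = [("cust_id", 6), ("name", 20), ("state", 2)]   # hypothetical layout
RECORD_LENGTH = sum(width for _, width in RECORD_LAYOUT)         # 28 bytes


def decode_record(raw):
    """Split one fixed-length EBCDIC record into named, trimmed text fields."""
    text = raw.decode("cp037")              # cp037 = EBCDIC US/Canada code page
    fields, pos = {}, 0
    for name, width in RECORD_LAYOUT:
        fields[name] = text[pos:pos + width].rstrip()
        pos += width
    return fields


def read_records(data):
    # Record-level I/O: walk the data one fixed-length record at a time.
    if len(data) % RECORD_LENGTH:
        raise ValueError("data is not a whole number of records")
    for offset in range(0, len(data), RECORD_LENGTH):
        yield decode_record(data[offset:offset + RECORD_LENGTH])


def build_key_index(data):
    # Keyed access, loosely like reading a key-sequenced data set by its key.
    return {rec["cust_id"]: rec for rec in read_records(data)}


def encode_record(cust_id, name, state):
    # Helper for the example: pad each field to its width and encode as EBCDIC.
    return (cust_id.ljust(6) + name.ljust(20) + state.ljust(2)).encode("cp037")


data = encode_record("000001", "ACME SUPPLY", "OH") + encode_record("000002", "BOLT INC", "TX")
index = build_key_index(data)
```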

The following is a list of VSAM file types supported:


ON THE WEB:For more information regarding the OLE DB/DDM driver, visit the Microsoft Developer Network web site at www.microsoft.com/msdn and search on Thor or OLE DB/DDM.

Cedar

Cedar, as code-named by Microsoft, allows you to create distributed client/server applications using legacy mainframe applications as functional components. Cedar's program-to-program interoperability allows organizations to leverage their existing mainframe applications in a distributed COM-based client/server environment. Using the tools that come with Cedar, you can quickly create Distributed COM (DCOM) components using legacy systems. These components can communicate and interoperate to form very robust applications. Cedar allows for atomic, transaction-based client/server applications to use business logic that is already in place rather than having to rewrite or reengineer legacy business logic. Companies can utilize their existing investments in mainframe programming tools and developers while taking advantage of the more flexible COM- and DCOM-oriented technologies and tool sets to extend legacy systems.

An example of this flexibility is extending a mainframe-based record look-up utility. You could incorporate this functionality into a new Windows-based application, tying it to a pushbutton, allowing the mainframe component to find information, based on a name, and supplying this data to your Windows application.

These DCOM components created by Cedar can be used by and run on any DCOM-compliant platform, such as MVS. Developers can use these components in creating client/server distributed applications, which can consist of components running locally, on middle-tier servers, and on the mainframe, passing data between them from a variety of data stores. As in the previous example, a program or utility running on MVS could be created as a DCOM component and utilized by your Windows application.

You can create Cedar components by using the Cedar Interface Builder. The Interface Builder allows you to define the methods and I/O parameters for the host application. This process includes specifying the location and name of the mainframe program and specifying any default data type mappings. The last step in this process is creating the Cedar type library and registering it with MTS. The Cedar components will reside on the Windows NT server, not on the legacy host system, and must be registered on any client platforms that they will be called from.

Cedar works by intercepting object method calls and redirecting them to the appropriate mainframe program. Cedar uses the definition, built during the creation of the component, to convert the method call into the appropriate format for the target platform and sends the method call to the host platform. The connection to the host environment is provided by Microsoft SNA Server 3.0. Once the mainframe component processes the method call and returns the results, Cedar converts the results from the native host format to a format understandable by the calling object. The results are then sent back to the calling object.
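The interception and conversion steps can be sketched with a Python proxy object that turns a method call into a fixed-width EBCDIC request and converts the reply back. HostProxy, the CUSTLKUP program name, the field layout, and fake_host() (which stands in for SNA Server and the mainframe program) are all hypothetical. The example mirrors the name look-up utility described earlier, not Cedar's actual implementation.

```python
class HostProxy:
    """Hypothetical client-side proxy that forwards method calls to a host program.

    method_map: method name -> (host program name, list of (parameter, width)).
    send: callable taking (program, request_bytes) and returning reply bytes.
    """

    def __init__(self, method_map, send):
        self._method_map = method_map
        self._send = send

    def __getattr__(self, method):
        # Called only for names not found normally: intercept the method call.
        if method not in self._method_map:
            raise AttributeError(method)
        program, fields = self._method_map[method]

        def call(**kwargs):
            # Convert the call into the host's format: fixed-width EBCDIC fields.
            text = "".join(str(kwargs[name]).ljust(width)[:width] for name, width in fields)
            reply = self._send(program, text.encode("cp037"))
            # Convert the host reply back into a native string for the caller.
            return reply.decode("cp037").rstrip()

        return call


def fake_host(program, request):
    # Stand-in for the mainframe program: look up a name, answer in EBCDIC.
    directory = {"SMITH": "555-0100", "JONES": "555-0199"}
    name = request.decode("cp037").strip()
    return directory.get(name, "NOT FOUND").encode("cp037")


lookup = HostProxy({"find_phone": ("CUSTLKUP", [("name", 20)])}, fake_host)
```

A Windows client could then call lookup.find_phone(name="SMITH") behind a pushbutton without knowing that the answer came from the host.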

Cedar is a component of MTS, discussed earlier. This allows Cedar to interact and make use of the MTS transaction functionality. Cedar, working with MTS components such as DTC and using the OLE DB/DDM database connectivity, can provide two-phase commit functionality between different database systems running on different platforms. This empowers you, as the developer, to create synergistic applications that make full use of legacy assets while delivering the flexibility and functionality that the organization desires.

Cedar-defined objects can be used with any development tool that supports automation objects. Such tools include Visual Basic 5.0 and Visual C++. You can view these objects using the standard object browser once the object library is added to the application.


ON THE WEB: For more information on Cedar, visit the Microsoft Developer Network web site at www.microsoft.com/msdn and search on Cedar.

The Microsoft Message Queue

Today's business environment often requires individuals within different departments to communicate with each other to reach a common business goal. In much the same way, business requirements are also forcing applications running on different systems to communicate with each other in support of common business goals. With disparate data stores, machines, network protocols, and applications, this might seem a daunting task. One answer is to rebuild existing applications and homogenize them and the networks on which they run. This start-from-scratch mentality is a difficult sell in the "do more with less" IT environment of today. A more reasonable solution is to allow applications to communicate with each other regardless of platform or language. Allowing new applications to communicate with existing applications across disparate technologies, passing state and transactional information from application to application, becomes the ideal method of merging technologies quickly and efficiently.

The answer to the merging of heterogeneous technologies comes in the form of messaging. Microsoft has answered this need with the Microsoft Message Queue (MSMQ), code-named Falcon. Message queuing provides reliable communication between applications that may be running on different platforms across heterogeneous networks. Messaging can occur in real time or at periodic intervals. MSMQ provides guaranteed delivery and offers a variety of response types. MSMQ is also an ideal tool for a remote client architecture, where users can be at any location at any time. Because it uses the Windows NT framework for security and offers a Software Development Kit (SDK) and API, MSMQ becomes an invaluable part of many distributed application projects.

MSMQ conducts either synchronous or asynchronous messaging with guaranteed message delivery. MSMQ uses a store-and-forward model, meaning that messages are written to intermediate queues and then, when possible, transferred to either another intermediate queue or the destination queue. At each leg of this journey, a message is removed from the intermediate queue only after it has reached the next queue. Should that queue be unavailable, the message remains in the intermediate queue until it can be delivered or is removed from the queue manually. In this way MSMQ provides guaranteed message delivery regardless of network or system problems.
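The store-and-forward rule can be simulated in a few lines of Python. This is an in-memory model of the idea, not the MSMQ API; the `Queue` class, the `forward` function, and the queue names are hypothetical, and a real queue manager would also write messages to disk before acknowledging them.

```python
from collections import deque


class Queue:
    """One queue on one machine; `online` simulates whether the machine is reachable."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.messages = deque()

    def accept(self, message: str) -> bool:
        # A real queue manager would also persist the message before acknowledging it.
        if not self.online:
            return False
        self.messages.append(message)
        return True


def forward(source: Queue, target: Queue) -> int:
    """Move messages one hop, removing each from `source` only after `target` accepts it."""
    moved = 0
    while source.messages:
        message = source.messages[0]   # peek; the message stays put for now
        if not target.accept(message):
            break                      # next queue unavailable: keep waiting
        source.messages.popleft()      # safe to remove: the next queue now holds it
        moved += 1
    return moved


outgoing = Queue("laptop-outgoing")
router = Queue("branch-router", online=False)
orders = Queue("orders")
outgoing.messages.extend(["order 1", "order 2"])

print(forward(outgoing, router))   # 0: router unreachable, messages remain on the laptop
router.online = True
print(forward(outgoing, router), forward(router, orders))   # 2 2
```

The key design point is the peek-then-remove order: because a message leaves one queue only after the next queue holds it, an outage at any hop delays delivery but never loses the message.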

Building on the philosophy of guaranteed delivery is the atomicity of messages. Messages can take the form of transactions, meaning that one message can contain information to update a database and send a message to a host process. These transaction messages will act as an atomic unit. Should either the update fail or the target message queue be unavailable, both processes will wait until they can be completed successfully. Transaction messages are also guaranteed to arrive in order and not more than once to the destination queue.
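The "in order and not more than once" guarantee can be illustrated with per-sender sequence numbers. MSMQ's internal mechanism is not described here; this Python sketch is an assumption-based illustration of the guarantee itself, and the class and sender names are hypothetical.

```python
from typing import Dict, List


class OrderedReceiver:
    """Delivers each sender's messages exactly once and in sequence order."""

    def __init__(self):
        self.next_seq: Dict[str, int] = {}          # next expected number per sender
        self.held: Dict[str, Dict[int, str]] = {}   # early arrivals, by sender
        self.delivered: List[str] = []

    def receive(self, sender: str, seq: int, body: str) -> None:
        expected = self.next_seq.setdefault(sender, 1)
        held = self.held.setdefault(sender, {})
        if seq < expected or seq in held:
            return                                  # duplicate: ignore it
        held[seq] = body
        while expected in held:                     # release any run now complete
            self.delivered.append(held.pop(expected))
            expected += 1
        self.next_seq[sender] = expected


receiver = OrderedReceiver()
for seq, body in [(2, "B"), (1, "A"), (1, "A"), (3, "C")]:   # out of order, one repeat
    receiver.receive("laptop-07", seq, body)
print(receiver.delivered)   # ['A', 'B', 'C']
```

Message 2 is held until message 1 arrives, and the repeated copy of message 1 is discarded, so the application sees each transaction once, in the order it was sent.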

Messages can be one of three types:

It is also possible to receive confirmation from the destination queue that the message was received successfully. The responses can be one of three types.

The following is a summary of the three response types supported by MSMQ:

Remote users are becoming not just more common, but a standard for some positions within many organizations. The obvious example is that of a salesperson who might spend days, weeks, or even months traveling to promote a product. For this distributed, disconnected user, MSMQ provides connectionless messaging. Applications utilizing MSMQ do not need to be directly connected to the receiving application. Applications that are not connected can use a local MSMQ intermediate queue to store messages until the user is connected to either the receiving application or an intermediate queue that will be able to deliver the messages to the receiving application. This allows a user to work offline and then transmit data in the form of transactional messages at a later time.
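Connectionless messaging on the client side amounts to a durable local outbox that is drained once a connection exists. The following Python sketch uses a JSON-lines file as that local store; it is a hypothetical stand-in for an MSMQ local queue, and `OfflineOutbox` and `transmit` are illustrative names, not MSMQ calls.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


class OfflineOutbox:
    """Keeps outgoing messages in a local file until a connection is available."""

    def __init__(self, path: Path):
        self.path = path

    def send(self, message: Dict) -> None:
        # Works with no network at all: append one JSON message per line.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")

    def flush(self, transmit: Callable[[Dict], bool]) -> int:
        """Once connected, hand each stored message to transmit(); keep any that fail."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        sent = 0
        for line in lines:
            if not transmit(json.loads(line)):
                break  # stop at the first failure so ordering is preserved
            sent += 1
        remaining = lines[sent:]
        self.path.write_text("".join(line + "\n" for line in remaining), encoding="utf-8")
        return sent


with tempfile.TemporaryDirectory() as folder:
    outbox = OfflineOutbox(Path(folder) / "outbox.jsonl")
    outbox.send({"order": 1, "qty": 5})   # recorded while offline at a customer site
    outbox.send({"order": 2, "qty": 3})

    link_up = False
    received = []

    def transmit(message):
        if link_up:
            received.append(message)
        return link_up

    first_try = outbox.flush(transmit)    # 0: no connection yet, nothing lost
    link_up = True
    second_try = outbox.flush(transmit)   # 2: both messages delivered in order
```

Because messages are written locally before any transmission is attempted, the salesperson can keep working regardless of whether a phone line is available.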

MSMQ security for messaging is based on the Windows NT security framework and on the CryptoAPI for encryption and digital signatures. Maintaining a high level of security while working within the existing Windows NT security framework eases some of the development pains of creating MSMQ applications. It will also allow the security administrator to sleep at night when an organization uses remote clients to distribute and collect sensitive data.

MSMQ comes with an SDK that contains MSMQ ActiveX components that can be used by Visual Basic, Internet Information Server Active Server Pages, or any other ActiveX container. MSMQ ships with MSMQ Servers that handle the routing services and host the information store that holds configuration data. Each MSMQ queue is assigned a Globally Unique Identifier (GUID) when it is created. This ensures that no matter where a message queue exists on a network, an application using this GUID will be able to find the associated message queue. MSMQ also has a Message Queue Explorer for monitoring and managing messages. Another feature of MSMQ is that the dynamic configuration of disparate networks does not come at the cost of major application changes. This is due to the availability of the MSMQ API on a variety of platforms. This means that one set of application code will work across all supported platforms.
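The benefit of GUID addressing is that a queue's identity never changes even when its location does. The Python sketch below models a directory like the information store described above; `QueueDirectory` and the host names are hypothetical, and the real lookup happens inside MSMQ rather than in application code.

```python
import uuid
from typing import Dict


class QueueDirectory:
    """Maps queue GUIDs to the machine currently hosting each queue."""

    def __init__(self):
        self._hosts: Dict[uuid.UUID, str] = {}

    def create_queue(self, host: str) -> uuid.UUID:
        queue_id = uuid.uuid4()      # identity assigned once, at creation
        self._hosts[queue_id] = host
        return queue_id

    def move_queue(self, queue_id: uuid.UUID, new_host: str) -> None:
        self._hosts[queue_id] = new_host   # the GUID itself never changes

    def locate(self, queue_id: uuid.UUID) -> str:
        return self._hosts[queue_id]


directory = QueueDirectory()
orders_queue = directory.create_queue("SALES01")
directory.move_queue(orders_queue, "SALES02")   # e.g. the server is replaced
print(directory.locate(orders_queue))           # SALES02
```

Applications hold on to the GUID only, so moving a queue to new hardware requires a directory update rather than an application change.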

MSMQ API allows developers to do the following:


NOTE: In theory, MSMQ can function transparently to your application. In nearly all cases, however, you will need to make use of the MSMQ API. By determining message queue status and taking an active role in managing messages and message queues, your applications will behave far more consistently.

MSMQ has two components that must be installed. The MSMQ Server can be installed on an existing Windows NT Server without dedicating a machine solely for MSMQ. The MSMQ Client runtime needs to be configured on every machine that will be using MSMQ-based communications.

MSMQ gives organizations the ability to leverage their legacy and distributed applications while moving forward with new technology initiatives. For organizations that are moving away from mainframe or host-based systems, MSMQ allows this migration to be done piecemeal. Together, these capabilities can provide significant benefits to IT development efforts and the business units that pay for them.


ON THE WEB: For more information on MSMQ, visit the Microsoft Developer Network web site at www.microsoft.com/msdn and search on MSMQ or Falcon.

Limited Bandwidth Strategies

A common challenge for distributed applications is the need to support connected and disconnected clients. This refers to the way in which the application uses and disseminates information. If the application is connected to the database or network the entire time it is in use, then it is called a connected client. If there are periods of time when the application will be in use but not connected to either a database or network, then it is called a disconnected client.

Based purely on numbers, most client/server applications are of the connected client flavor. However, corporate cultures have been undergoing some changes in recent years that have allowed them to embrace telecommuting. There are also situations that demand the use of disconnected client architectures, such as systems used by traveling salespeople. It quickly becomes very clumsy and awkward to ask a potential customer for the use of a phone line while preparing for a sales pitch. Allowing salespeople to retrieve data, process it with a customer, and then send the updated or new data back to the office for processing is a viable alternative to this situation.

The Connected Client Architecture

A connected client application is one that maintains a constant connection to either a database or a network. This is the most common type of client/server computing. A large part of this book and this chapter is dedicated to technologies that are expanding the connected client architecture and client/server computing in general. Connected client applications are usually built to support some type of online process. They also will typically push some of the functionality or business logic onto the Relational Database Management System (RDBMS) that is being used as a data store.

Connected client applications are typically of the On-Line Transaction Processing (OLTP) variety. Users work at client workstations entering transaction information into a database, such as sales data. These applications offer real-time response to user requests for information and are typically designed around one business process, such as sales. In this way, you see the development of departmentalized computing.

The task of data resolution usually falls to the RDBMS, such as Microsoft's SQL Server. Managing concurrent updates and simultaneous requests for data from multiple users is one fundamental job of the RDBMS. It is also worth noting that the business rules pertinent to these applications are often maintained within the RDBMS in the form of stored procedures, grants and privileges, and user roles.

The Disconnected Client Architecture

A disconnected client application is one that is used offline, allowing the user to process data without being directly connected to a network or corporate data store. This also implies that data must be resolved by some means other than real-time transactions. The business rules that govern how this data is managed commonly take the form of stored procedures, which allow data received over a given period of time to be processed in batches.

Security for transactions must be handled by the local data store, which must be robust enough to handle some form of replication. The replication or dissemination of data, from remote users to corporate data stores, is typically done by periodically connecting to a corporate network by modem and transmitting data. The transmission is usually handled by application logic, making use of automated replication or messaging.

When to Use Connected and Disconnected Clients

The primary factors in choosing a connected or disconnected architecture are workflow analysis and cost. Workflow analysis helps determine the need for connectivity; cost analysis determines feasibility. Remember that additional implementation costs can be offset by reduced workflow costs. The choice of a connected or disconnected approach should be weighed for each system interface.

Workflow Analysis  The first step in selecting a connected or disconnected approach should be an analysis of the tasks in which the software will play a role. For each task, identify the role of the computer, the data required, and the available connectivity. For a freight dispatcher on a loading dock, a computer can track the assignment of loads to trucks; up-to-the-minute shipment status can be valuable to customer service staff, and the fixed location makes a connected application the best choice. Customers reasonably expect to know whether their order has shipped. For a truck driver, daily updates on the location of a shipment add value for customer service; mile-by-mile updates add substantially less. On a truck traveling at sixty miles per hour, a disconnected (laptop) application is the only feasible choice. Information such as "the shipment left the loading dock yesterday and left Abilene this morning" will satisfy most customer inquiries.

Each new application must fit into a unique workflow. However, don't overlook opportunities to improve the workflow. For example, most organizations produce sets of reports from legacy data just because they always have. Many of these reports are obsolete and no more than recyclable waste. Producing hard copy reports can add millions of dollars in extra expense, not just in materials but in the personnel needed to distribute them. The concept of a totally paperless environment is overly idealistic, but a little common sense regarding the sharing of online information between departmental systems can potentially save thousands of dollars annually.

Remember that the workflow (and the associated costs) will be different depending on the selection of a connected or disconnected architecture.

Cost Analysis  It's important to understand the costs of implementing a connected or disconnected application. Costs include infrastructure (hardware, networks, and systems software), communications, support, training, administration, and of course application development (initial and maintenance). It takes time for an organization to deploy and assimilate a robust network architecture. Application design should never target a more robust network than currently exists. If branch offices have local area networks but no wide area network links the branches, then choose a disconnected architecture for interbranch communications. If plans are afoot to implement a wide area network, consider that in plans for version 2.

Communication costs are a factor in the disconnected client implementation. One cost that might be incurred is that of upgrading or implementing remote access to the corporate network. If the organization is running Windows 95 or Windows NT, the use of Remote Access Service (RAS) makes this a much simpler process. RAS allows for a remote dial-in connection to a network and provides excellent support for disconnected applications. This utility is included as part of the Windows 95 and NT operating systems. For information regarding the configuration, use, and management of RAS devices, you can consult Special Edition Using Microsoft BackOffice, Volume 1, ISBN 0-7897-1142-7, from Que Corporation.

Long distance connection fees represent another communications cost. Some nationwide Internet Service Providers (ISPs) allow the use of a single phone number, accessed nationwide and billed at a single rate. Other ISPs allow connection to a local number, regardless of city, and the use of the Point-to-Point Tunneling Protocol (PPTP). PPTP establishes a Virtual Private Network (VPN), which allows you to log on to your corporate network. The advantage is that you can use a lower cost national ISP and still have connectivity from a remote location.

One last comment on the topic of communication costs involves the inability to connect. This could represent real dollar costs in the form of late or lost orders. This usually results from one of three things. One is user error; when this happens, it is beneficial to have some form of remote control software that will allow a technical support person to dial in to the user's machine to correct the problems. The second factor is network problems that result in the RAS server's being unavailable; all the organization can do is create a set of policies regarding technical hardware support and have parts on hand. The last factor is an inadequate or antiquated phone line at the remote user location. Many older hotels have old phone lines that work well for voice but have too much crosstalk or attenuation for data communications. The only resolution for this problem is to search out another phone line.

Don't overlook support and training costs in this analysis. Ongoing support of a RAS server for a disconnected application requires time, attention, and money. Poorly trained users can make expensive mistakes and generally don't make the most of the available software.

Data Integrity as a Common Goal

Data integrity should be at the heart of every OLTP or batch application, whether connected or disconnected. Many of the technologies that have been discussed have features to help developers in this task. Data integrity becomes particularly challenging when the disconnected client model is used. Handling concurrent updates, hardware failures, and disparate technologies are issues that nearly all development staffs will face at some point.

The next two sections of this chapter discuss in more detail the challenges faced--and some possible solutions--with the connected and disconnected client models. The examples presented make use of the technologies discussed earlier.

Connected Client Applications

Connected client computing is the common case for client/server development. Client machines are connected by a network to some departmental or corporate data store. This does not mean that the data store is necessarily shared among departments. Many organizations, as they evolved into specialized units, created systems that focused on departmental needs. These systems were typically designed and implemented in as short a time frame as possible. Because these systems and their data stores were not tied together, they created pockets, or silos, of departmental information. As the organization continues to grow and struggles to survive in the marketplace, the need for shared data becomes more evident. Measures such as time to market and customer service are hard to track when data is scattered across localized data stores. Many organizations and their leaders might feel that their customer service is exemplary, but when presented with data compiled from all departments, they find out that the rate of customer dissatisfaction is actually growing. Being able to see these facts before they become problems can create competitive advantages.

The following example presents a similar situation. In this case, departmentalized data must be shared to gain some measure of synergy and efficiency between different departments or business units.

A Real-World Connected Client Example

Most organizations have a product that is sold or produced. Many organizations probably do both--manufacture and sell. This section uses an example with the organizational model of a manufacturer. The systems specifications and expected results are also listed.

Figure 21.1 offers a visual representation of this organization and the disparate systems it uses. Local sales representatives sell a product to a customer. The sales information must then be used by the manufacturing department to level-load production and distribution of the product. The problem domain used in Figure 21.1 is as follows: The salesperson reads a report of customers to call, and the report is generated from a legacy Customer file in VSAM format. The report is based on customer demographics, which are determined by customer address and size of customer. When the salesperson makes a sale, the order and quantity are entered into a SQL Server database used by the Sales department. An order ticket is generated and sent to the order entry department, where the order and quantity are entered into the manufacturing SQL Server database to allow for planning of production. Once the product is through the manufacturing process, the distribution department must get the address from the Customer system, residing on the mainframe; then the product is shipped, and the salesperson is notified by paper report.

FIG. 21.1
Three separate systems: One is for sales users, one is for manufacturing, and one is a legacy customer VSAM file residing on a mainframe.

Figure 21.1 shows the complexity of disparate systems. The example might be far-fetched or might seem a little too fictitious, but it is a safe bet that most large companies have pockets of data that are not shared between systems. Department A generates a report, which is sent to Department B, and so on. It should be obvious that the systems in Figure 21.1 need to share data more effectively and efficiently.

The requirements you are handed, as the project leader, are as follows. Resolve the bottleneck resulting from the paper flow to the order entry department. The entry of order information into the manufacturing system should come as a result of the salesperson entering a sale into the Sales department database. Another fundamental change to this system is in regard to customer service. If a customer calls the salesperson, probably their only point of contact, and wants to find out order status information, how does the salesperson provide this? The salesperson doesn't have this information available. This is primarily a result of the manufacturing system not communicating with the sales system. Another goal is to make better use of the Customer repository residing on the mainframe. Instead of producing reports, the information should be available to both the salesperson and the distribution department electronically, as needed.

The problem has been established and the requirement handed down, and now you must design and implement a solution. In the next section you will see how the key technologies discussed earlier can be implemented to provide this solution.

The Role of Key Technologies

The focal point of this system is the two SQL Server databases. The fact that they need to communicate has been established. This communication could be handled in two ways, one no more correct than the other. The first is by replication, setting up a replication server to move data from one to the other, based on distributed database updates. The second alternative is to change the applications to send updates to both databases. For the sake of discussion, both will be explained.

The replication scenario for this solution would probably involve database changes and some minor application changes to take advantage of the new information. This solution dictates the use of either transaction-based or scheduled replication, depending on the requirements of the applications. Is data that is 30 minutes old too old? If so, then transaction-based replication is necessary; if 30 minutes is sufficient, then you could implement scheduled replication. Scheduled replication would free some of the network resources, whereas transactional replication would ensure a more consistent view of the data. Replicated information would probably be horizontally partitioned, taking only selected rows based on a last update timestamp.
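Horizontal partitioning on a last-update timestamp means each scheduled run copies only the rows changed since the previous run. The Python sketch below uses in-memory SQLite databases as stand-ins for the two SQL Server databases; the `orders` table, its columns, and the timestamp format are assumptions for illustration, not the replication server's own mechanism.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = ("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, "
          "quantity INTEGER, last_update TEXT)")


def rows_changed_since(conn: sqlite3.Connection, mark: str) -> List[Tuple]:
    """Horizontal partition: only the rows updated after the previous run."""
    return conn.execute(
        "SELECT order_id, customer_id, quantity, last_update FROM orders "
        "WHERE last_update > ? ORDER BY last_update",
        (mark,),
    ).fetchall()


def replicate(source: sqlite3.Connection, target: sqlite3.Connection, mark: str) -> str:
    """Copy changed rows to the target as one unit and return the new high-water mark."""
    rows = rows_changed_since(source, mark)
    with target:  # commits on success, rolls back on error
        target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    return rows[-1][3] if rows else mark


sales = sqlite3.connect(":memory:")
manufacturing = sqlite3.connect(":memory:")
for db in (sales, manufacturing):
    db.execute(SCHEMA)
sales.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "000042", 10, "1998-03-01 09:00"),
    (2, "000077", 4, "1998-03-01 09:25"),
])
mark = replicate(sales, manufacturing, "1998-03-01 09:10")   # time of the last run
print(mark, manufacturing.execute("SELECT order_id FROM orders").fetchall())
# 1998-03-01 09:25 [(2,)]
```

Only order 2 changed after the last run, so only order 2 crosses the network. Running this every 30 minutes corresponds to the scheduled option; a strict greater-than comparison can miss rows stamped with exactly the same time as the mark, so a finer-grained timestamp or sequence column is a safer choice in practice.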

As in the replication scenario, distributed transaction processing would require some application changes. The applications would need to send update transactions to the Distributed Transaction Coordinator to ensure the atomicity of the transactions. Remember that atomic transactions ensure that either both databases are updated correctly or both transactions roll back, thus ensuring data integrity between the databases. With this in mind, you would need to install and configure MTS on a Windows NT server, potentially on one of the database servers.
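The all-or-nothing behavior that DTC coordinates is the two-phase commit protocol. The Python sketch below is a simplified model of that protocol, not DTC's interface: `ResourceManager` and `two_phase_commit` are hypothetical names, the participants only simulate databases, and recovery from failures during phase 2 (which a real coordinator handles through its log) is omitted.

```python
from typing import List, Optional, Tuple


class ResourceManager:
    """One participant in a distributed transaction, such as one database."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit      # simulates a yes or no vote
        self.staged: Optional[str] = None
        self.committed: List[str] = []

    def prepare(self, work: str) -> bool:
        # Phase 1: stage the work (a real RM would log it durably) and vote.
        if not self.can_commit:
            return False
        self.staged = work
        return True

    def commit(self) -> None:
        # Phase 2: make the staged work permanent.
        self.committed.append(self.staged)
        self.staged = None

    def rollback(self) -> None:
        self.staged = None


def two_phase_commit(work: List[Tuple[ResourceManager, str]]) -> bool:
    """Commit on every participant only if all of them vote yes in phase 1."""
    prepared: List[ResourceManager] = []
    for rm, item in work:
        if not rm.prepare(item):
            for other in prepared:        # one "no" vote aborts everyone
                other.rollback()
            return False
        prepared.append(rm)
    for rm in prepared:                   # unanimous "yes": commit everywhere
        rm.commit()
    return True


sales_db = ResourceManager("sales")
mfg_db = ResourceManager("manufacturing", can_commit=False)
print(two_phase_commit([(sales_db, "order 17"), (mfg_db, "order 17")]))  # False
print(sales_db.committed, mfg_db.committed)                              # [] []
```

Because the manufacturing participant votes no, the sales participant rolls back its staged work as well, so the two databases never disagree about order 17.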

By implementing replication or MTS and DTC, you have resolved the communications bottleneck between the two SQL Server databases. You have also allowed visibility to the order status information by the salesperson. The increase in the organization's ability to service its customers should result in additional sales and, at a minimum, allow the organization to maintain its current level of sales and service. The last requirement to fulfill is the integration of the legacy customer information.

For the task of legacy integration, you can use Cedar. This will require some additional development on both the sales and manufacturing systems to take advantage of the new legacy components. With DCOM installed on the mainframe, you can begin to develop your mainframe components, specifying the inputs and outputs and registering the components with MTS and the client machines. Cedar itself will reside on a Windows NT server, which could be either of the two SQL Server machines. The server must have Microsoft SNA Server version 3.0 or later to allow for communication between the platforms.

Using the DCOM components, the sales and manufacturing applications can be altered to allow visibility and update capabilities to the legacy system. This not only helps to eliminate the paper flow from the systems, but will also create a synergistic environment out of disparate technologies.

Figure 21.2 shows the finished product using replication, while Figure 21.3 uses distributed database transactions. These three disparate systems now communicate with each other to transfer data seamlessly. This creates not only a more efficient organization, but also the foundation for a new development methodology. The next time a project such as this one lands in your lap, you now have the knowledge and expertise to implement a distributed and integrated solution.

FIG. 21.2
The same three disparate systems are using a replication scenario that will allow for the successful sharing of information between the isolated departments.


TIP: Given the approaches in Figures 21.2 and 21.3, I prefer the distributed database transaction method. As a developer I prefer to have control over the data movement because it allows me to respond to error messages or use queuing to help alleviate slow response. I also prefer to rely on the database server as little as possible. In large OLTP applications, it is easy to overburden your database server, so remember that scalability should be a primary concern on every project. By separating the distributed transactions from the database server, you can gain some scalability options. In n-tier architectures, you can scale your middle-tier servers or your database server(s) for your individual application needs.

FIG. 21.3
A distributed transactions-based system will now move data between isolated groups of users, creating a more synergistic environment.

Keys to Success

The key to successfully implementing this type of solution is the proper utilization of technology to link and merge disparate systems. It would be nice to have unlimited time and resources so that all applications could be rewritten to a common platform within an organization, but this is rarely the case. Developers are often judged on what they can do in as little time and with as few resources as possible. The preceding discussion showed one alternative to the problems presented, but this is by no means the only solution. Using the technology available and leveraging existing assets within an organization, you can create a win-win scenario. The users get the system they desire, and the IT staff responsible gets credit for delivering it. This can help to build a good relationship between the IT staff and the business that it serves.

Disconnected Client Applications

Disconnected client computing is substantially more difficult than connected client computing. Client machines are not connected to a network full-time. Thus, you must allow for offline usage while allowing for connection at the user's convenience. Much like connected clients, the application architecture can be focused on departmental concerns, having the same issues with respect to data sharing and flexibility. This architecture can produce significant benefits by placing salespeople in the environment of the customer.

The following example presents a situation similar to the one used in the connected client example. Departmentalized data must also be shared, but this example has the added complexity of remote users and a DB2 legacy database.

A Real-World Disconnected Client Example

Many organizations are realizing the benefit of having salespeople in the field, visiting customers; the common term for this is face time. This yields significant sales and profit increases but is also much more challenging for you, as the developer. As you have seen, Microsoft provides a plethora of tools to handle these development efforts. This section deals with an example similar to the one in the previous section. The noticeable differences are a disconnected client and the use of DB2 not just as a legacy data store but as a central repository, similar to a data warehouse.

The user requirements are also similar. The disconnected client data needs to flow into the sales system's SQL Server database. This data needs to be pushed to a data warehouse residing on a host system running DB2 as the RDBMS. As in the previous example, the manufacturing system needs visibility to this data for planning purposes. In this example, the assumption is made that the disconnected client application will be implemented during this development effort.

Figure 21.4 illustrates the proposed disconnected client system, the current legacy system--which will become the data warehouse--and the manufacturing system.

The Role of Key Technologies

There are many challenges to the disconnected client approach. The first is the design and implementation of the disconnected client sales application. After the sales application is built, you then must deliver the data to the data warehouse in a reliable and timely manner. Finally, the data from the data warehouse must be delivered to the manufacturing system in adequate timeliness for planning. The distribution of data in this manner is done to separate the OLTP environments from the Decision Support (warehouse) environment. This will help to alleviate any potential bottlenecks and segregate the data by usage.

In building the sales application, many issues must be addressed. One such decision is which development tool to use. Another decision will be the data store necessary to store user data, including modified data, which can then be sent to the sales system database. The last major decision is the mechanism of transmitting data between the disconnected client and the sales system database.

In choosing the development tool for the sales client application, Visual Basic and Visual C++ are both highly rated options. Both of these provide the flexibility necessary to use COM and DCOM components while still rapidly building applications. Database connectivity is also fairly easy using these tools.

FIG. 21.4
The proposed disconnected client system; the current legacy system, which will become our data warehouse; and the manufacturing system as separate entities within the organization.

The next decision is the local data store. Microsoft Access has been proven to work effectively in these situations. Properties such as size of footprint and ease of program connectivity make it a viable solution. One note of caution is to pay careful attention to the amount of data that will be transferred to the user. Supplying the users with all the data that they might need, given any situation, is costly and time-consuming. The better alternative is to allow the user to determine what data will be needed for a given period of time. For instance, if the user is going to visit three customers and will then be transmitting data back to the office, supplying the user with the data necessary for just these three customers might be sufficient.


CAUTION: Pay particular attention to the amount of data that will be transferred to the user on a periodic basis. Large sets of data will increase communication costs. It will also increase the costs associated with a user waiting for data to be loaded. One approach is to send only changed information. Another approach is to let the users select what data they need for a given period of time. This allows them to select as little or as much data as they want.
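Letting users choose their data can be as simple as copying only the selected customers, and their orders, into the local store. The text suggests Microsoft Access for that store; this Python sketch swaps in an in-memory SQLite database as a stand-in, and the `customers` and `orders` tables and the `build_briefcase` function are hypothetical.

```python
import sqlite3
from typing import Sequence


def build_briefcase(central: sqlite3.Connection, customer_ids: Sequence[str]) -> sqlite3.Connection:
    """Copy only the selected customers and their orders into a small local store."""
    local = sqlite3.connect(":memory:")  # stands in for the laptop's local database file
    local.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
    local.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
                  "customer_id TEXT, quantity INTEGER)")
    ids = list(customer_ids)
    marks = ",".join("?" * len(ids))     # one placeholder per selected customer
    with local:
        local.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            central.execute("SELECT customer_id, name FROM customers "
                            f"WHERE customer_id IN ({marks})", ids),
        )
        local.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            central.execute("SELECT order_id, customer_id, quantity FROM orders "
                            f"WHERE customer_id IN ({marks})", ids),
        )
    return local


central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
central.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
                "customer_id TEXT, quantity INTEGER)")
central.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(f"{n:06d}", f"Customer {n}") for n in range(1, 1001)])
central.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(n, f"{n % 1000 + 1:06d}", 5) for n in range(1, 3001)])

# The salesperson picks the three customers to be visited on this trip.
briefcase = build_briefcase(central, ["000001", "000002", "000003"])
print(briefcase.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 3
```

Three customers out of a thousand, with their nine orders, is all that has to travel over the modem connection.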

One reason for selecting a robust development tool, such as Visual Basic or Visual C++, is that you can programmatically control the messaging architecture. Using MSMQ, the client application can update a local database with changes and write the transactions to a local MSMQ message queue to be transmitted at some later time. After the user connects by modem to the corporate network, the messages in the local queue can be transmitted to the internal message queue and routed to the proper destination by referencing the GUID of the destination queue. Once in the destination queue, the transactions can be processed, making updates to the SQL Server database as necessary. Any in-doubt transactions or collisions between concurrent updates will need to be handled by an internal administrator or processed by some predetermined rules, possibly implemented using SQL Server stored procedures.
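One possible predetermined rule for collisions is "first update wins; later conflicting updates go to an administrator." The Python sketch below applies that rule with a version column, using SQLite in place of SQL Server and a Python function in place of a stored procedure; the `version` column, the message fields, and `apply_queued_update` are all hypothetical.

```python
import sqlite3
from typing import Dict, List


def apply_queued_update(conn: sqlite3.Connection, msg: Dict, review: List[Dict]) -> bool:
    """Apply a queued change only if the row still has the version the user downloaded."""
    with conn:
        cur = conn.execute(
            "UPDATE orders SET quantity = ?, version = version + 1 "
            "WHERE order_id = ? AND version = ?",
            (msg["quantity"], msg["order_id"], msg["base_version"]),
        )
    if cur.rowcount == 1:
        return True
    review.append(msg)   # collision: route to an administrator instead of overwriting
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)")
conn.execute("INSERT INTO orders VALUES (17, 10, 1)")
review: List[Dict] = []
# Two laptops both downloaded version 1 of order 17 and changed it offline.
first = apply_queued_update(conn, {"order_id": 17, "quantity": 12, "base_version": 1}, review)
second = apply_queued_update(conn, {"order_id": 17, "quantity": 8, "base_version": 1}, review)
print(first, second, conn.execute("SELECT quantity, version FROM orders").fetchone())
# True False (12, 2)
```

The second laptop's change is not silently lost; it lands in the review list, where an administrator or a further rule can decide what to do with it.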

Transactions in the destination queue can also update the DB2 warehouse using the OLE DB/DDM driver over an SNA Server connection to the host environment. A single transaction can update both the warehouse and the SQL Server database atomically: either both updates are posted or neither is. This allows distributed applications to maintain a reliable level of data integrity, one of the key challenges in disconnected client computing.
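The text does not spell out how the two updates are coordinated; a common technique for "both or neither" across separate databases is two-phase commit, in which a coordinator asks every participant to prepare and commits only if all of them agree. The sketch below shows that protocol in miniature with hypothetical `Participant` objects standing in for the warehouse and SQL Server. It omits the durable logging and recovery a real transaction coordinator provides.

```python
class Participant:
    """A hypothetical resource manager (e.g. one database) in two-phase commit."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a participant voting "no"
        self.committed = []
        self.pending = None

    def prepare(self, change):
        if self.fail_prepare:
            return False       # vote "no"
        self.pending = change  # hold the change until the coordinator decides
        return True            # vote "yes"

    def commit(self):
        self.committed.append(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def two_phase_commit(participants, change):
    # Phase 1: every participant must vote yes.
    prepared = []
    for p in participants:
        if p.prepare(change):
            prepared.append(p)
        else:
            for q in prepared:  # undo the ones that already prepared
                q.rollback()
            return False
    # Phase 2: all voted yes, so commit everywhere.
    for p in participants:
        p.commit()
    return True


warehouse = Participant("DB2 warehouse")
sql_server = Participant("SQL Server")
ok = two_phase_commit([warehouse, sql_server], "order 42")
print(ok, warehouse.committed, sql_server.committed)  # True ['order 42'] ['order 42']

warehouse2 = Participant("DB2 warehouse")
sql_down = Participant("SQL Server", fail_prepare=True)
ok2 = two_phase_commit([warehouse2, sql_down], "order 43")
print(ok2, warehouse2.committed, sql_down.committed)  # False [] []
```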

Thus far the discussion has covered receiving data from distributed users, but MSMQ also lets you transmit transactions to the user. Once processed, these transactions load the client database with current data. The data subset sent to each user can be created with stored procedures, based on the user's requests for specific pieces of data.
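The outbound direction can be sketched the same way: the server turns a user's request into refresh messages, and the client applies them to its local store. `RefreshMessage`, `build_outbound`, and `LocalStore` are hypothetical names, not part of MSMQ or any stored-procedure interface. The duplicate check on `message_id` is a defensive assumption of this sketch, not something the text requires.

```python
from dataclasses import dataclass


@dataclass
class RefreshMessage:
    message_id: int
    customer_id: str
    payload: dict


def build_outbound(server_rows, requested_ids, start_id=1):
    """Server side: turn the user's requested subset into refresh messages."""
    msgs = []
    next_id = start_id
    for cid in requested_ids:
        if cid in server_rows:  # silently skip IDs the server does not know
            msgs.append(RefreshMessage(next_id, cid, server_rows[cid]))
            next_id += 1
    return msgs


class LocalStore:
    """Client side: applies refresh messages, ignoring ones already applied."""

    def __init__(self):
        self.rows = {}
        self.applied = set()

    def apply(self, msg):
        if msg.message_id in self.applied:
            return  # duplicate delivery; nothing to do
        self.rows[msg.customer_id] = msg.payload
        self.applied.add(msg.message_id)


server_rows = {"C001": {"name": "Acme"}, "C002": {"name": "Globex"}}
msgs = build_outbound(server_rows, ["C002", "C999"])
store = LocalStore()
for m in msgs + msgs:  # simulate the same message arriving twice
    store.apply(m)
print(store.rows)  # {'C002': {'name': 'Globex'}}
```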

To move data from the DB2 warehouse to the manufacturing SQL Server database, you could use the Host Data Replicator. As in the connected client replication discussion, the data can be horizontally partitioned and replicated on a recurring schedule. This should give the manufacturing system timely data, provided that real-time data is not absolutely necessary. Given the nature of disconnected client computing, real-time data is usually not an option. Day-old data is inherent in the disconnected client approach because users typically communicate only once or twice a day, and in some situations users might transmit only every other day. Communication frequency should be settled by a good set of business rules and guidelines established during the design stage.

Phone lines and services also deserve attention. Users who work within a fairly small geographic area might be able to transmit from the same location, such as their home, every time. For users who spend much of their time on the road, it is prudent to try various phone lines, because line quality can never be guaranteed. Some services let users dial into a national or local Internet Service Provider (ISP) through a single phone number, regardless of location.


TIP: While establishing business rules and designing the system, account for periods of varying length when users cannot communicate. Determine the impact that this lack of communication has on the business and the system. Then present solutions to the users during formal training sessions.

Figure 21.5 shows the final solution using the disconnected client architecture.

FIG. 21.5
These three disparate systems can now communicate with each other to transfer data seamlessly, and the salesperson can now spend time in front of the customer.

Keys to Success

The most important key to success with the disconnected client is a good set of business rules. Decisions such as which user's change wins when concurrent updates collide, and how timely data transmissions must be, can make huge differences in the development cycle. All user requirements and rules should be thoroughly documented and even signed off by the users, which gives everyone a clear definition of what is to be delivered. Good business rules can also ease support of such systems, which can otherwise be costly and time-consuming. Another key is the successful implementation of the technologies discussed in this chapter.


NOTE: One of the most important factors in any development project, and one that is often understated, is the clear definition of business rules. It will behoove you to get a detailed set of business rules in writing. If you spend an afternoon brainstorming as many potential pitfalls or gaps in the proposed system as you can, and then get specific written answers to those questions, you will save yourself a great deal of time and headache.

Internet/Intranet Alternatives

Internet/intranet (I-net) technologies present some very real opportunities to extend the classical client/server model. As this book discusses, Microsoft has delivered an exceptional suite of tools for developing both Internet- and intranet-based solutions. Two topics of particular interest for distributed applications are the choice between the thin client and fat client models, and the use of Java.

A thin client is one in which the application runs on the server and users access it through a web browser, such as Microsoft Internet Explorer. This contrasts with the fat client, the more traditional architecture, which uses executable files and stores a fairly large amount of information on the client. Two advantages of the thin client are ease of maintenance (all the application files reside on the server) and simpler data access than the disconnected client architecture offers. One drawback is that the user must be connected, which, as mentioned earlier, might be impractical given the nature of the business.

Java is more than a buzzword. It has the power to significantly change the way developers approach application development and the way organizations approach hardware purchasing. Because Java code runs on a Java virtual machine (JVM), a developer can write one set of application code that runs on any platform with a supported JVM. Microsoft has extended pure Java to provide access to some of the advanced controls available in the Windows environment.

This section is just a cursory discussion of the benefits of Internet technologies and Java to raise your awareness of the options available to you. For more detail on the specific benefits and implementation of these tools, refer to the other chapters throughout this book that deal with web-based application development.

From Here...

In this chapter, you learned about the products that Microsoft has delivered for building and integrating distributed applications. You also learned about the two basic approaches to distributed client/server computing, connected and disconnected clients, and some of the challenges inherent in each.

For more information regarding the topics covered in this chapter, see the following additional chapters:




© Copyright, Macmillan Computer Publishing. All rights reserved.