
by Jody C. Socha
Over the years, there have been several attempts to develop software products to streamline the development process. Examples of these tools range from compilers to modeling tools to full suites of CASE tools. What has been missing from these tools, however, is a vehicle to tie all the information together so that it can be shared from one tool to another. This missing tool is the repository.
This chapter introduces the Microsoft Repository. It presents the various components of the repository and discusses how those components are used. It also discusses how your development environment needs to be changed to support using the repository.
NOTE: The Microsoft Repository is included with Visual Studio 97 and Visual Basic 5.0. It is installed automatically when you install Visual Basic from either of those sources.
The word repository by itself is insufficient to describe what the Microsoft Repository is about. The definition of repository in the dictionary is a place to store things. The repository's technical architecture is made up of a database with a set of interfaces that can be used to access the data in the database, which describes just about every database application in existence. Thus, with some careful planning, the repository could be used to store just about anything.
On the other hand, the Microsoft Repository (and its competitors) is meant to serve a more specific purpose. Namely, that purpose is to store information describing applications and information systems. For example, suppose you have a Visual Basic application that uses ODBC to access an MS SQL Server database to manage a store's inventory. The repository could be used to store the various Visual Basic forms, classes, methods, and properties that comprise the code. Another portion of the repository would describe the SQL Server database tables, columns, stored procedures, and triggers. In addition, the repository could be used to organize any requirements and design documents used in developing the system. Thus, the repository would contain a complete description of the application in one central location.
Together, these two perspectives combine to create a potentially powerful effect. The flexible architecture of a repository should allow you to store nearly any information about a system that you want:
However, simply storing all this information does not unlock the true power of using a repository. The real benefit is not derived until the various pieces of data stored in the repository are integrated together to form a single, complete picture of the system. Now, you can determine which sections of code will be impacted if you change the name of a column in the database. Alternatively, you can see which parts of the Visual Basic application utilize the interface into the C++ OLE server that you want to rewrite. Theoretically, you could see what would happen to the accounting application used in the Atlanta office if a change is made to the inventory maintenance application in the Detroit office. It is this ability to measure the impact of a change across a system or systems that makes the repository so potentially powerful.
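The impact analysis described above amounts to following "depends on" relationships backward from the item you plan to change. The following is a minimal sketch of that idea in Python, not the repository's own API; the relationship list and every name in it (frmInventory, clsStockItem, and so on) are hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical "depends on" relationships (source depends on target).
# None of these names come from the Microsoft Repository itself.
DEPENDS_ON = [
    ("frmInventory", "clsStockItem"),         # a form uses a class
    ("clsStockItem", "StockItems.Quantity"),  # a class reads a column
    ("rptReorder", "StockItems.Quantity"),    # a report reads the same column
    ("frmLogin", "Users.Name"),               # unrelated part of the system
]


def impacted_by(changed, depends_on=DEPENDS_ON):
    """Return every item that directly or indirectly depends on `changed`."""
    # Index the relationships by target so we can walk them backward.
    dependents = defaultdict(set)
    for source, target in depends_on:
        dependents[target].add(source)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for item in dependents[current]:
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return sorted(seen)


# Which items are touched if the Quantity column is renamed?
print(impacted_by("StockItems.Quantity"))
```

The traversal is transitive, so the form that uses the class is reported even though it never touches the column directly. That indirect reach is what separates an impact report from a simple text search.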
A repository is not a Computer Aided Software Engineering (CASE) tool. CASE tools provide various modeling and analysis techniques for developing software. A repository provides only the storage mechanism for information about your development environment. What a repository can do is work together with a CASE tool to store the results of an analysis effort. That information then can be shared with other analysts or developers. Alternatively, the information can be related to the results of other CASE tools to provide an integrated view of the entire environment.
Given its flexibility, the repository can also play a role in a data warehousing environment. The repository stores the structure of the data in each of the operational systems and the target warehouse system as well as the relationship map between the two sets of structures. This setup can greatly reduce the complexity of updating the warehouse when one of the operational systems is modified. In addition, if a more detailed analysis is required than the data warehouse provides, a pointer to the appropriate operational system data is available through the repository.
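As a rough illustration of the relationship map between operational and warehouse structures, the sketch below keeps the map as a list of column pairs and queries it in both directions. It is written in Python for brevity and assumes a made-up naming scheme (System.Table.Column); the systems, tables, and columns are all hypothetical.

```python
# Hypothetical map: (operational column, warehouse column).
COLUMN_MAP = [
    ("Inventory.StockItems.Quantity", "Warehouse.FactStock.OnHand"),
    ("Inventory.StockItems.ItemCode", "Warehouse.DimItem.ItemCode"),
    ("Accounting.Invoices.Total", "Warehouse.FactSales.Amount"),
]


def warehouse_targets(operational_system):
    """Warehouse columns that must be reviewed when this system changes."""
    prefix = operational_system + "."
    return sorted(target for source, target in COLUMN_MAP
                  if source.startswith(prefix))


def operational_sources(warehouse_column):
    """Operational columns to drill back into for more detail."""
    return sorted(source for source, target in COLUMN_MAP
                  if target == warehouse_column)


print(warehouse_targets("Inventory"))
print(operational_sources("Warehouse.FactSales.Amount"))
```

The first query supports the maintenance case (what to update when an operational system is modified); the second supports the drill-back case (where to find the detail behind a warehouse figure).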
Now for the bad news. Very little of this potential power is available by simply starting the Microsoft Repository application. The Microsoft Repository in its basic form provides little more than a generic database structure and set of interfaces that serve no specific purpose. Only by developing a series of additional interfaces, plug-ins, and programs can the repository be truly utilized, and this requires quite a bit of work.
The remainder of this chapter is divided into three sections. The first deals with how the repository is structured. The second moves on to cover how to configure and use the repository. Finally, the third discusses how to reengineer your development process to take advantage of the repository.
The following section describes how the Microsoft Repository is organized. Consider the repository to be a series of layers controlling access to the repository's data at the center. The following discusses each of those layers and the function each serves.
The repository itself is organized into three basic components. The following list describes those components, and Figure 28.1 illustrates them:
FIG. 28.1
The Microsoft Repository is organized into several layers designed to give specific applications access to a generic interface.
There are three basic methods for communicating with the repository, depending on your purpose:
Each of the repository's components and its functions are discussed in more detail in the following sections.
The heart of the repository is a standard database. The default database for the repository is created when the repository is installed on the computer. The default location is C:\Windows\MSApps\repostry\repostry.mdb.
The Microsoft Repository uses either its Jet database engine or SQL Server to store the data. The repository engine then talks to this central database through an ODBC connection. It is your choice which platform to use. Whenever you utilize the repository, you need to establish a connection to the appropriate database and then pass that connection to the repository engine. Then, through the engine, you can either set up the repository's data structure, or use the information stored in the repository.
There are basically two sets of tables in the database. The first are the tables that the engine uses to store the repository's structural information. These tables store the information that describes the various TIMs that have been created. The second set of tables is used to actually store the data you are interested in storing. These are the tables to which the TIMs allow access.
The primary method for accessing the repository is through the TIMs and the engine. However, the engine is not the only way to pull information out of the repository. You can establish an ODBC connection to the database and query the repository tables directly, which simplifies obtaining data for reporting purposes. You are strongly encouraged, however, to only read information from the database this way. Writing information directly into the database circumvents the engine's consistency-checking capabilities and can lead to unpredictable results.
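One way to honor the read-only advice is to make the reporting connection itself refuse writes. The sketch below shows the idea using Python's built-in sqlite3 module as a stand-in for an ODBC connection; the table demo_objects and its columns are invented for the example and are not the repository's actual schema.

```python
import sqlite3


def build_demo_store():
    """Create a small in-memory stand-in database, then lock it for writes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_objects (id INTEGER PRIMARY KEY, kind TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO demo_objects (kind, name) VALUES (?, ?)",
        [("Form", "frmInventory"), ("Class", "clsStockItem"), ("Class", "clsSupplier")],
    )
    conn.commit()
    # From here on, SQLite rejects any statement that would change data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def count_by_kind(conn):
    """A typical reporting query: how many objects of each kind are stored?"""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM demo_objects GROUP BY kind ORDER BY kind"
    )
    return dict(rows.fetchall())


def write_is_rejected(conn):
    """Return True if a direct write attempt fails, as it should."""
    try:
        conn.execute("INSERT INTO demo_objects (kind, name) VALUES ('Form', 'frmRogue')")
    except sqlite3.Error:
        return True
    return False


conn = build_demo_store()
print(count_by_kind(conn))
print(write_is_rejected(conn))
```

With a real repository database the equivalent safeguard would more likely be a database login that has only read permissions on the repository tables.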
The engine manages data in the repository. Together the engine and the database form the repository itself. All of the other components are a series of interfaces used to access the information in the repository. However, unless you are writing one of the intermediate extract applications, you will not interact with the engine itself. Thus, while you want to be familiar with its basic purpose, there is not much about the engine that you really need to know.
The engine provides the main control mechanism for the repository. It controls how data is written to and read from the database, and it controls the basic repository structure. It provides the necessary consistency checking to ensure the integrity of the data.
The engine provides a set of classes and methods that can be called to perform various functions such as connecting to the repository database. This set of interfaces is documented in Microsoft's Repository Programmer's Guide.
A Tool Information Model (TIM) defines what information you would like to store in the repository. The word "Tool" is slightly misleading because a TIM can be used to store information from more than one tool. You have the option of creating your own TIMs or using any predefined TIMs that are available.
There is an initial TIM that Microsoft provides called the Microsoft Development Object Model (MDO Model) for use with Visual Basic. This model is created for you on the repository when the repository add-in for Visual Basic is activated. The repository add-in is provided when you install Visual Basic either from Visual Studio 97 or the Visual Basic 5.0 Enterprise Edition.
MDO is the only predefined TIM available from Microsoft when this book went to press. However, Microsoft has announced plans for releasing its Open Information Model (OIM), which promises a non-tool-specific structure for repository storage. Microsoft is working hard with third-party vendors to share the OIM concept.
There are some third-party vendors that offer predefined TIMs although they should be moving towards using the OIM.
ON THE WEB: See the Microsoft web site, http://www.Microsoft.com/repository, for more information on the repository, including the latest status of the Open Information Model and sample applications.
If you intend to write your own TIMs, then you need to become familiar with the Type Information Model. The Type Information Model is a special model that is used by a TIM to describe itself to the repository. The type information model provides the following basic concepts:
The type information model is sufficiently general to allow you to describe any structure to the repository. For example, imagine you wanted to keep track of the team members assigned to a project. The following steps explain how the type information model could be used to describe this information. More information on the type information model can be obtained from Microsoft's Repository Programmer's Reference Guide.
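To make the team-member example concrete, the sketch below models a project and its team members using the kinds of building blocks this chapter names for the type information model: classes, interfaces, properties, and relationships. It is plain Python, not the repository's API, and every identifier in it (ModelDef, IProject, ProjectHasTeamMembers, and so on) is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyDef:
    name: str
    data_type: str


@dataclass
class InterfaceDef:
    name: str
    properties: list = field(default_factory=list)


@dataclass
class ClassDef:
    name: str
    interfaces: list = field(default_factory=list)


@dataclass
class RelationshipDef:
    name: str
    source: str       # name of the class on the origin side
    destination: str  # name of the class on the destination side


@dataclass
class ModelDef:
    name: str
    classes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def build_project_model():
    """Describe projects, team members, and the link between them."""
    i_project = InterfaceDef("IProject", [PropertyDef("Name", "string"),
                                          PropertyDef("StartDate", "date")])
    i_member = InterfaceDef("ITeamMember", [PropertyDef("Name", "string"),
                                            PropertyDef("Role", "string")])
    return ModelDef(
        name="ProjectTeam",
        classes=[ClassDef("Project", [i_project]),
                 ClassDef("TeamMember", [i_member])],
        relationships=[RelationshipDef("ProjectHasTeamMembers",
                                       "Project", "TeamMember")],
    )


def describe(model):
    """Render the model as indented text lines, one per class or relationship."""
    lines = [f"Model {model.name}"]
    for cls in model.classes:
        for iface in cls.interfaces:
            props = ", ".join(p.name for p in iface.properties)
            lines.append(f"  Class {cls.name} exposes {iface.name} ({props})")
    for rel in model.relationships:
        lines.append(f"  Relationship {rel.name}: {rel.source} -> {rel.destination}")
    return lines


print("\n".join(describe(build_project_model())))
```

Keeping properties on interfaces rather than directly on classes is a design choice of this sketch; consult the Programmer's Reference Guide for how the real model assigns them.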
ON THE WEB: See the Microsoft web site, http://www.Microsoft.com/repository, for more information on the repository, including the Repository Programmer's Reference Guide.
An example of a predefined TIM is the Microsoft Development Object Model (MDO Model) that is provided for use with Visual Basic.
Also, the type information model makes it possible to extend predefined TIMs by simply adding new classes, relationships, properties, methods, and interfaces to the existing model. You could also establish relationships between two different TIMs by adding a new relationship and assigning a class in one model as the relationship's source and a class in the other model as the relationship's destination. Be careful, however, not to change the meaning of a predefined TIM's components. Outside tools may use the TIM to store data and may not store it the way you want.
Microsoft has also announced that it is developing its Open Information Model for the repository. This model is supposed to provide the foundation for all third-party vendors to use so they populate the repository in a common manner. For those people unsure about implementing a repository, it would be a good idea to wait until this new model is released.
Around the outside of the repository are the various applications that use it. These applications generally fall into three categories:
In an ideal environment, the repository configuration should enable you to truly open up your development environment. Of course, the world is not ideal, and there will be limitations in Microsoft's ability to support the open concept.
The first level of openness comes at the database layer. Because the repository uses an ODBC connection to a standard relational database, any commercially available database could, in principle, be used for repository data storage. However, Microsoft advertises that the repository works only with its Jet database engine or SQL Server.
The second level of openness deals with sharing data between tools. For example, information about classes in a design tool could be extracted to automatically create classes in a development tool. At the present time, this type of information sharing will be difficult due to the lack of maturity of commercially available TIMs and applications. In the future, this capability should grow continuously stronger. For example, the Microsoft Open Information Model promises to solve many of these problems.
The third level of openness deals with the ability to actually replace tools. Theoretically, the repository should capture your tool's data in such a way that the tool could be pulled out and replaced with another, similar tool. Naturally, this level of openness is impossible. Vendors are not going to freely give away proprietary information just so you can switch to a competitor's tool. However, if similar tools all reference the same TIMs, then transitioning to a new tool can be greatly simplified. Again, maturation of the TIMs and TIM applications will go a long way toward helping with this problem.
Utilizing a repository-based development environment takes planning to configure the pieces. You need to examine your current development process, identify resource needs, and assemble the various components that you need.
Setting up your environment to use a repository is not much different than any other software development effort you will embark upon. You need to lay out the requirements to identify what information should be captured in the repository, and how you will analyze that data. Then, you need to design the repository by identifying what plug-ins, Tool Information Models, and analysis tools need to be obtained or built. Finally, you will need to build and test the repository once all the components are ready.
You also need to consider who will be responsible for maintaining the repository. People will be needed for system administration, developing any additional applications, and analyzing the repository's contents.
You will need to reexamine many of your development conventions. At what points in the development process will information need to be stored in the repository? How often will it need to be updated? What role will the repository play in managing change requests? Are there any coding conventions that should be revisited to allow better information transfer?
You will need to examine your environment to understand your development needs. Figure 28.2 shows the development lifecycle according to the standard "waterfall" approach. The waterfall method consists of a series of steps in which each step does not begin until the previous one has completely ended.
FIG. 28.2
A traditional view of the software development process is the waterfall method.
Moving to a repository-based environment means that you want to enable sharing of information between each phase of the lifecycle through centralized data storage. This new information-centered environment is shown in Figure 28.3.
Naturally, using a repository in your development environment will require additional manpower. Whether that manpower need is filled on a full-time or part-time basis depends on how extensive the repository system will be. You will have to experiment to understand how much of a strain the repository will put on development.
FIG. 28.3
When the repository is added to the development process, it assumes the role of a centralized, information-sharing manager.
Several roles need to be performed in a repository environment.
System Manager As with any system, someone will need to perform administration on the system. A security scheme may need to be established. The database will need to be configured and maintained. Software tools that share data with the repository will need to be configured and maintained.
Modelers/Analysts One or more individuals will need to provide the foundation of the repository by building and assembling the various TIMs. These individuals will need strong analytical skills because they will be determining what data needs to be stored in the repository and how that data should be obtained from the various tools. A well-designed TIM should be able to accommodate more than one tool. In addition, this team will need to analyze any commercially available TIMs, such as the MDO Model for Visual Basic, to determine if the necessary data is being captured and how that model integrates with the rest of the repository model.
Developers Once the TIMs are in place, a group of programmers is required to build the software needed to actually use the repository. One set of software will be built to extract the information out of the various software development tools into the repository. A second set will be built to browse and manipulate the data as well as produce analysis reports. Alternatively, team members can research the various prebuilt components available. For example, there is already a Visual Basic add-on for populating the MDO Model, and Microsoft provides a default browser. In addition, several other vendors are already developing repository interface software. It would be worth the effort to research these commercial offerings before embarking on a time-consuming development effort.
Users The user of the repository is your organization's software development team. Because the repository stores information about the development environment, the repository will eventually prove invaluable in analyzing your application's structure, tracking the use and reuse of software components, and analyzing the impact of change on the application or applications. Thus, everyone from the project manager to coders to quality assurance could utilize information contained in the repository.
Regardless of how an organization wants to staff the repository team, it should assign its most competent and experienced personnel. If your goal is a repository-based development environment, then consider that faulty data in the repository will lead to faulty development decisions, leading to faulty software, leading to errors in operations. Moreover, repository data management will be more complex and abstract than any other form of development effort you are undertaking.
The components that you need to assemble the repository are dependent on your development environment. Thus, you will need to examine each phase of your development environment and determine how a repository should be utilized in each phase.
Define the Overall Structure The first and simplest component to identify is the repository itself. For it, you will need to identify what database server you want to use, which is based on how much information you think will be stored in the database. Then, you need to configure the database and install the repository engine. The challenging part is to develop the various TIMs and extraction applications that surround the repository and provide the interface for the various lifecycle phases. You can use Table 28.1 as a high-level planning tool for understanding your repository needs.
| Development Phase | Products | Toolset |
|---|---|---|
| Requirements | Requirements Documents | Word 97 |
| Design | Object-Oriented Design; Design Documents | Visual Modeler; Word 97 |
| Development | User Interface Code; Business Rule Code; Database | Visual Basic 5.0; Visual C++ 5.0; SQL Server 3.1 |
| Testing | Test Plans and Procedures; Test Results | Word 97; Word 97, Excel 97 |
| Implementation | Installation Procedures | Word 97 |
| Project Management | Project Plan; Issues Log | Project; Excel 97 |
NOTE: The tools listed in the table all happen to be Microsoft products. However, there is no reason that tools from other vendors cannot be used.
Notice that for any development phase, several methodologies and tools may be used. Likewise, more than one tool may be used to implement a product, or the same tool may be used in preparing more than one product. Notice, also, that some of the tools are specifically geared toward producing the specified product (such as Visual Modeler or SQL Server), whereas others are more general and have been adapted to fill a need (for example, Word and Excel).
Identify Information Requirements After identifying these products and tool sets, the next step is to identify your information requirements. At this step, you need to ask the question, "What information about this product needs to be shared?" Notice that you need to capture information about the product, not the tool itself. The tool determines how to capture the information, not what information needs to be captured. The information that can be extracted from the product may depend on how well organized its contents are, and you always have the option not to capture any information. For example, suppose that an analysis of several requirements documents reveals several pages of simple paragraphs in no particular order, describing in general what you want the proposed system to do. Because there would be limited benefit in placing any of this information into the repository, you choose to store nothing. Placing the document into a centralized directory accomplishes the same thing much more easily.
On the other hand, if the document is subdivided into subject areas and the requirements are numbered, named, and prioritized, there may be benefit from pulling in this information. The repository could provide a simple mechanism for browsing and reporting on the requirements. Furthermore, the potential to cross-reference the requirements to the system's various design components exists to provide requirements traceability throughout systems development. You probably will also want to capture the name of the document as well as file/directory information.
In the design phase, you identified that your project is developing an object-oriented design (OOD) with supporting design documents. Because an OOD follows a structured methodology, identifying its information requirements is simpler. An OOD generally includes classes, attributes, methods, and associations. Several more conventions are also available depending on which specific methodology you are using. You probably also will want to capture filenames because a design will probably be spread across several files. This information is useful in identifying what class modules need to be developed in the code as well as how those classes should work together. The design documents contain definitions of the classes in the design; they were created by exporting reports from the design tool and formatting them with Microsoft Word.
The same step of identifying information requirements is repeated for the other development phases. Be sure to include support processes such as project management and quality assurance in the analysis. Table 28.2 shows some of the results.
| Development Phase | Products | Information Requirements |
|---|---|---|
| Requirements | Requirements Documents | Document Name, File, Requirement (Priority) |
| Design | Object-Oriented Design; Design Documents | Class (Method, Property), Association; not used |
| Project Management | Project Plan | Developers, Milestones, Issues, Developer to Issue Association |
Installing and setting up the repository is a fairly simple process, mainly because it's installed and configured for you when you install Visual Basic 5.0 using either the Enterprise Edition or Visual Studio. The remainder of this section discusses how you can configure the various components of a repository.
The TIMs define what information the repository will store. You have the option of building your own TIMs or using predefined TIMs. Building your own TIM gives you more control over what information is stored in the repository. It will also make it easier for you to relate information in different tool sets to each other. However, building your own TIM will involve a lot more work. Using a predefined TIM simplifies the process, and you can extend the existing model if you have special needs. Plus, vendors will provide their own extractor applications for use with their TIMs, further reducing the amount of work you need to do.
Using Existing TIMs versus Building Your Own For the time being, you should probably not write your own TIM unless you have a special need. There are several reasons for this. First, the repository is in its earliest stages. Second, Microsoft has announced the OIM, which is intended to provide the foundation model that other vendors will tie their models into. Thus, give the repository and the OIM time to mature before you invest heavily in developing your own models. Keep checking the Microsoft repository web site for information on the OIM and a list of third-party vendors. This does not mean that you should not learn how TIMs are built. Most likely, the TIMs provided to you will not meet all of your needs. At some point, you will probably need to extend a TIM, which involves the same basic techniques as building a complete TIM.
Using Existing TIMs Using an existing TIM is an easier exercise. Typically, the vendor should supply the application needed to install its particular TIM onto the repository. For example, the Microsoft MDO Model is installed on the repository automatically when the repository add-in for Visual Basic is used. You should become familiar with the model and the information it stores.
Building a TIM If you want to build your own TIM, first make sure you become more familiar with the type information model and its components. This step is important because when you lay out the design of your TIM, you will need to classify the design components into the type information model categories. You also will need to identify which items in the design relate back to the repository root object. After you have developed your design, you will need to develop a program that will configure the TIM on the repository. An example of a program to create a TIM can be found on the Microsoft web site as the Create Simple Database TIM download. The program should perform the following basic actions in order.
NOTE: The file Reputil.bas is installed with the repository and provides several useful functions for building TIMs and Extractor applications. The functions help you to determine whether a TIM already exists, retrieve a TIM from the repository, create and retrieve repository objects, and extend the ReposRoot class to implement a new interface for a TIM.
CAUTION: Whenever you are populating the repository with data that should be treated as a set, use the transaction begin, commit, and rollback methods. Otherwise, each change will be committed individually, and, if a fatal error occurs, no easy method to remove the corrupt data will exist. For example, when creating a TIM, wrap the entire TIM creation in a transaction so that the TIM will not be partially implemented in case of a failure.
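The all-or-nothing pattern in this caution can be sketched as follows. The example uses Python's built-in sqlite3 module purely as an analogy for the repository's begin, commit, and rollback methods; the tables models and classes are hypothetical and do not reflect the repository's schema.

```python
import sqlite3


def open_store():
    """Create an in-memory stand-in store with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE classes (model TEXT, name TEXT, UNIQUE (model, name))"
    )
    return conn


def create_model(conn, model_name, class_names):
    """Insert a model and all of its classes as a single unit of work."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes from it.
        with conn:
            conn.execute("INSERT INTO models (name) VALUES (?)", (model_name,))
            for class_name in class_names:
                conn.execute(
                    "INSERT INTO classes (model, name) VALUES (?, ?)",
                    (model_name, class_name),
                )
    except sqlite3.IntegrityError:
        return False  # nothing from this attempt remains in the store
    return True


def stored_models(conn):
    return [row[0] for row in conn.execute("SELECT name FROM models ORDER BY name")]


conn = open_store()
print(create_model(conn, "Requirements", ["Document", "Requirement"]))
print(create_model(conn, "Design", ["Class", "Class"]))  # duplicate class fails
print(stored_models(conn))
```

Because the failed attempt is rolled back as a whole, the half-built "Design" model leaves no row behind, which is the behavior you want when a TIM creation program stops partway through.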
ON THE WEB: In addition to the example program available at the Microsoft web site (http://www.Microsoft.com/repository), you also can find full code examples for the preceding procedures at http://www.gasullivan.com/vs97book.
The preceding discussion outlines how to create a TIM but does not aid you in identifying the structure of the TIM. As an example of what the structure would be, reconsider Table 28.2 and start laying out your model. Use the list of development processes and products to identify subject areas for your model. From there, identify the classes, interfaces, relationships, properties, and methods needed to implement your identified requirements using the Type Information Model as the main structure. Be sure to identify which classes refer back to the repository root object for browsing purposes.
For example, based on Table 28.2, you would probably want a separate TIM for your requirements documents. Within that, you would create the following items:
Look for opportunities to reuse classes, if possible. For example, almost all tools utilize files to store data. So, a File class could be reused across all TIMs to capture filename, directory location, and last modification date.
Identify relationships across subject areas. These relationships eventually will provide the real power behind the repository for analyzing cross-product change impacts. For example, a requirement in the requirements document could be related to the design class that implements it, such as an Employee class that implements the requirement "The system will allow employees to store their timesheets online."
Although the repository structure is in place through the TIMs, your work is not done yet. You need to develop and assemble the various extraction applications that will populate the repository. Alternatively, you need to configure any predeveloped extractor applications that you want to use.
Building the Extractor An example of a program to populate a TIM can be found on the Microsoft web site (www.microsoft.com/repository) as the Populate Simple Database TIM download. See the earlier section "Building a TIM" for more information on the sample applications. The program should perform the following basic actions in order:
NOTE: The method for referencing an object for data storage depends on how the TIM's interfaces are structured. For example, a requirement in a requirements document has been rewritten, and you want to update the requirement in the repository. Making the change will involve identifying first the requirements document and then the requirement itself.
Extractor Considerations At this point, your main concern is more with the tool used to produce the product than with the product itself. For example, you will want to know if the tool in question stores its information in a proprietary format. In this case, you need to research the tool's exporting and reporting mechanisms. Ideally, you can dump the tool's information into a simple text file that can easily be parsed by an import routine. Otherwise, your only choices are to add the information manually or get another tool. Consider a requirements document written in Microsoft Word. You probably cannot gain information on Word's storage format, which will change from version to version anyway. Instead, you could write a Word macro to dump the desired information to a presorted text file, and then let the extractor application move this information to the repository. You could, of course, simply prepare your requirements in a text document, but then you would lose sophisticated formatting capabilities such as bolding, headings, and so on. A developer should not have to give up these capabilities in order to use the repository.
Your second item of concern should be examining how the development products are organized to determine how easy it is to obtain the information required. For example, if the Word documents are just a series of standard paragraphs, your ability to pull out specific information is limited. You are better off placing the document in a centralized directory. On the other hand, imagine a document organized into a series of tables with columns labeled as Requirements ID, Name, Description, and Priority. In this case, code could more easily be written to sort through the file and pull out the data in its individual pieces.
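As a sketch of the parsing half of such an extractor, the Python code below reads a tab-delimited export whose columns match the labels mentioned above (Requirements ID, Name, Description, Priority). The tab-delimited format and the rule that every column must be filled in are assumptions made for this example; a real export from a Word macro may look different.

```python
import csv
import io

REQUIRED_COLUMNS = ["Requirements ID", "Name", "Description", "Priority"]


def parse_requirements(text):
    """Split an exported table into usable rows and a list of problems."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    present = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in present]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    good, errors = [], []
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        values = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        if all(values.values()):
            good.append(values)
        else:
            # Report the row instead of storing partial data.
            errors.append(f"line {line_number}: incomplete row")
    return good, errors


SAMPLE = (
    "Requirements ID\tName\tDescription\tPriority\n"
    "R-001\tTimesheets\tEmployees store timesheets online\tHigh\n"
    "R-002\t\tNo name given\tLow\n"
)

rows, problems = parse_requirements(SAMPLE)
print(rows)
print(problems)
```

Rejecting incomplete rows up front keeps malformed input out of the repository; only the rows in the first list would be handed on to the code that populates the TIM.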
CAUTION: This does not mean you should run and reorganize all of your products into well-defined structures. This type of activity could strangle the flexibility and creativity of the development team. Of course, a total lack of structure makes the products hard to read and use. Some type of balance between the extremes needs to be reached.
Once again, you have the option of using vendor-supplied applications or writing your own. The choice at this time is easier. Basically, if you have written your own TIM, you need to write your own extractor. Otherwise, if you have used a vendor-supplied TIM, you should use their application for populating the repository. The same rules apply in a hybrid situation. If you have extended the vendor-supplied TIM, you will need to write an application to fill in those extended portions.
The final piece of the repository puzzle that you will need is a set of reporting and analysis applications. Again, you have the choice of using predeveloped applications such as Microsoft's Repository Browser to do the work for you or building your own.
An example of a program to analyze the repository can be found on the Microsoft web site (www.microsoft.com/repository) as the Use Simple Database TIM download. See the section "Building a TIM" for more information on the sample applications. The program should perform the following basic actions in order:
The browser is a basic analysis tool that is provided with the repository. It usually can be found in the bin directory in the repository directory in the Visual Basic installation area. Its name is Repbrows.exe.
The browser will initially connect to the default repository database, although you can connect to other databases as required (see Figure 28.4). You will be prompted either for an Access-type file or, if you are looking for a SQL Server database, for ODBC connection data.
FIG. 28.4
The browser can connect to other repository databases.
Once connected, the browser uses the standard Windows tree view to show the repository structure. Figures 28.5 and 28.6 show the structures of two TIMs and how they show up in the browser.
FIG. 28.5
The browser can be used to view the MDO Model components.
FIG. 28.6
The browser can be used to view the Simple Database Model components.
The following examples are intended to get you thinking about the complexities of a repository-based environment. Remember, the repository is just another piece of software, and, as such, it cannot "think" for you. When parsing a text file, for example, the search tool will expect certain information in certain positions or highlighted by specific keywords. If the file is not set up correctly, then errors will be reported or, worse, a large amount of corrupted data will be stored in the repository. Using the repository in a sloppy development environment, where coding standards are not enforced and coding is "quick and dirty," will produce little to no benefit. However, disciplined development can make the transition with relative ease.
For the following "Variable Declaration Example" and "Comment Formatting" sections, assume that the repository has been set up to extract information out of a Visual Basic application about its classes.
The following example of a variable declaration in Visual Basic is simple. In it, a Course class needs access to the assigned instructor information through an Instructor class. In the header portion of the file, the following code appears:
'Set up the instructor
Private mvarInstructor As Object
Now, suppose that you want the repository to track the relationship between classes in your application. One way to do this is to have the tool-repository extraction application scan through the public and private variables to find those that reference other classes.
Unfortunately, in the preceding code there is no direct way to tell which class the member variable represents, so there is no way to build the Course-to-Instructor relationship. A quick solution is to avoid this form of vague data typing and declare the variable specifically:
'Set up the instructor
Private mvarInstructor As New Instructor
Now, the extraction application can immediately relate the Course class to the Instructor class. An alternative could be that, when a variable defined as an object is encountered, the extraction application would immediately look for that variable's assignment operation elsewhere in the file:
'Set up the Instructor object (in Class_Initialize)
Set mvarInstructor = New Instructor
Unfortunately, there are problems with this approach. First, the parsing algorithm will be slower the more searching that needs to be done. Second, simply finding the word Set and an equal sign in the code does not mean that you have found what the class is being set equal to. Consider the following assignment, in which the right-hand side is a method call rather than a class name:
'Get the Course's Instructor from the collection (in Class_Initialize).
Set mvarInstructor = mvarInstructors.Item(iInstructorID)
At this point, the extraction application can still determine that the mvarInstructor variable is an Instructor class by interrogating the data type of the return parameter of the Item method on the Instructors collection class (unless, of course, the return parameter is also declared as an Object).
A second alternative could be to use a special comment above or after the variable declaration:
Private mvarInstructor As Object 'Datatype: Instructor class
Alternatively, you could access the repository through a special browsing tool and add the relationship manually. However, a lot of work can be avoided by simply declaring variables as their specific types wherever possible, which you should be doing anyway. A similar argument can be made for avoiding the Variant data type where possible.
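The declaration-scanning approach discussed in this section can be sketched in a few lines. The Python sketch below is only an illustration: the regular expressions, the list of built-in types, and the function name find_class_references are assumptions, and the Datatype: comment convention is the one suggested above rather than anything Visual Basic or the repository defines.

```python
import re

# Matches declarations such as
#   Private mvarInstructor As New Instructor
# and captures an optional trailing comment.
DECLARATION = re.compile(
    r"^\s*(?:Private|Public|Dim)\s+(\w+)\s+As\s+(?:New\s+)?(\w+)\s*(?:'(.*))?$",
    re.IGNORECASE,
)
# Comment convention from the text:  'Datatype: Instructor class
DATATYPE_HINT = re.compile(r"Datatype:\s*(\w+)", re.IGNORECASE)

# Types that say nothing about which class is referenced.
VAGUE_TYPES = {"object", "variant"}
# Built-in types that are not class references (an incomplete list).
BUILTIN_TYPES = {"string", "long", "integer", "boolean", "double", "collection"}

def find_class_references(source):
    """Return (resolved, unresolved) for the declarations in a class module.

    resolved maps variable name -> referenced class name.
    unresolved lists variables typed Object or Variant with no hint.
    """
    resolved, unresolved = {}, []
    for line in source.splitlines():
        match = DECLARATION.match(line)
        if not match:
            continue
        name, type_name, comment = match.groups()
        lowered = type_name.lower()
        if lowered in VAGUE_TYPES:
            hint = DATATYPE_HINT.search(comment or "")
            if hint:
                resolved[name] = hint.group(1)
            else:
                unresolved.append(name)
        elif lowered not in BUILTIN_TYPES:
            resolved[name] = type_name
    return resolved, unresolved

# Hypothetical declaration section of a Course class module.
COURSE_MODULE = """\
Private mvarTitle As String
Private mvarInstructor As New Instructor
Private mvarRoom As Object 'Datatype: Classroom class
Private mvarAssistant As Object
"""

resolved, unresolved = find_class_references(COURSE_MODULE)
```

In this sample, mvarInstructor and mvarRoom resolve to classes, while mvarAssistant ends up in the unresolved list, which is exactly the kind of variable that would have to be related manually.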
The method that you use to format comments in the code determines how easily that comment information can be stored in the repository. Information is extracted about each class's public variables, methods, and properties. It is also safe to assume that, if the information is available, you would want to store any descriptions or definitions about classes, variables, methods, and properties.
A good candidate for a source of descriptions is the code itself, if comments are provided. Then, as the extraction application parses the file, the comments can be picked up as descriptions at the same time the declarations are read. This sounds easy at first, but there are always problems.
First, if you are on one of those projects that do not comment the code, then this option is not even available. You will either need to find an alternative source of descriptions, such as from a design document or a class dictionary document, or add the descriptions to the database manually, or not capture descriptions at all. Actually, it is doubtful that an organization that does not comment code will find much benefit from using the repository in the first place.
Second, even if your organization does comment code, the comments are not necessarily readable or even useful. Consider this simple example:
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' This file implements the Instructor class.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Yes, there is a comment there, but it contains so little information that it is not even worth extracting. Here is another, more complete header comment:
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' This file implements the Instructor class. An Instructor is a person
' who teaches one or more classes. This class is actually derived from
' the Person class. It has the following public properties and methods:
' Name, instructor, ExamineClassSchedule
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
A quick read of the comments reveals that much more information is available. However, parsing it into subcomponents is impossible. Examining the comments reveals that
What you might actually need is a standard comment header template that can be easily parsed for all relevant information. The commenting style shown next is much easier to parse and to separate into specific components, and a more rigid style makes the code easier to read for humans as well as for the computer. On the other hand, your programmers may resent having their "hands tied" and fail to provide adequate comments, and you do not want to require 500 different categories of information for people to fill in, either. One of your jobs in implementing a repository will be to strike a balance between these two extremes.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' File: Instruct.cls
'
' Class: Instructor
'
' Description:
'     An Instructor is a person who teaches one or more classes.
'
' Derived From: Person (virtual)
'
' Public Properties:
'     Name          The full name of the Instructor.
'     InstructorID  Number assigned to the Instructor by the
'                   college.
'
' Public Methods:
'     ExamineClassSchedule  Analyzes the times the instructor is
'                           scheduled to teach and identifies any
'                           scheduling conflicts.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Of course, even a standard template can have problems. Each of the following two comments would require a slightly different parsing technique. Either the extraction application will need to include adaptable parsing algorithms, or your developers will have to impose discipline on their commenting styles, or the information will not get stored in the repository.
'Description: The description goes here.

'Description:
'    The description goes here.
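One way to tolerate both styles is to treat any text that follows a field label, whether on the same line or on later comment lines, as that field's value. The Python sketch below assumes the "Label: value" convention of the template shown earlier; the function name parse_header and the rule that a label starts with a capital letter and contains only letters and spaces are assumptions made for illustration.

```python
import re

# A field line looks like "Label: optional value"; labels may contain spaces.
FIELD = re.compile(r"^([A-Z][A-Za-z ]*):\s*(.*)$")

def parse_header(comment_block):
    """Split a templated header comment into a dict of label -> text.

    A value may sit on the same line as its label or on the comment lines
    that follow it; continuation lines are joined with single spaces.
    """
    fields = {}
    current = None
    for raw in comment_block.splitlines():
        text = raw.strip()
        if not text.startswith("'"):
            continue  # not a comment line
        text = text.lstrip("'").strip()
        if not text:
            continue  # border line or empty comment line
        match = FIELD.match(text)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            fields[current] = (fields[current] + " " + text).strip()
    return fields

INLINE_STYLE = "'Description: The description goes here."
SPLIT_STYLE = "'Description:\n'    The description goes here."

HEADER = """\
''''''''''''''''''''''''''''''''''''''''
' File: Instruct.cls
'
' Class: Instructor
'
' Description:
'     An Instructor is a person who teaches one or more classes.
'
' Derived From: Person (virtual)
''''''''''''''''''''''''''''''''''''''''
"""

header_fields = parse_header(HEADER)
```

Both Description styles yield the same value with this approach. The sketch has an obvious weakness that reinforces the point about discipline: a continuation line that happens to contain a capitalized phrase followed by a colon would be mistaken for a new field.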
One use of the repository is to store basic information describing the structure of a database you are using or developing. Extracting this information from a database should be a relatively straightforward task. A simple TIM can be created to store information about tables and their associated columns. Then, an extraction application can be built to access the database's data dictionary tables and read out the table and column data.
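A minimal sketch of such an extraction application follows. Because catalog tables differ from one database product to another, the sketch uses SQLite through Python's standard sqlite3 module as a stand-in for SQL Server; sqlite_master and PRAGMA table_info are SQLite's own data dictionary mechanisms, not SQL Server's. The function name read_data_dictionary and the tblCourse table are assumptions made for illustration.

```python
import sqlite3

def read_data_dictionary(connection):
    """Return {table name: [(column name, declared type), ...]}."""
    tables = {}
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    for (table_name,) in cursor.fetchall():
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = table_name.replace('"', '""')
        columns = connection.execute(
            'PRAGMA table_info("%s")' % quoted
        ).fetchall()
        tables[table_name] = [(row[1], row[2]) for row in columns]
    return tables

# Stand-in database containing the kind of tables a college system might use.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE tblInstructor (Instr_ID INTEGER PRIMARY KEY, Name VARCHAR(30))"
)
connection.execute(
    "CREATE TABLE tblCourse (Course_ID INTEGER PRIMARY KEY, Instr_ID INTEGER)"
)

dictionary = read_data_dictionary(connection)
```

The resulting dictionary has one entry per table, each holding its columns in declaration order. Those entries correspond to the table and column objects that a simple database TIM would store.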
However, extracting such data by itself will not provide a great deal of useful information, especially because this type of information is generally tracked by organizations through various data-modeling tools and dictionaries.
More useful information is available by identifying which portions of an application's code access which tables and columns in the database, or which stored procedures reference those tables and columns. With this kind of data, you can quickly assess the potential impact on your development environment if a change is made to the database's structure.
As usual, saying that you want this information and actually obtaining it are not the same thing. Consider the following sample code for your fictitious Instructor class module. The Instructor class uses a class called Database with a variable name of mvarDatabase that provides a mechanism for executing queries and maintaining any result sets. The Database object to use is passed to the Instructor class through a property.
'This function gathers information about a specified instructor from the
'database.
Public Sub Load_By_Instructor_ID(lInstructID As Long)
    Dim strSelect As String
    Dim objDatabase As New Database
    Dim bSuccess As Boolean
    strSelect = "Select Instr_ID, Name from tblInstructor where Instr_ID = " & _
        CStr(lInstructID)
    bSuccess = mvarDatabase.ExecuteQuery(strSelect)
    If bSuccess Then
        'Code for manipulating the result set in the mvarDatabase object
    Else
        'Code for handling errors
    End If
End Sub
Your goal at this point is to store in the repository the fact that the Visual Basic Instructor class references the tblInstructor table and the Instr_ID and Name columns in the database. So, how do you do that? You can use various options. The following three options are possible but have significant limitations:
MsgBox "Select the option you want from the list provided."
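To see why automated scanning of source code is harder than it looks, consider a naive scanner that searches string literals for SELECT statements. The Python sketch below is an illustration only, not a recommended extractor: the regular expressions and the function name find_table_references are assumptions, and the pattern handles nothing beyond a single-table SELECT held in one string literal.

```python
import re

STRING_LITERAL = re.compile(r'"([^"]*)"')
# Very rough pattern: SELECT <columns> FROM <table>
SELECT_STATEMENT = re.compile(r"^\s*select\s+(.+?)\s+from\s+(\w+)", re.IGNORECASE)

def find_table_references(source, known_tables=None):
    """Return a list of (table, [columns]) found in string literals.

    If known_tables is given, matches naming any other table are dropped.
    """
    references = []
    for literal in STRING_LITERAL.findall(source):
        match = SELECT_STATEMENT.match(literal)
        if not match:
            continue
        table = match.group(2)
        if known_tables is not None and table not in known_tables:
            continue
        columns = [column.strip() for column in match.group(1).split(",")]
        references.append((table, columns))
    return references

# The query from Load_By_Instructor_ID plus the MsgBox line shown above.
SOURCE = '''\
strSelect = "Select Instr_ID, Name from tblInstructor where Instr_ID = " & _
    CStr(lInstructID)
MsgBox "Select the option you want from the list provided."
'''

naive = find_table_references(SOURCE)
checked = find_table_references(SOURCE, known_tables={"tblInstructor"})
```

The naive scan reports two references, one of them to a nonexistent table named "the", because the message text happens to contain the words Select and from. Checking candidates against table names already extracted from the database removes that false positive, but queries assembled from several strings or variables would still be missed.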
As noted, each of these options has considerable drawbacks. On the other hand, you could redesign the code to allow for automatic identification of tables and columns used by a class. Naturally, you should not redesign the code just to take advantage of the repository. However, this approach has a few other advantages that may make it worthwhile.
The redesign is relatively straightforward. To start with, declare a set of member variables to hold the table and column data. A new class should have been created to hold the column data. Next, the values are set when the class is initialized. Finally, the various functions for the class are rewritten to use these variables instead of writing complete SQL statements.
The following code is in the declaration section of the module:
'Variable declaration section
'Table name
Private mvarTableName As String
'Column data
Private mvarColumns As New Collection
Private mvarInstr_ID As New Column
Private mvarName As New Column
The following code is in the Class_Initialize subroutine and is used to initialize the variable's values. First, you initialize the table name:
mvarTableName = "tblInstructor"
Then, the Instr_ID column:
mvarInstr_ID.Name = "Instr_ID"
mvarInstr_ID.Required = True
mvarColumns.Add Item:=mvarInstr_ID, Key:=mvarInstr_ID.Name
Finally, the Name column:
mvarName.Name = "Name"
mvarName.Required = True
mvarName.Length = 30
mvarColumns.Add Item:=mvarName, Key:=mvarName.Name
The following code shows how the Load_By_Instructor_ID is rewritten using the new design:
'This function gathers information about a specified
'instructor from the database.
Public Sub Load_By_Instructor_ID(lInstructID As Long)
    Dim strSelect As String
    Dim strColumns As String
    Dim objDatabase As New Database
    Dim bSuccess As Boolean
    Dim Field As Variant
    'Generic code for building the select statement
    For Each Field In mvarColumns
        'Separate the column names with commas, with no trailing comma
        If Len(strColumns) > 0 Then strColumns = strColumns & ", "
        strColumns = strColumns & Field.Name
    Next Field
    strSelect = "Select " & strColumns & " From " & mvarTableName
    'Build the unique where clause
    'Assumes the Column class exposes a writable Value property
    mvarInstr_ID.Value = lInstructID
    strSelect = strSelect & " Where "
    strSelect = strSelect & mvarInstr_ID.Name
    strSelect = strSelect & " = "
    strSelect = strSelect & mvarInstr_ID.Value
    bSuccess = mvarDatabase.ExecuteQuery(strSelect)
    If bSuccess Then
        'Code for manipulating the result set in
        'the mvarDatabase object
    Else
        'Code for handling errors
    End If
End Sub
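With the table and column names held in plain assignments, an extraction application no longer has to understand SQL to find them. The Python sketch below illustrates the idea under stated assumptions: the function name extract_table_usage and the regular expressions are hypothetical, and the sample text is the Class_Initialize code shown earlier with the Sub wrapper added.

```python
import re

# mvarTableName = "tblInstructor"
TABLE_ASSIGNMENT = re.compile(r'^\s*mvarTableName\s*=\s*"(\w+)"', re.MULTILINE)
# mvarInstr_ID.Name = "Instr_ID"
COLUMN_ASSIGNMENT = re.compile(r'^\s*\w+\.Name\s*=\s*"(\w+)"', re.MULTILINE)

def extract_table_usage(class_source):
    """Return (table name or None, [column names]) for one class module."""
    table = TABLE_ASSIGNMENT.search(class_source)
    columns = COLUMN_ASSIGNMENT.findall(class_source)
    return (table.group(1) if table else None, columns)

INSTRUCTOR_INIT = '''\
Private Sub Class_Initialize()
    mvarTableName = "tblInstructor"
    mvarInstr_ID.Name = "Instr_ID"
    mvarInstr_ID.Required = True
    mvarColumns.Add Item:=mvarInstr_ID, Key:=mvarInstr_ID.Name
    mvarName.Name = "Name"
    mvarName.Required = True
    mvarName.Length = 30
    mvarColumns.Add Item:=mvarName, Key:=mvarName.Name
End Sub
'''

table, columns = extract_table_usage(INSTRUCTOR_INIT)
```

The result pairs the Instructor class with tblInstructor and its two columns, which is the relationship the repository needs. The sketch depends on the naming convention being followed in every class, which is one more reason to agree on such conventions as a team.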
The following things have been accomplished in this basic redesign:
Very few, if any, software developers have spent the last several years writing code so that its information can be easily imported into a repository. Nor should you or anyone undertake a massive redesign effort just to get data into the repository. Therefore, for the time being, the type of data and the amount of detail you can collect will be somewhat restricted. However, by implementing good coding practices, as discussed in the preceding examples, information can be more easily analyzed in the future.
The good news is that, if you are already using good coding practices, not only is your code easy to maintain, easy to read, and reusable, but the key information about your code can now be more easily imported into the repository. You certainly will not be able to import everything you want to know, but you will be able to import more detailed and more reliable data.
Why a section on reengineering your development processes? Because the repository is like any other software: it will only do what you tell it to do.
In other words, the repository is meant to capture information describing your development environment. If you use sloppy development practices, and your code is disorganized, undocumented, and hard to read, the resulting data in the repository will be disorganized, undescribed, and hard to explain (if it can be obtained at all). If, on the other hand, you use disciplined development practices and the resulting code is well organized, well documented, and easy to read, the data in the repository will reflect that. Furthermore, it will be much easier for the repository to automatically identify those cross-application relationships and dependencies that make the repository truly useful.
Consider a widget factory that wants to implement an automated control system. Near the end of the assembly line, the design team decides to insert a sensor to measure the weight of widgets before they are packaged. However, for some reason, workers are randomly removing widgets to paint and pack in separate boxes. Now the sensor will not be able to measure the weight of these special widgets. Even if it could, the design team has not taken into account the added weight from the paint in calibrating the sensor. Thus, the benefit of the automated system has been undercut by a lack of documentation on the widget process and inconsistency in how that process works. The same sorts of problems can easily occur when pulling information out of your applications and into the repository.
Quite often in a development environment several applications will be built to serve as test cases and may be thrown away when completed. Whether you want to store information about these test cases, or prototypes, depends on your development environment. On the one hand, storing the prototype information allows you to query the structure of the prototype and analyze its interdependencies with other applications. On the other hand, you could mistake prototype components for real application components if you don't pay attention. In general, you probably want to store prototype information in the repository, but you want to ensure that the prototype and its associated components are labeled as such. Be careful not to accidentally label real application components that the prototype is using as a prototype, however.
Earlier sections have provided general guidelines on building and using TIMs and have discussed the mechanics of building the TIM. This section attempts to tie the concepts together to show how to integrate these components into a single model of your development environment's information requirements.
The metamodel is a fairly abstract concept, and a repository can be implemented without understanding the concept. However, in order to truly understand what the repository is all about, the metamodel concept needs to be understood.
The metamodel is simply a description of all the information that you need to store in the repository. It provides an abstract outline of information that is required about your applications. The TIMs that were discussed earlier are used to implement the various components of the metamodel. Thus, technically a TIM is a metamodel for its particular tool. The metamodel also identifies the relationships from one TIM to another.
A metamodel is used to store metadata. The definition of metadata is "data about data." For example, a database stores the names and phone numbers of employees in a company. The names and phone numbers are data. On the other hand, the repository stores the names of the database's tables: company and employee. Company and employee are metadata.
Figure 28.7 provides an example of these concepts. At the bottom layer are various processes involved in providing transportation. At the next layer are the systems that store the data needed to perform their day-to-day business. At the next layer is the metadata that describes how the system data is organized. Thus, metadata is data about data. Moving up one more layer shows the meta-meta data that describes how the metadata is organized.
Fortunately, you do not need to grasp these concepts completely to implement them. Microsoft has provided mechanisms for implementing the metamodel and has taken care of the meta-meta model for you. The TIMs that you build or acquire encapsulate the metamodel concept. In addition, the repository engine and the Type Information Model encapsulate the meta-meta layer, so you do not have to worry about it at all.
Your main concern should be building an integrated view of the entire development environment and not just individual tools. To accomplish this goal, you need to get organized. Divide your organization along two different axes. The first axis focuses on the steps involved in developing software from requirements to design to implementation. Examples of this division were discussed in previous sections. The second axis focuses on the different aspects of the system, namely data, process, and technology. For example, a typical computer system consists of a database and several applications, each of which runs on some type of computer.
FIG. 28.7
Diagram showing the various levels of data abstraction.
These concepts will be discussed in the following examples. The first shows how a logical concept can be related to an implementation of that concept in code. The second discusses how applications can be linked to the database.
Typically, in a development effort, you will lay out a design before you begin writing code. A common methodology for laying out a design is object-oriented design (OOD). A basic unit of OOD is the concept of a class. Another major concept in OOD is that classes can be derived from other classes. These concepts could be captured in the metamodel as shown in Figure 28.8.
FIG. 28.8
A class can inherit its structure from one other class. The derived class will have the same functions and attributes as the base class.
At this point, to store data about classes from an OOD software tool, you would create a TIM that contains a class named Class. You would also designate this Class class as both the source and the destination in the Inheritance relationship. Finally, you would need to expose a Class interface so that extraction and analysis tools can access the data.
After the design is complete, the next step is to implement the design concepts in the code. If you have a software tool that supports object-oriented programming, you will want to implement your design classes as classes in the code. This concept of a design class is shown in Figure 28.9. The relationship shown indicates there might be many implemented classes for every design class because the same design concept may need to be repeated in several different projects.
FIG. 28.9
An implemented class is a software module that physically implements the conceptual class from the system's design.
At this point, however, you are not ready to store data. Unfortunately, not all languages implement classes the same way, nor do they all implement the concept of inheritance the same way. For example, Visual Basic does not provide true inheritance, but, rather, provides a different version where one class "implements" another class. On the other hand, Visual C++ provides inheritance and includes the slightly more advanced concepts of virtual and pure virtual functions to define the inherited characteristics. Another difference is that Visual Basic classes support the concept of properties, whereas Visual C++ does not. These new ideas are captured in Figure 28.10. Visual Basic allows inheritance only of the names of the functions, whereas Visual C++ allows inheritance of the implementation of the function.
FIG. 28.10
The concept of inheritance is implemented differently in Visual Basic and Visual C++.
At this point, you can now expand the TIM discussed earlier to include the new classes and relationships. Note that when defining the TIM, the Type Information Model does not directly support the concept of inheritance. Therefore, there is no easy way to say that a Visual C++ class is derived from your Implemented class. Instead, create a new relationship type called is a type of. Make the Implemented class a source for the relationship, and make the Visual Basic and Visual C++ classes a destination of the relationship.
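To make the source and destination roles concrete, here is a small Python sketch of the model just described. It is not the Microsoft Repository API: the Model and RelationshipType classes and the can_relate method are hypothetical names invented for this illustration, and only the class and relationship names come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationshipType:
    """A relationship type with one source class and its allowed destinations."""
    name: str
    source: str
    destinations: frozenset

class Model:
    """A toy information model: class names plus relationship types."""

    def __init__(self):
        self.classes = set()
        self.relationship_types = {}

    def add_class(self, name):
        self.classes.add(name)

    def add_relationship_type(self, name, source, destinations):
        # Refuse relationship types that mention classes the model lacks.
        unknown = ({source} | set(destinations)) - self.classes
        if unknown:
            raise ValueError("undefined classes: %s" % sorted(unknown))
        self.relationship_types[name] = RelationshipType(
            name, source, frozenset(destinations)
        )

    def can_relate(self, relationship, source_class, destination_class):
        """True if the model allows this relationship between the two classes."""
        rel = self.relationship_types[relationship]
        return source_class == rel.source and destination_class in rel.destinations

model = Model()
for class_name in ("Class", "Implemented Class",
                   "Visual Basic Class", "Visual C++ Class"):
    model.add_class(class_name)

# Design-level inheritance: Class is both the source and the destination.
model.add_relationship_type("Inheritance", "Class", ["Class"])
# Stand-in for inheritance between model classes, as described above.
model.add_relationship_type(
    "is a type of", "Implemented Class",
    ["Visual Basic Class", "Visual C++ Class"],
)
```

Given this model, an extractor could check a proposed link before storing it. For example, model.can_relate("is a type of", "Implemented Class", "Visual Basic Class") succeeds, while the reversed direction does not.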
As a final consideration, assume that you are currently using the MDO model. You now need to expand that model to fit into the metamodel concepts diagrammed in the figures. Because the MDO Class Module portion of the MDO model is basically the same concept as the Visual Basic Class, you can consider them the same thing. Figure 28.11 demonstrates this concept. After that, you will need to expand the MDO Model in the repository because it does not currently capture enough information to show these implementation concepts.
FIG. 28.11
The MDO model for Visual Basic implements the concept of classes in the MDO Class Module class.
You have now created a path for identifying which parts of your design are implemented by which applications.
Another piece of information you might want to capture is which portions of a database are accessed by what portions of a program. Assume that your applications are written in Visual Basic. The structure of the MDO model that is used for capturing Visual Basic information is shown in Figure 28.12.
FIG. 28.12
These components of the MDO model store information about a project, its components, and references to other projects.
The other item you need to track is the database itself. Figure 28.13 shows a simple model for tracking the database information including the name of the database and its basic data dictionary information.
FIG. 28.13
A database consists of the database itself and one or more tables. The tables in turn consist of one or more columns.
Now you are ready to show how the application and the database are interrelated. However, you have a decision to make because several alternatives are available. Two of those alternatives are shown in Figures 28.14 and 28.15.
Figure 28.14 simply shows a relationship that allows tracking of which databases are used by which applications. This information is high level and relatively easy to gather and maintain. On the other hand, this information is of limited usefulness for any analysis that needs to be done.
FIG. 28.14
One way to show interrelationships between a Visual Basic project and a database is to simply relate the name of the database to the Visual Basic Project.
Figure 28.15 shows each module of the application and the associated database table or tables that it interacts with. This information is more detailed and will be much harder to collect and maintain. On the other hand, the added detail will allow production of higher quality reports.
FIG. 28.15
A second method of showing the relationship is to capture which tables are accessed by which modules of code.
The answer to the question of which technique to pick depends on what you want. Do you need detail or just high-level concepts? Is this your first repository project? How hard will it be to gather detailed information versus the high level? Do not take these questions lightly. They will have a major impact on the usefulness of your repository system.
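One consideration that may help with the decision: detailed relationships can always be rolled up into the high-level view, but not the other way around. The Python sketch below illustrates this with plain tuples standing in for repository relationships; every project, module, database, and table name in it is hypothetical, as are the two function names.

```python
from collections import defaultdict

# Detailed relationships in the style of Figure 28.15:
# (project, module, database, table). All names are hypothetical.
MODULE_TABLE_LINKS = [
    ("Registration", "Instructor.cls", "College", "tblInstructor"),
    ("Registration", "Course.cls", "College", "tblCourse"),
    ("Registration", "Course.cls", "College", "tblInstructor"),
    ("Payroll", "Employee.cls", "Personnel", "tblEmployee"),
]

def modules_using_table(links, table):
    """Impact analysis: which (project, module) pairs touch a given table?"""
    return sorted({(project, module)
                   for project, module, _, linked_table in links
                   if linked_table == table})

def project_database_links(links):
    """Roll the detail up to the Figure 28.14 level: project -> databases."""
    rollup = defaultdict(set)
    for project, _, database, _ in links:
        rollup[project].add(database)
    return {project: sorted(databases) for project, databases in rollup.items()}

impacted = modules_using_table(MODULE_TABLE_LINKS, "tblInstructor")
high_level = project_database_links(MODULE_TABLE_LINKS)
```

With the detailed links, a proposed change to tblInstructor can be traced to the two modules that use it. With only the high-level links, the best available answer is that the whole Registration project might be affected.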
As a final note of consideration, the diagrams are used to show what database structure an application uses to gather data. The actual connection of an application to a database is handled through some type of data source such as an ODBC driver. The sample figures do not include the concept of a data source and would have to be expanded to include it. The point is that the metamodel plays a major role in defining what the repository can and cannot do.
Before running out and using the Microsoft Repository, bear a couple of things in mind. First, the repository is only at version 1.0. Second, as of this writing, Microsoft has announced the development of its Open Information Model, which vendors are to use as a foundation for connecting to the repository. Unless you have a specific need in mind, you may want to wait until the tool and the models have stabilized before using them. However, the tool will only do so much for you. You should be examining how your development environment will be impacted by the tool in the future.
© Copyright, Macmillan Computer Publishing. All rights reserved.