Cartographer implements a novel approach to managing distributed systems by automatically discovering and tracking the relationships between their component systems and applications. Cartographer does so via specially designed agents -- residing on clients, servers, and (potentially) network devices -- that detect, identify, and track inter- and intra-system dependencies. Dependencies include network-level services like DNS, DHCP, and SMTP as well as higher-level application abstractions like filesystems, databases, directory services, telephony, and middleware.
Relationships are modeled using a dependency graph borrowed from graph theory. In our model, systems and applications are represented as vertices and dependencies are represented as edges. More specifically, we use directed graphs to indicate dependencies between clients and servers or between peers. Once dependencies are discovered, Cartographer agents automatically organize systems and applications into peer-to-peer overlays. Peers then exchange management information among themselves to detect and correct service problems, with the goal of doing so without the active participation of management software.
Example Dependency Graph 1 represents a small slice of a distributed system containing systems S1, S2, and S3. These systems (and the applications residing on them) possess several dependencies amongst themselves. Systems S3 and S2 rely on system S1 for DNS service while S3 mounts the /www filesystem from S2. Finally, system S3 utilizes a database also residing on S3.
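The example above can be written down as a small directed graph. The following is a minimal sketch, assuming a plain edge list of (client, server, service) tuples; the function names and the service labels are illustrative and are not taken from Cartographer itself.

```python
from collections import defaultdict

# Directed, labeled edges: (client, server, service).
# These mirror the example: S2 and S3 use DNS on S1, S3 mounts /www
# from S2, and S3 uses a database that also resides on S3.
EDGES = [
    ("S2", "S1", "DNS"),
    ("S3", "S1", "DNS"),
    ("S3", "S2", "filesystem:/www"),
    ("S3", "S3", "database"),
]


def build_graph(edges):
    """Return {client: [(server, service), ...]} for a list of directed edges."""
    graph = defaultdict(list)
    for client, server, service in edges:
        graph[client].append((server, service))
    return dict(graph)


def dependents_of(edges, server):
    """Return the sorted list of other systems that depend on `server`."""
    return sorted({c for c, s, _ in edges if s == server and c != server})


graph = build_graph(EDGES)
print(graph["S3"])                 # everything S3 depends on
print(dependents_of(EDGES, "S1"))  # systems that rely on S1
```

Storing the service name on each edge keeps the two kinds of question cheap: what a system depends on, and which systems are affected when a given server has a problem.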
Why the name Cartographer?
We chose this name to reflect the fact that our software builds and analyzes maps of distributed systems. It does not build maps in the traditional sense -- topology is neither collected nor stored. Instead, the relationships between systems and applications are continuously discovered and identified. These relationships are then measured and analyzed in order to assess the health and availability of the distributed system as a whole.
Cartographer is supported on the following platforms:
- Solaris 9+ on SPARC
- Solaris 9+ on x86
- Linux systems containing 2.4+ kernels on x86
- Windows XP/2003/Vista/2008/7
On Solaris 9 systems, Cartographer runs as a traditional UNIX daemon and is started and stopped using /etc/init.d scripts. Cartographer does not currently utilize the Service Management Facility recently introduced in Solaris 10. On Linux, Cartographer also runs as a traditional UNIX daemon and is started by init using run-levels specified by the chkconfig command. The default install directory is /opt/cartographer for all UNIX platforms. On Windows, Cartographer runs as a native service and is manipulated using the Service Control Manager bundled with Windows. The default installation directory is C:\Program Files\Cartographer on Windows. Support for Windows 2000 has been deprecated. Solaris 11 support is currently under development and not yet generally available.
Managers and Agents become Peers
Cartographer contains agents just like traditional management architectures (e.g. SNMP and the Internet Management Framework). However, in the Cartographer architecture, traditional managers have been eliminated -- there are no managers to poll and store data in a centralized repository. Graphical applications can provide views into the system but they do not act in a managerial role.
Agents, on the other hand, are intelligent, self-organizing, and self-distributing, and act in a dual role, exchanging management policies and updates. When agents are collecting dependency data and measuring performance, they are acting in their traditional role. When agents communicate with each other, run distributed decision-making algorithms, and self-propagate, they are acting more as peers in a typical peer-to-peer network.
Cartographer agents self-organize into P2P overlay networks in order to exchange management information, software updates, and events. Systems, in Cartographer's model, can be both clients and servers just as in real life. A system is considered a server if it provides some service to a client. A system is considered a client if it utilizes a service from some server. Systems are considered peers if they both utilize the same server for some particular service. The graph below illustrates that nodes S2 and S3 are peers because they both utilize the DNS service from node S1.
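The peer rule (two systems are peers when they use the same server for the same service) can be derived directly from the dependency edges. The sketch below assumes the same (client, server, service) edge tuples used to describe the example graph; `find_peers` is a hypothetical helper and is not part of the agent code.

```python
from collections import defaultdict
from itertools import combinations


def find_peers(edges):
    """Return {(server, service): [(peer_a, peer_b), ...]}.

    Clients are grouped by the (server, service) pair they use; every pair
    of clients inside one group is a peer pair.
    """
    clients = defaultdict(set)
    for client, server, service in edges:
        clients[(server, service)].add(client)
    return {
        key: list(combinations(sorted(members), 2))
        for key, members in clients.items()
        if len(members) > 1
    }


edges = [
    ("S2", "S1", "DNS"),
    ("S3", "S1", "DNS"),
    ("S3", "S2", "filesystem:/www"),
    ("S3", "S3", "database"),
]
peers = find_peers(edges)
print(peers)
```

Only the DNS service on S1 has more than one client in this example, so S2 and S3 form the only peer pair.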
In the future, these overlay networks will be utilized to run distributed, decision-making algorithms, compare service times and service experiences, and to diagnose and troubleshoot faults.
Cartographer's management protocol is a custom-designed, XML-based protocol transported via SSL over TCP. The protocol adapts and extends the Internet Management Framework's Structure of Management Information (SMI). Further, the protocol utilizes XML for both data modeling and transfer syntax. The choice of XML provides more future-proofing than the original Basic Encoding Rules (BER) that SNMP utilizes. Further, choosing XML allows us to take advantage of the XML tools and code libraries in existence today. For now, the protocol is called the XML management protocol, or XMP, for lack of a better name.
The acronym XMP should not be confused with XMPP, the IETF's Extensible Messaging and Presence Protocol.
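To give a feel for what using XML as both data model and transfer syntax means in practice, here is a minimal sketch that builds and parses a request with a standard XML library. The element and attribute names (`GetRequest`, `Object`, `id`, `mib`, `name`) and the object names are purely hypothetical and are not the XMP schema; consult the published XSDs for the real message format. The SSL/TCP transport is omitted.

```python
import xml.etree.ElementTree as ET


def build_get_request(msg_id, mib, objects):
    """Serialize a hypothetical 'get' request to an XML string."""
    request = ET.Element("GetRequest", {"id": str(msg_id)})  # hypothetical element
    for name in objects:
        ET.SubElement(request, "Object", {"mib": mib, "name": name})
    return ET.tostring(request, encoding="unicode")


def parse_request(xml_text):
    """Parse the request back into (tag, id, [(mib, name), ...])."""
    root = ET.fromstring(xml_text)
    objects = [(o.get("mib"), o.get("name")) for o in root.iter("Object")]
    return root.tag, root.get("id"), objects


wire = build_get_request(7, "core", ["sysDescr", "sysUpTime"])
print(wire)
print(parse_request(wire))
```

Because the message is ordinary XML, any off-the-shelf parser can read it, which is the practical advantage over a BER encoding.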
One of the drawbacks of traditional management frameworks is the overhead of deploying and maintaining agents across large distributed systems. Even lightweight, intelligent second-generation agents, like SystemEDGE, become a burden to deploy and maintain when the number of systems becomes large. Consequently, we have designed Cartographer to be self-deploying and self-upgrading with minimal administrative overhead. Agents utilize peer overlay networks for distributing upgrades by periodically querying each other to see which components and versions they have installed. New and updated components are thus deployed by introducing them on a few computers in a distributed system and letting nature take its course. Cartographer uses a pull model for upgrades.
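The pull model can be sketched as a comparison between an agent's own component inventory and the inventories reported by its peers. This is an illustration only: the component names are invented, and dotted-integer version strings are an assumption, since the actual version format is not described here.

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def plan_upgrades(local, peer_inventories):
    """Decide which components to pull, and from which peer.

    local:            {component: version}
    peer_inventories: {peer: {component: version}}
    Returns {component: (peer, version)} for every component where some
    peer has a newer version than the local one, or has a component that
    is not installed locally. The newest version seen wins.
    """
    plan = {}
    for peer, inventory in sorted(peer_inventories.items()):
        for component, version in inventory.items():
            # Best version known so far: a planned pull, else the local copy.
            best = plan.get(component, (None, local.get(component, "0")))[1]
            if parse_version(version) > parse_version(best):
                plan[component] = (peer, version)
    return plan


local = {"core": "1.2.0", "dns-plugin": "1.0.0"}
peers = {
    "S1": {"core": "1.3.0", "dns-plugin": "1.0.0"},
    "S2": {"core": "1.2.5", "db-plugin": "0.9.1"},
}
print(plan_upgrades(local, peers))
```

Each agent makes this decision for itself, so a new component introduced on a few systems spreads as neighbors notice it and pull it.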
Self-deployment works similarly to the process used for upgrading. Cartographer agents constantly search for dependencies and peers during the course of their operation. When they discover a peer, they try to contact it using XMP. If a system is not XMP-enabled, Cartographer agents try to remotely install themselves through various platform-specific techniques. Cartographer uses a push model for automated deployment. One deployment limitation exists, however -- Windows machines are required to deploy to other Windows machines because of the proprietary nature of the Microsoft COM, DCOM, and NetBIOS protocols. Both Windows and UNIX agents, however, can deploy to UNIX systems.
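The deployment decision an agent faces when it discovers a peer can be summarized in a few lines. This is a sketch of the rules as described above, not agent code; the function name, the OS labels, and the returned action strings are hypothetical.

```python
def next_action(peer_speaks_xmp, local_os, peer_os):
    """Decide what a discovering agent does with a newly found system.

    Returns one of:
      'exchange' - the peer already runs an agent, so talk XMP to it
      'install'  - push an agent to the peer
      'skip'     - this agent cannot deploy to that platform
    """
    if peer_speaks_xmp:
        return "exchange"
    # Only a Windows agent can deploy to a Windows target.
    if peer_os == "windows" and local_os != "windows":
        return "skip"
    # Windows and UNIX agents can both deploy to UNIX targets.
    return "install"


print(next_action(False, "unix", "windows"))
print(next_action(False, "windows", "unix"))
```

A skipped Windows target is not lost: it can still be reached later by any Windows agent that discovers it.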
These graphs were constructed using MRTG querying Cartographer agents natively via XMP. See xmptomrtg.
- Interface I/O
- Win32 Handles in Use
- Interface I/O
- Load Average
- Application Size
- Number Process/Threads
- Memory Utilization
Cartographer and XMP are currently integrated with the following tools and NMSs:
- OpenNMS integration is undergoing testing. The first release of integration includes data collection and graphing. Future integration will include event processing and dependency importation.
Distributed Root Cause Analysis and Event Correlation
Each Cartographer agent tracks a portion (its sub-graph) of the global distributed system dependency graph. Agents utilize their sub-graphs in conjunction with local and remote events and automated testing to perform root cause determination.
Distributing root-cause computations to agents, rather than performing them in a centralized management station, solves several problems.
- Scaling is increased: a single piece of management software is inherently not scalable to the size of today's networks.
- Localized knowledge: agents observe, detect, and problem-solve on the actual clients and servers where the action is.
- Faster reaction to changes in the distributed system: no waiting around for a five-minute polling interval to detect and process changes.
- Event correlation occurs from multiple points of view: nodes independently observe and correlate -- many eyes look at the problem.
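As an illustration of how a sub-graph helps narrow down a root cause, the sketch below applies one simple heuristic: an unhealthy node is a candidate root cause only if none of the nodes it depends on are also unhealthy. This heuristic is an assumption made for the example and is not a description of Cartographer's actual algorithm.

```python
def root_cause_candidates(depends_on, unhealthy):
    """Return unhealthy nodes whose own dependencies all look healthy.

    depends_on: {node: set of nodes it depends on}
    unhealthy:  set of nodes currently reporting problems
    Self-dependencies (a service hosted on the same node) are ignored.
    """
    return sorted(
        node
        for node in unhealthy
        if not any(
            dep in unhealthy
            for dep in depends_on.get(node, ())
            if dep != node
        )
    )


# Sub-graph for the example system: S2 and S3 use S1; S3 also uses S2
# and a local database.
depends_on = {"S2": {"S1"}, "S3": {"S1", "S2", "S3"}}

print(root_cause_candidates(depends_on, {"S1", "S2", "S3"}))
print(root_cause_candidates(depends_on, {"S2", "S3"}))
```

When every node reports trouble, the shared DNS server S1 is the only candidate; when only S2 and S3 do, the problem is attributed to S2, which S3 depends on.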
Documentation of the protocol and SMI, via XML Schemas, has been written as have MIB specifications. Now that those key pieces are complete, more documentation on installation, configuration, and operation is needed.
- Self installation and deployment
Self-installation and distribution development are in the works. What does that mean? Essentially, Cartographer agents will deploy themselves throughout an enterprise as they discover dependencies. Typically, a deployment is started by first installing Cartographer agents on a handful of Windows and UNIX machines and letting Cartographer agents take care of the rest.
- Automatic testing and measuring of performance
Cartographer agents will use the dependencies they discover to automatically configure service testing. When agents measure and detect sub-optimal performance, they will communicate amongst themselves to identify possible causes. They will do so all without management intervention.
- Additional platform support
Ports to other platforms (e.g. MacOS and other proprietary platforms) are in the planning stages.
- Additional Integration Work
More adapters, plugins, and glue are needed to allow third-party management software to communicate with Cartographer via XMP. For example, integration with Zenoss and Hyperic is under consideration.
- Support for virtualization
Right now, agents run within the context of a machine/OS without regard for virtualization. Future versions will detect virtual machine dependencies.
- XML Management Protocol or XMP
- Software Distributions
- Cartographer Agent Installation
- Installing the Cartographer GUI
- Example Cartographer Agent Configuration
- Release Notes
- Video of ANSMTUG presentation
- Slides to ANSMTUG presentation via Slideshare
- Schemas (XSDs) for the XMP
- XMP agent engine core MIB specification (http://www.krupczak.org/images/0/08/CoreMib.xml)
- Cartographer MIB specification