Friday, December 25, 2009

The Data Visibility Exceptions

The Data Sentinel is not unlike the grumpy bureaucrat processing your driver’s license application forms. After ensuring that you comply with what’s sure to be a ridiculously complicated list of required documents, it isolates you from directly accessing the files in the back.
While you, the applicant, the supplicant, cannot go around the counter and check the content of your files directly (not legally, anyway), the DMV supervisor in the back office is able to directly access any of the office files. After all, the supervisor is authorized to bypass the system processes intended to limit direct access to the data. Direct supervisory access to data is one of the exceptions to the data visibility constraints mentioned earlier.
Next is the case of ETL (Extract, Transform, Load) processing of large sets of data, as well as reporting on that data. These cases require batch-level access to data in order to process or convert millions of records, and they can wreck performance if carelessly implemented. Reporting jobs should ideally run against offline replicated databases, not the online production databases. Better yet is to plan for a proper Data Warehousing strategy that allows you to run business intelligence processes independently of the main Operational Data Store (ODS). Nevertheless, on occasion, you will need to run summary reports or data-intensive real-time processes against the production database. When the report tool is allowed to access the database directly, bypassing the service layer provided by the Data Sentinel, you will need to ensure this access is well-behaved, that it runs as a low-priority process, and that it runs under restricted user privileges. The same control is required for the ETL processes. Operationally, you should always schedule batch-intensive processes for off-peak times such as nightly runs.
A third potential cause for exception to data visibility comes from the use of off-the-shelf transaction monitors, which require direct access to the databases in order to implement the ACID logic discussed earlier.
A fourth exception is demanded by the need to execute large data matching processes. If there is an interactive need to run a process against a large data set with matching keys in a separate database (“for all customers with sales greater than an $X amount, apply a promotion flag equal to the percentage corresponding to the customer’s geographic location in the promotion database”), then it makes no sense trying to implement each step via discrete services. Such an approach would be extremely contrived and inefficient. Instead, use of a Table-Joiner super-service will be required. More on that next.
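To make the promotion example concrete, here is a minimal sketch of the set-based alternative to calling discrete services row by row. It is only an illustration under my own assumptions: the table names (customers, promo.regions), the column names, and the function names are all hypothetical, and SQLite with an attached second database stands in for whatever pair of databases you actually run.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create two in-memory databases: the main one and an attached 'promo' one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS promo")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, region TEXT, sales REAL, promo_pct REAL)"
    )
    conn.execute("CREATE TABLE promo.regions (region TEXT PRIMARY KEY, pct REAL)")
    conn.executemany(
        "INSERT INTO customers (id, region, sales) VALUES (?, ?, ?)",
        [(1, "east", 1500.0), (2, "west", 200.0), (3, "west", 9000.0)],
    )
    conn.executemany(
        "INSERT INTO promo.regions VALUES (?, ?)", [("east", 5.0), ("west", 10.0)]
    )
    conn.commit()
    return conn


def apply_promotions(conn: sqlite3.Connection, min_sales: float) -> int:
    """Flag every qualifying customer with one joined statement.

    Returns the number of customer rows that were updated.
    """
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            """
            UPDATE customers
               SET promo_pct = (SELECT p.pct FROM promo.regions AS p
                                 WHERE p.region = customers.region)
             WHERE sales > ?
               AND region IN (SELECT region FROM promo.regions)
            """,
            (min_sales,),
        )
    return cur.rowcount


conn = build_demo()
updated = apply_promotions(conn, 1000.0)
rows = conn.execute("SELECT id, promo_pct FROM customers ORDER BY id").fetchall()
```

The point of the sketch is that the matching happens inside the database engine in a single pass, which is the kind of work a coarse super-service would wrap, rather than one service call per customer.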

Friday, December 18, 2009

Transactional Services

Related to the issue of Session-Keeping is how to ensure that complex business transactions take place in a way that meets the following so-called ACID properties:
·         Be Atomic. The transaction is indivisible and it either happens or does not.
·         Be Consistent. When the transaction is completed all data changes should be accountable. For example, if we are subtracting money from one bank account and transferring it to another account, the transaction should guarantee that the money added to the new account has been subtracted from the original account.
·         Act in Isolation.  I like to call this the sausage-making rule. No one should be able to see what’s going on during the execution of a transaction. No other transaction should be able to find the backend data in a half-done state. Isolation implies serialization of transactions.
·         Be Durable.  When the transaction is done, the changes are there and they should not disappear. Having a transaction against a cache that fails to update the data base is an example of non-durability.
Since we are dealing with a distributed processing environment based on services, the main method used to ensure that ACID is met is a process known as Two-Phase Commit. Essentially, a Two-Phase Commit establishes a transaction bracket prior to executing changes, performs the changes, and, after ascertaining that all needed changes have occurred, issues a commit to finalize the changes by closing the transaction bracket. If, during the process, the system is unable to perform one or more of the necessary changes, a rollback will occur to undo any prior partial transaction change. This is needed to ensure that, if unsuccessful, the transaction will, at the very least, return the system to its original state. This process is so common-sense that, in fact, all this business of transaction processing has been standardized. The Open Group[1] consortium defines transactional standards, in particular the so-called X/Open Distributed Transaction Processing (DTP) model and its XA interface specification.
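For readers who prefer code to prose, the bracket described above can be sketched roughly as follows. This is a toy, in-memory illustration, not an XA implementation: the Participant class, its fail_on_prepare switch, and the two_phase_commit function are names I made up, and a real coordinator would also need durable logging and recovery, which are left out here.

```python
class Participant:
    """Hypothetical resource manager holding a single integer balance."""

    def __init__(self, name, balance, fail_on_prepare=False):
        self.name = name
        self.balance = balance
        self.fail_on_prepare = fail_on_prepare  # simulates a resource that cannot comply
        self._pending = None

    def prepare(self, delta):
        # Phase 1: vote "yes" only if the change can be applied later.
        if self.fail_on_prepare or self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2: make the prepared change permanent.
        if self._pending is not None:
            self.balance += self._pending
            self._pending = None

    def rollback(self):
        # Discard the prepared change; the balance was never touched.
        self._pending = None


def two_phase_commit(changes):
    """changes is a list of (participant, delta) pairs. Returns True if committed."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # One "no" vote aborts everything prepared so far.
            for p in prepared:
                p.rollback()
            return False
    for participant, _ in changes:
        participant.commit()
    return True


a, b = Participant("a", 100), Participant("b", 0)
ok = two_phase_commit([(a, -30), (b, 30)])

c = Participant("c", 0, fail_on_prepare=True)
failed = two_phase_commit([(a, -50), (c, 50)])
```

In the second call, participant c refuses to prepare, so the change already prepared on a is rolled back and both balances stay where the first, successful transfer left them.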
However, transactional flows under SOA tend to be non-trivial. This is because a transaction flow requires the keeping of session states throughout the life of the transaction and, as earlier discussed, state-keeping is to SOA what Kryptonite is to Superman. Say you want to transfer money from one checking account to another. You call the service Subtract X from Account; then you call another service, Add X to Account Y. This simple example puts the burden of transactional integrity on the client of the services. The client should ensure that the Add to Account service has succeeded before subtracting the money from the original account. An approach like this breeds as much complexity as a cat tangling a ball of yarn, and it should be avoided at all costs. Far simpler is to create a service, Transfer X from Account X to Account Y, and then let the service implementation worry about ensuring the integrity of the operation. The question then is what type of implementation is most appropriate.
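As a rough sketch of the coarse-grained option, the following shows a single transfer operation that leans on the database's own transaction support so the client never has to coordinate the two halves. Everything named here (the accounts table, open_demo_bank, transfer, InsufficientFunds) is hypothetical, and SQLite merely stands in for your RDBMS of choice.

```python
import sqlite3


class InsufficientFunds(Exception):
    """Raised when the source account is missing or cannot cover the amount."""


def open_demo_bank() -> sqlite3.Connection:
    """Create an in-memory database with two made-up accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> None:
    """Move amount from source to target inside one database transaction."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commit if the block succeeds, roll back if it raises
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, source, amount),
        )
        if cur.rowcount != 1:
            raise InsufficientFunds(source)
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )
        if cur.rowcount != 1:
            raise LookupError(f"unknown target account: {target}")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = open_demo_bank()
transfer(conn, "A", "B", 40)
```

If the second update fails, for instance because the target account does not exist, the exception triggers a rollback and the subtraction is undone along with it. The caller sees one service that either happened or did not.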
While SOA-based transactional standards are in place[2], actual vendor-based implementations supporting these standards don’t yet exist in the way mature XA-compliant database implementations exist. In general, you’d be better off leveraging the backend transaction facilities provided by RDBMS vendors or by off-the-shelf transaction monitors such as CICS, MTS, or Tuxedo. All in all, it’s probably best to encapsulate these off-the-shelf transaction services behind a very coarse meta-service whenever possible, rather than attempting to re-implement the ACID support via Two-Phase Commit at the services layer.
It should be noted that what I am essentially recommending is an exception to the rule of encapsulating databases behind a Data Sentinel when it comes down to implementing transactional services. The reasoning behind this is that integrating with off-the-shelf transactional services will likely require direct database access in order to leverage the XA capabilities of the database vendor.
As more actual off-the-shelf transactional service solutions for SOA appear in the future, we can then remove the exception.
More on the Data Visibility Exceptions will follow. . .


[1] http://www.opengroup.org/
[2] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ws-tx

Thursday, December 10, 2009

State Keeping/State Avoidance


Managing SOA complexity brings up the question of session state. By “state” I mean all the information that must be maintained and stored across the series of interactions needed to complete a full business exchange. Maintaining the state of a service interaction implies remembering what stage the conversing partners have reached and the working data in effect. Often it will be at your discretion whether to design services that depend more or less on state information. At other times the problem at hand will force a specific avenue. In either case, you should remember this simple formula: State-Keeping = Complexity.
Maintaining state might be inescapable in automated orchestration logic, but it comes with a cost. State-Keeping constrains the options for maintaining high availability and indirectly may increase SOA’s fragility by making it more difficult to add redundant components to the environment. With redundant components you must ensure that messages flowing through the system maintain their state, regardless of the server resources used. Relying on session states, while also allowing flexible service flows, is hard to do. It’s done, yes, but the price you will have to pay is increased complexity and performance penalties related to the need to propagate the state of a particular interaction across several nodes. Therefore, a key SOA tenet is that you should use sessionless flows whenever possible. In other words, every request should ideally be atomic and serviceable regardless of the occurrence of previous requests.
Do you want to know the name of an employee with a given social security number? No problem. As a part of the request, pass the social security number and receive the name. If you next want the employee’s address, you can pass either the social security number or the name as part of the request. While atomic, sessionless requests such as these do impose a requirement that the client maintains the state of the interaction and holds the information elements related to the employee, this approach does simplify the design of systems using server clusters.
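A tiny sketch may help show why this simplifies clustering. Assume, purely for illustration, an EmployeeService class and a made-up directory record (the names and the placeholder identifier are mine, not from any real system). Because every request carries the key it needs, any node can answer any call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    ssn: str
    name: str
    address: str


class EmployeeService:
    """Hypothetical sessionless service: no per-client state is kept between calls."""

    def __init__(self, directory):
        self._directory = directory  # shared reference data, not session state

    def get_name(self, ssn):
        return self._directory[ssn].name

    def get_address(self, ssn):
        return self._directory[ssn].address


# Made-up record; the identifier is a placeholder, not a real SSN.
DIRECTORY = {"000-00-0001": Employee("000-00-0001", "Ada Example", "1 Sample Street")}

# Two interchangeable nodes of a cluster. The client holds the key between calls.
node_a, node_b = EmployeeService(DIRECTORY), EmployeeService(DIRECTORY)
ssn = "000-00-0001"
name = node_a.get_name(ssn)
address = node_b.get_address(ssn)
```

The second request lands on a different node than the first and still succeeds, because nothing about the first call had to be remembered by the server.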
Still, while the preferred tenet is to avoid session keys, on occasion it becomes impossible for the client to keep the state, forcing the server to assume this responsibility. In this case, the approach is to use a uniquely generated “session-id” whereby the server “remembers” the employee information (the state). You will have to ensure the session key and associated state data are accessible to all servers in a loosely-coupled cluster, making your system design more complicated.
For an example of keeping a session-based state, consider an air booking process where the client is reserving a pair of seats. The server will temporarily decrease the inventory for the flight. For the duration of the transaction the server will give a unique “reservation id” to the client so that any ongoing requests from the client can be associated with the holding of these seats. Clearly, such a process will need to include timeout logic to eventually release the two seats in the event the final booking does not take place within a predetermined amount of time.
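The seat-holding flow can be sketched along these lines. Treat it as a single-server illustration under my own assumptions: the SeatInventory class, its method names, and the ten-minute default are invented for the example, and the clock is injectable only so that the timeout can be exercised without actually waiting.

```python
import itertools
import time


class SeatInventory:
    """Hypothetical server-side state: seats on hold, keyed by reservation id."""

    def __init__(self, seats, hold_seconds=600.0, clock=time.monotonic):
        self.available = seats
        self._hold_seconds = hold_seconds
        self._clock = clock
        self._holds = {}  # reservation id -> (seat count, expiry time)
        self._ids = itertools.count(1)

    def _release_expired(self):
        # Timeout logic: return seats from holds that were never confirmed.
        now = self._clock()
        for rid, (count, expires) in list(self._holds.items()):
            if expires <= now:
                self.available += count
                del self._holds[rid]

    def hold(self, count):
        """Temporarily decrease inventory and hand back a reservation id."""
        self._release_expired()
        if count > self.available:
            raise ValueError("not enough seats")
        self.available -= count
        rid = f"R{next(self._ids)}"
        self._holds[rid] = (count, self._clock() + self._hold_seconds)
        return rid

    def confirm(self, rid):
        """Finalize a booking. Returns False if the hold is unknown or expired."""
        self._release_expired()
        return self._holds.pop(rid, None) is not None


now = [0.0]  # fake clock so the example does not have to sleep
inventory = SeatInventory(10, hold_seconds=60.0, clock=lambda: now[0])
first = inventory.hold(2)
booked = inventory.confirm(first)   # confirmed in time, seats stay sold
second = inventory.hold(2)
now[0] = 61.0                       # the client walks away past the timeout
late = inventory.confirm(second)    # too late, seats go back to inventory
```

Note that the server alone decides when a hold expires, which is exactly the control you would lose if the state lived somewhere in between.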
This discussion leads to another tenet: maintaining state, either in the client or in the server, along the lines mentioned is ultimately acceptable. Keeping the state inside the intermediate nodes? Not so much. Why? An intermediate component should not control the timing out of a resource that’s being held in the server. If it did, it would be disrupting the server’s ability to maintain integrity in its environment. Also, an intermediate component will not have full awareness of the business semantics of the service request/response. Relying on an intermediate component to preserve state is like expecting your mail carrier to remind you that your cable bill is due for payment on the 20th of each month. He might do it, yes, but the moment you forget to tip him during the holidays, he just might “forget”!
Ironically, many of today’s vendors offer solutions that encourage the processing of business logic in their intermediate infrastructure products, encouraging you to maintain state in these middleware components. They do so because enabling middleware is an area that does not require them to be aware of your applications, and thus is the easiest area for them to offer you a “value-add service” in a productized, commoditized fashion. You should resist the melodious chant of these mermaids and refrain from using their tempting extra services. If not, you may find yourself stuck with an inflexible design and with a dependency on a specific vendor architecture to boot.
My advice is to avoid these vendor-enabled approaches. There is much that can get complicated with the maintenance of state, especially when the business process requires transactional integrity, referential integrity, and security (and most business processes do). The moment you give up this tenet and maintain session state inside the SOA middleware, as opposed to the extreme ends represented by the Client and the Server, you will be ensuring years of added complexity in the evolution of your SOA system.

Friday, December 4, 2009

Taming the SOA Complexities


Remember when I used to say, “Architect for Flexibility; Engineer for Performance”? Well, this is where we begin to worry about engineering for performance. This section, together with the following SOA Foundation section, represents the Level III architecture phase. Here we endeavor to solve the practical challenges associated with SOA architectures via the application of pragmatic development and engineering principles.


On the face of it, I wish SOA were as smooth as ice cream. However, I regret to inform you that it is anything but.  In truth, SOA is not a panacea, and its use requires a fair dose of adult supervision. SOA is about flexibility, but flexibility also opens up the different ways one can screw up (remember when you were in college and no longer had to follow a curfew?).  Best practices should be followed when designing a system around SOA, but there are also some principles that may be counter-intuitive to the “normal” way of doing architecture. So, let me wear the proverbial devil’s advocate hat and give you a list from “The Proverbial Almanac of SOA Grievances & Other Such Things Thusly Worrisome & Utterly Confounding”:
·         SOA is inherently complex. Flexibility has its price. By their nature, distributed environments have more “moving” pieces, thereby increasing their overall complexity.
·         SOA can be very fragile. SOA has more moving parts, leading to augmented component interdependencies.  A loosely coupled system has potentially more points of failure.
·         It’s intrinsically inefficient. In SOA, computer optimization is not the goal. The goal is to more closely mirror actual business processes. The pursuit of this worthy objective comes at the price of SOA having to “squander” computational resources. 
The way to deal with SOA’s intrinsic fragility and inefficiency is by increasing its robustness.  Unfortunately, increasing robustness entails inclusion of fault-tolerant designs that are inherently more complex.  Why? Robustness implies deployment of redundant elements. All this runs counter to platonic design principles, and it runs counter to the way the Level I architecture is usually defined. There’s a natural tension because high-level architectures tend to be highly optimized, generic, and abstract, referencing only the minimum detail necessary to make the system operate. That is, high level architectures are usually highly idealized—nothing wrong with it. Striving for an imperfect high level architecture is something only Homer Simpson would do. But perfection is not a reasonable design goal when it comes to practical SOA implementations.  In fact, perfection is not a reasonable design goal when it comes to anything.
Consider how Mother Nature operates.  Evolution’s undirected changes often result in non-optimal designs. Nature solves the problem by “favoring” a certain amount of redundancy to better respond to sudden changes and to better ensure the survival of the organism. “Perfect” designs are not very robust. A single layered roof, for example, will fail catastrophically if a single tile fails. A roof constructed with overlapping tiles can better withstand the failure of a single tile. 
A second reason SOA is more complex is explained by the “complexity rule” I covered earlier: the more simplicity you want to expose, the more complex the underlying system has to be. Primitive technology solutions tend to be difficult to use, even if they are easier to implement. The inherent complexity of the problem they try to solve is more exposed to the user. If you don’t believe me, consider the following instructions from an old Model T User Manual from Ford:
“How are Spark and Throttle Levers Used? Answer: under the steering wheel are two small levers. The right-hand (throttle) lever controls the amount of mixture (gasoline and air) which goes into the engine. When the engine is in operation, the farther this lever is moved downward toward the driver (referred to as “opening the throttle”) the faster the engine runs and the greater the power furnished. The left-hand lever controls the spark, which explodes the gas in the cylinders of the engine.”
Well, you get the idea. SOA is all about simplifying system user interactions and about mirroring business processes.  These goals force greater complexity upon SOA. There is no way around this law.
There are myriad considerations to take into account when designing a services-oriented system.  Based on my experience I have come up with a list covering some of the specific key techniques I have found effective in taming the inherent SOA complexities.  The techniques relate to the following areas that I will be covering next:
·         State-Keeping/State Avoidance. Figuring out under what circumstances state should be kept has a direct relevance in determining the ultimate flexibility of the system.
·         Mapping & Transformation. Even if the ideal is to deploy as homogeneous a system as possible, the reality is that we will eventually need to handle process and data transformations in order to couple diverse systems. This brings up the question of where it is best to perform such transformations.
·         Direct Access Data Exceptions. As you may recall from my earlier discussion on the Data Sentinel, ideally all data would be brokered by an insulating services layer. In practice, there are cases where data must be accessed directly. The question is how to handle these exceptions.
·         Handling Bulk Data. SOA is ideal for exchanging discrete data elements. The question is how to handle situations requiring the access, processing and delivery of large amounts of data.
·         Handling Transactional Services. Formalized transaction management imposes a number of requirements to ensure transactions have integrity and coherence. Matching a transaction-based environment to SOA is not obvious.
·         Caching. Yes, there’s a potential for SOA to exhibit slower performance than grandma driving her large 8-cylinder car on a Sunday afternoon. The answer to tame this particular demon is to apply caching extensively and judiciously.
All the above techniques relate to the actual operational effectiveness of SOA. Later on I will also cover the various considerations related to how to manage the SOA operations.
Let’s begin . . .