Friday, March 12, 2010

Security & Continuance


Security and Continuance aspects should be dealt with simultaneously. They represent the two sides of the same coin. As discussed earlier, the continuance cause is advanced by the design of fault-tolerant systems. If you don’t believe me, consider this example: Soon after the People’s Republic of China opened its economy and began the process of establishing a Chinese Stock Exchange, a group of advisors from a leading US computer company were asked to check out the country’s new electronic trading system developed to support that exchange. Upon inspection they found that the entire system was based on a single server computer with no fallbacks and no backups. When they were told by a very proud systems engineer that the system was able to process upwards of 300 transactions per second, the American team was flabbergasted. How were they able to process such throughput in what was, after all, no more than a single mid-size server? “Well, everything is being kept in memory,” was the response. “But . . . doesn’t that mean that if someone hacks the system or the system goes down for whatever reason, you are bound to lose all the stock exchange transactions held in memory?” the baffled Americans asked[1]. The Chinese programmer, who clearly at the time was not yet well versed in the principles of Capitalism, thought it over for a moment and then replied, “Well . . . Stock Market . . . very risky business!”
The fault tolerance issues of the trading system could be resolved by using redundant servers and by handling the transactions according to ACID rules, but then system security must also become an intrinsic element of the design. But how much security is appropriate? Instinctively, most security managers would love to encase the system in layer upon layer of firewalls and encryption, something I like to call the “Fort Knox in a Box” approach. If you were to carry out the most rigorous of these security recommendations, you would end up with a system that is not only expensive but also so heavy and burdensome that no one would be able to use it.
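To make the durability part of “ACID rules” concrete, here is a minimal sketch of how an in-memory system like the one in the story could keep its speed without losing acknowledged trades: every trade is appended to a journal file and forced to disk (fsync) before it is applied in memory, and on restart the journal is replayed. This is an illustration under stated assumptions (a single process, a local file journal, JSON records); the class name `JournaledLedger` and the file name are hypothetical, and the replication to redundant servers mentioned above is not shown.

```python
import json
import os
import tempfile


class JournaledLedger:
    """Hypothetical ledger: trades live in memory, but are journaled to disk first."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.trades = []
        self._recover()

    def _recover(self):
        # Rebuild in-memory state by replaying every complete journal record.
        if not os.path.exists(self.journal_path):
            return
        valid_end = 0
        with open(self.journal_path, "rb+") as f:
            for raw in f:
                # A record missing its trailing newline (or one that does not
                # parse) was torn by a crash mid-write and never acknowledged.
                if not raw.endswith(b"\n"):
                    break
                try:
                    record = json.loads(raw)
                except ValueError:
                    break
                self.trades.append(record)
                valid_end += len(raw)
            # Drop any torn tail so new records start on a clean line.
            f.truncate(valid_end)

    def commit(self, trade):
        # Write-ahead rule: the record reaches stable storage (fsync) before
        # memory is updated and before the trade is acknowledged.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(trade) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.trades.append(trade)
        return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "trades.journal")
    ledger = JournaledLedger(path)
    ledger.commit({"id": 1, "symbol": "ABC", "qty": 100, "price": 10.5})
    # Simulate a crash and restart: a fresh instance rebuilds from the journal.
    restarted = JournaledLedger(path)
    print(restarted.trades)
```

The fsync on every commit is what costs throughput; real trading systems typically batch several trades per fsync (group commit) to recover much of that speed while keeping the durability guarantee.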
There is always a tradeoff between security, business continuance, cost, and performance. What’s the right level? Therein lies the conundrum.
This might be considered controversial, but in my view, as long as a relaxed position does not compromise the core business the way the stock market application in the anecdote did, the right level can only be found by starting from that more relaxed position and calibrating the amount of security or continuance upward. In other words: start simple. Simpler security guidelines are more likely to be followed than complicated rules (in my experience, the strictest parents always wind up with the most rebellious kids!). However, this approach only works well when you have designed the system to be flexible, so that it can quickly accommodate new security layers, and when you can act proactively to preempt any security exposure.
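One way to get that flexibility is to keep each security concern in its own composable layer wrapped around the business logic, so that starting simple and hardening later means adding a layer rather than restructuring the system. The sketch below assumes a plain request/handler model; `build_pipeline`, `require_auth`, `audit_log`, and `core_service` are hypothetical names used only for illustration, not any particular framework’s API.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]
Layer = Callable[[Handler], Handler]


def build_pipeline(core: Handler, layers: List[Layer]) -> Handler:
    # Wrap the core handler so that layers[0] is the outermost (runs first).
    handler = core
    for layer in reversed(layers):
        handler = layer(handler)
    return handler


def require_auth(next_handler: Handler) -> Handler:
    # Simple starting layer: reject requests that carry no user identity.
    def handle(req: Request) -> str:
        if req.get("user") is None:
            return "401 Unauthorized"
        return next_handler(req)
    return handle


def audit_log(log: List[str]) -> Layer:
    # A layer added later: records who accessed what, without touching the core.
    def layer(next_handler: Handler) -> Handler:
        def handle(req: Request) -> str:
            log.append(f"{req.get('user')} -> {req.get('path')}")
            return next_handler(req)
        return handle
    return layer


def core_service(req: Request) -> str:
    # The business logic knows nothing about the security layers around it.
    return f"200 OK {req['path']}"


if __name__ == "__main__":
    # Start simple: authentication only.
    app = build_pipeline(core_service, [require_auth])
    print(app({"user": "alice", "path": "/quote"}))

    # Harden later: add auditing by extending the layer list.
    entries: List[str] = []
    app = build_pipeline(core_service, [require_auth, audit_log(entries)])
    print(app({"user": "bob", "path": "/trade"}), entries)
```

The design choice is that the core never changes when the security posture does; the list of layers is the only thing that grows.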
The stock market solution in my story was too flimsy from the get-go, but at least there was a chance to harden the system. In my experience, trying to loosen up a system that has been over-engineered from the start often results in a structurally weakened system.
When it comes to security and business continuance, one should apply reasonable criteria that can be measured against the actual likelihood and impact of exposure. Paranoia is a good attribute to have when it comes to designing security systems, but hysteria is not. I knew a security manager who wanted to encrypt all the messages flowing in the central server complex, even though this complex was decoupled from the outside world by a DMZ. The argument was that disgruntled employees would still be able to snoop on the unencrypted messages. Even assuming that those disgruntled employees had access to the central complex (not all employees did), the proposed security “solution” would have cost the company many millions of dollars in extra hardware to protect it against a possibility that was strictly speculative.
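One common way to make “likelihood and impact” measurable is annualized loss expectancy (ALE): the cost of a single incident multiplied by the expected number of incidents per year. A control is worth its price only if the annual loss it prevents exceeds its annual cost. The sketch below applies that heuristic; the function and class names are hypothetical, and every number in the usage example is invented for illustration, not taken from the story.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    single_loss: float   # estimated cost of one incident (SLE)
    annual_rate: float   # expected incidents per year (ARO)

    def annual_loss(self) -> float:
        # Annualized loss expectancy: ALE = SLE * ARO
        return self.single_loss * self.annual_rate


def control_is_justified(threat: Threat, rate_after_control: float,
                         annual_control_cost: float) -> bool:
    # The control pays for itself only if the expected loss it removes
    # each year is larger than what it costs to run each year.
    reduction = (threat.annual_rate - rate_after_control) * threat.single_loss
    return reduction > annual_control_cost


if __name__ == "__main__":
    # Hypothetical figures: insider snooping inside a DMZ-isolated complex.
    snooping = Threat("insider snooping", single_loss=500_000, annual_rate=0.01)
    print(snooping.annual_loss())
    print(control_is_justified(snooping, rate_after_control=0.002,
                               annual_control_cost=2_000_000))
```

With these made-up inputs the blanket encryption would prevent only a few thousand dollars of expected loss per year against millions in cost, which is the kind of comparison that separates paranoia from hysteria.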
I once witnessed a large web project that initially contemplated placing encryption on every web page behind a series of password layers, which made the overall system perform at a snail’s pace. An effort was made to remove many of the security layers and encryption in order to improve performance, but by then the system had been designed with such an inherently complex structure that it could not be improved enough. The entire effort had to be scrapped, and a less burdensome and more efficient system had to be created from scratch.
Naturally, the security manager’s argument for encrypting all internal messages would have made sense in the context of a specific, critical business system. After all, the degree of security should be commensurate with the consequences of a breach. If we are protecting a nuclear silo, massive security layers make sense; applying that level of security to protect your web server might be overkill. In the case of the central server complex, a strategy of not encrypting all the internal traffic was deemed an acceptable risk given the circumstances.
For instance, consider the need for compliance with, and certification against, industry standards such as the Payment Card Industry (PCI) security standard, which requires encryption of critical credit card information. Even if a literal reading of the standard might allow the transfer of plain credit card information within an internal, controlled environment, one can decide to encrypt this information anyway. However, an acceptable compromise is to encrypt only those fields related to PCI certification, not all the messages flowing in the core system.
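Field-level protection of this kind can be sketched as a function that transforms only the named card-data fields of a message and passes everything else through untouched. The field names in `PCI_FIELDS` are hypothetical, and the protection step is passed in as a function. To keep the example within the standard library, the demo plugs in a toy token vault; note that tokenization is a different technique from the encryption discussed above, and a real system would plug in a vetted encryption library or tokenization service instead.

```python
import secrets
from typing import Callable, Dict, Iterable

# Hypothetical names of the card-data fields in PCI scope; every other
# field in the message travels as-is.
PCI_FIELDS = ("card_number", "card_expiry")


def protect_fields(message: Dict[str, str], fields: Iterable[str],
                   protect: Callable[[str], str]) -> Dict[str, str]:
    # Return a copy in which only the listed fields are transformed.
    protected = dict(message)
    for name in fields:
        if name in protected:
            protected[name] = protect(protected[name])
    return protected


class TokenVault:
    """Toy tokenizer for the demo: swaps a value for a random token."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


if __name__ == "__main__":
    vault = TokenVault()
    order = {"order_id": "A-1001", "amount": "49.95",
             "card_number": "4111111111111111", "card_expiry": "12/27"}
    # Only the card fields change; order_id and amount stay readable.
    print(protect_fields(order, PCI_FIELDS, vault.tokenize))
```

Because only the in-scope fields are transformed, the cost of protection scales with the amount of card data rather than with the total message traffic.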
A security strategy whereby assets are safeguarded on a case-by-case basis according to their criticality is more appropriate than trying to encase the entire system to the standard demanded by its most critical element.


[1] If you remember the discussion of ACID attributes for transaction systems, this would be an example of a transaction environment lacking durability.