Thursday, November 21, 2013

Twitter and the oversimplification of discourse in Social Media

Mark Twain is often credited with beginning one of his letters with this apology: “I didn't have time to write a short letter, so I wrote a long one instead.”

Frankly, if a few years ago someone had questioned me about the viability of Twitter, my response would have been a dubious stare and a reference to Mr. Twain’s statement. I would not have believed it possible for people to dedicate the required time to craft a message of 140 characters or less. Had I been asked to invest in Twitter, I would have been just like that guy who refused to sign the Beatles.  

I have recently prototyped an application that scans tweets in real time, on any given topic, and automatically tries to evaluate each tweet’s sentiment and opinion. In the process of testing this prototype, I unwittingly became witness to the true nature of the so-called twitter-sphere. Never mind that, in parsing the various tweets, my software had to do acrobatics around heavily used emoticons (☹) and figure out all kinds of acronyms and abbreviations such as omg, lol, aatk, cuz, bff, or wtf. Never mind, too, that it had to perform some miraculous heuristic tricks to handle language-gone-wild situations. In the end, I came to the realization that there is not a lot of meaningful discourse taking place. I regret to say it, but the quality of most Twitter communications truly sucks. This is what I think of Twitter in 140 characters or less: A highly #obfuscatedstream of incoherent drivel, intermixed with lame trivialities, swamped by a morass of hash-tags and mangled hyperlinks.
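To give a flavor of that kind of clean-up, here is a minimal sketch in Python. It is not the prototype's actual code: the lookup tables, the `<happy>`/`<sad>` placeholder tokens, and the function name `normalize_tweet` are all assumptions made up for illustration, and a real system would need far larger dictionaries.

```python
import re

# Hypothetical lookup tables; real ones would be far larger.
ABBREVIATIONS = {
    "omg": "oh my god",
    "lol": "laughing out loud",
    "cuz": "because",
    "bff": "best friend forever",
}
EMOTICONS = {
    ":)": "<happy>",
    ":-)": "<happy>",
    ":(": "<sad>",
    ":-(": "<sad>",
}

URL_PATTERN = re.compile(r"https?://\S+")


def normalize_tweet(text):
    """Return a list of cleaned-up tokens for a raw tweet."""
    text = URL_PATTERN.sub("", text)  # drop hyperlinks entirely
    tokens = []
    for raw in text.split():
        # Emoticons are matched before any punctuation is stripped.
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])
            continue
        # Remove the hash-tag marker and surrounding punctuation, then lowercase.
        word = raw.lstrip("#").strip(".,!?;:\"'").lower()
        if not word:
            continue
        # Expand known abbreviations; unknown words pass through unchanged.
        tokens.extend(ABBREVIATIONS.get(word, word).split())
    return tokens
```

Calling `normalize_tweet("OMG this is great :) cuz #Python")` yields plain lowercase words with the abbreviations spelled out and the emoticon replaced by a placeholder token, which is a friendlier input for any downstream classifier.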

It almost felt wrong trying to apply sophisticated natural language methods such as chunking, naïve Bayes or Maximum Entropy classifiers, grammar trees, lexical corpora, dictionaries and other techniques in order to evaluate tweet sentiments. The experience was like using the Hubble telescope to help a paparazzo spy on a Kardashian. Then again, who am I to judge what’s considered vox populi or culturally topical? What is really ‘parse-worthy’? And really, why should I blame the messenger? Maybe the problem is that 140 characters is still too permissive a limit to prevent ‘dumbificated’ discourse. Perhaps the right direction is to go completely wordless. The more recent popularity of Instagram or the more ephemeral Snapchat (which, I hear, just turned down $3B from Facebook!) makes complete sense in this context.
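For readers curious about what a naïve Bayes classifier boils down to, here is a minimal sketch using only the Python standard library. It assumes a multinomial model with add-one smoothing and whitespace tokenization; the class name and the four training tweets are made up for illustration and are not taken from the prototype.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of texts
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        total_texts = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, text_count in self.label_counts.items():
            # Score = log P(label) + sum of log P(word | label).
            score = math.log(text_count / total_texts)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training tweets, purely for illustration.
TRAINING = [
    ("love this phone", "positive"),
    ("great game tonight", "positive"),
    ("hate this traffic", "negative"),
    ("worst service ever", "negative"),
]

classifier = NaiveBayesSentiment()
classifier.train(TRAINING)
```

With a training set this tiny the classifier is a toy: `classifier.classify("love this game")` simply picks whichever label has seen more of those words. The same structure scales to real corpora, where the training data does most of the work.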

This gives me a unique idea sure to make me billions of dollars (pinky-finger in the corner of my mouth as I slowly enunciate the word beee-llion.) Why not a social-media site that allows users to enter a maximum of one word? ‘Supercalifragilisticexpialidocious’ would be acceptable, and even exclamation marks would be okay, but the space character would be forbidden. (Obviously, German users concatenating their words could easily violate the spirit of the site, if not the letter; so German would not be available, or I could restrict the word length to, say, 140 characters. That way, the German word for speed limit, ‘Geschwindigkeitsbegrenzung’, would be perfectly acceptable.)
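The posting rule is simple enough to sketch in a few lines of Python. The function name `is_valid_post` is hypothetical, and the sketch assumes that all whitespace (tabs and newlines included) is forbidden, not just the space character, and that the 140-character cap is in force.

```python
MAX_LENGTH = 140  # the length cap floated above


def is_valid_post(post):
    """Accept one non-empty 'word': no whitespace, at most MAX_LENGTH characters."""
    if not post or len(post) > MAX_LENGTH:
        return False
    # Assumption: any whitespace character counts as a forbidden "space".
    return not any(ch.isspace() for ch in post)
```

Under these assumptions, `is_valid_post("Geschwindigkeitsbegrenzung!")` passes, while `is_valid_post("two words")` is rejected.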

What name should I give it? I thought of,, and so on with no luck. All these domain names have already been taken, presumably by people with a similar idea. I went the cute route, and searched for, and other y-ending derivatives. All taken. Finally I struck gold with, which, to my surprise, was available. There you have it.  I’m on the road to my next start-up: A social site that will allow you to express yourself in only one word! Coming to you soon!  Or rather Coming!  In the meantime, Facebook, feel free to contact me at your earliest convenience.  Please!

Friday, November 8, 2013

Which language to use? A Brief History of Programming Languages

Which computer language to use for your IT transformation, and why, are questions that require a comprehensive understanding of your project, the availability of programmers knowledgeable in that language, and access to supporting frameworks and libraries.
On Monday, November 12, 1945 at 12:45 pm, John von Neumann and five other scientists met at the RCA research center in Princeton and essentially invented the architecture of all modern computers[1]. Amongst the key assumptions in this so-called “von Neumann architecture” was the idea that computer programming could be done with software as opposed to flipping switches and wiring. Thus was born the concept of computer language and of software as the soul of the computer.
Since that time, computer languages have progressed from primitive machine languages, to assembler, and to more advanced symbolic languages. Fortran became the power-language of the scientific community, while business folks tended to prefer Cobol, whose design drew heavily on the work of Grace Hopper (don’t let anyone tell you that women had no part in the advancement of computer science!).
Attempted successors to these two languages were not as successful. IBM tried to merge “the best of” Fortran and Cobol into something called PL/I, which ended up with the worst features of both languages. Not to be outdone, the US Department of Defense sponsored an over-specified language called Ada (in honor of yet another female computer pioneer, Ada Lovelace), which suffered a fate similar to that of the equally unsuccessful F-35 jet fighter. (In all fairness, the DOD was also fundamental to the creation of the Arpanet.) More innovative languages from academia failed to get traction due to their syntactic obscurity or processing demands (APL, Lisp, Modula, Prolog, Smalltalk, etc.), but some did manage to transition to commercial areas (e.g. Basic and Pascal).
Still, the one truly ground-breaking language innovation came from Dennis Ritchie, a hippie-looking researcher at AT&T's Bell Labs, who created a streamlined language called C. C was quickly followed by an avalanche of computer languages, mostly Object Oriented, combined with a veritable array of scripting languages such as PHP, Perl, and JavaScript.
So, returning to the original question. . .  Which language should you use?  While a reason for so many choices has been the continuous search for a chimeric language that is ‘just like English’, the fact is, languages have been, and continue to be defined by the need to precisely specify the desired outcome.  Most computer languages today can be classified as follows:
  •  Imperative/Procedural. This is basically how most traditional languages work (Assembler, Basic, C, Pascal, etc.). The programmer specifies every operation on a step-by-step basis.
  • Declarative/Functional. This type of language is supposed to work on the basis of the programmer indicating “what” is needed instead of “how” to get it done. Examples are SQL, the language used to extract data from relational databases, and Prolog, a language based on a logical inference engine.
  •  Object Oriented. This category includes languages implementing a programming paradigm based on the creation of classes and objects, with specific rules on polymorphic instantiation, inheritance and encapsulation of data. Object-Oriented languages such as C++, Java, and C# have been the preferred ones for the last decade.
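
The “what” versus “how” distinction is easiest to see side by side. The sketch below answers the same question twice, once with an explicit Python loop and once with a SQL query run through the standard `sqlite3` module. The table layout, the function names, and the choice of question are assumptions made for illustration.

```python
import sqlite3

# Sample data: (name, year the language first appeared).
LANGUAGES = [("Fortran", 1957), ("Cobol", 1959), ("C", 1972), ("Java", 1995)]


# Imperative: spell out every step of the filtering and sorting.
def before_1970_imperative(rows):
    result = []
    for name, year in rows:
        if year < 1970:
            result.append(name)
    result.sort()
    return result


# Declarative: state the desired result in SQL and let the engine decide how.
def before_1970_declarative(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE languages (name TEXT, year INTEGER)")
        conn.executemany("INSERT INTO languages VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM languages WHERE year < 1970 ORDER BY name"
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()
```

Both functions return the same list for `LANGUAGES`; the difference is who decides the order of operations, the programmer or the database engine.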

There are other significant dimensions to the various languages: how strongly typed they are (strongly typed languages like Java and C# impose very restrictive rules on the use of variables) and whether the language is interpreted, compiled, or emulated. These choices have implications for how quickly your team can code and for the performance of the solution. Also consider how well the language is known (better availability of programmers) and how many support resources (libraries, discussion boards, etc.) exist for it.
Add to this the more recent popularity of “multi-faceted” languages such as Ruby and Python. Python, for example, can be used as an Object-Oriented language, as a procedural language, and even as a functional language, all within a single program! The focus of these newer languages is flexibility, but they also allow selective performance optimization through high-performance libraries (in Python you can call higher-performing modules written in C, for instance).
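As a small illustration of that flexibility, the sketch below solves one trivial task (adding up the lengths of a list of words) in three styles within a single Python file. The task and all the names are made up for the example.

```python
from functools import reduce


# Procedural style: an explicit loop and a running total.
def total_length_procedural(words):
    total = 0
    for word in words:
        total += len(word)
    return total


# Functional style: compose map and reduce, with no mutable state.
def total_length_functional(words):
    return reduce(lambda acc, n: acc + n, map(len, words), 0)


# Object-oriented style: data and behavior bundled in a class.
class WordList:
    def __init__(self, words):
        self._words = list(words)

    def total_length(self):
        return sum(len(word) for word in self._words)


words = ["omg", "lol", "wtf"]
```

All three approaches give the same total for `words`; which one reads best is largely a matter of team convention.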
Coming from large development shops, I still believe that Java or C# should be the preferred languages in the core development of large projects. In these, you need strongly, statically typed languages with a focus on object orientation to ensure compliance with standards and inter-operability. Additionally, both languages are supported by large corporations (Oracle for Java, Microsoft for C#), and they are also accompanied by very mature support frameworks and libraries. Needless to say, the pool of available programmers with knowledge of either of these two almost-identical languages is extensive.
Still, if you’re doing start-up work and have a small team with the mission to deliver quick prototypes under a very dynamic Agile mode, you would do well to use either Python or Ruby. Both languages are supported by a vibrant open source community. Ruby comes with a well-established web framework (Rails; hence Ruby on Rails), and Python frameworks include Django, Bottle, and Flask amongst others. The variety of web frameworks for Python can actually be a problem, so Python may be better applied to programming that relates to pattern matching and language processing (many of today’s social media mining tools are based on Python).
Obviously, you will still need to rely on a web based scripting language. JavaScript is my favorite. I am not a fan of PHP’s syntax (write once, read-never).
For large projects you should allow some heterogeneous language development for specific subject matter areas. Yes, the bulk of the coding can be done in Java or C#, but certain complex algorithms could be better implemented using “language X”. You need to be judicious in the handling of these exceptions, however, or you could run the risk of ending up with an unsupportable zoo of programs written in too many languages and frameworks!

[1] “Turing’s Cathedral” by George Dyson. Pantheon Books, New York.

Friday, September 20, 2013

The Green Field Illusion

While I opined elsewhere in this blog that brand new projects (aka ‘Green Field Projects’ or green-field for short) can also be viewed as transformation initiatives (after all, you are transforming from a “manual” to an automated environment), there are, in fact, specific considerations that make green-field projects somewhat different, especially as these projects relate to start-ups and new business launches.

First of all, there is the view that green-field projects are by their nature easy. After all, there is no need to cope with hideous legacies. Also, the ability to create something from scratch in our image and likeness is naturally viewed as a major plus. This view of green-field reminds me of Julie Andrews spinning and dancing around the beautiful green hills of Austria to the melodious background of the Sound of Music!

The truth is that, usually driven by ambitious start-up deadlines, green-field projects require the express deployment of a variety of systems with under-developed resources and super-aggressive timelines. Having successfully gone through just such a scenario during my last professional stint, I can actually confirm that green-field projects much more closely resemble the green of a football field, with you quarterbacking a team that starts its first down from its own five-yard line and is expected to score a touchdown in less than a minute, through rushes and sacks, with no timeouts. Think of Elway's Drive against Cleveland to get an idea.

Start-up driven green-field projects demand different kinds of focus and priorities from typical legacy transformation projects.

These are some of the major differences:

·         In transformation initiatives you are more likely to sacrifice schedules to meet minimum functionality requirements.  You will almost never be allowed to replace a legacy system with lesser functionality. 
·         In a green-field project you have to focus on delivering the minimum functionality necessary to launch or bootstrap the initial systems. The success of the start-up depends on the green-field results on a mission-critical basis, and there is simply no legacy to fall back on.
·         Expectations for a transformation delivery are that it will be of high quality. There is little tolerance for any new systems breaking down as they replace old, tested solutions.
·         In a green-field delivery, you should assume failures are more likely to occur and, most likely, the opening of the business won’t wait for you to do a lot of testing. It’s reality. You should put a higher focus on remediation processes and tools when launching a green-field deliverable. Also, you will have to be a lot more agile in how you implement changes in green-field deployments than in a more structured legacy transformation initiative.
·         Because you are expected to meet crazy deadlines and diminishing budgets in a green-field project, innovation becomes even more critical. You’ll have to be very creative and quick on your feet in figuring out ways to leverage preexisting solutions in the market or come up with creative ‘hacks’ to get the delivery out the door. Still, you will also be obliged to maintain a minimum of best practice standards to ensure that post-launch clean-up is as painless as possible.
Having said all this, green-field projects can indeed be a lot of fun. You will be tested under fire, but once the project is completed you will be rightfully entitled to feel a measure of pride for having created that essential first generation of systems!