Thursday, November 21, 2013

Twitter and the oversimplification of discourse in Social Media

The apology “I didn't have time to write a short letter, so I wrote a long one instead” is famously attributed to Mark Twain, though the quip actually goes back to Blaise Pascal.

Frankly, if a few years ago someone had questioned me about the viability of Twitter, my response would have been a dubious stare and a reference to Mr. Twain’s statement. I would not have believed it possible for people to dedicate the time required to craft a message of 140 characters or fewer. Had I been asked to invest in Twitter, I would have been just like that guy who refused to sign the Beatles.

I recently prototyped an application that scans tweets in real time, on any given topic, and automatically tries to evaluate each tweet’s sentiment and opinion (check it out at …). In the process of testing this prototype, I unwittingly became a witness to the true nature of the so-called Twittersphere. Never mind that, while parsing the various tweets, my software had to do acrobatics around heavily used emoticons such as :-( , or figure out all kinds of acronyms and abbreviations such as omg, lol, aatk, cuz, bff, or wtf. Never mind that the software had to perform some miraculous heuristic tricks to untangle language-gone-wild situations. In the end, I came to the realization that there is not a lot of meaningful discourse taking place. I regret to say it, but the quality of most Twitter communication truly sucks. This is what I think of Twitter in fewer than 140 characters: A highly #obfuscatedstream of incoherent drivel, intermixed with lame trivialities, swamped by a morass of hashtags and mangled hyperlinks.
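The acronym-and-emoticon wrangling can be sketched in a few lines. The lookup tables and function below are purely illustrative, not the prototype’s actual code:

```python
# A minimal sketch of the tweet-normalization step described above.
# The slang table and emoticon tags are invented for illustration.
import re

SLANG = {
    "omg": "oh my god",
    "lol": "laughing out loud",
    "cuz": "because",
    "bff": "best friend forever",
}

EMOTICONS = {":)": "<happy>", ":-)": "<happy>", ":(": "<sad>", ":-(": "<sad>"}

def normalize(tweet):
    # Replace emoticons before tokenizing, since ":(" is not a word.
    for emo, tag in EMOTICONS.items():
        tweet = tweet.replace(emo, " " + tag + " ")
    tokens = re.findall(r"#?\w+|<\w+>", tweet.lower())
    # Expand known acronyms; leave hashtags and unknown tokens alone.
    return [SLANG.get(t, t) for t in tokens]

print(normalize("omg this is great :)"))
# → ['oh my god', 'this', 'is', 'great', '<happy>']
```

A real normalizer would need a far larger slang dictionary and some tolerance for creative spelling, but the shape of the step is the same.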

It almost felt wrong to apply sophisticated natural-language methods such as chunking, naïve Bayes or Maximum Entropy classifiers, grammar trees, lexical corpora, dictionaries, and other techniques in order to evaluate tweet sentiment. The experience was like using the Hubble telescope to help a paparazzo spy on a Kardashian. Then again, who am I to judge what counts as vox populi or culturally topical? What is really ‘parse-worthy’? And really, why should I blame the messenger? Maybe the problem is that 140 characters is still too permissive a limit to prevent ‘dumbificated’ discourse. Perhaps the right direction is to go completely wordless. The recent popularity of Instagram, or of the more ephemeral SnapChat (which, I hear, just turned down $3B from Facebook!), makes complete sense in this context.
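For the curious, here is a toy version of one of those techniques, a naïve Bayes classifier with add-one smoothing. The training data is made up for illustration; a real system trains on a large labeled corpus:

```python
# A toy naive Bayes sentiment classifier, in the spirit of the
# techniques mentioned above. The tiny training set is illustrative.
import math
from collections import Counter

def train(docs):
    """docs: list of (token_list, label) pairs. Returns a model tuple."""
    labels = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in labels}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return labels, word_counts, vocab

def classify(model, tokens):
    labels, word_counts, vocab = model
    total_docs = sum(labels.values())
    best, best_score = None, float("-inf")
    for label, count in labels.items():
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train([
    (["love", "great", "awesome"], "pos"),
    (["hate", "awful", "sucks"], "neg"),
])
print(classify(model, ["awesome"]))  # → pos
```

The log-space arithmetic and smoothing are what keep this numerically sane on real data; everything else is bookkeeping.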

This gives me a unique idea sure to make me billions of dollars (pinky finger in the corner of my mouth as I slowly enunciate the word beee-llion). Why not a social-media site that allows users to enter a maximum of one word? ‘Supercalifragilisticexpialidocious’ would be acceptable, and even exclamation marks would be okay, but the space character would be forbidden. (Obviously, German users concatenating their words could easily violate the spirit of the site, if not the letter; so German would not be available, or I could restrict the word length to, say, 140 characters. That way, the German word for speed limit, ‘Geschwindigkeitsbegrenzung’, would be perfectly acceptable.)

What name should I give it? I thought of …, …, and so on with no luck. All those domain names had already been taken, presumably by people with a similar idea. I went the cute route and searched for … and other y-ending derivatives. All taken. Finally I struck gold with …, which, to my surprise, was available. There you have it. I’m on the road to my next start-up: a social site that will allow you to express yourself in only one word! Coming to you soon! Or rather: Coming! In the meantime, Facebook, feel free to contact me at your earliest convenience. Please!

Friday, November 8, 2013

Which language to use? A Brief History of Programming Languages

Which computer language to use for your IT transformation, and why, are questions that require a comprehensive understanding of your project, of the availability of programmers knowledgeable in that language, and of the access you will have to supporting frameworks and libraries.
On Monday, November 12, 1945 at 12:45 pm, John von Neumann and five other scientists met at the RCA research center in Princeton and essentially invented the architecture of all modern computers[1]. Amongst the key assumptions in this so-called “von Neumann architecture” was the idea that computer programming could be done with software, as opposed to flipping switches and rewiring. Thus was born the concept of the computer language, and of software as the soul of the computer.
Since that time, computer languages have progressed from primitive machine languages to assembler, and on to more advanced symbolic languages. Fortran became the power-language of the scientific community, while business folks tended to prefer Cobol, whose design owed much to Grace Hopper (don’t let anyone tell you that women had no part in the advancement of computer science!).
Attempted successors to these two languages were not as successful. IBM tried to merge “the best of” Fortran and Cobol into something called PL/I, which ended up with the worst features of both languages. Not to be outdone, the US Department of Defense sponsored the over-specified Ada language (named in honor of yet another female computer pioneer, Ada Lovelace), which suffered a fate similar to that of the equally unsuccessful F-35 jet fighter. (In all fairness, the DOD was also fundamental to the creation of the Arpanet.) More innovative languages from academia failed to gain traction due to their syntactic obscurity or processing demands (APL, Lisp, Modula, Prolog, Smalltalk, etc.), though some did manage to transition to commercial use (e.g., Basic and Pascal).
Still, the one truly ground-breaking language innovation came from Dennis Ritchie, a hippie-looking AT&T Bell Labs researcher, who created a streamlined language called C. C was quickly followed by an avalanche of computer languages, mostly object-oriented ones, along with a veritable array of scripting languages such as PHP, Perl, and JavaScript.
So, returning to the original question. . . Which language should you use? While one reason for so many choices has been the continuous search for a chimeric language that is ‘just like English’, the fact is, languages have been, and continue to be, defined by the need to precisely specify the desired outcome. Most computer languages today can be classified as follows:
  • Imperative/Procedural. This is basically how most traditional languages work (Assembler, Basic, C, Pascal, etc.). The programmer sets out every operation on a step-by-step basis.
  • Declarative/Functional. This type of language is supposed to work on the basis of the programmer indicating “what” is needed instead of “how” to get it done. Examples are SQL, used to extract data from relational databases, and Prolog, a language based on a logical inference engine.
  • Object-Oriented. This category includes languages implementing a programming paradigm based on the creation of classes and objects, with specific rules on polymorphic instantiation, inheritance, and encapsulation of data. Object-oriented languages such as C++, Java, and C# have been the preferred choices for the last decade.
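To make the imperative/declarative contrast above concrete, here is the same question answered both ways in Python, using the standard library’s sqlite3 module (the table and column names are invented for the example):

```python
# The same question ("total salary in department 10") answered
# imperatively and declaratively. Schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept INTEGER, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("ann", 10, 100.0), ("bob", 20, 90.0), ("eve", 10, 50.0)])

# Imperative: spell out *how* -- iterate and accumulate step by step.
total = 0.0
for name, dept, salary in conn.execute("SELECT * FROM emp"):
    if dept == 10:
        total += salary

# Declarative: state *what* is wanted; the engine decides how to get it.
(sql_total,) = conn.execute(
    "SELECT SUM(salary) FROM emp WHERE dept = 10").fetchone()

print(total, sql_total)  # → 150.0 150.0
```

The imperative version dictates the traversal; the SQL version leaves the query planner free to use whatever access path it likes, which is exactly the “what, not how” point.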

There are other significant dimensions along which languages differ: how strongly typed they are (strongly typed languages like Java and C# impose very restrictive rules on the use of variables), and whether the language is interpreted, compiled, or emulated. These choices have implications for how quickly your team can code, for the performance of the solution, for how well the language is known (and thus the availability of programmers), and for how many support resources (libraries, discussion boards, etc.) exist for that language.
Add to this the more recent popularity of multi-paradigm languages such as Ruby and Python. Python, for example, can be used as an object-oriented language, as a procedural language, and even as a functional language, all within a single program! The focus of these newer languages is flexibility, but they also allow the selective use of high-performance libraries (in Python you can call out to faster C-language modules, for instance) for targeted performance optimization.
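A quick sketch of that multi-paradigm point: the same computation written three ways in one Python program:

```python
# One computation (sum of squares), three styles in a single file.
from functools import reduce

data = [3, 1, 4, 1, 5]

# Procedural: explicit steps and mutable state.
total = 0
for x in data:
    total += x * x

# Functional: a fold over pure functions, no mutation.
total_fn = reduce(lambda acc, x: acc + x * x, data, 0)

# Object-oriented: state and behavior bundled in a class.
class SquareSummer:
    def __init__(self, numbers):
        self.numbers = numbers
    def total(self):
        return sum(x * x for x in self.numbers)

total_oo = SquareSummer(data).total()
print(total, total_fn, total_oo)  # → 52 52 52
```

Which style wins depends on the problem; the point is that nothing in the language forces the choice up front.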
Coming from large development shops, I still believe that Java or C# should be the preferred languages for the core development of large projects. There, you need strongly, statically typed languages with a focus on object orientation to ensure compliance with standards and interoperability. Additionally, both languages are backed by large corporations (Oracle for Java, Microsoft for C#), and both are accompanied by very mature support frameworks and libraries. Needless to say, the pool of available programmers with knowledge of either of these two almost-identical languages is extensive.
Still, if you’re doing start-up work and have a small team with the mission to deliver quick prototypes in a very dynamic Agile mode, you would do well to use either Python or Ruby. Both languages are supported by vibrant open-source communities. Ruby comes with one well-established web framework (Rails; hence Ruby on Rails), while Python frameworks include Django, Bottle, and Flask, amongst others. The variety of web frameworks for Python can actually be a problem; so Python may be better applied to programming that relates to pattern matching and language processing (many of today’s social-media mining tools are based on Python).
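For what it’s worth, all of those Python frameworks build on the same underlying WSGI interface (PEP 3333). This stdlib-only sketch shows the contract they share; the handler and its response text are made up for illustration:

```python
# A bare WSGI application exercised directly, the way framework test
# clients do. Frameworks like Flask and Bottle layer routing and
# templating on top of exactly this callable contract.
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    # Receive the request environment, announce status and headers,
    # return an iterable of bytes for the body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

environ = {}
setup_testing_defaults(environ)  # fill in a minimal fake request

captured = {}
def start_response(status, headers):
    captured["status"] = status

body = b"".join(app(environ, start_response))
print(captured["status"], body.decode())  # → 200 OK hello
```

To serve it for real, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` works; a framework simply replaces the hand-rolled handler with nicer machinery.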
Obviously, you will still need to rely on a web-based scripting language. JavaScript is my favorite. I am not a fan of PHP’s syntax (write once, read never).
For large projects you should allow some heterogeneous language development for specific subject-matter areas. Yes, the bulk of the coding can be done in Java or C#, but certain complex algorithms may be better implemented in “language X”. You need to be judicious in handling these exceptions, however, or you run the risk of ending up with an unsupportable zoo of programs written in too many languages and frameworks!

[1] “Turing’s Cathedral” by George Dyson. Pantheon Books, New York.