Friday, January 1, 2010

Data Matching and Integration Engines

Encapsulating data behind data services, via Data Sentinel, works well when the data is accessed intermittently and in discrete units. However, there are cases where the access pattern requires matching large numbers of records in one database against large data volumes in another. One example is a campaign management application that needs to combine the contents of a customer database with a promotion database defining discount rates based on the customer’s place of residence. Clearly, having this application call a data service for every customer record when performing promotional matches would be unsound and impractical from a performance perspective, since the number of service calls would grow with the number of customer records. The alternative, allowing applications to perform direct joins against the various databases, is not ideal either. This latter approach would undermine many of the objectives SOA sets out to achieve, because it forces applications to be directly aware of, and dependent on, specific data schemas and database technologies.
Yet another example is data extraction using a framework such as MapReduce, which requires orchestrating a large number of backend data clusters. This kind of complex orchestration over potentially large data sets cannot be left to the service requester; it is best provided by sophisticated front-end servers.
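To make the front-end orchestration idea concrete, here is a toy scatter-gather sketch in the MapReduce style. It is an illustration under stated assumptions, not a real MapReduce deployment: the in-memory `CLUSTERS` partitions and the functions `map_partition`, `reduce_counts` and `customers_per_city` are all hypothetical, and a local thread pool stands in for dispatching work to remote backend clusters. The point is that the requester makes one coarse call while the front end handles fan-out and result merging.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-ins for backend data clusters: each "cluster" holds one
# partition of customer records as (customer_id, city) pairs.
CLUSTERS: List[List[Tuple[int, str]]] = [
    [(1, "Austin"), (2, "Boston")],
    [(3, "Austin"), (4, "Denver")],
    [(5, "Boston"), (6, "Austin")],
]


def map_partition(partition: List[Tuple[int, str]]) -> Counter:
    """Map step, meant to run next to the data: count customers per city."""
    return Counter(city for _customer_id, city in partition)


def reduce_counts(partials: List[Counter]) -> Dict[str, int]:
    """Reduce step: merge the per-partition counts into a single result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


def customers_per_city() -> Dict[str, int]:
    """Coarse front-end operation: fan out to every cluster, then reduce.

    The requester makes one call and never sees how the data is partitioned."""
    # A thread pool stands in for remote dispatch to each backend cluster.
    with ThreadPoolExecutor(max_workers=len(CLUSTERS)) as pool:
        partials = list(pool.map(map_partition, CLUSTERS))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(customers_per_city())
```

In a real system the map step would execute on the clusters holding the data and only the small partial results would travel back to the front end, which is what keeps this orchestration out of the service requester's hands.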
Both examples show the need to make these bulk data matching processes part of the service fabric, exposed as coarse-grained data services. The solution, then, is to add an abstraction-layer service for this type of bulk data join. Applications can then trigger the process by calling this coarse-grained service. In practical terms, this means that when implementing an SOA system you should consider the design and deployment of the data matching and integration engines needed to implement these coarse-grained services efficiently and securely. In fact, you are likely to find off-the-shelf products that are, at heart, instances of Data Matching Engines: Campaign Management Engines, Business Intelligence systems, and Reporting Engines that serve users by generating multi-view reports.
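Using the campaign management example, the following is a minimal sketch of what such a coarse-grained matching service might look like. Everything here is hypothetical (the `PromotionMatchingEngine` class, its `match` method and the record types), and it assumes one discount rate per city. Internally it performs a simple in-memory hash join: the promotion table is indexed by city once, then the customer batch is streamed against that index, so the application makes a single call instead of one data-service call per customer and never touches either schema directly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    city: str  # place of residence


@dataclass(frozen=True)
class Promotion:
    city: str
    discount_rate: float  # e.g. 0.10 for a 10% discount


@dataclass(frozen=True)
class PromotionMatch:
    customer_id: int
    discount_rate: float


class PromotionMatchingEngine:
    """Hypothetical coarse-grained data service: one call matches a whole batch.

    Callers see only these record types, not the schemas or storage behind them."""

    def __init__(self, promotions: Iterable[Promotion]) -> None:
        # Build side of the hash join: index the promotion table by city once.
        # Assumes one rate per city; a later duplicate city would overwrite an earlier one.
        self._rate_by_city: Dict[str, float] = {
            p.city: p.discount_rate for p in promotions
        }

    def match(self, customers: Iterable[Customer]) -> List[PromotionMatch]:
        # Probe side: stream customers and look each city up in the index,
        # rather than issuing a separate service call per customer record.
        matches: List[PromotionMatch] = []
        for customer in customers:
            rate: Optional[float] = self._rate_by_city.get(customer.city)
            if rate is not None:
                matches.append(PromotionMatch(customer.customer_id, rate))
        return matches


if __name__ == "__main__":
    engine = PromotionMatchingEngine(
        [Promotion("Austin", 0.10), Promotion("Boston", 0.05)]
    )
    batch = [
        Customer(1, "Ann", "Austin"),
        Customer(2, "Bob", "Denver"),  # no promotion for Denver, so no match
        Customer(3, "Cy", "Boston"),
    ]
    for m in engine.match(batch):
        print(m)
```

A production engine would likely push the join down to the data stores or a dedicated integration product rather than hold a table in memory, but the contract stays the same: the application triggers one coarse operation and receives matched results.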
Now, using off-the-shelf solutions has tremendous benefits, but the use of external engines is likely to introduce varied data formats and protocols into the mix. Notwithstanding the ideal of having a canonical data format throughout, there will always be a need to perform data transformations. That’s the next topic.