Applying Object Technologies to ETL Solutions, part 2

This post continues a previous one, which you can review here. In that post I discussed the use of the so-called SOLID principles in ETL (extract-transform-load) solutions. This time, I will look at some GoF design patterns and the GRASP principles. In many ways those patterns and principles follow the same paths as the SOLID principles; the perspective is probably their main difference.

Let’s start with some of the GRASP principles:

Creator and Controller. In ETL solutions there are few opportunities to «create» objects in the object-oriented sense. Here, I’m referring to the answer to the question «who is responsible for instantiating/calling/creating this or that component?». As I said in the first part, I’m inclined to use a kind of Façade as a master or entry point for the more detailed/concrete components. This master/director/manager component will call/instantiate the more cohesive workers. It is common, though, that my set of ETL creators is the same as my set of controllers, because the ETL «creation» of a component occurs when you start using it (therefore, neither «high cohesion» nor the SRP is violated). The Controller coordinates the operations to be executed by the sub-components. It is responsible for cleaning or preparing the field, the logistic provision, and the pre- or post-tasks required by the whole process. The Controller’s domain is workflow management. It defines the order and relations of the strictly high-cohesion components, avoiding unnecessary dependencies and architectural «hard-coding». The Controller also represents the use case, the goal to be achieved, while the called sub-components implement the steps of that case.
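To make the Controller idea concrete, here is a minimal sketch in Python. It is only an illustration, not code from a real ETL tool: the names `EtlController`, `extract`, `transform` and `ListSink` are hypothetical, and the hard-coded rows stand in for a real source.

```python
from typing import Callable, Optional

Row = dict


def extract() -> list[Row]:
    # Hypothetical source: a real job would read a file or a table here.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]


def transform(rows: list[Row]) -> list[Row]:
    # Single-purpose worker: convert the amount field to a number.
    return [{**row, "amount": float(row["amount"])} for row in rows]


class ListSink:
    """Hypothetical target that simply keeps the loaded rows in memory."""

    def __init__(self) -> None:
        self.rows: list[Row] = []

    def load(self, rows: list[Row]) -> None:
        self.rows.extend(rows)


class EtlController:
    """Owns the workflow (order, pre- and post-tasks), not the work itself."""

    def __init__(
        self,
        extract: Callable[[], list[Row]],
        transform: Callable[[list[Row]], list[Row]],
        load: Callable[[list[Row]], None],
        before: Optional[list[Callable[[], None]]] = None,
        after: Optional[list[Callable[[], None]]] = None,
    ) -> None:
        self.extract = extract
        self.transform = transform
        self.load = load
        self.before = before or []
        self.after = after or []

    def run(self) -> int:
        for task in self.before:   # preparation of the field
            task()
        rows = self.transform(self.extract())
        self.load(rows)
        for task in self.after:    # cleaning once the job is done
            task()
        return len(rows)


sink = ListSink()
log: list[str] = []
controller = EtlController(
    extract,
    transform,
    sink.load,
    before=[lambda: log.append("prepare")],
    after=[lambda: log.append("cleanup")],
)
processed = controller.run()
```

The controller knows only the order of the steps; each worker can be replaced or reused without touching it.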

High cohesion and Low coupling. These are «evaluative» principles. What does that mean? These principles should be taken into account when two or more options are being considered. Which of them opens possibilities for extension without unnecessary dependencies, while respecting the OCP? Which of them focuses on one task only (SRP) and could be reused by other solutions (ISP)? Among novice programmers there is a common tendency towards «monolithic» structures in ETL solutions. These are packages/modules/units that do everything the job requires in a single place. These «solutions» are neither flexible nor reusable, but they «get the job done». Whenever similar functionality is needed, and part of it already lives somewhere in that code, the programmer just uses the «copy-paste» programming technique. I have seen this tendency in SQL procedures as well. This is the scenario: to avoid SQL code directly in your ETL mechanism, you are told to enclose all the code in a single SQL procedure, which is supposed to guarantee good performance, security and extensibility. Then, in those procedures, the novice creates tables and indexes, inserts data from some sources, updates the data from other sources, deletes unnecessary records based on hard-coded rules and, at the end, executes a simple SQL Select statement to retrieve the results. I cannot help quickly taking a seat (my sight blurs, and my breath starts to abandon me) when someone tells me that such an ETL solution takes hours to complete because it retrieves one or two million records. And I cannot help a satisfied (maybe a little cynical) laugh when an intense refactoring of «the solution» delivers the results in seconds.
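As a small illustration of the alternative to «copy-paste», the sketch below builds two cleaning pipelines from the same single-purpose steps. Everything in it (`drop_missing`, `cast`, `compose`, and the field names) is hypothetical and chosen only for the example.

```python
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


def drop_missing(field: str) -> Step:
    # One task only: discard rows where the field is absent or None.
    def step(rows: list[Row]) -> list[Row]:
        return [row for row in rows if row.get(field) is not None]
    return step


def cast(field: str, to_type: Callable) -> Step:
    # One task only: convert a field to the given type.
    def step(rows: list[Row]) -> list[Row]:
        return [{**row, field: to_type(row[field])} for row in rows]
    return step


def compose(*steps: Step) -> Step:
    # The order of the steps is decided here, not inside the steps.
    def run(rows: list[Row]) -> list[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run


# Two different jobs reuse the same steps instead of copying code.
clean_sales = compose(drop_missing("amount"), cast("amount", float))
clean_stock = compose(drop_missing("qty"), cast("qty", int))
```

Each step knows nothing about the others, so a new job is a new composition rather than a new copy of the monolith.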

All the other principles (indirection, pure fabrication, polymorphism, and protected variations) are related to the ones already listed and share their objectives: code reuse through configuration adjustments (dynamic binding of details at runtime), high performance, and great flexibility for extension. Remember, these principles are guidelines for your creations. They don’t dictate code into your ears. Only practice, deep thinking and continuous testing can produce an inner understanding of their implications.

GoF Design Patterns. Of the 23 well-known design patterns in the canonical Design Patterns book, I consider most of them to be of practical value for today’s applications and implementations. The rest (the Observer or the Iterator, for example) have been absorbed into or replaced by frameworks (event-driven applications) and other standard mechanisms. Frequent use of those patterns should be considered a sign of good object-oriented technique. I have implemented many of my favorite patterns in ETL solutions whenever the occasion allowed for some object-oriented programming. For example, it is not unusual to need a Strategy component that encapsulates different processing algorithms for the same, or structurally similar, data. Recently, in an ETL solution design, I proposed the use of Strategy classes, instantiated by a Factory method, to validate the accuracy of different record types and take action based on that evaluation. The solution should recognize the specific domain to be validated at run-time (using dynamically loaded parameters), instantiate the validation class (with a factory method and reflection mechanisms), split the results into «bad» and «good» records, and gather the «exceptions» metadata. This solution allows adding any new validation formula or type without changing the current code, simply by creating a new validation class. Again, this model leverages the IPO model, supporting high cohesion and low coupling.
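The design described above could look roughly like the following Python sketch. It is not the code of that project: the class names, the `domain` attribute and the validation rules are all invented for the example, and class introspection through `__subclasses__()` stands in for the reflection mechanism.

```python
from abc import ABC, abstractmethod


class Validator(ABC):
    """Strategy interface: one validation algorithm per record domain."""

    domain = ""  # hypothetical key that ties a class to a record type

    @abstractmethod
    def is_valid(self, record: dict) -> bool:
        ...


class CustomerValidator(Validator):
    domain = "customer"

    def is_valid(self, record: dict) -> bool:
        return "@" in record.get("email", "")


class OrderValidator(Validator):
    domain = "order"

    def is_valid(self, record: dict) -> bool:
        return record.get("quantity", 0) > 0


def make_validator(domain: str) -> Validator:
    # Factory method: finds the class by introspection, so a new
    # validator needs no change here (direct subclasses only).
    for cls in Validator.__subclasses__():
        if cls.domain == domain:
            return cls()
    raise ValueError(f"no validator registered for domain {domain!r}")


def split_records(domain: str, records: list[dict]):
    # The domain arrives at run-time, e.g. from a loaded parameter.
    validator = make_validator(domain)
    good, bad, exceptions = [], [], []
    for index, record in enumerate(records):
        if validator.is_valid(record):
            good.append(record)
        else:
            bad.append(record)
            exceptions.append(
                {"index": index, "domain": domain,
                 "validator": type(validator).__name__}
            )
    return good, bad, exceptions


good, bad, exceptions = split_records(
    "order", [{"quantity": 2}, {"quantity": 0}]
)
```

Supporting a new record type means writing one more `Validator` subclass with its own `domain`; neither the factory nor `split_records` changes.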

In other similar cases, even without a class implementation, you can use the «principles» behind the patterns as guidelines for other solutions. For example, I like the Prototype pattern as an alternative when you have to share the same resource, because it avoids locking and concurrency limitations. «Clone-then-use» is an excellent mechanism when you want to share a simple file (a Word document, an Excel sheet, or a plain flat file) but are concerned about locking or concurrency problems. Each user obtains their own copy of the item, uses it and then discards it.
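A minimal sketch of «clone-then-use» for a shared file, using only the Python standard library. The helper name `private_copy` and the file `rates.csv` are hypothetical; the point is only that each consumer works on a throw-away copy.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_copy(shared_path):
    """Yield a private clone of the shared file, discarded on exit."""
    shared = Path(shared_path)
    workdir = Path(tempfile.mkdtemp(prefix="clone_"))
    clone = workdir / shared.name
    try:
        shutil.copy2(shared, clone)   # clone ...
        yield clone                   # ... then use ...
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # ... then discard


# Usage with a temporary stand-in for the shared file.
shared_dir = Path(tempfile.mkdtemp(prefix="shared_"))
shared_file = shared_dir / "rates.csv"
shared_file.write_text("currency,rate\nEUR,1.0\n")

with private_copy(shared_file) as my_copy:
    content = my_copy.read_text()
    my_copy.write_text("local changes only")  # the original is untouched
    used_copy = my_copy
```

The shared file is touched only for the duration of the copy, so the consumers never compete for it while they work.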

Other similar examples could be given to show that, in programming, good principles can be translated to different paradigms with results as good as or better than expected. Those principles and patterns (SOLID, GoF patterns, and GRASP, among others) should be in the toolbox of any advanced programmer. «Programming by principles» is how I refer to this conception of the actual implementation: there is no line of code without a good principle that supports it.



