Decooda’s Liquid Data (Massive Parallelism with Event Based Actors)

A quick web search on concurrent programming with threads should make even the most hardened developers think carefully about their concurrent architectures. Take a look at a few titles that came up when I searched for “Developing multi-threaded applications”:

  • Concurrency Hazards: Solving Problems In Your Multithreaded Code. 
  • How to detect and debug multi-threading problems? 
  • Why lock may become a bottleneck of multithreaded program? 
  • How to you solve the problem of implicit locking and parallel execution? 
  • How to Avoid Deadlock in java and Deadlock java code. 
  • Thread Limitations in Java.

Once you dive a little deeper into threaded architectures, you quickly realize that they have their challenges. Programming massively parallel systems with threads may not yield all that a machine has to offer. When threads race or contend for locks, the performance of the application is substantially impaired. Often, to compensate, fewer threads are used and artificial delays are programmed in to throttle things down a bit. Developers need to be well equipped and take great care when designing threaded applications.

In the Very Large Database (VLDB) world, Massively Parallel Processing (MPP) systems like Netezza, Vertica, and Teradata have gone to great lengths to make sure that their databases adhere to a “shared-nothing architecture”. Sharing of disk, I/O bus, memory, and CPU is avoided to varying degrees. As a result, they can scale out across multiple machines and process large data sets at incredible speeds. You can imagine how carefully these systems must be designed, considering that a threaded model is still used to varying degrees.

Having worked with parallel processing architectures since the mid-90s, I saw this as an opportunity to apply what I have learned to a new breed of big data processing. Over the last year, we have been busy at Decooda solving our own real-time big data challenges and decided it was time to build our own MPP Grid platform. Many of the solutions on the market today are either too expensive or just not real-time enough to solve the problems we have in processing massive amounts of free text.

We opted to develop our MPP platform with an event-based actor model. Obvious choices are Erlang, Clojure, and Scala. We opted for Scala because its actor model closely resembles the Erlang model of concurrency and it runs beautifully on the JVM. Scala also provides a nice transition for Java developers and can use existing Java libraries.
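For readers new to the idea, here is a minimal sketch of what an actor looks like. It is not Decooda's code, and it uses plain Java from the standard library as a stand-in rather than Scala's actor library. The names `ActorSketch`, `WordCountActor`, `tell`, and `ask` are hypothetical and chosen only for illustration. The point is that the actor's state is private and touched only by its own mailbox thread, so many senders can post messages without any locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ActorSketch {

    /** Hypothetical actor that counts words in the lines of text sent to it. */
    static final class WordCountActor {
        // A single-threaded executor plays the role of the mailbox:
        // messages are handled one at a time, in arrival order.
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

        // Private state, only ever read or written on the mailbox thread.
        private final Map<String, Integer> counts = new HashMap<>();

        /** Fire-and-forget message: count the words in one line. */
        void tell(String line) {
            mailbox.execute(() -> {
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            });
        }

        /** Request-reply message: the reply is a copy made on the mailbox thread. */
        CompletableFuture<Map<String, Integer>> ask() {
            return CompletableFuture.supplyAsync(() -> new HashMap<>(counts), mailbox);
        }

        /** Stop accepting messages; already queued messages still run. */
        void stop() {
            mailbox.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        WordCountActor actor = new WordCountActor();

        // Four sender threads post messages concurrently. None of them
        // touches the actor's state directly, so no locking is required.
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    actor.tell("big data big text");
                }
            });
            senders[i].start();
        }
        for (Thread sender : senders) {
            sender.join();
        }

        // The request is queued behind every message sent above.
        Map<String, Integer> snapshot = actor.ask().get(5, TimeUnit.SECONDS);
        System.out.println("big=" + snapshot.get("big") + " data=" + snapshot.get("data"));
        actor.stop();
    }
}
```

Because every message goes through the mailbox, the shared-nothing idea from the database world carries over to a single process: each actor owns its data, and the only way to reach that data is to send a message.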

We designed the Decooda Liquid Data Platform so that “Plugins” are the highly reusable units of business logic that extend the platform’s functionality. Currently, the plugin API supports Scala and Java, and we are now modifying the API to support plugins written in Clojure, JavaScript, and Groovy.

As we brought our text analytics solution to market, many organizations expressed an interest in using Decooda’s Liquid Data Platform to support some of their internal big data challenges as well. Although we are not in that space, we have been thinking about the possibility of opening up this platform to the community. We do feel that what we have may be game-changing, and we are looking for a few companies with a serious big data problem that are willing to explore a paradigm shift. We hope that through this exercise the platform will become a viable big data alternative for the community.

Feel free to contact me if you are currently in your due diligence stage or have reached an upper limit with your current architecture.

All the best,
Charlie
