Question on Exadata, MPP and Memory based BI

By vallabh on June 21st, 2010 at 8:29 am

Hello Charlie,

Thanks for such a quick response. It is definitely very helpful information. I am trying to fit in Oracle Exadata when compared to the other tools. Could you also tell me whatis a better way to handle ad-hoc analytics. Use a MPP in place of existing database or instead use microstrtegy or spotfire with the web logic.

Thanks, Vallabh.

Oracle Exadata is Oracle RAC at it’s core. Which means it’s based on Oracles OLTP Engine. With that said, there are locking and memory sharing issues that RAC needs to deal with (not to mention the shared disk in which it stores it’s data.) These could be potential reasons to lean more towards a BI tool that supports the slicing and dicing of cubes.

If you find that your queries are just not performing because of any underlying technology bottlenecks, this approach will give you options as long as you can fit the cube refreshes within your maintenance window. The problem with utilizing cubes, is that they need to be pre-defined and you lose you ability to do complete ad-hoc analysis outside of the cube definition.

The MPP Databases I have worked with have all been “Shared Nothing” architectures, so there were no potential bottlenecks with the data warehouse technology. I would simply have my BI go straight against my transactional (relational) model. A recent query I have going against 9 Billion records with 2 columns as my predicate AND performs a sum, completes in about 45 seconds. With speeds like this, you typically do not generate cubes or aggregate tables unless you absolutely have to. So the BI really becomes and issue of function and presentation.

Occasionally, I will run into adhoc queries will just put an overwhelming burden on the system. Only then will I try and aggregate, or denormalize the data. If I can avoid it, I would rather not add additional operational aspects to the data warehouse. (IE: Scheduling cube gerenation, report publication, maintaining the OLAP servers, etc..)

I came up with an architecture about 2 years ago and I coined it Executive Warehousing. It was primarily based on technologies like QlikView or Spotfire and you can consider it an alternative to a full blown DWH environment. I called it Executive Warehousing, because the architecture and components are within budget of most executives without having to get approval for a full blown enterprise data warehouse and the cost and committees that they bring.

It begins by keeping a copy of your purified source data extracted as flat files sitting on inexpensive commodity disk. You would then create a process to subset the flat files and populate the Qlikview or Spotfire repositories. The IN-MEMORY model of QlikView or Spotfire would allow you to maintain the data at the transactional level so you would not be loosing adhoc capabiliites as you would when generating a cube and at the same time, your adhocs would be fast as it will allow the slicing and dicing to happen in “near” real-time.

The size of your data, the allowable operations window for refreshes and your budget are all the driving factors here.  I hope that I have given you some food for thought and that you find this infomation helpful. In summary, if your query response times are fast, the world is your oyster for BI. Use what is easy, cost effective, and has great presentation capabilities. If your data warehouse is sluggish, opt for an IN-MEMORY or CUBE based approach to supplement your warehouse.  I always try my solution going with straight SQL against an MPP database first, aggregation tables second, and cubes last. IN-MEMORY based BI tools are great and may be all you need?

As always, I would really love to hear YOUR experiences out there in the Large Scale DWH world.

Best Regards,

Charlie

blog comments powered by Disqus