Blue Chip Blog
Gartner DWH and my Mystical Quarter Circle Survey
By Vallabh on June 24th, 2010 at 6:10 am
Hello Charlie,
I found an interesting article in Gartner. Hope everybody will find it interesting too. This article compares all technologies with pros and cons we have been discussing: http://www.gartner.com/technology/media-products/reprints/microsoft/vol13/article5/article5.html
Thanks,
Vallabh
Thanks Vallabh,
It’s really hard to use something like a magic quadrant to pick a technology for your firm. For example, if you were putting in a retail POS system and wanted to have all transactions feed a real-time DWH, I “might” select Teradata. Apart from very specific applications where an enterprise strategy makes sense, Teradata would not be my first choice for many reasons.
Do you agree with Vertica and HP being closely grouped? How many implementations of Neoview are there? Would you consider Microsoft a real DWH player? I guess it’s all about your perception of what a data warehouse is.
You know, I would be really interested in seeing what YOU say.
Would you take the Mystical 1/4 of a Circle Survey?
The Survey is short and the intention is for you to answer the questions quickly and off the cuff so we can get a sense of your actual perception.
Adhoc, MPP, and IN-MEMORY BI
By vallabh on June 21st, 2010 at 8:29 am
Hello Charlie,
Thanks for such a quick response. It is definitely very helpful information. I am trying to see where Oracle Exadata fits when compared to the other tools. Could you also tell me what is a better way to handle ad-hoc analytics? Use an MPP in place of the existing database, or instead use MicroStrategy or Spotfire with the web logic?
Thanks, Vallabh.
Oracle Exadata is Oracle RAC at its core, which means it’s based on Oracle’s OLTP engine. With that said, there are locking and memory-sharing issues that RAC needs to deal with (not to mention the shared disk on which it stores its data). These could be potential reasons to lean more towards a BI tool that supports the slicing and dicing of cubes.
If you find that your queries are just not performing because of underlying technology bottlenecks, a cube-based approach will give you options as long as you can fit the cube refreshes within your maintenance window. The problem with utilizing cubes is that they need to be pre-defined, and you lose your ability to do complete ad-hoc analysis outside of the cube definition.
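To make that trade-off concrete, here is a minimal Python sketch. The sample rows, the column names, and the `build_cube` / `query_cube` helpers are all invented for illustration and are not taken from any BI product. It pre-aggregates by a fixed set of dimensions and shows that a question about a column left out of the cube definition simply cannot be answered from the cube.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (region, product, channel, amount)
COLUMNS = ("region", "product", "channel", "amount")
TRANSACTIONS = [
    ("East", "Widget", "Web", 100.0),
    ("East", "Gadget", "Store", 250.0),
    ("West", "Widget", "Store", 75.0),
    ("West", "Widget", "Web", 125.0),
]


def build_cube(rows, dimensions):
    """Pre-aggregate amount by the chosen dimensions (the cube definition)."""
    idx = [COLUMNS.index(d) for d in dimensions]
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[i] for i in idx)] += row[3]
    return dict(cube)


def query_cube(cube, dimensions, group_by):
    """Roll the cube up to one of its own dimensions; nothing else is available."""
    if group_by not in dimensions:
        raise KeyError(f"{group_by!r} is not part of the cube definition")
    pos = dimensions.index(group_by)
    result = defaultdict(float)
    for key, total in cube.items():
        result[key[pos]] += total
    return dict(result)


CUBE_DIMS = ("region", "product")  # decided ahead of time, at refresh
cube = build_cube(TRANSACTIONS, CUBE_DIMS)
by_region = query_cube(cube, CUBE_DIMS, "region")  # works: region is in the cube
# query_cube(cube, CUBE_DIMS, "channel") raises KeyError: channel was aggregated away
```

Answering the channel question means going back to the detail rows and rebuilding the cube, which is exactly the refresh work that has to fit in the maintenance window.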
The MPP databases I have worked with have all been “Shared Nothing” architectures, so there were no potential bottlenecks with the data warehouse technology. I would simply have my BI go straight against my transactional (relational) model. A recent query of mine runs against 9 billion records with 2 columns in the predicate AND performs a sum, and it completes in about 45 seconds. With speeds like this, you typically do not generate cubes or aggregate tables unless you absolutely have to. So the BI really becomes an issue of function and presentation.
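For readers who want to see the shape of that kind of query, here is a small sketch using Python’s built-in sqlite3 module. SQLite is only a stand-in so the example runs anywhere; it is not an MPP database and says nothing about performance. The `sales` table, its columns, and the `total_sales` helper are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2010-06-01", 10.0),
        (1, "2010-06-02", 20.0),
        (2, "2010-06-01", 5.0),
        (1, "2010-07-01", 40.0),
    ],
)


def total_sales(conn, store_id, start, end):
    """Two predicate columns plus a SUM, run directly against the detail rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE store_id = ? AND sale_date BETWEEN ? AND ?",
        (store_id, start, end),
    ).fetchone()
    return row[0]


june_total = total_sales(conn, 1, "2010-06-01", "2010-06-30")
```

There is no cube or aggregate table in between: the BI layer sends the SQL and the database scans the transactional rows.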
Occasionally, I will run into adhoc queries that just put an overwhelming burden on the system. Only then will I try to aggregate or denormalize the data. If I can avoid it, I would rather not add additional operational aspects to the data warehouse (i.e., scheduling cube generation, report publication, maintaining the OLAP servers, etc.).
I came up with an architecture about 2 years ago and I coined it Executive Warehousing. It was primarily based on technologies like QlikView or Spotfire and you can consider it an alternative to a full blown DWH environment. I called it Executive Warehousing, because the architecture and components are within budget of most executives without having to get approval for a full blown enterprise data warehouse and the cost and committees that they bring.
It begins by keeping a copy of your purified source data extracted as flat files sitting on inexpensive commodity disk. You would then create a process to subset the flat files and populate the QlikView or Spotfire repositories. The IN-MEMORY model of QlikView or Spotfire would allow you to maintain the data at the transactional level, so you would not be losing adhoc capabilities as you would when generating a cube. At the same time, your adhocs would be fast, as the slicing and dicing happens in “near” real-time.
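The flow described above can be sketched in plain Python. This is only an illustration of the idea, not how QlikView or Spotfire work internally: the flat-file contents, column names, and the `load_subset` / `aggregate` helpers are all made up, and an `io.StringIO` object stands in for a file on disk.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a flat-file extract on commodity disk (contents are invented).
FLAT_FILE = io.StringIO(
    "order_date,region,product,amount\n"
    "2010-05-30,East,Widget,40\n"
    "2010-06-01,East,Widget,100\n"
    "2010-06-02,West,Gadget,250\n"
    "2010-06-03,West,Widget,75\n"
)


def load_subset(flat_file, keep):
    """Read the extract and keep only the rows the dashboard needs."""
    return [row for row in csv.DictReader(flat_file) if keep(row)]


def aggregate(rows, *group_by):
    """Sum 'amount' by any combination of columns, chosen at query time."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in group_by)] += float(row["amount"])
    return dict(totals)


# Subset step: June 2010 only, held in memory at the transaction grain.
june_rows = load_subset(FLAT_FILE, lambda r: r["order_date"].startswith("2010-06"))

# Adhoc questions are answered on the fly; no grouping was fixed in advance.
by_region = aggregate(june_rows, "region")
by_region_product = aggregate(june_rows, "region", "product")
```

Because the rows stay at the transactional level, any new grouping is just another call to `aggregate`; the cost is that the subset has to fit in memory and be refreshed from the flat files on a schedule.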
The size of your data, the allowable operations window for refreshes, and your budget are all the driving factors here. I hope that I have given you some food for thought and that you find this information helpful. In summary, if your query response times are fast, the world is your oyster for BI. Use what is easy, cost effective, and has great presentation capabilities. If your data warehouse is sluggish, opt for an IN-MEMORY or CUBE based approach to supplement your warehouse. I always try straight SQL against an MPP database first, aggregation tables second, and cubes last. IN-MEMORY based BI tools are great and may be all you need.
As always, I would really love to hear YOUR experiences out there in the Large Scale DWH world.
Best Regards,
Charlie
Microstrategy, Spotfire and Tableau?
By vallabh on June 18th, 2010 at 1:04 pm
I am also looking for comparison of Microstrategy, Spotfire and Tableau…. Please let me know on what parameters can I compare the tools. I am looking for a technology that offers ad-hoc analytics.
Ok, so this has the potential of opening up quite a bit of dialog, so I am making it a standalone post. Please comment and contribute. I too would like to see what others are thinking in this space.
Your backend database and the size of your data are big considerations.
MicroStrategy has the ability to pull data into a repository for its multi-dimensional analysis OR perform pass-through SQL. They have accelerators for specific databases (it even has optimizations for Aster, Vertica, Greenplum, Netezza, Teradata, and so on). It is obviously very powerful, and the local cubes provide the slicing and dicing you would expect. They even have a free version that you can use. The reason why I ask “What is your backend?” is that if you are using an MPP technology like Netezza, Teradata, or Vertica, you may not need to pre-aggregate your data, and you therefore just need a good visualization dashboard on top of SQL queries.
Spotfire is somewhat of a different class and fits in a space with QlikView as IN-MEMORY analytics. I have not had hands-on time with Spotfire, but in the QlikView world, you extract your data into MEMORY-MAPPED files. QlikView has pretty amazing compression and an AWESOME set of charting objects. I have been able to create incredible BI dashboards in a few hours that were extremely compelling. You need to keep your memory-mapped files updated with scheduled extractions and publication, and QlikView provides the reporting and publishing servers to do so. With that said, I would imagine Spotfire to be very similar. Being owned by Tibco is not necessarily a bad thing either, but QlikView may be a bit more nimble. The cool thing is that the data is stored at the transactional level, so you can aggregate on the fly “IN RAM”. It’s 64-bit, can support large memory-mapped files, and is pretty intelligent in how it retrieves and buffers the data off of disk.
Tableau may be more in the space with Pentaho, LogiXML, Jaspersoft. Unfortunately, I have not used Tableau either but did work briefly with Pentaho and LogiXML. The thing I like about Tableau and tools like Qlikview is the interactive nature in which you can work with the data and build reports. Tableau can connect to just about anything from flat-files to data warehouses.
An area you may want to investigate is DUNDAS Dashboard. I was extremely impressed with the visualization and the speed with which I could create dashboards. The rendering is based on Silverlight in the browser and the objects looked awesome. The price is not bad either. http://www.dundas.com/Dashboard/Start/Samples/index.aspx
I happen to like QlikView and Dundas, but ALL of these products have a free trial and in most cases even a free limited-use version. I typically work with MPP databases, so I tend to avoid the need for CUBES and multi-dimensional analysis. I am fortunate that my options are usually wide open.
I hope that helps.
Site Status
Hi Folks,
It’s been a while since my last blog posting. I am making a new commitment to get some of my experiences with varying technologies in the MPP space posted. If you read my blogs, you can see that I have a passion for parallel processing and have had some pretty interesting opportunities to work with Netezza, Vertica, Teradata and Greenplum. I will be branching off into other areas of MPP technologies like Grid based processing, but for the most part will keep my posts primarily related to data warehousing and the effort to support my peers in branching out into the world of MPP, database appliances, and Column Store Technologies.
Please feel free to ask questions. I do not claim to know all the answers, but there are experts lurking about willing to share their insights.
So in my efforts to give back to the DWH community a little bit, I thought I would give you a one stop shop for relevant DWH News and comments. Curt Monash at DBMS2.com has a lot of this stuff nailed down pretty well. His analysis is great and his posts are frequent and timely. My slant however is from a slightly different perspective as I focus primarily on implementation, architecture, and designs. I am still very much in the battlefield architecting systems and deploying dashboards while using a variety of technologies.
JOB Board
The jobs posted on this site are relevant to technology. I know that “Technology” is a broad term, but I had to prime the pump with something. My hope is that over time, firms looking for experts in the various MPP technologies will post here. Take a look and let me know what you think. http://www.bcsolution.com/job-search/
VLDB Dashboard
I created a mashup for you to keep your finger on the pulse of the Very Large Database Market. Although it is not perfect, it gives a pretty good sense of interest across the competing technologies over time. Check out the VLDB Dashboard and see how your favorite technology vendor is doing.
Forums
I could never get those forums to take off, and I am considering taking them down. I am going to take another stab at rearranging some of the topics. Perhaps a slant on technical questions will drive some participation.
This is not a job offer, but rather an opportunity to share the Blue Chip soap box with other passionate technologists. If you have a passion for data warehousing, primarily in the MPP & appliance space, let me know. Perhaps you specialize in one technology and would like to host a weekly column on your topic. Let me know if there is any interest out there. I can be reached at info@bcsolution.com
Need a Job?
Hi Folks,
The Job search engine is now on-line. The jobs posted on this site are all related to technology. I know that “Technology” is a broad term, but I had to prime the pump with something. My hope is that over time, firms looking for experts in the various MPP technologies will post here. Take a look and let me know what you think. http://www.bcsolution.com/job-search/
Best Regards,
Charlie




