Archive for the ‘Visualization’ Category

Messy data is fun.

Whether you are an ETL developer, a business analyst or a data scientist, we all spend way too much time cleaning up and conforming our data sets. We are bent on cleaning our data to extract precious and valuable information. We spend long hours preparing our datasets, and when we are finally ready to pursue our analysis, a new set of data arrives. Then we need to start all over again! If we want the solution to be sustainable, we will eventually ask a developer to build a job. However, even with tools like DataStage or Informatica, it’s not always easy to clean up datasets.

These “anomalies” that we seek to correct or structure have multiple origins: unstructured information feeds, human data-entry errors, faulty sensor or automated readings, and inconsistencies between IT processes.

New tools are now appearing on the market to help us quickly structure messy data. The beauty of it is that you don’t need to write a single line of code. Trifacta, for example, uses what they call “predictive interaction.” The algorithm analyzes your interactions with the source file and automatically suggests a list of possible transformations, ordered by their probability of relevance. For example, in an unstructured file, after you select a few email addresses, the program will propose a rule to extract email addresses and display the distribution of data to which the rule applies. The application also facilitates the detection of anomalies, data aggregation based on advanced rules and the resolution of erroneous encoding problems on large sets, all without writing scripts or complex SQL.
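To give a rough sense of what such a suggested rule boils down to, here is a minimal Python sketch (my own illustration, not Trifacta’s engine; the file name and the regex are assumptions) that extracts email addresses from an unstructured file and reports the share of lines the rule applies to:

    import re

    # A simple, deliberately loose email pattern (good enough for an illustration).
    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def extract_emails(lines):
        """Return the email addresses found on each line."""
        return [EMAIL_RE.findall(line) for line in lines]

    # "messy_feed.txt" is a hypothetical input file used only for this sketch.
    with open("messy_feed.txt", encoding="utf-8") as f:
        lines = f.readlines()

    matches = extract_emails(lines)
    hit_rate = sum(1 for m in matches if m) / max(len(lines), 1)
    print("Rule applies to {:.0%} of {} lines".format(hit_rate, len(lines)))

The point of tools like Trifacta is precisely that the business user never has to write or maintain this kind of script.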

A product a little less mature but also very interesting comes from Mountain View. “Google Refine” works on similar principles as Trifacta. Google Refine can quickly prepare, without coding, a dataset for a visualization tool or analysis. In addition, it is free. Videos are also available to quickly learn the tool.

The main advantage of these tools is that they quickly turn a data set into actionable information and maximize the time left for analysis and decision making.

On the other hand, tools such as WhereScape Red can play a similar role to rapidly clean and merge multiple data sources and automate loading into a database. WhereScape Red can also generate SQL code that could serve as a basis for development with a traditional ETL tool or, depending on the size of the solution and the company, eventually become the production system itself!
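For illustration, here is a minimal pandas sketch of the kind of clean-merge-load job such a tool automates (the file names, column names and SQLite staging database are assumptions for this sketch, not anything WhereScape actually generates):

    import sqlite3
    import pandas as pd

    # Hypothetical source extracts; the column names are assumptions.
    customers = pd.read_csv("crm_export.csv")
    orders = pd.read_excel("orders.xlsx")

    # Basic clean-up: trim whitespace, normalize case, drop duplicate customers.
    customers["email"] = customers["email"].str.strip().str.lower()
    customers = customers.drop_duplicates(subset="customer_id")

    # Merge the two sources and load the result into a staging table.
    merged = orders.merge(customers, on="customer_id", how="left")
    with sqlite3.connect("staging.db") as conn:
        merged.to_sql("stg_orders", conn, if_exists="replace", index=False)

Writing and maintaining dozens of jobs like this by hand is exactly the effort these products aim to remove.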

A range of products is now available to provide analysts with datasets that can be rapidly exploited by visualization products (such as Tableau Software, SAP Lumira, QlikView or Yellowfin), by tools for predictive or statistical analysis (R, Tibco Spotfire, SAS, SPSS, etc.) or by the good old Excel spreadsheet.

These applications also influence the way we approach the preliminary phases of a BI project, since it is now easier to test our assumptions about data quality and the dispersion of the data by performing fast and accurate proofs of concept. Our users can have data available to validate their needs before development of the solution starts. It’s another set of tools we can add to our agile project framework to be even more efficient.

In short, this new product line is expanding and certainly deserves the attention of all those for whom data is their bread and butter.

Raphael Colsenet – November 2014


Discovery tools: SAP Lumira or Tableau Software?

Which one should I pick? This is a question I had to face a couple of weeks ago. So I played with each tool for a few days to come up with a decent answer.

But first, what is a discovery or visualization tool? The recent emergence of this new generation of BI tools is supposed to help our business users better understand the mass of data available in the company. How is this supposed to happen? By creating stunning visualizations with an intuitive drag-and-drop interface. The premise of these tools is to deliver sexy environments in order to perform faster and more flexible self-service analysis. Users are able to quickly and easily merge multiple data sources to produce meaningful information about their business. They no longer have to wait for the IT department to create heavy, expensive and painful solutions. Every sales rep will tell you that their “discovery tools” are specially built for these kinds of needs.

These “magical tools” raise a new set of questions about the accuracy of the data. Without a common semantic layer, how can you guarantee that the accounting and marketing departments will come up with the same numbers? I think these tools should be used in conjunction with a unified semantic layer, but that is not the purpose of this post.

Let’s get back on track. I compared two of the tools available on the market, but many others are popping up everywhere these days; SAS Visual Analytics, QlikView and Tibco Spotfire are only a few among many.

I used trial versions of Tableau (8.0.1) and Lumira (1.0.11) for my evaluation. I thought that sharing my impressions could be interesting for those who are looking for a similar tool. Obviously, depending on your business context, tools other than Tableau and Lumira may be a better fit. In my case, these two were the most appropriate.

Let’s start with Tableau. Tableau is a powerful tool for exploring data. Once the main functions are mastered, it is very easy to create meaningful visualizations. Tableau is the more mature of the two tools, with many more visualization types and functions available.

Some great functions in Tableau:

  • Ability to use custom maps
  • A single GPS point can display multiple dimensions (gender, age, marital status, etc.)
  • Very easy to customize the layout and information displayed
  • The visualizations can be integrated into a dashboard and linked to each other.

A training course is strongly recommended to accelerate adoption of this tool. The company offers training on site and online, and also organizes sessions in major North American cities (http://www.tableausoftware.com/learn/training?qt-training_tabs=3#qt-training_tabs). The user community is also very active, and it is relatively easy to find useful information on the web. Tableau also offers short videos on the web to help users with the tool. The videos are very well done and fun to watch.

One big downside of Tableau in my evaluation context is the absence of a connector for SAP BusinessObjects 3.x or 4.x (BO). According to a Tableau account manager, a beta version will be available to a few users in the coming months.

Until a connector is available, all semantic layers need to be rebuilt by IT or by business users with strong knowledge of SQL and the database schema. Maintenance of the semantic layer will also be a constant effort to take into consideration. Despite the lack of a connection to SAP BusinessObjects, Tableau has an impressive list of connectors, including Google Analytics, MapR Hadoop, Salesforce, OData and Microsoft Analysis Services.

The program requires a lot of resources to run adequately. With a data set smaller than 100,000 rows, the performance was good on my computer. With a bigger sample (250,000 rows), Tableau starts to slow down and closing other applications becomes mandatory. An experienced Tableau developer told me to try to keep the data set under 2 million rows. The POC was done on a computer running 64-bit Windows 7 with an Intel Core i3 at 2.93 GHz and 6 GB of RAM.

I opened a ticket because of a problem with my graphic card drivers and Tableau technical support was quick and efficient in resolving my issue.

Tableau can export the data set to xls or Access; it will automatically create an MS Access database, which is pretty neat. The ability to print and to print to PDF is also available.

A server version of Tableau is required to share visualizations, and a different license is required to access the server. Tableau Reader is available for free but needs to be installed on the computer.

Due to the intense competition in this market segment, updates are very frequent. This means new functionality on a regular basis for business users, but also more work for the Windows administrator to package and deploy the most recent releases.

A cloud version of Tableau is newly offered ($500/user/year) and, according to their website, has almost the same functionality and the same connectors as the desktop version. I tried the cloud version (see below the visualization about the best skier nations).

Overall, Tableau is an excellent tool with a lot of functionality and could really help users better understand their data.

Next is a brief review of my experience with Lumira (version 1.0.11).

The tool is very easy to learn. A couple of hours are sufficient to understand the basics and to create visualizations. The learning curve for Lumira is slightly shorter than Tableau’s, probably a side effect of the limited number of functions available.

The main advantage of Lumira is the full integration with SAP products and especially BusinessObjects. All semantic layers and filters available in the universe are reusable in Lumira.

The user experience is similar to Webi; users familiar with the universe will quickly understand the semantic layer in Lumira. The way to filter, exclude and drill through data is extremely well done and intuitive.

The user community is not very active on the web, probably due to the young age of the product. The online help and videos offered on the SAP website are not always useful and are most of the time boring to watch.

Lumira allows you to export an image by email or export the dataset to Excel, CSV, SAP HANA, StreamWork and Explorer.

Like Tableau, Lumira allows users to have more than one data source in their documents. Lumira offers connections to xls, csv, databases (additional configuration is required), SAP HANA and BusinessObjects 3 & 4.

A predictive module is available in the same interface. This module is also very promising. I saw a demo of the tool but I haven’t had a chance to play with it yet.

Lumira is HTML5 compatible and integrated into the SAP BI4 mobile suite, so the visualizations should be available on tablets and smartphones.

Overall, the product is less stable than Tableau, in particular when the data source is modified. The installation was difficult, and the help of a Windows administrator might be required to use BusinessObjects universes (files that require administrator privileges need to be copied and updated manually).

This tool is limited compared to Tableau, but the functions available work fine and are easy to use.

Here are a few examples of geospatial limitations in Lumira:

  • Custom maps are not integrated.
  • The maximum number of points that can be displayed on a map is 3,000. In my POC this was not enough to display the required information.
  • Only one measure at a time can be displayed on a single point.
  • Only bubble, choropleth (a map in which areas are shaded in proportion to a measure) or pie charts can be displayed on a map.

One interesting function in Lumira is the combination of lat/long coordinates and a measure on a bar chart: Lumira will automatically display the country, region and sub-region on the x-axis.

Another problem, which will probably be fixed in a future release, is that you cannot use time dimensions; Lumira only recognizes date dimensions. It seems that this problem is specific to certain connectors, such as the one for BusinessObjects.

The Intelli-sense integration in the editable fields is very sharp; it makes it very easy and fast to create new variables or formulas.

The SAP Lumira helpdesk was also tested, and here too the service was excellent.

Updates of Lumira are very frequent and we can assume the product will evolve significantly in the near future. According to SAP, the focus of the Lumira development team is SAP HANA; this could have a negative impact if you are planning to use Lumira with BO.

SAP offers a cloud version of Lumira to create and share visualizations, but right now the data source import is quite limited (csv and xlsx only). The cloud version is licensed as a subscription service with monthly charges and a minimum one-year term. A free edition is available with 1 GB of data, and the Enterprise version starts at USD $24 per user.

 

Finally, here is my impression after testing these tools.

The current version of Lumira is not mature enough to justify an investment at this time. However, based on the latest release, this tool will evolve quickly and more functionality will be added.

Tableau, on the other hand, is a tool that could genuinely help business users better understand their data. The downside is that IT will have to invest a significant amount of time to recreate the semantic layer before users can use the tool (if you are planning to connect to a database).

From a cost perspective, both products are aligned (between $990 and $1,999 depending on the data sources you want to use). Tableau Desktop is free for students!

If you need to select a tool right now, I think Tableau is the best option. But SAP seems to be investing massively in Lumira, and this product could really have a bright future if you are an SAP shop.

Please share your comments or experience with these tools.

Raphael Colsenet – October 2013

2013-10-11: Last week a new version of Lumira was released (SP12). This new version has a very nice HTML5 user interface and it is now possible to create storyboards (composites of visualizations), a great feature! The personal version of Lumira is also now available for free! http://store.businessobjects.com/store/bobjamer/en_US/Content/pbPage.sap-lumira?source=text-na-estore-get-lumira&&resid=UlhAQwrR-gIAACnN0P4AAAAM&rests=1381515330685
