Quick Test of Tableau Data Extract With 86M Rows

It would be difficult to use Tableau for more than a few weeks without running across Tableau Data Extracts.  Extracts let you save a copy, or a subset, of your data for faster access, and they come with features for scheduled and incremental refreshes.

I’ve used these in a variety of ways, generally where connectivity to databases was an issue or I wanted a subset of data.  Lately I’ve been working on projects for several XCentium clients that involve a significant amount of data, where query performance is poor for various real-life reasons.  I happened to have a database with 86 million rows of email productivity data on my laptop, so I thought I’d run a quick-and-dirty test to see whether extracts would justify the setup cost and effort for our clients.

My experiment used my (average developer) laptop, a local SQL Server instance, and local Tableau Desktop to compare the performance of Tableau retrieving the data from SQL Server vs. retrieving it from a Tableau Data Extract.  I did this by refreshing a dashboard that had 8 different views of this data, and then by building a new workbook one field at a time.  Each time I started a new test I recycled the SQL Server instance to make sure there were no caching benefits beyond what could be expected in real life.  My Tableau Data Extract did not use any additional features, such as rolling up the extract to visible dimensions.
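If you want to reproduce a comparison like this outside of Tableau Desktop, here is a minimal Python sketch of the timing approach. It is only illustrative: the connection string, the email_activity table, and the aggregate query are hypothetical stand-ins for my test data, and it assumes a modern .hyper extract file (my test predates that format; also, inside a real extract the table is usually named "Extract"."Extract").

```python
import time

import pyodbc
from tableauhyperapi import HyperProcess, Connection, Telemetry

# Hypothetical aggregation, similar in shape to what a Tableau view would issue.
QUERY = "SELECT campaign, COUNT(*) AS sends FROM email_activity GROUP BY campaign"

def time_sql_server():
    # Connect to the local SQL Server instance (connection string is illustrative).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=EmailStats;Trusted_Connection=yes;"
    )
    start = time.perf_counter()
    conn.cursor().execute(QUERY).fetchall()
    print(f"SQL Server: {time.perf_counter() - start:.1f} s")
    conn.close()

def time_hyper_extract():
    # Run the same query against a Tableau Hyper extract file.
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint, database="email_activity.hyper") as conn:
            start = time.perf_counter()
            conn.execute_list_query(QUERY)
            print(f"Hyper extract: {time.perf_counter() - start:.1f} s")

if __name__ == "__main__":
    time_sql_server()
    time_hyper_extract()
```

This only times raw query execution, not Tableau's rendering, so expect the absolute numbers to differ from the dashboard timings below even though the relative gap should look similar.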

Here are the results:

| Test | Database | Tableau Data Extract |
| --- | --- | --- |
| Storage size of row data and indexes | 5.2 GB (relevant tables only) | 327.4 MB |
| Time to refresh test dashboard | 426 sec | 14 sec |
| New workbook: add Measure 1 | 122 sec | < 1 sec |
| New workbook: add Dimension 1 | 19 sec | < 1 sec |
| New workbook: add Dimension 2 | 19 sec | < 1 sec |
| New workbook: add Dimension 3 | 21 sec | < 1 sec |
| New workbook: add Dimension 4 | 28 sec | < 1 sec |
| New workbook: add Dimension 5 | 53 sec | 3 sec |
| New workbook: add Dimension 6 | 51 sec | 3 sec |
| New workbook: add Dimension 7 | 57 sec | 4 sec |
| New workbook: add Dimension 8 | 67 sec | 5 sec |
| New workbook: add Measure 2 | 114 sec | 5 sec |

As you can see, under the circumstances of my test, the usability improvement is significant. 

What are the costs?

  1. The data extract must be built.  This took 49 minutes in my environment – again, an average laptop with SQL Server and Tableau on the same box.  You can lessen the impact by scheduling the refresh during non-business hours.  You can also do incremental updates if you have a column (typically a date or ID) that indicates which rows are new; a sketch of the incremental idea follows this list.
  2. Extracted data is only as recent as the last extract refresh.  But if you can optimize your extract time and refresh frequently, you can get close to live data.
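To make the incremental-refresh idea in point 1 concrete, here is a hedged Python sketch: find the high-water mark already in the extract, then pull and append only newer source rows. Every name here (the email_activity table, the sent_date column, the extract file) is an illustrative assumption, and in practice Tableau Desktop or Server does this for you once you configure incremental refresh on such a column.

```python
import pyodbc
from tableauhyperapi import (
    HyperProcess, Connection, Telemetry, TableName, Inserter
)

# Extracts conventionally store their data in a table named "Extract"."Extract".
EXTRACT_TABLE = TableName("Extract", "Extract")

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="email_activity.hyper") as extract:
        # High-water mark: the newest row already in the extract.
        # (Assumes the extract is non-empty; a full build would be needed otherwise.)
        last_loaded = extract.execute_scalar_query(
            f"SELECT MAX(sent_date) FROM {EXTRACT_TABLE}"
        )

        # Pull only the rows added to the source since the last refresh.
        src = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=EmailStats;Trusted_Connection=yes;"
        )
        new_rows = src.cursor().execute(
            "SELECT campaign, sent_date, opens FROM email_activity "
            "WHERE sent_date > ?",
            last_loaded,
        ).fetchall()

        # Append the new rows to the existing extract table.
        with Inserter(extract, EXTRACT_TABLE) as inserter:
            inserter.add_rows(new_rows)
            inserter.execute()
```

Note the trade-off this implies: incremental refresh only appends, so updated or deleted source rows still require a periodic full rebuild.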

What can be concluded from this test?  I’m confident it will be worth the effort for us to prototype Tableau Data Extracts when clients need additional performance and don’t need up-to-the-minute data.  Even though our dataset was relatively slender (11 mostly non-text dimensions and 8 measures), I will also recommend it to some clients with wider datasets to see how it fits their needs.

Further reading indicates that this test may not come close to pushing the limits of Tableau Data Extracts.  I am looking forward to trying this with bigger, fatter data.  

Matt Glover, XCentium

Vice President, Commerce