Behind the Scenes of Tableau Server: Unveiling the Purpose & Functionality of Its Components

10 min read · Mar 9, 2023

As a Tableau developer, I’ve spent a lot of time creating interactive dashboards using Tableau Desktop and publishing them to Tableau Server. However, I recently became curious about how Tableau Server actually works behind the scenes to enable such seamless collaboration and real-time data sharing.

In my quest to understand the intricacies of Tableau Server, I went through various resources. In this article, I’ll be sharing my findings and shedding light on the key components that power Tableau Server.

Whether you’re a Tableau enthusiast, an analytics professional, or simply someone interested in understanding the magic behind Tableau Server, this article will provide you with a comprehensive overview of its architecture and components. So, join me on this journey of exploration and discovery, as we unravel the mysteries of Tableau Server together.

Dhrumit will be accompanying us on this journey. (He is a very good friend of mine!)

Suppose Dhrumit is going to use Tableau Server. We will walk through the architecture in the order he encounters it.

As soon as he logs in he will interact with the Gateway!

1. Gateway: The Gateway checks whether Dhrumit is even allowed to be here! :D

Functionalities:
1. Routing
-The user, in our case Dhrumit, interacts with Tableau Server through HTTP requests, whether from a web browser or from Tableau Desktop.
-The user sends an HTTP request to Tableau Server, and the Gateway decides where it should go.
-The Gateway routes the HTTP request to either the Application Server or the VizQL Server.
2. Load balancing
-In a multi-node environment (a Tableau Server cluster), the Gateway decides → which node the request is directed to.
-It does this in round-robin fashion.
3. Serving static files
-The Gateway serves the custom logo and other static files to the user within the browser.
-Note: in a multi-node cluster
— with multiple Gateways
— the custom files must be in the same place on each node, so that users get the same custom logo every time.
Keep these in mind:
1. If multiple nodes are involved, an external load balancer is advised.
-Note: an external load balancer does NOT eliminate the need for multiple Gateways!
2. External load balancer:
-If one of the nodes goes down → the load balancer still needs a Gateway to send the traffic to.
-Therefore, you must install a Gateway on each node!
3. A Gateway is required wherever there are VizQL Server or VizPortal (Application Server) processes.
4. You can look into the Gateway log files to optimize it further.
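
To picture the routing and round-robin behaviour listed above, here is a small Python sketch. It is only an illustration and not how Tableau implements the Gateway: the ToyGateway class, the node names, and the "/vizql/" path prefix are all assumptions made up for this example.

```python
from itertools import cycle


class ToyGateway:
    """Illustrative only: a toy gateway, not Tableau's real routing logic."""

    def __init__(self, nodes):
        # cycle() hands out nodes in order and wraps around: round robin.
        self._nodes = cycle(nodes)

    def pick_process(self, path):
        # Assumption: viz requests share a path prefix; everything else
        # goes to the Application Server.
        if path.startswith("/vizql/"):
            return "VizQL Server"
        return "Application Server"

    def route(self, path):
        # Each request gets the next node in the rotation plus a target process.
        return next(self._nodes), self.pick_process(path)


gateway = ToyGateway(["node1", "node2"])
routes = [gateway.route(p) for p in ["/login", "/vizql/sales", "/projects"]]
```

With two nodes, the three requests land on node1, node2, and node1 again, which is all that "round robin" means here.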

After getting past the Gateway, Dhrumit interacts with the Application Server!

2. Application Server: It’s time to log in!

Functionalities:
1. Authentication & Authorization
-User credentials could be in:
— local: checks the user’s entry in the Repository.
— Active Directory.
2. Web UI rendering
-Manages all of the web UI.
3. Publishing
-The Application Server splits the published content into its XML and raw code.
-It identifies the flat files and stores everything in the correct place (Repository, File Store, Data Engine).
Keep these in mind:
1. Default: 1 Application Server per node.
— 2 if you want redundancy.
2. Not resource intensive
— The Application Server does not need large amounts of CPU, RAM, etc.
— If it does become a bottleneck, the usual reason is a high number of REST API or tabcmd requests hitting the service.

It’s time to do some searching now! For this, Tableau Server has Search & Browse. Let us see how it can help Dhrumit.

3. Search and Browse:

-It is the index that sits behind the Repository.
-It need not be in the same place as the Repository, but it does need to be in the same place as the Gateway and the Application Server.

Now Dhrumit knows he can search for anything on Tableau Server. Let us help him understand how the Repository works!

4. Repository:

Functionalities:
-The Repository is the source of tonnes of application data.
— Every workbook has XML code behind it → the workbook is published to Tableau Server → its XML code is stored in the Repository.
-Source of rich usage data.
-Security objects
— permissions granted to users.
Keep these in mind:
1. Default: 1 Repository somewhere on the server.
— If it needs to be highly available: have 2 Repositories → one active and one passive.
— You can’t have more than that.
2. Not resource intensive, and not a strictly licensed process.
3. Tableau continuously streams data from the active Repository to the passive one.

Up till now we’ve got content. Let us get the data behind it! For that Dhrumit is now going to interact with Data Server!

5. Data Server:
Data Server is the core and key to governance in the platform.

Functionalities:
-The published extracts and live connections available on Tableau Server are made possible by the Data Server.
-Allows multiple workbooks to use the same data source.
-Acts as a proxy to published data sources, making sure that what is sent is in a language the database understands.
-Enforces permissions.
-Centralizes driver deployment.

The Data Engine will get us the data to work on! Let us see how this happens.

Now that Dhrumit has the data, he wants information he can use to make some decisions. For that, we need to query the data. The Data Engine has a component for that as well! Drum roll... Query Engine!!

Query Engine uses Hyper to query the data.

Hyper:

Functionalities:
-Creates extracts
-Queries extracts
-Query federation
— Handles cross-database joins, for example between Hadoop and PostgreSQL.
— The join is performed locally, within Hyper.
Keep these in mind:
1. Super fast!
2. Uses as much CPU as it can get hold of to perform its operations!

Dhrumit knows that Hyper is super fast, but it consumes as much CPU as it can. So what should he do if his setup has extracts with long-running queries?

Tableau Server has a new deployment methodology for this!

Hyper On a Standalone Service (HOSS):

Keep these in mind:
1. Hyper is built to consume as much CPU and memory as it can get hold of. That is why it runs queries fast! It also scales efficiently across many cores.
2. If you are interested in HOSS, put it on larger nodes: the larger the node, the faster queries tend to run.
3. Hyper log file: if the spooling flag is true, Hyper ran out of memory and had to start writing to disk.
4. Only 1 Data Engine per node, as it recovers by itself.
5. It must exist on the same node as a Backgrounder (to create extracts), VizQL Server (to handle federated queries), Application Server, Data Server, or File Store, so that it can do its job.

Now that Dhrumit has started querying the data, the next thing to point out is the File Store!

6. File Store: Lives in the same place as the Data Engine.

Functionalities:
-Controls the storage of extracts.
-In highly available environments, the File Store ensures that extracts are synchronized to other File Store nodes, so they are available if one File Store node stops running.
Keep these in mind:
1. If you are removing a node that has a File Store on it, decommission it first!
2. Use the tsm topology filestore decommission command.
3. This puts the File Store instance into read-only mode and copies any unique data contained in the instance to the other File Store(s) in the cluster.
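
As a rough illustration of the decommission step, the Python sketch below only builds the command line; it does not run anything. The decommission_command helper and the node ID "node2" are hypothetical, and the -n option for passing node IDs is my understanding of the tsm CLI, so confirm it against the tsm documentation for your version before using it.

```python
def decommission_command(node_ids):
    """Build (but do not run) the File Store decommission command."""
    if not node_ids:
        raise ValueError("at least one node ID is required")
    # Assumption: node IDs are passed as a comma-separated list after -n.
    return [
        "tsm", "topology", "filestore", "decommission",
        "-n", ",".join(node_ids),
    ]


command = decommission_command(["node2"])  # "node2" is a made-up node ID
command_line = " ".join(command)
```

Keeping the command as a list makes it easy to review or log before an admin decides to run it on the server.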

We have gone through the data journey. Dhrumit is most of the way through his server processes at this point, and now it’s time for him to load up his favorite dashboard. To do that, he is going to need a little help from the VizQL Server!

7. VizQL Server:
-VizQL stands for Visual Query Language. It is a revolutionary visual language created by Chris Stolte and Pat Hanrahan.
-VizQL turns a viz into a query in some database language; the database processes the query, and VizQL turns the results back into a viz.
-VizQL speaks multiple database languages fluently.

Keep these in mind:
1. Two VizQL Server processes per node are recommended.
2. Each process handles roughly up to 50 active users.
3. The number of VizQL processes is directly proportional to throughput.
— This may not result in more speed.
— But requests may not be queued as much.
4. The number of VizQL processes can be increased or decreased without restarting the server.
5. Requires the Data Engine to be installed on the same node.
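
The "about 50 active users per process" rule of thumb turns into simple arithmetic. The sketch below is a back-of-the-envelope helper, not an official sizing formula; the function name and the constant are my own, and real sizing depends on your workbooks and hardware.

```python
import math

USERS_PER_VIZQL_PROCESS = 50  # rule of thumb quoted above, not a hard limit


def vizql_processes_needed(active_users):
    """Return a rough count of VizQL processes for a number of active users."""
    if active_users < 0:
        raise ValueError("active_users must not be negative")
    # Round up, and always keep at least one process, even for an idle server.
    return max(1, math.ceil(active_users / USERS_PER_VIZQL_PROCESS))


needed = vizql_processes_needed(120)  # 120 / 50 = 2.4, rounded up
```

For 120 active users this suggests three processes, which at two processes per node would mean spreading VizQL across two nodes.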

Now, Dhrumit wants Simoni (another really good friend of mine!) to check the dashboard he built. For this, Simoni hits the Cache Server!

8. Cache Server:

Functionalities:
-Handles requests on behalf of the user.
— When a user navigates to a report or dashboard that has already been accessed, everything needed is already stored in the Cache Server.
-Stores results from the Data Server as well as from the Application Server and Backgrounder.
Keep these in mind:
1. Single threaded! This means the number of Cache Server processes is directly proportional to throughput.
2. Not CPU intensive; it relies only on memory.
3. Default: 2 Cache Servers are enough. Do NOT exceed 6!
4. Location is not important.
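
What "already accessed, so it is served from the cache" means can be shown with a tiny Python sketch. It is a generic cache and not Tableau's implementation: the ToyCacheServer class, the "sales-dashboard" key, and the strings standing in for query results are all invented for the example.

```python
class ToyCacheServer:
    """Illustrative only: remembers results so repeat requests skip the query."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, run_query):
        if key in self._store:
            # Someone already asked for this; reuse the stored result.
            self.hits += 1
        else:
            # First request: do the expensive work once and remember it.
            self.misses += 1
            self._store[key] = run_query()
        return self._store[key]


cache = ToyCacheServer()
first = cache.get("sales-dashboard", lambda: "rendered for Dhrumit")
second = cache.get("sales-dashboard", lambda: "never runs")
```

The second request never runs its query, so Simoni gets the result that was computed when Dhrumit first opened the dashboard.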

As Simoni successfully accessed the dashboard and complimented Dhrumit, we can say farewell to her now.

We will now talk about Backgrounder!

9. Backgrounder:

Functionalities:
-Refreshes extracts.
-Creates extracts.
-Responsible for prep flows.
-Handles all user subscriptions → for most wanted content, top views, etc.
— The Backgrounder handles the process of triggering those subscriptions.
-Handles alerts too.
— Note: an alert sets a threshold and fires any time the condition is true.
— -If the dashboard uses a live connection → the alert is checked on a schedule (every 60 minutes by default; the admin can configure the interval)!
— -If it uses an extract → the alert is checked immediately once the extract refresh has finished.
-The extract refreshes, subscriptions, and alerts listed in the Tasks section are run and handled by the Backgrounder.
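
The two alert-timing rules above (poll live connections on an interval, check extracts when the refresh finishes) can be written down as one small Python function. This is a sketch of the rule as described in this article, not Tableau's scheduler: the function name, its parameters, and the timestamps are hypothetical, and the interval is passed in because it depends on how the server is configured.

```python
from datetime import datetime, timedelta


def next_alert_check(connection, last_check, check_interval, extract_finished_at=None):
    """Return when a data-driven alert should next be evaluated."""
    if connection == "live":
        # Live connections are polled on a fixed interval.
        return last_check + check_interval
    if connection == "extract":
        # Extract-based alerts are evaluated when the refresh finishes;
        # None means the refresh has not finished yet.
        return extract_finished_at
    raise ValueError(f"unknown connection type: {connection}")


interval = timedelta(minutes=15)  # example value; depends on server configuration
last = datetime(2023, 3, 9, 9, 0)
live_next = next_alert_check("live", last, interval)
extract_next = next_alert_check("extract", last, interval, datetime(2023, 3, 9, 9, 40))
```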

Keep these in mind:
1. Default: 2 per node.
2. The number of Backgrounders can be increased or decreased without restarting the server.
— Advantage: you can scale the number of Backgrounders up or down to match the number of users during the day and at night.
3. Requires the Data Engine on the same node.
4. Can be highly CPU, disk I/O, and network intensive.
Scenario:
You use lots of extracts, with tonnes of schedules refreshing them. In this scenario the Backgrounder becomes resource intensive.
To ensure this doesn’t impact users loading content at the same time:
ISOLATE the Backgrounder.

As we mentioned, the Backgrounder runs prep flows. So what does the Prep Conductor do? Let us see.

10. Prep Conductor

Functionalities:
-Publishes flows to Tableau Server.
-Responsible for running the flow.
-Checks that the connection credentials are valid.
-Tracks history and sends an alert if a flow fails.
-When a flow needs to be run, the Backgrounder manages that process.
Keep these in mind:
1. Prep flows are more complex than extracts.
2. Isolation is highly recommended if you’re using Tableau Prep.
3. Use a TSM command to set whether or not a Backgrounder process handles flows.

Let us help Dhrumit understand the last component: Ask Data!

11. Ask Data
Ask Data has two main components.

Elastic Search:
-Shows up in the list of processes on Tableau Server.
-When you open any data source → Ask Data kicks in and starts indexing it, to figure out the metadata behind it.
-Elastic Search handles this process and stores the metadata: which types of fields there are, the most common values, the highest and lowest results, etc.
Ask Data:
-Shows up in the list of processes on Tableau Server.
-Turns what the user types into a query that can be sent to the data source. The Data Engine handles the query.
Things to keep in mind:
1. Set the heap size for the Elastic Search service. If you have a large number of data sources, increase it!
2. Ask Data is largely constrained by the Data Engine, so scaling depends on Hyper.

Finally, we’ve gone through the user journey! At this point we can let Dhrumit go...

Now it’s time to welcome the Tableau Server Admin. A Tableau Server Admin must know the Cluster Controller and TSM. All six components below play a key role in the management and governance of Tableau Server processes.

We will first talk about Cluster Controller!

1. Cluster Controller

The statuses you see on the Tableau Server status tab are reported by the Cluster Controller.
It manages and monitors all of the services to make sure they are working.

2. Licensing

Lives on the initial node (the first installed node).
This component checks every 72 hours whether Tableau Server is licensed.
Note: if the initial node goes down and you cannot recover it within 72 hours, move the licensing service to another node.

3. Client File Service

Manages certificates for the server.
If you use single sign-on or Kerberos, all the certificates are managed by the CFS.
Install it on every node that runs the Coordination Service.

4. Coordination Service

Single source of truth!
-Needed when you set up a Tableau Server cluster that must be highly available.
-You should have 3 to 5 Coordination Services, depending on how big your Tableau Server is.
Determines which node acts as the leader whenever a decision has to be made.
Ensures that quorum is met when the server is highly available.
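
Why 3 or 5 Coordination Services rather than 2 or 4 becomes clear once quorum is written as arithmetic. The sketch below assumes the usual majority rule used by ZooKeeper-style ensembles (the Coordination Service is built on Apache ZooKeeper); the function names are my own.

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many members can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# ensemble size -> (quorum needed, failures tolerated)
summary = {n: (quorum_size(n), tolerated_failures(n)) for n in (1, 3, 4, 5)}
```

Under this rule a 3-member ensemble survives one failure and a 5-member ensemble survives two, while a 4-member ensemble still survives only one, so the even size adds cost without adding resilience.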

5. Admin Agent

Handles all the TSM requests that change things, such as hot topology changes.
Added a new node? The Admin Agent checks the configurations and makes sure they are all replicated.
Installs itself on every single node.

6. Tableau Services Manager

Also called ‘Admin on the Beach!’
It is where you can remotely connect to and manage Tableau Server.
Automatically installs itself on each and every node!

Thank you for taking the time to read this article. I hope that the information presented has been useful to you in some way.

As I am always looking to improve the quality of my content, I would greatly appreciate any feedback that you have to offer. If there are any questions that you have, please don’t hesitate to reach out to me.
LinkedIn

Thank you again for your attention, and I look forward to hearing from you.

Written by Srishti Kanojiya