TRExt: Documentation¶
TRExt is short for Tableau Refresh Extract (Externally).
TRExt provides an API to create a .tde extract from a database and publish to a Tableau Server, and this does not need the Tableau Desktop to set up.
This documentation contains instructions on how to best use TRExt with your database, as well as the API documentation if you want to get into the details.
The repository is available on GitHub if you wish to fork, contribute or just install and play around.
Contents¶
TRExt¶
TRExt is short for Tableau Refresh Extract (Externally).
TRExt is a means to refresh a Tableau Extract (.tde files) externally so the Tableau Server can serve visual content without having to compete for resources while refreshing extracts internally.
Dependencies¶
The main dependencies are:
- Tableau SDK
- pyodbc
The repo also supports
- pyodbc wrapper such as EXASol Python SDK
Installation¶
You need pip to install TRExt.
You can install the latest version of the package straight from PyPI using:
$ pip install trext
You can also directly install from GitHub directly using:
$ pip install git+git@github.com:AtheonAnalytics/trext.git
or
$ pip install git+https://github.com/AtheonAnalytics/trext.git
Usage¶
Create an extract
>>> import trext
>>> tde = trext.Extract()
>>> connection_string = "appropriate db connection string"
>>> tde.create("db.schema.table", conn_string=connection_string, dbtype='exasol')
Created!
>>> tde.location
/temp/extract.tde
Publish to Tableau Server (overwrites existing extract)
>>> tableau_auth_details = ("username", "password")
>>> publish_details = ("site_content_url", "project_name")
>>> tde.publish("tableau server address", auth=tableau_auth_details, params=publish_details)
Published!
Refreshing an extract is now replaced with creating and publishing an extract. You can use this is conjunction with TabAuto (not yet open source) or with Tableau’s server-client-python library to get the datasource names that need refreshing.
Documentation¶
More detailed documentation is available at http://trext.rtfd.io
Disclaimer¶
TRExt is still a Work-in-Progress
I wrote most of this codebase when Tableau SDK was released for Tableau 8 and never got around to moving it from a POC/local copy to open source, so this a rough-and-ready type of library.
This is fair warning to anyone who uses this library: there will be bugs, bad documentation and no tests for a short while till I fix it up. So please use with care and if you find issues submit a bug report or a PR.
If you want to contribute and add tests, better documentation, new connectors, cleaner interface etc, please do and submit a PR.
Oh and don’t forget to add yourself to AUTHORS
Note: I have tested TRExt only on a Linux distro, so if you find any issues on other Operating Systems please do create a bug report and I can try to fix it, but if you do know how to fix it please also submit a PR.
Usage Instructions¶
The current version of TRExt works with:
- Python 2.7
- pyodbc == 4.0.16
- Tableau SDK <= 10.1.4 (see Notes)
- EXASol Python SDK >=5.0.13 and <= 5.0.17
Installation of dependencies¶
You will need to install Tableau SDK and pyodbc as a minimum to use this library. The EXASol library will be needed if you want to connect to EXASol to create an extract.
pyodbc¶
PyODBC should be installed when you install TRExt from pip see Installation. But if you have any issues setting up pyodbc to connect to a database you might find the PyODBC wiki useful.
Tableau SDK Installation¶
The documentation for Tableau SDK is available at the Tableau Documentation.
The library I have used to develop and test with is available in the vendor folder and this is only for Linux Distribution. If you prefer to use versions for other Operating systems you can find them here.
The installation of the Linux version is as follows:
Download the correct tar.gz file for Python (32 or 64 bit depending on your OS) and then,
$ tar -xzvf Tableau-SDK-Python-Linux-xxBit-10-x-x.tar.gz
$ cd /Tableau*/
$ python setup.py install
EXASol SDK Installation¶
Skip to the next section if you do not use EXASol.
The EXASol Python SDK can be downloaded from their Download section. We use version 5.0.17 for this version of TRExt but I have also tested on some older versions.
Download from Packages and SDK the version you want, preferably in the tar.gz format and install as follows,
$ tar -xzvf EXASolution_SDK-5.0.xx.tar.gz
$ cd /EXASolution_SDK-5.0.xx/Python/
$ python setup.py install
Additional Setup¶
To connect to MSSQL you need to set up the MSSQL driver and driver manager.
Installation of the ODBC driver for Linux is available from Microsoft.
Instructions on setting up the driver manager to connect to MSSQL is available from pyodbc.
Once you have your connection information in odbc.ini you should be able to test the connection with the DSN parameter using pyodbc. You should get a pyodbc.Connection object if the connection was successful.
$ python -c 'import pyodbc; print(pyodbc.connect("DSN=MySQLServerDatabase;UID=username;PWD=password"))'
<pyodbc.Connection object at 0x7f8597333200>
Usage of the api¶
Create a TRExt Extract¶
The TRExt Extract is the interface to the create and publish methods. Initialise the Extract,
>>> import trext
>>> tde = trext.Extract()
Create an extract for MSSQL¶
Assuming that you can connect to the MSSQL server, you can now create a .tde extract using the following,
>>> conn_string = "DSN=MySQLServerDatabase;UID=username;PWD=password"
>>> tde.create("db.schema.table", conn_string=conn_string)
Created!
Create an extract for EXASol¶
Note here that in the create api we use an extra argument called dbtype set to ‘exasol’. This is how TRExt extends to other databases. Currently only MSSQL and EXAsol have been tested.
>>> conn_string = "DSN=EXAServer"
>>> tde.create("db.schema.table", conn_string=conn_string, dbtype='exasol')
Created!
Location of the extract¶
Once you have created the extract and you want to know the location of your extract simply do,
>>> tde.location
/temp/location/of/extract.tde
Publish to Tableau Server¶
The default behaviour of publish is to overwrite the existing extract. This will be extended in the future versions.
>>> tableau_auth_details = ("username", "password")
>>> publish_details = ("site_content_url", "project_name")
>>> tde.publish("tableau server address", auth=tableau_auth_details, params=publish_details)
Published!
Close the Extract¶
Once you are done creating and/or publishing an extract, perform the close operation,
>>> tde.close()
This api ensures that the tde created locally gets destroyed.
Publish existing .tde to Tableau Server¶
You can also use this api to publish a local .tde file to the Tableau Server, simply set the location of the TRExt extract to the path of the .tde you want to publish
>>> tde.location = "local/path/to/extract.tde"
>>> tde.publish("tableau server address", auth=tableau_auth_details, params=publish_details)
Published!
Notes¶
(*) I have not tested to see if Tableau SDK still supports versions 8 and 9 but this code was based on Tableau SDK for Tableau 8; another area that needs testing and improvement for TRExt.
API Documentation¶
trext.api¶
trext.api.Extract¶
-
class
trext.api.
Extract
(is_temp=True)[source]¶ Bases:
object
A Tableau Extract.
Provides the endpoint to create and/or publish .tde extracts. Usage:
Create an extract >>> import trext >>> tde = trext.Extract() >>> connection_string = “appropriate db connection string” >>> tde.create(“db.schema.table”, conn_string=connection_string) Created! >>> tde.location /temp/extract.tde
Publish to Tableau Server (overwrites existing extract)
>>> tableau_auth_details = ("username", "password") >>> publish_details = ("site_content_url", "project_name") >>> tde.publish("tableau server address", auth=tableau_auth_details, params=publish_details) Published!
Clean up after create and/or publish >>> tde.close()
-
close
()[source]¶ Clean up on exit. Delete the extract only if temporary flag is True
Returns: if it is not a temporary extract
-
create
(view_or_table_name, conn_string, dbtype=None)[source]¶ Method to create an extract based on a view or a table on a database
Parameters: - view_or_table_name – view or table to create an extract from
- conn_string – connection string to the database
- dbtype – type of database to connect to
Returns: Created! or Failed! message on creation
-
location
¶ Returns: location of the .tde file if it was created or set up
-
publish
(host_address, auth, params)[source]¶ Publish to Tableau Server (overwrites existing extract)
Parameters: - host_address – Address of the Tableau server to publish to
- auth – a tuple of username and password for authentication
- params – currently a typle of two parameters: site to publish to and project name
Returns: Message on completing publishing
-
trext.db¶
The db package deals with connecting to the Database, pulling the column metadata and the data from the tables, and modifying the data to match the Tableau Type.
trext.db.conn¶
trext.db.consume¶
trext.db.fill¶
trext.db.typemap¶
trext.db.utils¶
-
trext.db.utils.
format_date
(date_to_format)[source]¶ Formats the date value from database to the tableau format
Parameters: date_to_format – date value from db Returns: formatted date
trext.extract¶
The Extract package deals with creating, filling up and puslishing a .tde extract.
trext.extract.build¶
-
class
trext.extract.build.
ExtractBuilder
[source]¶ Bases:
object
Builds the tableau Extract by creating a Tableau Extract, defines the table skeleton, adds the table to the extract and fills this table with the relevant data for the TDE.
-
connect_to_db
(view_or_table_name, conn_string, dbtype=None)[source]¶ Connect to the view or table that needs to be turned into a .tde extract
Parameters: - view_or_table_name – View or Table that needs to be an extract
- conn_string – connection string to the database where the view or table exists
- dbtype – type of db so the right pyodbc wrapper is used to connect
-
trext.extract.exceptions¶
trext.extract.server¶
connection for tableau
-
class
trext.extract.server.
Tableau
[source]¶ Bases:
object
Tableau server connection class
-
connect
(host, username, password, site_content_url='Default')[source]¶ Connect to the Tableau server
Parameters: - host – address of the Tableau server
- username – Tableau Server username
- password – Tableau Server password
- site_content_url – Site to publish to
-
publish
(tde_path, project_name='Default', datasource_name=None, overwrite=True)[source]¶ Publishes an extract to the Tableau Server
Parameters: - tde_path – path of tde to publish
- project_name – name of project on the Tableau site to publish to
- datasource_name – the name of the .tde to publish as
- overwrite – boolean to flag if the .tde needs an overwrite when publishing
-