Revision as of 12:43, 13 June 2018
Datonis Documentation Home > Using the Datonis Analytics Feature (Notebooks)
Introduction
The Datonis Platform provides a Python notebook feature for performing interactive, exploratory data analysis and for running machine learning algorithms on your sensor data. This feature is most useful once you have collected a sizeable amount of data (a few days or months' worth).
A notebook provides an integrated, containerized big-data computing environment. It lets the platform user run plain Python programs, or distributed programs using the PySpark framework, which can be scaled out by provisioning a separate Hadoop cluster sized to the processing needs. The main advantage of the notebook feature is the ability to perform data analysis close to where the thing/sensor data is stored in the platform, eliminating the need for expensive and time-consuming ETL to move the data to an environment where analytics can be performed. This is a premium feature and is only available in enterprise accounts.
Getting Started
A notebook features an interactive UI editor in which code is organized into paragraphs. A notebook can have multiple paragraphs to logically separate the steps of a program. Paragraphs can be run individually or all together in order. Each paragraph can optionally print output or display graphical elements such as charts when performing interactive analysis of data.
Datonis Provider
The DatonisProvider is a Python library that provides APIs related to the Datonis environment. A DatonisProvider instance can be created as follows:
%pyspark
from datonis import DatonisProvider
dp = DatonisProvider(sc)
Here 'sc' is a special variable, automatically created at the start of the notebook program, representing the Spark execution context. This is typical of all Spark/PySpark programs. The resulting 'dp' object is used to call the API methods described below.
The Datonis provider has three main APIs:
- get_thing_data: This API loads data for your things/sensors from the time-series store. It returns a distributed collection of data points grouped into named columns, called a DataFrame (see DataFrames). This API takes the following inputs:
- time range: start time and end time between which to fetch the sensor data. These should be python datetime objects.
- thing_template_key: A string representing the thing_template key for which to fetch data.
- thing_keys: An array of strings representing the thing keys for which to fetch data.
- metrics/properties: An array of properties of the things you are interested in.
- timegroup (optional): The aggregation/grouping level to apply in the time domain before returning data. Supported values:
- raw: This is the default option. It returns all data for the thing/sensor without applying any aggregation or grouping.
- minute: Data is aggregated at the minute level. Hence you get up to 60 data points if you query for an hour of data.
- n_minute: Data is aggregated at the 'n'-minute level. E.g. a value of '5_minute' will return up to 12 data points if you query for an hour of data.
- hour: Data is aggregated at the hourly level.
- day: Data is aggregated at the day level.
- month: Data is aggregated at the monthly level.
- timezone (optional): The timezone in which to return data. Defaults to UTC.
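The start and end times passed to get_thing_data are plain Python datetime objects. A minimal sketch of building a one-day query window (the get_thing_data call itself requires a Datonis notebook environment, so it is shown only as a comment):

```python
from datetime import datetime, timedelta

# Query window: the last 24 hours, as UTC datetimes
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=1)

# In a Datonis notebook this window would then be passed to the API:
# thing_data_frames = dp.get_thing_data(start_time, end_time,
#                                       thing_template_key, thing_keys, metrics)
print(start_time, end_time)
```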
A few examples of how to call the API are given below:
#Getting Raw Data
#start_time and end_time are Python datetime objects (see above)
thing_template_key = "972adct84d"                      # Thing template key of the things
thing_keys = ["t7f7b9a3ea","4bc27857f1","c241314e1a"]  # Thing keys whose data we wish to aggregate
metrics = ["pressure", "forging_temp", "job.value"]    # List of metrics that we want to aggregate
thing_data_frames = dp.get_thing_data(start_time, end_time, thing_template_key, thing_keys, metrics)
thing_data_frames.show()

+--------------------+----------+----------------+--------+------------+-----+
|                  ts| thing_key|      thing_name|pressure|forging_temp|value|
+--------------------+----------+----------------+--------+------------+-----+
|2018-05-16 01:38:...|c241314e1a|Forging Hammer-1|    28.0|      1196.0|  2.0|
|2018-05-16 01:38:...|t7f7b9a3ea|Forging Hammer-2|    14.0|      1023.0|  3.0|
|2018-05-16 01:38:...|4bc27857f1|Forging Hammer-3|    21.0|       815.0|  4.0|
|2018-05-16 01:39:...|c241314e1a|Forging Hammer-1|    11.0|      1065.0|  2.0|
|2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2|    12.0|       870.0|  2.0|
|2018-05-16 01:39:...|4bc27857f1|Forging Hammer-3|    22.0|       786.0|  2.0|
|2018-05-16 01:39:...|c241314e1a|Forging Hammer-1|     7.0|      1133.0|  2.0|
|2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2|     8.0|       940.0|  4.0|
(output truncated)
#Getting Minute-level Data
metrics = ["pressure", "forging_temp"]
thing_data_frames = dp.get_thing_data(start_time, end_time, thing_template_key, thing_keys, metrics, timegroup="minute")
thing_data_frames.show()

| ts| thing_key| thing_name|forging_temp::sum|forging_temp::count|forging_temp::max|forging_temp::min|forging_temp::avg|pressure::sum|pressure::count|pressure::max|pressure::min|pressure::avg|
|2018-05-16 01:38:...|t7f7b9a3ea|Forging Hammer-2| 1023.0| 1.0| 1023.0| 1023.0| 1023.0| 14.0| 1.0| 14.0| 14.0| 14.0|
|2018-05-16 01:38:...|c241314e1a|Forging Hammer-1| 1196.0| 1.0| 1196.0| 1196.0| 1196.0| 28.0| 1.0| 28.0| 28.0| 28.0|
|2018-05-16 01:38:...|4bc27857f1|Forging Hammer-3|  815.0| 1.0|  815.0|  815.0|  815.0| 21.0| 1.0| 21.0| 21.0| 21.0|
|2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2| 1810.0| 2.0|  940.0|  870.0|  905.0| 20.0| 2.0| 12.0|  8.0| 10.0|
|2018-05-16 01:39:...|4bc27857f1|Forging Hammer-3|  786.0| 1.0|  786.0|  786.0|  786.0| 22.0| 1.0| 22.0| 22.0| 22.0|
|2018-05-16 01:39:...|c241314e1a|Forging Hammer-1| 2198.0| 2.0| 1133.0| 1065.0| 1099.0| 18.0| 2.0| 11.0|  7.0|  9.0|
|2018-05-16 01:40:...|c241314e1a|Forging Hammer-1| 1148.0| 1.0| 1148.0| 1148.0| 1148.0| 25.0| 1.0| 25.0| 25.0| 25.0|
|2018-05-16 01:40:...|4bc27857f1|Forging Hammer-3| 1695.0| 2.0|  874.0|  821.0|  847.5| 31.0| 2.0| 19.0| 12.0| 15.5|
(output truncated)
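As the output above shows, each requested metric expands into five derived columns per time bucket: metric::sum, metric::count, metric::max, metric::min, and metric::avg. The aggregation is performed server-side by the platform; the following plain-Python sketch (with hypothetical sample readings) just illustrates how those five columns relate to the raw data:

```python
from collections import defaultdict

# Hypothetical raw readings: (minute_bucket, thing_key, pressure)
readings = [
    ("01:39", "t7f7b9a3ea", 12.0),
    ("01:39", "t7f7b9a3ea", 8.0),
    ("01:39", "4bc27857f1", 22.0),
]

# Group readings by (minute bucket, thing)
groups = defaultdict(list)
for ts, thing_key, value in readings:
    groups[(ts, thing_key)].append(value)

# Derive the same five aggregate columns the platform returns per bucket
rows = {
    key: {
        "pressure::sum": sum(vals),
        "pressure::count": float(len(vals)),
        "pressure::max": max(vals),
        "pressure::min": min(vals),
        "pressure::avg": sum(vals) / len(vals),
    }
    for key, vals in groups.items()
}

print(rows[("01:39", "t7f7b9a3ea")]["pressure::avg"])  # 10.0
```

The 10.0 average matches the pressure::avg value for Forging Hammer-2 at 01:39 in the table above (two readings, 12.0 and 8.0).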
DataFrames can also be accessed as SQL tables by performing the following operation:
%pyspark
table_name = "workcenter_hierarchy"
dataframe.createOrReplaceTempView(table_name)
The registered table can then be queried from a SQL paragraph:
%sql
select * from workcenter_hierarchy
Saving and publishing a Machine Learning Model: This function lets you