Using the Datonis Analytics Feature (Notebooks)


Revision as of 12:42, 13 June 2018


Introduction

The Datonis Platform provides a Python notebook feature for performing interactive and exploratory data analysis and running machine learning algorithms on your sensor data. This feature is generally useful once you have collected a sizeable amount of data (over a few days or months).

A notebook provides an integrated, containerized big-data computing environment. It lets the platform user run plain Python programs or distributed programs using the PySpark framework, which can be scaled by provisioning a separate Hadoop cluster based on the processing needs. The main advantage of the notebook feature is the ability to perform data analysis close to where the thing/sensor data is stored in the platform, eliminating the need for expensive and time-consuming ETL to move the data to an environment where analytics can be performed. This is a premium feature, available only in enterprise accounts.

Getting Started

A notebook features an interactive UI editor, and its code is organized into paragraphs. A notebook can have multiple paragraphs to logically separate the steps of a program. Paragraphs can be run individually or all together, in order. Each paragraph can optionally print output or display graphical elements such as charts when performing interactive analysis of data.

Datonis Provider

The DatonisProvider is a Python library that provides APIs related to the Datonis environment. A DatonisProvider instance can be created as follows:

%pyspark
from datonis import DatonisProvider
dp = DatonisProvider(sc)

This provides the 'dp' object that will be used to call the API methods.

Here, 'sc' is a special variable automatically created at the start of the notebook program, representing the Spark execution context (the SparkContext). This is typical of all Spark/PySpark programs.

The Datonis provider has 3 main APIs:

  • get_thing_data: This API lets you load data of your things/sensors from the time-series store. It returns a distributed collection of data points grouped into named columns, called a DataFrame (see DataFrames). This API takes the following inputs:
    • time range: start time and end time between which to fetch the sensor data. These should be Python datetime objects.
    • thing_template_key: A string representing the thing_template key for which to fetch data.
    • thing_keys: An array of strings representing the thing keys for which to fetch data.
    • metrics/properties: An array of properties of the things you are interested in.
    • timegroup (optional): The aggregation/grouping level to apply on the time domain before returning data.
      • raw: The default option. Returns all data for the thing/sensor without applying any aggregation or grouping.
      • minute: Data is aggregated at a minute level, so you get up to 60 data points if you query for an hour of data.
      • n_minute: Data is aggregated at an 'n'-minute level. E.g. a value of '5_minute' returns up to 12 data points if you query for an hour of data.
      • hour: Data is aggregated at an hourly level.
      • day: Data is aggregated at a daily level.
      • month: Data is aggregated at a monthly level.
    • timezone (optional): The timezone in which to return data. By default this is UTC.
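The start and end times passed to get_thing_data must be Python datetime objects. One way to construct a range covering the last 24 hours (the specific timestamps below are illustrative):

```python
from datetime import datetime, timedelta

# End of the window (illustrative fixed timestamp); in practice you might
# use datetime.utcnow() instead
end_time = datetime(2018, 5, 16, 2, 0, 0)
# Start of the window: 24 hours earlier
start_time = end_time - timedelta(hours=24)
```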

A few examples of how to call the API are given below:

   #Getting Raw Data
   thing_template_key = "972adct84d" # Template key of the things
   thing_keys = ["t7f7b9a3ea","4bc27857f1","c241314e1a"] # Keys of the things whose data we wish to fetch
   metrics = ["pressure", "forging_temp", "job.value"] # List of metrics that we want to fetch
   thing_data_frames = dp.get_thing_data(start_time, end_time, thing_template_key, thing_keys, metrics)
   thing_data_frames.show()
   +--------------------+----------+----------------+--------+------------+-----+
   |                  ts| thing_key|      thing_name|pressure|forging_temp|value|
   +--------------------+----------+----------------+--------+------------+-----+
   |2018-05-16 01:38:...|c241314e1a|Forging Hammer-1|    28.0|      1196.0|  2.0|
   |2018-05-16 01:38:...|t7f7b9a3ea|Forging Hammer-2|    14.0|      1023.0|  3.0|
   |2018-05-16 01:38:...|4bc27857f1|Forging Hammer-3|    21.0|       815.0|  4.0|
   |2018-05-16 01:39:...|c241314e1a|Forging Hammer-1|    11.0|      1065.0|  2.0|
   |2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2|    12.0|       870.0|  2.0|
   |2018-05-16 01:39:...|4bc27857f1|Forging Hammer-3|    22.0|       786.0|  2.0|
   |2018-05-16 01:39:...|c241314e1a|Forging Hammer-1|     7.0|      1133.0|  2.0|
   |2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2|     8.0|       940.0|  4.0|
   +--------------------+----------+----------------+--------+------------+-----+
   #Getting Minute level Data
   metrics = ["pressure", "forging_temp"]
   thing_data_frames = dp.get_thing_data(start_time, end_time, thing_template_key, thing_keys, metrics, timegroup="minute")
   thing_data_frames.show()
   +--------------------+----------+----------------+-----------------+-------------------+-----------------+-----------------+-----------------+-------------+---------------+-------------+-------------+-------------+
   |                  ts| thing_key|      thing_name|forging_temp::sum|forging_temp::count|forging_temp::max|forging_temp::min|forging_temp::avg|pressure::sum|pressure::count|pressure::max|pressure::min|pressure::avg|
   +--------------------+----------+----------------+-----------------+-------------------+-----------------+-----------------+-----------------+-------------+---------------+-------------+-------------+-------------+
   |2018-05-16 01:38:...|t7f7b9a3ea|Forging Hammer-2|           1023.0|                1.0|           1023.0|           1023.0|           1023.0|         14.0|            1.0|         14.0|         14.0|         14.0|
   |2018-05-16 01:38:...|c241314e1a|Forging Hammer-1|           1196.0|                1.0|           1196.0|           1196.0|           1196.0|         28.0|            1.0|         28.0|         28.0|         28.0|
   |2018-05-16 01:38:...|4bc27857f1|Forging Hammer-3|            815.0|                1.0|            815.0|            815.0|            815.0|         21.0|            1.0|         21.0|         21.0|         21.0|
   |2018-05-16 01:39:...|t7f7b9a3ea|Forging Hammer-2|           1810.0|                2.0|            940.0|            870.0|            905.0|         20.0|            2.0|         12.0|          8.0|         10.0|
   |2018-05-16 01:39:...|4bc27857f1|Forging Hammer-3|            786.0|                1.0|            786.0|            786.0|            786.0|         22.0|            1.0|         22.0|         22.0|         22.0|
   |2018-05-16 01:39:...|c241314e1a|Forging Hammer-1|           2198.0|                2.0|           1133.0|           1065.0|           1099.0|         18.0|            2.0|         11.0|          7.0|          9.0|
   |2018-05-16 01:40:...|c241314e1a|Forging Hammer-1|           1148.0|                1.0|           1148.0|           1148.0|           1148.0|         25.0|            1.0|         25.0|         25.0|         25.0|
   |2018-05-16 01:40:...|4bc27857f1|Forging Hammer-3|           1695.0|                2.0|            874.0|            821.0|            847.5|         31.0|            2.0|         19.0|         12.0|         15.5|
   +--------------------+----------+----------------+-----------------+-------------------+-----------------+-----------------+-----------------+-------------+---------------+-------------+-------------+-------------+
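In the aggregated output, each requested metric expands into sum/count/max/min/avg columns, where avg is simply sum divided by count. For example, for Forging Hammer-2 in the 01:39 minute bucket:

```python
# Values taken from the minute-level output above for Forging Hammer-2 at 01:39
forging_temp_sum, forging_temp_count = 1810.0, 2.0
pressure_sum, pressure_count = 20.0, 2.0

forging_temp_avg = forging_temp_sum / forging_temp_count  # matches forging_temp::avg (905.0)
pressure_avg = pressure_sum / pressure_count              # matches pressure::avg (10.0)
```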


DataFrames can also be accessed as SQL tables by registering them as a temporary view (the createOrReplaceTempView call below assumes Spark 2.x; on Spark 1.x use registerTempTable instead):

%pyspark
    table_name = "workcenter_hierarchy"
    thing_data_frames.createOrReplaceTempView(table_name) # expose the DataFrame to SQL queries under this name
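A DataFrame registered as a SQL table can then be queried with ordinary SQL. The sketch below illustrates the kind of query you could run; since Spark is not shown here, it executes the equivalent query with Python's built-in sqlite3 on a few sample rows from the raw output above (in the notebook you would issue the same SQL through Spark instead):

```python
import sqlite3

# Sample rows mirroring the raw thing-data output above (illustrative values)
rows = [
    ("c241314e1a", "Forging Hammer-1", 28.0, 1196.0),
    ("t7f7b9a3ea", "Forging Hammer-2", 14.0, 1023.0),
    ("c241314e1a", "Forging Hammer-1", 11.0, 1065.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workcenter_hierarchy "
    "(thing_key TEXT, thing_name TEXT, pressure REAL, forging_temp REAL)")
conn.executemany("INSERT INTO workcenter_hierarchy VALUES (?, ?, ?, ?)", rows)

# Average pressure per machine -- the same SQL could be run against the
# registered view from the notebook
for name, avg_pressure in conn.execute(
        "SELECT thing_name, AVG(pressure) FROM workcenter_hierarchy "
        "GROUP BY thing_name ORDER BY thing_name"):
    print(name, avg_pressure)
```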

Saving and publishing a Machine Learning Model: This function lets you