Data Search¶
This functionality is in Beta state
The data search API endpoint enables searching for available data sets and retrieving them either in their raw format or pre-processed by an analysis pipeline.
The endpoint is located at:
https://access.hyprint.de/api/app/ext/v1/data/search
and accepts JSON POST requests protected by an Authorization: Bearer TOKEN header as described in API Setup.
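In Python, such an authenticated POST can be prepared like this (a minimal sketch using only the standard library; the token value and the empty request body are placeholders):

```python
import json
import urllib.request

API_URL = "https://access.hyprint.de/api/app/ext/v1/data/search"

def build_search_request(token: str, body: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST for the data search endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example: a request counting all datasets (empty filter body)
req = build_search_request("MY_TOKEN", {})
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) works as with any other JSON API.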
Search Request¶
The search request is a simple JSON document used to set the search parameters and the return format.
The basic schema of the JSON format is:
{
  format: { // Optional, test against the dataset data format
    match: "TEST", // String to use as the test format
    contains: true|false, // default: true, matches datasets whose format property contains the "match" string
    regexp: true|false, // default: false, the "match" string is a regexp to match against the dataset format property
    case: true|false // default: false, when true the match is case sensitive
  },
  forDevices: [{ // Array of matchers tested against the device ID
    // Same schema as the format property
  }],
  timestamp: { // Optional, filter by push timestamp (the preset parameter takes precedence over since/until)
    preset: "day|week|month|year", // optional, select pushes from the current day, week, month or year
    since: timestampEpochMs, // optional, datasets pushed after the provided timestamp in milliseconds
    until: timestampEpochMs // optional, datasets pushed before the provided timestamp in milliseconds
  },
  latestForEachDevice: true|false, // default: false, limit to only the latest dataset for each device ID
  result: { // Request dataset results
    count: true|false, // default: true, only count the datasets matching the filters
    uuidOnly: true|false, // default: false, only return the UUID of each dataset
    timestampOnly: true|false, // default: false, only return the timestamp of each result
    raw: true|false, // default: true, return the raw data of the dataset
    pipelines: [{
      id: "PIPELINE ID" // run the pipeline with this ID on the dataset and return the results
    }]
  }
}
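Putting the pieces together, a request combining a format filter with a timestamp preset and a UUID-only result could look like this (all values are illustrative):

```json
{
    "format": { "match": "temperature" },
    "timestamp": { "preset": "day" },
    "latestForEachDevice": true,
    "result": { "uuidOnly": true }
}
```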
Data Formats¶
The format {...} matcher requires knowledge of the various dataset data formats, which vary depending on the sensor type.
At the moment, the only format available is represented by the string:
gzip+binary:temperature:kelvin:10.6
To search for these datasets, a simple match request with the "temperature" string is enough:
{
  format: { // test against the dataset data format
    match: "temperature"
  }
}
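The matcher options can be modelled client-side as follows (a sketch of the assumed semantics, not the server implementation; in particular, the behaviour of `contains: false` as an exact match is an assumption):

```python
import re

def matches(fmt: str, match: str, contains: bool = True,
            regexp: bool = False, case: bool = False) -> bool:
    """Assumed client-side model of the format matcher options."""
    if regexp:
        # "match" is treated as a regular expression
        flags = 0 if case else re.IGNORECASE
        return re.search(match, fmt, flags) is not None
    if not case:
        fmt, match = fmt.lower(), match.lower()
    # contains: substring test; otherwise assumed to be an exact match
    return match in fmt if contains else fmt == match

fmt = "gzip+binary:temperature:kelvin:10.6"
```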
Requests/Response Examples¶
Common Response format¶
All API responses are wrapped in a common response format that tends to mirror the HTTP status, but can also report application state describing the returned content.
The common JSON response has the following format:
{
  "code": integer, // the response code, the same as the HTTP code most of the time
  "status": "OK", // a status message for the request; "OK" is reported for most successful responses
  "content": object | Array // the actual response payload for the called API endpoint
}
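A small helper can unwrap this envelope before handing the payload to the rest of the application (a sketch; the error handling policy is an assumption):

```python
import json

def unwrap(response_text: str):
    """Unwrap the common response envelope, raising on non-2xx codes."""
    envelope = json.loads(response_text)
    if not 200 <= envelope["code"] < 300:
        raise RuntimeError(f"API error {envelope['code']}: {envelope['status']}")
    return envelope["content"]

content = unwrap('{"code": 200, "status": "OK", "content": {"count": 389}}')
```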
Counting the datasets¶
The search endpoint always returns a count of the found datasets, but it won't return any data unless a result return format was specified.
To count all the datasets, POST the following request, which you can save to a file and load from the command line:
{
}
curl -H@auth_headers -H "Content-Type: application/json" -d @request.json https://access.hyprint.de/api/app/ext/v1/data/search
Response:
{
  "code": 200,
  "status": "OK",
  "content": {
    "count": 389 // For example, there are 389 datasets
  }
}
or count the datasets of the day:
{
"timestamp": {
"preset": "day"
}
}
Warning Only the timestamp filter "day" is available right now
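When the presets do not fit, the since/until fields take epoch timestamps in milliseconds; in Python they can be derived like this (a sketch; the 7-day window is an illustrative choice):

```python
from datetime import datetime, timedelta, timezone

def epoch_ms(dt: datetime) -> int:
    """Convert a datetime to the epoch-milliseconds value expected by since/until."""
    return int(dt.timestamp() * 1000)

# Example: all datasets pushed in the 7 days before a reference date
now = datetime(2021, 4, 26, tzinfo=timezone.utc)
request = {
    "timestamp": {
        "since": epoch_ms(now - timedelta(days=7)),
        "until": epoch_ms(now),
    }
}
```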
To further limit the results to only the latest dataset for each device ID:
{
"latestForEachDevice": true
}
Response:
{
  "code": 200,
  "status": "OK",
  "content": {
    "count": 97 // There are only 97 datasets left
  }
}
Retrieving the latest dataset for a sensor¶
- Take note of the Sensor ID, for example the NFC ID in the Android Application
- In this example we'll assume the sensor ID is 04:42:C2:00:E7:26:0D, replace with your own
First, we can use the count parameter to ensure some data is available:
{
"latestForEachDevice": true,
"forDevices": [{
"match": "04:42:C2:00:E7:26:0D"
}]
}
Response:
{
"code": 200,
"content": {
"count": 1
},
"status": "OK"
}
Now we can request the UUID of the dataset, a unique ID assigned to each dataset:
{
"latestForEachDevice": true,
"forDevices": [
{
"match": "04:42:C2:00:E7:26:0D"
}
],
"result": {
"uuidOnly": true
}
}
Response:
{
"code": 200,
"content": {
"count": 1,
"results": [
{
"forDevice": "uid:hw:nfc/04:42:C2:00:E7:26:0D",
"uuid": "61a319ce-5681-4187-9b05-ce74481a1f3c"
}
]
},
"status": "OK"
}
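Extracting the UUIDs from such a response is a one-liner once the envelope is parsed (a sketch using the example response above):

```python
import json

response_text = """{
  "code": 200,
  "status": "OK",
  "content": {
    "count": 1,
    "results": [
      {"forDevice": "uid:hw:nfc/04:42:C2:00:E7:26:0D",
       "uuid": "61a319ce-5681-4187-9b05-ce74481a1f3c"}
    ]
  }
}"""

# Collect the UUID of every returned result
content = json.loads(response_text)["content"]
uuids = [r["uuid"] for r in content["results"]]
```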
As we can see, the count is still 1, but the UUID can be saved so that further processing can be requested for that specific dataset later on.
To retrieve the raw data, we can now lift the result constraints. The search will then return the raw data by default:
{
  "latestForEachDevice": true,
  "forDevices": [
    {
      "match": "04:42:C2:00:E7:26:0D"
    }
  ],
  "result": {
    "count": false
  }
}
Response:
{
  "code": 200,
  "content": {
    "count": 1,
    "results": [{
      // Device ID and UUID for the result are returned alongside the requested extended data
      "uuid": "UUID",
      "forDevice": "uid:hw:nfc/04:42:C2:00:E7:26:0D",
      "raw": { // DATASET
        ...
        "uuid": "UUID",
        "forDevice": "uid:hw:nfc/04:42:C2:00:E7:26:0D",
        "format": "gzip+binary:temperature:kelvin:10.6",
        ...
        "metadata": {
          "classifiers": [
            // Array of strings tagging the dataset
            // "temperature" should be available for a temperature measuring sensor
          ],
          "descriptors": [ // Array of name+value parameters describing the dataset, for example the hardware version
            {
              "name": NAME, // String with the name of the descriptor
              "value": VALUE // String with the value of the descriptor
            }
          ]
        }
      }
    }]
  }
}
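The name/value descriptor list is easy to fold into a plain dictionary for lookups (a sketch; the descriptor name and value below are illustrative):

```python
def descriptors_to_dict(metadata: dict) -> dict:
    """Fold the descriptor name/value pairs into a plain dict."""
    return {d["name"]: d["value"] for d in metadata.get("descriptors", [])}

# Illustrative metadata object in the shape shown above
metadata = {
    "classifiers": ["temperature"],
    "descriptors": [
        {"name": "hw.version", "value": "HLATL-R01/20.04.01.00"},
    ],
}
info = descriptors_to_dict(metadata)
```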
Retrieving a post-processed data set¶
Warning
This section is incomplete and will be completed shortly once the API has been improved
The raw datasets might not be convenient to use since they may not provide a lot of detail about the data.
The most common usage of the Hyprint data backend thus relies on post-processing the raw sensor data to extract useful information that can be fed to other components, such as a Web UI, a report generator or a custom piece of software.
This first version of the API access module doesn't provide an endpoint to post-process a specific dataset, but the processing can be requested as part of the search request.
The main advantage of this architecture is that pipelines can be improved over time and the data consumers can always re-process raw datasets.
For the purpose of this example, we are going to use a Temperature Analysis pipeline to extract the temperature points and time information that can be used to reconstruct the temperature curve of a measurement:
{
"latestForEachDevice": true,
"forDevices": [
{
"match": "04:42:C2:00:E7:26:0D"
}
],
"result": {
"count": false,
"raw": false, // Remove the raw dataset from the results to limit the result size
"pipelines": [{
"id" : "apps.datacheck.analysis.temperature.v3"
}]
}
}
Response:
{
  "code": 200,
  "content": {
    "count": 1,
    "results": [{
      // Device ID and UUID for the result are returned alongside the requested extended data
      "uuid": "UUID",
      "forDevice": "uid:hw:nfc/04:42:C2:00:E7:26:0D",
      "pipelineResult": { // Result of the temperature analysis pipeline
        "pointSeries": [{
          "points": [ ARRAY of DOUBLE ]
        }],
        // Statistics return extended information about the dataset
        "statistics": [
          ...
          {
            "id": "count.points",
            "name": "Number of points",
            "value": 157
          }
          ...
        ]
      }
    }]
  }
}
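Looking up a single statistic by its id in the returned array can be done with a small helper (a sketch using the example statistic above):

```python
def get_statistic(statistics: list, stat_id: str):
    """Return the value of the statistic with the given id, or None if absent."""
    return next((s["value"] for s in statistics if s["id"] == stat_id), None)

# Illustrative statistics array in the shape shown above
statistics = [
    {"id": "count.points", "name": "Number of points", "value": 157},
]
points = get_statistic(statistics, "count.points")
```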
Note
The Standard return format for pipelines is not documented yet
Optimising the JSON Pipeline return¶
As we have seen in the previous example, the pipeline results can be quite extensive, especially the statistics. This behavior is intentional, since it enables a presentation layer such as a web app or a report generation tool to present the data in a generic manner based on the data characteristics returned by the pipeline.
However, if your application relies on a more compact data format where the data types are known in advance, the return structure can be optimised.
For this purpose, you can add a pipeline processing stage at the end of the chain that re-maps JSON values based on a provided specification.
Let's improve the latest example from the previous section:
{
"latestForEachDevice": true,
"forDevices": [
{
"match": "04:42:C2:00:E7:26:0D"
}
],
"result": {
"count": false,
"raw": false,
"pipelines": [
{
// First pass the dataset through the temperature analysis
"id": "apps.datacheck.analysis.temperature.v3"
},
{
// Second, the temperature analysis result is passed to the json mapper
"id": "com.hyprint.core.json.mapper.v1",
// Metadatas are parameters passed to the Pipeline
"metadatas": [
{
"id": "mapper.format", // mapper.format is the parameter expected by the pipeline to process the input json
"json": {
"mapValues": { // Map values will map every sub-key to a new value based on a set of transformations
"statistics": { // We are mapping the statistics json object
"objectKeysToArray": ["id", "value"], // Each statistic is transformed to an array with the values of the "id" and "value" keys
"toObject": true // Transform the current array of values to an object, using each value's first element as the element key.
}
}
}
}
]
}
]
}
}
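The two mapper operations can be modelled client-side like this (a sketch of the assumed semantics of objectKeysToArray and toObject, not the server implementation; the statistics entries are illustrative):

```python
def object_keys_to_array(obj: dict, keys: list) -> list:
    """objectKeysToArray: reduce an object to an array of the selected key values."""
    return [obj[k] for k in keys]

def to_object(arrays: list) -> dict:
    """toObject: use each array's first element as the key and the second as the value."""
    return {a[0]: a[1] for a in arrays}

# Illustrative statistics array as returned by the temperature analysis pipeline
statistics = [
    {"id": "count.points", "name": "Number of points", "value": 157},
    {"id": "temp.max", "name": "Maximum temperature", "value": 26.62},
]
mapped = to_object([object_keys_to_array(s, ["id", "value"]) for s in statistics])
```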
The statistics element in the response would now be similar to the output below, where the id of each statistic becomes an object key and the value key its value:
"statistics": {
"count.points": 157,
"hw.id": "04:42:C2:00:E7:26:0D",
"hw.version": "HLATL-R01/20.04.01.00",
"temp.max": 26.62,
"temp.min": 24.01,
"time.interval": 2,
"time.push": 1619464121625,
"time.start": 1599470861000
}
Known Limitations¶
- Timestamp filters month and year are not available
- A new filter "day-X", for X days in the past, will be added in the next version to make daily searches easier
- There's no paging to limit the response size