Data models and schemas

As stated in the Architecture section, the STH uses a MongoDB database to store its data.

The STH component stores both raw and aggregated data about how the values of the attributes of interest evolve over time.

The STH supports 3 main data models or functioning modes. Each functioning mode uses its own schemas to store the associated raw and aggregated data.

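The first data model or functioning mode is the so-called "Collection per Service Path". As its name denotes, one MongoDB collection is created per service path for the raw data, and another one for the aggregated data associated with the entities and entity attributes that belong to that service path.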

In the case of the raw data collection, the documents it includes follow the schema shown by the following raw data document example:

{
  "_id": "57f4d8657905c024630c41dc",
  "recvTime": "2016-10-05T10:39:33.291Z",
  "entityId": "Entity:001",
  "entityType": "Entity",
  "attrName": "attribute:numeric:001",
  "attrType": "Number",
  "attrValue": "333"
}

Regarding the aggregated data, the documents in the aggregated data collection follow the schema shown by the following example (associated with the previous raw data entry) for numeric attribute values. Note that the resolution in this example is day:

{
  "_id": {
    "attrName": "attribute:numeric:001",
    "attrType": "Number",
    "entityId": "Entity:001",
    "entityType": "Entity",
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "sum": 333,
      "sum2": 110889,
      "min": 333,
      "max": 333
    }
  ]
}
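The relationship between the raw data entry and this aggregated point can be sketched as follows. This is only an illustration, not the STH implementation: the helper names `day_bucket` and `numeric_point` are hypothetical, and the rule that, for the day resolution, `origin` is the start of the month and `offset` is the day of the month is an assumption inferred from the example values above.

```python
from datetime import datetime, timezone


def day_bucket(recv_time):
    """Return (origin, offset) for the "day" resolution.

    Assumption inferred from the example documents: origin is the first
    instant of the month and offset is the day of the month.
    """
    ts = datetime.strptime(recv_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    ts = ts.replace(tzinfo=timezone.utc)
    origin = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return origin.strftime("%Y-%m-%dT%H:%M:%SZ"), ts.day


def numeric_point(offset, values):
    """Build one entry of "points" from the raw values seen in that offset."""
    numbers = [float(v) for v in values]  # raw attrValue may arrive as a string
    return {
        "offset": offset,
        "samples": len(numbers),
        "sum": sum(numbers),
        "sum2": sum(n * n for n in numbers),  # sum of squares
        "min": min(numbers),
        "max": max(numbers),
    }


raw = {"recvTime": "2016-10-05T10:39:33.291Z", "attrValue": "333"}
origin, offset = day_bucket(raw["recvTime"])
point = numeric_point(offset, [raw["attrValue"]])
print(origin, point)
```

With the single raw value "333", the sketch reproduces the figures of the example: one sample, a sum of 333 and a sum of squares of 110889.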

For textual attribute values, an example aggregated data document is the following one, which includes the number of occurrences of each textual value the attribute has taken over time:

{
  "_id": {
    "attrName": "attribute:textual:001",
    "attrType": "Text",
    "entityId": "Entity:001",
    "entityType": "Entity",
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "occur": {
        "text1": 1,
        "text2": 4,
        "text3": 7
      }
    }
  ]
}
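As a rough illustration of what the `occur` field holds, the following sketch counts how many times each textual value appears in a list of values. The function name `occurrences` and the sample values are hypothetical and only serve the example. How the STH itself maintains these counters is not described here.

```python
from collections import Counter


def occurrences(values):
    """Map each distinct textual value to the number of times it appears."""
    return dict(Counter(values))


# Hypothetical textual values received within the same offset.
occur = occurrences(["text1", "text2", "text2", "text1", "text3"])
print(occur)
```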

The second data model or functioning mode is the so-called "Collection per Entity". As its name denotes, one MongoDB collection is created per entity for the raw data, and another one for the aggregated data associated with the entity's attributes.

In the case of the raw data collection, the documents it includes follow the schema shown by the following raw data document example:

{
  "_id": "57f4d8657905c024630c41dc",
  "recvTime": "2016-10-05T10:39:33.291Z",
  "attrName": "attribute:numeric:001",
  "attrType": "Number",
  "attrValue": "333"
}

Regarding the aggregated data, the documents in the aggregated data collection follow the schema shown by the following example (associated with the previous raw data entry) for numeric attribute values. Note that the resolution in this example is day:

{
  "_id": {
    "attrName": "attribute:numeric:001",
    "attrType": "Number",
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "sum": 333,
      "sum2": 110889,
      "min": 333,
      "max": 333
    }
  ]
}
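Because each point keeps `samples`, `sum` and `sum2` (the sum of squares), a consumer of the aggregated data can derive the mean and the population variance without going back to the raw data. The sketch below shows the arithmetic. The function name `point_stats` is hypothetical and is not part of the STH.

```python
import math


def point_stats(point):
    """Derive mean, population variance and standard deviation of a point."""
    n = point["samples"]
    mean = point["sum"] / n
    # E[x^2] - (E[x])^2, clamped at 0 to absorb floating-point rounding.
    variance = max(point["sum2"] / n - mean * mean, 0.0)
    return {"mean": mean, "variance": variance, "stddev": math.sqrt(variance)}


# The point from the example document: a single sample of 333.
single = point_stats({"samples": 1, "sum": 333, "sum2": 110889})

# A hypothetical point built from the two values 2 and 4.
pair = point_stats({"samples": 2, "sum": 6, "sum2": 20})
print(single, pair)
```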

For textual attribute values, an example aggregated data document is the following one, which includes the number of occurrences of each textual value the attribute has taken over time:

{
  "_id": {
    "attrName": "attribute:textual:001",
    "attrType": "Text",
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "occur": {
        "text1": 1,
        "text2": 4,
        "text3": 7
      }
    }
  ]
}

The third and final data model or functioning mode is the so-called "Collection per Attribute". As its name denotes, one MongoDB collection is created per attribute for the raw data, and another one for the aggregated data associated with the attribute.

In the case of the raw data collection, the documents it includes follow the schema shown by the following raw data document example:

{
  "_id": "57f4d8657905c024630c41dc",
  "recvTime": "2016-10-05T10:39:33.291Z",
  "attrType": "Number",
  "attrValue": "333"
}

Regarding the aggregated data, the documents in the aggregated data collection follow the schema shown by the following example (associated with the previous raw data entry) for numeric attribute values. Note that the resolution in this example is day:

{
  "_id": {
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "sum": 333,
      "sum2": 110889,
      "min": 333,
      "max": 333
    }
  ]
}

For textual attribute values, an example aggregated data document is the following one, which includes the number of occurrences of each textual value the attribute has taken over time:

{
  "_id": {
    "origin": "2016-10-01T00:00:00Z",
    "resolution" : "day"
  },
  "points": [
    {
      "offset": 5,
      "samples": 1,
      "occur": {
        "text1": 1,
        "text2": 4,
        "text3": 7
      }
    }
  ]
}
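Comparing the raw data examples of the three data models, the narrower the scope of a collection, the fewer identifying fields each document carries, since the collection itself already implies the entity or the attribute. The sketch below summarises this. The function name `raw_document` and the model labels used as dictionary keys are assumptions made for illustration, and the `_id` field is left out because MongoDB generates it.

```python
# Identifying fields kept in each raw data document, per data model.
# The model labels are illustrative names, not confirmed configuration values.
KEY_FIELDS_BY_MODEL = {
    "collection-per-service-path": ("entityId", "entityType", "attrName"),
    "collection-per-entity": ("attrName",),
    "collection-per-attribute": (),
}


def raw_document(event, data_model):
    """Build the raw data document stored for an event under a data model."""
    doc = {"recvTime": event["recvTime"]}
    for field in KEY_FIELDS_BY_MODEL[data_model]:
        doc[field] = event[field]
    doc["attrType"] = event["attrType"]
    doc["attrValue"] = event["attrValue"]
    return doc


event = {
    "recvTime": "2016-10-05T10:39:33.291Z",
    "entityId": "Entity:001",
    "entityType": "Entity",
    "attrName": "attribute:numeric:001",
    "attrType": "Number",
    "attrValue": "333",
}
for model in KEY_FIELDS_BY_MODEL:
    print(model, list(raw_document(event, model)))
```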

Note that the data model or functioning mode can be selected via the DATA_MODEL environment variable or the config.database.dataModel property of the config.js file.
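The selection logic can be pictured with the following sketch. It rests on two assumptions that are not stated in this section and should be checked against the STH configuration documentation: that the accepted values are the three strings listed in `SUPPORTED_MODELS`, and that the environment variable takes precedence over the value from config.js. The function name `resolve_data_model` is hypothetical.

```python
import os

# Assumed values for DATA_MODEL / config.database.dataModel.
SUPPORTED_MODELS = (
    "collection-per-service-path",
    "collection-per-entity",
    "collection-per-attribute",
)


def resolve_data_model(config_value, env=None):
    """Pick the data model, letting DATA_MODEL override the config value."""
    env = os.environ if env is None else env
    value = env.get("DATA_MODEL", config_value)
    if value not in SUPPORTED_MODELS:
        raise ValueError("unsupported data model: %r" % (value,))
    return value


# The config.js value is used when the environment variable is not set.
from_config = resolve_data_model("collection-per-entity", env={})
# The environment variable wins when both are present (assumed precedence).
from_env = resolve_data_model(
    "collection-per-entity", env={"DATA_MODEL": "collection-per-attribute"}
)
print(from_config, from_env)
```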