How to Set Up a Hyperscale Compliance Job

Pre-checks

You must check the following before starting a job:

For NFS storage, the available storage space must be 3 times the size of the source data.
You must have sufficient storage in the target database for loading the masked data.
You must check and increase the size of the temporary tablespace in Oracle. For example, if you have 4 billion rows, then you must use 100 GB.
You must check and provide the required permissions on the empty VDB mounted folder on the Hyperscale VM after the VDB has been created.

Note

Permissions granted before VDB creation will not work, because the Continuous Data Engine removes the write permission from the VDB mounted folder after VDB creation.
Depending on the umask value of the user that performs the mount, the permissions of the staging area directory could be altered after the NFS share has been mounted. In such cases, you must re-apply the permissions (i.e., 770) on the staging area directory.
If you have already created the containers, you must restart the containers/services after changing the permissions on the VDB mounted folder.
The Continuous Compliance Engine should be cleaned up before use and should be used only for Hyperscale jobs. Any other masking job running on the Continuous Compliance Engine will impact the performance of Hyperscale Compliance jobs.
Currently, the Hyperscale Compliance Engine does not let you configure a masking job's behaviour for non-conformant data and does not process non-conformant data warnings from the Delphix Continuous Compliance Engine. Therefore, it is recommended to verify the value of the DefaultNonConformantDataHandling algorithm group setting on all the Hyperscale Compliance Engines. For more information, refer to the Algorithm Group Settings section. It is recommended to set the value to FAIL so that the Hyperscale job also fails instead of leaving the data unmasked.
If you want to redirect the logs of one or more containers to a particular directory, you can set up a logging directory and expose it as a volume binding in the docker-compose.yaml file. This directory must also have a group ownership ID of 50 and a permission mode of 770, as shown below:

volumes:
- hyperscale-controller-data:/data:rw
- /mnt/hyperscale:/etc/hyperscale
- /home/hyperscale_user/logs/controller_service:/opt/delphix/logs
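The group ownership and permission mode required for this directory can be checked from a script before the containers are started. The following is a minimal sketch, not a Delphix tool: the function name check_log_dir is hypothetical, and it relies only on the Python standard library.

```python
import os
import stat


def check_log_dir(path, expected_gid=50, expected_mode=0o770):
    """Return a list of human-readable problems found for `path`.

    The defaults (group ID 50, mode 770) are the values this page requires;
    the function itself is an illustrative helper, not part of the product.
    """
    problems = []
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        problems.append(f"{path} is not a directory")
    if info.st_gid != expected_gid:
        problems.append(f"group ID is {info.st_gid}, expected {expected_gid}")
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if mode != expected_mode:
        problems.append(f"mode is {mode:o}, expected {expected_mode:o}")
    return problems
```

Calling check_log_dir("/home/hyperscale_user/logs/controller_service") returns an empty list when the directory meets both requirements. The same check can be applied to the staging area directory after the NFS share has been mounted.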
If the table that you are masking has columns of type BLOB/CLOB, then you must have a minimum of 2 GB of memory per CLOB/BLOB column. Depending on the unload-split you are using, you may need to increase this memory in multiples of that value. For example, if you have 4 tables (each with 1 column of BLOB/CLOB type) and unload-split is 3, then the memory requirement on the Hyperscale Compliance host will be: (4 (no. of tables) x 2 GB (memory required per CLOB/BLOB column) x 3 (unload-split used)) + 16 GB (minimum required memory for running the Hyperscale Compliance Engine) = 40 GB approx.
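The memory arithmetic for BLOB/CLOB columns can be written as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, it takes the total number of BLOB/CLOB columns across all tables (the example has one such column per table), and it assumes the same unload-split for every table.

```python
def estimate_host_memory_gb(lob_columns, unload_split, gb_per_lob_column=2, base_gb=16):
    """Estimate Hyperscale Compliance host memory in GB.

    lob_columns: total number of BLOB/CLOB columns across the tables being masked.
    gb_per_lob_column and base_gb default to the figures quoted on this page.
    """
    return lob_columns * gb_per_lob_column * unload_split + base_gb


# The example from the text: 4 tables with 1 LOB column each, unload-split of 3.
print(estimate_host_memory_gb(lob_columns=4, unload_split=3))  # 40
```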

API Flow to Set Up a Hyperscale Compliance Job

The following is the API flow for setting up and executing a Hyperscale Compliance job.

Register Continuous Compliance Engine(s)
Create a Mount Point
Create Connector Info
Create a Dataset
Create a Job
Create Execution

The following are sample API requests/responses for a typical Hyperscale Compliance job execution workflow. The APIs can be accessed using a Swagger-based API client at the URL https://<hyperscale-compliance-host-address>/hyperscale-compliance.

Note

The APIs must be called only in the order listed below.
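The order matters because each request refers to ids returned by earlier ones. The dependency chain can be sketched as follows, assuming a caller-supplied post(path, body) function that sends the JSON body to the given path and returns the parsed JSON response. The function name run_hyperscale_flow and the single-engine setup are illustrative choices, not part of the product.

```python
def run_hyperscale_flow(post, engine, mount, connector, data_info, job):
    """Call the setup APIs in the required order, passing each returned id forward.

    `post(path, body)` is supplied by the caller; how it connects and
    authenticates is outside the scope of this sketch.
    """
    engine_id = post("/engines", engine)["id"]
    mount_id = post("/mount-filesystems", mount)["id"]
    connector_id = post("/connector-info", connector)["id"]
    data_set_id = post("/data-sets", {
        "connector_id": connector_id,
        "mount_filesystem_id": mount_id,
        "data_info": data_info,
    })["id"]
    # The job ties the registered engine(s) to the dataset.
    job_body = dict(job, masking_engine_ids=[engine_id], data_set_id=data_set_id)
    job_id = post("/jobs", job_body)["id"]
    # Creating an execution starts the job.
    return post("/executions", {"job_id": job_id})
```

Passing post in as a parameter keeps the sketch independent of any particular HTTP library and makes the ordering easy to exercise with a stub.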

Engines API

POST /engines (Register an engine):

Request:

{
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"password": "password123"
}

Response:

{
"id": 1,
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"ssl": true,
"ssl_hostname_check": true
}
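Outside of a Swagger client, the same request could be prepared from a script. The sketch below uses only the Python standard library and builds the request without sending it. The helper name build_post is hypothetical, and because authentication is not covered in this section, any required headers are left to the caller through extra_headers.

```python
import json
import urllib.request

# Placeholder host, as written on this page; replace with your own address.
BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"


def build_post(path, body, base_url=BASE_URL, extra_headers=None):
    """Build (but do not send) a JSON POST request for a Hyperscale Compliance path."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    headers.update(extra_headers or {})  # e.g. whatever auth header your deployment requires
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(base_url + path, data=data, headers=headers, method="POST")


engine = {
    "name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
    "type": "MASKING",
    "protocol": "http",
    "hostname": "de-6014-continuous-compliance.delphix.com",
    "username": "hyperscale_compliance_user",
    "password": "password123",
}
request = build_post("/engines", engine)
# Sending it would look like: json.load(urllib.request.urlopen(request))
```

Note that the sample response above does not echo the password back.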

MountFileSystems API

POST /mount-filesystems (Create a File Mount)

Request:

{
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}

Response:

{
"id": 1,
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}

ConnectorInfo API

POST /connector-info (Create Connector Info for Hyperscale Compliance)

Request:

{
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user",
"password": "password123"
}
}

Response:

{
"id": 1,
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user"
}
}

Warning

A failure in the load or pre/post load steps (disabling/enabling constraints, triggers, etc.) may leave the target database in an inconsistent state, since the load step truncates the target tables when it begins. If the source and target connectors are configured to be the same database/tables, the source database (and not only the target database) may be left in an inconsistent state, so a best practice is to restore that single database from a backup after a failure.

DataSets API

POST /data-sets (Create DataSet for Hyperscale Compliance)

Request (With Single Table):

{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1_TARGET",
    "table_name": "TABLE_1_TARGET",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Response (With Single Table):

{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1_TARGET",
    "table_name": "TABLE_1_TARGET",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Request (With multiple tables):

{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}

Note

The date_format field is optional. It needs to be added only when working with date/time masking.

Response (With multiple tables):

{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}
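When a dataset covers many tables, the data_info array is easier to generate than to write by hand. The following sketch rebuilds the multiple-table request above; the helper names table_entry and masked_field are hypothetical, and defaulting the target names to the source names is a convenience assumed here, not an API rule.

```python
def table_entry(schema, table, inventory, unload_split, stream_size,
                target_schema=None, target_table=None):
    """Build one data_info element; the target defaults to the source names."""
    return {
        "source": {"schema_name": schema, "table_name": table,
                   "unload_split": unload_split},
        "target": {
            "schema_name": target_schema or schema,
            "table_name": target_table or table,
            "stream_size": stream_size,
        },
        "masking_inventory": inventory,
    }


def masked_field(field_name, domain_name, algorithm_name, date_format=None):
    """Build one masking_inventory element; date_format is added only when given."""
    field = {"field_name": field_name, "domain_name": domain_name,
             "algorithm_name": algorithm_name}
    if date_format is not None:
        field["date_format"] = date_format
    return field


data_set = {
    "connector_id": 1,
    "mount_filesystem_id": 1,
    "data_info": [
        table_entry("DLPXDBORA", "test_multi_0",
                    [masked_field("col_VARCHAR", "FIRST_NAME", "FirstNameLookup")],
                    unload_split=2, stream_size=65536),
        table_entry("DLPXDBORA", "test_multi_1",
                    [masked_field("COL_TIMESTAMP", "DOB", "DateShiftVariable",
                                  date_format="yyyy-MM-dd HH:mm:ss.SSS")],
                    unload_split=2, stream_size=65536),
    ],
}
```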

Note

The algorithm and domain names provided in the Data Set request must be taken from the Continuous Compliance Engine. The following Continuous Compliance Engine APIs can be used to get these names:

Get all algorithms (GET /algorithms) for Algorithm Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllAlgorithms
Get all domains (GET /domains) for Domain Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllDomains

For the extra parameters that must be provided in the Data Set request for Date and Multi Column algorithms, refer to the DataSet_masking_inventory model on the Hyperscale Compliance API Documentation page, available in the API Reference section of this documentation.
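Once the algorithm and domain names have been collected from the Continuous Compliance Engine, a Data Set request can be checked against them before it is submitted. The sketch below assumes the names are already available as Python sets; the response format of GET /algorithms and GET /domains is not shown on this page, so parsing it is deliberately left out. The function name unknown_names is hypothetical.

```python
def unknown_names(data_set, known_algorithms, known_domains):
    """Return (field_name, kind, name) tuples for names the engine does not define."""
    issues = []
    for entry in data_set["data_info"]:
        for field in entry["masking_inventory"]:
            if field["algorithm_name"] not in known_algorithms:
                issues.append((field["field_name"], "algorithm", field["algorithm_name"]))
            if field["domain_name"] not in known_domains:
                issues.append((field["field_name"], "domain", field["domain_name"]))
    return issues
```

An empty result means every algorithm and domain referenced in the masking inventory exists in the supplied sets.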

Jobs API

POST /jobs (Create a Hyperscale Compliance Job)

Request:

{
"name": "job_1",
"masking_engine_ids": [
1, 2, 3
],
"data_set_id": 1,
"app_name_prefix": "app_1",
"env_name_prefix": "env_1",
"retain_execution_data": "NO",
"masking_job_config": {
"max_memory": 2048,
"min_memory": 1024,
"description": "Job created by Hyperscale Masking",
"feedback_size": 100000,
"stream_row_limit": 10000,
"num_input_streams": 1
}
}

Note

For more information on the retain_execution_data flag, see Cleaning Up Execution Data.

Response:

{
"id": 1,
"name": "job_1",
"masking_engine_ids": [
1,
2,
3
],
"data_set_id": 1,
"app_name_prefix": "app_1",
"env_name_prefix": "env_1",
"retain_execution_data": "NO",
"masking_job_config": {
"feedback_size": "100000",
"min_memory": "1024",
"description": "Job created by Hyperscale Masking",
"stream_row_limit": "10000",
"max_memory": "2048",
"num_input_streams": "1"
}
}

JobExecution API

POST /executions (Create an execution of a Hyperscale Job)

Request:

{
"job_id": 1
}

Response: (The immediate response looks like the example below. The real-time status can be fetched using the GET /executions/{execution_id} endpoint.)

{
"id": 1,
"job_id": 1,
"status": "RUNNING",
"create_time": "2022-06-14T12:46:54.139452",
"tasks": [
{
  "name": "Unload"
},
{
  "name": "Masking"
},
{
  "name": "Load"
},
{
  "name": "Post Load"
}
]
}

GET /executions/{execution_id} (Returns the Job Execution by execution_id)

Request:

execution_id: 1

Response:

{
"id": 1,
"job_id": 1,
"status": "SUCCEEDED",
"create_time": "2022-06-10T11:58:39.385186",
"end_time": "2022-06-10T11:59:26.030750",
"tasks": [
{
  "name": "Unload",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:58:39.401906",
  "end_time": "2022-06-10T11:58:46.042788",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "unloaded_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Masking",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:58:39.666638",
  "end_time": "2022-06-10T11:59:16.034657",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "masked_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Load",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:59:07.236429",
  "end_time": "2022-06-10T11:59:16.064497",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "loaded_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Post Load",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:59:16.072760",
  "end_time": "2022-06-10T11:59:16.072760"
}
]
}
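Because POST /executions returns while the job is still running, a script typically polls GET /executions/{execution_id} until the status changes. The following is a minimal sketch, assuming a caller-supplied get_execution function that performs the GET and returns the parsed JSON. It treats any status other than RUNNING as final, which is an assumption: only RUNNING and SUCCEEDED appear on this page. Both function names are hypothetical.

```python
import time


def wait_for_execution(get_execution, execution_id, poll_seconds=10, sleep=time.sleep):
    """Poll until the execution is no longer RUNNING, then return the last response."""
    while True:
        execution = get_execution(execution_id)
        if execution["status"] != "RUNNING":  # assumption: anything else is final
            return execution
        sleep(poll_seconds)


def incomplete_tables(execution):
    """List (task, source_key) pairs whose processed row count differs from total_rows."""
    row_keys = ("unloaded_rows", "masked_rows", "loaded_rows")
    mismatches = []
    for task in execution.get("tasks", []):
        # Tasks such as Post Load carry no metadata in the sample response.
        for item in task.get("metadata", []):
            done = next((item[k] for k in row_keys if k in item), None)
            if done != item.get("total_rows"):
                mismatches.append((task["name"], item["source_key"]))
    return mismatches
```

For the sample response above, incomplete_tables returns an empty list, since every task reports 5 of 5 rows.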

If an execution fails, and only in that case, the following API can be used to restart it:

PUT /executions/{execution_id}/restart (Restart a failed execution)

The following API can be used only for manually cleaning up an execution:

DELETE /executions/{execution_id} (Clean-up the execution)
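The rule that restart applies only to failed executions can be encoded in a small helper that decides which request, if any, to issue. This is a sketch only: the function names are hypothetical, and "FAILED" is an assumed status string, since this page shows only RUNNING and SUCCEEDED.

```python
def restart_request(execution, failed_statuses=("FAILED",)):
    """Return the (method, path) to restart a failed execution, or None otherwise.

    "FAILED" is an assumed status value; adjust failed_statuses to match
    what your Hyperscale Compliance version actually reports.
    """
    if execution["status"] in failed_statuses:
        return ("PUT", f"/executions/{execution['id']}/restart")
    return None


def cleanup_request(execution_id):
    """Return the (method, path) for manually cleaning up an execution."""
    return ("DELETE", f"/executions/{execution_id}")
```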