How to Setup a Hyperscale Compliance Job
Pre-checks
You must check the following before starting a job:
- Storage space must be 2 times the size of the source data for NFS storage.
- You must have sufficient storage in the target DB for loading the masked data.
- You must check and increase the size of the temporary tablespace in Oracle. For example, if you have 4 billions rows, then you must use 100G.
- 
You must check and provide the required permission (after VDB creation) on empty VDB mounted folder on the Hyperscale VM. Note Permission that is granted before VDB creation will not work. It happens so because Continuous Data Engine removes the write permission from VDB mounted folder after VDB creation. 
- 
Based on the umask value for the user that is used to mount, the permissions for the staging area directory could get altered after the NFS share has been mounted. In such cases, you must re-apply the permissions (i.e 770 or 700) on the staging area directory. 
- You must restart the containers/servicesafter changing the permission on VDB mounted folder in case you already have created the containers.
- Continuous Compliance Engine should be cleaned up before use and should only be used with Hyperscale Job. Any other masking job on Continuous Compliance Engine apart from Hyperscale Compliance Engine will impact the performance of Hyperscale Compliance jobs.
- Currently, the Hyperscale Complinace Engine doesn’t provide the ability to allow you to configure masking job’s behaviour in case of non-conformant data and does not process non-conformant data warning from the Delphix Continuous Compliance Engine. Therefore, it is recommended to  verify the value of  DefaultNonConformantDataHandlingalgorithm group setting on all the Hyperscale Compliance Engines. For more information, refer to the Algorithm Group Settings section. It is recommended to set the value toFAILso that Hyperscale Job will also fail instead of leaving the data unmasked.
- 
If you want to redirect the logs of one or more containers to a particular directory, then you have an option to do the same by setting up a logging directory and exposing the same, as a volume binding, in the docker-compose.yamlfile. TThis directory again must have a group ownership id of 50 with a permission mode of 770 or a user id of 65436 with a permission of 700 as below:volumes: - hyperscale-controller-data:/data:rw
- /mnt/hyperscale:/etc/hyperscale
- /home/hyperscale_user/logs/controller_service:/opt/delphix/logs
 
- 
If the table that you are masking has column type of BLOB/CLOB, then you must have a minimum of 2GB memory per CLOB/BLOB column. Depending upon unload-split you are using, you may need to increase this memory in multiple of that. For example, if you have 4 tables (each with 1 column as BLOB/CLOB type) and unload-split is 3, then your memory requirement on the Hyperscale Compliance host will be: (4(no. of tables) x 2(memory required per CLOB/BLOB column) x 3(unload-split used)GB + 16 GB (minimum required memory for running Hyperscale Compliance Engine) = 40 GB approx.
API Flow to Setup a Hyperscale Compliance Job
The following is the API flow for setting up and executing a Hyperscale Compliance job.
- Register Continuous Compliance Engine(s)
- Create a Mount Point
- Create Connector Info
- Create a Dataset
- Create a Job
- Create Execution
The following are the sample API requests/responses for a typical Hyperscale Compliance job execution workflow. The APIs can be accessed using a swagger based API client by accessing url https://<hyperscale-compliance-host-address>/hyperscale-compliance.
Note
- APIs must be called only in the below order.
Engines API
POST /engines (Register an engine):
Request:
{
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"password": "password123"
}
Response:
{
"id": 1,
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"ssl": true,
"ssl_hostname_check": true
}
MountFileSystems API
POST /mount-filesystems (Create a File Mount)
Request:
{
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
Response:
{
"id": 1,
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
ConnectorInfo API
POST /connector-info (Create Connector Info for hyperscale compliance)
Request:
{
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user",
"password": "password123"
}
}
Response:
{
"id": 1,
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user"
}
}
Warning
A failure in the load or pre/post load steps (disabling/enabling constraints, triggers etc.) may leave the target database in an inconsistent state since the load step truncates the target tables when it begins. If the source and target connectors are configured to be the same database/tables, a best practice is to restore the single database from a backup after a failure since the source database may be in an inconsistent state (rather than only the target database).
DataSets API
Info
Table and schema names are case sensitive.
POST /data-sets (Create DataSet for hyperscale compliance)
Request (With Single Table):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1_TARGET",
    "table_name": "TABLE_1_TARGET",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}
Response (With Single Table):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}
Request (With multiple tables):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS" -->(optional field, this needs to be added only while working with date/time masking)
}
]
}
]
}
Response (With multiple tables):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}
Note
Algorithm and Domain names to be provided in Data Set request should be used from Continuous Compliance Engine. The Continuous Compliance Engine APIs that could be used to get these names are:
- 
Get all algorithms (GET /algorithms) for Algorithm Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllAlgorithms 
- 
Get all domains (GET /domains) for Domain Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllDomains 
To check about extra parameters that need to be provided in the Data Set request for Date and Multi Column Algorithms, refer to Model DataSet_masking_inventory on Hyperscale Compliance API Documentation page available in API Reference section of this Documentation.
Jobs API
POST /jobs (Create a Hyperscale Compliance Job)
Request:
{
  "name": "Test_Job",
  "masking_engine_ids": [
    1,2,3
  ],
  "data_set_id": 1,
  "app_name_prefix": "Test_App",
  "env_name_prefix": "Test_Env",
  "retain_execution_data": "NO",
  "source_configs": {
    "max_concurrent_source_connection": 30
  },
  "target_configs": {
    "max_concurrent_target_connection": 30
  },
  "masking_job_config": {
    "max_memory": 1024,
    "min_memory": 1024,
    "description": "Job created by Hyperscale Masking",
    "feedback_size": 100000,
    "stream_row_limit": 10000,
    "num_input_streams": 1
  }
}
Note
For more information on retain_execution_data flag, see Cleaning Up Execution Data.
Response:
{
  "id": 1,
  "name": "Test_Job",
  "masking_engine_ids": [
    1,
    2,
    3
  ],
  "data_set_id": 1,
  "app_name_prefix": "Test_App",
  "env_name_prefix": "Test_Env",
  "retain_execution_data": "NO",
  "source_configs": {
    "max_concurrent_source_connection": 30
  },
  "target_configs": {
    "max_concurrent_target_connection": 30
  },
  "masking_job_config": {
    "max_memory": 1024,
    "min_memory": 1024,
    "description": "Job created by Hyperscale Masking",
    "feedback_size": 100000,
    "stream_row_limit": 10000,
    "num_input_streams": 1
  }
}
JobExecution API
POST /executions (Create an execution of a Hyperscale Job)
Request:
{
"job_id": 1
}
Response: (Immediate response will be like below. Realtime response can be fetched using GET /executions/{execution_id} endpoint)
{
"id": 1,
"job_id": 1,
"status": "RUNNING",
"create_time": "2022-06-14T12:46:54.139452",
"tasks": [
{
  "name": "Unload"
},
{
  "name": "Masking"
},
{
  "name": "Load"
},
{
  "name": "Post Load"
}
]
}
GET /executions/{execution_id} (Returns the Job Execution by execution_id)
Note
The execution response may initially return approximate number of rows at the start of execution and provide actual values latter during the execution.
Request:
iD: 1
Response:
{
"id": 1,
"job_id": 1,
"status": "SUCCEEDED",
"create_time": "2022-06-10T11:58:39.385186",
"end_time": "2022-06-10T11:59:26.030750",
"tasks": [
{
  "name": "Unload",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:58:39.401906",
  "end_time": "2022-06-10T11:58:46.042788",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "unloaded_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Masking",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:58:39.666638",
  "end_time": "2022-06-10T11:59:16.034657",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "masked_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Load",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:59:07.236429",
  "end_time": "2022-06-10T11:59:16.064497",
  "metadata": [
    {
      "source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
      "loaded_rows": 5,
      "total_rows": 5
    }
  ]
},
{
  "name": "Post Load",
  "status": "SUCCEEDED",
  "start_time": "2022-06-10T11:59:16.072760",
  "end_time": "2022-06-10T11:59:16.072760"
}
]
}
- 
Only in case of execution failure, the below API can be used to restart the execution: PUT /executions/{execution_id}/restart(Restart a failed execution)
- 
Below API can be used only for manually cleaning up the execution: DELETE /executions/{execution_id}(Clean-up the execution)