How to Setup a Hyperscale Compliance Job
Pre-checks
You must check the following before starting a job:
- Storage space must be 3 times the size of the source data for NFS storage.
- You must have sufficient storage in the target DB for loading the masked data.
- You must check and increase the size of the temporary tablespace in Oracle. For example, if you have 4 billions rows, then you must use 100G.
-
You must check and provide the required permission (after VDB creation) on empty VDB mounted folder on the Hyperscale VM.
Note
Permission that is granted before VDB creation will not work. It happens so because Continuous Data Engine removes the write permission from VDB mounted folder after VDB creation.
-
Based on the umask value for the user that is used to mount, the permissions for the staging area directory could get altered after the NFS share has been mounted. In such cases, you must re-apply the permissions (i.e 770) on the staging area directory.
- You must restart the
containers/services
after changing the permission on VDB mounted folder in case you already have created the containers. - Continuous Compliance Engine should be cleaned up before use and should only be used with Hyperscale Job. Any other masking job on Continuous Compliance Engine apart from Hyperscale Compliance Engine will impact the performance of Hyperscale Compliance jobs.
- Currently, the Hyperscale Complinace Engine doesn’t provide the ability to allow you to configure masking job’s behaviour in case of non-conformant data and does not process non-conformant data warning from the Delphix Continuous Compliance Engine. Therefore, it is recommended to verify the value of
DefaultNonConformantDataHandling
algorithm group setting on all the Hyperscale Compliance Engines. For more information, refer to the Algorithm Group Settings section. It is recommended to set the value toFAIL
so that Hyperscale Job will also fail instead of leaving the data unmasked. -
If you want to redirect the logs of one or more containers to a particular directory, then you have an option to do the same by setting up a logging directory and exposing the same, as a volume binding, in the
docker-compose.yaml
file. This directory again must have a group ownership id of 50 and a permission mode of 770 as below:volumes:
hyperscale-controller-data:/data:rw
/mnt/hyperscale:/etc/hyperscale
/home/hyperscale_user/logs/controller_service:/opt/delphix/logs
-
If the table that you are masking has column type of BLOB/CLOB, then you must have a minimum of 2GB memory per CLOB/BLOB column. Depending upon unload-split you are using, you may need to increase this memory in multiple of that. For example, if you have 4 tables (each with 1 column as BLOB/CLOB type) and unload-split is 3, then your memory requirement on the Hyperscale Compliance host will be:
(4(no. of tables) x 2(memory required per CLOB/BLOB column) x 3(unload-split used)GB + 16 GB (minimum required memory for running Hyperscale Compliance Engine) = 40 GB approx
.
API Flow to Setup a Hyperscale Compliance Job
The following is the API flow for setting up and executing a Hyperscale Compliance job.
- Register Continuous Compliance Engine(s)
- Create a Mount Point
- Create Connector Info
- Create a Dataset
- Create a Job
- Create Execution
The following are the sample API requests/responses for a typical Hyperscale Compliance job execution workflow. The APIs can be accessed using a swagger based API client by accessing url https://<hyperscale-compliance-host-address>/hyperscale-compliance
.
Note
APIs must be called only in the below order.
Engines API
POST /engines (Register an engine):
Request:
{
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"password": "password123"
}
Response:
{
"id": 1,
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"ssl": true,
"ssl_hostname_check": true
}
MountFileSystems API
POST /mount-filesystems (Create a File Mount)
Request:
{
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
Response:
{
"id": 1,
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
ConnectorInfo API
POST /connector-info (Create Connector Info for hyperscale compliance)
Request:
{
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user",
"password": "password123"
}
}
Response:
{
"id": 1,
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user"
}
}
Warning
A failure in the load or pre/post load steps (disabling/enabling constraints, triggers, etc.) may leave the target database in an inconsistent state since the load step truncates the target tables when it begins. If the source and target connectors are configured to be the same database/tables, a best practice is to restore the single database from a backup after a failure since the source database may be in an inconsistent state (rather than only the target database).
DataSets API
POST /data-sets (Create DataSet for hyperscale compliance)
Request (With Single Table):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4
},
"target": {
"schema_name": "SCHEMA_1_TARGET",
"table_name": "TABLE_1_TARGET",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
Response (With Single Table):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4
},
"target": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
Request (With multiple tables):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS" -->(optional field, this needs to be added only while working with date/time masking)
}
]
}
]
}
Response (With multiple tables):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}
Note
Algorithm and Domain names to be provided in Data Set request should be used from Continuous Compliance Engine. The Continuous Compliance Engine APIs that could be used to get these names are:
-
Get all algorithms (GET /algorithms) for Algorithm Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllAlgorithms
-
Get all domains (GET /domains) for Domain Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllDomains
To check about extra parameters that need to be provided in the Data Set request for Date and Multi Column Algorithms, refer to Model DataSet_masking_inventory on Hyperscale Compliance API Documentation page available in API Reference section of this Documentation.
Jobs API
POST /jobs (Create a Hyperscale Compliance Job)
Request:
{
"name": "job_1",
"masking_engine_ids": [
1,2,3
],
"data_set_id": 1,
"app_name_prefix": "app_1",
"env_name_prefix": "env_1",
"retain_execution_data": “NO”,
"masking_job_config": {
"max_memory": 2048,
"min_memory": 1024,
"description": "Job created by Hyperscale Masking",
"feedback_size": 100000,
"stream_row_limit": 10000,
"num_input_streams": 1
}
}
Note
For more information on retain_execution_data
flag, see Cleaning Up Execution Data.
Response:
{
"id": 1,
"name": "job_1",
"masking_engine_ids": [
1,
2,
3
],
"data_set_id": 1,
"app_name_prefix": "app_1",
"env_name_prefix": "env_1",
"retain_execution_data": “NO”,
"masking_job_config": {
"feedback_size": "100000",
"min_memory": "1024",
"description": "Job created by Hyperscale Masking",
"stream_row_limit": "10000",
"max_memory": "2048",
"num_input_streams": "1"
}
}
JobExecution API
POST /executions (Create an execution of a Hyperscale Job)
Request:
{
"job_id": 1
}
Response: (Immediate response will be like below. Realtime response can be fetched using GET /executions/{execution_id} endpoint)
{
"id": 1,
"job_id": 1,
"status": "RUNNING",
"create_time": "2022-06-14T12:46:54.139452",
"tasks": [
{
"name": "Unload"
},
{
"name": "Masking"
},
{
"name": "Load"
},
{
"name": "Post Load"
}
]
}
GET /executions/{execution_id} (Returns the Job Execution by execution_id)
Request:
iD: 1
Response:
{
"id": 1,
"job_id": 1,
"status": "SUCCEEDED",
"create_time": "2022-06-10T11:58:39.385186",
"end_time": "2022-06-10T11:59:26.030750",
"tasks": [
{
"name": "Unload",
"status": "SUCCEEDED",
"start_time": "2022-06-10T11:58:39.401906",
"end_time": "2022-06-10T11:58:46.042788",
"metadata": [
{
"source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
"unloaded_rows": 5,
"total_rows": 5
}
]
},
{
"name": "Masking",
"status": "SUCCEEDED",
"start_time": "2022-06-10T11:58:39.666638",
"end_time": "2022-06-10T11:59:16.034657",
"metadata": [
{
"source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
"masked_rows": 5,
"total_rows": 5
}
]
},
{
"name": "Load",
"status": "SUCCEEDED",
"start_time": "2022-06-10T11:59:07.236429",
"end_time": "2022-06-10T11:59:16.064497",
"metadata": [
{
"source_key": "SCHEMA_1_TARGET.TABLE_1_TARGET",
"loaded_rows": 5,
"total_rows": 5
}
]
},
{
"name": "Post Load",
"status": "SUCCEEDED",
"start_time": "2022-06-10T11:59:16.072760",
"end_time": "2022-06-10T11:59:16.072760"
}
]
}
-
Only in case of execution failure, the below API can be used to restart the execution:
PUT /executions/{execution_id}/restart
(Restart a failed execution) -
Below API can be used only for manually cleaning up the execution:
DELETE /executions/{execution_id}
(Clean-up the execution)