Hyperscale Compliance is an API-based interface designed to improve the performance of masking large datasets. It lets you achieve faster masking results with the existing Delphix Continuous Compliance offering, without the added complexity of configuring multiple jobs. Hyperscale Compliance first breaks large, complex datasets into numerous smaller modules and then orchestrates the masking jobs across multiple Continuous Compliance Engines. In general, datasets larger than 10 TB will see improved masking performance when run on the Hyperscale architecture.
Hyperscale Compliance Deployment Architecture
To achieve faster masking results, Hyperscale Compliance uses the data sources' own bulk export and import utilities. With these utilities, it exports the data as smaller chunks of delimited files. The Hyperscale Compliance Engine then configures masking jobs for the chunks and distributes them across multiple Continuous Compliance Engines. After the masking jobs complete successfully, the masked data is imported back into the database.
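The export, mask, and import flow can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not Delphix code: an in-memory list of rows stands in for a bulk export, `mask_chunk()` is an invented placeholder for one masking job on a Continuous Compliance Engine, and a thread pool stands in for the engine cluster.

```python
"""Hypothetical sketch of the export -> mask -> import flow.

Nothing here is a Delphix API: `rows` stands in for a bulk export and
mask_chunk() stands in for a masking job on one Continuous Compliance Engine.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Row = List[str]


def split_into_chunks(rows: List[Row], chunk_size: int) -> List[List[Row]]:
    """Break an exported dataset into fixed-size chunks (one per masking job)."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def mask_chunk(chunk: List[Row]) -> List[Row]:
    """Hypothetical masking job: replace the second column with a fixed token."""
    return [[row[0], "MASKED"] + row[2:] for row in chunk]


def run_pipeline(rows: List[Row], chunk_size: int, engines: int,
                 masker: Callable[[List[Row]], List[Row]] = mask_chunk) -> List[Row]:
    chunks = split_into_chunks(rows, chunk_size)
    # Run one masking job per chunk, with at most `engines` jobs in flight.
    with ThreadPoolExecutor(max_workers=engines) as pool:
        masked_chunks = list(pool.map(masker, chunks))
    # "Import": reassemble the masked chunks into a single dataset.
    return [row for chunk in masked_chunks for row in chunk]


source = [[str(i), f"name{i}"] for i in range(10)]
masked = run_pipeline(source, chunk_size=3, engines=2)
print(len(masked), masked[0])  # 10 ['0', 'MASKED']
```

Because `ThreadPoolExecutor.map` yields results in submission order, the masked chunks are reassembled in their original order even if the jobs finish in a different order.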
Hyperscale Compliance Components
The Hyperscale Compliance architecture consists of four main components: the Hyperscale Compliance Engine, the Source/Target Connectors, the Continuous Compliance Engine Cluster, and the Staging Server.
Hyperscale Compliance Engine
The Hyperscale Compliance Engine unloads data from the source and scales the masking process horizontally by running multiple masking jobs in parallel across the nodes of the Continuous Compliance Engine cluster. Once the data is masked, it loads the data back into the target data sources. By changing the number of nodes in the cluster, you can increase or decrease the total throughput of an individual masking job. When the source and target data sources are relational databases, the engine also handles the pre-load tasks (disabling indexes, triggers, and constraints) and the post-load tasks (re-enabling indexes, triggers, and constraints). Currently, the Hyperscale Compliance Engine supports the following two strategies for distributing masking jobs across the available nodes:
- Intelligent Load Balancing (Default): This strategy considers each Continuous Compliance Engine's current capacity before assigning masking jobs to it. It calculates capacity from the resources available on each node engine and the masking jobs already running on it.
- Round Robin Load Balancing: This strategy distributes the masking jobs across all node Continuous Compliance Engines in round-robin order, without considering their current load.
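The two distribution strategies can be contrasted with a small sketch. The `Engine` class, its `max_jobs` field, and the free-capacity formula are hypothetical simplifications chosen for illustration; they are not how Intelligent Load Balancing actually measures an engine's resources.

```python
"""Hypothetical sketch of the two job-distribution strategies.

The Engine fields and the capacity formula are assumptions made for
illustration; they are not how Hyperscale Compliance measures capacity.
"""
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass
class Engine:
    name: str
    max_jobs: int      # assumed stand-in for the engine's available resources
    running_jobs: int  # masking jobs already running on the engine

    def free_capacity(self) -> int:
        return self.max_jobs - self.running_jobs


def round_robin(engines: List[Engine], jobs: List[str]) -> Dict[str, str]:
    """Assign jobs to engines in turn, ignoring their current load."""
    return {job: engine.name for job, engine in zip(jobs, cycle(engines))}


def capacity_based(engines: List[Engine], jobs: List[str]) -> Dict[str, str]:
    """Assign each job to the engine with the most free capacity at that moment.

    A real scheduler would queue jobs when every engine is full; this sketch
    simply picks the least-loaded engine.
    """
    assignment = {}
    for job in jobs:
        target = max(engines, key=lambda e: e.free_capacity())
        target.running_jobs += 1  # the new job consumes capacity on that engine
        assignment[job] = target.name
    return assignment


jobs = ["job1", "job2", "job3", "job4"]
rr = round_robin([Engine("e1", 4, 3), Engine("e2", 4, 0)], jobs)
cb = capacity_based([Engine("e1", 4, 3), Engine("e2", 4, 0)], jobs)
print(rr)  # {'job1': 'e1', 'job2': 'e2', 'job3': 'e1', 'job4': 'e2'}
print(cb)  # {'job1': 'e2', 'job2': 'e2', 'job3': 'e2', 'job4': 'e1'}
```

With `e1` nearly full, round robin still sends it half of the jobs, while the capacity-based assignment sends most of them to the idle `e2`.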
Staging Area
The Staging Area is where the Hyperscale Compliance Engine unloads data from the source system of record (SOR) into a series of files. It can be any file system that supports the NFS protocol. The file system can be backed by attached volumes, or it can be supplied through the Delphix Continuous Data Engine empty VDB feature. In either case, there must be enough storage to hold the dataset in uncompressed form. The Continuous Compliance Engine cluster must also be able to access the staging area in order to mask the files.
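Because the staging area must hold the full dataset uncompressed, a pre-flight space check can be worth running. This is a minimal sketch assuming the staging area is mounted locally on the machine running the check; `staging_has_room()` is a hypothetical helper and its 10% headroom is an illustrative choice, not a Delphix requirement.

```python
"""Hypothetical pre-flight check for the staging area's free space.

Assumes the staging area (for example, an NFS mount) is mounted locally and
that the dataset's uncompressed size is known. The 10% headroom is an
illustrative assumption, not a Delphix requirement.
"""
import shutil
import tempfile


def staging_has_room(staging_path: str, uncompressed_bytes: int,
                     headroom: float = 0.10) -> bool:
    """Return True if free space covers the uncompressed dataset plus headroom."""
    free = shutil.disk_usage(staging_path).free
    return free >= uncompressed_bytes * (1 + headroom)


# Example against the temp directory; in practice pass the staging mount point.
print(staging_has_room(tempfile.gettempdir(), 1024**3))  # 1 GiB dataset
```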
Continuous Compliance Engine Cluster
The Continuous Compliance Engine Cluster is a group of Delphix Continuous Compliance Engines (version 22.214.171.124 and later) that the Hyperscale Compliance Engine uses to run large masking jobs in parallel. For procedures to install and configure Continuous Compliance Engines, see the Continuous Compliance Documentation.
Source and Target Data Sources
The Hyperscale Compliance Engine is responsible for unloading data from the source datasource into a series of files in the staging area. It requires network access to the source from the host where it runs, as well as credentials to run the appropriate unload commands. After the files are masked, the masked data is loaded into the target datasource.
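The unload step can be illustrated with Python's built-in `sqlite3` module standing in for the source datasource. In this hypothetical sketch, `unload_table()`, the pipe delimiter, the rows-per-file limit, and the file naming scheme are all assumptions; Hyperscale Compliance itself relies on each data source's own bulk unload utility.

```python
"""Hypothetical sketch of unloading one table into chunked delimited files.

sqlite3 stands in for the source datasource's bulk unload utility; the
table name, chunk size, and file naming scheme are illustrative assumptions.
"""
import csv
import os
import sqlite3
import tempfile
from typing import List


def unload_table(conn: sqlite3.Connection, table: str, staging_dir: str,
                 rows_per_file: int) -> List[str]:
    """Write `table` to staging_dir as pipe-delimited files of at most rows_per_file rows."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted input here
    paths: List[str] = []
    while True:
        batch = cursor.fetchmany(rows_per_file)
        if not batch:
            break
        path = os.path.join(staging_dir, f"{table}_{len(paths):05d}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f, delimiter="|").writerows(batch)
        paths.append(path)
    return paths


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(7)])
with tempfile.TemporaryDirectory() as staging:
    files = unload_table(conn, "customers", staging, rows_per_file=3)
    with open(files[0], newline="") as f:
        first = f.read().splitlines()
    print([os.path.basename(p) for p in files])
    # ['customers_00000.csv', 'customers_00001.csv', 'customers_00002.csv']
```

Each file then corresponds to one unit of masking work that can be handed to a Continuous Compliance Engine.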
For Oracle, a failure during the load may leave the target datasource in an inconsistent state, because the load step truncates the target when it begins. If the source and target are configured as the same datasource, a best practice is to restore that datasource from a backup after a failure, because the source data itself (not only the target) may then be in an inconsistent state.
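The following sketch shows why a failed load can leave the target inconsistent: the truncate runs first, then each chunk is committed on its own, so a failure partway through leaves the target with neither its original rows nor the complete masked set. It uses `sqlite3`, with `DELETE` standing in for a truncate, and `load_target()` is a hypothetical helper; this is not the Oracle load path that Hyperscale Compliance runs.

```python
"""Hypothetical illustration of a truncate-then-load failure.

sqlite3 and a DELETE-based "truncate" are stand-ins; load_target() is not
the load path Hyperscale Compliance actually uses for Oracle.
"""
import sqlite3
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, Optional[str]]


def load_target(conn: sqlite3.Connection, chunks: Sequence[List[Row]]) -> None:
    conn.execute("DELETE FROM target")  # "truncate" runs before any data is loaded
    conn.commit()
    for chunk in chunks:
        try:
            conn.executemany("INSERT INTO target VALUES (?, ?)", chunk)
            conn.commit()  # each chunk is committed independently
        except sqlite3.IntegrityError:
            conn.rollback()  # undoes only the failing chunk
            raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT NOT NULL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(i, "original") for i in range(4)])
conn.commit()

# The second chunk contains a row that violates NOT NULL, so the load fails midway.
chunks = [[(0, "masked"), (1, "masked")], [(2, "masked"), (3, None)]]
try:
    load_target(conn, chunks)
except sqlite3.IntegrityError as exc:
    print("load failed:", exc)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 2
```

The original four rows are gone and only the first masked chunk survived. If the source and target were the same datasource, the unmasked originals would be lost too, which is why restoring from a backup is the recommended recovery.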
The Continuous Compliance Platform
Delphix Continuous Compliance is a multi-user, browser-based web application that provides complete, secure, and scalable software for your sensitive data discovery, masking, and tokenization needs while meeting enterprise-class infrastructure requirements. To learn more about Continuous Compliance features and architecture, see the Continuous Compliance Documentation.