Thursday, January 5, 2012

HDS Hitachi Content Platform (HCP)

Back in 2007, HDS purchased a small software company named Archivas, which had been founded by some ex-EMC folks. They set out to create a product that was better than EMC's Centera archive platform, and after the HDS acquisition it became known as the HDS HCAP platform. It was designed specifically as an archive platform for fixed content, answering the need to comply with regulations requiring certain industries to retain immutable copies of their data. The HCAP platform provided client access via standard network file-sharing protocols such as CIFS and NFS.
HDS realized that the market wanted more than just a strict compliance platform. Falling storage costs for SATA and nearline disk gave organizations the cost justification to use online archives/repositories rather than cutting tapes. And with the maturation of virtualization technology, why purchase multiple silos or platforms? Hence, HDS removed the "A" from HCAP and created HCP.

HCP can still provide the strict compliance archive functionality of its predecessor, but the new version adds virtual entities for sharing the platform across business units and groups, and even allows for service provisioning. The virtual entities consist of two levels: Tenants and Name Spaces. A Tenant provides an administration domain for its Name Spaces, and the Name Spaces provide data space for authenticated users.

The supreme user for the platform is the Service Console Administrator, who manages the physical attributes of the platform and creates Tenants. Upon Tenant creation, an admin user is specified along with a quota of storage space. From there, the Tenant administrator can create Name Spaces with quotas carved from the Tenant quota. The Tenant administrator also manages Data Access Accounts, which are the actual end-user accounts used to authenticate users storing and retrieving data in a Name Space. Quotas and access rights are inherited from the parent and can be reduced at each level, so each administration level can tweak the permissions it grants to its children. Note that the Service Console Administrator does not have access to data in the Name Spaces, so security is maintained within a Tenant.
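To make the hierarchy concrete, here is a minimal sketch of how quotas might be carved from a Tenant down to its Name Spaces. This is purely an illustration of the relationship described above, not HCP code; the class and attribute names are hypothetical.

```python
class Namespace:
    """Data space used by authenticated Data Access Accounts."""
    def __init__(self, name, quota_gb):
        self.name = name
        self.quota_gb = quota_gb


class Tenant:
    """Administration domain created by the Service Console Administrator."""
    def __init__(self, name, quota_gb):
        self.name = name
        self.quota_gb = quota_gb      # total space granted to the Tenant
        self.namespaces = {}

    def allocated_gb(self):
        return sum(ns.quota_gb for ns in self.namespaces.values())

    def create_namespace(self, name, quota_gb):
        # A Name Space quota must fit inside the remaining Tenant quota.
        if self.allocated_gb() + quota_gb > self.quota_gb:
            raise ValueError("Name Space quota exceeds remaining Tenant quota")
        ns = Namespace(name, quota_gb)
        self.namespaces[name] = ns
        return ns


# Example: a 10 TB Tenant carved into two Name Spaces.
finance = Tenant("finance", quota_gb=10240)
finance.create_namespace("invoices", quota_gb=4096)
finance.create_namespace("contracts", quota_gb=2048)
```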

Two operating modes are available for Tenants: a strict Compliance mode and the default Enterprise mode. Compliance mode offers functionality similar to the original HCAP, while Enterprise mode offers more flexibility for a repository-type implementation. In Enterprise mode, a "Privileged Delete" function can be passed down to allow a Tenant administrator to delete files that are still under retention. All activities are recorded, and the privileged delete function requires that a description of the activity be entered into the log. On that note, the log is never truncated or deleted, so a permanent record exists for all administrative activities.
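As a rough illustration of that behavior (again, not HCP code), a privileged delete can be thought of as an operation that refuses to run without a justification and always appends to an audit log that is never truncated:

```python
import datetime

AUDIT_LOG = "admin_audit.log"  # append-only record of administrative activity (hypothetical file)

def privileged_delete(object_path: str, reason: str, admin: str) -> None:
    """Delete an object under retention, but only with a logged justification."""
    if not reason.strip():
        raise ValueError("Privileged Delete requires a description of the activity")
    with open(AUDIT_LOG, "a") as log:  # appended to, never truncated
        log.write(f"{datetime.datetime.utcnow().isoformat()}Z "
                  f"PRIVILEGED_DELETE {object_path} by {admin}: {reason}\n")
    # ... actual removal of the retained object would happen here ...
```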

In order to provide the virtual functionality, HDS had to make a tradeoff on the access methods available. Traditional CIFS and NFS access is provided for the "Default Name Space," which mimics the original HCAP implementation. Access to the Tenant Name Spaces, however, is limited to the REST API over HTTP or HTTPS. REST (Representational State Transfer) is an architectural style for client/server communication, implemented here over HTTP(S). HDS provides a few utilities with the platform that can migrate and copy data to HCP via the REST interface; otherwise, you need an application that can communicate with HCP. Many popular backup and archive packages support communication with HCP. In addition, HDS provides a product named Hitachi Data Ingestor (HDI), which presents a traditional CIFS/NFS interface but communicates with the HCP via HTTPS/REST. The HDI appliance serves as a cache and can be used for remote-site to central-site archiving.
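For a feel of what REST access looks like, here is a minimal sketch of storing and retrieving an object in a Tenant Name Space over HTTPS using Python's requests library. The hostname, URL layout, and authentication header below are placeholders for this example; the actual endpoint format and credentials depend on your HCP version and configuration.

```python
import requests

# Assumed endpoint layout: https://<namespace>.<tenant>.<hcp-domain>/rest/<object-path>
# Both the URL format and the auth header are placeholders, not HCP documentation.
BASE = "https://mynamespace.mytenant.hcp.example.com/rest"
HEADERS = {"Authorization": "<data-access-account-credentials>"}  # placeholder

# Store a file as an object in the Name Space.
with open("report.pdf", "rb") as f:
    resp = requests.put(f"{BASE}/archive/2012/report.pdf",
                        data=f, headers=HEADERS, verify=True)
    resp.raise_for_status()

# Read the same object back.
resp = requests.get(f"{BASE}/archive/2012/report.pdf", headers=HEADERS, verify=True)
resp.raise_for_status()
with open("report_copy.pdf", "wb") as f:
    f.write(resp.content)
```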

So how does HCP store data and ensure its integrity? In general, two types of HCP appliances are available: the HCP300 and the HCP500. Each is built on Hitachi's CR220 2U rackmount server. The HCP300 uses the server's internal drive bays for the repository data and appliance boot. In contrast, the HCP500 has no internal disks and uses a Hitachi disk array for both boot and repository storage. The HCP500 appliance version comes with an AMS2000 series array, while the diskless version relies on customer-provided Hitachi storage. The HCP500 is a little more cumbersome to set up, but it provides more scaling and flexibility. Both versions checksum the data and keep two copies of the checksum; in the event of a disagreement, the data or checksum is repaired if possible. In replication configurations (one HCP replicating to another, local or remote), corrupt data blocks can be recovered/repaired from the remote HCP system.
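The general idea behind that checksum protection can be sketched as follows. This is not HCP's internal format or hash choice, just an illustration of how keeping two checksum copies lets the system decide which side of a mismatch is bad:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    # SHA-256 is used here only for illustration; HCP's internal hashing may differ.
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, checksum_a: str, checksum_b: str) -> str:
    """Compare stored data against two independently kept checksum copies."""
    current = fingerprint(data)
    if current == checksum_a == checksum_b:
        return "data and both checksums agree"
    if checksum_a == checksum_b and current != checksum_a:
        return "data block is suspect; repair from a replica if one exists"
    if current in (checksum_a, checksum_b):
        return "one checksum copy is suspect; rewrite it from the good copy"
    return "unresolvable locally; recover from the replicated HCP system"
```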

In order to locate and manage data, HCP provides a search function via Search Nodes, which can be added to the cluster as needed to scale the search index. The search function allows a Data Access Account user with search permission to locate files using a variety of metadata attributes and perform subsequent operations on the result set.

As for scaling, the HCP300 scales horizontally by adding nodes, whereas the HCP500 can scale by adding nodes or by mapping additional storage to existing nodes. The HCP300 requires that two copies of all data be stored, whereas the HCP500 requires only one copy thanks to the RAID protection in the AMS2000 array; up to four copies can be kept depending on user requirements. Both versions are configured as "clusters." Failure of a single node in either configuration still allows data access, and even multiple node failures can be survived unless the wrong combination of nodes goes down. In the HCP300, a multiple-node failure will most likely make some data inaccessible, while unaffected data remains available. In the HCP500, multiple node failures can bring the entire cluster down due to loss of quorum, depending on the number of nodes in the configuration.

So, what about the CLOUD? As discussed above, the HCP300 or HCP500 can provide a central repository for remote locations using HDI as a gateway. Beyond being just a gateway, the Hitachi HDI appliance provides local caching, compression and encryption. Put all the pieces together and you have an internal storage cloud that is secure, searchable and maintains the necessary compliance attributes.
