With the proliferation and easy availability of cloud computing resources, it is inevitable that lawyers and legal IT departments will start using clouds. It is also inevitable that they will ask many questions about cloud security. This paper deals with general cloud security, as well as with the specific security requirements that lawyers and corporate legal and IT departments usually deal with.
We specifically discuss Amazon Web Services (AWS) and SHMcloud (an open source solution for eDiscovery) from SHMsoft. At the end we provide a Q&A section.
What is a “cloud?”
We will define cloud computing as elastic computing resources. A perfect example of this definition is the Elastic Compute Cloud (EC2) provided by Amazon Web Services (AWS). Its characteristics are:
- Easy availability of computing resources: you can provision any amount of storage and any number of computing instances within a matter of minutes;
- Self-service: you can manipulate your computing resources through the browser in your AWS account, or programmatically, without a formal provisioning process, phone calls, or human involvement;
- Pay-as-you-go: you pay only for the computing resources you use, while you use them.
Since AWS is one of the most prevalent providers, and since SHMcloud is primarily implemented on EC2, we will limit our discussion to these two services. However, the general approach, conclusions, and recommendations are applicable to other cloud services.
Cloud security today
The IT departments of corporations and law firms used to be in complete control of their web-accessible applications, or at least the responsibility rested with them. Now the situation is changing. With the availability of cloud resources, and with multiple reasons driving cloud adoption, these same people often have to rely on the security furnished by the cloud provider.
A recent Forbes article, Data Privacy And The Cloud: Fact Versus Fiction, highlights the growing understanding and adoption of the cloud in general.
Amazon’s EC2 - the one we concentrate upon here - sets very high standards for security protection. It also recommends, and in some cases mandates, best security practices. However, it is in large measure up to the users to make sure that they use the EC2-provided security correctly. In the case of applications built on top of EC2, the users also need to verify that the application designers have implemented Amazon’s guidelines. If any additional security measures have been used, they need to be documented, and in some cases subjected to an audit.
Components of an EC2 application
There are three components in an EC2 application: the Amazon Machine Image (AMI) that you run, in one or multiple copies; security key pairs; and security groups.
The AMI is what you run, and what matters here is the source from which you get it. If you use a community-provided AMI, it falls on you to check for backdoors that may have been left by the creators. You would need to delete credentials and remove certificate and key material. About 30-40% of all community AMIs have some form of backdoor left behind, most likely by oversight. However it happened, the AMI that you run must be clean.
If you are using a Marketplace AMI, then you can be sure that the Amazon Marketplace team has taken these steps before placing the AMI into general availability through the Marketplace. Thus, for example, all of the AMIs used by SHMcloud are guaranteed by Amazon to be secure in this sense, and free of backdoors.
A key pair is a pair of private and public keys. The public part is stored on Amazon, and it allows you to access your AMIs, provided that it matches the private key that is stored on your machine. The recommended practice here is for every person to generate and use his/her own key pair. This generation is easy with Amazon, and doing it right has multiple advantages: better security, better accountability, and easier reassignment of rights when a person leaves the company.
A security group can be viewed as a firewall to the application. It lists the user-defined (that is, your) access rules for ingress and egress. The security group includes its name, protocol (TCP, UDP), to and from ports, and the source (where the traffic is coming from).
There can be many security groups (up to 500), and there are some known problems with them, such as the use of a memcache server. The best way to deal with this is to have a person responsible for security group setup, to have an auditor, and to use automation, such as the Scout tool, https://github.com/iSECPartners/scout, to check for common security group problems. Whatever the method, you need to highlight potentially dangerous security groups, and compare what is actually there to what should be there.
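The two checks just described, flagging dangerous rules and comparing intended rules with deployed ones, can be sketched in a few lines of Python. This is a minimal illustration and not how Scout works: the Rule type, the list of sensitive ports, and the function names are our own assumptions, and the rules are assumed to have been exported from AWS already.

```python
from dataclasses import dataclass

# Ports treated as sensitive in this sketch (SSH, RDP, MySQL, memcached).
# The list is an illustrative assumption, not an AWS recommendation.
SENSITIVE_PORTS = {22, 3389, 3306, 11211}
ANYWHERE = "0.0.0.0/0"


@dataclass(frozen=True)
class Rule:
    """One ingress rule: protocol, port range, and source CIDR (hypothetical type)."""
    protocol: str
    from_port: int
    to_port: int
    source: str


def open_sensitive_rules(rules):
    """Return the rules that expose a sensitive port to any source address."""
    flagged = []
    for rule in rules:
        exposed = any(rule.from_port <= p <= rule.to_port for p in SENSITIVE_PORTS)
        if rule.source == ANYWHERE and exposed:
            flagged.append(rule)
    return flagged


def drift(expected, actual):
    """Compare the intended rule set with the deployed one."""
    expected, actual = set(expected), set(actual)
    return {"missing": expected - actual, "unexpected": actual - expected}


if __name__ == "__main__":
    intended = [Rule("tcp", 443, 443, ANYWHERE), Rule("tcp", 22, 22, "203.0.113.0/24")]
    deployed = [Rule("tcp", 443, 443, ANYWHERE), Rule("tcp", 22, 22, ANYWHERE)]
    for rule in open_sensitive_rules(deployed):
        print("dangerous:", rule)
    print(drift(intended, deployed))
```

In this example the deployed SSH rule is open to the whole Internet, so it is reported both as dangerous and as a deviation from the intended setup.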
Simple Storage Service - S3
S3 is the storage part of AWS in general, and of SHMcloud in particular. It has the following three security mechanisms:
- ACL, or Access Control List;
- Bucket policy; and
- IAM (Identity and Access Management) policy.
Let’s look at each one separately.
ACLs work together with bucket and IAM security.
With bucket-level permissions, one can set fine-grained permissions, down to the level of specific objects. Bucket policies can be made more granular still, with specific actions and conditions. For example, one can enforce permissions based on object size.
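To make "specific actions and conditions" concrete, here is a hedged sketch of a bucket policy document built in Python. The bucket name and network range are hypothetical, and the policy shown (object reads allowed only from one IP range) is just one possible example, not a policy that SHMcloud is known to use.

```python
import json


def make_bucket_policy(bucket, allowed_cidr):
    """Build a policy that allows object reads only from one network range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromOfficeNetwork",
                "Effect": "Allow",
                "Principal": "*",
                # The specific action being granted.
                "Action": ["s3:GetObject"],
                # All objects in the (hypothetical) bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The condition under which the grant applies.
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidr}},
            }
        ],
    }


if __name__ == "__main__":
    policy = make_bucket_policy("example-case-files", "203.0.113.0/24")
    print(json.dumps(policy, indent=2))
```

The resulting JSON is what would be attached to the bucket; the Action and Condition elements are where the granularity comes from.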
IAM, or Identity and Access Management, is the way to have multiple users under the same AWS account. The best practice for using it is outlined below:
- Create an IAM principal and attach IAM policies to it;
- Create departments and buckets, set policies;
- Attach users to departments.
Since IAM includes identity federation, it is a convenient and powerful way to manage user and group permissions and controls.
Access Logs allow you to verify how the data is being accessed. They can be used for security auditing.
IAM allows for coarse-grained permissions: read, write, etc. The grantee can be a human user, or a system user (software agent).
Encryption is another layer of security protection for your data. The Amazon AMI images are already encrypted, to guarantee that Amazon’s employees do not have access to your data. However, a targeted hacking attempt or a human error could still lead to data exposure. To mitigate this risk, sensitive data should additionally be encrypted.
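The relationship between users, departments, and policies can be pictured with a toy model. The sketch below is only an illustration of the structure: the department names, user names, and helper functions are hypothetical, and real IAM policy evaluation (explicit denies, resources, conditions) is considerably richer than a set union.

```python
# Policies attached to each department (group); names are hypothetical.
GROUP_POLICIES = {
    "litigation": {"s3:GetObject", "s3:PutObject"},
    "review": {"s3:GetObject"},
}

# Users attached to departments; names are hypothetical.
USER_GROUPS = {
    "alice": ["litigation"],
    "bob": ["review"],
}


def allowed_actions(user):
    """Collect every action granted to the user through their departments."""
    actions = set()
    for group in USER_GROUPS.get(user, []):
        actions |= GROUP_POLICIES.get(group, set())
    return actions


def is_allowed(user, action):
    """Toy check: an action is allowed only if some department grants it."""
    return action in allowed_actions(user)


if __name__ == "__main__":
    print(is_allowed("alice", "s3:PutObject"))
    print(is_allowed("bob", "s3:PutObject"))
```

The point of the structure is that reassigning a person means moving them between departments, without touching the policies themselves.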
There are two approaches to data encryption: client-side and server-side.
With server-side encryption, Amazon manages the keys and encrypts with AES-256. This is more convenient to implement. In this scheme, objects are encrypted, not buckets. Furthermore, there is no need to manage keys, and the risk is transferred to AWS.
With client-side encryption, you manage the keys, using the AWS SDK. There is an additional implementation load, but one has even greater flexibility. Also, the chain of custody includes only known elements and excludes AWS as a third party.
With encryption, as with all other levels of security, it is a recommended practice to use automated tools (such as Scout) to verify and enforce encryption.
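As a rough idea of what such automated verification might look like, the sketch below scans object metadata for objects that do not report AES-256 server-side encryption. It assumes the metadata has already been fetched and is given as a list of dictionaries with Key and ServerSideEncryption fields; that shape is our assumption for the example, and the function name is hypothetical.

```python
def unencrypted_keys(objects):
    """Return the keys of objects that do not report AES-256 server-side encryption."""
    return [
        obj["Key"]
        for obj in objects
        if obj.get("ServerSideEncryption") != "AES256"
    ]


if __name__ == "__main__":
    # Hypothetical metadata, as it might look after being collected from S3.
    listing = [
        {"Key": "case-001/email.pst", "ServerSideEncryption": "AES256"},
        {"Key": "case-001/notes.txt"},
    ]
    for key in unencrypted_keys(listing):
        print("not encrypted on the server side:", key)
```

Note that objects encrypted only on the client side would also be reported by this check, so a real audit would need a way to recognize them.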
Questions often asked by lawyers
As we have seen, the AWS platform has all the necessary elements to implement the best practices of security in web-based applications. SHMcloud, an eDiscovery system based on AWS, implements all of these best practices.
Of course, SHMcloud can be deployed internally, in a hosted or internal computing center, and then it will carry over all of its security practices to this implementation.
In addition, below is a list of questions that law practitioners usually ask about cloud-based eDiscovery, as it relates to specific areas of legal responsibility and various geographical jurisdictions. We have also included some questions related to cost/benefit analysis.
How well is my data protected against accidental loss?
S3 stores all its data with a replication factor of 3. Currently S3 stores one billion new objects daily. Yet, since the beginning of its operation in 2002, AWS has not had a documented case of customer data loss. Given the public nature of all outages, this is a remarkable record.
Some lawsuits require storing data for years. Can S3 accommodate this?
S3 has no time limit on data storage. In addition, you can implement selective backup for the important information.
The price of 20 cents/month/GB of data can be quite high. Is there a way to mitigate this cost?
Amazon Glacier stores the data at 1/20 of this cost, at 1 cent/month/GB. You can think of it as inexpensive long-term backup.
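The difference adds up over a multi-year retention period. The sketch below works through the arithmetic using only the two prices quoted above; the 500 GB data size and the three-year period are made-up inputs for illustration, and prices are kept in whole cents to avoid rounding surprises.

```python
# Prices quoted in the text, in cents per GB per month.
S3_CENTS_PER_GB_MONTH = 20
GLACIER_CENTS_PER_GB_MONTH = 1


def storage_cost_dollars(gigabytes, months, cents_per_gb_month):
    """Total storage cost in dollars for a fixed data size held for some months."""
    return gigabytes * months * cents_per_gb_month / 100


if __name__ == "__main__":
    size_gb, retention_months = 500, 36  # hypothetical case: 500 GB kept for 3 years
    s3 = storage_cost_dollars(size_gb, retention_months, S3_CENTS_PER_GB_MONTH)
    glacier = storage_cost_dollars(size_gb, retention_months, GLACIER_CENTS_PER_GB_MONTH)
    print(f"S3: ${s3:,.2f}  Glacier: ${glacier:,.2f}")
```

For this hypothetical case the totals are $3,600 on S3 versus $180 on Glacier, the same factor of 20 as the per-GB prices.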
What about human errors, such as deletions?
There is no complete protection against human errors, but best practices help mitigate this risk. These include storing multiple copies of the important information, some of them with read-only permissions for all but the project administrator.
How can I make sure that my data is indeed deleted after the necessary retention period is finished?
There are multiple measures that you can take:
- Delete the data from S3 and shut down your EC2 instances. This has the effect of erasing the data from your hard drive, only more so. In the case of a local hard drive, there is undelete and forensic restore. By contrast, in the case of S3 and EC2, the data is encrypted by Amazon in the first place, so now it is essentially gone;
- If on top of that you used your own encryption, the data is unrecoverable;
- You can overwrite your data with other, bogus data. This is not needed, but some people like to run a PC Eraser type program a number of times, for their comfort, and this has about the same effect here.
Some jurisdictions, such as the European Union, impose data locality requirements, for example that the data should never leave the European region. Can this be accommodated by AWS and, by extension, by SHMcloud?
AWS provides “Regions” for this exact purpose. Data deployed in one region (such as Ireland for Europe) is guaranteed to never leave that region. In addition to satisfying the legal requirements, regions provide for better application latency and responsiveness.
SHMcloud takes full advantage of Amazon Regions, and it has AMI instances that can be deployed in any Amazon region. Below is a screenshot showing SHMcloud instances and their regions.
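A data locality rule of this kind is easy to check automatically. The sketch below assumes an inventory of resources and their regions has already been collected; the resource names, the jurisdiction table, and the function name are hypothetical. The region identifiers themselves are real: eu-west-1 is the Ireland region and us-east-1 is Northern Virginia.

```python
# Regions considered acceptable for each jurisdiction in this sketch.
# The mapping is an illustrative assumption, not legal advice.
ALLOWED_REGIONS = {
    "EU": {"eu-west-1"},
}


def out_of_jurisdiction(resources, jurisdiction):
    """Return the names of resources deployed outside the allowed regions."""
    allowed = ALLOWED_REGIONS[jurisdiction]
    return sorted(name for name, region in resources.items() if region not in allowed)


if __name__ == "__main__":
    # Hypothetical inventory: resource name -> region it is deployed in.
    inventory = {
        "case-files-bucket": "eu-west-1",
        "processing-instance": "eu-west-1",
        "old-backup-bucket": "us-east-1",
    }
    print(out_of_jurisdiction(inventory, "EU"))
```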