Cloud Architectures – Storage in the Cloud

Brief

Cloud technology is deployed across a wide variety of industries and applications. The term ‘Cloud’ itself has become so widely prevalent that we’ve devised additional terms in an effort to describe what type of cloud we’re talking about. What’s your flavor? IaaS, PaaS or SaaS? Or perhaps it’s Public, Private or Hybrid?

Regardless of the type of cloud you’re using or planning to implement, there’s no denying that storage is an essential component of every cloud architecture that simply cannot be overlooked. In this post, we will look at some of the most common uses of storage in the cloud and peel back the layers to discover exactly what makes them tick. Our goal is to come up with a yardstick for measuring storage design.

Drivers towards Cloud Storage adoption

What do Dropbox and Box Inc have in common? Both companies are less than five years old, offer services predominantly centered around cloud storage and file sharing, and have been able to attract significant amounts of capital from investors. In fact, Dropbox raised $250 million at a $4 billion valuation, with Box Inc raising another $125 million in mid-2012. It looks like Silicon Valley sees Cloud Storage services as a key piece in the future of the cloud. So why is there such tremendous interest around cloud storage? Consumers are drawn to a number of benefits:

  • Redundancy: Large clouds incorporate redundancy at every level. Your data is stored in multiple copies on multiple hard drives on multiple servers in multiple data centers in multiple locations (you get the picture).
  • Geographical Diversity: With a global audience and a global demand for your content, we can place data physically closer to consumers by storing it at facilities in their country or region. This dramatically reduces round-trip latency, a common culprit behind sluggish Internet performance.
  • Performance: Storage solutions in the cloud are designed to scale up dramatically to support events that may see thousands or millions more consumers accessing content over a short period of time. Many services provide guarantees on data throughput and transfer rates.
  • Security & Privacy: Cloud storage solutions incorporate sophisticated data lifecycle management and security features that enable companies to fulfill their compliance requirements. More and more cloud providers are also offering services that are HIPAA compliant.
  • Cost: As clouds get larger, the per-unit cost of storage goes down, primarily due to economies of scale. Many service providers choose to pass these cost savings on to consumers as lower prices.
  • Flexibility: The pay-as-you-use model removes concerns about capacity planning and resources wasted due to cyclical variations in usage.

It should be noted that a draft opinion paper released by the EU Data Protection Working Party, while not explicitly discouraging cloud adoption, recommended that public sector agencies perform a thorough risk analysis prior to migrating to the cloud. You can read the report here.

Storage Applications for the Cloud

We’ve listed some of the most common applications for cloud storage in this section:

  • Backup: The cloud is perceived to be a viable replacement for traditional backup solutions, boasting greater redundancy and opportunities for cost savings. The Cloud backup market is hotly contested in both the consumer and enterprise markets.
    • In the consumer market, cloud backup services like Dropbox, Microsoft SkyDrive and Google Drive take part of your local hard drive and sync it up with the cloud. These pay-for-use services are on the rise, with Dropbox hosting data for in excess of 100 million users within four years of launching its service.
    • In the enterprise space, Gartner’s Magic Quadrant for enterprise backup solutions featured several pure-play cloud backup providers, including Asigra, Acronis and i365. Even leading providers such as CommVault and IBM have launched cloud-based backup solutions. Amazon’s recently launched Glacier service provides a cost-effective backup tier for around $0.01 per gigabyte per month.
  • File Sharing: File sharing services allow users to post files online and then share them with other users via a combination of Web links or apps. Services like Mediafire, Dropbox and Box offer basic cloud storage with collaboration and link-sharing features. On the other end of the spectrum, full-blown collaboration suites such as Microsoft’s Office 365 and Google Apps feature real-time document editing and annotation services.
  • Data Synchronization (between devices): Data synchronization providers such as Apple’s iCloud, as well as a host of applications including the productivity app Evernote, allow users to keep files, photos and even music synchronized across an array of devices (desktop, phone, tablet, etc.), with changes propagated automatically.
  • Content Distribution: Cloud content distribution network (CDN) services are large networks of servers distributed across datacenters over the Internet. At one point or another, we’ve all used CDNs such as Akamai to enhance our Web browsing experience. Cloud providers such as the Microsoft Windows Azure Content Distribution Network (CDN) and the Amazon CDN offer affordable CDN services for serving anything from static files and images to streaming media to a global audience.
  • Enterprise Content Management: Companies are gradually turning to the cloud to manage organizational compliance requirements such as eDiscovery and search. Vendors such as HP Autonomy and EMC provide services that feature secure encryption and deduplication of data assets, as well as data lifecycle management.
  • Cloud Application Storage: The trend towards hosting applications in the cloud is driving innovations in how we consume and utilize storage. Leading the charge are large cloud service providers such as Amazon and Microsoft, who have developed cloud storage services to meet specific application needs.
    • Application Storage Services: Products like Amazon Simple Storage Service (S3) and Microsoft Windows Azure Storage support storage in a variety of formats (blob, queue and table data) and scale to very large sizes (up to 100 TB volumes). Storage services are redundant (at least 3 copies of each bit stored) and can be accessed directly over HTTP via REST and other supported APIs (see the S3 sketch after this list). Storage services also support encryption on disk.
    • Performance-Enhanced Storage: Performance-enhanced storage emulates storage running on a SAN; products like Amazon Elastic Block Store provide persistent, block-level network-attached storage that can be attached to running virtual machines, and in some cases VMs can even boot directly from these volumes. Users can allocate performance to these volumes in terms of IOPS.
    • Data Analytics Support: Innovative distributed file systems that support super-fast processing of data have been adapted to the cloud. For example, the Hadoop Distributed File System (HDFS) manages and replicates large blocks of data across a network of computing nodes, to facilitate the parallel processing of Big Data. The Cloud is uniquely positioned to serve this process, with the ability to provision thousands of nodes, perform compute processes on each node and then tear down the nodes rapidly, thus saving huge amounts of resources. Read how the NASA Mars Rover project used Hadoop on Amazon’s AWS cloud here.
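
To make the storage-service idea concrete, here is a minimal sketch of storing and retrieving an object in Amazon S3 using the boto3 SDK. The bucket and key names are hypothetical examples, and the sketch assumes AWS credentials are already configured locally.

```python
# Minimal sketch: writing and reading an object via Amazon S3's REST-based API,
# using the boto3 SDK. Bucket/key names are hypothetical examples; assumes AWS
# credentials are configured locally and the bucket already exists.
import boto3

s3 = boto3.client("s3")

# Upload a local file; S3 stores multiple redundant copies behind the scenes.
with open("report.pdf", "rb") as f:
    s3.put_object(Bucket="example-bucket", Key="backups/report.pdf", Body=f)

# Download it again over HTTPS using the same API.
obj = s3.get_object(Bucket="example-bucket", Key="backups/report.pdf")
data = obj["Body"].read()
print(f"Retrieved {len(data)} bytes")
```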

Storage Architecture Basics

[Figure: Generic storage architecture]

So how do these cloud-based services run? If we were to peek under the hood, we would see a basic architecture pretty similar to the diagram above. All storage architectures comprise a number of layers that work together to provide users with a seamless storage service. The different layers of a cloud storage architecture are listed below:

  • Front End: This layer is exposed to end users and typically exposes APIs that allow access to the storage. A number of protocols are constantly being introduced to increase the supportability of cloud systems, including Web service front ends built on REST principles, file-based front ends and even iSCSI support. So, for example, a user can use an app running on their desktop to perform basic functions such as creating folders, uploading and modifying files, as well as defining permissions and sharing data with other users. Examples of access methods and sample providers are listed below (a small sketch follows this list):
    • REST APIs: REST, or Representational State Transfer, is a stateless Web architecture model built upon communications between clients and servers. Examples include Microsoft Windows Azure Storage and Amazon Simple Storage Service (S3).
    • File-based Protocols: Protocols such as NFS and CIFS are supported by vendors like Nirvanix, Cleversafe and Zetta*.
  • Middleware: The middleware or storage logic layer supports a number of functions, including data deduplication and reduction, as well as the placement and replication of data across geographical regions.
  • Back End: The back-end layer is where the actual physical hardware lives; read and write instructions are passed down to it through the hardware abstraction layer.
  • Additional Layers: Depending on the purpose of the technology, there may be a number of additional layers:
    • Management Layer: This layer may support scripting and reporting capabilities to enhance automation and provisioning of storage.
    • Backup Layer: The cloud back-end layer can be exposed directly to API calls from snapshot and backup services. For example, Amazon’s Elastic Block Store (EBS) service supports an incremental snapshot feature.
    • DR (Virtualization) Layer: DR service providers can attach storage to a hypervisor, enabling cloud storage data to be accessed by virtual hosts that are activated in a DR scenario. For example, the i365 cloud storage service automates the process of converting backups of server snapshots into a virtual DR environment in minutes.
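
As a concrete illustration of the front-end layer, the sketch below fetches a blob over HTTP from Windows Azure Storage’s REST endpoint. The account, container and blob names are hypothetical, and it assumes the blob is publicly readable (private blobs would require a signed Authorization header).

```python
# Minimal sketch: reading from a cloud storage front end over its REST API.
# Account/container/blob names are hypothetical; assumes the blob is publicly
# readable, so no authentication signature is needed.
import requests

url = "https://exampleaccount.blob.core.windows.net/images/logo.png"
response = requests.get(url)
response.raise_for_status()

# The front end returns the object's bytes, with metadata in the HTTP headers.
print(response.headers.get("Content-Type"), len(response.content), "bytes")
```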

Conclusion:

This brief post provided a simple snapshot of cloud storage: its drivers, its most common applications, and the basic architecture underneath. If you’d like to read more, please visit some of the links provided below.

Roadchimp, signing out! Ook!

Reference:

* Research paper on cloud storage architectures here.
TechCrunch article on the growth of Dropbox here.
InformationWeek article on online backup vs. cloud backup here.
IBM cloud backup solutions here.
CommVault Simpana cloud backup solutions.

Exchange 2013 Architecture Series – Part 3: Mailbox and Storage Design

Hello all! In this third installment of the Exchange 2013 Architecture series, we will look at the Exchange storage subsystem and walk through some best practices for Exchange 2013 storage design.

Exchange Storage Types

Before you start provisioning your Exchange Server 2013 environment, it’s useful to think about the different types of storage that a healthy Exchange environment uses. Each storage classification has its own unique performance requirements, and a well-architected Exchange storage design will support these needs.

  • Operating System and Page File Volumes: At the most fundamental level, all Exchange servers run on top of an operating system. In addition to storing the OS kernel, the operating system volume handles I/O operations on the Exchange server, memory management (including the page file) and disk management.
  • Exchange Binaries: These volumes contain the actual application files that Exchange needs to run, installed under the path <Drive>:\Program Files\Microsoft\Exchange Server\V15. Microsoft requires at least 30 GB of free space on the drive you wish to install the Exchange binaries on.
  • Exchange Database Volumes: These volumes store the actual Exchange database files, which use the ‘.edb’ extension. Enhancements to the Exchange database engine have reduced disk resource requirements; however, database volumes should still be optimized for high performance and commonly use RAID striped volumes with parity to support high IOPS.
  • Exchange Database Log Volumes: Exchange 2013 uses a transactional database engine that records changes to the Exchange databases in transaction logs. The database log volumes are write-intensive in nature.
  • Exchange Transport Database Volumes: Changes to how Exchange 2013 manages mail flow have resulted in several new features, known as Shadow Redundancy and Safety Net. Read my previous post for more information on these features. For Shadow Redundancy, the transport server makes a redundant copy of any message it receives before acknowledging successful receipt back to the sending server. The Safety Net feature is an updated version of the transport dumpster and retains copies of successfully processed messages in a database for a default of two days, via the SafetyNetHoldTime parameter. You should design your storage to accommodate two full days of additional e-mail within a high-availability boundary (a back-of-the-envelope sizing sketch follows this list).
  • Backup/Restore Volumes: With the database replication and resiliency features of Exchange now providing fault tolerance and high availability, backup and restore services are less crucial. However, they must be considered in the event of restoring historical or archived data. Most organizations consider less expensive storage types, such as WORM (write once, read many) media, for this purpose.
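
To get a feel for what that two-day Safety Net window means for capacity, here is a back-of-the-envelope sketch. The message volume and size figures are hypothetical inputs, not Microsoft guidance.

```python
# Back-of-the-envelope sketch: extra transport database space needed to hold
# Safety Net copies for the default 2-day SafetyNetHoldTime window.
# All input figures below are hypothetical examples, not Microsoft guidance.

messages_per_day = 50_000   # messages flowing through this transport server
avg_message_kb = 75         # average message size, including headers
safety_net_days = 2         # default SafetyNetHoldTime

extra_gb = messages_per_day * avg_message_kb * safety_net_days / (1024 ** 2)
print(f"Provision roughly {extra_gb:.1f} GB extra for Safety Net retention")
# -> Provision roughly 7.2 GB extra for Safety Net retention
```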

Storage hardware technologies

Exchange 2013 supports the following storage hardware technologies:

  • Serial ATA (SATA): SATA disks are cost-effective, high-capacity storage options that come in a variety of form factors. Microsoft recommends that you do not store your Exchange databases on a spanned volume comprising multiple SATA drives.
  • Serial-Attached SCSI (SAS): SAS is a mature disk access technology that supports higher performance than SATA, but at a higher cost.
  • Fibre Channel (FC): Fibre Channel (note the spelling) disks support high performance and more complex configuration options, such as connectivity to a SAN, at a higher cost. FC disks are typically used in a collection known as an array and support high-speed transmission of data (up to 16 Gbps and 3200 MBps), but they require expensive Fibre Channel infrastructure (known as the switch fabric) built on single-mode or multi-mode fiber cables. The advantage of Fibre Channel is that the disks can be located a significant distance away from the actual servers (hundreds of meters, up to kilometers) without any loss in performance, which means an organization can consolidate the disks used by numerous applications into one part of the datacenter and configure them optimally with high-redundancy features. This is the basic justification for a SAN.
  • Solid-State Drive (SSD): SSD drives use flash memory to store data and have a number of advantages over conventional SATA and SAS disk technologies, which still rely on rotary mechanical operations to read and write data on spinning platters. SSDs are a relatively new technology and currently offer lower capacities, but they deliver very high performance, boasting low access times (around 0.1 ms, compared to 4–12 ms for mechanical disks). Due to their high cost, it is common for organizations to build servers with a combination of disk types, using SSDs for operating system partitions as well as volumes that benefit from fast write access, such as transaction log volumes for database systems.

Logical Storage Architectures

Exchange 2013 has changed how it addresses storage architecture from the ground up. With the Information Store rewritten in managed code and optimized for multi-threading, the storage design for Exchange has had to change as well to keep up. Microsoft provides the following recommendations with respect to storage architectures:

  • Direct/Locally Attached Storage (DAS): DAS architecture features disks and arrays that are locally attached to the Exchange server. It is commonly used with and fully supported by Exchange 2013, and includes Serial ATA (SATA) and SAS hardware.
  • Storage Area Networks (SANs): SAN architecture is fully supported by Exchange 2013, over both Fibre Channel and iSCSI interfaces.
    • Microsoft recommends that you allocate dedicated volumes (spindles) to Exchange Server and not share the same underlying physical disks with other applications on your SAN. This recommendation ensures reliable and predictable storage performance for Exchange, free of the resource contention and bottlenecks that can be common in poorly designed or managed SAN environments.
    • Fibre Channel over Ethernet (FCoE): Microsoft has yet to release any design guidance on whether Exchange 2013 supports FCoE implementations. While FCoE should technically have no issues supporting the latency requirements and frame sizes of an Exchange 2013 environment, I would recommend that you proceed with caution when implementing any Microsoft technology in a non-supported configuration. In the event of a support incident, Microsoft support can declare your environment unsupportable due to inconsistencies with their Hardware Compatibility List (HCL).
  • Network Attached Storage (NAS): At this point in time, Exchange Server 2013 does not support the use of NAS-based storage devices for either physical or virtual implementations of Exchange.

Allocating Storage

Let’s start with the basic storage requirements of Exchange Server 2013. Microsoft indicates that you should allocate the following amounts of storage space for each storage type:

  • Exchange System Volumes: At least 200 MB of available disk space on the system drive
  • Exchange Binaries: At least 30 GB of free space on the drive you wish to install the Exchange binaries on, plus an additional 500 MB of available disk space for each Unified Messaging (UM) language pack you plan to install.
  • Exchange Database Volumes: The amount of storage required varies depending on a number of factors, including the number and size of mailboxes in your organization, mailbox throughput (average number of emails sent/received), high-availability features (database copies), and email retention policies (how long you need to store email). These factors determine an optimal number of mailboxes per database, the number of databases and database copies, and finally the amount of storage allocated for growth (see the sizing sketch after this list).
  • Exchange Database Log Volumes: You should provision sufficient storage to handle your organization’s transaction log generation volume. Factors that affect the rate of transaction log generation include message throughput (number of sends/receives), the size of message payloads, and high-availability features such as database copies. If you plan to move mailboxes on a regular basis, or need to accommodate large numbers of mailbox migrations (import/export to Exchange), expect a higher number of transaction logs to be generated. If you implement lagged database copies, you need to provision additional storage on the transaction log volume for the number of days of lag you have configured. The following requirements exist for log file truncation when lagged copies are configured:
    • The log file must be below the checkpoint for the database.
    • The log file must be older than ReplayLagTime + TruncationLagTime.
    • The log file must have been truncated on the active copy.
  • Message Queue Database: The hard disk that stores the message queue database must have at least 500 MB of free space.
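
As a rough illustration of how the database-volume factors combine, here is a simple sizing sketch. Every input below is a hypothetical example, not a Microsoft figure.

```python
# Rough sizing sketch for Exchange mailbox database volumes.
# All inputs are hypothetical examples; plug in your own organization's numbers.

mailboxes = 2_000        # total mailboxes in the organization
mailbox_quota_gb = 2     # average/maximum mailbox size
database_copies = 3      # high-availability copies per database
growth_factor = 1.2      # headroom for growth and overhead

raw_data_gb = mailboxes * mailbox_quota_gb
total_gb = raw_data_gb * database_copies * growth_factor
print(f"Raw mailbox data: {raw_data_gb} GB")
print(f"Total provisioned across all copies: {total_gb:,.0f} GB")
# Raw mailbox data: 4000 GB
# Total provisioned across all copies: 14,400 GB
```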

Volume Configuration

Microsoft recommends the following configuration settings for each volume that hosts Exchange-related data:

  • Partitioning: GPT is a newer partitioning technology than the traditional MBR format and accommodates much larger disk sizes (up to 256 TB). While MBR is supported, Microsoft recommends that you use GPT partitions to deploy Exchange. Partitions should also be aligned to 1 MB (see the alignment sketch after this list).
  • File System: NTFS is the only file system supported by Exchange 2013. Microsoft recommends an allocation unit size of 64 KB for both Exchange database and log file volumes. NTFS features such as NTFS compression, defragmentation and Encrypting File System (EFS) are not supported by Exchange 2013.
  • Windows BitLocker: BitLocker is a form of Drive Encryption supported in newer versions of Microsoft Windows. BitLocker is supported for all Exchange Database and Log File volumes. However, there is limited supportability for Windows BitLocker on Windows Failover Clusters. Link here.
  • SMB 3.0: SMB is a network file sharing protocol over TCP/IP, with the latest version available in Windows Server 2012. SMB 3.0 is only supported in limited deployment configurations of Exchange 2013, where fixed virtual hard disks (VHDs) are provisioned via SMB 3.0 on Windows Server 2012 Hyper-V or a later version. Direct storage of Exchange data on SMB shares is not supported. Read this article.
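
As a quick illustration of the 1 MB alignment rule above, this sketch checks whether a partition’s starting offset is 1 MB-aligned. The offset value is a made-up example; on Windows the real value can be read from WMI (the Win32_DiskPartition StartingOffset property).

```python
# Quick illustration of the 1 MB partition-alignment rule.
# The starting offset below is a hypothetical example value; on Windows you
# could obtain the real one from WMI (Win32_DiskPartition.StartingOffset).

ONE_MB = 1024 * 1024

starting_offset_bytes = 1_048_576  # example: partition starts at exactly 1 MB

if starting_offset_bytes % ONE_MB == 0:
    print("Partition is 1 MB-aligned")
else:
    print(f"Misaligned by {starting_offset_bytes % ONE_MB} bytes")
```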

Storage Best Practices

The following best practices offer useful guidelines on how to configure your Exchange 2013 environment:

  • Large Disk Sector Sizes: With disk capacities now exceeding 3 TB, hardware manufacturers have introduced a new physical media format known as Advanced Format that increases the physical sector size. Recent versions of Windows, including Windows Vista, Windows 7, and Windows Server 2008, 2008 R2 and 2012 with certain patches applied (link), support a form of logical emulation known as 512e, which presents a logical sector size of 512 bytes while the physical hardware actually reads and writes larger 4 KB sectors (the physical sector being the unit of write atomicity).
  • Replication Homogeneity: Microsoft does not support Exchange database copies that are stored across different disk types. For example, if you store one copy of an Exchange database on a 512-byte sector disk, you should not deploy database copies to volumes on another Exchange server that are configured with a different sector size (commonly 4 KB).
  • Storage Infrastructure Redundancy: If you choose externally attached storage such as a SAN solution, Microsoft recommends that you implement multi-pathing or other forms of path redundancy to ensure that the Exchange server’s access to the storage network remains resilient to single points of failure in the hardware and interconnecting infrastructure (Ethernet for iSCSI and Fibre Channel for conventional SCSI-based SANs).
  • Drive Fault Tolerance: While RAID is not a requirement for Exchange 2013, Microsoft still recommends RAID deployments, especially for standalone Exchange servers. The following RAID configurations are recommended based on the type of storage (a usable-capacity sketch follows this list):
    • OS/System or Pagefile Volume: RAID 1/10 is recommended, with dedicated LUNs being provisioned for the System and Page file volumes.
    • Exchange Database Volume: For standalone (non-replicating) servers, Microsoft recommends deploying RAID 5 with a maximum array size of 7 disks and surface scanning enabled. For larger array sizes, Microsoft recommends deploying RAID 6 (5+1) for added redundancy. For high-availability configurations with database replication, redundancy is provided by deploying more than one copy of each Exchange database, so Microsoft’s hardware redundancy requirements are less stringent: with two or more copies of each database residing on separate servers, and especially if you can deploy three or more database copies, you should have sufficient database redundancy to rely on JBOD (Just a Bunch Of Disks) storage. The same RAID 5/6 recommendations apply to high-availability configurations. In both standalone and high-availability configurations that use slower disks (5400–7200 rpm), Microsoft recommends deploying disks in RAID 1/10 for better performance.
    • Exchange Mailbox Log Volumes: For all implementations, Microsoft supports all RAID types, but recommends RAID 1/10 as a best practice, with JBOD storage only being used if you have at least three or more replica copies of an Exchange database. If deploying lagged database copies, you should implement JBOD storage only if you have two or more lagged copies of a database.
    • RAID Configuration Parameters: Furthermore, Microsoft recommends that any RAID array be configured with a block size of 256 KB or greater, and with storage array cache settings set to 75% write cache and 25% read cache. Physical disk write caching should be disabled when a UPS is not in use.
  • Database Volume Size and Placement: Microsoft Exchange Server 2013 supports database sizes of up to 16 TB; however, for optimal database seeding, replication and restore operations, Microsoft recommends that you limit each database to 200 GB and provision each volume to accommodate 120% of the maximum database size (for example, a 240 GB volume for a 200 GB database).
  • Transaction Log Volume Size and Placement: Microsoft recommends database and log file isolation, deploying your database files and transaction log files on separate volumes. This is a good practice, since the performance profiles of databases and transaction logs differ.
  • Basic and Dynamic Disk Types: The Windows operating system supports a form of disk initialization known as dynamic disks, which allows you to configure options such as software-based RAID and dynamic volume sizes. While dynamic disks are a supported storage type in Exchange 2013, Microsoft recommends that you deploy Exchange on the default basic disk storage.
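
To make the RAID trade-offs above concrete, here is a small sketch of the standard usable-capacity arithmetic for the levels discussed. The disk counts and sizes are example values.

```python
# Sketch: usable capacity for the RAID levels discussed above, given a number
# of identical disks. Standard RAID arithmetic; disk counts/sizes are examples.

def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    if level == "RAID10":   # mirrored pairs: half the raw capacity
        return disks // 2 * disk_tb
    if level == "RAID5":    # one disk's worth of parity
        return (disks - 1) * disk_tb
    if level == "RAID6":    # two disks' worth of parity
        return (disks - 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")

# A 7-disk RAID 5 array of 2 TB disks (Microsoft's suggested maximum array
# size for standalone database volumes) yields 12 TB usable:
print(usable_tb("RAID5", 7, 2.0))   # 12.0
print(usable_tb("RAID6", 8, 2.0))   # 12.0
print(usable_tb("RAID10", 8, 2.0))  # 8.0
```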

Conclusion:

In this section, we explored the various storage components supported by Microsoft Exchange Server 2013 and reviewed some deployment best practices. The various components of Exchange use a number of different types of storage, and a well-architected storage solution should seek to optimize the performance of each of them.

Reference:

Link to Microsoft TechNet article on storage configuration options.

Link to MSDN article on Windows support for Advanced Format disk types.

Link to Microsoft TechNet article on virtualization support.