Cloud Architecture – Serving Static Content

Introduction

A number of cloud hosting providers offer optimized static content delivery services, such as Amazon CloudFront and Microsoft Azure CDN (Content Delivery Network). In this article, we will explore the elements of a scalable infrastructure that can deliver static content under high peak loads, and build a test platform on which we can perform load testing and benchmark performance.

Scenario

Let’s assume that a fictitious company, Chimpcorp, wants to offload the serving of static files from its primary website and stream this data to partners and customers around the world over the Internet. They want a cost-effective yet scalable solution that allows large numbers of users to download the files at the same time. Furthermore, the data must be available at all times and resilient to failure and hacking attempts. Our task is to build a web infrastructure to handle this. We can summarize the goals as follows:

  • Serve static content
  • Cost effective
  • Low latency
  • Scalable
  • Fault Tolerant
  • World-wide access

We have also developed a list of assumptions and verified them with Chimpcorp in order to narrow down our design:

  • The content consists of files approximately 5 KB in size
  • Content is static and will not change frequently
  • Servers must be able to host more than 10,000 simultaneous connections
  • All users will connect to a public Internet website (download.chimpcorp.com)
  • All users must access the content via HTTP
  • All users access identical content
  • There is no requirement to track user sessions
  • The servers must be secured to prevent tampering of data

Deconstructing the problem

Our first step is to break down the entire conversation between the end user and the web server serving up the content. The conversation can be described as follows:

  1. User launches a browser from their device
  2. User types in a URL into their browser
  3. The browser checks its cache; if the requested object is in the cache and is fresh, skip to #13
  4. Browser asks OS for server’s IP address corresponding to the DNS name in the URL
  5. OS checks whether the name already has an entry in its hosts file or local DNS cache
  6. OS performs a DNS lookup and provides the corresponding IP address to the browser
  7. Browser opens a TCP connection to server (Socket to Port 80)
  8. HTTP traffic traverses the Internet to the server
  9. Browser sends an HTTP GET request through the TCP connection
  10. Server looks up required resource (if it exists) and responds using the HTTP protocol
  11. Browser receives HTTP response
  12. After sending the response the server closes the socket
  13. Browser checks whether the response status is a success (2xx, e.g. 200 OK), a redirect (3xx), an authorization request (401), or an error (4xx and 5xx), and handles the response accordingly
  14. If cacheable, response is stored in cache
  15. Browser decodes response (e.g. if it’s gzipped)
  16. Browser determines what to do with response (e.g. is it a HTML page, is it an image, is it a sound clip?)
  17. Browser renders response, or offers a download dialog for unrecognized types
  18. User views the data, and so on
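
To make steps 9 through 14 concrete, a simplified version of the HTTP exchange might look like this (the file name and header values are hypothetical):

    GET /files/dataset-001.bin HTTP/1.1
    Host: download.chimpcorp.com
    Accept-Encoding: gzip

    HTTP/1.1 200 OK
    Content-Type: application/octet-stream
    Content-Length: 5120
    Cache-Control: public, max-age=31536000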

Evaluating components and areas for optimization

Our next step is to analyze the individual elements in the conversation for opportunities for optimization and scaling.

  1. End users’ browser/bandwidth/ISP – Besides the fact that users must access our servers via HTTP over the Internet, we have no control over the type and version of browser, the quality and bandwidth of the user’s ISP service, or the device from which the user accesses the content.
  2. DNS Lookup – It takes approximately 20-120 milliseconds for DNS to resolve an IP address. For users connecting from around the world, we can use either Geo-aware redirection or Anycast DNS for smart resolution of names to a web host close to the user.
  3. Server Location – As users will be accessing the host from locations around the world, the servers should be co-located close to where the users are in order to reduce round trip times. We can use Geo-aware DNS to relay users to servers that are located in their geographical region.
  4. TCP Session Parameters – As we are serving small static content from our website, we can analyze the specific parameters of the TCP session in order to identify potential areas for optimization (a kernel tuning sketch appears at the end of the Nginx Optimization section below). Examples of TCP parameters are listed below:
    1. TCP Port Range
    2. TCP Keepalive Time
    3. TCP Recycle and Reuse times
    4. TCP Frame Header and Buffer sizes
    5. TCP Window Scaling
  5. HTTP Header Expiration/Caching – We can set an Expires header far into the future to reduce repeat page loads for static content that does not change. We can also use the Cache-Control headers specified in the HTTP 1.1 standard to tweak caching (see the example after this list).
  6. HTTP Server Selection – With the wide variety of HTTP servers available in the market, our optimal selection should take into account the stated project objectives. We should be looking for a web server that can efficiently serve static content to large numbers of users, scale out, and offer some degree of customization.
  7. Server Resource Allocation – Upon our selection of the appropriate Web server, we can select the appropriate hardware setup, bearing in mind specific performance bottlenecks for our chosen HTTP server, such as Disk I/O, Memory allocation and Web caching.
  8. Optimizing Content – We can optimize how content is presented to users. For example, compressed files take less time to download from the server to the end user, and image files can be optimized and scaled accordingly.
  9. Content Offloading – JavaScript, images, CSS and other static files can be offloaded to Content Delivery Networks. For this scenario, we will rely on our web servers to host this data.
  10. Dynamic Scaling – Depending on the load characteristics of our server, we should find a solution to rapidly scale out our web performance either horizontally (adding nodes) or vertically (adding resources).
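
As an illustration of the caching headers in item 5, a hedged Nginx configuration sketch (the location path and expiry policy are assumptions to adapt per site) could look like this:

    # Far-future expiry for static assets; expires max sets both Expires and Cache-Control
    location /files/ {
        expires max;
        add_header Cache-Control "public";
    }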

Design Parameters

Our next stage is to compile the analysis into tangible design parameters that will shape our final design. The design components and related attributes are listed as follows:

  • Server Hosting Platform: A global cloud services provider is a cost efficient way to deploy identical server farms around the world.
  • DNS Hosting: A highly available DNS forwarding solution that incorporates Anycast DNS is preferable to a GEO-aware resolution service.
  • HTTP Server Selection: An Nginx web server configured on an optimized Linux platform will provide a highly scalable and resource-efficient platform for our task. The primary advantage of Nginx over other popular web server technologies such as Apache is that Nginx doesn’t spawn a new thread or process for each incoming connection; instead, existing worker processes accept new requests from a shared listen socket. Nginx is also widely supported and has an active user community and support base.

Nginx Optimization

The following parameters were used to optimize the Nginx server deployment:

  1. CPU and Memory Utilization – Nginx is already very efficient in how it utilizes CPU and memory. However, we can tweak several parameters based on the type of workload that we plan to serve. As we are primarily serving static files, we expect our workload profile to be less CPU-intensive and more disk-process oriented.
    1. worker_processes – We can configure the number of single-threaded worker processes to be 1.5 to 2x the number of CPU cores to take advantage of disk bandwidth (IOPS).
    2. worker_connections – We can define how many connections each worker can handle. We can start with a value of 1024 and tune based on test results for optimal performance. The ulimit -n command gives us the per-process open file limit, which caps the usable number of worker_connections.
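      A minimal starting configuration reflecting these two settings (values are illustrative and should be tuned against load-test results; worker_connections lives in the events context) might be:
        worker_processes auto;    # or an explicit count per the 1.5-2x cores guideline above
        events {
            worker_connections 1024;
        }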
    3. SSL Processing – SSL processing in Nginx is fairly processor-hungry, and if your site serves pages via SSL, you need to evaluate the worker_processes-to-CPU ratio. You can also turn off Diffie-Hellman cryptography and move to a quicker cipher if you’re not subject to PCI standards. (Example: ssl_ciphers RC4:HIGH:!aNULL:!MD5:!kEDH;)
  2. Disk Performance – To minimize IO bottlenecks on the Disk subsystem, we can tweak Nginx to minimize disk writes and ensure that Nginx does not resort to on-disk files due to memory limitations.
    1. Buffer Sizes – Buffer sizes define how much request data Nginx can hold in memory. A buffer size that is too low forces Nginx to spool data to temporary files on disk, which introduces additional latency due to disk read/write response times.
      1. client_body_buffer_size: The directive specifies the client request body buffer size, used to handle POST data. If the request body is more than the buffer, then the entire request body or some part is written in a temporary file.
      2. client_header_buffer_size: Directive sets the header buffer size for the request header from the client. For the overwhelming majority of requests, a buffer size of 1K is completely sufficient.
      3. client_max_body_size: Directive assigns the maximum accepted body size of client request, indicated by the line Content-Length in the header of request. If size is greater the given one, then the client gets the error “Request Entity Too Large” (413).
      4. large_client_header_buffers: Directive assigns the maximum number and size of buffers for reading large headers from the client request. The request line cannot be bigger than the size of one buffer; if the client sends a bigger header, nginx returns the error “Request URI too large” (414). The longest request header line must also be no more than the size of one buffer, otherwise the client gets the error “Bad request” (400). These parameters should be configured as follows:
        client_body_buffer_size 8K;
        client_header_buffer_size 1k;
        client_max_body_size 2m;
        large_client_header_buffers 2 1k;
    2. Access/Error Logging – Access logs record every request for a file and can quickly consume valuable disk I/O. Error logging should not be set too low unless it is our intention to capture every single HTTP error; the warn level is sufficient for most production environments. We can also buffer log writes so that data is flushed to disk in chunks (e.g. 8KB, 32KB, 128KB).
    3. Open File Cache – The open_file_cache directive caches open file descriptors together with file metadata such as location and size.
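      A hedged configuration sketch combining buffered logging and descriptor caching (paths and values are illustrative placeholders):
        access_log /var/log/nginx/access.log combined buffer=32k;
        error_log /var/log/nginx/error.log warn;
        open_file_cache max=10000 inactive=30s;
        open_file_cache_valid 60s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;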
    4. OS File Caching – We can define parameters around the size of the cache used by the underlying server OS to cache frequently accessed disk sectors. Caching the web server content will reduce or even eliminate disk I/O.
  3. Network I/O and latency – There are several parameters that we can tweak in order to optimize how efficiently the server manages a given amount of network bandwidth during peak loads.
    1. Time outs – Timeouts determine how long the server maintains a connection and should be configured optimally to conserve resources on the server.
      1. client_body_timeout: Directive sets the read timeout for the request body from the client. The timeout applies only if the body is not obtained in one read step. If after this time the client sends nothing, nginx returns the error “Request time out” (408).
      2. client_header_timeout: Directive assigns the timeout for reading the client request header. The timeout applies only if a header is not obtained in one read step. If after this time the client sends nothing, nginx returns the error “Request time out” (408).
      3. keepalive_timeout: The first parameter assigns the timeout for keep-alive connections with the client; the server will close connections after this time. The optional second parameter assigns the time value in the Keep-Alive: timeout=time response header. This header can convince some browsers to close the connection, so that the server does not have to (though this is not what makes a connection “keep-alive”). Without the second parameter, nginx does not send a Keep-Alive header. The author of Nginx claims that 10,000 idle connections will use only about 2.5 MB of memory.
      4. send_timeout: Directive assigns the response timeout to the client. The timeout applies not to the entire transfer of the response, but only between two successive write operations; if after this time the client takes nothing, nginx shuts down the connection.

        These parameters should be configured as follows:

        client_body_timeout 10;
        client_header_timeout 10;
        keepalive_timeout 15;
        send_timeout 10;

    2. Data compression – We can use gzip to compress our static data, reducing the size of the TCP payloads that must traverse the network to reach the client. Serving pre-compressed files via the Nginx gzip static module also avoids re-compressing the same content on every request:
        gzip on;
        gzip_static on;
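      An expanded sketch (compression level and MIME types are assumptions to adjust per content mix; text/html is always compressed when gzip is enabled):
        gzip_comp_level 5;
        gzip_min_length 1024;
        gzip_types text/css application/javascript application/json;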
    3. TCP Session parameters – The tcp_* directives of Nginx (such as tcp_nodelay and tcp_nopush) and the underlying kernel TCP settings can also be tuned.
      1. TCP Maximum Segment Lifetime (MSL) – The MSL defines how long the server should wait for stray packets after closing a connection; this value defaults to 60 seconds on a Linux server.
    4. Increase System Limits – Specific parameters such as the number of open file descriptors and the range of ports available to serve connections can be increased (see the sketch below).
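
A hedged kernel and shell tuning sketch covering the TCP parameters discussed earlier and these system limits (all values are illustrative starting points, not prescriptions):

    # /etc/sysctl.conf (apply with: sysctl -p); illustrative values only
    net.ipv4.ip_local_port_range = 1024 65535   # widen the ephemeral port range
    net.ipv4.tcp_fin_timeout = 30               # release closed connections sooner
    net.ipv4.tcp_tw_reuse = 1                   # reuse TIME_WAIT sockets for outbound connections
    net.ipv4.tcp_window_scaling = 1             # enable TCP window scaling
    net.core.somaxconn = 4096                   # deepen the listen backlog

    # Per-process open file limit (caps worker_connections), in /etc/security/limits.conf:
    nginx soft nofile 65536
    nginx hard nofile 65536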

Solution Design A: Amazon AWS

The design parameters in the section above were used to build our scalable web solution. The components of our solution are as follows:

  • Amazon EC2 AMIs: Elastic Compute Cloud will be used to host our server farms. Nginx offers a fully supported AMI in EC2 that we can tweak to further optimize performance to suit our needs. This AMI is readily deployable from the AWS Marketplace and includes support from Nginx, Inc. We will deploy Nginx on a High-CPU Medium Instance featuring the following build specs:
    • 1.7 GB RAM
    • 5 EC2 Compute Units (2 Virtual Cores)
    • 350 GB instance storage
    • 64-bit architecture
    • Moderate I/O performance
  • Elastic IPs: Amazon provides an Elastic IP Service that allows us to associate a static Public IP Address to our virtual host.
  • Amazon Route 53: This scalable DNS service allows us to implement an Anycast DNS solution for resolving hosts to an environment that is closest to them.
  • Additional Options: A number of automation and deployment tools were utilized to enhance the efficiency of the environment:
    • EC2 Command Line tools
    • Automated deployment and lifecycle management via Chef
    • Development testing via Vagrant
    • Centralized code repository and version control via Github
  • Solution Variants: The following design variants were introduced in order to benchmark our original build selection against alternative deployment scenarios:
    • Nginx Systems AMI
    • CentOS optimized Nginx with PHP-FPM
    • Apache Web Server on Ubuntu

Measuring our solution’s effectiveness

Here we define a number of simple measures that we can use to benchmark the effectiveness of our solution:

  • Cost per user – defined as the cost of the solution divided by the number of users within a given period, this measures the cost effectiveness of the solution
  • Server Connectivity Metrics – These metrics are relevant to Web Server performance
    • Number of Requests per Second
    • Number of Connections per Second
    • Average/Min/Max Response rates
    • Response Times (ms)
  • System Performance Metrics – these metrics relate specifically to system performance
    • CPU/Memory Load

Testing our design

We provisioned an M2.Medium tier EC2 instance in the same availability zone to test host-level performance without the complications of network latency between the test host and the server. We ran tests at increasing levels of concurrency and request rates. We used the following tools to test the performance of our solution in relation to our test scenario: httperf and ApacheBench (ab).
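
An illustrative httperf invocation for this kind of test (the URI and load figures are placeholders, not the exact parameters used in our runs):

    httperf --server download.chimpcorp.com --port 80 --uri /index.html --rate 5000 --num-conns 100000 --timeout 5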

Test Results:

Baseline (Nginx AMI): Httperf tests returned 1.1ms reply times for linearly increasing loads up to 30,000 connections/second over large sample sizes. The host started to display increasing standard deviations closer to 30,000 simultaneous connections, indicating potential saturation. Memory usage on the host remained stable at around 17 MB, even during peak loads.

Optimized CentOS (Nginx AMI): Httperf tests returned similar response times and response rates to the baseline host, up to 30,000 connections/second (>1,000 concurrent connections); however, results showed higher consistency over large samples and a lower standard deviation.

Apache (Ubuntu Host): Httperf tests returned 100+ms response times for linearly increasing loads up to 10,000 connections/second, quickly saturating at 6,000 connections/sec. Each httpd instance occupied approximately 100MB of memory and quickly consumed the majority of system resources on the 1.7GB RAM virtual host.

Fig: Baseline performance

Fig: Nginx optimized performance

Conclusions:

Overall performance on the Nginx platform for delivering static content (HTML files) was far superior to Apache. Nginx performance out of the box is respectable, and with further tuning of settings it can deliver highly scalable results.

Recommendations:

In order to further increase the performance of the solution, we propose the following recommendations:

  • Increase sample sizes – Due to time constraints, the number of repetitions run for load testing was low. For real-life production environments, we recommend running httperf in a wrapper like ab.c over a larger number of repetitions (>1,000) at varying load factors in order to build a representative sample. At that point, trends become easier to identify.
  • Implement in-memory caching – not currently supported natively in Nginx
  • Implement Elastic Load Balancing  – ELB has been load tested to over 20k simultaneous connections
  • Migrate Static Content to CloudFront – While Nginx can provide superior performance, it is most commonly deployed as a reverse proxy to offload static content from dynamic code like PHP. Amazon’s CloudFront is optimized to provide superior, scalable web content delivery that can synchronize across multiple edge locations.
  • Cost Management – If costs are a concern, we can move to the open source Nginx distribution instead of the provisioned AMI instance and save on support and licensing costs.

References:

http://stackoverflow.com/questions/2092527/what-happens-when-you-type-in-a-url-in-browser
http://www.hongkiat.com/blog/ultimate-guide-to-web-optimization-tips-best-practices/
http://yuiblog.com/blog/2007/04/11/performance-research-part-4/
http://geekexplains.blogspot.co.uk/2008/06/whats-web-server-how-does-web-server.html
http://nbonvin.wordpress.com/2011/03/24/serving-small-static-files-which-server-to-use/
Overview of Nginx architecture
Implement Nginx in AWS
Httperf command reference
Performance testing
http://gwan.com/en_apachebench_httperf.html

Exchange 2013 Sample Architecture Part 4: Mailbox Server Role Design

Overview

The Mailbox Server role primarily serves the purpose of hosting mailbox databases and providing access to the mailboxes contained within. In this post, we will focus on several key elements of Mailbox Server Design:

  • Storage Design: Not only must we plan and allocate sufficient storage capacity to ensure that the environment can scale to support the increase in size and number of mailboxes in use, we also need to pay careful attention to the type of storage infrastructure we decide to use, be it Direct Attached Storage (DAS) in each server or disks consolidated in a SAN or other storage appliance. We also need to consider the type of disk technology we adopt, such as SCSI, SATA, Fibre Channel (FC) or even Solid State Drives (SSD). Each decision point has the potential to impact performance, cost, complexity and a number of other factors in our overall design.
  • Fault Tolerant Design: Users need constant access to their mailboxes and to be able to send and receive messages. A crucial element of a resilient and fault-tolerant messaging system is its ability to survive failures at multiple levels, including logical or physical corruption of a mailbox, loss of a database, a server, or even an entire site. We will focus on server build, quantity and placement, DAG design and other availability features.

Before we start

There are a couple of considerations that I want to point out before you try applying this design towards your implementation. I’ve listed them below:

  • Timing: I’m writing this post at a time when Exchange 2013 hasn’t seen major deployment across the industry yet. There also aren’t many tools out there at this time to assist in calculating build sizes for Exchange 2013; we anticipate Microsoft will release these in the coming months.
  • Exchange 2010 reference: I will be using Microsoft’s Exchange 2010 Mailbox Server Role Requirements Calculator tool to build this design. Performance improvements in Exchange 2013 over its previous version mean that the calculator will give me exaggerated values for a number of key metrics, such as disk IOPS (a measure of the speed of data transfer) and spindle allocation (the number of physical disk drives that we spread our databases across). This isn’t a bad thing; it just means that our design will probably call for more disks and storage than would actually be required, which we can think of as a buffer. I will update this post when the new calculator comes out, but in the meantime, you should use the version which can be downloaded here.
  • Virtualization: As stated in the previous post, we will be deploying our Mailbox Servers in a virtual environment. This is important because we will need to factor in additional resource overhead for the virtualization hypervisor.

Design Parameters

Prior to assembling our design, we need to identify what possible constraints there may be. Constraints could include budget which will affect the number of servers and size and type of disks used, or available bandwidth for replication between datacenters. The following information has been collected from the existing messaging environment as well as high level design specifications for the Exchange 2013 implementation.

Table 3

With a total of 9000 mailboxes, designing a fault tolerant system would require further input from the business, including existing SLAs and any key performance indicators (KPIs) that are relevant to the business. The following parameters were collected after a number of interviews with key stakeholders:

Table 4

Sizing Assumptions

With our basic design parameters in place, we are ready to begin assembling our design. We commence at the mailbox level and work our way upwards, looking at the database, server, DAG and then site level.

1. Mailbox Design

The basic building block of our design is the mailbox. With a mailbox size limit of 5GB, we need to factor in additional overheads and also accommodate growth. The actual mailbox size on disk is calculated by the following formula:

Mailbox Size = Mailbox Size Limit + Whitespace + Dumpster Size

Where,

Mailbox Size Limit = 5120 MB

Whitespace = 100 messages per day X 100/1024 MB = 9.7 MB (this assumes an average message size of 100 KB)
Dumpster Size = 100 messages per day X 100/1024 MB X 30 days = 293 MB

Total Mailbox Size = 5120 + 9.7 + 293 = 5423 MB or 5.3GB

2. Database Design

The mailbox database design needs to meet the performance requirements of Chimp Corp while providing high availability for mailbox data and fault tolerant capabilities should the need arise. Exchange Server 2013 allows multiple copies of mailbox databases to be distributed across any of the servers within a Database Availability Group (DAG). A component of Exchange 2013 called the Active Manager determines the active mailbox database out of a set of copies and if required initiates the failover of the database copy to another server.

The Mailbox Database size has been restricted to units of 1TB, based on the amount of time it would take to reseed a restored database. Given that the average on-disk mailbox size is 5.3GB, the ideal allocation of mailboxes per database must be calculated, factoring in overheads and growth.

The Maximum Database Size can be used to derive the optimal number of mailboxes as follows:

Maximum Database Size = (Number of Mailboxes * Average Mailbox Size) * 1.2 (i.e. a 20% overhead)

1TB = (Number of Mailboxes * 5.3GB) * 1.2

Solving the above gives us the number of mailboxes per database = 160

We also need to factor in 20% growth in the number of mailboxes across our messaging environment, so the optimal allocation of mailboxes per database is:

160 / 1.2 ≈ 133 mailboxes allocated per database

Based on this figure, we can now calculate the total number of databases

Table 5

3. Database Availability Group Design

This design comprises two DAGs deployed across two datacenters in an Active/Active configuration. This configuration allows the active mailboxes to be spread across the datacenter sites, with each DAG hosting active databases in each datacenter site. The design requires a total of 12 Exchange Server 2013 mailbox servers (6 mailbox servers per DAG).

To maintain Chimp Corp’s requirements for availability of the messaging environment, a total of 6 copies per database will be deployed in each DAG (one of which is a lagged copy). The table below lists the high-level configuration requirements for Chimp Corp Database availability group, derived from the Microsoft Exchange Mailbox Server Role Requirements Calculator.

Table 6-3

* Note that values factor an estimated 50% IOPS improvement over Exchange 2010, from figures derived in the Storage Calculator for Exchange 2010.

4. Data Disk Partition Requirements

The disk partition assignments for Exchange are optimized for the underlying hardware infrastructure, taking into account vendor hardware recommendations for Exchange 2013. As mailbox storage is provisioned via SAN, it is assumed that the underlying SAN infrastructure will provide a sufficient degree of hardware fault tolerance and therefore there is no need to adhere to a Microsoft recommended LUN to Database ratio of 2:1. Databases are provisioned into groups of 8 per individual storage LUN as depicted below:

Table 7

5. Database Distribution on the Servers

This section details the preferred sequence numbers of the mailbox databases servicing Chimp Corp Database Availability Group. To facilitate deployment and planning, Mailbox Databases have been categorized into groups of 8, which conveniently map to the physical hardware LUN groupings of 8 databases per LUN (see previous section).

An active copy of each database group is hosted on an individual server in order to evenly distribute load. Each database has a minimum of one redundant local copy and one redundant offsite copy, as well as an offsite lagged copy. Database allocations are illustrated in the table below:

Table 8

As the design encompasses 2 DAGs, the database allocation for the second DAG will mirror the design described above.

6 Lagged Database Copies

Lagged Database copies will be implemented with a lag period of 24 hours. A dedicated lag server will be configured in the alternate datacenter site for each DAG for the purpose of hosting the lag database copies.

Note: A lagged mailbox database copy can be activated to a specific point in time by first suspending the lagged database copy via the Suspend-MailboxDatabaseCopy cmdlet and then running ESEutil.exe to replay selected log files into the database, returning it to the required restore point.
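
A rough sketch of that procedure (the server, database and path names are hypothetical, and the log replay step is deliberately simplified; the exact eseutil arguments depend on the log and database locations):

    # Suspend the lagged copy of DB01 hosted on lag server LAG01
    Suspend-MailboxDatabaseCopy -Identity "DB01\LAG01" -SuspendComment "Activating lagged copy" -Confirm:$false
    # Replay log files (base name E00) into the database in recovery mode
    eseutil /r E00 /l D:\DB01\Logs /d D:\DB01\Database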

7 Datacenter Activation Coordination (DAC) mode

The Datacenter Activation Coordination mode is a high-availability feature designed for use in Database Availability Groups that span multiple sites. This mode is disabled by default and needs to be enabled.

DAC mode is used to prevent “split brain syndrome”, in which connectivity between datacenters is lost but both datacenters are still functioning. If the DAG members in one datacenter lose connectivity with the other site and try to bring online database copies that were active in the other datacenter, the environment ends up with multiple active copies of the same database. Under DAC, any mailbox databases that were active in an alternate site will not be automatically mounted until all other members of the DAG can be contacted, a handshake also known as the “mommy, may I” protocol. DAC mode is activated via the Set-DatabaseAvailabilityGroup cmdlet, sketched below with a hypothetical DAG name:
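
    # Enable DAC mode on the DAG (DAG1 is a placeholder name)
    Set-DatabaseAvailabilityGroup -Identity "DAG1" -DatacenterActivationMode DagOnly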

8 Mailbox Storage Quotas

Chimp Corp ITS has defined a requirement that mailbox size be limited to 5 GB per mailbox for all user classes. Implementing mailbox quotas reduces the risks associated with unexpected growth in mailbox database size and supports a reliable maximum disaster recovery timeframe, based on the assumption that a mailbox database cannot exceed 1TB in size. The parameters listed below can be configured to enforce these quotas:

Table 9
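
As a sketch of how such quotas might be applied at the database level (the database name and staggered threshold values are illustrative, ending at the 5 GB hard limit):

    # Warn at 4.8 GB, block send at 4.9 GB, block send/receive at the 5 GB limit
    Set-MailboxDatabase -Identity "DB01" -IssueWarningQuota 4.8GB -ProhibitSendQuota 4.9GB -ProhibitSendReceiveQuota 5GB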

9 Public Folder Design

Public Folders will be required as part of the design in order to support access to Free/Busy information during the period of migration and coexistence. In Exchange 2013, public folders have been integrated into Exchange databases and are hosted inside a public folder mailbox. Client access to public folders uses RPC over HTTP calls to CAS servers, which then proxy the communications to the hosting Mailbox server. Public folders are now fully integrated with the replication and high-availability mechanisms of Exchange Mailbox Servers and DAGs, and no longer use a separate replication mechanism. Office 365 also fully supports modern public folders, giving users the public folder functionality they are familiar with.

Changes in Public Folder architecture in Exchange 2013 mean that, since only a single active copy of a public folder mailbox database can be available at any time, public folder mailboxes should be located as close to their users as possible. In the Chimp Corp design, low-latency connections between datacenters minimize this issue.

Conclusion

In this section, we covered several design considerations for implementing a Mailbox Server design in Exchange 2013. Standardization is key, as it supports the scalability and predictability of the messaging system. We start by defining mailbox sizes that represent the various user classes in the messaging environment. These mailbox sizes serve as basic building blocks which we subsequently use to define the optimal number of mailboxes to allocate per database, the DAG design, and finally server allocation.

References

Exchange team blog article explaining the Exchange Mailbox Server Role Requirements Calculator here.

Cloud Architectures: The Strategy behind homogeneity

Extract

So what is homogeneity? Homogeneity refers to deliberately keeping things identical, or indistinguishable from one another. For example, Cloud Service Providers create homogeneous infrastructure on a massive scale by deploying commodity hardware in their datacenters, enjoying lower incremental costs which they pass on to the consumer via lower prices. Technology architects can similarly design their applications and services to run over these homogeneous computing nodes or building blocks, facilitating the horizontal scaling capability commonly referred to as elastic computing. In this article, we will explore how homogeneity has been in use since long before the era of modern technology, and derive meaningful takeaways from each particular adaptation of homogeneity.

Agility  – Military Tactics

From the dawn of the greatest ancient civilizations, battles were fought and won by ambitious generals relying on the strength of their strategies and their ability to quickly adapt tactics to changing battlefield situations. The conquering Romans provide a fascinating study in the implementation of homogeneity and standardization at the fundamental unit level. A Roman Legion was subdivided into 10 units or cohorts, with each cohort further subdivided into six centuries, numbering 80 men on average.

Legion

While each Century served as an individual unit in battle and had autonomy of movement under the command of a centurion, the true tactical advantage of the Roman Legion was seen in the combination of these units on the battlefield, known as formations. Each formation presented a different pattern of deploying military units, optimized for various scenarios and terrains, including troop movement and battle formations depending on the relative strength of one’s army and that of opposing forces. For example, the Wedge formation highlighted below drew the strongest elements of the force into the center, where they could drive forward through opposing lines. Similarly, the Strong Right Flank formation provided tactical strength against an opposing army on the principle that a strong right flank could quickly overrun an opposing force’s left flank, since the enemy’s left-hand side would be encumbered by shields and less agile against sideways strikes.

RomanFormations

Under the watchful eye of a skilled and experienced commander, the Roman Legion was a force to be reckoned with, but this ultimately relied on the ability of each individual unit to be deployed rapidly in battle and to act reliably in performing its role. The Romans dominated the battlefield, besting even the famed Greek Phalanx, due to the agility with which the Legion could change formations in the heat of battle. Our takeaway here is that homogeneous building blocks provide scalability and flexibility in response to rapidly changing situations.

Operational Efficiency – Southwest Airlines

Southwest

Southwest Airlines has achieved great commercial success in part due to its strategy of operating a homogeneous fleet of Boeing 737 aircraft. The utilization of a single build of aircraft greatly enhances operational efficiencies in terms of technical support, training, holding spare parts and even route planning, since passenger and baggage loads are fairly consistent across each plane. It’s also less complicated to plan for fleet growth and to allocate resources in anticipation of future demand with a homogeneous unit. The task of staffing flights is also greatly simplified, since aircrews are trained to operate one type of aircraft across several variants, and scheduling crews to support multiple routes along Southwest’s point-to-point system becomes a much simpler endeavor. Southwest Chairman Gary Kelly announced plans in April 2012 to acquire 74 Boeing 737-800s by 2013 in order to augment the existing fleet of 737-700s. This expansion represents an incremental upgrade to existing fleet capacity, in response to greater demand across Southwest’s expanding network and also to capture greater economy amidst rising fuel costs. An important takeaway from Southwest Airlines is that homogenization leads to operational efficiencies and also drives competitive advantage, especially in highly price-sensitive markets. Most importantly, homogenization has tangible benefits by directly enhancing profits.

Build Standardization – McDonald’s Fast Food Restaurants

MacDonalds

Probably one of the most classic examples of enhanced operational efficiency derived from developing a homogeneous product, this iconic fast-food provider was founded on the principles of consistency, homogeneity and ease of preparation. While in recent years McDonald’s has stepped up its efforts to localize its menu in an effort to suit local palates, over 30% of revenues are driven by sales of core items that include the Big Mac, hamburger, cheeseburger, Chicken McNuggets and their world-famous french fries, as stated by CEO Jim Skinner during a 2011 earnings call. Sticking to a shorter, standardized menu was part of the company’s push to adhere to stringent quality standards, and became a crucial factor in building operational efficiencies in resource procurement during the company’s international expansion in the 1970s. McDonald’s was specific and exacting in its standards when expanding into new markets, to the point of defining the genetic variety of potatoes to use in its french fries. This is an important takeaway, particularly from the perspective of a business that is expanding into international markets and whose products are not exactly exportable from one country to another. Standardization and homogenization of a product catalog make it easier to decide early on whether to source materials locally or to bring your own.

Resource Sharing – Automobile Twinning

Auto Twinning

A long-perceived benefit of consolidating automobile manufacturers into several large corporations that govern the production of numerous brands, such as America’s General Motors Group and Ford Motor Company as well as Germany’s Volkswagen Group, lies in the ability to utilize a common drivetrain or chassis that can be deployed across different brands or makes of automobiles. For example, the Ford Escape, Mercury Mariner and Mazda Tribute are all built on the same chassis and share a large proportion of their components. The differences in these cars tend to be aesthetic in nature, since they’re designed to appeal to different consumer segments; nevertheless, the financial benefits to the automobile manufacturer are far more tangible. By getting design teams across various brands to collaborate at an early stage in a car’s development, manufacturers can assemble a common design platform, elements of which can later be customized to fulfill individual brand attributes. The result is that manufacturers are able to reap huge benefits from doubling or in some cases tripling their economies of scale, as well as accelerating returns on their R&D investment dollars. Our takeaway here is that common components or build elements can be identified and jointly developed as shared resources to prevent duplication of effort and wastage of resources. Service Oriented Architectures that subscribe to a shared services model are a great example of this philosophy in action.

Conclusion

In this article, we explored how the fundamental Cloud Architecture principle of homogenization provided agility to the Roman Legions, operating efficiencies to Southwest Airlines, standardization to McDonald’s restaurants, and cost savings and improved ROI for major automotive manufacturers. The fact is, examples abound in the world to showcase the rational concepts behind building applications in the Cloud, and for this very reason the Cloud is an exciting and fundamentally compelling evolution of computing for our generation, because it makes sense! Thank you for reading.

About RoadChimp

RoadChimp is a published author and trainer who travels globally, writing and speaking about technology and how we can borrow paradigms from other industries to build upon our understanding of rapidly emerging technologies. The Chimp started off his technology career by building one of the first dot.com startups of its type in Asia and subsequently gained expertise in large-scale computing in North America, the Caribbean and Europe. Chimp is a graduate of Columbia Business School.

References

Article on Roman Military Tactics (http://romanmilitary.net/strategy/legform/)

Blog Article on Southwest Airlines (http://blog.kir.com/archives/2010/03/the_genius_of_s.asp)

McDonald’s menu breakdown of profitability (http://adage.com/article/news/mcdonald-s-brings-u-s-sales-years/232319/)

FTC study on McDonald’s International Expansion (http://www.ftc.gov/be/seminardocs/04beyondentry.pdf)

Report on Automotive Twinning (http://www.edmunds.com/car-buying/twinned-vehicles-same-cars-different-brands.html#chart)

Exchange 2013 Sample Architecture Part 3: Design Feature Overview and Virtualization Considerations

Overview

In this part of the Sample Architecture Series, we will home in on several elements of the Exchange solution design, namely a description of the overall Exchange 2013 solution design, followed by some basic system configuration parameters and virtualization considerations.

Design Features

Exchange 2013 Design

The Exchange 2013 Environment for Chimp Corp features the following design elements:

  • Internal Client Access: Internal clients can automatically locate and connect to available CAS Servers through the Availability and Autodiscover services. CAS Servers are configured in arrays for high-availability and the locations of the CAS servers are published through Service Connection Points (SCPs) in Active Directory.
  • External Client Access:  External clients can connect to Exchange via Outlook Web Access (OWA), Outlook Anywhere and Exchange ActiveSync. Exchange 2013 now supports L4 load balancing for stateless failover of connections between CAS servers in the same Array. Client traffic arrives at the Network Load Balancer, which uses Service Connection Points to locate the internal Mailbox servers and distribute load accordingly.
  • Single Domain Name URL: Exchange 2013 relies on a feature in the TCP/IP protocol stack of client computers that supports the caching of multiple IP addresses that correspond to the same name resolved from DNS. In the event of an individual site failure, the IP address corresponding to the CAS array in that site will become unresponsive. Clients automatically connect to the next cached IP address for the CAS Array in order to reestablish client connections. This IP address corresponds to the CAS Servers in the alternative site and failover occurs without any intervention.
  • Mailbox High Availability: This feature is provided by implementing Database Availability Groups (DAGs). A single DAG will be configured to protect the messaging service. It is preferable to deploy a larger number of smaller mailbox databases in order to reduce mailbox restoration and reseed times in the event of the failure of a database copy.
  • Message Routing: All External SMTP traffic will be routed securely via Microsoft’s Exchange Online Protection (EOP) cloud-based services and the Internet. Inter-site messages between the premise and online users will also be routed via EOP. Internal messages between on-premise users in either datacenter site will be routed automatically via the transport service on the on-premise Mailbox servers.
  • Hybrid Deployment: The Exchange 2013 environment will be deployed in tandem with an Exchange Online organization. The purpose of the Exchange Online organization is to host mailbox accounts that have been flagged as non-compliance-sensitive, and to reduce the costs of the on-premises deployment. The hybrid implementation will offer a seamless experience between users in the on-premises and online environments, including: Single Sign-On through the configuration of trusts between the Microsoft Online ID and the on-premises Active Directory forest; unified GAL access and the ability for online and on-premises users to share free/busy information through the configuration of a Federation Trust with the Microsoft Federation Gateway; and secure message transport between the on-premises and online environments, encrypted, authenticated and transported via Transport Layer Security (TLS).
  • Message Archiving: All message archives will be transferred to the Exchange Online Archiving service. The existing on-premises archiving solution will be decommissioned after existing message archives are ingested into the Exchange Online Archive.

Exchange 2013 Virtualization

All Exchange 2013 server roles are fully supported for virtualization by Microsoft. Virtualization can assist an organization in consolidating its computing workload and enjoying benefits from cost reduction and efficient hardware resource utilization. According to Microsoft recommended Best Practices, load calculations when provisioning Exchange deployments in a virtual environment must accommodate for additional overheads from the virtualization hypervisor. Therefore, this solution design has factored in an additional resource overhead of 12% to accommodate virtualization.

The following server roles will be virtualized:

  •     Exchange 2013 Mailbox Servers
  •     Exchange 2013 CAS Servers

Microsoft provides further guidance on implementing Exchange Server 2013 in a virtualized environment. Relevant factors have been listed below:

  1. Exchange Servers may be combined with host-based failover clustering and migration technology, provided that the virtual machines are configured not to save and restore disk state when moved or taken offline. A host-based failover must result in a cold boot when the virtual machine is activated on the target node.
  2. The root machine should be free of all applications save the virtual hypervisor and management software.
  3. Microsoft does not support taking a snapshot of an Exchange virtual machine.
  4. Exchange supports a Virtual Processor to Physical Processor ratio of no greater than 2:1 and Microsoft recommends an ideal processor ratio of 1:1. Furthermore, virtual CPUs required to run the host OS should be included in the processor ratio count
  5. The operating system disk allocated to each Exchange virtual machine must be at least 15GB plus the size of the virtual memory allocated to the virtual server.
  6. The storage allocated for Exchange data can be virtual storage of a fixed size, such as fixed Virtual Hard Disks (VHDs), SCSI pass-through storage, or iSCSI storage.
  7. Exchange 2013 does not support NAS storage. However, fixed VHDs that are provisioned on block level storage and accessed via SMB 3.0 on Windows Server 2012 Hyper-V are supported.
  8. Exchange 2013 is designed to make optimal usage of memory allocations and as such, dynamic memory features for Exchange are not supported.

Conclusion

Subsequent sections of this series will focus on the Exchange Mailbox Design and CAS Design, as well as the Hybrid Implementation and additional features.

Please click here for the next part: Exchange 2013 Mailbox Server Role Design.

Cloud Architectures: Session Handling

Introduction

Deploying applications to the cloud requires a critical rethink of how applications should be architected and designed to take advantage of the bounty that the cloud has to offer. In many cases, this requires a paradigm shift in how we design the components of our applications to interact with each other. In this post, we shall explore how web applications typically manage session state and how cloud services can be leveraged to provide greater scalability.

Web Application Tiers

It is a common practice to design and deploy Scalable Web Applications in a 3-tiered configuration, namely as follows:

1. Web Tier: This tier consists of anywhere from a single web server to a large number of identically configured web servers, primarily responsible for authenticating and managing requests from users as well as coordinating requests to subsequent tiers in the web architecture. Cloud-enabled web servers commonly utilize the HTTP protocol and the SOAP or REST styles to facilitate communication with the Service Tier.

2. Service Tier: This tier is responsible for managing business logic and business processing. The Service Tier comprises a number of identical stateless nodes that host services responsible for performing a specific set of routines or processes.

3. Data Tier: The data tier hosts business data across a number of structured or unstructured formats and most cloud providers commonly host a variety of storage formats, including Relational Databases, NoSQL and simple File Storage, commonly known as BLOBS (Binary Large OBjects).

4. Load Balancing (Optional): An optional tier of load balancers can be deployed on the perimeter of the Web Tier to balance requests from users and distribute load among the web servers.

Managing Session State

Any web application that serves users in a unique way needs an efficient and secure method of keeping track of each user’s requests during an active session. For example, an e-commerce shopping site that provides each user with a unique shopping cart needs to be able to track the individual items in each user’s cart for the duration of their active web session. More importantly, the web application that serves the user needs to be designed to be resilient to failures and the potential loss of session data. In a Web Services architecture, there are a number of methods that can be employed to manage the session state of a user. We shall explore the most common methods below:

  • Web Tier Stateful (Sticky) Sessions: A web application can be designed such that the Web Server node a user gets directed to stores the session information locally, and all future requests from that user are served by the same node; the node thus becomes stateful. The disadvantages of this design are that the node serving the user becomes a single point of failure, and that any new nodes added to the pool of Web Servers can only share the load of subsequently established sessions, since existing sessions remain pinned to existing nodes. This severely limits the scalability of the design and its ability to evenly distribute load.
  • Web Tier Stateless Session Management: This design solves several limitations of the previous design by storing user session state externally, where it can be referenced by any of the connected Web Server nodes. An efficient way to store small amounts of session data is a small cookie that holds a Session ID for the individual user. This Session ID serves as a pointer for any inbound request between the user and the Web Application. Cloud Service Providers offer various types of storage to host session state data, including NoSQL, Persistent Cloud Storage and Distributed Web Cache storage. For example, a web-tier request might use AJAX to call REST services that in turn pull JSON data relating to an individual user’s session state (see the sketch after this list).
  • Service Tier Stateless Session Management: In most web architectures, the Service Tier is designed to be insulated from user requests. Instead, the Web Tier acts as a proxy or intermediary, allowing the Service Tier to run in a trusted environment. Services running in this tier do not require state information and are completely stateless. Due to this statelessness, the Service Tier enjoys the benefits of loose coupling, which allows individual services to be written in different languages such as Java, Python, C#, F# or Node.js, based on the proficiency of the development teams, while still being able to communicate with each other.
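
A minimal sketch of stateless web-tier session management, assuming a Python/Flask web node and a Redis session store (the host name, route and timeout values are illustrative, not part of the original design):

    import json, uuid
    from flask import Flask, request, make_response
    import redis

    app = Flask(__name__)
    # Shared session store; any web node can read any user's session state
    store = redis.StrictRedis(host="sessions.internal.example", port=6379, decode_responses=True)

    @app.route("/cart/add/<item>")
    def add_to_cart(item):
        # The cookie holds only a pointer (Session ID); the cart itself lives in Redis
        sid = request.cookies.get("session-id") or str(uuid.uuid4())
        cart = json.loads(store.get(sid) or "[]")
        cart.append(item)
        store.setex(sid, 1800, json.dumps(cart))  # idle sessions expire after 30 minutes
        resp = make_response(json.dumps({"cart": cart}))
        resp.set_cookie("session-id", sid, httponly=True)
        return resp

Because no node holds state locally, any server behind the load balancer can handle any request, and nodes can be added or removed without breaking existing sessions.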

Summary

Stateless session management allows us to build scalable compute nodes within a Web Application Architecture that are easy to deploy and manage, reduces single points of failure, and takes advantage of the scalability and resiliency offered by Cloud Service Providers to host session state data.

Cloud Architectures

Hiya, RoadChimp here!

I’ve spent a lot of time over the past few years working on building scalable computing systems and, like many of you, lived through the proliferation of virtualization into mainstream technology infrastructure. In all these years of work around the globe, across a variety of industries and scopes, I have not seen any other class of technology that promises the same amount of technological advancement as Cloud Platform Services. I strongly believe that the next few years in our industry will revolve mostly around building a new breed of services that make our lives easier and more efficient, and that live and interact entirely in the cloud.

Therefore, I’ve decided to devote a section of this blog to Cloud Architectures and the paradigms that we must adopt in order to take advantage of this avalanche of new technologies.

Exchange 2013 Sample Architecture Part 2: High-level Architectural Design Document

Overview:

This section provides an introduction to the key elements of the Exchange 2013 architectural solution. It provides a high-level solution overview and is suitable for all technical project stakeholders. Excerpts of the final design document are listed under this post, and the full High Level Design document can be downloaded here: RoadChimp Sample Architectural Doc (ADOC) v1.1

1. Messaging Infrastructure function

The Messaging Infrastructure serves the primary function of providing electronic mail (e-mail) functionality to the Chimp Corporation. The messaging infrastructure supports e-mail access from network-connected computers and workstations as well as mobile devices. E-mail is a mission-critical application for Chimp Corp: it serves as an invaluable communications tool that increases efficiency and productivity, both internally within the organization and externally to a variety of audiences. As a result, it is of paramount importance for Chimp Corp to maintain a robust infrastructure that will meet present and future messaging needs.

Key requirements of the messaging infrastructure are as follows:

  • Accommodate service availability requirements
  • Satisfy archiving requirements
  • Satisfy growth requirements
  • Provide required business continuity and disaster recovery capabilities

1.A. About IT Services (ITS)

The IT Services Organization is responsible for managing the IT environment for the Chimp Corp as well as ensuring adherence to published standards and operational compliance targets throughout the enterprise.

1.B. Service Level Agreement (SLA)

The Email infrastructure is considered mission critical and, therefore, has an SLA requirement of 99.99% availability.
The full SLA for the messaging environment can be found in the document <Link to SharePoint: Messaging SLA> 

1.C.  Locations

The messaging infrastructure is hosted from two separate datacenters being at:

  • Datacenter A (DCA)
    Chimp Center Prime
    1 Cyber Road,
    Big City
  • Datacenter B (DCB)
    Chimp Center Omega
    10 Jungle Way,
    Banana Town

The messaging infrastructure is supported by the IT Services Support Organization located at:

  • Chimp Corp Headquarters
    Chimp Center Prime
    Bldg 22, 1 Cyber Road,
    Big City

1.D.     E-mail User Classifications

The primary users of the messaging system are Chimp Corp employees. The user base is divided into two groups as follows:

  •     Exec: users performing Senior or Critical corporate functions
  •     Normal: the rest of the user population

2. Existing Platform

This section of the Asset document provides an overview of the present state of the asset, as well as a chronological view of changes based on organizational or technological factors.

2.A.     Existing Exchange 2003 design

A third-party consulting company performed the initial implementation of the messaging environment in 2000. The messaging platform was Microsoft Exchange 2003 and Windows 2003 Active Directory. The diagram below provides a representation of the existing design blueprint.

Exchange 2003 Environment

Fig. 1 Existing Messaging Environment

A single unified Active Directory Domain namespace chimpcorp.com was implemented in a Single Domain, single Forest design.

2.B. Change History

Over the years the Chimp Corp messaging environment has undergone various changes to maintain service level and improve functionality. The timeline below shows the changes over time.

Chimpcorp Timeline
Fig. 2 Chimp Corp Messaging Infrastructure Timeline

2.B.1   Initial Implementation

The Exchange 2003 messaging infrastructure was implemented by IT Services in 2005 and the entire user base was successfully migrated over to Exchange 2003 by September 2005.

2.B.2   Linux Virtual Appliances deployed for Message Hygiene

A decision was made by IT to deploy a message hygiene environment for the company. This change was scheduled as maintenance and was executed in early 2009.

2.B.3   Additional Datacenter Site (Omega)

In order to improve infrastructure availability and to support additional growth of the corporate environment, a second datacenter site, codenamed Omega was commissioned and fully completed by March of 2009.

2.B.4   Two Exchange Mailbox Clusters (A/P) deployed in the Omega Datacenter Site

To improve the availability of e-mail for users and also to meet plans for storage and user growth, two additional Exchange Mailbox Servers were deployed in Datacenter Omega (DCB).

2.B.5   Third-party archiving solution

A third party archiving solution was deployed by IT Services in 2010 as part of efforts to mitigate growth of the Exchange Information Stores, based on recommendations from their primary technology vendor. The archiving solution incorporates a process known as e-mail stubbing to replace messages in the Exchange Information Stores with XML headers.

2.B.6   Acquisition by Chimp Corp

After the company was acquired by Chimp Corp in September 2011, immediate plans were laid out to perform a technology refresh across the entire IT infrastructure.

2.B.7   Active Directory 2008 R2 Upgrade

The Windows Active Directory domain was upgraded to version 2008 R2 in native mode in anticipation of impending upgrades to the messaging infrastructure. The replacement Domain Controllers were implemented as virtual machines hosted in the Enterprise Virtual Server environment running VMware vSphere 5. This change was completed in March 2012.

2.C. Existing Hardware Configuration

The current hardware used in the messaging platform consists of the following elements:

2.C.1   Servers

Existing server systems comprising the messaging environment include:

    • 12 x HP DL380 G4 servers at DCA, each with 2 to 4 GB of RAM
    • 10 x HP DL380 G4 servers at DCB, each with 2 to 4 GB of RAM

2.C.2   Storage characteristics

Exchange storage used for databases, backups, transaction logs and public folders has been provisioned on:

    • 2 TB of Fibre Channel SAN-attached storage provisioned for 5 Exchange Storage Groups and 21 databases with their transaction logs
    • 2 TB of iSCSI SAN-attached storage provisioned for archiving

2.D. Network Infrastructure

The Chimp Corp email infrastructure network has two main physical locations at the DCA and DCB datacenter sites. These are currently connected via the Chimp Corp LAN/WAN. The core switches interconnecting all hardware are Cisco 6500 Series Enterprise class switches.

2.E. Present Software Configuration

Software and licenses presently in use include:

  • Microsoft Windows 2003 Standard
  • Microsoft Windows 2003 Enterprise
  • Microsoft Exchange 2003 Standard
  • Microsoft Exchange 2003 Enterprise
  • Third Party SMTP Appliances
  • A Stub-based third-party Email Archiving Tool

3. Messaging Infrastructure Requirements

The design requirements for the Exchange 2013 messaging environment have been obtained from the project goals and objectives, as listed in the Project Charter for the E13MAIL Project.

The primary objective of the E13MAIL Project is to ensure continued reliability and efficient delivery of messaging services to users and applications connecting to Chimp Corp from a variety of locations. The stated design goals are to increase performance and stability, and to align the operational capabilities of the messaging environment with industry best practices.

The requirements/objectives for the messaging infrastructure are:

  • A redundant messaging solution deployed across two datacenter locations: DCA and DCB
  • Capability to meet Audit and Compliance requirements
  • High Availability (99.99%)
  • Monitoring of services and components
  • Accurate configuration management for ongoing support
  • Adherence to Industry Best Practices for optimal support by vendors and service delivery organizations
  • Reliable, disaster-recoverable backups with object-level recovery options
  • Message Archiving functionality with a maximum retention period of 7 years

4. Design Components

The primary messaging solution is to deploy a new Exchange 2013 environment that spans Chimp Corp’s physical datacenter locations and extends into Microsoft’s Office 365 cloud, taking advantage of the latest user productivity and collaboration features of Microsoft Office 2013.

The main goals for this solution are:

  • Minimize end-user impact: Minimizing end-user impact is a key goal for Chimp Corp. Significant effort must be made to ensure that the transition of all e-mail related services is seamless to the end user.
  • Reliable delivery of services: The messaging environment is a mission-critical component of Chimp Corp’s IT infrastructure and adheres to strict Change Management practices. The solution must integrate with existing operational and change processes.
  • Longevity of solution: The new messaging solution must endure beyond the initial implementation as it evolves into a production state. This requires particular attention to transferring operational knowledge to the IT Services technical teams so that they can maintain uptime requirements.

The individual design components were subjected to a stringent evaluation process that included the following design criteria:

  • Costs of Ownership
  • Technological engineering quality
  • Scalability
  • Fault Tolerance / Reliability
  • Industry best practices
  • Supportability
  • Ease of administration
  • Compatibility with existing systems
  • Vendor specifications

4.A. Hardware technology

IT Services researched server solutions from a number of hardware vendors and made a final decision in favor of equipment from HP, Brocade and Cisco.

4.A.1   Server hardware

The server platform used is the eighth-generation (Gen8) HP 400-series blade server, an Intel-based system. The CPUs in these systems are standardized on Intel Xeon E5-2640 processors; these are hex-core processors running at 2.5 GHz. The servers are equipped with 128 GB of memory to accommodate their specific functions. The blade servers are provisioned in HP BladeSystem c7000 enclosures.

4.A.2   Storage hardware

To accommodate the storage requirements, two storage arrays are implemented. The primary array is an HP EVA 6400 class Storage Area Network, equipped with 25 TB of raw storage and used for online, active data. The secondary array is an HP P2000 G3 MSA class storage area network, equipped with 15 TB of raw storage and used for secondary storage such as archives and backups.
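
Note that raw capacity is not usable capacity: RAID protection, hot spares and array formatting all take a share. As a back-of-the-envelope illustration only (the RAID levels, spare fraction and overhead factors below are assumptions, not part of this design), usable capacity can be estimated as:

    # Rough usable-capacity estimate; RAID levels and overheads are assumed, not specified.
    def usable_tb(raw_tb: float, raid_overhead: float, spare_fraction: float = 0.05) -> float:
        """Estimate usable TB after parity/mirroring overhead and hot-spare reservation."""
        return raw_tb * (1.0 - spare_fraction) * (1.0 - raid_overhead)

    print(usable_tb(25, raid_overhead=0.5))  # EVA 6400, if mirrored (RAID 1): ~11.9 TB
    print(usable_tb(15, raid_overhead=0.2))  # P2000 G3, if RAID 5 (4+1): ~11.4 TB

The point of the sketch is simply that the 25 TB and 15 TB raw figures should be discounted substantially before being compared against database and archive sizing targets.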

4.A.3   Interconnect technology

HP’s Virtual Connect technology is used to provide connectivity to both the storage and data networks. Virtual Connect acts as a virtual patch panel between the blade modules and the uplink ports to the core switching infrastructure. The Virtual Connect backplane uplinks the data networks into a Cisco-based core network, while the storage area network is interconnected via a Brocade switch fabric.

4.A.4   Server Operating Systems technology

The majority of the messaging infrastructure components will be deployed on the Microsoft Windows Server 2012 operating system platform. For systems that do not support Windows Server 2012, Windows Server 2008/R2 will be utilized.

4.A.5   Messaging platform technology

A pristine (greenfield) Microsoft Exchange Server 2013 organization will be implemented in a hybrid configuration, featuring two major components:

  • On-premises Exchange 2013: an on-premises environment to support core business functions that cannot be moved to the cloud for compliance reasons.
  • Office 365: all non-compliance-restricted users will be migrated to the Office 365 cloud.

The hybrid deployment will feature full interoperability between on-premises and cloud-based users, including single sign-on, sharing of free/busy calendar information and a single unified OWA login address.

4.A.6   Back-end Database technology

Microsoft SQL Server 2012 was selected as the database platform to support all non-Exchange application requirements. The selection was partly dictated by the use of technologies that depend on a SQL Server back end. As part of simplification and unification, it is preferred to keep all back-end databases in the messaging infrastructure on the same database platform.

4.A.7   Systems Management Solution

Due to the diversity of software applications and hardware in this infrastructure, a mix of management tools and products is used to manage all aspects of the messaging infrastructure. The major components are listed below:

(a) Server hardware management: Vendor-provided HP System Insight Manager tools are used in combination with Microsoft System Center Operations Manager (SCOM) to provide hardware-level monitoring and alerting.

(b) Server event management: Microsoft System Center Operations Manager (SCOM) 2012 is used for server event consolidation, management and alerting.

(c) Server applications management: Server software management comprises systems patch management and server provisioning:

    • Systems patch management: Windows Server Update Services (WSUS), integrated into System Center Configuration Manager (SCCM), provides patch management of all Windows Server operating systems in the messaging environment.
    • Server provisioning: Server provisioning for both bare-metal and virtual server deployments is managed via the HP Rapid Deployment Pack (HP RDP).

4.A.8   Message Security and Protection technology

The following Security and Protection products have been selected:

  • Server Virus protection: McAfee Antivirus has been selected to protect the server operating system.
  • Message hygiene: Microsoft Exchange Online Protection (EOP) will be used for message hygiene and protection.
  • Security events auditing: Microsoft SCOM has been selected to capture information such as security auditing and alerting events that are generated from the server platforms.

4.B. Functional Blueprint

The blueprint below illustrates the desired messaging infrastructure:

Fig. 3 Chimp Corp Functional Messaging Design

Conclusion

In the next section we will cover more detailed aspects of the Exchange 2013 design, as well as Server Virtualization Considerations for deploying Exchange 2013.

For the next part of this post, please click here.

Exchange 2013 Sample Architecture Part 1: Implementation Scenario

Scenario Overview:

Chimp Corp has recently completed the acquisition of a competitor, Bananas Inc. As part of the core infrastructure architecture team, you have been brought in to design and implement Exchange 2013 as part of a large enterprise systems refresh. The Project Charter has been signed off by senior stakeholders, with the objective of upgrading the existing messaging environment from Exchange 2003 SP2 to Exchange 2013. Senior Management has expressed a desire to migrate the messaging environment to the cloud in order to take advantage of cost benefits; however, the compliance department has mandated that specific components of the messaging environment must stay on-premises in order to meet regulatory requirements.

Management has decided to deploy a hybrid Exchange 2013 environment in a new Active Directory forest that is federated to an Exchange Online organization. The on-premises environment will host approximately 60% of the organization’s mailboxes; the remaining 40% are considered non-sensitive, and the compliance department has approved their migration to the cloud. This scenario represents the path of least resistance: Microsoft does not support a direct upgrade path from Exchange 2003 to Exchange 2013, and given the considerable size of the corporate messaging environment (15,000 mailboxes), a swing migration to Exchange 2007/2010 and then to Exchange 2013 was considered impractical.

Existing Environment:

The messaging environment features Exchange 2003 SP2 with Active Directory 2008 and four clustered Exchange Mailbox Servers implemented across two datacenters, with dedicated network load-balanced Exchange Front End servers in each location. Third-party SMTP message hygiene appliances were configured in each site to provide spam filtering and antivirus scanning, and a number of applications were configured to relay SMTP messages via one of the SMTP appliances. A third-party archiving tool was deployed across both sites, and client access was provisioned primarily via Outlook RPC, OWA, BlackBerry Enterprise Servers and Microsoft ActiveSync.

Requirements:

The following solution requirements were distilled from the Request for Proposal (RFP) document. The solution must:

  • Conform to Microsoft Best Practices
  • Accommodate 99.99% high availability standards
  • Adhere to Disaster Recovery, High Availability and Business Continuity standards
  • Provide a fully redundant and load balanced design
  • Accommodate 9,000 Mailboxes across 2 datacenters
  • Accommodate 6,000 Mailboxes on an Exchange Online organization
  • Accommodate an average mailbox size of 1 GB
  • Accommodate anticipated storage growth of 20% per year
  • Store archived e-mails for 7 years
  • Adhere to Retention, Legal Hold and eDiscovery requirements
  • Perform e-mail archiving whenever a mailbox reaches 1 GB or whenever messages are 1 year old (see the sizing sketch after this list).
  • Network Access Bandwidth: 1 Gbps
  • Storage Access: Minimum bonded Gigabit Connections or Fibre Channel
  • Client Access: Outlook 2007 and later; Internet Explorer, Safari and Firefox using OWA; PDA access from RIM BlackBerry devices, IP phones and Microsoft Windows Mobile 6 and later.
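
To make the storage figures concrete, the sketch below projects on-premises mailbox storage from the requirements above (9,000 mailboxes, 1 GB average size, 20% annual growth) and encodes the archiving trigger as a simple predicate. The three-year planning horizon and the assumption of uniform compound growth are illustrative choices, not requirements from the RFP:

    from datetime import datetime, timedelta
    from typing import Optional

    def projected_storage_tb(mailboxes: int, avg_mailbox_gb: float,
                             growth_rate: float, years: int) -> float:
        """Project total mailbox storage in TB, assuming compound annual growth."""
        return mailboxes * avg_mailbox_gb * ((1.0 + growth_rate) ** years) / 1024

    # 9,000 on-premises mailboxes at 1 GB average, 20% growth over 3 years: ~15.2 TB
    print(round(projected_storage_tb(9000, 1.0, 0.20, 3), 1))

    def should_archive(mailbox_size_gb: float, message_date: datetime,
                       now: Optional[datetime] = None) -> bool:
        """Archive when the mailbox reaches 1 GB or the message is over a year old."""
        now = now or datetime.now()
        return mailbox_size_gb >= 1.0 or (now - message_date) > timedelta(days=365)

Even before growth, 9,000 mailboxes at 1 GB each implies roughly 8.8 TB of mailbox data, which is why the archiving thresholds above matter for keeping the active databases manageable.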

Proposed Solution:

Based on the requirements stipulated in the RFP, the proposed solution must include the following components:

  1. Methodology used to implement Exchange 2013 and related architectural components
  2. Related hardware and software builds
  3. Related costs of implementing the solution
  4. Annual cost of ownership
  5. A high level project plan detailing the responsibilities of various stakeholders

The final solution proposed an implementation of Exchange 2013 configured as a Hybrid environment. The Exchange 2013 environment would feature the following benefits:

  • Scalability and flexibility of moving users to the cloud
  • Virtualization of on-premises Exchange environment
  • Migration of Archives to Exchange Online Archiving
  • Deployment of High Availability using native Microsoft technologies
  • Unified Management via Microsoft Exchange Administration Center
  • Systems Management via Microsoft System Center Configuration Manager (SCCM) 2012

Solution Components:

The solution featured a high-level design that was broken into the following components:

  • Exchange Server
  • Infrastructure Services
  • Email Archiving
  • Storage Design
  • Backup and Recovery
  • Systems Management
  • Client Access

Conclusion:

Successive sections in this series will provide you with the various design components of the final solution, as well as related Project Implementation plans.

For Part 2: High-level Architectural Design Document, click here.

Exchange 2013 Architecture Samples

I’m posting a set of sample architectural design documents that were adapted from a real-world multi-site hybrid Exchange deployment. The documents are built entirely on best practices, and I’ve taken the liberty of updating elements of the design to reflect changes in the Exchange 2013 architecture (and, of course, to remove any sensitive or confidential information).

This was a fairly large implementation, and it took the combined efforts of a large team of engineers, who inspired me and continue to do so, to complete all of the deliverables. You may not need all of these document components, but it’s good to see how a large messaging environment can be broken down into its constituent components and architected in detail.

Read First: These design documents were derived from detailed research and consultation with expert engineers. Feel free to use them as a reference, but always verify the requirements of your project against the data in these guides. Roadchimp takes no responsibility for any implementation issues that you encounter. Make sure that you implement licensed copies of Microsoft software with valid Microsoft support in place prior to making any changes to a production environment. Furthermore, make sure that you consult Microsoft’s resources to ensure that your hardware is fully supported for deploying Exchange 2013, Windows Active Directory and other architectural components.

I will start posting links to these templates here:

Exchange 2013 Configuration Guides

A warm Ook and hello from your banana-loving primate friend! I’ve decided to put up a list of configuration guides for Exchange 2013 in an easy-to-access part of this blog. The configuration guides will (hopefully) help you perform some tasks that you may find useful. I will post links to the various guides on this page.

1. Exchange 2013 in Windows Azure

2. Configuring a Hybrid Exchange 2013 Deployment

 

I hope to get more posts out there. Thanks for all your comments and likes!

Road Chimp saying Ook!