Upcoming Series on Cloud Companies

Hello all,

Just wanted to provide a quick update on some research that I’m working on. There’s been quite a bit of coverage in the media on various types of cloud solutions, from infrastructure to platforms and applications that sit on the cloud. While I’ve written extensively about applications and infrastructure, I realized that there would be value in posting some findings from a recent consulting engagement about what’s hot, and what’s hype, in Platform Services delivery.

Your friendly chimp recently completed a strategic analysis for a multinational client who is looking to get into the cloud. It was a very exciting summer, traveling to various countries to meet my client and evaluating various technology solutions that would complement the client’s existing competencies and systems. In upcoming weeks, I will be sharing some of my research with you, my fellow simian readers. Some of the articles I intend to post:

Market Analysis – How are existing providers in the market delivering their products, and how do they plan to differentiate their solutions? More importantly, how are these solutions influenced by consumer behavior?

Launching a Cloud Service – What does it take to start a new business? I use a few tools that help budding entrepreneurs to devise a winning strategy in minutes, using real-world experience and the application of valuable frameworks.

Till then, your friendly Chimp looks forward to interacting with you all again! Ook

RoadChimp out.

Cloud Architectures – Storage in the Cloud

Brief

Cloud technology is deployed across a wide variety of industries and applications. The term ‘Cloud’ itself has become so prevalent that we’ve devised additional terms to describe what type of cloud we’re talking about. What’s your flavor? IaaS, PaaS or SaaS? Or perhaps it’s Public, Private or Hybrid?

Regardless of the type of cloud you’re using or planning to implement, there’s no denying that storage is an essential component of every cloud architecture that simply cannot be overlooked. In this post, we will look into some of the most common usages of storage in the cloud and peel back the layers to discover exactly what makes them tick. Our goal is to come up with a yardstick to measure storage design.

Drivers towards Cloud Storage adoption

What do Dropbox and Box Inc have in common? Both companies are less than 5 years old, offer services predominantly centered around cloud storage and file sharing, and have been able to attract significant amounts of capital from investors. In fact, Dropbox raised $250 million at a $4 billion valuation, and Box Inc raised another $125 million in mid-2012. It looks like Silicon Valley sees cloud storage services as a key piece in the future of cloud. So why is there such tremendous interest in cloud storage? Consumers are drawn to a number of benefits of using the cloud:

  • Redundancy: Large clouds incorporate redundancy at every level. Your data is stored in multiple copies on multiple hard drives on multiple servers in multiple data centers in multiple locations (you get the picture).
  • Geographical Diversity: With a global audience and a global demand for your content, we can place data physically closer to consumers by storing it at facilities in their country or region. This dramatically reduces round-trip latency, a common cause of sluggish Internet performance.
  • Performance: Storage solutions in the cloud are designed to scale dramatically upwards to support events that may see thousands or millions more consumers accessing content over a short period of time. Many services provide guarantees in the form of data throughput and transfer.
  • Security & Privacy:  Cloud storage solutions incorporate sophisticated data lifecycle management and security features that enable companies to fulfill their compliance requirements. More and more cloud providers are also providing services that are HIPAA compliant.†
  • Cost: As clouds get larger, the per unit costs of storage go down, primarily due to Economies of Scale. Many service providers choose to pass on these cost savings to consumers as lower prices.
  • Flexibility: The pay-as-you-use model eases capacity planning and avoids wasting resources when usage varies cyclically.

It should be noted that a draft opinion paper released by the EU Data Protection Working Party, while not explicitly discouraging cloud adoption, recommended that public sector agencies perform a thorough risk analysis prior to migrating to the cloud. You can read the report here.

Storage Applications for the Cloud

We’ve listed some of the most common applications for cloud storage in this section:

  • Backup: The cloud is perceived to be a viable replacement for traditional backup solutions, boasting greater redundancy and opportunities for cost savings. The Cloud backup market is hotly contested in both the consumer and enterprise markets.
    • In the consumer market, cloud backup services like Dropbox, Microsoft SkyDrive and Google Drive take part of your local hard drive and sync it with the cloud. Adoption of these pay-for-use services is on the rise, with Dropbox hosting data for in excess of 100 million users within four years of launching its service.
    • In the Enterprise Space, Gartner’s magic quadrant for enterprise backup solutions featured several pureplay Cloud backup providers including Asigra, Acronis and i365. Even leading providers such as CommVault and IBM have launched cloud-based backup solutions. Amazon’s recently launched Glacier service provides a cost-effective backup tier for around $0.01 per gigabyte per month.
      01 Gartner Magic Quadrant
  • File Sharing: File sharing services allow users to post files online and then share them with other users through a combination of Web links or apps. Services like Mediafire, Dropbox and Box offer a basic cloud backup solution that provides collaboration and link sharing features. On the other end of the spectrum, full-blown collaboration suites such as Microsoft’s Office 365 and Google Apps feature real-time document editing and annotation services.
  • Data Synchronization (between devices): Data synchronization providers such as Apple’s iCloud, as well as a host of applications including the productivity app Evernote, allow users to keep files, photos and even music synchronized across an array of devices (desktop, phone, tablet, etc.), with changes propagated automatically.
  • Content Distribution: Cloud content distribution network (CDN) services are large networks of servers that are distributed across datacenters over the Internet. At one point or another, we’ve all used CDNs such as Akamai to enhance our Web browsing experience. Cloud offerings such as the Microsoft Windows Azure Content Distribution Network (CDN) and the Amazon CDN offer affordable services for delivering anything from static files and images to streaming media to a global audience.
  • Enterprise Content Management: Companies are gradually turning to the cloud to manage organizational compliance requirements such as eDiscovery and Search. Vendors such as HP Autonomy and EMC provide services that feature secure encryption and de-duplication of data assets as well as data lifecycle management.
  • Cloud Application Storage: The trend towards hosting applications in the cloud is driving innovations in how we consume and utilize storage. Leading the way are large cloud service providers such as Amazon and Microsoft, who have developed cloud storage services to meet specific application needs.
    • Application Storage Services: Products like Amazon Simple Storage Service (S3) and Microsoft Windows Azure Storage Account support storage in a variety of formats (blob, queue and table data) and scaling to very large sizes (Up to 100TB volumes).  Storage services are redundant (at least 3 copies of each bit stored) and can be accessed directly via HTTP, XML or a number of other supported protocols. Storage services also support encryption on disk.
      02 Azure storage
    • Performance Enhanced Storage: Performance enhanced storage emulates storage running on a SAN. Products like Amazon Elastic Block Store provide persistent, block-level, network-attached storage that can be attached to running virtual machines, and in some cases VMs can even boot directly from these volumes. Users can allocate performance to these volumes in terms of IOPS.
    • Data Analytics Support: Innovative distributed file systems that support super-fast processing of data have been adapted to the cloud. For example, the Hadoop Distributed File System (HDFS) manages and replicates large blocks of data across a network of computing nodes, to facilitate the parallel processing of Big Data. The Cloud is uniquely positioned to serve this process, with the ability to provision thousands of nodes, perform compute processes on each node and then tear down the nodes rapidly, thus saving huge amounts of resources. Read how the NASA Mars Rover project used Hadoop on Amazon’s AWS cloud here.
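
As a quick worked example of the backup pricing quoted in the list above (around $0.01 per gigabyte per month), the sketch below estimates a monthly storage bill. The function name and the 5 TB archive size are our own illustrative assumptions, and the calculation ignores any retrieval or request fees a real service may charge.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.01")  # figure quoted in this post; actual pricing may differ
GB_PER_TB = 1000                      # decimal terabytes, assumed for simplicity


def monthly_storage_cost(terabytes):
    """Estimate the monthly storage charge in dollars for an archive of the given size."""
    return Decimal(terabytes) * GB_PER_TB * PRICE_PER_GB_MONTH


# Hypothetical 5 TB backup archive.
cost = monthly_storage_cost(5)
print(f"5 TB archive: ${cost:.2f} per month, ${cost * 12:.2f} per year")
```

Decimal arithmetic is used here only to avoid floating-point rounding noise in currency figures.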

Storage Architecture Basics

03 Generic Storage Architecture

So how do these cloud-based services run? If we were to peek under the hood, we would see a basic architecture that is pretty similar to the diagram above. All storage architectures comprise a number of layers that work together to provide users with a seamless storage service. The different layers of a cloud storage architecture are listed below:

  • Front End: This layer is exposed to end users and typically provides APIs that allow access to the storage. New access methods are constantly being introduced to broaden the compatibility of cloud systems, including Web service front ends using REST principles, file-based front ends and even iSCSI support. For example, a user can use an app running on their desktop to perform basic functions such as creating folders, uploading and modifying files, defining permissions and sharing data with other users. Examples of access methods and sample providers are listed below:
    • REST APIs: REST, or Representational State Transfer, is a stateless Web architecture model built upon communications between clients and servers. Providers include Microsoft Windows Azure Storage and Amazon Web Services Simple Storage Service (S3).
    • File-based Protocols: Protocols such as NFS and CIFS are supported by vendors like Nirvanix, Cleversafe and Zetta*.
  • Middleware: The middleware or Storage Logic layer supports a number of functions including data deduplication and reduction; as well as the placement and replication of data across geographical regions.
  • Back End: The back end layer is where the actual physical hardware is implemented. Read and write instructions are passed to the hardware through the Hardware Abstraction Layer.
  • Additional Layers: Depending on the purpose of the technology, there may be a number of additional layers
    • Management Layer: This may support scripting and reporting capabilities to enhance automation and provisioning of storage.
    • Backup Layer: The cloud back end layer can be exposed directly to API calls from Snapshot and Backup services. For example, Amazon’s Elastic Block Store (EBS) service supports an incremental snapshot feature.
    • DR (Virtualization) Layer: DR service providers can attach storage to a Virtual hypervisor, enabling cloud storage data to be accessed by Virtual Hosts that are activated in a DR scenario. For example the i365 cloud storage service automates the process of converting backups of server snapshots into a virtual DR environment in minutes.
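
To make the layering above a little more concrete, here is a minimal in-memory sketch of a front end, a middleware layer and a set of back ends. All class names, method names and site names are hypothetical, and the deduplication and replication logic is deliberately simplistic; it is not how any particular provider implements its service.

```python
import hashlib


class BackEnd:
    """Stands in for one physical storage location (a disk, server or data center)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data

    def read(self, key):
        return self.blocks[key]


class Middleware:
    """Storage logic: deduplicates content and replicates it across back ends."""

    def __init__(self, back_ends, replicas=3):
        self.back_ends = back_ends
        self.replicas = min(replicas, len(back_ends))

    def store(self, data):
        digest = hashlib.sha256(data).hexdigest()  # identical content yields an identical key
        for back_end in self.back_ends[:self.replicas]:
            if digest not in back_end.blocks:      # deduplication: skip content already held
                back_end.write(digest, data)
        return digest

    def fetch(self, digest):
        for back_end in self.back_ends:
            if digest in back_end.blocks:          # any surviving replica can serve the read
                return back_end.read(digest)
        raise KeyError(digest)


class FrontEnd:
    """User-facing API that maps file paths to stored content."""

    def __init__(self, middleware):
        self.middleware = middleware
        self.catalog = {}

    def upload(self, path, data):
        self.catalog[path] = self.middleware.store(data)

    def download(self, path):
        return self.middleware.fetch(self.catalog[path])


# Hypothetical sites; the names are placeholders, not real provider regions.
sites = [BackEnd("site-a"), BackEnd("site-b"), BackEnd("site-c")]
api = FrontEnd(Middleware(sites))
api.upload("/reports/q1.txt", b"quarterly figures")
api.upload("/backup/q1-copy.txt", b"quarterly figures")  # duplicate content, stored once per site
```

Keeping the file catalog in the front end and the content-addressed blocks in the back ends means the middleware can change its placement policy without the user-facing API changing.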

Conclusion:

This brief post provided a simple snapshot of cloud storage and its various uses, as well as a number of common applications for storage in the cloud. If you’d like to read more, please visit some of the links provided below.

Roadchimp, signing out! Ook!

Reference:

* Research Paper on Cloud Storage Architectures here.
Read a TechCrunch article on the growth of Dropbox here.
InformationWeek article on Online Backup vs. Cloud Backup here.
Read more about IBM Cloud backup solutions here.
Read about Commvault Simpana cloud backup solutions.

Technology in Government – Big Data

Executive Brief

In this article, we continue our series on technology in government by reviewing Big Data. We plan to review the impact of Big Data in the Government and common applications of technologies to manage this issue. First of all, let’s look at some basic definitions and define the scope of this article.

What is big data?

While Roger Magoulas of O’Reilly Media is most commonly credited with coining the term “Big Data” back in 2005 and launching it into the mainstream consciousness, the term has been floating around for a number of years (researchers have found examples dating from the mid-1990s in Silicon Graphics (SGI) slide decks). Nevertheless, Big Data basically refers to data sets that are so large that their size becomes an encumbrance when trying to manage and process the data using traditional data management tools.

According to IBM, we create 2.5 quintillion bytes of data each day. Big Data is commonly described by three characteristics:

  • Volume: Big Data refers to large amounts of data that are generated across a variety of applications and industries. At the time of this article, data sets ranging from hundreds of gigabytes to terabytes and petabytes could easily qualify under the definition.
  • Variety: With a wide and disparate number of sources of Big Data, the data can be structured (like a database), semi-structured (indexed) or unstructured.
  • Velocity: The data is generated at high speeds, and needs to be processed in relatively short durations (seconds).
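
To put the IBM figure quoted above in perspective, the short calculation below converts 2.5 quintillion bytes per day into more familiar units. It assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and a perfectly even rate across the day, which is a simplification.

```python
BYTES_PER_DAY = 2.5e18            # 2.5 quintillion bytes, the figure quoted in this post
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day, about {terabytes_per_second:.1f} TB per second")
```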

Why is big data important?

Big data conveys an important shift in how we interpret data to look for meaningful things in the world. The advent of social networking and e-commerce brought about a need for suppliers of increasingly undifferentiated online services to learn about the behavior of online users in order to tailor a superior user experience. Some of the most successful companies in the world (hint: starts with the letter ‘G’) have based their entire business models on delivering customized ads to users based on their search queries. Prominent research projects such as NASA’s SETI (Search for Extra-terrestrial Intelligence) and Mars Rover projects, and the Human Genome sequencing program, also shared a similar need:

The ability to perform lightning speed computational processes on extremely large sets of data that were also subject to frequent changes.

The challenges of traditional data management tools

The problem with conventional approaches towards managing data was that the data primarily had to be structured. Picture a database that supports the catalog of a conventional e-commerce website and holds hundreds and thousands of items. The database is structured and relational, meaning that each item put up for sale on the site can be stored as an object and described by a number of attributes, including the name of the item, the item’s SKU number, category, price, description, etc. For each item that we load into the database, we can perform searches according to product categories and descriptions and even sort the products by price. This is great and also efficient, because almost every object in the database will have the same types of attributes. Relational database technologies such as SQL, Oracle, etc. are great at handling this and are still very much in use today.

The problem we encounter when it comes to handling Big Data is that the data is subject to frequent change. With a Relational system, we need to define a structure or schema ahead of time. That’s not a big problem with an Online Shopping Cart database, since most items have the same attributes as described above. But what if we don’t know the types of attributes of the data we’re planning to store? Let’s imagine that we have a service that crawls the Web for Real Estate websites in a particular region. The objective is to build up an aggregated repository of information about properties for sale or rent that users can query.  Very frequently, the data that is being collected can be in a variety of sizes and types. For example, we could have HTML files, media files (JPEGs and MPEGs) or even strings of characters. In some cases it may be impossible to build a structure ahead of time, because we simply don’t know what’s out there.

So what happens each time we need to change the structure of a relational database? Rolling out schema changes for a database is a potentially complex, time- and resource-intensive process and has a definite performance impact on the database during the change. Conventional solutions such as adding more computing resources or splitting up the database into shards are feasible, but do not fundamentally change how the data is being managed.
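
The contrast between a fixed schema and data of unknown shape can be sketched with Python’s built-in sqlite3 module. This is only an illustration under our own assumptions: the table names, column names and sample records are hypothetical, and storing JSON text in a single column is a stand-in for a schema-less store, not a recommendation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational approach: the structure is declared before any data arrives.
conn.execute("CREATE TABLE items (sku TEXT PRIMARY KEY, name TEXT, category TEXT, price REAL)")
conn.execute("INSERT INTO items VALUES ('A-100', 'Kettle', 'Kitchen', 24.99)")


def try_insert_with_bedrooms(connection):
    """Return True if the table accepts a 'bedrooms' attribute, else False."""
    try:
        connection.execute(
            "INSERT INTO items (sku, name, category, price, bedrooms) "
            "VALUES ('H-1', 'Two-bed flat', 'Rental', 1500.0, 2)"
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the column is not part of the schema.
        return False


accepted_before = try_insert_with_bedrooms(conn)
conn.execute("ALTER TABLE items ADD COLUMN bedrooms INTEGER")  # the schema change
accepted_after = try_insert_with_bedrooms(conn)

# Schema-less approach: each crawled record is kept as a self-describing document.
conn.execute("CREATE TABLE listings (doc TEXT)")
crawled = [
    {"url": "http://example.com/1", "type": "html", "bedrooms": 2},
    {"url": "http://example.com/2", "type": "jpeg", "width": 800, "height": 600},
]
for record in crawled:
    conn.execute("INSERT INTO listings VALUES (?)", (json.dumps(record),))

stored = [json.loads(row[0]) for row in conn.execute("SELECT doc FROM listings ORDER BY rowid")]
print(accepted_before, accepted_after, [sorted(doc) for doc in stored])
```

On a small in-memory table the ALTER TABLE step is trivial; the point is only that the relational table needs it at all, while the document records can differ freely from one another.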

Solution: Big Data Technology

In the previous section, we explored the need for corporations and organizations to manage increasingly large amounts of data, as well as the ineffectiveness of existing Database Management systems in dealing with these large data sets. In this section, we will briefly cover the most commonly deployed solutions in the industry for Big Data management.

Hadoop: Some industry executives have likened Hadoop to the brand “Kleenex”, meaning to say that Hadoop is synonymous with Big Data. Hadoop was largely developed at Yahoo and named after the toy elephant of a researcher’s son. Hadoop’s mechanism and components are described briefly:

  1. Distributed Cluster Architecture: Hadoop comprises a collection of nodes (Master + Workers). The Master node is responsible for assigning and coordinating tasks via a Jobtracker role. Hadoop has two basic layers:
    1. The HDFS layer: The Hadoop Distributed File System maintains consistency of data distributed across a large number of data nodes. Large files are distributed across the cluster and managed via a metadata server known as the Primary Namenode. Each datanode serves up data over the network using a proprietary block protocol. HDFS maintains a number of High Availability features including replication and rebalancing of data across nodes. A major advantage of HDFS is location awareness, where nodes are scheduled to run computational processes for data that is situated close to the nodes, thereby reducing network traffic.
    2. The MapReduce layer: The processing logic of MapReduce consists of the Map function and the Reduce function. The Map function applies a transformation to each input record and emits key-value pairs (e.g. (word, 1)). The Reduce function then combines all of the values that share a key into a single result, such as a total count.
    3. Additional Components: Hadoop is commonly implemented with a number of additional services. We’re listing the most common components here:
      1. Pig: Pig is a scripting language for creating MapReduce queries.
      2. Hive: Hive is a data warehouse infrastructure that provides SQL-like queries over data stored in Hadoop.
      3. Sqoop: Sqoop is a relational database connector that, combined with data analysis tools, allows connectivity into a company’s Business Intelligence layer.
      4. Scheduling: Scheduling tools such as Facebook’s Fair Scheduler and Yahoo’s Capacity Scheduler allow users to prioritize jobs and implement some degree of Quality of Service.
      5. Other tools: A number of other tools are available for managing Hadoop and include HCatalog, a table management service for access to Hadoop data and the Ambari monitoring and management console.
  2. Batch Processing: Hadoop fundamentally uses a batch processing system to manage data. Processing is typically divided up into the following steps:
    1. Data is divvied up into small units and distributed across a cluster
    2. Each data node receives a subset of data and applies map and reduce functions to locally stored data/cloud storage
    3. Jobtracker coordinates jobs across the cluster
    4. Data may be processed in a workflow where outputs of one map/reduce pair become inputs for the next
    5. Data results may be applied to additional analysis/ reporting or BI tools
  3. Hadoop Distributions: Hadoop is an open-source Apache project and has very recently (circa October 2012) been released by Microsoft as Microsoft HDInsight Server for Windows and the Windows Azure HDInsight Service for the cloud. Other large vendor support for Hadoop includes the Oracle Big Data Appliance, which integrates with Cloudera’s distribution of Apache Hadoop; Amazon’s AWS Elastic MapReduce service for the cloud; and Google’s AppEngine-MapReduce on Google App Engine.
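
The map and reduce steps described above can be sketched in a few lines of plain Python using the classic word-count example. This is a single-process illustration only: the function names and sample input are our own assumptions and are not part of Hadoop’s API, and a real job would be distributed across data nodes and written against Hadoop itself or a tool such as Pig or Hive.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key into a single total."""
    return key, sum(values)


def run_job(splits):
    mapped = []
    for split in splits:          # each split stands in for the data held by one node
        for line in split:
            mapped.extend(map_words(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input, divided into two splits.
splits = [["big data big"], ["data in government"]]
counts = run_job(splits)
print(counts)
```

In a chained workflow, the dictionary produced here would become the input records for the next map/reduce pair.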

Latest Trends in Government

Now that we’ve covered some basics of Big Data, we are ready to explore common implementations in the government sector around the world. Large governments have led the charge, with in excess of 160 large Big Data programmes being pursued by the US Government alone.

  • Search Engine Analytics: A pressing need to search vast amounts of data made publicly available by recent policy changes has seen a great practical application for Hadoop and Hive. For example, the UK government uses Hadoop to pre-populate relevant and possible search terms when a user types into a search box.
  • Digitization Programs: The cost implications of ‘going digital’ are large, and regulators are taking notice, with some estimates that online transactions can be 20 times cheaper than by phone, 30 times cheaper than face-to-face, and up to 50 times cheaper than by post (link). For example, the UK government stated in its November 2012 Government Digital Strategy that it can save up to £1.2 billion by the year 2015 just by making public services digital by default. A number of large government bodies have been tasked with identifying high-volume transactions (>100,000 a year) that can be digitized. Successful digitization requires a number of key movements:
    • Non-exclusive policies: Bodies or groups that do not have the capabilities to go digital must not be penalized. This means that the choice to go digital should be open. Users who are not familiar with accessing digital information should also be given alternative mechanisms such as contact centers.
    • Consolidation of processes: A number of governments are moving closer towards a single consolidated online presence. For example, the U.K. government is consolidating all publishing activities across all 24 UK central government websites to the GOV.UK website. The consolidation of information without incurring any performance penalties requires the standardization to common platforms and technologies.
  • Large Agency initiatives: The largest agencies and ministries are spearheading programs on Big Data, with applications in Health, Defense, Energy and Meteorology taking on significant interests:
    • Health Services: The US Centers for Medicare & Medicaid Services (CMS) is developing a data warehouse based on Hadoop to support the analytic and reporting requirements of the Medicare and Medicaid programs. The US National Institutes of Health (NIH) is developing the Cancer Imaging Archive, an image data-sharing service that leverages imaging technology used in the assessment of therapeutic responses to treatment.
    • Defense: The US Department of Defense listed 9 major projects in a March 2012 White House paper on the adoption of Big Data analysis across the government. Major applications involved Artificial Intelligence, Machine Learning, Image and Video recognition and Anomaly detection.
    • Energy: The US Department of Energy is investing in research on its Next Generation Networking program to move large datasets (>1 petabyte per month) for the Open Science Grid, ESG and Biology communities.
    • Meteorology: The US National Weather Service uses Big Data in its modeling systems to improve tornado forecasting. Modern weather prediction systems utilize vast amounts of data collected from ground sources and from a geostationary satellite planned for launch in 2014. As weather conditions are constantly changing, the rapid processing of high-velocity data is paramount to these systems.

Strategic Value

Big Data is transformative in the sense that it provides us with an opportunity to perform deep, meaningful analysis of information beyond what is normally available. The idea is that with more information at our fingertips, we can make better decisions.

Positive Implications

  • Greater Transparency: Big data has the opportunity to provide greater access to data by making data more frequently accessible to greater constituencies of people.
  • More opportunities for enhancing performance: By providing users with access to not only greater amounts of data, but also greater varieties of data, we create more opportunities to identify patterns and trends by connecting information from more sources, leading us to capitalize on opportunities and expose threats. This results in an overall enhanced quality of decision making that could potentially lead to greater performance.
  • Better Decisions: By allowing systems to collect more data and then applying Big Data analysis techniques to draw meaningful information from these data sets, we can make better, more timely and informed decisions.
  • Greater segmentation of stakeholders: By exposing our analytics to greater pools of raw data, we can find interesting ways to segment our constituents, identifying unique patterns at a more granular level and devising solutions and services to meet these needs. For example, we can use Big Data to identify elderly residents in a particular part of a city who live alone and have a medical condition requiring specialist care, and use this information to manage staffing and service availability for these users.
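
The segmentation example above boils down to filtering records on several attributes and then counting by area. The sketch below shows the idea on a handful of made-up records; the field names, districts, age threshold and data are all hypothetical, and a real analysis would run over far larger data sets with appropriate privacy safeguards.

```python
from collections import Counter

# Entirely fictional sample records for illustration.
residents = [
    {"district": "North", "age": 82, "lives_alone": True, "needs_specialist": True},
    {"district": "North", "age": 79, "lives_alone": True, "needs_specialist": True},
    {"district": "North", "age": 45, "lives_alone": True, "needs_specialist": False},
    {"district": "South", "age": 88, "lives_alone": False, "needs_specialist": True},
    {"district": "South", "age": 71, "lives_alone": True, "needs_specialist": True},
]


def segment_by_district(records, min_age=65):
    """Count residents per district who are elderly, live alone and need specialist care."""
    selected = (
        r for r in records
        if r["age"] >= min_age and r["lives_alone"] and r["needs_specialist"]
    )
    return Counter(r["district"] for r in selected)


demand = segment_by_district(residents)
print(dict(demand))
```

The per-district counts are the kind of figure that could feed staffing and service planning.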

Negative Implications

  • Big Brother: Governments are sensitive to the perception of using data to investigate and monitor the individual, and the storing and analysis of data by government has long provoked a strong reaction from the public. However, the enactment of information transparency legislation and freedom of information policies, together with the formation of public watchdog sites, has led to an encouraging environment for governments to pursue Big Data.
  • Implementation Hurdles: Implementing Big Data requires a holistic effort beyond adopting a new technology. Every task, from effectively identifying data that can be combined and analyzed to securely managing the data over its lifetime, must be carefully managed.

Where to Start?

We’ve distilled a number of important lessons from around the web that could guide your Big Data implementation:

  • Focus first on requirements: Decision makers are encouraged to look for the low-hanging fruit, in other words, situations that have a pressing need for Big Data solutions. Big Data is not a silver bullet, and target implementations should be evaluated thoroughly.
  • Start small: Care should be taken to manage stakeholder expectations before Big Data takes on the image of a large disruptive technology in the workplace. Focusing on small pilot projects that show tangible and visible benefits is the best way to go and often paves the way for much larger projects down the line. Often, extending the pipeline for Big Data projects allows technology stakeholders time to get over the learning curve of adoption.
  • Reuse infrastructure: Big Data technologies can happily coexist on conventional infrastructure. In fact, Big Data implementations can run alongside Relational Database Systems in existing IT environments.
  • Obtain high-level support: Big Data sees the greatest benefits in terms of performance and cost savings when combining different systems. But with this type of endeavor comes greater complexity and risks from differing priorities. Managing this challenge requires the appointment of senior stakeholders who can align priorities and provide the necessary visibility for forward movement.
  • Push for standardization and educate decision makers: The Policy Exchange, a UK think tank, recommends that “… public sector leaders and policymakers are literate in the scientific method and confident combining big data with sound judgment.”
  • Address Ethical Issues first: A major obstacle to adopting Big Data is the pressure from groups of individuals who wish not to be tracked, monitored or singled out. Governments should tackle this issue head on by developing a code for responsible analytics.

Useful Links

InformationWeek article on Microsoft’s Big Data strategy here.
UPenn Research Paper > Development of Big Data here.
Research Trends Report on the evolution of Big Data as a Research topic here.
Cloudera whitepapers on Government Implementations here.
Article on Big Data’s success in Government here.
Article: UK govt. in talks to use Hadoop here.
Paper: UK Government Digital Strategy here.
Paper: US Federal Government Big Data Strategy here.
Article: Big data in government here.
Article: National Weather Service using Big Data here.
Research: McKinsey Global Institute paper on Big Data here.
Report: Policy Exchange Report on Big Data here.

 

Technology in Government – Cloud Computing

Executive Brief

A number of governments have implemented roadmaps and strategies that ultimately require their ministries, departments and agencies to default to Cloud computing solutions first when evaluating IT implementations. In this article, we evaluate the adoption of cloud computing in government and discuss some of the positive and negative implications of moving government IT onto the cloud.

Latest Trends

In this section, we look at a number of cloud initiatives that have been gaining traction in the public sector:

  • Office Productivity Services – The New Zealand Government has identified office productivity services as the first set of cloud-based services to be deployed across government agencies. Email and collaboration are considered low-hanging fruit and, encouraged by successes in migrating perimeter services like anti-spam onto the cloud, many organizations see them as a natural next step of cloud adoption. Vendors leading the charge include Microsoft’s Office 365 for Government, with successful deployments including Federal Agencies like the USDA, Veterans Affairs, FAA and the EPA as well as the Cities of Chicago, New York and Shanghai. Other vendor solutions include Google Apps for Government, which supports the US Department of the Interior.
  • Government Cloud Marketplaces – A number of governments have signaled the need to establish cloud marketplaces, where a federated marketplace of cloud service providers can support a broad range of users and partner organizations. The UK government called for the development of a government-wide Appstore, as did the New Zealand Government in a separate cabinet paper on cloud computing in August 2012. The US government has plans to establish a number of cloud services marketplaces, including the GSA’s info.apps.gov and the DOE’s YOURcloud, a secure cloud services brokerage built on Amazon’s EC2 offering (link). The image below shows the initial design for the UK government App store.
    03 UK App Store
  • Making Data publicly available – The UK Government is readily exploiting opportunities to make available the terabytes of public data that can be used to develop useful applications. One example is the recent release of Met Office UK weather information to the public via Microsoft Azure’s cloud hosting platform (link).
  • Government Security Certification – A 2012 Government Cloud Survey conducted by KPMG listed security as the greatest concern for governments when it comes to cloud adoption, and found that governments are taking measures to manage security concerns. For example, the US General Services Administration subjects each successful cloud vendor to a battery of tests that include an assessment of access controls.

01a Canada Mappings

Canadian Government Cloud Architectural Components

Strategic Value

The strategic value of cloud computing in government can be summed up in a number of key elements. We’ve listed a few that appear at the top of our list:

  • Enhancing agility of government – Cited as a significant factor in cloud adoption, cloud computing promises rapid provisioning and elasticity of resources, reducing turnaround times on projects.
  • Supporting government policies for the environment – Reduced data center spending and lower consumption of energy on cooling have tangible environmental benefits in terms of reduced greenhouse gas emissions and potential reductions in allocations of carbon credits.
  • Enhancing Transparency of government – Cloud allows the development of initiatives that can make government records accessible to the public, opening up tremendous opportunities for innovation and advancement.
  • Efficient utilization of resources – By adopting a pay-for-use approach towards computing, stakeholders are encouraged to architect their applications to be more cost effective. This means that unused resources are freed up to the common pool of computing resources.
  • Reduction in spending – Our research indicated that technology decision makers do not consider this element to be a significant aspect of moving to cloud computing. However, some of the numbers being bandied about in terms of cost savings are significant (Billions of dollars) and can appeal to any constituency.

Positive Implications

We’ve listed a number of positive points towards cloud adoption. These may not be relevant in every use case, but worthwhile for a quick read:

  • Resource Pooling – Leads to enhanced efficiency, reduced energy consumption and cost savings from economies of scale
  • Scalability – Unconstrained capacity allows for more agile enterprises that are scalable, flexible and responsive to change
  • Reallocation of human resources – Freed up IT resources can focus on R&D, designing new solutions that are optimized in cloud environments and decoupling applications from existing infrastructures.
  • Cost containment – Cloud computing requires the adoption of a ‘you pay for what you use’ model, which encourages thrift and efficiency. The transfer of CAPEX to OPEX also smooths out cash-flow concerns in an environment of tight budgets.
  • Reduce duplication and encourage re-use – Services designed to meet interoperability standards can be advertised in a cloud marketplace and become building blocks that can be used by different departments to construct applications
  • Availability – Cloud architecture is designed to be independent of the underlying hardware infrastructure and promotes scalability and availability paradigms such as homogeneity and decoupling
  • Resiliency – The failure of one node of a cloud computing environment has no overall effect on information availability

Negative Implications

A sound study should also include a review of the negative implications of cloud computing:

  • Bureaucratic hindrances – When transitioning from legacy systems, data migration and change management can slow down the “on demand” adoption of cloud computing.
  • Cloud Gaps – Applications and services that have specific requirements which are unable to be met by the cloud need to be planned for to ensure that they do not become obsolete.
  • Risks of confidentiality – Isolation has been a long-practiced strategy for securing disparate networks. If you’re not connected to a network, there’s no risk of threats getting in. A common cloud infrastructure runs the risk of exploitation that can be pervasive since all applications and tenants are connected via a common underlying infrastructure.
  • Cost savings do not materialize – The cloud is not a silver bullet for cost savings. We need to develop cloud-aligned approaches towards IT provisioning, operations and management. Applications need to be decoupled and re-architected for the cloud. Common services should be used in order to exploit economies of scale; applications and their underlying systems need to be tweaked and optimized.

05 Cloud Security concerns

Security was cited as a major concern (KPMG)

Where to start?

There is considerable research that indicates government adoption of cloud computing will accelerate in coming years. But to walk the fine line of success, what steps can be taken? We’ve distilled a number of best practices into the following list:

00 USG Roadmap

  1. Develop Roadmaps: Before governments can reap all of the benefits that Cloud Computing has to offer, they must first move along a continuum towards adoption. For that very purpose, a number of governments have developed roadmaps to chart a course of progression towards the cloud. Successful roadmaps featured the following components:
    • A technology vision of Cloud Computing Strategy success
    • Frameworks to support seamless implementation of federated community cloud environments
    • Confidence in Security Capabilities – Demonstration that cloud services can handle the required levels of security across stakeholder constituencies in order to build and establish levels of trust.
    • Harmonization of Security requirements – Differing security standards will impede and obstruct large-scale interoperability and mobility in a multi-tenanted cloud environment, therefore a common overarching security standard must be developed.
    • Management of Cloud outliers – Identify gaps where Cloud cannot provide adequate levels of service or specialization for specific technologies and applications, and identify strategies to deal with these outliers.
    • Definition of unique mission/sector/business Requirements (e.g. 508 compliance, e-discovery, record retention)
    • Development of cloud service metrics such as common units of measurement in order to track consumption across different units of government and allow the incorporation of common metrics into SLAs.
    • Implementation of Audit standards to promote transparency and gain confidence
  2. Create Centers of Excellence: Cloud Computing Reference Architectures, Business Case Templates and Best Practices should be developed so that cloud service vendors can map their offerings to them (e.g. the NIST Reference Architecture), making it easier to compare services.
  3. Cloud First policies: Implement policies mandating that all departments across government consider cloud options first when planning for new IT projects.

Conclusion

The adoption of cloud services holds great promise. However, achieving objectives such as economies of scale requires wide-spread adoption of cloud, and the consequences of that adoption are far reaching. A comprehensive plan, combined with standardization and transparency, therefore becomes an essential element of success.

We hope this brief has been useful. Ook!

Useful Links

Microsoft’s Cloud Computing in Government page
Cisco’s Government Cloud Computing page
Amazon AWS Cloud Computing page
Redhat cloud computing roadmap for government pdf
US Government Cloud Computing Roadmap Vol 1.
Software and Information Industry updates on NIST Roadmap
New Zealand Government Cloud Computing Strategy link
Australian Government Cloud Computing Strategic Direction paper
Canadian Government Cloud Computing Roadmap
UK Government Cloud Strategy Paper
GCN – A portal for Cloud in Government
Study – State of Cloud Computing in the public sector

Technological Transformation in Government

Inauguration Obama

Photo (c) A/P Sandy Huffaker

Foreword

We live in an exciting juncture when the world is undergoing massive and visible transformation. The Internet has given us instant access to information and it has affected how we do things on a global scale. Our children go to school and interact with knowledge in ways that we could have never imagined before; while demand and supply interact within virtual, global marketplaces where consumers are informed and empowered and suppliers are intelligent and efficient. Yet there is no place where the impacts of technology are more visibly felt than in the Public Sector, where technology may be deployed to serve an informed electorate with high expectations, demanding services and efficiency at an ever-accelerating pace.

Brief

In this series of articles, I will explore a number of contemporary issues that Technology decision makers in Government are concerned with and also look into innovative, viable solutions that have been successfully implemented in a number of countries to solve or address these concerns.

  • Cloud Computing – While cloud technology promises to deliver significant cost savings from economies of scale and cut down on deployment costs, cloud has been traditionally shunned by governments for a number of reasons, including security and confidentiality. In recent years, a number of vendors have developed Government Clouds that are designed to integrate with existing Government networks and systems, while meeting government needs for compliance and security.
  • Big Data – Big Data refers to data sets that are so large that they become difficult to manage using traditional tools. With the proliferation of e-government initiatives, governments worldwide face significant challenges in managing vast repositories of information.
  • Open Source and Interoperability – Government’s ability to adopt and enhance open standards that encourage interoperability between different systems and establish an environment of equal opportunities among technology vendors, partners and end-users.
  • Digital Access – The Internet has redefined access to knowledge and learning and it is a priority for governments to ensure that students from all walks of life are not limited in opportunity due to poor access to the web. Here we explore how technology is transforming big cities and communities alike in accessing the web.
  • Mobility and Telecommuting – Governments worldwide are embracing telecommuting and flex-time work policies as a viable long-term solution to reducing costs and energy consumption. We explore technologies that foster collaboration and productivity for a mobile workforce.
  • Cyber Security – With the call for increased vigilance against acts of cyber terrorism, we explore the lengths that governments are prepared to go to in order to maintain Confidentiality, Integrity and Availability amidst an increasingly connected ecosystem of public-sector employees, vendors, contractors and other stakeholders.
  • Open Government – Governments are heeding the call for greater transparency, public participation and collaboration by making information more readily available on government websites and also providing the public with greater access for providing feedback and commentary. This has led to the adoption of new technologies and innovations to ensure that confidentiality is not sacrificed in the light of new policies.
  • Connected Health and Human Services – Case management, health records management and health benefits administration are but a few components of government services that many lives depend on to function effectively and efficiently. We will explore technologies that are transforming these services.
  • Accessibility – In an age of information workers, support for differently abled employees has become a source of competitive advantage, enabling governments to tap into additional segments of the workforce.
  • Defense and Intelligence – Technology has long played a vital role in ensuring that vital battlefield decisions can be made with timely access to information; communications occur unimpeded in times of emergency; and cost efficiencies can be maximized in times of tightening budgets.

Dimensions of Exploration

As in any well-thought-out study, we must consider important attributes such as the long-term implications, return on investment and practicality of implementation. Therefore, for each of the issues listed above, we will include in our analysis the following components:

  • Executive Brief
  • Latest Trends
  • Strategic Value
  • Positive Implications
  • Negative Implications
  • Proposed Solutions
  • Reference Implementations
  • Useful Links

Topics

An individual article has been dedicated to each of the following topics; please click on each one for further reading:

  • Cloud Computing
  • Big Data
  • Open Source and Interoperability
  • Digital Access
  • Mobility and Telecommuting
  • Cyber Security
  • Open Government
  • Connected Health and Human Services
  • Accessibility
  • Defense and Intelligence

* This series is a work in progress, and does not support a particular thesis or ideal. It simply reflects research of the solutions that have been devised to solve frequently unique problems and does not reflect an endorsement of a particular technology or ideal.

Why write about Government?

I’ve spent a significant amount of time consulting for government and in truth, nothing has given me greater pleasure than to see the benefits of technology impact my selfless friends and colleagues who have made the altruistic decision to stay in government in order to serve the greater good. These unsung heroes maintain the systems that support our health, education, defense, civil, social and legal infrastructure and many other essential functions of government, which many lives may depend on.

Web Analytics Primer: 1,2,3 Analyze!

Introduction:

Since the beginning of the Internet, when people started visiting websites and (hopefully) buying stuff online, businesses have wanted to know exactly what people were doing in their virtual web storefronts. From the humble page view counter, to cookies and tracking tools, the industry of web analytics was born.

Today, Web Analytics is a well-established tool for obtaining knowledge and insights about the visitors to our websites by tracking their behavior as they click through from page to page. The demand for Web Analytics is growing, with an estimated current value of US $600 million and growing in the double digits every year according to a number of online sources (see 2009 study by Forrester Research). Web Analytics comes in a wide variety of solutions, ranging from simple modules that plug into full-blown Business Intelligence suites, to Cloud-based service-oriented analytics tools.

Your humble Roadchimp was recently given the task of advising a close friend and entrepreneur on devising a Web Analytics solution for a growing online business. And so I decided it would be an excellent opportunity to develop a simple primer for our lovely primate readers.

Further Reading:

Rather than going into too much detail about Web Analytics, I’m providing a quick link to the Wikipedia page that contains a great overview of the topic. You’re one quick click away from obtaining some basic definitions of commonly used terms as well as links to some of the more popular Web Analytics platforms out in the market.

Scenario:

SwingFromLimb (a.k.a. Swing) is a rapidly growing online social tool that helps active-minded chimps to find sports facilities closer to them. Swing’s customers come from all walks of life and predominantly are primates who are interested in sports activities of all types, from Tree Canopy Judo to Prehensile Yoga and Banana Kickboxing. Swing connects activity enthusiasts to sports facility owners by listing thousands of classes online that users sign up for via the Swing website or App. As part of its enhanced service offering, Swing offers a club management application to facility managers to help them market their classes to online users.

Goals:

Swing wants to analyze the online behavior of its users in order to identify useful information, such as the types of classes that are more popular in a particular part of the jungle, the times of day that chimps book their classes online as well as seasonal variations in class attendance. Seasonal variation is especially pronounced in the New Year, when holy Saint Chimpolous rides up from the tropics in his banana sled and slides down our chimneys to deliver overripe fruit while consuming all of our household cleaners, and the ensuing hordes of soft-bellied and guilt-ridden chimps make a beeline for their nearest gym after the holidays. (yes… serious readers I wrote a funny)

Details that the Swing management would like to obtain about their customers are listed below:

  • Methods of Accessing Content: The types of web browser, screen resolution, language and plug-in support (Java, Flash etc) used to access the site or App.
  • Location of Visitors: Using IP Address filtering and location awareness, we can determine which geographical locations users are connecting to the website from as well as which mobile service providers they may be using.
  • In-Site Behavior: Which pages users click to most frequently; how long users tend to dwell on a page before clicking to the next page and also what pages users visit last before leaving the site.
  • Access Patterns: Access behavior for each user based on criteria such as the time of day; geographical location (home or office) and days of the week.
  • Web Referrals: Which search engines, blogs or websites users are referred from when entering the website.

Google Analytics:

Our task is to develop a prototype Analytics solution for Swing to allow management to get an idea of what a common Web Analytics solution can offer. After this evaluation phase, we can provide a recommendation on the most cost effective and scalable solution for the company. Looking at the wide range of tools out there, we’ve decided to start with Google Analytics.

This is a free-to-use service provided by Google and at last count is the analytics platform of choice on over 17 million web sites. Google Analytics was originally developed by the Urchin Software Corporation which was acquired by Google in 2005. The product is delivered as a freemium service and features integration with Google AdWords and requires a number of cookies to be deployed on the end user’s computer.

Implementing Google Analytics

To successfully implement a web analytics platform, the company recommends a simple 3-step process on the Google Analytics website as follows:

  1. Sign up for Google Analytics
  2. Add tracking code
  3. Learn about your audience

01 Signup

We shall cover these steps in detail below:

1. Sign up for Google Analytics

The first step involves signing in with a Google account (or creating one if you don’t have a Google account already) and putting in some of your website’s details. One interesting note is that you can choose whether to anonymize your data and share it with other users, and whether to enable integration with other Google services.
03a Sharing Example

Prior to signing up for the service, you have to agree to the Google Analytics Terms of Service, which among other things, declares that this is a free service for up to 10 million hits.

2. Add Tracking Code

Once you have completed the initial setup, you are directed to the Google Analytics dashboard, where you can perform some basic configuration and obtain the unique tracking code generated for your website. This code needs to be inserted into the HTML code of your website.

05 Dashboard

The tracking code is basically a string of text and looks like this:

06 Tracking Code

The tracking code can be implemented in three different ways

  • Static Implementation: The HTML code is copied and inserted into the header of each HTML page.
  • PHP implementation: You create a file called analyticstracking.php that contains the tracking code block. This page is included in each PHP template page
  • Dynamic Content Implementation: In order to implement Google Analytics, we can follow the basic process listed above, and reference our code block through an include or template reference when each page is called by a browser.
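The dynamic approach in the list above can be sketched roughly as follows: the tracking snippet is kept in one place and spliced into each page just before the closing </head> tag as the page is served. This is an illustration in Python rather than PHP, and it is only a sketch. The function name add_tracking, the account ID UA-12345-1 and the sample page are assumptions, and the embedded snippet is the classic asynchronous ga.js snippet with a placeholder ID, so the code generated for your own account will differ.

```python
# Classic asynchronous ga.js snippet. UA-XXXXX-X is a placeholder account ID.
TRACKING_SNIPPET = """<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
"""


def add_tracking(html: str, account_id: str) -> str:
    """Insert the tracking snippet before </head> (hypothetical helper)."""
    if "google-analytics.com/ga.js" in html:
        return html  # page is already tagged, so do not tag it twice
    head_end = html.lower().find("</head>")
    if head_end == -1:
        return html  # no head section found, leave the page untouched
    snippet = TRACKING_SNIPPET.replace("UA-XXXXX-X", account_id)
    return html[:head_end] + snippet + html[head_end:]


# Hypothetical page and account ID, for illustration only.
page = "<html><head><title>Swing</title></head><body>Classes</body></html>"
tagged = add_tracking(page, "UA-12345-1")
print(tagged)
```

Keeping the snippet in a single constant means a change of account ID only has to be made once, which is the same benefit the PHP include file provides.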

That’s the hardest part of the work done, since there are no specific firewall requirements to configure on your servers: it is your visitors’ end devices that connect to Google’s servers.

The following web URLs are used to download the tracking script and to pass page statistics to Google’s servers.

https://ssl.google-analytics.com/__utm.gif
http://www.google-analytics.com/__utm.gif
http://www.google-analytics.com/ga.js
https://ssl.google-analytics.com/ga.js

3. Learn about your audience

This next step is all about implementation and once the tracking code is configured, we can start to analyze data about our users. A quick login to the dashboard shows us interesting data about users and we have the ability to create customizable reports of a multitude of things including the amount of time that Android users versus iPhone users spent on the site.

Screen Shot 2013-02-07 at 1.28.26 PM

So how exactly does Google Analytics work? Once we’ve configured Swing’s web pages to serve up the tracking code, Google Analytics’ servers will start collecting data about users visiting our site. Let’s explore this mechanism briefly:

Google uses three sources to obtain information about visitors to our sites:

  • HTTP requests: A common HTTP request from a visitor’s browser typically contains information about the type of browser, the referrer (which site the user is coming from) and also the language.
  • Browser/system information: The Document Object Model (DOM) is a way of organizing pages via a tree structure that was developed and standardized by the World Wide Web Consortium (W3C) and is supported by most browsers. Using the DOM, the tracking code running in the visitor’s browser is able to extract detailed browser information such as Flash and Java support, screen resolution and keyboard support, and pass it on to Google’s servers.
  • Cookies: Google uses First Party Cookies which contain small amounts of information about the individual user’s current session on a website and can be passed from one page to another as the user browses a website.
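To make the first of these sources concrete, here is a minimal sketch of the kind of information that can be read straight from the headers of an HTTP request. It is not how Google processes requests; the function name describe_request, the returned field names and the sample header values are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def describe_request(headers: dict) -> dict:
    """Pull out fields an analytics tool might read from request headers (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Accept-Language lists preferences such as "en-US,en;q=0.8"; keep the first one.
    languages = h.get("accept-language", "")
    primary_language = languages.split(",")[0].split(";")[0].strip() or None

    # The header really is spelled "Referer" in HTTP.
    referrer = h.get("referer")
    referrer_host = urlparse(referrer).netloc if referrer else None

    user_agent = h.get("user-agent", "")
    return {
        "user_agent": user_agent,
        "is_android": "Android" in user_agent,  # crude check, for illustration only
        "referrer_host": referrer_host,
        "language": primary_language,
    }


# Sample headers, invented for this example.
sample = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) "
                  "AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Safari/535.19",
    "Referer": "http://www.google.com/search?q=prehensile+yoga",
    "Accept-Language": "en-US,en;q=0.8",
}
info = describe_request(sample)
print(info["referrer_host"], info["language"], info["is_android"])
```

Details such as screen resolution or plug-in support are not present in the request headers, which is why the second source, the browser itself, is needed.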

The tracking code works according to the following steps: (Taken from the Google Analytics site for developers)

  1. A browser requests a web page that contains the tracking code.
  2. A JavaScript Array named _gaq is created and tracking commands are pushed onto the array.
  3. <script> element is created and enabled for asynchronous loading (loading in the background).
  4. The ga.js tracking code is fetched, with the appropriate protocol automatically detected. Once the code is fetched and loaded, the commands on the _gaq array are executed and the array is transformed into a tracking object. Subsequent tracking calls are made directly to Google Analytics.
  5. Loads the script element to the DOM.
  6. After the tracking code collects data, the GIF request is sent to the Analytics database for logging and post-processing.

In a nutshell, the JavaScript that is pasted into the HTML pages on your website instructs the visitor’s browser to connect to Google Analytics’ servers and download some additional code. The code is stored in the browser’s DOM tree and continues to push information to Google’s Analytics server whenever a user performs an action on their browser while viewing your site.

A variety of visitor actions can be tracked and I’ve included a list below to give you an idea of what’s possible:

  • Loading a page
  • An Event such as
    • Playing a video
    • Downloading a file
    • Clicking on a button
    • Hovering the mouse on the screen
  • An e-commerce transaction on your website
    • Adding an item to a shopping cart
    • Transaction details such as Transaction ID
  • Customized parameters
    • A member logs in with special privileges

By collecting each of these events, we can start to understand fairly accurately what a user does on each page visit. For example, a user loads the website from her Android-powered phone at 4pm on a Tuesday afternoon, while sitting in a busy commercial part of the city. After scrolling down the main page for 2 minutes, she searches for a particular gym that she visited 2 months ago for available classes that day. She quickly finds a class and signs up online, paying for the class using her stored account information.

Making Sense of the data

So what do we do with this information? Well, for an individual user, not much really, as Web Analytics packages do not present personal information about users visiting the site. On the other hand, if we have hundreds or thousands of users connecting to a site over time, we can start analyzing the data to identify trends that may be useful to the business. For example, if we’re seeing an upward trend in the number of visitors searching for a particular product or service in a given area, we might be able to find similar products and services to cater to that demand. In the case of Swing’s management, this might mean advising facility owners in that region to offer more of a type of product or service. It is no wonder that over 30 million websites on the Internet use some form of Web Analytics.
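As a small illustration of this kind of trend analysis, the sketch below counts bookings by region and class type, and by hour of day. The events are invented for the example and the field layout is an assumption; a real analytics export would look different.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, anonymised booking events: (timestamp, region, class type).
events = [
    ("2013-02-05 16:02", "Canopy East", "Prehensile Yoga"),
    ("2013-02-05 18:30", "Canopy East", "Prehensile Yoga"),
    ("2013-02-06 07:15", "River Bend", "Banana Kickboxing"),
    ("2013-02-06 16:45", "Canopy East", "Tree Canopy Judo"),
    ("2013-02-07 16:10", "River Bend", "Prehensile Yoga"),
]

# Which class type is most popular in which part of the jungle?
by_region_and_class = Counter((region, cls) for _, region, cls in events)

# At what hour of the day do chimps tend to book?
by_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts, _, _ in events
)

top_pair, top_count = by_region_and_class.most_common(1)[0]
busiest_hour, hour_count = by_hour.most_common(1)[0]
print(top_pair, top_count)
print(busiest_hour, hour_count)
```

Note that nothing in the events identifies an individual chimp; the value comes from the aggregate counts.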

Conclusion:

In this article, we went over a simple implementation of a Web Analytics tool and also explored the basic mechanics of Google Analytics. In future posts in this series, we will look into performing some customization and also collecting best practices and recommendations for users interested in implementing a Web Analytics tool.

References:

http://antezeta.com/news/what-google-knows-about-google
http://searchenginewatch.com/article/2239469/How-to-Use-Google-Analytics-Advanced-Segments
http://techcrunch.com/2013/01/28/google-makes-using-analytics-easier-with-new-solution-gallery-for-dashboards-segments-custom-reports/
http://analytics.blogspot.co.uk/
http://blog.kissmetrics.com/50-resources-for-getting-the-most-out-of-google-analytics/
http://www.grokdotcom.com/2009/02/16/the-missing-google-analytics-manual/