Technology in Government – Big Data

Executive Brief

In this article, we continue our series on technology in government by reviewing Big Data. We look at the impact of Big Data on government and at the technologies commonly used to manage it. First of all, let’s look at some basic definitions and define the scope of this article.

What is big data?

While Roger Magoulas of O’Reilly Media is most commonly credited with coining the term “Big Data” back in 2005 and launching it into the mainstream, the term had been floating around for a number of years (researchers have found examples dating from the mid-1990s in Silicon Graphics (SGI) slide decks). Either way, Big Data refers to data sets so large that their size becomes an encumbrance when trying to manage and process them with traditional data management tools.

According to IBM, we create 2.5 quintillion bytes of data each day. Big Data is commonly described by three characteristics:

  • Volume: Big Data refers to large amounts of data generated across a variety of applications and industries. At the time of writing, data sets ranging from hundreds of gigabytes to terabytes and petabytes could easily qualify under the definition.
  • Variety: With a wide and disparate number of sources of Big Data, the data can be structured (like a database), semi-structured (indexed) or unstructured.
  • Velocity: The data is generated at high speeds, and needs to be processed in relatively short durations (seconds).

Why is big data important?

Big data marks an important shift in how we interpret data to find meaningful patterns in the world. The advent of social networking and e-commerce brought about a need for suppliers of increasingly undifferentiated online services to learn about the behavior of online users in order to tailor a superior user experience. Some of the most successful companies in the world (hint: starts with the letter ‘G’) have based their entire business models on delivering customized ads to users based on their search queries. Prominent research projects such as SETI (the Search for Extraterrestrial Intelligence), NASA’s Mars Rover projects and the Human Genome sequencing program had similar needs:

The ability to perform lightning speed computational processes on extremely large sets of data that were also subject to frequent changes.

The challenges of traditional data management tools

The problem with conventional approaches to managing data was that the data primarily had to be structured. Picture a database that supports the catalog of a conventional e-commerce website and holds hundreds of thousands of items. The database is structured and relational, meaning that each item put up for sale on the site is stored as a record described by a fixed set of attributes, including the name of the item, its SKU number, category, price, description, etc. For each item that we load into the database, we can perform searches by product category and description and even sort the products by price. This is efficient, because almost every record in the database has the same types of attributes. Relational database systems such as Oracle and other SQL-based products are great at handling this and are still very much in use today.

The problem we encounter when it comes to handling Big Data is that the data is subject to frequent change. With a Relational system, we need to define a structure or schema ahead of time. That’s not a big problem with an Online Shopping Cart database, since most items have the same attributes as described above. But what if we don’t know the types of attributes of the data we’re planning to store? Let’s imagine that we have a service that crawls the Web for Real Estate websites in a particular region. The objective is to build up an aggregated repository of information about properties for sale or rent that users can query.  Very frequently, the data that is being collected can be in a variety of sizes and types. For example, we could have HTML files, media files (JPEGs and MPEGs) or even strings of characters. In some cases it may be impossible to build a structure ahead of time, because we simply don’t know what’s out there.

So what happens each time we need to change the structure of a relational database? Rolling out schema changes is a potentially complex, time- and resource-intensive process, and it has a definite performance impact on the database while the change is applied. Conventional solutions such as adding more computing resources or splitting the database into shards are feasible, but do not fundamentally change how the data is being managed.
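The schema problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the crawler records, the field names and the `FIXED_COLUMNS` schema are all hypothetical, and plain dictionaries stand in for a schema-free store.

```python
# Hypothetical crawler output: every URL, field name and value is made up.
listings = [
    {"url": "https://example.com/a", "type": "html", "price": 250000, "bedrooms": 3},
    {"url": "https://example.com/b", "type": "jpeg", "size_bytes": 48213},
    {"url": "https://example.com/c", "type": "text", "body": "2-bed flat for rent"},
]

# A relational table needs every column declared ahead of time (assumed schema).
FIXED_COLUMNS = ("url", "type", "price", "bedrooms")


def fields_outside_schema(records, columns):
    """Return the field names that the fixed schema has no column for."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    return sorted(seen - set(columns))


def query(records, **criteria):
    """Schema-free lookup: match records on whichever fields they carry."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Each field listed here would force a schema change in a relational design.
missing = fields_outside_schema(listings, FIXED_COLUMNS)
html_pages = query(listings, type="html")
```

Every name in `missing` represents a schema change in the relational case, whereas the dictionary-based records simply carry whatever fields the crawler found.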

Solution: Big Data Technology

In the previous section, we explored the need for corporations and organizations to manage increasingly large amounts of data, as well as the limits of existing database management systems in dealing with these large data sets. In this section, we will briefly cover the most commonly deployed solutions in the industry for Big Data management.

Hadoop: Some industry executives have likened Hadoop to the brand “Kleenex”, in the sense that the name has become synonymous with Big Data. Hadoop was largely developed at Yahoo and named after the toy elephant of a researcher’s son. Hadoop’s mechanism and components are described briefly below:

  1. Distributed Cluster Architecture: Hadoop comprises a collection of nodes (Master + Workers). The Master node is responsible for assigning and coordinating tasks via a JobTracker role. Hadoop has two basic layers, usually complemented by additional components:
    1. The HDFS layer: The Hadoop Distributed File System maintains consistency of data distributed across a large number of data nodes. Large files are split into blocks that are distributed across the cluster and tracked by a metadata server known as the primary NameNode. Each DataNode serves up blocks over the network using a block protocol specific to HDFS. HDFS provides a number of high availability features, including replication and rebalancing of data across nodes. A major advantage of HDFS is location awareness: computational tasks are scheduled on nodes that are close to the data they need, thereby reducing network traffic.
    2. The MapReduce layer: The processing logic of MapReduce consists of the Map function and the Reduce function. The Map function applies a transformation to each input record and emits key-value pairs (e.g. (result, 1)). The Reduce function then combines all the values emitted for the same key into a single result, such as a total count.
    3. Additional Components: Hadoop is commonly implemented with a number of additional services. We’re listing the most common components here:
      1. Pig: Pig is a scripting language for creating MapReduce queries.
      2. Hive: Hive is a data query infrastructure that offers a SQL-like query language over data stored in Hadoop.
      3. Sqoop: Sqoop is a relational database connector that moves data between Hadoop and relational databases, allowing connectivity into a company’s Business Intelligence layer.
      4. Scheduling: Scheduling tools such as Facebook’s Fair Scheduler and Yahoo’s Capacity Scheduler allow users to prioritize jobs and implement some degree of Quality of Service.
      5. Other tools: A number of other tools are available for managing Hadoop, including HCatalog, a table management service for access to Hadoop data, and the Ambari monitoring and management console.
  2. Batch Processing: Hadoop fundamentally uses a batch processing system to manage data. Processing is typically divided up into the following steps:
    1. Data is divvied up into small units and distributed across a cluster
    2. Each data node receives a subset of data and applies map and reduce functions to locally stored data/cloud storage
    3. JobTracker coordinates jobs across the cluster
    4. Data may be processed in a workflow where outputs of one map/reduce pair become inputs for the next
    5. Data results may be applied to additional analysis/ reporting or BI tools
  3. Hadoop Distributions: Hadoop is an open-source project of the Apache Software Foundation and has very recently (circa October 2012) been released by Microsoft as Microsoft HDInsight Server for Windows and the Windows Azure HDInsight Service for the cloud. Other large-vendor support for Hadoop includes the Oracle Big Data Appliance, which integrates Cloudera’s distribution of Apache Hadoop; Amazon’s Elastic MapReduce service on AWS; and Google’s AppEngine-MapReduce on Google App Engine.
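The Map, Reduce and batch processing steps described above can be sketched as a word count, the customary introductory MapReduce example. This is a single-process simulation in plain Python, not Hadoop’s API: the function names, the round-robin chunking and the `n_nodes` parameter are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(lines, n_nodes):
    """Divide the input across n_nodes simulated worker nodes."""
    return [lines[i::n_nodes] for i in range(n_nodes)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in this node's chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key so that each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(lines, n_nodes=3):
    """Run the split, map, shuffle and reduce steps in sequence."""
    chunks = split_into_chunks(lines, n_nodes)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data in government", "big data big plans"])
```

In a real cluster, each chunk would live on a different data node and the map calls would run in parallel where the data is stored; here they run one after another in a single process.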

Latest Trends in Government

Now that we’ve covered some basics on Big Data, we are ready to explore common implementations in the government sector around the world. Large governments have led the charge on Big Data implementations, with in excess of 160 large Big Data programmes being pursued by the US Government alone.

  • Search Engine Analytics: A pressing need to search vast amounts of data made publicly available by recent policy changes has seen a great practical application for Hadoop and Hive. For example, the UK government uses Hadoop to pre-populate relevant and possible search terms when a user types into a search box.
  • Digitization Programs: The cost implications of ‘going digital’ are large, and regulators are taking notice, with some estimates that online transactions can be 20 times cheaper than by phone, 30 times cheaper than face-to-face, and up to 50 times cheaper than by post (link). For example, the UK government stated in its November 2012 Government Digital Strategy that it can save up to £1.2 billion by the year 2015 just by making public services digital by default. A number of large government bodies have been tasked with identifying high-volume transactions (>100,000 a year) that can be digitized. Successful digitization requires a number of key measures:
    • Non-exclusive policies: Bodies or groups that do not have the capabilities to go digital must not be penalized. This means that the choice to go digital should be open. Users who are not familiar with accessing digital information should also be given alternative mechanisms such as contact centers.
    • Consolidation of processes: A number of governments are moving closer towards a single consolidated online presence. For example, the U.K. government is consolidating all publishing activities across all 24 UK central government websites to the GOV.UK website. Consolidating information without incurring any performance penalties requires standardization on common platforms and technologies.
  • Large Agency initiatives: The largest agencies and ministries are spearheading programs on Big Data, with applications in Health, Defense, Energy and Meteorology attracting significant interest:
    • Health Services: The US Centers for Medicare & Medicaid Services (CMS) is developing a data warehouse based on Hadoop to support the analytic and reporting requirements of the Medicare and Medicaid programs. The US National Institutes of Health (NIH) is developing the Cancer Imaging Archive, an image data-sharing service that leverages imaging technology used in assessing therapeutic responses to treatment.
    • Defense: The US Department of Defense listed 9 major projects in a March 2012 White House paper on the adoption of Big Data analysis across the government. Major applications involved Artificial Intelligence, Machine Learning, Image and Video recognition and Anomaly detection.
    • Energy: The US Department of Energy is investing in research on its Next Generation Networking program to move large datasets (>1 petabyte per month) for the Open Science Grid, ESG and Biology communities.
    • Meteorology: The US National Weather Service uses Big Data in its modeling systems to improve tornado forecasting. Modern weather prediction systems utilize vast amounts of data collected from ground sources and from a geostationary satellite planned for launch in 2014. As weather conditions are constantly changing, rapid processing of high-velocity data is paramount to these systems.

Strategic Value

Big Data is transformative in the sense that it provides us with an opportunity to perform deep, meaningful analysis of information beyond what is normally available. The idea is that with more information at our fingertips, we can make better decisions.

Positive Implications

  • Greater Transparency: Big Data offers an opportunity to widen access to data by making it more readily available to larger constituencies of people.
  • More opportunities for enhancing performance: By providing users with access to not only greater amounts of data, but also greater varieties of data, we create more opportunities to identify patterns and trends by connecting information from more sources, leading us to capitalize on opportunities and expose threats. This results in an overall enhanced quality of decision making that could potentially lead to greater performance.
  • Better Decisions: By allowing systems to collect more data and then applying Big Data analysis techniques to draw meaningful information from these data sets, we can make better, more timely and informed decisions.
  • Greater segmentation of stakeholders: By exposing our analytics to greater pools of raw data, we can find interesting ways to segment our constituents, identifying unique patterns at a more granular level and devising solutions and services to meet these needs. For example, we can use Big Data to identify elderly residents in a particular part of a city who live alone and have a medical condition requiring specialist care, and use this information to manage staffing and service availability for these users.
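The segmentation example above amounts to filtering records on several attributes at once and then counting by area. Here is a minimal sketch, assuming a hypothetical `Resident` record, made-up sample data and an age threshold of 65 chosen purely for illustration:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Resident:
    """Hypothetical record; none of these fields come from a real dataset."""
    district: str
    age: int
    lives_alone: bool
    needs_specialist_care: bool


def segment(residents, min_age=65):
    """Count residents per district who meet all three criteria."""
    return Counter(
        r.district
        for r in residents
        if r.age >= min_age and r.lives_alone and r.needs_specialist_care
    )


# Made-up sample data for the sketch.
residents = [
    Resident("North", 82, True, True),
    Resident("North", 71, True, False),
    Resident("North", 77, True, True),
    Resident("South", 68, True, True),
    Resident("South", 45, True, True),
]

# Districts with higher counts would need more specialist staffing.
demand = segment(residents)
```

At Big Data scale the same filter and count would run as a distributed job over many sources rather than over an in-memory list, but the logic is the same.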

Negative Implications

  • Big Brother: Governments are sensitive to the perception that they use data to investigate and monitor individuals, and the storage and analysis of data by government has long drawn a strong public reaction. However, the enactment of information transparency legislation and freedom of information policies, together with the formation of public watchdog sites, has led to an encouraging environment for governments to pursue Big Data.
  • Implementation Hurdles: Implementing Big Data requires a holistic effort beyond adopting a new technology. Every task, from identifying data that can be effectively combined and analyzed to securely handling that data over its lifetime, must be carefully managed.

Where to Start?

We’ve distilled a number of important lessons from around the web that could guide your Big Data implementation:

  • Focus first on requirements: Decision makers are encouraged to look for the low-hanging fruit, in other words, situations that have a pressing need for Big Data solutions. Big Data is not a silver bullet, and target implementations should be evaluated thoroughly.
  • Start small: Care should be taken to manage stakeholder expectations before Big Data takes on the image of a large disruptive technology in the workplace. Focusing on small pilot projects that show tangible and visible benefits is the best way to go and often paves the way for much larger projects down the line. Often, extending the pipeline for Big Data projects allows technology stakeholders time to get over the learning curve of adoption.
  • Reuse infrastructure: Big Data technologies can run on conventional infrastructure. In fact, Big Data implementations can happily coexist with Relational Database Systems in existing IT environments.
  • Obtain high-level support: Big Data sees the greatest benefits in terms of performance and cost savings when combining different systems. But with this type of endeavor comes greater complexity and risks from differing priorities. Managing this challenge requires the appointment of senior stakeholders who can align priorities and provide the necessary visibility for forward movement.
  • Push for standardization and educate decision makers: The Policy Exchange, a UK think tank, recommends that “… public sector leaders and policymakers are literate in the scientific method and confident combining big data with sound judgment.”
  • Address Ethical Issues first: A major obstacle to adopting Big Data is the pressure from groups of individuals who do not wish to be tracked, monitored or singled out. Governments should tackle this issue head on by developing a code for responsible analytics.

Useful Links

Article: InformationWeek on Microsoft’s Big Data strategy
Paper: UPenn research paper on the development of Big Data
Report: Research Trends on the evolution of Big Data as a research topic
Whitepapers: Cloudera on government implementations
Article: Big Data’s success in government
Article: UK govt. in talks to use Hadoop
Paper: UK Government Digital Strategy
Paper: US Federal Government Big Data Strategy
Article: Big data in government
Article: National Weather Service using Big Data
Research: McKinsey Global Institute paper on Big Data
Report: Policy Exchange report on Big Data

 

Technology in Government – Cloud Computing

Executive Brief

A number of governments have implemented roadmaps and strategies that ultimately require their ministries, departments and agencies to default to Cloud computing solutions first when evaluating IT implementations. In this article, we evaluate the adoption of cloud computing in government and discuss some of the positive and negative implications of moving government IT onto the cloud.

Latest Trends

In this section, we look at a number of cloud initiatives that have been gaining traction in the public sector:

  • Office Productivity Services – The New Zealand Government has identified office productivity services as the first set of cloud-based services to be deployed across government agencies. Considered to be low hanging fruit and fueled by successes in migrating perimeter services like anti-spam onto the cloud, many organizations see email and collaboration as a natural next step of cloud adoption. Vendors leading the charge include Microsoft’s Office 365 for Government, with successful deployments including Federal Agencies like the USDA, Veterans Affairs, FAA and the EPA as well as the Cities of Chicago, New York and Shanghai. Other vendor solutions include Google Apps for Government which supports the US Department of the Interior.
  • Government Cloud Marketplaces – A number of governments have signaled the need to establish cloud marketplaces, where a federated set of cloud service providers can support a broad range of users and partner organizations. The UK government called for the development of a government-wide Appstore, as did the New Zealand Government in a separate cabinet paper on cloud computing in August 2012. The US government has plans to establish a number of cloud services marketplaces, including the GSA’s info.apps.gov and the DOE’s YOURcloud, a secure cloud services brokerage built on Amazon’s EC2 offering (link). The image below shows the initial design for the UK government App store.
    [Figure: UK App Store]
  • Making Data publicly available – The UK Government is readily exploiting opportunities to make available the terabytes of public data that can be used to develop useful applications. A recent example is the release of Met Office UK weather information to the public via Microsoft Azure’s cloud hosting platform (link).
  • Government Security Certification – A 2012 Government Cloud Survey conducted by KPMG listed security as the greatest concern for governments when it comes to cloud adoption, and governments are taking measures to manage these concerns. For example, the US General Services Administration subjects each successful cloud vendor to a battery of tests that includes an assessment of access controls.

[Figure: Canadian Government Cloud Architectural Components]

Strategic Value

The strategic value of cloud computing can be summed up into a number of key elements in government. We’ve listed a few that appear on the top of our list:

  • Enhancing agility of government – Cited as a significant factor in cloud adoption, cloud computing promises rapid provisioning and elasticity of resources, reducing turnaround times on projects.
  • Supporting government policies for the environment – Reduced data center spending and lower consumption of energy on cooling bring tangible environmental benefits in terms of reduced greenhouse gas emissions and potential reductions in allocations of carbon credits.
  • Enhancing Transparency of government – Cloud allows the development of initiatives that can make government records accessible to the public, opening up tremendous opportunities for innovation and advancement.
  • Efficient utilization of resources – By adopting a pay-for-use approach towards computing, stakeholders are encouraged to architect their applications to be more cost effective. This means that unused resources are freed up to the common pool of computing resources.
  • Reduction in spending – Our research indicated that technology decision makers do not consider this a significant factor in moving to cloud computing. However, some of the cost-saving figures being bandied about are significant (billions of dollars) and can appeal to any constituency.

Positive Implications

We’ve listed a number of positive points towards cloud adoption. These may not be relevant in every use case, but worthwhile for a quick read:

  • Resource Pooling – leads to enhanced efficiency, reduced energy consumption and cost savings from economies of scale
  • Scalability – Unconstrained capacity allows for more agile enterprises that are scalable, flexible and responsive to change
  • Reallocation of human resources – Freed up IT resources can focus on R&D, designing new solutions that are optimized in cloud environments and decoupling applications from existing infrastructures.
  • Cost containment – Cloud computing requires the adoption of a ‘you pay for what you use’ model, which encourages thrift and efficiency. The transfer of CAPEX to OPEX also smooths out cash-flow concerns in an environment of tight budgets.
  • Reduce duplication and encourage re-use – Services designed to meet interoperability standards can be advertised in a cloud marketplace and become building blocks that can be used by different departments to construct applications
  • Availability – Cloud architecture is designed to be independent of the underlying hardware infrastructure and promotes scalability and availability paradigms such as homogeneity and decoupling
  • Resiliency – In a well-designed cloud computing environment, the failure of one node should have little overall effect on information availability
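The ‘you pay for what you use’ point above can be illustrated with simple arithmetic. The sketch below compares capacity sized for peak demand with capacity billed per hour of actual use; the demand profile and both prices are hypothetical numbers chosen only to show the mechanism, not real vendor rates.

```python
def fixed_capacity_cost(hourly_demand, cost_per_server_hour):
    """Owned capacity is sized for peak demand and paid for every hour."""
    return max(hourly_demand) * len(hourly_demand) * cost_per_server_hour


def pay_per_use_cost(hourly_demand, cost_per_server_hour):
    """Elastic capacity is billed only for the servers used in each hour."""
    return sum(hourly_demand) * cost_per_server_hour


# Hypothetical demand: servers needed in each of 8 hours, with one spike.
demand = [2, 2, 3, 10, 4, 2, 2, 1]

# Assume the cloud rate per server-hour is 50% higher than the owned rate.
fixed = fixed_capacity_cost(demand, cost_per_server_hour=1.0)
elastic = pay_per_use_cost(demand, cost_per_server_hour=1.5)
```

With this spiky profile the elastic option is cheaper despite the higher unit price. With a flat demand profile the same functions favor fixed capacity, which is consistent with the caution in the next section that cost savings do not always materialize.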

Negative Implications

A sound study should also include a review of the negative implications of cloud computing:

  • Bureaucratic hindrances – when transitioning from legacy systems, data migration and change management can slow down the “on demand” adoption of cloud computing.
  • Cloud Gaps – Applications and services that have specific requirements which are unable to be met by the cloud need to be planned for to ensure that they do not become obsolete.
  • Risks of confidentiality – Isolation has been a long-practiced strategy for securing disparate networks. If you’re not connected to a network, there’s no risk of threats getting in. A common cloud infrastructure runs the risk of exploitation that can be pervasive since all applications and tenants are connected via a common underlying infrastructure.
  • Cost savings do not materialize – The cloud is not a silver bullet for cost savings. We need to develop cloud-aligned approaches towards IT provisioning, operations and management. Applications need to be decoupled and re-architected for the cloud. Common services should be used in order to exploit economies of scale; applications and their underlying systems need to be tweaked and optimized.

[Figure: Security was cited as a major concern (KPMG)]

Where to start?

There is considerable research that indicates government adoption of cloud computing will accelerate in coming years. But to walk the fine line of success, what steps can be taken? We’ve distilled a number of best practices into the following list:

[Figure: USG Roadmap]

  1. Develop Roadmaps:  Before Cloud Computing can reap all of the benefits that it has to offer, governments must first move along a continuum towards adoption. For that very purpose, a number of governments have developed roadmaps to aid in developing a course of progression towards the cloud. Successful roadmaps featured the following components:
    • A technology vision of Cloud Computing Strategy success
    • Frameworks to support seamless implementation of federated community cloud environments
    • Confidence in Security Capabilities – Demonstration that cloud services can handle the required levels of security across stakeholder constituencies in order to build and establish levels of trust.
    • Harmonization of Security requirements – Differing security standards will impede and obstruct large-scale interoperability and mobility in a multi-tenanted cloud environment, therefore a common overarching security standard must be developed.
    • Management of Cloud outliers – Identify gaps where Cloud cannot provide adequate levels of service or specialization for specific technologies and application and identify strategies to deal with these outliers.
    • Definition of unique mission/sector/business Requirements (e.g. 508 compliance, e-discovery, record retention)
    • Development of cloud service metrics such as common units of measurement in order to track consumption across different units of government and allow the incorporation of common metrics into SLAs.
    • Implementation of Audit standards to promote transparency and gain confidence
  2. Create Centers of Excellence: Cloud computing reference architectures, business case templates and best practices should be developed so that cloud service vendors can map their offerings to them (e.g. the NIST Reference Architecture), making it easier to compare services.
  3. Cloud First policies: Implement policies that require all departments across government to consider cloud options first when planning new IT projects.

Conclusion

The adoption of cloud services holds great promise. However, because objectives such as economies of scale depend on widespread adoption with far-reaching consequences, a comprehensive plan combined with standardization and transparency becomes essential to success.

We hope this brief has been useful. Ook!

Useful Links

Microsoft’s Cloud Computing in Government page
Cisco’s Government Cloud Computing page
Amazon AWS Cloud Computing page
Redhat cloud computing roadmap for government pdf
US Government Cloud Computing Roadmap Vol 1.
Software and Information Industry updates on NIST Roadmap
New Zealand Government Cloud Computing Strategy
Australian Government Cloud Computing Strategic Direction paper
Canadian Government Cloud Computing Roadmap
UK Government Cloud Strategy Paper
GCN – A portal for Cloud in Government
Study – State of Cloud Computing in the public sector

Technological Transformation in Government

[Photo: Obama inauguration. Photo (c) A/P Sandy Huffaker]

Foreword

We live at an exciting juncture, when the world is undergoing massive and visible transformation. The Internet has given us instant access to information and it has affected how we do things on a global scale. Our children go to school and interact with knowledge in ways that we could never have imagined before, while demand and supply interact within virtual, global marketplaces where consumers are informed and empowered and suppliers are intelligent and efficient. Yet there is no place where the impacts of technology are more visibly felt than in the Public Sector, where technology may be deployed to serve an informed electorate with high expectations, demanding services and efficiency at an ever-accelerating pace.

Brief

In this series of articles, I will explore a number of contemporary issues that technology decision makers in government are concerned with, and also look into innovative, viable solutions that have been successfully implemented in a number of countries to address these concerns.

  • Cloud Computing – While cloud technology promises to deliver significant cost savings from economies of scale and cut down on deployment costs, cloud has traditionally been shunned by governments for a number of reasons, including security and confidentiality. In recent years, a number of vendors have developed Government Clouds that are designed to integrate with existing Government networks and systems, while meeting government needs for compliance and security.
  • Big Data – Big Data refers to data sets that are so large that they become difficult to manage using traditional tools. With the proliferation of e-government initiatives, governments worldwide face significant challenges in managing vast repositories of information.
  • Open Source and Interoperability – We look at governments’ ability to adopt and enhance open standards that encourage interoperability between different systems and establish an environment of equal opportunities among technology vendors, partners and end-users.
  • Digital Access – The Internet has redefined access to knowledge and learning and it is a priority for governments to ensure that students from all walks of life are not limited in opportunity due to poor access to the web. Here we explore how technology is transforming big cities and communities alike in accessing the web.
  • Mobility and Telecommuting – Governments worldwide are embracing  telecommuting and flex-time work policies as a viable long-term solution to reducing costs and energy consumption. We explore technologies that foster collaboration and productivity for a mobile workforce.
  • Cyber Security – With the call for increased vigilance against acts of cyber terrorism, we explore the lengths to which governments are prepared to go in order to maintain Confidentiality, Integrity and Availability amidst an increasingly connected ecosystem of public-sector employees, vendors, contractors and other stakeholders.
  • Open Government – Governments are heeding the call for greater transparency, public participation and collaboration by making information more readily available on government websites and also providing the public with greater access for providing feedback and commentary. This has led to the adoption of new technologies and innovations to ensure that confidentiality is not sacrificed in the light of new policies.
  • Connected Health and Human Services – Case management, health records management and health benefits administration are but a few components of government services that many lives depend on to function effectively and efficiently. We will explore technologies that are transforming these services.
  • Accessibility – In an age of information workers, support for differently abled employees has become a source of competitive advantage, enabling governments to tap into additional segments of the workforce.
  • Defense and Intelligence – Technology has long played a vital role in ensuring that critical battlefield decisions can be made with timely access to information, that communications occur unimpeded in times of emergency, and that cost efficiencies can be maximized in times of tightening budgets.

Dimensions of Exploration

Any well-thought-out study must consider important attributes such as long-term implications, return on investment and practicality of implementation. Therefore, for each of the issues listed above, we will include in our analysis the following components:

  • Executive Brief
  • Latest Trends
  • Strategic Value
  • Positive Implications
  • Negative Implications
  • Proposed Solutions
  • Reference Implementations
  • Useful Links

Topics

An individual article has been dedicated to each of the following topics; please click on each one for further reading:

  • Cloud Computing
  • Big Data
  • Open Source and Interoperability
  • Digital Access
  • Mobility and Telecommuting
  • Cyber Security
  • Open Government
  • Connected Health and Human Services
  • Accessibility
  • Defense and Intelligence

* This series is a work in progress, and does not support a particular thesis or ideal. It simply reflects research of the solutions that have been devised to solve frequently unique problems and does not reflect an endorsement of a particular technology or ideal.

Why write about Government?

I’ve spent a significant amount of time consulting for government and in truth, nothing has given me greater pleasure than to see the benefits of technology impact my selfless friends and colleagues who have made the altruistic decision to stay in government in order to serve the greater good. These unsung heroes maintain the systems that support our health, education, defense, civil, social and legal infrastructure and many other essential functions of government, which many lives may depend on.