Building Your Operational Resilience Strategy

Have you architected flexibility into your IT infrastructure for disaster recovery, survivability, and disruption?

The next normal isn’t simply about server backups. It’s about staying effective while architecting for end-user location agnosticism. In this week’s post, discover how businesses that use cloud services effectively can increase operational resilience and avoid the high costs of IT disruption with Intelligent Cloud Management.

Adoption of Cloud Services Increases as Benefits Outweigh On-Premise Options

Cloud adoption continues to accelerate, with benefits that are too hard to ignore. Cloud Services give companies the means to rapidly and cost-effectively scale up or scale down needed resources such as virtual machines, containers, machine learning frameworks, and many others. In other cases, the cloud gives companies a way to provide services that they couldn’t previously, without incurring the cost of the underlying on-premise infrastructure and the hard-to-retain technical skill sets required for setup, operations, and maintenance. Cloud services such as Microsoft's Office365 for Business and Google's Gmail for Business are examples that can help alleviate, if not outright remove, these on-premise requirements. This reduces the cost and operational burden that on-premise deployments of similar services would require, while delivering ubiquitous service availability.

In addition to cloud adoption, companies are having to contend with higher percentages of their workforce working remotely. This remote workforce still needs access to critical applications and cloud services. As such, remote end-user location agnosticism becomes important to ensure end-users can access the services they need to do their job. Corporations need to ensure that these services remain available, stable, and functional during events which could normally impair the end-user's ability to utilize those services.

Most companies already have secure Private Clouds set up within their corporate networks, consumed by on-premise end-users or by authorized end-users connecting through VPN solutions. Other organizations have branched out, setting up secure private clouds within Public Cloud Providers like Amazon Web Services and Microsoft Azure. Still others have fully adopted the public cloud as their primary environment, with many business and finance executives mandating that all new service and application deployments be cloud based and that existing legacy on-premise deployments be audited for possible cloud migration. As such, many companies are Hybrid Cloud adopters, using multiple Cloud Service Providers in addition to existing on-premise applications within their Private Cloud deployments.

Adoption of Cloud Services brings a new aspect to planning and executing Business Continuity and Digital Transformation strategies. Corporations should include all Cloud Services they provide, subscribe to, or consume in their Business Continuity and Digital Transformation planning. As part of this planning, businesses must ensure they understand each cloud service's dependencies, usage requirements, cost structures, and other criteria that help to ensure those services' Operational Resilience.

What is Operational Resilience?

A basic definition of Operational Resilience is an organization's ability and maturity to continue providing business services in situations of operational faults or errors. The adoption of cloud services extends this definition: Operational Resilience is an organization's ability and maturity to continue providing and consuming business services (both on-premise and cloud based) in situations of operational failure or errors.

The Characteristics of Operational Resilience

Operational Resilience has several characteristics. The importance of each characteristic is situational, be it for on-premise IT infrastructure, application services, cloud services, user accessibility, etc.

These characteristics of Operational Resilience are:

  • Adaptability - How does a resource or service adapt to changing conditions so that it can continue to operate?
  • Recoverability - How does a service or offering get restored during an event?  Is it manual or automatic?  This brings in topics of High-Availability and Disaster Recovery.
  • Predictability - How are events determined and handled?  Depending on the maturity of monitoring in place, this could issue pre-event predictions or post-event notifications.
  • Securability - Are appropriate RBAC policies in place to restrict access only to those requiring access? Is Denial-of-Service monitoring in place to allow for predictive event notifications to trigger failover procedures?
  • Accessibility - While a service may be up, is the service accessible to consumers of that service? One example consideration: some users within an organization are provided 4G/5G wireless access points so they can continue accessing critical corporate resources when the Wi-Fi or network access at their remote location is down and out of their control.
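
As an illustration of the Recoverability and Accessibility questions above, here is a minimal Python sketch of automatic failover: given endpoints in order of preference, it returns the first one whose health probe passes. The endpoint names and the stand-in probe are hypothetical; a real deployment would probe the actual service and would typically rely on the failover features of its load balancer, DNS, or cloud provider.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health probe passes, else None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            continue
    return None


# Hypothetical endpoints, listed in order of preference.
ENDPOINTS = ["primary.example.internal", "secondary.example.internal"]


def fake_probe(endpoint: str) -> bool:
    """Stand-in probe (assumption): pretend the primary site is down."""
    return endpoint != "primary.example.internal"


active = pick_endpoint(ENDPOINTS, fake_probe)
print(active)
```

Returning None when nothing is healthy keeps the decision explicit: the caller can raise an incident rather than silently routing users to a failed service.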

The Requirements of Operational Resilience

The characteristics of Operational Resilience are supported by the following requirements:

  • Investment / Executive Shepherding
    • Skills development and retention
    • Infrastructure and services that support the resiliency characteristics: adaptive, self-healing, predictable
    • Automation development level of effort (LOE) and verification via testing
    • Monitoring development and operations
  • Automation
    • Needed for development (CI/CD) as well as for rules-based and machine-learning-driven programmatic operations
  • Monitoring
    • Monitoring is required regardless of where the services reside.
    • Monitoring is essential to ensure events are identified appropriately and fed to an ITSM framework so that notifications and actions are set in motion automatically.
    • Monitoring is a critical requirement for measuring reliability and making resilience a reality.
  • Architectural Review and Due Diligence
    • Architecture review and due diligence are part of any service definition, regardless of whether that service is provided within a Private Cloud, Public Cloud, or Hybrid Cloud scenario.
    • It is important to understand the Shared Responsibility Model that all major Cloud Service Providers apply when providing services to their customers.
    • Knowing which Operational Resilience characteristics you directly control and which ones you are the recipient of is critical to the Operational Resilience definitions within Business Continuity and Digital Transformation planning.
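
To make "measuring reliability" concrete, the following Python sketch computes two commonly used measures, availability and mean time to recover (MTTR), from a list of incident records. The `Incident` record, the sample dates, and the assumption that incidents do not overlap are illustrative only; in practice these figures would come from your monitoring or ITSM tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Incident:
    """Hypothetical outage record: when service was lost and restored."""
    start: datetime
    end: datetime


def availability(
    incidents: Sequence[Incident],
    window_start: datetime,
    window_end: datetime,
) -> float:
    """Fraction of the window the service was up (assumes no overlapping incidents)."""
    window = (window_end - window_start).total_seconds()
    downtime = sum(
        # Clip each incident to the reporting window before measuring it.
        (min(i.end, window_end) - max(i.start, window_start)).total_seconds()
        for i in incidents
        if i.end > window_start and i.start < window_end
    )
    return 1.0 - downtime / window


def mean_time_to_recover(incidents: Sequence[Incident]) -> timedelta:
    """Average outage duration; zero when there were no incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((i.end - i.start for i in incidents), timedelta(0))
    return total / len(incidents)


# Illustrative data: two outages (2 hours and 4 hours) in a 30-day window.
incidents = [
    Incident(datetime(2024, 6, 3, 9, 0), datetime(2024, 6, 3, 11, 0)),
    Incident(datetime(2024, 6, 20, 14, 0), datetime(2024, 6, 20, 18, 0)),
]
window_start = datetime(2024, 6, 1)
window_end = datetime(2024, 7, 1)

print(f"availability: {availability(incidents, window_start, window_end):.4%}")
print(f"MTTR: {mean_time_to_recover(incidents)}")
```

Tracking both figures separates how often a service fails from how quickly it is restored, which correspond to different characteristics in the list above.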

Businesses Receive Guidance and Advice with Intelligent Cloud Management Offering

Through pureIntegration's Intelligent Cloud Management offering, customers can get the assistance and guidance they need for assessing and refactoring their existing Business Continuity planning and implementations.

The following actions are included within our Intelligent Cloud Management offering:

  • Review a customer’s existing business risk profile definitions by examining their stance on various critical aspects of Business Continuity and Operational Resilience including, but not limited to, Governance, Monitoring, High-Availability/Disaster Recovery, and ITSM.
  • Perform analysis across these critical aspects to identify gaps that contribute to operational resilience issues. Examples include lack of defined KPIs for reporting on Operational Resilience, technical skill gaps, lack of monitoring and/or reporting capabilities, lack of executive sponsorship and/or ownership, and other areas.
  • Identify solutions to the identified gaps which could include technology, management changes, process changes, etc.
  • Develop a roadmap with timelines of the changes and adjustments necessary to make Operational Resilience an operational reality.
  • Assist the customer with implementation of the roadmap of changes to bring applications and services into alignment with defined Business Continuity and Operational Resilience requirements and measures.

Cloud Services, Business Continuity Planning, and Operational Resilience are not "eureka concepts." We have long anticipated business disruptions and architected around them. In the next normal of high expectations, including both digital-first customer engagement and the exponentially increasing performance demands of a distributed workforce, Intelligent Cloud Management will play a significant role in re-thinking the office and storefront of the future, striking a balance between cost and innovation.

Tired of throwing money at the wrong cloud management solutions?  pureIntegration enables companies to optimize the provisioning of their IT services across on-premise and cloud environments, to accelerate transformation from on-premise solutions and operations to Hybrid IT. Schedule a meeting with our cloud expert today!
