How to Handle Incidents in a Cloud Environment

Incident handling is the process of responding to and managing security events or breaches in a cloud environment. It involves identifying, containing, analyzing, and resolving incidents, as well as communicating with stakeholders and reporting on the outcomes. However, incident handling in the cloud poses some unique challenges that require special attention and skills. In this article, we will discuss some of the most common incident handling challenges in a cloud environment and how to overcome them.

Complexity and heterogeneity

One of the main challenges of incident handling in the cloud is the complexity and heterogeneity of the cloud infrastructure and services. Cloud providers offer different types of cloud models, such as public, private, hybrid, or multi-cloud, and various levels of service, such as infrastructure as a service (IaaS), platform as a service (PaaS), or software as a service (SaaS). Each of these options has its own security features, policies, and responsibilities, which may vary across regions, zones, or accounts. Moreover, cloud customers may use multiple cloud providers or integrate their cloud systems with on-premises or third-party systems, creating a hybrid and heterogeneous environment that increases the attack surface and the potential for misconfigurations or vulnerabilities. Therefore, incident handlers need to have a clear understanding of the cloud architecture, components, and dependencies, as well as the roles and responsibilities of the cloud provider and the customer in terms of security and incident response.
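The split of responsibilities between provider and customer can be written down as a lookup table that responders consult during triage. The Python sketch below is illustrative only: the layer names, the `RESPONSIBILITY` table, and the `owner_of` helper are assumptions based on the commonly described shared responsibility pattern, not any provider's official matrix, so the actual split should be checked against each provider's documentation and contract.

```python
# Illustrative only: a simplified shared-responsibility table.
# Layer names and assignments are assumptions, not a provider's official matrix.
LAYERS = ["physical", "hypervisor", "operating_system", "application", "data"]

RESPONSIBILITY = {
    "iaas": {"physical": "provider", "hypervisor": "provider",
             "operating_system": "customer", "application": "customer",
             "data": "customer"},
    "paas": {"physical": "provider", "hypervisor": "provider",
             "operating_system": "provider", "application": "customer",
             "data": "customer"},
    "saas": {"physical": "provider", "hypervisor": "provider",
             "operating_system": "provider", "application": "provider",
             "data": "customer"},
}


def owner_of(service_model: str, layer: str) -> str:
    """Return who is assumed to lead the response for a given layer."""
    model = service_model.lower()
    if model not in RESPONSIBILITY:
        raise ValueError(f"unknown service model: {service_model}")
    if layer not in RESPONSIBILITY[model]:
        raise ValueError(f"unknown layer: {layer}")
    return RESPONSIBILITY[model][layer]


if __name__ == "__main__":
    for model in RESPONSIBILITY:
        mine = [layer for layer in LAYERS if owner_of(model, layer) == "customer"]
        print(model, "-> customer handles:", ", ".join(mine))
```

Keeping the table in one place makes it easy to review with the provider and to extend per account or region where the responsibilities differ.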

Rakesh Bhardwaj

Group Chief Information Officer at Ola | DeepTech | NASSCOM DeepTech Mentor | Compelling Experiences & Agility | Digital Leader 2018
Running multiple appliances from multiple vendors is the challenging part, because each party passes the buck to the other. Incident managers need to understand the wiring and plumbing (somewhat like a full-stack engineer) to troubleshoot and fix issues. Vendors are challenged too: they do not have such capabilities in large numbers, and the availability of key people when required is a big question.

Mamoun Alhomssey

Lead by Innovation | Digital Transformation | Strategy & Solutions | Neo Banks | Crypto | Advanced Data Analytics | Advisor | Group CIO | CTO | CDO | Banks Acquisitions and Mergers | Fintech
In my experience, the cloud adoption journey is a tricky one because of the nature of the change and the different dimensions associated with it. In the Middle East, most cloud journeys start in a suspicious and uncertain way, with no clear view of the end result; sometimes they start only for branding and show. The most important factor in starting a successful journey and getting the best outcome is to secure senior management and business support and understanding, then set up the cloud team and start the technical work. I will share more details soon.

Ishaq Mohammed

Problem Solver, Creative Solutions builder, Technology Mapping Specialist, Managed Services
Resource Sprawl: Over time, unused cloud resources increase complexity, making it hard to identify relevant resources during incidents. Cost: Sprawl incurs unnecessary costs as customers pay for unused resources. Security Posture: Unused resources can pose security risks, potentially leading to data breaches. Incident Handling Time: Sorting through numerous resources during troubleshooting delays responses, resulting in service downtime. Recommendations: To combat sprawl, customers should regularly review and clean resources, use tagging, and automate lifecycle management to improve incident handling and overall cloud operations.
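The review-and-tag recommendation above can be partly automated. The following Python sketch flags resources that lack required tags or have been idle for too long. The inventory record shape (`id`, `tags`, `last_used`), the `REQUIRED_TAGS` policy, and the 30-day default are all hypothetical; a real inventory would come from the provider's own listing tools.

```python
from datetime import datetime, timedelta, timezone

# Assumed tagging policy: every resource should carry these tag keys.
REQUIRED_TAGS = {"owner", "environment"}


def find_sprawl(resources, now, max_idle_days=30):
    """Map resource id -> list of reasons it looks like sprawl.

    Each resource is a dict with 'id', 'tags' (dict) and 'last_used'
    (timezone-aware datetime). This record shape is an assumption.
    """
    cutoff = now - timedelta(days=max_idle_days)
    flagged = {}
    for res in resources:
        reasons = []
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            reasons.append("missing tags: " + ", ".join(sorted(missing)))
        if res["last_used"] < cutoff:
            reasons.append("idle since " + res["last_used"].date().isoformat())
        if reasons:
            flagged[res["id"]] = reasons
    return flagged


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    inventory = [
        {"id": "vm-1", "tags": {"owner": "ops", "environment": "prod"},
         "last_used": datetime(2024, 1, 30, tzinfo=timezone.utc)},
        {"id": "vm-2", "tags": {},
         "last_used": datetime(2023, 11, 1, tzinfo=timezone.utc)},
    ]
    for res_id, reasons in find_sprawl(inventory, now).items():
        print(res_id, reasons)
```

Running such a check on a schedule keeps the list of candidates short, so responders have fewer irrelevant resources to sort through during an incident.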

Visibility and access

Another challenge of incident handling in the cloud is the limited visibility and access to the cloud resources and data. Cloud customers often rely on the cloud provider's tools and interfaces to monitor and manage their cloud assets, which may not provide sufficient or timely information about the security status, events, or incidents. For example, some cloud services may not offer detailed logs, alerts, or metrics, or may charge extra fees for accessing them. Additionally, some cloud services may not allow direct access to the underlying infrastructure or systems, making it difficult to perform forensic analysis or collect evidence. Furthermore, some cloud services may use encryption, compression, or obfuscation techniques that hinder the visibility and access to the data. Therefore, incident handlers need to use appropriate tools and methods to gain visibility and access to the cloud resources and data, such as using APIs, agents, proxies, or third-party solutions, or requesting assistance from the cloud provider.
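Once logs have been collected through APIs, agents, or third-party tools, they usually need to be merged into one timeline before analysis. The Python sketch below shows one possible way to do that. The source names, the record fields (`time`, `actor`, `action`), and the rule that timestamps without an offset are treated as UTC are assumptions made for illustration, not the format of any particular cloud service.

```python
from datetime import datetime, timezone


def normalize(event, source):
    """Convert one raw record into a common shape with a UTC timestamp."""
    ts = datetime.fromisoformat(event["time"])
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return {
        "time": ts.astimezone(timezone.utc),
        "source": source,
        "actor": event.get("actor", "unknown"),
        "action": event["action"],
    }


def build_timeline(sources):
    """Merge events from all sources into one list sorted by time."""
    merged = [normalize(event, name)
              for name, events in sources.items()
              for event in events]
    return sorted(merged, key=lambda e: e["time"])


if __name__ == "__main__":
    logs = {
        "api_audit": [{"time": "2024-01-05T10:00:00+02:00",
                       "actor": "alice", "action": "CreateKey"}],
        "vm_agent": [{"time": "2024-01-05T07:30:00", "action": "ssh_login"}],
    }
    for entry in build_timeline(logs):
        print(entry["time"].isoformat(), entry["source"], entry["action"])
```

Normalizing to UTC first avoids ordering mistakes when sources in different regions report local times.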

Nico Tupamahu
A robust monitoring solution is crucial. Relying solely on native tools can lead to gaps in visibility during security incidents. A proactive approach, coupled with customized access, ensures effective response to events. Having the right visibility is key to a strong security posture. It allows for swift incident detection and mitigation, safeguarding valuable assets. Stay proactive and well-prepared for a secure digital environment.

Prince Ofori-Kuragu

ITSM Advisory Consultant - Accenture UK
Comprehensive Service Mapping: While comprehensive service mapping is crucial for understanding the infrastructure, it can present challenges in terms of keeping the map up-to-date, especially in a dynamic, multi-vendor environment. Undocumented or under-circulated changes in ownership, responsibilities, or infrastructure alterations can quickly render the map outdated, potentially causing confusion during incidents.

Shalin Sinha

Microsoft Canada Support Leader
In my experience working with customers, I have seen them face more challenges when the cloud is treated as just an extension of their existing infrastructure. The cloud is an entity of its own, which should have its own defined processes, policies, support strategies, IT roles and responsibilities, outcomes, and goals. These also need to be refined depending on the kind of cloud service you have. The experience and expertise required to successfully manage and grow a cloud environment are very different from those of traditional IT services.

Compliance and jurisdiction

A third challenge of incident handling in the cloud is the compliance and jurisdiction issues that may arise from the global and dynamic nature of the cloud. Cloud providers may store, process, or transfer data across different countries or regions, which may have different laws, regulations, or standards regarding data protection, privacy, or security. For example, some jurisdictions may require data to be stored locally or prohibit data from being transferred to certain countries. Additionally, some jurisdictions may have different rules or procedures for accessing, disclosing, or preserving data in the case of an incident or investigation. Therefore, incident handlers need to be aware of the compliance and jurisdiction requirements and implications that affect their cloud data and operations, as well as the cloud provider's policies and practices regarding data location, sovereignty, or disclosure.
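Data location requirements of this kind can be checked mechanically once assets are classified. The Python sketch below compares each asset's region with a residency policy. The classifications, region names, and `RESIDENCY_POLICY` table are invented for illustration and do not reflect any specific law or provider; the real rules should come from legal and compliance teams.

```python
# Hypothetical residency policy: data classification -> allowed regions.
# None means the classification is unrestricted. Region names are made up.
RESIDENCY_POLICY = {
    "personal_data": {"eu-west", "eu-central"},
    "financial": {"eu-west"},
    "public": None,
}


def residency_violations(assets, policy=RESIDENCY_POLICY):
    """Return (name, region) pairs for assets stored outside allowed regions."""
    violations = []
    for asset in assets:
        classification = asset.get("classification")
        if classification not in policy:
            # Unknown or missing classification: flag it for manual review.
            violations.append((asset["name"], asset["region"]))
            continue
        allowed = policy[classification]
        if allowed is not None and asset["region"] not in allowed:
            violations.append((asset["name"], asset["region"]))
    return violations


if __name__ == "__main__":
    assets = [
        {"name": "crm-db", "classification": "personal_data", "region": "us-east"},
        {"name": "website", "classification": "public", "region": "us-east"},
        {"name": "ledger", "classification": "financial", "region": "eu-west"},
    ]
    print(residency_violations(assets))
```

Flagging unclassified assets rather than ignoring them is a deliberate choice here, since data whose classification is unknown cannot be shown to comply.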

Tony Turner

For Founders & CTO's ● Fractional CTO ● Reducing Pain Points ● Solving Problems ● Coaching ● Delivery
Incident handlers must navigate complex compliance and jurisdiction challenges in the cloud due to varying international laws and regulations on data protection, privacy, and security. They need to be well-versed in the specific requirements of each jurisdiction where their data is stored, processed, or transferred, as well as the cloud provider's policies on data location, sovereignty, and disclosure.
Daniel Stafrace

Enterprise Security and Services Manager at KingMakers | PRINCE2® Practitioner | Certified Ethical Hacker
Compliance has always been a challenge, both on-premises and in the cloud. Depending on where the organization operates and in which sector, different compliance requirements apply. Since the cloud has been around for quite a while now, most cloud providers meet most regulatory requirements, which gives organizations additional reassurance, especially since the responsibility is split between your organization and the provider.
Nico Tupamahu
In the dynamic world of cloud computing, understanding compliance and jurisdiction is vital for Security Officers. They must navigate these complexities to protect data integrity and privacy. This involves thorough jurisdictional analysis, tailored data handling protocols, partnership with cloud providers, agile compliance strategies, and ongoing team education. These steps ensure legal adherence and fortified data integrity in cloud operations.

Scalability and automation

A fourth challenge of incident handling in the cloud is the scalability and automation of the cloud environment and the incident response process. Cloud customers can leverage the cloud's elasticity and flexibility to scale up or down their resources and services according to their needs, which may change rapidly or unpredictably. However, this also means that the cloud environment and the incident scope may change dynamically, making it harder to identify, contain, or analyze incidents. Moreover, cloud customers can use automation tools and scripts to deploy, configure, or update their cloud resources and services, which may introduce errors, inconsistencies, or vulnerabilities. Therefore, incident handlers need to use scalability and automation techniques to cope with the changing and complex cloud environment and the incident response process, such as using cloud-native or third-party tools to detect, isolate, or remediate incidents, or using code repositories, version control, or testing tools to manage and audit their automation scripts.
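One simple way to audit automation scripts is to compare what is actually deployed with digests recorded from the version-controlled, approved copies. The Python sketch below assumes the deployed scripts are available as bytes and that an approved manifest of SHA-256 digests already exists. The function names and the report layout are hypothetical, and the sketch is a stand-in for whatever auditing the team's own repository and testing tools provide.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of a script's content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def audit_scripts(deployed, approved):
    """Compare deployed scripts against approved digests.

    deployed: dict of script name -> content bytes
    approved: dict of script name -> expected SHA-256 hex digest
    """
    report = {"modified": [], "unapproved": [], "missing": []}
    for name, content in deployed.items():
        if name not in approved:
            report["unapproved"].append(name)
        elif sha256_hex(content) != approved[name]:
            report["modified"].append(name)
    report["missing"] = [name for name in approved if name not in deployed]
    for names in report.values():
        names.sort()
    return report


if __name__ == "__main__":
    approved = {"deploy.sh": sha256_hex(b"echo deploy\n")}
    deployed = {"deploy.sh": b"echo deploy; curl http://example.invalid\n",
                "debug.sh": b"env\n"}
    print(audit_scripts(deployed, approved))
```

A script that appears as modified or unapproved is not necessarily malicious, but during an incident it is a quick pointer to where the environment has drifted from what was reviewed.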

Brett Coryell

CIO | CISO | Researching AI and Information Security
Excellent security teams should consider when, whether, and how they can use scalability and automation for their own purposes. Many dev or ops teams will want to "revert, destroy, and redeploy" the moment they notice something suspicious. That can reduce the evidence we have to work with, including watching the threat actors work. In some cases, if the team is good enough, we can use automation to deploy our own tools to increase logging, re-route requests, install decoys, or keep the bad guys contained. These tools have two sides. Don't let the threat actors be the only ones using them.
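The concern about "revert, destroy, and redeploy" can be expressed as an ordering rule: destructive steps should only run after the evidence has been captured. The Python sketch below encodes that rule as a plan builder and a checker. The step names are hypothetical labels rather than real provider API calls, and the ordering is one reasonable reading of the comment above, not a prescribed procedure.

```python
# Hypothetical step labels; each would map to real tooling in practice.
EVIDENCE_STEPS = ["snapshot_disk", "export_logs", "record_metadata"]
CONTAINMENT_STEPS = ["isolate_network", "revoke_credentials"]
DESTRUCTIVE_STEPS = ["terminate", "redeploy"]


def response_plan(resource_id, allow_destroy=False):
    """Build an ordered plan: evidence first, then containment, then rebuild."""
    steps = EVIDENCE_STEPS + CONTAINMENT_STEPS
    if allow_destroy:
        steps = steps + DESTRUCTIVE_STEPS
    return [(step, resource_id) for step in steps]


def is_safe_order(plan):
    """Return False if a destructive step runs before all evidence steps."""
    pending = set(EVIDENCE_STEPS)
    for step, _resource in plan:
        pending.discard(step)
        if step in DESTRUCTIVE_STEPS and pending:
            return False
    return True


if __name__ == "__main__":
    plan = response_plan("vm-42", allow_destroy=True)
    print(plan)
    print("safe:", is_safe_order(plan))
```

A checker like `is_safe_order` could be run against any automated runbook before it executes, so that a well-meaning redeploy does not erase what the security team needs.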

Maik Ewald

Group Director IT Infrastructure at Klöckner Pentaplast
There is no way around automation here, given the growing complexity that results from the increasingly rapid growth in the use of cloud services. However, this does not mean that customers have to use only the services of the cloud provider. Many security systems can plug in here and use their own runbooks and scripts. The integration of such automated measures should be part of the service design and carried out in close consultation with the security team.

CHANCHAL CHATTERJI

Enterprise Security Architect at Infosys
Lack of clear roles and responsibilities. Poor communication and collaboration. Inadequate incident categorisation and prioritisation. Insufficient incident diagnosis and resolution. Incomplete incident closure and review.

Coordination and communication

A fifth challenge of incident handling in the cloud is the coordination and communication among the various stakeholders involved in the incident response process. These stakeholders may include the cloud customer's internal teams, such as IT, security, legal, or business, as well as external parties, such as the cloud provider, law enforcement, regulators, or customers. Each of these stakeholders may have different expectations, objectives, or interests regarding the incident handling process, as well as different levels of knowledge, skills, or authority. Moreover, each of these stakeholders may use different tools, channels, or formats to communicate or share information, which may lead to confusion, delays, or conflicts. Therefore, incident handlers need to establish and maintain effective coordination and communication among the stakeholders, such as using common protocols, standards, or platforms, or creating and following incident response plans, policies, or procedures.
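An incident response plan often includes a notification matrix that states who must be told about which kind of incident. The Python sketch below shows one possible shape for such a matrix. The incident types, severity levels, stakeholder names, and the fallback list are all assumptions chosen for illustration; a real matrix would be agreed with legal, business, and the cloud provider.

```python
# Hypothetical notification matrix: (incident type, severity) -> stakeholders.
NOTIFY = {
    ("breach", "high"): ["security", "legal", "executives",
                         "cloud_provider", "regulator"],
    ("breach", "low"): ["security", "legal"],
    ("outage", "high"): ["it_operations", "executives",
                         "cloud_provider", "customers"],
    ("outage", "low"): ["it_operations"],
}

# Assumed fallback when an incident does not match any known combination.
DEFAULT_CONTACTS = ["security", "it_operations"]


def who_to_notify(incident_type, severity):
    """Return a copy of the stakeholder list for this kind of incident."""
    return list(NOTIFY.get((incident_type, severity), DEFAULT_CONTACTS))


if __name__ == "__main__":
    print(who_to_notify("breach", "high"))
    print(who_to_notify("performance", "low"))
```

Keying the matrix on both type and severity reflects the point that different audiences expect different things from the word "incident".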

Jørgen Borup

Managing Consultant Architect @ Capgemini | Technical Leader | Startups and enterprise | Optimization and automation for developers, ops and endusers | Github | Atlassian | Kubernetes | Native
In my experience, the first thing is clarity on what the incident is about. The scope of this article is breaches and security events, but there are other types of incidents as well, and different audiences might expect an incident to be about slower responses or an outage of the whole or part of a service. Some may perceive this as a breach and others might not, so it is very important to be clear on what to communicate and how, neither neglecting the issue nor creating a storm in a teacup. I recommend having a clear communication plan for the various types of incidents up front, covering not just the priorities but the types as well.

Raghuram Janapareddy

Partner & Managing Director - India @ Tenthpin | Innovation in Lifesciences
You are right. Often the anonymous mailboxes and faceless bots don’t get you to a resolution fast. The person who raises the incident often resigns themselves to fate as to whether a resolution will arrive on time.

Nico Tupamahu
Marketing and Events teams excel at communication, but when it comes to content, the Operations or Security departments should take the lead in providing explanations. They can offer refined content to the Marketing team for seamless integration into their communication platforms. This ensures a cohesive and impactful message. I have my doubts about whether every organization will understand this way of working.

Here’s what else to consider

Jerry Young, MBA

Head of Software Development at Republic Finance
Cybersecurity is a domain where a good offense is a good defense. Companies should look at using third-party firms and tools to run routine penetration tests of their cloud environment. This can surface problems so they can be remediated before a breach. Sometimes it is simply a matter of updating outdated libraries in code or changing settings within the cloud platform. You cannot simply rely on a cloud provider to keep your instance secure. Waiting until there is an incident to be proactive is normally too late and puts you on defense.

Jamie Vernon

Experienced IT Executive and Change Agent
In a crisis, teams *sink* to the level of their training. They do not rise. Therefore, practice is the answer. Practice, practice, practice. "Amateurs practice until they get it right; professionals do so until they can't get it wrong." Have tabletop exercises. Go to the datacenter and pull a cord. Maybe even have someone pretend to be sick. Netflix has had Chaos Monkey (https://www.techtarget.com/whatis/definition/Chaos-Monkey) for *years* to test system resilience. In doing so, you will learn whether:
- the runbooks are safe and accessible
- skillsets are duplicated
- the restoration automations really work
- the documentation is up to date
- the monitoring tools are pervasive
- you're ready for actual chaos

Mohsen Faraghzadeh

Director of Information Technology
- Comprehensive incident response plan. The plan should outline roles and responsibilities, escalation procedures, communication channels, and steps to mitigate and recover from incidents.
- Proactive Monitoring
- Automated Alerts and Notifications
- Rapid Incident Detection. Establish real-time incident detection mechanisms to identify abnormal activities or security breaches. Utilize anomaly detection, intrusion detection systems (IDS), and security information and event management (SIEM) tools to identify potential threats.
- Incident Containment and Mitigation
- Forensic Investigation

Regularly review and refine strategies based on emerging threats, industry best practices, and lessons learned from past incidents.
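As a rough illustration of the anomaly detection mentioned above, the Python sketch below flags a measurement that deviates strongly from recent history using a z-score. This is a deliberately simple stand-in and not how any particular IDS or SIEM product works; the metric (for example, failed logins per hour), the threshold of 3.0, and the function name are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard
    deviations from the mean of `history` (a basic z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold


if __name__ == "__main__":
    failed_logins_per_hour = [10, 12, 11, 9, 10, 11, 12, 10]
    print(is_anomalous(failed_logins_per_hour, 50))
    print(is_anomalous(failed_logins_per_hour, 11))
```

A check this simple will miss slow, gradual attacks and will misfire on seasonal traffic, which is one reason dedicated detection tooling is usually layered on top.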