What are the most common incident handling challenges in a cloud environment?
Incident handling is the process of responding to and managing security events or breaches in a cloud environment. It involves identifying, containing, analyzing, and resolving incidents, as well as communicating with stakeholders and reporting on the outcomes. However, incident handling in the cloud poses some unique challenges that require special attention and skills. In this article, we will discuss some of the most common incident handling challenges in a cloud environment and how to overcome them.
One of the main challenges of incident handling in the cloud is the complexity and heterogeneity of the cloud infrastructure and services. Cloud providers offer different types of cloud models, such as public, private, hybrid, or multi-cloud, and various levels of service, such as infrastructure as a service (IaaS), platform as a service (PaaS), or software as a service (SaaS). Each of these options has its own security features, policies, and responsibilities, which may vary across regions, zones, or accounts. Moreover, cloud customers may use multiple cloud providers or integrate their cloud systems with on-premises or third-party systems, creating a hybrid and heterogeneous environment that increases the attack surface and the potential for misconfigurations or vulnerabilities. Therefore, incident handlers need to have a clear understanding of the cloud architecture, components, and dependencies, as well as the roles and responsibilities of the cloud provider and the customer in terms of security and incident response.
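As a rough illustration of the split in roles and responsibilities described above, the Python sketch below encodes a simplified shared-responsibility model. The layer names and the cut-off per service model are assumptions made for illustration; actual provider responsibility matrices differ per service and should be checked in the provider's own documentation.

```python
# Layers of a typical cloud stack, from lowest to highest (illustrative names).
LAYERS = ["physical", "virtualization", "operating_system", "application", "data"]

# Simplified assumption: the lowest layer the customer manages per service model.
CUSTOMER_STARTS_AT = {
    "IaaS": "operating_system",
    "PaaS": "application",
    "SaaS": "data",
}


def responsible_party(service_model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for a layer under a service model."""
    first_customer_layer = LAYERS.index(CUSTOMER_STARTS_AT[service_model])
    return "customer" if LAYERS.index(layer) >= first_customer_layer else "provider"


def customer_scope(service_model: str) -> list[str]:
    """List the layers the customer is responsible for under a service model."""
    return [
        layer for layer in LAYERS
        if responsible_party(service_model, layer) == "customer"
    ]


iaas_scope = customer_scope("IaaS")
saas_scope = customer_scope("SaaS")
```

A lookup like this can help an incident handler decide quickly whether a given layer can be investigated directly or needs a request to the provider.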
-
Rakesh Bhardwaj
Group Chief Information Officer at Ola | DeepTech | NASSCOM DeepTech Mentor | Compelling Experiences & Agility | Digital Leader 2018
A multi-appliance, multi-vendor setup is the challenging part because each party passes the buck to the other. Incident managers need to understand the wiring and plumbing (somewhat like full stack) to troubleshoot and fix. Vendors are also challenged, as they don’t have such capabilities in large numbers, and the availability of key people when required is a big question.
-
Mamoun Alhomssey
Lead by Innovation | Digital Transformation | Strategy & Solutions | Neo Banks | Crypto | Advanced Data Analytics | Advisor | Group CIO | CTO | CDO | Banks Acquisitions and Mergers | Fintech
In my experience, the cloud services adoption journey is a very tricky one due to the nature of the change and the different dimensions associated with it. In the Middle East, most cloud journeys start in a very suspicious and uncertain way, with no clear view of the end result; sometimes they start only for branding and show-off purposes. The most important factor for starting a successful journey and getting the best outcome is to secure senior management and business support and understanding, then set up the cloud team and start the technical work. Will share more details soon.
Another challenge of incident handling in the cloud is the limited visibility and access to the cloud resources and data. Cloud customers often rely on the cloud provider's tools and interfaces to monitor and manage their cloud assets, which may not provide sufficient or timely information about the security status, events, or incidents. For example, some cloud services may not offer detailed logs, alerts, or metrics, or may charge extra fees for accessing them. Additionally, some cloud services may not allow direct access to the underlying infrastructure or systems, making it difficult to perform forensic analysis or collect evidence. Furthermore, some cloud services may use encryption, compression, or obfuscation techniques that hinder the visibility and access to the data. Therefore, incident handlers need to use appropriate tools and methods to gain visibility and access to the cloud resources and data, such as using APIs, agents, proxies, or third-party solutions, or requesting assistance from the cloud provider.
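One way to close visibility gaps is to pull events from several feeds (provider audit logs, host agents, third-party tools) into a single timeline. The Python below is a minimal sketch of that idea. The feed names, field names, and sample records are hypothetical; real log schemas vary by provider and tool and would each need their own field mapping.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # always stored in UTC
    source: str          # which log feed the event came from
    actor: str
    action: str


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


def normalize(source: str, records: list[dict], field_map: dict[str, str]) -> list[Event]:
    """Map one feed's field names onto the common Event shape."""
    return [
        Event(
            timestamp=parse_iso(r[field_map["timestamp"]]),
            source=source,
            actor=r[field_map["actor"]],
            action=r[field_map["action"]],
        )
        for r in records
    ]


def build_timeline(*feeds: list[Event]) -> list[Event]:
    """Merge all feeds into one list ordered by UTC time."""
    return sorted((e for feed in feeds for e in feed), key=lambda e: e.timestamp)


# Hypothetical sample records from two feeds with different schemas and offsets.
provider_audit = [
    {"eventTime": "2024-01-01T10:05:00+00:00", "user": "svc-deploy", "op": "create_access_key"},
]
host_agent = [
    {"ts": "2024-01-01T11:02:00+01:00", "uid": "svc-deploy", "event": "ssh_login"},
]

timeline = build_timeline(
    normalize("provider_audit", provider_audit,
              {"timestamp": "eventTime", "actor": "user", "action": "op"}),
    normalize("host_agent", host_agent,
              {"timestamp": "ts", "actor": "uid", "action": "event"}),
)
```

Converting every timestamp to UTC before sorting matters here: the host agent record looks later on the wall clock but actually precedes the provider event.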
-
Nico Tupamahu
A robust monitoring solution is crucial. Relying solely on native tools can lead to gaps in visibility during security incidents. A proactive approach, coupled with customized access, ensures effective response to events. Having the right visibility is key to a strong security posture. It allows for swift incident detection and mitigation, safeguarding valuable assets. Stay proactive and well-prepared for a secure digital environment.
-
Prince Ofori-Kuragu
ITSM Advisory Consultant - Accenture UK
Comprehensive Service Mapping: While comprehensive service mapping is crucial for understanding the infrastructure, it can present challenges in terms of keeping the map up-to-date, especially in a dynamic, multi-vendor environment. Undocumented or under-circulated changes in ownership, responsibilities, or infrastructure alterations can quickly render the map outdated, potentially causing confusion during incidents.
A third challenge of incident handling in the cloud is the compliance and jurisdiction issues that may arise from the global and dynamic nature of the cloud. Cloud providers may store, process, or transfer data across different countries or regions, which may have different laws, regulations, or standards regarding data protection, privacy, or security. For example, some jurisdictions may require data to be stored locally or prohibit data from being transferred to certain countries. Additionally, some jurisdictions may have different rules or procedures for accessing, disclosing, or preserving data in the case of an incident or investigation. Therefore, incident handlers need to be aware of the compliance and jurisdiction requirements and implications that affect their cloud data and operations, as well as the cloud provider's policies and practices regarding data location, sovereignty, or disclosure.
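A data-location requirement of the kind described above can be checked mechanically against a resource inventory. The sketch below assumes a hypothetical policy table, classification labels, and region names; it is not based on any specific regulation or provider API.

```python
# Hypothetical policy: which regions each data classification may reside in.
ALLOWED_REGIONS = {
    "customer_pii": {"eu-west", "eu-central"},
    "telemetry": {"eu-west", "eu-central", "us-east"},
}


def residency_violations(resources: list[dict]) -> list[str]:
    """Return names of resources stored outside their allowed regions.

    Unknown classifications are reported too, so gaps in the policy surface
    instead of passing silently.
    """
    violations = []
    for res in resources:
        allowed = ALLOWED_REGIONS.get(res["classification"])
        if allowed is None or res["region"] not in allowed:
            violations.append(res["name"])
    return violations


# Hypothetical inventory, for illustration only.
inventory = [
    {"name": "crm-db", "classification": "customer_pii", "region": "eu-west"},
    {"name": "crm-backup", "classification": "customer_pii", "region": "us-east"},
    {"name": "metrics", "classification": "telemetry", "region": "us-east"},
    {"name": "scratch", "classification": "unlabeled", "region": "eu-west"},
]

violations = residency_violations(inventory)
```

Running a check like this before an incident gives handlers a known starting point for which jurisdictions' rules apply to the affected data.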
-
Tony Turner
For Founders & CTO's ● Fractional CTO ● Reducing Pain Points ● Solving Problems ● Coaching ● Delivery
Incident handlers must navigate complex compliance and jurisdiction challenges in the cloud due to varying international laws and regulations on data protection, privacy, and security. They need to be well-versed in the specific requirements of each jurisdiction where their data is stored, processed, or transferred, as well as the cloud provider's policies on data location, sovereignty, and disclosure.
-
Daniel Stafrace
Enterprise Security and Services Manager at KingMakers | PRINCE2® Practitioner | Certified Ethical Hacker
Compliance has always been a challenge, both on-prem and in the cloud. Depending on where the organization operates and in which sector, different compliance requirements apply. Since cloud has been around for quite a while now, most cloud providers meet most regulatory requirements, which gives organizations additional reassurance, especially since the responsibility is split between your organization and the provider.
A fourth challenge of incident handling in the cloud is the scalability and automation of the cloud environment and the incident response process. Cloud customers can leverage the cloud's elasticity and flexibility to scale up or down their resources and services according to their needs, which may change rapidly or unpredictably. However, this also means that the cloud environment and the incident scope may change dynamically, making it harder to identify, contain, or analyze incidents. Moreover, cloud customers can use automation tools and scripts to deploy, configure, or update their cloud resources and services, which may introduce errors, inconsistencies, or vulnerabilities. Therefore, incident handlers need to use scalability and automation techniques to cope with the changing and complex cloud environment and the incident response process, such as using cloud-native or third-party tools to detect, isolate, or remediate incidents, or using code repositories, version control, or testing tools to manage and audit their automation scripts.
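The tension between elasticity and evidence can be made concrete with a small containment routine. The Python below is a sketch against an in-memory model, not a real cloud SDK: the `Instance` class, its fields, and the step ordering are assumptions chosen for illustration, and the order shown is one reasonable choice rather than a prescribed standard.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    network_policy: str = "default"
    in_scaling_group: bool = True
    snapshots: list[str] = field(default_factory=list)
    terminated: bool = False
    audit_log: list[str] = field(default_factory=list)


def contain(instance: Instance) -> None:
    """Isolate a suspect instance while keeping it available for analysis."""
    # 1. Detach from autoscaling so elasticity does not recycle the evidence.
    instance.in_scaling_group = False
    instance.audit_log.append("detached_from_scaling_group")
    # 2. Record a disk snapshot before isolation changes anything further.
    instance.snapshots.append(f"snap-{instance.instance_id}")
    instance.audit_log.append("snapshot_taken")
    # 3. Cut network access but leave the instance running.
    instance.network_policy = "deny_all"
    instance.audit_log.append("network_isolated")


def terminate(instance: Instance) -> None:
    """Refuse to destroy an instance that has no preserved evidence."""
    if not instance.snapshots:
        raise RuntimeError(f"{instance.instance_id}: take a snapshot before terminating")
    instance.terminated = True
    instance.audit_log.append("terminated")


suspect = Instance("i-example")  # hypothetical identifier
try:
    terminate(suspect)           # blocked: no evidence preserved yet
    blocked = False
except RuntimeError:
    blocked = True
contain(suspect)
```

Keeping the steps in code, under version control, also gives the audit trail for the automation itself that the paragraph above recommends.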
-
Brett Coryell
CIO | CISO | Researching AI and Information Security
Excellent security teams should consider when, whether, and how they can use scalability and automation for their own purposes. Many dev or ops teams will want to "revert, destroy, and redeploy" the moment they notice something suspicious. That can reduce the evidence we have to work with, including watching the threat actors work. In some cases, if the team is good enough, we can use automation to deploy our own tools to increase logging, re-route requests, install decoys, or keep the bad guys contained. These tools have two sides. Don't let the threat actors be the only ones using them.
-
Maik Ewald
Group Director IT Infrastructure at Klöckner Pentaplast
There is no way around automation here, given the growing complexity that comes with the increasingly rapid growth in the use of cloud services. However, this does not mean that customers have to use only the cloud provider's services. Many security systems can dock on here and use their own runbooks and scripts. The integration of such automated measures should be part of the service design and carried out in close consultation with the security team.
A fifth challenge of incident handling in the cloud is the coordination and communication among the various stakeholders involved in the incident response process. These stakeholders may include the cloud customer's internal teams, such as IT, security, legal, or business, as well as external parties, such as the cloud provider, law enforcement, regulators, or customers. Each of these stakeholders may have different expectations, objectives, or interests regarding the incident handling process, as well as different levels of knowledge, skills, or authority. Moreover, each of these stakeholders may use different tools, channels, or formats to communicate or share information, which may lead to confusion, delays, or conflicts. Therefore, incident handlers need to establish and maintain effective coordination and communication among the stakeholders, such as using common protocols, standards, or platforms, or creating and following incident response plans, policies, or procedures.
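An incident response plan often includes a notification matrix that says who is told about which kind of incident at which severity. The sketch below shows one possible shape for such a matrix in Python. The incident types, severity scale, thresholds, and stakeholder names are all hypothetical and would come from the organization's own plan.

```python
from enum import Enum


class IncidentType(Enum):
    SECURITY_BREACH = "security_breach"
    OUTAGE = "outage"
    DEGRADATION = "degradation"


# Hypothetical routing table. Severity runs from 1 (most severe) to 4 (least
# severe); a stakeholder is notified when the incident's severity number is
# less than or equal to the threshold listed next to it.
ROUTING = {
    IncidentType.SECURITY_BREACH: [
        (4, "security"), (4, "it"), (2, "legal"),
        (2, "cloud_provider"), (1, "regulators"),
    ],
    IncidentType.OUTAGE: [
        (4, "it"), (3, "business"), (2, "cloud_provider"), (1, "customers"),
    ],
    IncidentType.DEGRADATION: [
        (4, "it"), (2, "business"),
    ],
}


def stakeholders_for(incident_type: IncidentType, severity: int) -> list[str]:
    """Return the stakeholders to notify, in the order listed in ROUTING."""
    if severity not in range(1, 5):
        raise ValueError("severity must be between 1 and 4")
    return [name for threshold, name in ROUTING[incident_type] if severity <= threshold]


breach_sev2 = stakeholders_for(IncidentType.SECURITY_BREACH, 2)
outage_sev3 = stakeholders_for(IncidentType.OUTAGE, 3)
```

Separating the incident type from the severity keeps a slow service from being communicated as a breach, and a breach from being treated as a routine outage.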
-
Jørgen Borup
Managing Consultant Architect @ Capgemini | Technical Leader | Startups and enterprise | Optimization and automation for developers, ops and endusers | Github | Atlassian | Kubernetes | Native
In my experience, the first thing is clarity on what the incident is about. The scope of this article is breaches and security events, but there are other types of incidents as well, and different audiences might expect an incident to be about slower responses or an outage of all or part of a service. Some may perceive this as a breach, others might not. Hence it is very important to be clear on what to communicate and how, neither neglecting the incident nor creating a storm in a glass of water. I recommend having a clear communication plan for the various types of incidents up front, covering not just the priorities but the types as well.
-
Raghuram Janapareddy
Partner & Managing Director - India @ Tenthpin | Innovation in Lifesciences
You are right. Often the anonymous mailboxes and faceless bots don’t get you to a resolution fast. The person who raises the incident is often left resigned to her/his fate when it comes to getting a resolution on time.
-
Jerry Young, MBA
Head of Software Development at Republic Finance
Cybersecurity is a domain where a good offense is a good defense. Companies should look at leveraging third-party firms and tools to do routine penetration testing of their cloud environment. This can surface problems so they are remediated before a breach. Sometimes it’s simply a matter of updating outdated libraries in code or changing settings within the cloud platform. You cannot simply rely on a cloud provider to keep your instance secure. Waiting until there is an incident to be proactive is normally too late and puts you on defense.
-
Jamie Vernon
Experienced IT Executive and Change Agent
In a crisis, teams *sink* to the level of their training. They do not rise. Therefore, practice is the answer. Practice, practice, practice. "Amateurs practice until they get it right; professionals do so until they can't get it wrong." Have tabletop exercises. Go to the datacenter and pull a cord. Maybe even have someone pretend to be sick. Netflix has had Chaos Monkey (https://www.techtarget.com/whatis/definition/Chaos-Monkey) for *years* to test system resilience. In doing so, you will learn whether:
- the runbooks are safe and accessible
- skillsets are duplicated
- the restoration automations really work
- the documentation is up to date
- the monitoring tools are pervasive
- you're ready for actual chaos