Part 1 – Incident Detection

Introduction

As we always say at Infosecnirvana, “Every Attacker leaves behind a trail”. Identifying that trail in an organization’s infrastructure is the main goal of Incident Detection, and this is where cutting-edge technology, talented people and mature processes come together. From Perimeter protection devices like Firewalls (both Network & Application), IDS/IPS and Breach Detection Systems (FireEye, Fidelis, etc.) to Endpoint Protection Systems like AV-AS and HIDS, there are a host of security management systems that help detect potential Security incidents needing action. Even Physical Security systems, Industrial control systems, etc. can detect Incidents. Incident detection has never been more important than it is today, so it comes as little surprise that organizations globally treat Incident Detection as an important tenet of their security posture. But before embarking on an Incident detection journey, it is important to understand the basics of Incident detection and how it forms the foundation of CSIRT functions the world over. So let us start with the introduction.

Security Events are not Security Incidents: Confused? Don’t be. Security events and Security incidents are different, and here’s why. Security products and technologies generate several actionable items. Helpdesk, Consumers, Business, Audit and Compliance and even a Security guard can report Security issues. All of these together are “Security Events”. However, not all of these events are fit to become Security Incidents. The Events have to be carefully validated for Relevance, Authenticity, Impact and Urgency. Only after this initial validation does an event qualify as a Security incident worth investigating. In short, “A Security Incident is a Qualified Security Event”. If a team were to treat every single Security event as a Security Incident, it would be an Operational nightmare. Hence it is important to perform Event Management or Event Handling.

Event Management: Every organization should have an effective Event Management process. The Event management philosophy should be “Many inputs (Event Reporting) but One Output (Incident)”. At a broad level there are 2 major Input sources to a Central Event Management system. They are described below:

  • Automated Event Reporting: Most Security tools and technologies today generate several Security events daily. Handling these events individually is difficult when there are so many point products in use. With the advent of SIEM, however, gathering, correlating and alerting on these Security events in real time is now possible. In SIEM parlance, this is done using “Use Cases”. Years back, we published a post on “Use Case Development Framework for SIEM”, which went into enough detail on how to build Use Cases on SIEM. Automated Event reporting thus becomes the most important input into the Central Event Management function.
  • Manual Event Reporting: Anyone from the Business, Legal, Consumers, End Users etc. can report potential Security events to an organization. Generally, most organizations have an IT Helpdesk as the central reporting desk for such issues. The reporting is typically done through an email system or through a phone call. Several organizations also have an online self-help ticketing system to report such events. However, these have to be handled manually.

Event Qualification:

Once the events are reported automatically or manually, the next step is Event Qualification. Before determining whether the event is a Security Incident or not, a few key questions need to be answered. Some of those are listed below:

  • Date – Date of event discovery
  • Time – Time of event discovery
  • Time Zone – Time zone of the event source is critical when systems or businesses are geographically dispersed
  • How was the event discovered?
  • What is the impact of this event and what locations are impacted?
  • Is the event ongoing?
  • Event Reporter contact information?
  • Type of data or systems affected (if available)

Based on the responses, an initial determination can be made about the nature of the event. If the event is not Security related, it can be routed to the respective teams for further investigation and resolution. If the event is indeed Security related, it is raised to the Incident Detection & Response team or the CSIRT as a Security Incident for further investigation and response.
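
The qualification questions above map naturally onto a small record plus a routing decision. The following is a minimal Python sketch under stated assumptions, not a prescribed implementation: the `SecurityEvent` fields mirror the questions listed above, while the field names, the `security_related` flag (standing in for the analyst's judgement on Relevance, Authenticity, Impact and Urgency) and the two routing labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class SecurityEvent:
    """One reported event; all field names are hypothetical."""
    discovered_at: datetime                  # date, time and time zone of discovery
    discovery_method: str                    # how the event was discovered
    impact: str                              # what is impacted
    locations: List[str]                     # which locations are impacted
    ongoing: bool                            # is the event still in progress?
    reporter_contact: str                    # who reported it
    affected_assets: Optional[str] = None    # data or systems affected, if known
    security_related: bool = False           # analyst's verdict after validation

    def discovered_utc(self) -> datetime:
        """Normalize the discovery time to UTC so dispersed sites can be compared."""
        if self.discovered_at.tzinfo is None:
            raise ValueError("discovery time needs an explicit time zone")
        return self.discovered_at.astimezone(timezone.utc)


def qualify(event: SecurityEvent) -> str:
    """Return the (hypothetical) queue the event should be routed to."""
    event.discovered_utc()  # rejects events whose time zone was not recorded
    return "CSIRT" if event.security_related else "Non-security team"


# Example: an event reported from a site at UTC+05:30 (made-up values).
ist = timezone(timedelta(hours=5, minutes=30))
event = SecurityEvent(
    discovered_at=datetime(2024, 1, 1, 10, 0, tzinfo=ist),
    discovery_method="SIEM alert",
    impact="One laptop",
    locations=["Site A"],
    ongoing=True,
    reporter_contact="helpdesk@example.com",
    security_related=True,
)
print(qualify(event))
```

Recording the time zone explicitly and normalizing to UTC is one way to keep event timelines comparable across geographically dispersed systems.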

After generating an Incident…

Once an Incident is generated from one or more Events, it has to be classified and categorized. This is the main purpose of the Incident Classification function.


Part 2 – Incident Classification

Introduction:

As discussed in Part 1 – Incident Detection, once an incident is detected, it needs to be categorized appropriately for Type, Severity and Impact so that the necessary response actions can be taken. Incident Classification has two major parts: Incident Categorization and Incident Severity Rating. Categorization assists in putting incidents into a common bucket for better coordinated and consistent handling, while Severity ratings assist in assigning a “sense of urgency” to the detected Incident. Without Incident Classification, a CSIRT function can quickly disintegrate into an Operational mess. In fact, many organizations struggle with this aspect of the CSIRT function. Hence, in this post, we will try to help readers with a simple and practical approach that we have seen work when it comes to Incident Classification.

Incident Categorization:

As mentioned at the beginning of the post, Categorization is similar to bucketing. However, the biggest questions organizations face are “How do I bucket incidents?” and “What reference can I use?”.

To answer this, I am going to use two popular categorization standards. One is the well-known NIST categorization standard, and the other is the FIRST categorization standard.

NIST Categories:

US Federal agencies have been at the forefront of Incident detection and response, and they have come up with Incident categories to assist in Incident reporting and response. In the diagram below, you can see the NIST categories listed.

[Figure: NIST incident categories, CAT 0 to CAT 6]

FIRST Categories: 

Apart from NIST, organizations like FIRST have also come out with several guidelines for Incident Categorization. The diagram below shows one of their examples (from Cisco) for Categorization.

[Figure: FIRST example (from Cisco) of incident categorization]

Now, both these models may not suit you entirely, but in general, we believe having at least 5 or 6 Categories will assist in better Incident Categorization. For example:

  • CAT 1 – Unauthorized Access, Compromised machine, Compromised Asset, Data Theft, Espionage etc.
  • CAT 2 – Denial of Service (DoS/DDoS)
  • CAT 3 – Malware or Malicious Code
  • CAT 4 – Reconnaissance or Scans or Probes etc. 
  • CAT 5 – Policy Violations or Improper Usage
  • CAT 6 – Others or Uncategorised 

This categorization combines the best of both worlds from NIST and FIRST and orders the categories by importance, with CAT 1 being the most critical category and CAT 6 being uncategorised.
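
Because the six example categories are ordered by importance, they can be modelled as an ordered enumeration. The sketch below is only an illustration in Python: the member names and the `primary_category` helper are hypothetical, and picking the most critical category when an incident matches several is an assumption rather than a rule stated in either standard.

```python
from enum import IntEnum
from typing import Iterable


class Category(IntEnum):
    """Example categories from the list above; lower value = more critical."""
    UNAUTHORIZED_ACCESS = 1   # compromised machine/asset, data theft, espionage
    DENIAL_OF_SERVICE = 2     # DoS/DDoS
    MALWARE = 3               # malware or malicious code
    RECONNAISSANCE = 4        # scans or probes
    POLICY_VIOLATION = 5      # improper usage
    OTHER = 6                 # others or uncategorised


def primary_category(observed: Iterable[Category]) -> Category:
    """Pick the most critical category seen; fall back to OTHER if none given."""
    return min(observed, default=Category.OTHER)


def label(category: Category) -> str:
    """Render the category in the 'CAT n' style used in this post."""
    return f"CAT {category.value}"


# Example: a scan that led to a malware infection is bucketed as malware.
incident = primary_category([Category.RECONNAISSANCE, Category.MALWARE])
print(label(incident), incident.name)
```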

Incident Severity Rating:

Now that the Incident has been categorized, it is important to assign a Severity rating to it. Severity ratings are typically derived from Likelihood and Impact. One matrix that comes in quite handy is given below.

[Figure: Severity rating matrix of Impact versus Likelihood]

As you can see, the Severity rating is basically a 5-step scale from Very Low to Critical. It uses Impact and Likelihood as a matrix to help decide the severity. Both Impact and Likelihood are typically arbitrary and left to the judgement of the person handling the Incident. However, many organizations try to define them as precisely as possible.

Impact, a more business-risk-related term, can be quantified using Asset Values, Data Sensitivity Classifications, etc., while Likelihood is a more technical input arising out of the “Incident” itself, like “Well known exploit = Easily Available = Likely” or “Current Infection Vector in our organization = Currently Spreading = Almost Certain”.
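
A severity matrix of this kind can be approximated in a few lines of code. The Python sketch below is an assumption-laden illustration and does not reproduce the matrix from the original figure: only the endpoints “Very Low” and “Critical” and the likelihood values “Likely” and “Almost Certain” come from the text, while the intermediate labels, the multiplication of the two ranks and the score thresholds are hypothetical choices that each organization would tune.

```python
# Rank labels from lowest (1) to highest (5). Intermediate labels are assumed.
IMPACT = ["Very Low", "Low", "Medium", "High", "Critical"]
LIKELIHOOD = ["Rare", "Unlikely", "Possible", "Likely", "Almost Certain"]

# Hypothetical thresholds on the impact x likelihood score (range 1 to 25).
THRESHOLDS = [
    (20, "Critical"),
    (12, "High"),
    (6, "Medium"),
    (3, "Low"),
    (1, "Very Low"),
]


def severity(impact: str, likelihood: str) -> str:
    """Map an impact and a likelihood label to a 5-step severity rating."""
    # list.index raises ValueError for labels that are not on the scale.
    score = (IMPACT.index(impact) + 1) * (LIKELIHOOD.index(likelihood) + 1)
    for minimum, rating in THRESHOLDS:
        if score >= minimum:
            return rating
    raise AssertionError("unreachable: the lowest threshold is 1")


# Example: a medium-impact incident using a well-known, easily available exploit.
print(severity("Medium", "Likely"))
```

Multiplying ranks is only one common convention; a hand-written lookup table gives finer control over individual cells of the matrix.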

In summary, with the Incident Categorization and Severity rating, you can easily bucket an incident, thereby completing the Incident Classification phase of the CSIRT Framework.


Part 3 – Incident Handling

Introduction:

As discussed in Part 1 – Incident Detection and Part 2 – Incident Classification, identifying an incident and accurately classifying it by category and severity are the first steps in an Incident Response process. Now comes the most important part: Incident Handling. Readers may have known or used Incident handling processes and procedures for a long time, but if we were to compare them side by side, they would all be similar in purpose but different in execution. Consider Incident handling to be like an organization’s signature – “Unique and cannot be replicated easily”.

With this post, we are trying to provide our “unique signature” regarding Incident handling.

Pre-requisites:

Before actually getting the work started, it is important to define the foundational blocks. Without these pre-requisites, a structured Incident response will be difficult. These pre-requisites are:

  1. Responder Groups – People who “do the analysis and investigation” on the ground are called Responder groups. These need to be defined as part of the CSIRT governance function. In smaller organizations, this can be one or two persons. They are typically the analysts, reverse engineers, forensic experts etc. They are the first line of defence.
  2. Resolver Groups – People who “do the remediation” are the resolver groups. Typically, this group gets into action mostly post-incident. However, these groups also assist during the incident investigation from an infrastructure angle. They typically comprise Network teams, Server teams, Application teams etc.
  3. Management Groups – People who are the “top brass in the organization” are the management groups. These groups are very important and need to be activated if the incident impact is going to be enterprise-wide. They typically comprise the ISM (info-sec manager), CISO, CIO, CTO etc.
  4. External Communication Groups – People outside of the IT department like Legal, HR, Crisis team, regulators etc. are called external communication groups. These groups take care of interacting with external agencies like law enforcement, media, shareholders etc.
  5. Communication Protocols – Defining “How” to communicate among the various groups is very important because these can’t be established during a live Cyber Incident. Some of the protocols can be email, phone call, template forms, ticketing systems, encrypted communication lines etc.
  6. Service Level Agreements (if any) – As organizations mature in operating a CSIRT, they want to track the performance efficiency of their people and processes. This can be done using SLA metrics that define the “time to respond” and the “time to resolve” or, in ITIL terms, “Response SLA” and “Resolution SLA”.
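
The two SLA metrics in the last item are simple time differences, so they are easy to compute once the timestamps are recorded. The following Python sketch is illustrative only: the function name, the dictionary keys and the target values in the example are hypothetical, and measuring both metrics from the moment the incident was raised is an assumption that an organization may define differently.

```python
from datetime import datetime, timedelta
from typing import Dict, Union


def sla_report(
    raised: datetime,
    responded: datetime,
    resolved: datetime,
    response_target: timedelta,
    resolution_target: timedelta,
) -> Dict[str, Union[timedelta, bool]]:
    """Compute time to respond and time to resolve and compare them to targets."""
    time_to_respond = responded - raised   # Response SLA measurement
    time_to_resolve = resolved - raised    # Resolution SLA measurement
    return {
        "time_to_respond": time_to_respond,
        "time_to_resolve": time_to_resolve,
        "response_sla_met": time_to_respond <= response_target,
        "resolution_sla_met": time_to_resolve <= resolution_target,
    }


# Example with made-up timestamps and targets (30 minutes / 4 hours).
report = sla_report(
    raised=datetime(2024, 1, 1, 9, 0),
    responded=datetime(2024, 1, 1, 9, 20),
    resolved=datetime(2024, 1, 1, 14, 0),
    response_target=timedelta(minutes=30),
    resolution_target=timedelta(hours=4),
)
print(report["response_sla_met"], report["resolution_sla_met"])
```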

Incident Analysis:

Every qualified and classified Incident (Part 1 and Part 2 of the CSIRT Function) has to be analysed on its own merit. While the skeleton of the analysis is the same, the content and the context differ from organization to organization. In general, every analysis starts with the following 2 questions:

  1. What we know? – The answer to the question typically lies in gathering the details regarding the incident. The details can be as follows:
    • Victim user/machine details like user name, machine name, IP Address etc.
    • Logs that triggered the incident. The logs are typically from SIEM or the point products themselves.
    • Attacker information from the logs like Attacker IP Address, Domain etc.
    • Attack pattern if it is a signature alert from IDS/IPS/WAF etc.

  2. What we don’t know? – This is everything else about the incident that we are yet to investigate or determine. This is the perfect jumping-off point for investigation. Some of the most common items in this list are as follows:
    • Forensic Analysis of the machine like disk analysis and memory analysis.
    • Attack Vector synthesis
    • Static and Dynamic analysis of malcode, Reversed binaries etc.
    • Impact and spread of the attack in terms of data stolen, machines compromised, monetary impact etc.

Once the “known and the unknown” are identified, listing the course of action becomes easy. This primarily assists in a timely and coordinated response. In this post, we will not discuss the individual tools used in analysis; instead, we will talk about the overall process involved.

Incident Communication:

Once the Incident analysis is under way, there is bound to be a constant flow of information coming from the responder groups. Communicating this to the appropriate stakeholders is key to effective Incident handling. Different organizations have different communication protocols and, as mentioned above in the Pre-requisites section, they need to be defined in advance. The definition can be along these lines:

  • Establish a communication protocol – Who to call? What is the number to call? What times to call?
  • Primary and Secondary contact persons
  • Communication template – Email, Report, SMS, Ticket updates, calls etc.
  • Timelines – For example, First update – within 30 minutes, Second update – within 1 hour of the First update etc.
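
The timeline item above lends itself to a small calculation of update deadlines. The Python sketch below uses the example values from the list (first update within 30 minutes, each later update within 1 hour of the previous one); the function name and the `count` parameter are hypothetical, and it assumes the worst case in which every update is sent exactly at its deadline.

```python
from datetime import datetime, timedelta
from typing import List


def update_deadlines(
    raised: datetime,
    first_within: timedelta = timedelta(minutes=30),
    interval: timedelta = timedelta(hours=1),
    count: int = 3,
) -> List[datetime]:
    """Return the latest allowed times for the first `count` status updates."""
    if count < 1:
        return []
    deadlines = [raised + first_within]
    for _ in range(count - 1):
        # Each later update is due one interval after the previous deadline.
        deadlines.append(deadlines[-1] + interval)
    return deadlines


# Example: an incident raised at 09:00 (made-up timestamp).
for deadline in update_deadlines(datetime(2024, 1, 1, 9, 0)):
    print(deadline.strftime("%H:%M"))
```

In practice the next deadline would be recalculated from the time the previous update actually went out.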

In our opinion, incident communication is one of the most under-rated aspects of incident handling and getting this right is important.

Post-Analysis:

Once the analysis is complete, a decision needs to be made based on the collected facts. The decision can be to continue with the Incident containment function or move directly to the Incident Recovery function.
