
Part 2 – Incident Classification

Introduction:

As discussed in Part 1 – Incident Detection, once an incident is detected, it needs to be categorized appropriately by Type, Severity and Impact so that the necessary response actions can be taken. Incident Classification has two major parts: Incident Categorization and Incident Severity Rating. Categorization puts events into common buckets for better coordinated and more consistent handling, while Severity ratings assign a “sense of urgency” to the detected incident. Without Incident Classification, a CSIRT function can quickly disintegrate into an operational mess. In fact, many organizations struggle with this aspect of the CSIRT function. Hence, in this post, we will try to help readers with a simple and practical approach to Incident Classification that we have seen work.

Incident Categorization:

As mentioned at the beginning of the post, categorization is similar to bucketing. However, the biggest questions organizations face are “How do I bucket incidents?” and “What reference can I use?”

To answer these, I am going to use two popular categorization standards: the well-known NIST categorization standard and the FIRST categorization standard.

NIST Categories:

NIST, a US federal agency, has been at the forefront of incident detection and response and has come up with incident categories to assist in incident reporting and response. The diagram below lists the NIST categories.

[Figure: NIST incident categories CAT 0 to CAT 6]

FIRST Categories: 

Apart from NIST, organizations like FIRST have also published guidelines for incident categorization. The diagram below shows one of their examples (from Cisco).

[Figure: FIRST incident categorization example (from Cisco)]

Neither of these models may suit you entirely, but in general, we believe that having at least five or six categories helps with incident categorization. For example:

  • CAT 1 – Unauthorized Access, Compromised machine, Compromised Asset, Data Theft, Espionage etc.
  • CAT 2 – Denial of Service (DoS/DDoS)
  • CAT 3 – Malware or Malicious Code
  • CAT 4 – Reconnaissance or Scans or Probes etc. 
  • CAT 5 – Policy Violations or Improper Usage
  • CAT 6 – Others or Uncategorised 

The categorization above combines the best of both worlds from NIST and FIRST and orders the categories by importance, with CAT 1 being the most critical and CAT 6 covering uncategorised incidents.
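To make the bucketing concrete, here is a minimal sketch of the six categories as an ordered enumeration. The alert-type names in `ALERT_TYPE_TO_CATEGORY` are hypothetical labels a SIEM might emit, not part of NIST or FIRST, and the rule that a multi-alert incident keeps its most critical category is our own illustrative assumption.

```python
from enum import IntEnum


class Category(IntEnum):
    """Incident categories from the list above; a lower number means more critical."""
    CAT1_UNAUTHORIZED_ACCESS = 1
    CAT2_DENIAL_OF_SERVICE = 2
    CAT3_MALWARE = 3
    CAT4_RECONNAISSANCE = 4
    CAT5_POLICY_VIOLATION = 5
    CAT6_OTHER = 6


# Hypothetical mapping from alert types to categories; adapt to your own tooling.
ALERT_TYPE_TO_CATEGORY = {
    "compromised_host": Category.CAT1_UNAUTHORIZED_ACCESS,
    "data_exfiltration": Category.CAT1_UNAUTHORIZED_ACCESS,
    "ddos": Category.CAT2_DENIAL_OF_SERVICE,
    "malware_detected": Category.CAT3_MALWARE,
    "port_scan": Category.CAT4_RECONNAISSANCE,
    "acceptable_use_violation": Category.CAT5_POLICY_VIOLATION,
}


def categorize(alert_type: str) -> Category:
    """Bucket a single alert; anything unrecognised falls into CAT 6 (Others/Uncategorised)."""
    return ALERT_TYPE_TO_CATEGORY.get(alert_type, Category.CAT6_OTHER)


def most_critical(alert_types: list[str]) -> Category:
    """Assumed rule: an incident with several alerts takes its most critical category."""
    return min((categorize(a) for a in alert_types), default=Category.CAT6_OTHER)
```

Because the categories are ordered integers, `most_critical(["port_scan", "malware_detected"])` returns CAT 3, which keeps the "order of importance" explicit in code rather than in people's heads.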

Incident Severity Rating:

Now that the incident has been categorized, it is important to assign it a severity rating. Severity ratings are typically derived from Likelihood and Impact. One matrix that comes in quite handy is shown below.

[Figure: Severity rating matrix, Impact vs. Likelihood]

As you can see, the severity rating is a five-step scale from Very Low to Critical, with Impact and Likelihood forming a matrix that helps decide the severity. Both Impact and Likelihood are typically subjective and left to the judgement of the person handling the incident. However, many organizations try to define them as precisely as possible.
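One common way to encode such a matrix is to score Likelihood and Impact from 1 to 5 and map their product to a severity band. The sketch below assumes that multiplicative scheme; the level names (other than "Likely" and "Almost Certain") and the band boundaries are illustrative assumptions, not the values in the figure above.

```python
# Likelihood and Impact are each scored 1 (lowest) to 5 (highest).
# Level names other than "Likely" and "Almost Certain" are illustrative assumptions.
LIKELIHOOD = {"Rare": 1, "Unlikely": 2, "Possible": 3, "Likely": 4, "Almost Certain": 5}
IMPACT = {"Negligible": 1, "Minor": 2, "Moderate": 3, "Major": 4, "Severe": 5}

# Hypothetical bands: (highest Likelihood x Impact score in the band, severity).
SEVERITY_BANDS = [
    (3, "Very Low"),
    (6, "Low"),
    (12, "Medium"),
    (16, "High"),
    (25, "Critical"),
]


def severity(likelihood: str, impact: str) -> str:
    """Return the severity for a Likelihood/Impact pair using a multiplicative score."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    # The last band ends at 25 (5 x 5), so every valid score matches some band.
    return next(label for upper, label in SEVERITY_BANDS if score <= upper)
```

For example, `severity("Likely", "Major")` scores 16 and lands in High. Iterating over every pair prints the full 5x5 grid, which is a quick way to check the bands against the matrix your organization actually uses; some teams prefer an explicit lookup table instead of a product so that each cell can be tuned individually.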

Impact, a more business-risk-related term, can be quantified using asset values, data sensitivity classifications, etc., while Likelihood is a more technical input arising from the incident itself, for example “Well-known exploit = Easily Available = Likely” or “Current infection vector in our organization = Currently Spreading = Almost Certain”.
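The two rules of thumb above can be written down so that different handlers reach the same scores. This is a minimal sketch assuming a 1 to 5 asset value from an asset register and a 1 to 5 data sensitivity level from a classification scheme; the field names, the "take the worse of the two" rule for Impact and the default of 3 ("Possible") for Likelihood are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class IncidentContext:
    asset_value: int            # 1 (low) to 5 (business critical); assumed asset-register scale
    data_sensitivity: int       # 1 (public) to 5 (restricted); assumed classification scale
    exploit_public: bool        # a well-known exploit is easily available
    spreading_internally: bool  # this infection vector is currently active in our organization


def impact_score(ctx: IncidentContext) -> int:
    """Business-driven input: take the worse of asset value and data sensitivity."""
    return max(ctx.asset_value, ctx.data_sensitivity)


def likelihood_score(ctx: IncidentContext) -> int:
    """Technical input, following the two rules of thumb in the text."""
    if ctx.spreading_internally:
        return 5  # Currently Spreading = Almost Certain
    if ctx.exploit_public:
        return 4  # Easily Available = Likely
    return 3      # assumption: default to "Possible" when neither rule applies
```

The resulting pair of scores can then be looked up in the severity matrix, so the subjective judgement is moved from "how bad is this?" to the narrower questions of asset value, data sensitivity and exploit status.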

In summary, with incident categorization and a severity rating, you can easily bucket an incident, completing the Incident Classification phase of the CSIRT framework.

Part 3 – Incident Handling

Introduction:

As discussed in Part 1 – Incident Detection and Part 2 – Incident Classification, identifying an incident and accurately classifying it by category and severity are the foremost steps in an incident response process. Now comes the most important part: Incident Handling. Readers may have known or used incident handling processes and procedures for a long time, but if we were to compare them side by side, they would all be similar in purpose but different in execution. Consider incident handling to be like an organization’s signature: “unique and cannot be replicated easily”.

In this post, we share our own “unique signature” for incident handling.

Pre-requisites:

Before actually getting started, it is important to define the foundational building blocks. Without these pre-requisites, a structured incident response will be difficult. These pre-requisites are:

  1. Responder Groups – People who “do the analysis and investigation” on the ground are called responder groups. These need to be defined as part of the CSIRT governance function. In smaller organizations, this can be one or two people. They are typically the analysts, reverse engineers, forensic experts etc. They are the first line of defence.
  2. Resolver Groups – People who “do the remediation” are the resolver groups. This group typically acts mostly after the incident; however, it also assists during the incident investigation from an infrastructure angle. It typically comprises Network teams, Server teams, Application teams etc.
  3. Management Groups – The “top brass in the organization” make up the management groups. These groups are very important and need to be activated if the incident impact is going to be enterprise-wide. They typically include the ISM (information security manager), CISO, CIO, CTO etc.
  4. External Communication Groups – People outside the IT department, like Legal, HR, the Crisis team, regulators etc., are called external communication groups. These groups handle interaction with external agencies like law enforcement, the media, shareholders etc.
  5. Communication Protocols – Defining “how” to communicate among the various groups is very important because protocols can’t be established during a live cyber incident. Protocols can include email, phone calls, template forms, ticketing systems, encrypted communication lines etc.
  6. Service Level Agreements (if any) – As organizations mature in operating a CSIRT, they often want to track the performance efficiency of their people and processes. This can be done with SLA metrics that define the “time to respond” and the “time to resolve”, or in ITIL terms, the “Response SLA” and “Resolution SLA”.
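Tracking the two SLA metrics can be as simple as comparing timestamps against per-severity targets. The sketch below is illustrative: the target values in `SLA_TARGETS` are hypothetical, and both clocks are assumed to start when the incident is opened.

```python
from datetime import datetime, timedelta

# Hypothetical targets per severity: (Response SLA, Resolution SLA).
SLA_TARGETS = {
    "Critical": (timedelta(minutes=15), timedelta(hours=4)),
    "High": (timedelta(minutes=30), timedelta(hours=8)),
    "Medium": (timedelta(hours=2), timedelta(days=1)),
    "Low": (timedelta(hours=8), timedelta(days=3)),
    "Very Low": (timedelta(days=1), timedelta(days=5)),
}


def sla_status(severity: str, opened: datetime, responded: datetime, resolved: datetime) -> dict:
    """Compare actual time to respond/resolve against the targets for this severity."""
    respond_target, resolve_target = SLA_TARGETS[severity]
    time_to_respond = responded - opened   # assumed: both clocks start when the incident is opened
    time_to_resolve = resolved - opened
    return {
        "time_to_respond": time_to_respond,
        "response_sla_met": time_to_respond <= respond_target,
        "time_to_resolve": time_to_resolve,
        "resolution_sla_met": time_to_resolve <= resolve_target,
    }
```

An incident opened at 09:00, picked up at 09:20 and resolved at 15:00 meets both SLAs as High but misses both as Critical, which shows why the targets should be tied to the severity rating from Part 2.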

Incident Analysis:

Every qualified and classified incident (Part 1 and Part 2 of the CSIRT function) has to be analysed on its merits. While the skeleton of the analysis is the same, the content and context differ from organization to organization. In general, every analysis starts with the following two questions:

  1. What do we know? – The answer typically lies in gathering details about the incident, such as:
    • Victim user/machine details like user name, machine name, IP address etc.
    • Logs that triggered the incident, typically from the SIEM or the point products themselves.
    • Attacker information from the logs, like attacker IP address, domain etc.
    • Attack pattern, if it is a signature alert from IDS/IPS/WAF etc.

  2. What don’t we know? – This is everything else about the incident that we have yet to investigate or determine, and it is the perfect jumping-off point for the investigation. Some of the most common items in this list are:
    • Forensic analysis of the machine, like disk analysis and memory analysis.
    • Attack vector synthesis.
    • Static and dynamic analysis of malicious code, reverse-engineered binaries etc.
    • Impact and spread of the attack in terms of data stolen, machines compromised, monetary impact etc.

Once “the known and the unknown” are identified, listing the course of action becomes easy. This primarily assists in a timely and coordinated response. In this post, we will not discuss the individual tools used in analysis; instead, we will focus on the overall process involved.
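The "known and unknown" split maps naturally onto a record whose empty fields become open tasks. This sketch is hypothetical throughout: the field names and task wording are ours, and treating the deep-analysis items as always open at the start of an investigation is an assumption drawn from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncidentFacts:
    """What we know: a field stays empty until that detail has been gathered."""
    victim_host: Optional[str] = None
    victim_user: Optional[str] = None
    attacker_ip: Optional[str] = None
    triggering_logs: list[str] = field(default_factory=list)
    attack_pattern: Optional[str] = None  # e.g. an IDS/IPS/WAF signature name


# Hypothetical follow-up task for each gap in the known facts.
FOLLOW_UP = {
    "victim_host": "Identify the affected machine (name, IP address)",
    "victim_user": "Identify the affected user account",
    "attacker_ip": "Extract attacker IP address/domain from the logs",
    "triggering_logs": "Pull the triggering events from the SIEM or point product",
    "attack_pattern": "Determine the attack pattern or signature",
}

# "What we don't know" items assumed to be open at the start of every investigation.
ALWAYS_OPEN = [
    "Forensic analysis of the machine (disk and memory)",
    "Attack vector synthesis",
    "Static and dynamic analysis of any malicious code",
    "Assess impact and spread (data stolen, machines compromised, monetary impact)",
]


def course_of_action(facts: IncidentFacts) -> list[str]:
    """Turn the unknowns into an ordered plan: fill basic gaps first, then deep analysis."""
    gaps = [task for name, task in FOLLOW_UP.items() if not getattr(facts, name)]
    return gaps + ALWAYS_OPEN
```

If the victim host, attacker IP and triggering logs are already known, the plan starts with identifying the affected user and the attack pattern, followed by the four deep-analysis steps.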

Incident Communication:

Once the incident analysis is under way, there is bound to be a constant flow of information from the responder groups. Communicating it to the appropriate stakeholders is key to effective incident handling. Communication protocols differ between organizations, but as mentioned in the Pre-requisites section, they can be defined along these lines:

  • Establish a communication protocol – Who to call? What is the number to call? What times to call?
  • Primary and Secondary contact persons
  • Communication template – Email, Report, SMS, Ticket updates, calls etc.
  • Timelines – For example, first update within 30 minutes, second update within 1 hour of the first update, etc.
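The timeline example above can be turned into concrete due times for the incident ticket. This minimal sketch assumes the 30-minute clock starts when the incident is declared and that later updates continue at a fixed one-hour cadence; both are assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def update_deadlines(declared: datetime, count: int,
                     first: timedelta = timedelta(minutes=30),
                     interval: timedelta = timedelta(hours=1)) -> list[datetime]:
    """Due times for the first `count` stakeholder updates.

    The first update is due `first` after the incident is declared; each later
    update is due `interval` after the previous due time (assumed fixed cadence).
    """
    deadlines = []
    due = declared + first
    for _ in range(count):
        deadlines.append(due)
        due += interval
    return deadlines
```

For an incident declared at 09:00, the first three updates fall due at 09:30, 10:30 and 11:30. If your protocol counts the interval from when the previous update was actually sent, recompute the next deadline from that send time instead.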

In our opinion, incident communication is one of the most underrated aspects of incident handling, and getting it right is important.

Post-Analysis

Once the analysis is complete, a decision needs to be made based on the collected facts: either proceed to Incident Containment or move directly to Incident Recovery.

Part 4 – Incident Containment

Introduction

Every incident requires careful investigation and response. One of the strategies most often used by CSIRT teams is incident containment. By definition, incident containment limits and prevents further damage while ensuring that no forensic evidence that may be needed for later legal action against the attackers is destroyed.

First, containment is a strategy:

Organizations usually think of containment as a process step to follow during incident response. In our opinion, however, incident containment should be a strategy. Once a containment strategy is defined, the respective tools and technologies can be selected to fulfil it, and the process pieces will follow. Containment strategies can be defined based on the focus area in the IT infrastructure: the perimeter, the extended perimeter, the internal tier, the endpoint, or any combination of these. The strategy depends mostly on understanding your IT infrastructure and making the best use of it. That is why it is not the same for every organization, and rightly so. A few examples of such containment strategies are listed below:

Examples of Perimeter & Extended Perimeter Strategy – Stop outbound communication from the infected machine, block inbound traffic, IDS/IPS filters, Web Application Firewall policies, null-route DNS, fail over to a backup link, switch to a secondary data centre etc.

Examples of Internal Networks Strategy – Switch-based VLAN isolation, router-based segment isolation, port blocking, IP or MAC address blocking, ACLs etc.

Examples of Endpoint Strategy – Disconnecting the laptop/desktop, powering off the servers, blocking rules in the desktop firewall, HIPS etc.

These examples give an idea of what each strategy looks like. It is also important to categorize the strategies by their effectiveness against the various incident categories defined in Part 2 – Incident Classification, which makes it easier to define processes and procedures specific to each category. It is also imperative to define which strategies are “Short Term” and which are “Long Term”.

What is Short Term Containment? – Short term containment is typically a break-fix or quick heal. Its objective is to prevent the asset or user from causing further damage in the organization. It is akin to the quarantine mechanism in AV software, where the malicious file is not removed but its potential to cause further damage is neutralized. Everyone reading this post has probably implemented short term containment in their CSIRT life: remember “pull the plug”, “block the MAC”, “disable the user” etc. However, short term containment does not address the root cause of an incident, nor does it stop the incident from recurring on a different asset in the organization. This is where long term containment comes into play.

What is Long Term Containment? – Long term containment is an enterprise-wide fix that stops one step short of completely remediating the incident’s root cause or attack vector. Its objective is to stop other users or assets in the organization from being impacted by the same incident. Input to long term containment comes from the Incident Handling phase, where the appropriate investigations have been done and the possible attack vectors or infection methods identified. Until full-fledged enterprise-wide remediation is carried out, steps like a WAF behavioural policy, a custom Snort signature to block the attack pattern, or a HIPS policy for system lock-down can serve as long term containment.
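Categorizing strategies by incident category and by term can be captured in a small playbook table. The sketch below draws its action names from the examples in this part, but the tier labels, the category numbers (CAT 1 to CAT 6 from Part 2) attached to each action, and the `actions_for` helper are hypothetical and should be replaced by your own validated mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainmentAction:
    name: str
    tier: str                   # "perimeter", "internal" or "endpoint"
    term: str                   # "short" or "long"
    categories: frozenset[int]  # incident categories (CAT 1-6) this action is assumed effective for


# Illustrative entries; the category mapping is an assumption, not a recommendation.
PLAYBOOK = [
    ContainmentAction("Block outbound traffic from infected host", "perimeter", "short", frozenset({1, 3})),
    ContainmentAction("Fail over to backup link", "perimeter", "short", frozenset({2})),
    ContainmentAction("Isolate host in quarantine VLAN", "internal", "short", frozenset({1, 3})),
    ContainmentAction("Disable user account", "endpoint", "short", frozenset({1, 5})),
    ContainmentAction("Deploy custom Snort signature", "perimeter", "long", frozenset({1, 3, 4})),
    ContainmentAction("Apply WAF behavioural policy", "perimeter", "long", frozenset({1, 2})),
    ContainmentAction("Apply HIPS lock-down policy", "endpoint", "long", frozenset({1, 3})),
]


def actions_for(category: int, term: str) -> list[str]:
    """Names of playbook actions applicable to an incident category for the given term."""
    return [a.name for a in PLAYBOOK if category in a.categories and a.term == term]
```

For a malware incident (CAT 3), `actions_for(3, "long")` returns the Snort signature and the HIPS lock-down policy. An empty result, for example for CAT 6, is itself useful: it flags a category that has no pre-agreed containment yet.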

Validating the Strategy: Once a strategy is identified and categorized, it has to be tested for effectiveness in the field. Such validation cannot happen during a live incident, so it is important to validate the efficacy of the strategy, the timeliness of execution, the responsible parties, potential pitfalls etc. beforehand. This validation will also pave the way for planning the process steps required for the containment plans to work. It can be done using simulations and test runs of incidents, which help fine-tune the strategy and the coordination between teams.

Monitoring Effectiveness: Now that you have a validated incident containment strategy, the next step is to ensure that it is effective against the attack vector. This is where monitoring the attack vector, the targeted victims, outbound traffic from the victims etc. becomes an important measure of effectiveness. This can be a simple monitoring rule in a SIEM product with a forward-looking time frame, or it could be a fully monitored network segment.
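The logic of such a forward-looking rule is easy to sketch outside any particular SIEM. The example below assumes connection events have already been normalized into a hypothetical `ConnectionEvent` record; it flags any connection from a contained victim to a known attacker address within a window starting at the moment of containment. It checks only that containment is holding, not whether the attack has spread to other hosts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ConnectionEvent:
    timestamp: datetime
    src_ip: str
    dst_ip: str


def containment_breaches(events: list[ConnectionEvent], victims: set[str],
                         attacker_ips: set[str], contained_at: datetime,
                         window: timedelta) -> list[ConnectionEvent]:
    """Flag victim-to-attacker connections seen after containment.

    Only events in [contained_at, contained_at + window] are considered;
    any hit suggests the containment is not holding.
    """
    end = contained_at + window
    return [
        e for e in events
        if contained_at <= e.timestamp <= end
        and e.src_ip in victims
        and e.dst_ip in attacker_ips
    ]
```

An empty result over the whole window is evidence (not proof) that the containment worked; a separate rule watching non-victim hosts talking to the same attacker addresses would cover spread.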

In our opinion, a validated containment strategy, a detailed containment plan and an effective monitoring routine together make incident containment whole and meaningful. The next step after containment is Incident Recovery.
