Part 4 – Incident Containment


Every incident requires careful investigation and response. One strategy that CSIRT teams use often is Incident Containment. By definition, incident containment is a function that limits and prevents further damage, while ensuring that forensic evidence that may be needed for later legal action against the attackers is not destroyed.

Firstly, Containment is a strategy:

Organizations usually think of containment as a process step to follow during Incident Response. In our opinion, incident containment should be a strategy. Once a containment strategy is defined, the tools and technologies that will fulfil it can be selected, and the process pieces will follow. Containment strategies can be defined based on the focus area in the IT infrastructure: the perimeter, the extended perimeter, the internal tier, the endpoint, or any combination of these. The strategy depends mostly on understanding your IT infrastructure and making the best use of it. That is why it is not the same for every organization, and rightly so. A few examples of such containment strategies are listed below:

Examples of Perimeter & Extended Perimeter Strategy – Stopping outbound communication from the infected machine, blocking inbound traffic, IDS/IPS filters, Web Application Firewall policies, null route DNS, failing over to a backup link, switching to a secondary data centre, etc.

Examples of Internal Networks Strategy – Switch-based VLAN isolation, router-based segment isolation, port blocking, IP or MAC address blocking, ACLs, etc.

Examples of Endpoint Strategy – Disconnecting the laptop/desktop, powering off the servers, blocking rules in the desktop firewall, HIPS, etc.

Based on these examples, you can get an idea of what each strategy looks like. It is also important to map each strategy to the “Incident Categories” defined in Part 2 – Incident Classification for which it is effective, which makes it easier to define processes and procedures specific to those categories. It is equally imperative to define which strategy is “Short Term” and which is “Long Term”.

What is Short Term Containment? – Typically, short term containment is a break-fix or quick heal. Its objective is to prevent the asset or the user from causing further damage in the organization. It is akin to the quarantine mechanism in AV software: the threat is not removed, but its potential to create further damage has been quelled. Everyone reading this post will have implemented short term containment at some point in their CSIRT life. Remember “pull the plug”, “block the MAC”, “disable the user”, etc. However, it is important to note that this does not fix the real reason the incident happened, nor does it stop the incident from recurring on a different asset in the organization. This is where long term containment comes into play.

What is Long Term Containment? – Long term containment is an enterprise-wide fix that is a step short of complete remediation of an incident's root cause or attack vector. Its objective is to stop other users or assets in the organization from being impacted by the same incident. Input to long term containment comes from the Incident Handling phase, where the appropriate investigations have been done and the possible attack vectors or infection methods have been identified. Until a full-fledged enterprise-wide remediation effort is carried out, steps such as a WAF behavioural policy, a custom SNORT signature to block the attack pattern, or a HIPS policy for system lock-down can be considered long term containment strategies.
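The categorization described above (focus area, incident category, short term versus long term) can be kept as a small catalogue that responders query during an incident. The following is a minimal sketch, not a prescribed format: the class names, the `actions_for` helper and the category labels ("malware", "web attack", "unauthorized access") are hypothetical placeholders and not the categories from Part 2.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PERIMETER = "perimeter"
    INTERNAL = "internal network"
    ENDPOINT = "endpoint"


class Term(Enum):
    SHORT = "short term"
    LONG = "long term"


@dataclass(frozen=True)
class ContainmentAction:
    name: str
    tier: Tier
    term: Term
    categories: frozenset  # incident categories the action is considered effective for


# Hypothetical catalogue: the mapping of actions to categories is illustrative only.
CATALOGUE = [
    ContainmentAction("Block outbound traffic from infected host",
                      Tier.PERIMETER, Term.SHORT, frozenset({"malware"})),
    ContainmentAction("VLAN isolation of affected segment",
                      Tier.INTERNAL, Term.SHORT,
                      frozenset({"malware", "unauthorized access"})),
    ContainmentAction("Disconnect laptop/desktop",
                      Tier.ENDPOINT, Term.SHORT, frozenset({"malware"})),
    ContainmentAction("WAF behavioural policy",
                      Tier.PERIMETER, Term.LONG, frozenset({"web attack"})),
    ContainmentAction("HIPS lock-down policy",
                      Tier.ENDPOINT, Term.LONG,
                      frozenset({"malware", "unauthorized access"})),
]


def actions_for(category, term):
    """Return the catalogue actions mapped to an incident category and term."""
    return [a for a in CATALOGUE if category in a.categories and a.term is term]
```

Calling `actions_for("malware", Term.SHORT)` would list the quick fixes mapped to that category, while `Term.LONG` returns the enterprise-wide measures to apply until remediation is complete.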

Validating the Strategy: Once a strategy is identified and categorized, it has to be tested for effectiveness in the field. Such validation cannot happen during a live incident, so the efficacy of the strategy, the timeliness of execution, the responsible parties, potential pitfalls, etc. have to be validated beforehand. This can be done using simulations and test runs of incidents, which help fine-tune the strategy and the co-ordination of the teams. The validation also paves the way for planning the process steps required for the containment plans to work.

Monitoring Effectiveness: Now that you have a validated Incident Containment strategy, the next step is to ensure that the strategy is effective against the attack vector once it is applied. This is where monitoring of the attack vector, the targeted victims, outbound traffic from the victims, etc. becomes an important measure of effectiveness. This can be a simple monitoring rule in a SIEM product with a forward-looking time frame, or it could be a completely monitored network segment.
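A forward-looking monitoring rule of this kind can be approximated in a few lines. The sketch below is not tied to any SIEM product: the event fields (`time`, `src`, `direction`), the function name `containment_breaches` and the 72-hour default window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def containment_breaches(events, contained_hosts, contained_at,
                         window=timedelta(hours=72)):
    """Return outbound events from contained hosts seen after containment.

    events: iterable of dicts with 'time' (datetime), 'src' and 'direction'.
    Any hit inside the window suggests the containment was not effective.
    """
    deadline = contained_at + window
    return [
        e for e in events
        if e["src"] in contained_hosts
        and e["direction"] == "outbound"
        and contained_at < e["time"] <= deadline
    ]
```

An empty result over the whole window is a reasonable signal that the containment held; any returned event should send the team back to the containment plan.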

In our opinion, a validated containment strategy, a detailed containment plan and an effective monitoring routine together make Incident Containment whole and meaningful. The next step after containment is Incident Recovery.


Part 5 – Incident Recovery


Incidents can’t be avoided entirely; however, the damage can be greatly minimized by a mature Incident Detection and Response function. In the CSIRT Series, we have been looking in detail at the various functions that make up a good IR process framework. Incident Containment and Incident Recovery are complementary processes. While containment is aimed at stopping the spread of a breach, recovery is all about getting back on your feet by reverting to a “known good state”. The term “known good state” is ambiguous: it may apply to a single machine or to an entire network. In our opinion, the recovery process, or getting back to a “known good state”, is a combination of three sub-steps:

  1. Pre-Recovery – Forensic evidence collection is, in our opinion, a pre-recovery step. This is a critical process for collecting and maintaining evidence that may be required to pursue future legal action.
  2. Recovery from Backup – Ensure that systems or networks are returned to the pre-breach state.
  3. Post-Recovery – Remediation of the threat vector is crucial as a post-recovery step: a process to ensure that the infection or threat vector is no longer an issue.

Let us look at each of these sub-steps in detail.

Pre-Recovery – In cases that need a legal course of action, it is important to clearly document how all evidence has been collected, preserved and handled so that it is admissible in court. This is called Forensic Evidence Collection. Legal requirements vary from region to region and jurisdiction to jurisdiction, and a forensic investigator should be aware of that. It is recommended that some team members obtain computer forensics training and certification so that they can handle the entire process end to end. However, it is not uncommon to get professional third-party help for forensic evidence collection and investigation during an incident. Forensics is a field in itself, and detailing all the process steps here would be impractical. Hence, we have tried to give a succinct summary of what forensics entails:

    1. Determine legal issues regarding the incident that may cause an impact
    2. Determine technology and processes within the scope of the forensic analysis
    3. Identify evidence from the infected machine or person. The evidence can be electronic or physical.
      • Document and Collect the identified Evidence following the chain of custody
      • Perform Forensics Investigation and analysis.
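One common way to support the chain of custody for electronic evidence is to record a cryptographic hash of each item when it is collected and to re-check it at every hand-over. The following is a minimal sketch of that idea, not a forensically complete procedure: the record fields and the helper names `custody_entry` and `is_unaltered` are hypothetical, and real admissibility requirements depend on the jurisdiction.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so that large disk images do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, handler, action):
    """Build one chain-of-custody record for an evidence file."""
    return {
        "item": str(path),
        "sha256": sha256_of(path),
        "handler": handler,
        "action": action,  # e.g. "collected", "transferred", "analysed"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(path, entry):
    """Check that the evidence still matches the hash recorded in the entry."""
    return sha256_of(path) == entry["sha256"]
```

Each hand-over would append a new entry, and `is_unaltered` can be run before analysis to confirm that the item matches what was originally collected.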

Once the forensic process has been initiated, the incident may need to be reclassified based on the results, which changes the complexion of the entire recovery and remediation process. For example: a malicious code incident was originally triaged and classified as a medium security incident. The forensic analysis reveals that the malicious code has installed hidden back-door processes that can now be traced to additional systems that were not originally identified as being affected. The incident should be reclassified from a medium security incident to a high security incident.

Recovery from Backup – If the incident fits the criteria of high severity and/or high impact, the CSIRT team should determine whether IT business continuity, disaster recovery, and/or backup restoration procedures should be initiated. Limiting this to high severity and high impact incidents is purely a practical consideration. The goal of the Recovery phase is to safely put the impacted systems back into production. To complete the recovery process, the following three steps have to be followed:

  • Validation of the recovered systems – Involves asking the user base whether the system is operating properly, or using profiling tools to verify that the ports and services of the system are consistent with its expected profile
  • Restoring Operations – Involves placing the system into full production, allowing it to interact fully with other devices on the network
  • Monitoring – Involves checking systems for back-doors or any other issues which may have escaped previous detection. If possible, host-based and network-based monitoring should be used to confirm that the attacker did not leave any back-doors on the system
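The ports-and-services check in the validation step amounts to comparing the recovered system against a known baseline. Here is a minimal sketch, assuming both profiles have already been gathered by whatever profiling tool is in use and are available as port-to-service mappings; the function name `profile_drift` is hypothetical.

```python
def profile_drift(baseline, observed):
    """Compare a recovered system's listening services with its baseline.

    Both arguments map a port number to a service name.
    """
    unexpected = {p: s for p, s in observed.items() if p not in baseline}
    missing = {p: s for p, s in baseline.items() if p not in observed}
    changed = {
        p: (baseline[p], observed[p])
        for p in baseline.keys() & observed.keys()
        if baseline[p] != observed[p]
    }
    return {"unexpected": unexpected, "missing": missing, "changed": changed}
```

An unexpected listener is a candidate back-door worth investigating before the system is placed into full production, while a missing service suggests the restore is incomplete.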

Ultimately, when services are restored, the system should have an effective defence against future attacks of the same nature. Any access methods which may have been used to conduct such an attack should be corrected. When restoring services, systems or data from archived backups, consideration should be given to the type of attack, the data affected and, most importantly, the timeline in which the attack initially took place. This information should have been discovered and documented as part of the forensic analysis. This step in the recovery process is critical so that vulnerabilities, malware or corrupted data are not re-introduced into the operating environment. Depending on the severity of the incident, a full system rebuild may be required to re-establish the integrity of the system.
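The timeline consideration can be made concrete: a backup is only a candidate for restoration if it was taken before the earliest point of compromise established by the forensic analysis. A minimal sketch under that assumption follows; the function name `latest_clean_backup` is hypothetical, and a backup that predates the compromise may still contain the original vulnerability, which is why post-recovery remediation is needed.

```python
from datetime import datetime


def latest_clean_backup(backup_times, first_compromise):
    """Return the most recent backup taken strictly before the compromise.

    backup_times: iterable of datetime objects, one per available backup.
    Returns None when every backup postdates the compromise, in which case
    a full system rebuild may be the only safe option.
    """
    candidates = [t for t in backup_times if t < first_compromise]
    return max(candidates, default=None)
```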

Post-Recovery – Once the recovery is completed, incident remediation steps should be followed. Most of the time, the threat vector will be a system vulnerability or a network vulnerability. For such vectors, available patches or system updates should be applied. System hardening techniques may also need to be applied, and core deployment images may need to be updated to prevent the introduction of the weakness elsewhere in the organization. In the case of a non-vulnerability related vector, the root cause should be identified and appropriate fixes implemented.


It is important to have a well-defined and smoothly functioning recovery capability in the CSIRT team. Without recovery capabilities, the probability of a security incident or issue recurring persists.

Part 6 – Continuous Improvement


One of the things we do in the Incident Recovery phase is determine the root cause of the incident and identify appropriate remediation steps. This typically follows the Root Cause Analysis workflow, which many of you are aware of. Once the remediation is done, it is important to document the “lessons learnt”.

Why is it important?

Lessons learnt are an important aspect of a CSIRT organization. “A stationary object gathers more moss”. The philosophy of a CSIRT organization is therefore continuous evolution and improvement. A lessons learnt session is typically a 15 to 30 minute exercise that every CSIRT member who handled the incident should go through. In this exercise, the following key items should be discussed:

  1. Which processes, technologies or people worked?
  2. What did not work? Why?
  3. How effective were the response and the resolution? Why?
  4. Are there any recurring issues or themes?

Once answers to all these questions have been penned down and discussed, a detailed action plan needs to be devised on how to improve the CSIRT function. The action plan can be categorized under two major groups:

  • Control Improvements – This section should describe any changes or improvements that should be put in place to better detect future incidents of this type and/or prevent similar incidents. Some of the examples are:
    1. Policy changes, typically to organization-wide policies related to users, IT systems, etc.
    2. Monitoring system changes, typically configuration changes made in SIEM, perimeter or endpoint defences to improve detection and make reporting more efficient
    3. Architectural changes, typically long term major changes in the way the systems are built
  • Process Improvements – This section should describe any improvements that could be made to the actual response process itself. Some of the examples are:
    1. Improving the Incident handling cheat sheet with additional details
    2. Improving the communications plan to get speedier response
    3. Escalation matrix improvements
    4. Process automation
    5. Staff training and awareness

Rinse and Repeat!!!

As you can see from the above, the goal is not to do this exercise as a one-time activity; it is a repetitive process. However, it may not be possible for every single incident that is detected and worked by the CSIRT, so practicality dictates that the process be done in a way that is scalable. Keeping that in mind, below is a recommended approach:

  1. Perform “Lessons Learnt” exercise for all Major and High Severity incidents
  2. Perform “Lessons Learnt” exercise for all repeat incident categories (refer to Incident Classification for more details)
  3. Perform “Lessons Learnt” exercise on a monthly or quarterly basis for CSIRT processes.
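The first two rules above can be turned into a simple check that runs when an incident is closed. This is a sketch only: the severity labels, the dictionary fields and the assumption that a category counts as "repeat" once it has been seen twice (`repeat_threshold=2`) are illustrative choices, not definitions from this series. The periodic review in the third rule is calendar-driven and is left out.

```python
from collections import Counter


def needs_lessons_learnt(incident, history, repeat_threshold=2):
    """Decide whether a closed incident should trigger a lessons learnt session.

    incident: dict with 'severity' and 'category'.
    history: previously closed incidents, each a dict with a 'category' key.
    """
    # Rule 1: all major and high severity incidents qualify.
    if incident["severity"] in {"major", "high"}:
        return True
    # Rule 2: repeat categories qualify (current incident counts as one occurrence).
    seen = Counter(i["category"] for i in history)
    return seen[incident["category"]] + 1 >= repeat_threshold
```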


Lessons learnt are an important part of continued learning and the quest for functional perfection. A CSIRT is no different, and it should also be improved on a regular basis. These improvements should be aimed at detecting and responding to cyber incidents efficiently and in a timely fashion.

With this post, the CSIRT Series comes to a conclusion. Please feel free to post your comments in the section below.