One of the things we do in the Incident Recovery phase is to determine the root cause of the incident and to identify appropriate remediation steps. This typically follows the Root Cause Analysis workflow which many of you are aware of. Once the remediation is done, it is important to document the “lessons learnt”.
Why is it important?
Lessons learnt are an important aspect of a CSIRT organization. “A stationary object gathers more moss” – inverting the old proverb captures the CSIRT philosophy: continuous evolution and improvement. This is typically a 15–30 minute exercise that every CSIRT member who handled the incident should go through. In this exercise, the following key items should be discussed:
What process, technology or people worked?
What did not work? Why?
How effective were the response and the resolution? Why?
Any recurring issues or themes?
Once answers to all these questions have been written down and discussed, a detailed action plan should be devised to improve the CSIRT function. The action plan can be categorized under two major groups:
Control Improvements – This section should describe any changes or improvements that should be put in place to better detect future incidents of this type and/or prevent similar incidents. Some examples are:
Policy Changes, typically to organization-wide policies covering users, IT systems, etc.
Monitoring System Changes, typically configuration changes made in the SIEM, perimeter or endpoint defences to improve detection and reporting efficiency
Architectural Changes, typically long-term, major changes in the way systems are built.
Process Improvements – This section should describe any improvements that could be made to the actual response process itself. Some examples are:
Improving the Incident handling cheat sheet with additional details
Improving the communications plan to get speedier response
Escalation matrix improvements
Staff training and awareness
Rinse and Repeat!!!
As you can see from the above, the goal is not to do this exercise as a one-time activity; it is a repetitive process. However, it may not be possible for every single incident detected and worked by the CSIRT, so practicality dictates that the process be done in a scalable way. Keeping that in mind, below is a recommended approach:
Perform “Lessons Learnt” exercise for all Major and High Severity incidents
Perform “Lessons Learnt” exercise for all repeat incident categories (refer to Incident Classification for more details)
Perform “Lessons Learnt” exercise on a monthly or quarterly basis for CSIRT processes.
Lessons learnt are an important part of continued learning and the quest for functional perfection. A CSIRT is no different, and it should be improved on a regular basis. These improvements should be aimed at detecting and responding to cyber incidents efficiently and in a timely fashion.
With this post, the CSIRT Series comes to a conclusion. Please feel free to post your comments on the section below.
At Infosecnirvana, we did a post on SIEM Comparison – 101, and a lot of readers were interested in evaluating AlienVault SIEM and how it stacks up against the usual suspects like ArcSight, QRadar, McAfee Nitro, Splunk etc. Well, we listened, and this post is our take on AlienVault SIEM: its strengths, weaknesses and more.
AlienVault is the enterprise avatar of Open Source SIM (OSSIM). AlienVault has a number of software components which, when put together, provide what is now called a Unified Security Management tool, or USM for short. The components are:
Munin, for traffic analysis and service watchdogging.
NFSen/NFDump, used to collect and analyze NetFlow information.
FProbe, used to generate NetFlow data from captured traffic.
AlienVault also includes a lot of proprietary tools, the most important being a powerful correlation engine.
The combination of all these tools has been seamlessly put together in AlienVault USM, and it is a real winner in the SME segment of the market. It has a nice feature set, and the re-organization, additional funding, infusion of new leadership etc. have made AlienVault a serious contender in the SIEM space. It was the sole contender in the Visionaries quadrant of the 2014 Gartner report. In short, it is like the UTM of SIEM technology. Now, is that good? Or is that bad?
What is good?
Flexible Deployment Architecture – This is where the Open Source roots really start to flex their muscles when it comes to AV USM. The three main components of the architecture are as follows:
AV Sensor – AV Sensors perform Asset Discovery, Vulnerability Assessment, Threat Detection and Behavioral Monitoring, in addition to receiving raw data from event logs and helping monitor network traffic (including Flow). The sensors also normalize the received raw events and communicate them to the AV Server for correlation and reporting.
AV Server – The AV Server is the central management console that provides USM capabilities under a single GUI. The Server receives normalized data from the sensors, correlates and prioritizes the events and generates Security Alerts or Alarms. The Server also provides a variety of reporting and dash-boarding capabilities.
AV Logger – AV Logger provides the capability to archive log files for purposes of forensic analysis and to meet compliance requirements for long term retention and management.
All the architecture components, including the Sensor, the Logger, the Correlation Engine etc., can be deployed tiered, isolated or in a consolidated All-in-One style. This wide variety of deployment options helps customers build flexible and open architectures. It also helps control cost depending on the budget at hand. Very rarely can products boast of such flexibility.
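The sensor-side normalization described above can be sketched roughly as follows. This is a minimal Python illustration with a hypothetical event schema and regex; AlienVault's actual plugins define per-product patterns and a richer normalized format.

```python
import re
from datetime import datetime

# Hypothetical pattern for an SSH failed-login syslog line; real AV plugins
# use per-product regexes defined in plugin configuration files.
SSH_FAIL = re.compile(
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+) port (?P<src_port>\d+)"
)

def normalize(raw_line, plugin_id=4003):
    """Turn a raw log line into a normalized event dict (sketch only)."""
    m = SSH_FAIL.search(raw_line)
    if not m:
        return None  # unknown format: a real sensor would try other plugins
    return {
        "plugin_id": plugin_id,                       # identifies the source type
        "timestamp": datetime.utcnow().isoformat(),   # time of normalization
        "username": m.group("user"),
        "src_ip": m.group("src_ip"),
        "src_port": int(m.group("src_port")),
        "raw": raw_line,                              # keep the original for forensics
    }

event = normalize("sshd[211]: Failed password for root from 10.0.0.5 port 51122")
print(event["src_ip"])  # → 10.0.0.5
```

The server then receives such normalized dicts for correlation, so it never has to understand every vendor's raw log format itself.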
A Jack of All… – The best thing about AlienVault USM is that it is a “Jack of All” solution. It provides SIEM, HIDS/NIDS, FIM, NetFlow, Asset Management, Vulnerability Management etc. under one USM platform. None of the commercial SIEM vendors like ArcSight, McAfee, etc. can boast of such a diverse feature set. QRadar, in my opinion, is the closest to AV USM in terms of feature diversity. While the individual features began as isolated Open Source community projects, the USM does a good job of integrating them into a single feature set. While they are not great as individual parts, they more than make up for it as a sum of the parts.
OTX – Open Threat Exchange is a wonderful community sharing platform that helps clients to share IP and URL reputation information so that all AV customers can benefit. This is true community sharing modeled on the likes of the Splunk Community (for app development). This has the potential to grow into a large source of Real World Intelligence and what AlienVault intends to do with this data remains to be seen. For now, it is being used by USM Correlation engine to provide better context and content for Security monitoring. AlienVault Labs, is also utilizing this infrastructure to constantly update Detection rules for malware vectors, vulnerability exploits etc. QRadar and ArcSight provide Intelligence, but it is commercial intelligence and not community intelligence. With community intelligence, you get more hits than misses.
Multi-Tenancy – While this feature may not elicit interest from many readers, those who have worked in an MSSP environment will understand why it is a very important feature to have. AV USM supports Multi-Tenancy out of the box. This, when combined with the architecture flexibility, provides great MSSP models to sell and operate. The key is to understand how the multi-tenancy works. Basically, a single database is used to store the data of several customers, using data-isolation logic and permission control. The data-isolation logic is based on Entities created in USM (Assets, Users, assigned components such as Sensors, etc. are grouped together as a single Entity) and Permissions (applied in a granular fashion to the data sets related to those Entities). QRadar, ArcSight and other major SIEM products provide this as well.
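As a rough sketch of that entity-plus-permission isolation model (the names and structures here are ours, not AlienVault's), per-tenant filtering of a shared event store might look like:

```python
# Sketch of single-database multi-tenancy: every event is tagged with an
# entity, and each user's permission set decides which entities they may see.
events = [
    {"entity": "customer_a", "msg": "port scan detected"},
    {"entity": "customer_b", "msg": "brute force attempt"},
    {"entity": "customer_a", "msg": "malware beacon"},
]

permissions = {
    "analyst_a": {"customer_a"},               # MSSP analyst scoped to one tenant
    "soc_admin": {"customer_a", "customer_b"}  # admin sees all entities
}

def visible_events(user):
    """Return only events belonging to entities the user may access."""
    allowed = permissions.get(user, set())
    return [e for e in events if e["entity"] in allowed]

print(len(visible_events("analyst_a")))  # → 2
```

The point of the design is that tenants share one physical store, and isolation is enforced purely by this permission filter on every query.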
Price: One of the areas where AV USM benefits is price. It is affordable while offering a whole lot of SIEM features. Often, this turns out to be the deciding factor for the Small and Medium Enterprise segment. QRadar, ArcSight and Splunk are some of the most expensive SIEM products out there in the market, and not everyone has the budget to buy them. In such cases, AV USM is a very cost-effective alternative.
Customization: Again, this is one point where AlienVault outshines the competition. We have seen several customers use AV USM with heavy customization to perform threat detection, asset discovery, threat scoring, APT detection etc. This flexibility is highly desired by security analysts, and AV USM makes good on the promise.
What is bad?
But King of None… – As mentioned in the good, being a jack of all trades suits certain organizations well, but the lack of mature functionality and expertise in any one of those areas is a strong negative. For example, the correlation engine is nowhere close to the likes of ArcSight, QRadar or Splunk. The threat intelligence is not as good as QRadar, McAfee, RSA etc. And so on and so forth. When it comes to critical functionality expertise, AV USM is found lacking.
Database: – AV USM uses MySQL for its database, so all the issues of using a structured DB for log collection, storage and management come to haunt AV USM as well. All SIEM logs are stored in the MySQL database, which causes scalability issues, especially in high-log-volume environments, because backup and restore are time-consuming and CPU/RAM-intensive. USM could hugely benefit from moving to a non-DB log storage architecture, gaining more flexibility in data management, but whether AV will take that route is doubtful. Based on their product direction, they are looking at Percona Server to replace MySQL. While that is a good move, it is still a customized MySQL replacement and may not add the much-desired scale to the product.
Product Stability: – The biggest issue we have seen with the product is its poor stability. With way too many components, myriad integrations and a ton of scripts, the product is really unstable. Every version upgrade is a nightmare. Re-installation or re-start is the most common solution to get the product working again; in a mission-critical environment, this is a complete no-no. One of the most common and frequently failing components is the DB: issues like DB corruption, access problems, disk errors, unresponsive queries etc. really test the patience of end users on a regular basis. This, in our opinion, is the most damning negative about AV USM.
Integration: – While AV USM is known for being customization-friendly, the set of out-of-the-box plugins for log monitoring and correlation is limited to well-known products. It does not have the comprehensive integration capabilities with legacy applications, directory services, databases etc. that other SIEM vendors boast of. Similarly, it relies mostly on its own “pre-packaged” tools for data enrichment and hence has poor third-party integration capabilities. If you really are a developer of open source products, the integration challenge can be overcome, but how many are willing in a real-world enterprise?
Correlation & Workflow: – What good is a SIEM product if it cannot perform advanced correlation and operational workflow? AV USM has a strong foundation in correlation, using XML-driven directives and alarm thresholds. However, when it goes head-to-head with industry leaders like ArcSight, QRadar, Splunk etc., it falls terribly short. We particularly like the Cyber Kill Chain flow, which a lot of customers are using for complete visibility, but this is not the end game in real-world enterprise operations, where all the data points required by a directive are not always available. The same goes for workflow: integration with external ticketing or issue-tracking systems is very limited and hence acts as a deterrent in large-scale deployments.
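The directive-and-threshold model can be illustrated with a toy rule. This Python sketch mimics the idea behind an XML directive (count occurrences of a detector rule per source and raise an alarm at a threshold); the field names are illustrative, not AlienVault's actual directive schema:

```python
from collections import defaultdict

# Toy stand-in for an XML correlation directive: N occurrences of a
# detector rule from the same source raise one alarm.
directive = {"name": "SSH brute force", "plugin_sid": "ssh_fail",
             "occurrence": 5, "priority": 4}

def correlate(events, directive):
    """Count matching events per source; emit an alarm at the threshold."""
    counts = defaultdict(int)
    alarms = []
    for ev in events:
        if ev["sid"] != directive["plugin_sid"]:
            continue  # this directive only watches one detector rule
        counts[ev["src_ip"]] += 1
        if counts[ev["src_ip"]] == directive["occurrence"]:
            alarms.append({"alarm": directive["name"],
                           "src_ip": ev["src_ip"],
                           "priority": directive["priority"]})
    return alarms

events = [{"sid": "ssh_fail", "src_ip": "10.0.0.5"}] * 6
print(correlate(events, directive))  # one alarm for 10.0.0.5
```

A real directive chains several such rules with time windows and reliability scores; the sketch also shows the weakness noted above: if the detector feeding `plugin_sid` is missing, the directive can never fire.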
Technical Support: – One of the common complaints we hear about AV support is that it is of inconsistent and poor quality. Because there are way too many components to troubleshoot, support most often resorts to a re-install, a re-start or a bug-fix, without thorough root cause analysis.
Product Vision Stagnation: – This may not be much of an issue for potential users of AV USM, but it is important to note that the product has not made major leaps in the last 4 years. It has had more than 3 major releases and 20+ minor releases, but nothing path-breaking has been brought to market; it has remained among the “promising products to watch” for way too long. One of the main reasons, we think, is economies of scale: since it is priced lower and caters to the SME segment, less money is invested in development, and the result shows.
In short, we would like to conclude by saying that AV USM is definitely a great addition for organizations that want a cost-effective, quick and easy SIEM solution. However, it still has a long way to go in competing with the big guns out there, for it lacks both firepower and range. So what do you think about AlienVault? Feel free to post your comments below.
SIEM posts have grown in number at Infosecnirvana, but requests to write about more products keep coming in. One of the oft-requested products is Splunk Enterprise. We have posted on HP ArcSight, IBM QRadar and McAfee Nitro SIEM; however, readers have repeatedly asked us to write on Splunk.
So here it is, finally, after being in the works for a long time.
In 2003, one of the most interesting products rolled out and vowed to simplify log management once and for all (and it did!!!) – Splunk. Their motto was simple: throw logs at me, and I will provide a web-based console to search through them intuitively. Interestingly, they are one of the few companies that have not been acquired, in spite of having a very innovative product. So let’s see what makes Splunk tick.
As always, a product is only as good as its architecture. It has to be solid both internally and externally (meaning solution deployment, integration, ease of use, compatibility etc.).
Internal Architecture: Under the hood, Splunk has two main services – the Splunk Daemon, written in C++ and used for data collection, indexing, search etc., and the Splunk Web Service, a web application written using a combination of Python, AJAX, XML, XSLT etc., which provides the super-intuitive graphical UI. Splunk also provides API access using REST, and it can integrate with any web framework needed. Splunk is one of the few products that still use C++ and Python instead of the clunky Java and its cousins. This gives Splunk an edge when processing the large data volumes thrown at it.
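As a small illustration of that REST access, here is a sketch of building a search-job request. The `/services/search/jobs` endpoint is Splunk's documented way to create a search job; the host, port and credentials shown in the comment are placeholders, and no network call is made here:

```python
# Sketch: prepare a request for Splunk's REST search API.
def build_search_job(query):
    """Build the endpoint path and form payload for a new search job."""
    if not query.strip().startswith("search"):
        query = "search " + query  # the REST API expects the leading verb
    return ("/services/search/jobs", {"search": query})

path, payload = build_search_job('index=main error')
print(path, payload["search"])

# A real call would look something like (placeholders, not executed):
#   import requests
#   requests.post("https://splunk.example.com:8089" + path,
#                 data=payload, auth=("admin", "changeme"))
```

Port 8089 is Splunk's default management port; the same API surface is what the SDKs and third-party integrations are built on.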
Data Architecture: Splunk has a unique, search-engine-like data architecture. In fact, some of the early development was based on the concepts of the path-breaking GFS (Google File System), which provided a lot of direction and research into flat-file storage, indexing and free-text search with unmatched speed compared to a relational DB. Splunk went on to master the distributed file system architecture and built its own proprietary data store, which powers Splunk Enterprise today.
Deployment Architecture: The deployment of Splunk is based on a true Big Data architecture – slave and master, where the slaves are the search indexers and the master is a search head. Of course, you can have both nodes on the same physical server, but in a true distributed architecture you need a master and a slave. Read Big Data – What you need to know? to better understand what Big Data is and how to try your hand at it.
Typical Setup: Let’s look at a typical architecture deployment of Splunk in distributed mode.
As you can see, there are three distinct components of this architecture and they are as follows:
Log collectors or Splunk Log Forwarders are installed closer to the source and forward all the logs to Splunk Indexers. This is similar to the Log Collectors in SIEM. They are not great, but are decent enough to get the job done.
The Splunk indexers typically run only the Splunk Daemon service, which receives the data and indexes it based on a pre-defined syntax (akin to parsing, but a lot simpler and faster to process). The result is then sent to the Splunk data store. Each data store has a set of indexes based on the amount of logs received. The data store can then be configured for retention, hot, cold or warm standby etc. In big data terminology, these are the slave nodes.
These indexers then use a process called the “Summarizer” – in big data terms, “map reduce” – to create a summary index of all the indexes available.
The Splunk search head, which serves as the single console to search across all data stores, uses the “summary index” to know which indexer (slave) node to query and which index to query. This is where the scalable search power of Splunk comes from. This is the master node in the big data world.
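The indexer/search-head split described above can be sketched as a tiny map-reduce: each indexer filters (“maps”) its local events, and the search head merges (“reduces”) the partial results, using summary information to skip indexers that cannot contain matches. A toy Python illustration (the data layout is ours, not Splunk's):

```python
# Two "indexer" nodes, each with a host summary and local events.
indexers = {
    "idx1": {"hosts": {"web01", "web02"}, "events": [
        {"host": "web01", "status": 500}, {"host": "web02", "status": 200}]},
    "idx2": {"hosts": {"db01"}, "events": [
        {"host": "db01", "status": 500}]},
}

def search(host, status):
    """Search-head logic: prune via the summary, map locally, reduce globally."""
    partials = []
    for name, node in indexers.items():
        if host not in node["hosts"]:  # summary index: skip this indexer entirely
            continue
        # "map" phase: filtering happens locally on the slave node
        partials.extend(e for e in node["events"]
                        if e["host"] == host and e["status"] == status)
    return partials  # "reduce" phase: merge partial results for the user

print(search("web01", 500))  # → [{'host': 'web01', 'status': 500}]
```

Scaling out is then just a matter of adding more entries to `indexers`: the search head's work grows only with the number of matching nodes, not with total data volume.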
What’s good about Splunk?
Search, Search & Search: Splunk is arguably the best search engine for logs out there. We have started looking at ELK, Hadoop and other big data search engines but for the moment, Splunk rules the roost. The Splunk Search Processing Language (SPL) is the reason behind this power. The search can be done historically (on indexed data) or in real time (data before indexing) and this is as good as Log search can get. None of the SIEM products can come close to the search power of Splunk. In other words, Splunk is to search Log Data and SIEM is to search Event Data.
Splunk is fully customizable as far as search capabilities are concerned: it lets us add scripts to search queries, provides field-extraction capabilities for custom logs, and provides API, SDK and web framework support to achieve all you would need for log management, investigations, reporting and alerting.
Web Interface: Even though UI is a subjective benefit, Splunk has one of the most pleasing interfaces we have seen for log management tools. It really is super easy and intuitive to use. It has great visualization capabilities, dashboards, app widgets and what not. It really puts the cool factor in a rather dull log analysis experience.
No Parsing: Basically, Splunk is an “all you can eat” for logs. Splunk follows a “store now, parse later” approach, which lets it receive any logs thrown at it without parsing or support issues. If it is a known log type, the indexes are added and updated appropriately. If it is not a known type, the logs are still stored and indexed so they are searchable later; you can then use Field Extractions to build custom field parsing. This is one of the killer differentiators compared to traditional SIEM products, as Splunk is a lot more forgiving and agnostic in log collection and storage and does not require specialized connectors or collectors to do the job. This makes it a great log management product.
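The “store now, parse later” idea can be shown with a small sketch: raw lines are stored verbatim, and fields are extracted only at search time with a user-supplied regex, which is the same principle behind Splunk's field extractions. (The log lines and pattern here are made up for illustration.)

```python
import re

# Raw lines are stored as-is; no schema is imposed at ingest time.
raw_store = [
    'GET /index.html 200 1043',
    'POST /login 401 220',
]

def extract(pattern, store):
    """Apply a search-time field extraction over stored raw events."""
    rx = re.compile(pattern)
    out = []
    for line in store:
        m = rx.search(line)
        if m:
            out.append(m.groupdict())  # named groups become fields
    return out

fields = extract(r'(?P<method>\w+) (?P<uri>\S+) (?P<status>\d+)', raw_store)
print(fields[1]["status"])  # → 401
```

Because extraction is deferred, a bad or missing pattern never blocks ingestion; you can refine the regex later and re-run it over everything already stored.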
Splunk Apps build on top of the search head to provide parsing, visualizations, reporting, metrics, saved searches, alerting and even SIEM-like capabilities. This, in my opinion, is the power of Splunk compared to the other products in the market. They have an App Store for Splunk Apps. Cool, isn’t it? These apps are written not only by product vendors but also by the user community.
Scalability: Splunk is a true big data architecture. It can scale with the addition of indexers and search heads. The ratio of search heads to indexers stands at a good 1:6, meaning that for every search head you can have six search indexers. This is very attractive compared to other SIEM solutions in the market when it comes to scaling the log management layer.
What’s bad about Splunk?
Not a SIEM: Splunk is not your traditional SIEM. Let me clarify further. A SIEM has several things in it that assist in performing security event management, monitoring, operations and workflow; in short, the keyword for SIEM is “Operational Security Management”. Now the question is – can Splunk be a SIEM? The simple answer is YES, but the real answer lies in how much customization and how much product expertise you have in store to make it a SIEM product.
Poor Correlation: Splunk does not do any correlation out of the box, as it was not designed for that. However, it can be used to correlate events using the Splunk search language: you can do manual correlation using piped searches, lookup tables, scripted searches etc., but you need to be familiar with the language. You can also automate it with scheduled and real-time search triggers. However, nothing is out of the box. Anton blogs about Splunk correlation being far superior to ArcSight’s (which, by the way, is the best correlation engine we have worked with), but honestly, we don’t have real-life implementation experience to justify that.
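To make the manual-correlation point concrete, here is a sketch, in Python rather than SPL, of the kind of logic a piped search would express: flag sources with repeated failed logins followed by a success, a basic brute-force pattern. The event fields are illustrative:

```python
from collections import defaultdict

events = [
    {"src": "10.0.0.5", "action": "fail"},
    {"src": "10.0.0.5", "action": "fail"},
    {"src": "10.0.0.5", "action": "fail"},
    {"src": "10.0.0.5", "action": "success"},
    {"src": "10.0.0.9", "action": "success"},
]

def suspicious_logins(events, threshold=3):
    """Flag sources whose success follows >= threshold failures."""
    fails = defaultdict(int)
    hits = []
    for ev in events:  # events assumed to be in time order
        if ev["action"] == "fail":
            fails[ev["src"]] += 1
        elif ev["action"] == "success" and fails[ev["src"]] >= threshold:
            hits.append(ev["src"])
    return hits

print(suspicious_logins(events))  # → ['10.0.0.5']
```

In Splunk you would express the same thing with stats and a threshold filter in SPL; the point is that you write and maintain that logic yourself, whereas a SIEM ships it as a packaged correlation rule.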
SIEM App: Splunk has an enterprise SIEM app that aids in SIEM-like functions, but it is definitely not a killer replacement for a SIEM product. It is very basic and does not do much out of the box.
No Aggregation: Logs sent to Splunk are received as-is and sent to the data store; they are not aggregated. While this is good for log collection and search performance, it is not good for underlying storage sizing. SIEM solutions have this capability, but Splunk does not, and this in turn affects scalability.
Poor Compression: Many SIEM products have a compression ratio of 10:1. For Splunk, however, we have consistently seen a ratio of around 4:1. While this is fine for smaller log volumes, it is very poor for larger ones. The main reason is that the indexes take a lot of storage compared to the raw logs. While they enable greater search capabilities, they increase underlying storage and maintenance cost.
Scalability: Even though scalability is one of the benefits of using Splunk for log management, there is a downside too. Add to it the lack of aggregation, compression etc., and you can see how it impacts scale. For example, every indexer can handle only 100–150 GB/day on good server hardware. In spite of what people might say about Splunk sizing and performance tuning, from years of personal use and experience we can safely say that for standard enterprise hardware this limit is as good as it gets. So assume you are looking at 1 TB/day: you would need 8 indexer servers and 2 search head servers for Splunk. If you were to take ArcSight or QRadar instead, you could do the same on two appliances with compression enabled (10:1 compression ratio). From a management perspective, this leads to a larger footprint for Splunk than for other SIEM products.
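The sizing arithmetic above can be captured in a quick back-of-the-envelope calculator, assuming 125 GB/day per indexer (the midpoint of the 100–150 GB/day range) and the 1:6 search-head-to-indexer ratio:

```python
import math

def splunk_sizing(gb_per_day, indexer_capacity_gb=125, indexers_per_head=6):
    """Estimate indexer and search-head counts for a daily log volume."""
    indexers = math.ceil(gb_per_day / indexer_capacity_gb)
    search_heads = math.ceil(indexers / indexers_per_head)
    return indexers, search_heads

print(splunk_sizing(1000))  # 1 TB/day → (8, 2)
```

This reproduces the 1 TB/day example: 8 indexers and 2 search heads. Treat the per-indexer figure as an assumption to validate against your own hardware, not a vendor guarantee.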
Price: Contrary to popular belief, Splunk can get very expensive very fast. For all the reasons mentioned above, Splunk can cost considerably more than other SIEM vendors for large-scale data collection as well as SIEM functionality. In a word – be cautious!!!
Conclusion: In our opinion, Splunk is one of the most innovative log management tools out there. But as a SIEM, for day-to-day security management, monitoring, ticketing etc., it has a lot of catching up to do. The ideal scenario is to use Splunk at the log management layer and a market-leading SIEM at the correlation, workflow and operational management layer. We have seen several successful implementations where Splunk serves as the log management tool and ArcSight or QRadar serves as the correlation engine. Best of both worlds!!!
Until next time – Ciao!!!
PS: Please feel free to add on to the list of What’s good and bad? based on your experience with Splunk and we will be happy to update our posts appropriately.