SIEM – The Good, The Bad and The Ugly – Part 1

SIEM Technology – The Good, The Bad and The Ugly

SIEM is one of those technologies that most organizations adopt in the wake of Security Log Analysis and Incident/Event Reporting requirements. If you already know what SIEM technology is and want to get into the domain, these are the things to know (SIEM – What you need to know). If you don’t know what SIEM is, read it nevertheless!!! This blog post looks at SIEM technology critically (even though I am a big fan of SIEM, I believe that maturity comes from review and feedback). SIEM started gaining traction almost a decade ago and has come a long way since. Now, I think, is a good time to review the technology from a critical viewpoint. So here is my blog on The Good, The Bad and The Ugly!!! This will be a two-part post: Part 1 introduces SIEM and then highlights what it has and has not achieved, and Part 2 will concentrate on a proposal/vision for how SIEM should move ahead in the coming years.

SIEM is data driven. Data in the form of logs from IT Infrastructure is the key driver for SIEM tools to perform their so-called “magic”. Logs have been around in IT for a long time and have long been one of the main tools for troubleshooting programs, operating systems and so on. Gradually, Security gained importance, and because an established logging platform was already available across the IT landscape, Security Events slowly started to trickle into Logs. With time came several Compliance and Audit requirements that drove the Security Log Management domain. Then there arose a need to analyze Log Data and, based on the analysis, perform an action. This is where SIM tools gained prominence. The focus later shifted to Security-related incidents, and SIEM emerged as its own category. If you look at the genesis of SIEM, it has everything to do with Data. That is why in today’s world, where Data is exploding on the Internet, it is of utmost importance to understand a technology such as SIEM and improve it with time.

What has SIEM accomplished?
For more than a decade, SIEM has done a lot for IT folks. When there was no capability to analyze lines and lines of Log files, SIEM was our savior. SIEM gave us the following capabilities right off the bat:

  1. Process Log Streams from various Products and standardize them into a single Application data set.
  2. Provide capabilities to work with several thousand events per second and still give us what we need in terms of searching and querying Log data.
  3. Provide capabilities to correlate data from different entities so that we can trace the origin of an issue.
  4. Provide nice Alerts/Reports/Dashboards/Summaries for the IT Log data.
  5. Finally, an Incident and Event Management Workflow to make it operational.
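
To make the first capability concrete, here is a minimal sketch of what "standardizing into a single data set" can look like. The two input formats, the field names and the four-field common schema are all assumptions made up for illustration; real products use far richer (and usually proprietary) schemas.

```python
import re
from datetime import datetime

# Hypothetical common schema: every source is reduced to these four fields.
COMMON_FIELDS = ("timestamp", "source", "user", "action")

# Assumed sample format for an sshd-style line; real formats differ widely.
SSH_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<host>\S+) sshd: "
    r"(?P<result>Accepted|Failed) password for (?P<user>\S+)"
)
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def normalize_ssh(line):
    """Parse an sshd-style line into the common schema, or return None."""
    m = SSH_PATTERN.match(line)
    if m is None:
        return None
    accepted = m.group("result") == "Accepted"
    return {
        "timestamp": datetime.strptime(m.group("ts"), TIME_FORMAT),
        "source": m.group("host"),
        "user": m.group("user"),
        "action": "login_success" if accepted else "login_failure",
    }


def normalize_kv(line):
    """Parse an assumed key=value appliance line into the common schema."""
    fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
    try:
        return {
            "timestamp": datetime.strptime(fields["time"], TIME_FORMAT),
            "source": fields["device"],
            "user": fields["usr"],
            "action": fields["event"],
        }
    except (KeyError, ValueError):
        return None


def normalize(line):
    """Try each parser in turn; lines no parser understands return None."""
    for parser in (normalize_ssh, normalize_kv):
        event = parser(line)
        if event is not None:
            return event
    return None
```

Once both sources share the same fields, searching, correlation and reporting can work on one data set instead of one per product. Note that every new log format needs its own parser, which is the maintenance burden discussed later in this post.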

Several SIEM vendors exist (SIM and SEM are also used interchangeably, but SIEM is becoming the standard term), and a Google search will turn up more than 20 of them. The SIEM market today has grown into a multi-billion-dollar market, and companies and people alike are embracing the change.

SIEM Shortcomings: 
While SIEM is a lucrative segment to be in, the problem is that the technology is not mature and has some gaping holes. Instead of solving a problem for good, the technology fixed some issues and introduced several collateral ones. Let us look at some of them below:

  • Log Management, as a technology and as a solution, was never mature. We never had good enterprise-wide Log Management technologies and tools before SIEM arrived.
  • Several Log Management issues still exist. These are around Big Data Sets, Standard Log Format Specifications, Integration of Log Sources, Standardization of Application logging with respect to Security etc. Instead of focusing on fixing these issues, we jumped into SIEM solutions (Log Management + Event Management).
  • SIEM came packaged with Log Management solutions, but they were not as efficient as they should have been. SIEM came packaged with Event Management solutions as well, but what good is Event Management when Log Management is not efficient?
    • Consider this: Windows Logs are resident files in a proprietary format. Most Network devices send Syslog messages following the same RFC, but the content varies from vendor to vendor. Database Audit logs are a mix of Table Data and File Audit Data. With such a variety of logs from vendors, it is very hard to perform Log Management effectively, and subsequently Event Management.
  • One of the best and easiest solutions for Log Management was for SIEM vendors to package a client that collects the data and normalizes it into the vendor’s proprietary format. The processed data is then sent to a Central Manager where all the Event Management capabilities reside.
  • The problem with the above approach is that different data sources need different processing, and hence a different client for every data source. Though this seems a simple solution at the outset, it adds a layer of complexity in terms of managing the Clients themselves. Imagine this problem in a huge enterprise and you can see what a pain point it is for SIEM solutions.
  • Client management is a decentralized approach and hence prone to failure. Monitoring the health of the clients is one of the management headaches one has to bear. Patching them, updating properties, remote management etc. are all points of failure, not to mention keeping them up and running with constant care and feeding, like a newborn.
  • Since log standardization in SIEM uses a proprietary format, migrating from one system to another or from one vendor to another is a pain point. It requires client re-installation and data re-processing, so you are effectively stuck with a product for life. Interoperability between systems has always been a problem with Vendors in the IT space. While this protects their business, it limits the ability of end users to get what they want. The solution cannot be more and more new products and new projects to replace existing SIEM solutions. It has to be more robust than that.
  • Searching across TBs (terabytes) of data is the most important problem every organization faces. How do our SIEM solutions solve this? By using some sort of Database and Indexing. The databases in common use today (read Oracle/SQL/MySQL/PostgreSQL) are all limited in handling such randomly formatted, high-volume feeds, which makes long-term searches, trend analysis etc. a slow, frustrating and time-consuming job.
  • The Client-Server Model implemented by SIEM solutions does not scale for BIG DATA!!! Let me tell you how:
    1. Most of the SIEM solutions I have worked with have 3 layers of architecture – Data Collector Layer (Event Collectors), Data Storage Layer (Event Indexers/Storage) and the Data Processing Layer (Event Management/Administration/Web Console/Server).
    2. In the above architecture, Data Collection and Data Storage are High Volume, ranging up to 100K events per second. However, there is a limit to how much the Data Processing Layer or Manager Layer can process (typically 1/10 – 1/20 of the collected data).
    3. If the effective use of Log data is only going to be 10-20%, what about the rest?
    4. People say aggregation and filtering are done to consolidate the data to within the 10-20% range. Filtering and Aggregation have their own pros and cons, but the end result is that what you collect is not used entirely.
  • Managing SIEM solutions (architecture, implementation, integration, customization, event management, content development, maintenance etc.) is not a simple task and usually requires huge investments in people and training. The vendors make money from this, I know, but honestly, as a User, you know that “If it is complicated, adoption will be difficult”.
  • Most SIEM solutions are not integrated with the ITIL processes for Incident and Event Management (ITIL being a fairly standard IT framework used across the industry), so deployments that should be a seamless transition end up limited.
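
The filtering and aggregation mentioned in the scaling discussion above can be sketched roughly as follows. This is only an illustration under my own assumptions: events are plain `(epoch_seconds, source, action)` tuples, and the function names and the 60-second window are hypothetical, not taken from any product.

```python
from collections import Counter


def filter_events(events, dropped_actions):
    """Drop events whose action is considered uninteresting."""
    return [e for e in events if e[2] not in dropped_actions]


def aggregate(events, window_seconds=60):
    """Collapse events with the same (source, action) in one time window
    into a single record carrying a count."""
    counts = Counter()
    for ts, source, action in events:
        # Start of the window this event falls into.
        bucket = ts - (ts % window_seconds)
        counts[(bucket, source, action)] += 1
    return [
        {"window_start": b, "source": s, "action": a, "count": n}
        for (b, s, a), n in sorted(counts.items())
    ]


# Five raw events from the collector layer (illustrative values).
raw = [
    (0, "fw01", "deny"),
    (10, "fw01", "deny"),
    (20, "fw01", "deny"),
    (30, "fw01", "allow"),
    (70, "fw01", "deny"),
]
kept = filter_events(raw, dropped_actions={"allow"})
summary = aggregate(kept)
```

Here five collected events reach the manager as two records. The volume drops, but so does detail: the filtered event is gone, and the individual timestamps inside each window are lost. That is exactly the trade-off: what you collect is not what you analyze.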

I have to be honest that the above list is not comprehensive, and there are several points that you as readers may want to add about the Positives and Negatives of SIEM. Please comment, and I will update the post with your views. Part 2 will discuss the various options for SIEM to learn and improve based on Industry and User feedback.

APT – What you need to know?

APT – Advanced Persistent Threat is the latest buzzword in the industry. Everyone in the Security Industry, professionals and businesses alike, wants to get on the bandwagon called APT. Security product vendors are all gearing up to cater to “APT”, and all their current product lines or future releases address APT in some form or the other. Now the fever has spread to IT Management as well, and they want their Security teams to detect and prevent APT. Even though the InfoSec public has caught up with it, how much thought have we put into understanding the magnitude of the problem at hand? Is it enough to just jump on to something without understanding it fully, or do we need more educated and intelligent decision making?

Let us find out more in this post!!!!

As always, I would like to define APT to start with. This is key because once the definitions are clear, all we would need is to align our thinking to that definition. Then, I will list down what flaws we have in our current approach towards security. Finally, I will try to list down as many possible solutions to the problem at hand.

Defining APT:
Simply put, APT is a Security Threat to the Enterprise (or even the End User, for that matter) that is Advanced enough in execution that traditional security filters are not able to catch it outright, and Persistent enough that it keeps moving from one compromised target to another while evading detection.

Is it a technology of the future? No, it is not. APT is nothing but a threat we are not trained to see. One of the main reasons why APT has been so successful in many organizations is that we have an outdated security strategy. For example, say we are keen on tracking Data Exfiltration from a compromised machine. How do we do it today?

  • To start off with, we look for Data Loss Prevention solutions and see which vendor is the market leader.
  • Then we implement DLP solutions with basic policies for generic data loss (PDF, WORD DOC, XLS, Source Code, Credit Card Numbers, PAN, PII etc.).
  • We fine-tune the DLP policies for our enterprise specifically and implement detection and prevention capabilities.
  • We log the data from DLP solutions to SIEM and alert when something of interest happens.
  • In addition, or as a replacement, IDS/IPS rules are implemented to identify data loss traffic based on REGEX patterns, file names etc.
  • In some cases we also look at Traffic going to Blacklisted Domains and IPs.
I am sure most, if not all, organizations do this to identify Data Exfiltration. But can all those organizations say that they are safe against APT? The answer is a SAD NO. The reason is that the Known (Policy or Signature of What is Bad) is a drop, while the Unknown (Where APT works) is an Ocean. The threat landscape has evolved to exploit the Unknown, but we have not evolved to detect and respond to it. What is the solution for this problem?
Several solutions are being proposed by people in the industry. In my opinion, one of the most important is Behavior Profiling and Anomaly Detection.
Now, what is Behavior Profiling?
Behavior Profiling – Every network, and every segment of the network, has a behavior profile that is deemed normal. How many of us today know what our Network Segments look like in terms of the Connections they accept and deny, the Traffic flowing within the segment, the most used and unused protocols, the size of packets that flow, the outbound and inbound communications that happen, Access in and out, who is supposed to have it and who is not, etc.? Very few, I suspect. We are more concerned about getting the system up and providing the service it is meant to provide. We seldom think about the Security Profile the segment has. Once we profile, we can identify several Anomalies.

Let us now take the same example of Data Exfiltration and see how Behavior profiling would help:

  1. We would have complete details about where sensitive data resides: the VLAN, the Server, the Folder, the File, the DB tables etc.
  2. For the Sensitive Machine/Network/Data, we would know who has access and who does not.
  3. We would also track who has a copy of that data: which machine it is on, where it resides (desktop, laptop, mobile) etc.
  4. Data usage by team, by individual etc. is also profiled, which gives us the subset of people handling that sensitive data.
  5. Any theft of that data would be through one of the above actors/entities.
  6. Tracking the activity of each of their machines over time would give us a Normal behavior profile.
  7. Digital Markers can also be placed on such sensitive data by the corporation to track data use/flow.
  8. We can also track periodicity of data access, time of access, data changes etc. through Digital Markers.
  9. Any deviation from Normal behavior is a potential Data Exfiltration action and needs to be investigated.
  10. Behavior profiles thus created can be used in addition to Signature based detection.
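
As a rough sketch of steps 6 and 9 above, a Normal profile can be as simple as an average and a spread per user, with anything far outside that range flagged for investigation. Everything here is an assumption for illustration: the metric (bytes read per day from a sensitive share), the threshold of three standard deviations, the floor on the spread, and the names `build_profile` and `find_anomalies`. A real deployment would profile many more dimensions.

```python
from statistics import mean, pstdev


def build_profile(history):
    """history: dict of user -> list of daily volumes read from the
    sensitive data store. Returns user -> (average, spread)."""
    return {user: (mean(days), pstdev(days)) for user, days in history.items()}


def find_anomalies(profile, today, threshold=3.0, min_spread=1.0):
    """Return (user, reason) pairs for activity outside the Normal profile."""
    findings = []
    for user, observed in today.items():
        if user not in profile:
            # Someone with no baseline is touching the data at all.
            findings.append((user, "no baseline for this user"))
            continue
        average, spread = profile[user]
        # min_spread is an assumed floor so a perfectly flat history
        # does not make every tiny change look anomalous.
        limit = average + threshold * max(spread, min_spread)
        if observed > limit:
            findings.append((user, "volume above normal"))
    return findings


history = {
    "alice": [100, 120, 110, 90, 105],
    "bob": [500, 480, 520, 510, 490],
}
profile = build_profile(history)
findings = find_anomalies(profile, {"alice": 115, "bob": 5000, "mallory": 50})
```

In this made-up data, alice stays within her normal range, bob reads roughly ten times his usual volume, and mallory has no baseline at all, so bob and mallory are the ones to investigate. Neither case needs a signature, which is the point of profiling.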

This requires intimate coordination among various teams and a deep understanding of what your Network does and what it is supposed to do. While this is the most logical approach, it is also the most challenging to implement, and thus the most rewarding. Behavior profiling has been used in the Intelligence Community for a long time, but the Technology community is yet to embrace it. Enterprise data is becoming critical, and with threats like APT, our fundamentals are being questioned.

This approach can help after the fact, but to prevent the occurrence in the first place a long-term solution is needed. From a long-term perspective, the only solution is building Networks and Applications (OS as well as Apps) from the ground up to treat security as an embedded characteristic and not an add-on feature.

What are your thoughts on APT? How do you think we should change our Security thought process, technology and all to combat it? Sound off below!!!

How good is our current Security Strategy?

A few years ago, “Hacktivist Groups” either did not exist or, if they did, lurked in the underworld. But today, they have the guts to come out in public and declare war on the Internet. They have also been very successful in causing big corporations losses in terms of data and money. And how much would you wager that this is just the beginning? This brings us to the very question: How good is our current Security Strategy?

Traditionally, we have been building the Security Regime using one or both of two approaches in tandem: Known Bad Security (Blacklisting) and Known Good Security (Whitelisting). To all those signature and behavior based thinkers, don’t fret, for this approach is a superset of the Signature and Behavior based approaches.

First let us look at Known Bad Security or Blacklisting:
One of the things we are very good at is “KNOWN BAD” detection and response. By this I mean we are good at identifying Vulnerabilities based on Vendor releases, patching them once the vendor releases the patches, and updating AV/AS, IDS/IPS, Content Filtering etc. to protect against exploitation. This is what “KNOWN BAD” Security is all about. You know it is bad and you defend against it. But a recent survey by Verizon shows that only 1% of the total data breaches are identified by IDS/IPS or AV solutions. This is a clear indicator that Signature based detection or Blacklisting based response is not giving us the results. So even though we are very good at Known Bad Security, we are being compromised day in and day out!!!!

Known Good Security or Whitelisting is just the opposite of Known Bad:
By this I mean we identify and maintain a list of KNOWN GOOD items in our IT Infra. What connections are good, what users are good, what files are good, what is allowed and what is unauthorized are the data points for Known Good Security. Based on this data, we identify Security Abnormalities and anomalous patterns that don’t conform to the Whitelist and go after them as Rogues/Attackers. We investigate them; if found bad, we follow the remediation process, and if found good, we add them to the Whitelist. Once we know what is bad, we automate it by feeding it to Blacklist detection and response. While effective, this is a slow and tedious process, giving an attacker generous amounts of time to wreak havoc.
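
The workflow just described can be sketched in a few lines. This is a simplified illustration, not how any specific product does it: the items are plain strings (the IP addresses below come from documentation ranges), and `triage` and `record_verdict` are hypothetical names.

```python
def triage(item, whitelist, blacklist):
    """Decide what to do with an observed item (IP, file hash, user...)."""
    if item in blacklist:
        return "block"        # Known Bad: automated response
    if item in whitelist:
        return "allow"        # Known Good: no action needed
    return "investigate"      # Neither list knows it: a human has to look


def record_verdict(item, is_bad, whitelist, blacklist):
    """After investigation, feed the result back into the right list."""
    if is_bad:
        blacklist.add(item)
    else:
        whitelist.add(item)


whitelist = {"10.0.0.5"}
blacklist = {"203.0.113.9"}

first_seen = triage("198.51.100.7", whitelist, blacklist)
record_verdict("198.51.100.7", is_bad=True, whitelist=whitelist, blacklist=blacklist)
after_review = triage("198.51.100.7", whitelist, blacklist)
```

The slow part is the gap between `first_seen` and `after_review`: the item sits in the "investigate" state until someone reviews it, and that gap is the time the attacker gets.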

Some Good, Some Bad:
Most Enterprises today use a combination of Blacklisting and Whitelisting to meet their Information Security needs. But based on the threats being propagated today, we can say with enough confidence that this approach is failing. The main reason for the failure is that “Actual Good and Actual Bad are way more than Known Good and Known Bad”. Since we are unable to quantify these numbers scientifically, what we do ends up being good for nothing.

What do we lack?
Our current strategy towards security has some gaping holes. Some of them are listed below:

  • Over-Reliance on External Sources: We still rely on Vendor input, community input and other public disclosures to define Blacklisting. One vendor’s threat detection efficiency differs from another’s. One vendor might rate a piece of Malicious Code as High Severity, while another may rate it as Low. This kind of disparity does not help in determining what is “Actually bad”.
  • Poor Knowledge of our environment: How many times have you identified a Security incident and, while investigating, found out something new about the environment? I can bet that it is literally every time. Without knowing the exact nature of our environment, we cannot do any effective Whitelisting. Without effective Whitelisting, effective Blacklisting is impacted too.
  • One cure for All diseases: We think that if one organization is compromised by a specific exploit, it is applicable to all. We seldom consider that the Controls we have may differ significantly from the controls other organizations have. Security should be tailored to suit the organization, not vice versa.
  • Once we Whitelist something, we never re-evaluate. We perceive that “it is clean” and pay little attention till all hell breaks loose. This is more related to human nature than anything else, I guess. Once we “move past” something, we never look back. This will hurt us because an item whitelisted today might turn bad tomorrow due to IT dynamics, thereby leading to an exploit.
  • We live and die by “more tools, more security; latest signatures, more protection; more resources, more coverage; more training, more knowledge”. Most organizations just buy Security tools or technologies to tick a check box in their Audit/Compliance needs. If the company execs have caught wind of some Security attack that happened at some other company, they become paranoid that it will happen to them as well. Hence the “Gimme more security” approach.
  • More the HooHa, More Serious the Threat: We behave as if the amount of publicity received is directly proportional to the severity of the threat. We may have several other threats in our environment and several gaps we need to fix, but we still look for the famous Conficker, Flame, Stuxnet, Aurora, Zeus etc. Even though some of them were big in terms of spread, every organization had different infection rates.
  • We still think of Security as an Operations function. We still go by the number of Alerts worked, number of incidents raised, time to solve, time to respond etc. Security is a more Analytical, investigative field. Looking beyond the noise, finding the needle in the haystack, attacker attribution and so on all sound cool on paper, but the current strategy doesn’t help bring them to reality.
  • Security is not a culture. In everyday life, you lock the front door, keep important things in a safe, put on safety gear, wear a seat belt etc., but we don’t treat our IT systems development, implementation and management with a Security mindset. Bad products are developed, bad implementations happen, bad administration and monitoring happen, and finally people make mistakes too, leading to Security breaches, data theft and loss.

I am sure there are more flaws in our current strategy than the above list covers. What do you think? Please comment!!!


