Splunk Enterprise – What you need to know?

SIEM posts have grown in number at Infosecnirvana, but the requests to write about more products keep coming in. One of the most requested products is Splunk Enterprise. We have posted on HP ArcSight, IBM QRadar and McAfee Nitro SIEM. However, readers have been asking us repeatedly to write on Splunk.
So here it is, finally, after being in the works for a long time.
In 2003, one of the most interesting products rolled out and vowed to simplify log management once and for all (and it did!!!) – Splunk. Their motto was simple – throw logs at me and I will provide a web based console to search through them intuitively. Interestingly, they are one of the few companies that have not been acquired, in spite of having a very innovative product. So let’s see what makes Splunk tick.
As always, a product is only as good as its architecture. It has to be solid both internally and externally (meaning solution deployment, integration, ease of use, compatibility, etc.).
  • Internal Architecture: Under the hood, Splunk has two main services – the Splunk Daemon, written in C++ and used for data collection, indexing, search, etc., and the Splunk Web Services, a web application written using a combination of Python, AJAX, XML, XSLT, etc., which provides the super intuitive graphical UI. Splunk also provides API access using REST and can integrate with any web framework needed. Splunk is one of the few products that still use C++ and Python instead of the clunky Java and its cousins. This gives Splunk an edge when processing the large data volumes thrown at it.
  • Data Architecture: Splunk has a unique, search-engine-like data architecture. In fact, some of the early development was based on the same concepts as the pathbreaking Google File System (GFS), which provided a lot of direction and research into flat file storage, indexing and free text search capabilities with unmatched speed compared to a relational DB. Splunk went on to master the distributed file system architecture and built its own proprietary data store, which powers Splunk Enterprise today.
  • Deployment Architecture: The deployment of Splunk is based on a true Big Data architecture – master and slave, where the slaves are the search indexers and the master is a search head. Of course, you can have both nodes on the same physical server, but in a true distributed architecture, you need a master and a slave. Read more at Big Data – What you need to know? to better understand what Big Data is and how to try your hand at it.
  • Typical Setup: Let’s look at a typical architecture deployment of Splunk in distributed mode.

As you can see, the distinct components of this architecture are as follows:

  1. Log collectors, or Splunk Log Forwarders, are installed close to the source and forward all the logs to the Splunk indexers. This is similar to the log collectors in SIEM. They are not great, but are decent enough to get the job done.
  2. The Splunk indexers typically run only the Splunk Daemon service, which receives the data and indexes it based on a pre-defined syntax (akin to parsing, but much simpler and faster to process). This is then sent to the Splunk data store. Each data store has a set of indexes based on the amount of logs received. The data store can then be configured for retention, hot/warm/cold storage and so on. In big data terminology, these are the slave nodes.
  3. These indexers then use a process called the “Summarizer” – or in big data terms, “map reduce” – to create a summary index of all the indexes available.
  4. The Splunk search head, which serves as the single console to search across all data stores, uses the “summary index” to know which indexer (slave) node to query and which index to query. This is where the scalable search power of Splunk comes from. This is the master node in the big data world.
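The summarizer / map-reduce flow in steps 2–4 can be sketched conceptually in a few lines of Python. This is an illustration of the idea only, not Splunk’s actual implementation; the event data and function names are invented for the example:

```python
from collections import Counter

# Each "indexer" (slave node) holds its own slice of the raw events.
indexer_slices = [
    ["sshd", "sshd", "httpd"],   # events stored on indexer 1
    ["httpd", "httpd", "sshd"],  # events stored on indexer 2
]

def summarize(events):
    """Map step: each indexer counts its own events locally."""
    return Counter(events)

def merge(partials):
    """Reduce step: the search head merges the partial summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# The search head never touches raw events -- only the small summaries.
partial_summaries = [summarize(s) for s in indexer_slices]
summary_index = merge(partial_summaries)
# summary_index -> Counter({'sshd': 3, 'httpd': 3})
```

The point of the sketch: the heavy lifting (counting) happens on the slaves, and the master only merges small summaries, which is why adding indexers scales search almost linearly.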

What’s good about Splunk? 

  • Search, Search & Search: Splunk is arguably the best search engine for logs out there. We have started looking at ELK, Hadoop and other big data search engines, but for the moment, Splunk rules the roost. The Splunk Search Processing Language (SPL) is the reason behind this power. Searches can be run historically (on indexed data) or in real time (on data before indexing), and this is as good as log search can get. None of the SIEM products can come close to the search power of Splunk. In other words, Splunk is to log data what SIEM is to event data.
  • Fully Customizable: As far as search capabilities are concerned, Splunk lets us add scripts to search queries, provides field extraction capabilities for custom logs, and provides API, SDK and web framework support to achieve all that you would need for log management, investigations, reporting and alerting.
  • Web Interface: Even though a UI is a subjective benefit, Splunk has one of the most pleasing interfaces we have seen in log management tools. It really is super easy and intuitive to use. It has great visualization capabilities, dashboards, app widgets and what not. It really puts the cool factor into a rather dull log analysis experience.
  • No Parsing: Basically, Splunk is an “all you can eat” buffet for logs. Splunk follows a “store now, parse later” approach, which means it can receive any logs thrown at it without parsing or support issues. If it is a known log type, the indexes are added and updated appropriately. If it is not a known type, the logs are still stored and indexed so they are searchable later. You can then use field extractions to build custom field parsing. This is one of the killer differentiators compared to traditional SIEM products, as Splunk is a lot more forgiving and agnostic in log collection and storage and does not require specialized connectors or collectors to do the job. This makes it a great log management product.
  • Splunk Apps: Apps build on top of the search head to provide parsing, visualizations, reporting, metrics, saved searches, alerting and even SIEM-like capabilities. This, in my opinion, is the power of Splunk compared to the other products in the market. They have an App Store for Splunk Apps. Cool, isn’t it? These apps are written not only by product vendors but also by the user community.
  • Scalability: Splunk is a true big data architecture. It can scale with the addition of indexers and search heads. The ratio of search heads to indexers is a good 1:6, meaning that for every search head, you can have 6 indexers. This is very attractive compared to other SIEM solutions in the market when it comes to scaling at the log management layer.
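As a sketch of the REST API access mentioned above, the snippet below builds (but does not send) the request that would create a search job. The host name is hypothetical; 8089 is Splunk’s usual management port and `/services/search/jobs` its search-job endpoint, but treat those details as assumptions to verify against your own deployment and the Splunk REST API documentation:

```python
import urllib.parse

# Hypothetical host; 8089 is Splunk's default management port.
BASE = "https://splunk.example.com:8089"

def build_search_job_request(spl_query, earliest="-24h"):
    """Prepare the URL and form body for creating a search job.

    The request is only constructed here, not sent -- in a real
    script you would POST it with authentication (e.g. a session
    token) using your HTTP client of choice.
    """
    url = f"{BASE}/services/search/jobs"
    body = urllib.parse.urlencode({
        "search": f"search {spl_query}",   # SPL queries are prefixed with 'search'
        "earliest_time": earliest,
        "output_mode": "json",
    })
    return url, body

url, body = build_search_job_request("index=main sourcetype=syslog error")
```

Polling the returned job ID and fetching results as JSON is what the Splunk SDKs wrap for you; the raw REST layer is what makes the “integrate with any web framework” claim practical.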

What’s bad?

  • Not a SIEM: Splunk is not your traditional SIEM. Let me clarify further. SIEM has several things in it that assist in performing security event management, monitoring, operations and workflow. In short, the keyword for SIEM is “Operational Security Management”. Now the question is – can Splunk be a SIEM? The simple answer is YES; however, the real answer lies in how much customisation and product expertise you have in store to make it a SIEM product.
  • Poor Correlation: Splunk does not do any correlation out of the box, as it is not designed for that. However, it can be used to correlate events using the Splunk search language. You can do manual correlation using piped searches, lookup tables, scripted searches, etc., but again, you need to be familiar with the language. You can also automate it with scheduled and real-time search triggers. However, nothing is out of the box. Anton blogs about Splunk correlation being far superior to ArcSight (which, btw, is the best correlation engine we have worked with), but honestly, we don’t have real-life implementation experience to justify that.
  • SIEM App: Splunk has an enterprise SIEM app that aids in SIEM-like functions. But it is definitely not a killer replacement for a SIEM product. It is very basic and does not do much out of the box.
  • No Aggregation: The logs sent to Splunk are received as-is and written to the data store. They are not aggregated. While this is a good thing for log collection and search performance, it is not good for underlying storage sizing. SIEM solutions have this capability, but Splunk does not. This in turn affects the scalability aspect.
  • Poor Compression: Many SIEM products have a compression ratio of 10:1. For Splunk, however, we have consistently seen the ratio to be around 4:1. While acceptable for smaller log volumes, this is very poor at larger volumes. The main reason is that the indexes take a lot of storage compared to the raw logs. While they enable greater search capabilities, they increase underlying storage and maintenance costs.
  • Scalability: Even though scalability is one of the benefits of using Splunk for log management, there is a downside too. Add to it the lack of aggregation, compression, etc., and you can see how it impacts scale. For example, every indexer can handle only 100–150 GB/day on good server hardware. In spite of what people might say about Splunk sizing and performance tuning, from years of personal use and experience, we can safely say that for standard enterprise hardware, this limit is as good as it gets. So assume you are looking at 1 TB/day. You would need 8 indexer servers and 2 search head servers for Splunk. However, if you were to take ArcSight or QRadar, you could do the same on two appliances with compression enabled (at a 10:1 compression ratio). From a management perspective, this leads to a larger footprint for Splunk than for other SIEM products.
  • Price: Contrary to popular belief, Splunk can get very expensive very fast. For all the reasons mentioned above, Splunk can get very expensive compared to other SIEM vendors for large data collection as well as SIEM functionality. In a word – be cautious!!!
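The back-of-the-envelope sizing in the scalability bullet works out as follows. Note that the per-indexer throughput and the 1:6 search-head ratio are our own field observations from this post, not official Splunk limits:

```python
import math

DAILY_VOLUME_GB = 1000    # ~1 TB/day, as in the example above
PER_INDEXER_GB = 130      # midpoint of our observed 100-150 GB/day range
INDEXERS_PER_HEAD = 6     # the 1:6 search head to indexer ratio

# Round up: a partially loaded indexer is still a whole server.
indexers = math.ceil(DAILY_VOLUME_GB / PER_INDEXER_GB)      # -> 8
search_heads = math.ceil(indexers / INDEXERS_PER_HEAD)      # -> 2

print(f"{indexers} indexers + {search_heads} search heads")
```

Ten servers for 1 TB/day versus two compressed SIEM appliances is the footprint gap the bullet above is describing.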

Conclusion: In our opinion, Splunk is one of the most innovative log management tools out there. But as a SIEM, for use in day-to-day security management, monitoring, ticketing, etc., it has a lot of catching up to do. The ideal scenario is to use Splunk at the log management layer and a market-leading SIEM at the correlation, workflow and operational management layer. We have seen several successful implementations where Splunk serves as the log management tool and ArcSight or QRadar serves as the correlation engine. Best of both worlds!!!

Until next time – Ciao!!!

PS: Please feel free to add on to the list of What’s good and bad? based on your experience with Splunk, and we will be happy to update our post appropriately.

15 thoughts on “Splunk Enterprise – What you need to know?”

  1. A nice article. +10 for reference to Metanetivs 🙂
    Now, what we need next is a head-to-head battle vs ArcSight Logger, QRadar Log Manager and.. what about LogRhythm?


  2. I cannot agree with your statement about aggregation. Aggregation is one of the most powerful features provided by Splunk, compared to other SIEM vendors. You can aggregate on whatever field or fields you want, and you can aggregate using complex terms, including those from external sources (for example, organization structure, not username). And you have at least 4 methods to do so: summary indexes, TSIDX, report acceleration and accelerated data models.
    Compression highly depends on the payload, the number of aggregations you use, data replication, etc. My experience shows that it can be 5% to 40%. If we speak about raw data, it is usually closer to 5% than 40%.

    1. Stefan, the aggregation is field based, event based, etc. in other SIEM tools. Splunk does not do that. It does aggregation after collection, not before.
      Accelerations and summary indexes help after the fact, but not before. The compression figures are based on Splunk’s recommended sizing numbers. Agreed, compression is highly dependent on a lot of factors, but on average Splunk has poor compression ratios compared to SIEM products.

  3. I agree on the compression ratios, but if you’ve ever had to do a deep analysis on other SIEM products, then you’ll find that a lot of the compression has more to do with data trimming or grooming to a security spec than with compression. That matches what the vendor (esp. QRadar) says in calling it “Event Compression”; however, this is irreversible data loss, which is achievable in Splunk with data model acceleration or summary table building if you were to back up only the accelerated table or summary index. I’ve done SIEM and Splunk implementations above 2TB per day of volume, and every time Splunk won on price, because you really only have to index what you want if you want to emulate a SIEM. However, Splunk more than makes up for the additional cost of keeping everything once you add the additional ROI for the other business units in the organization. This also achieves a practical application of the often figurative “Security serves the business”.

  4. Oh yeah,

    Regarding scalability: ArcSight does pretty well, but QRadar and Nitro are nowhere close to Splunk with regard to multisite or regional fault tolerance. QRadar and Nitro (because of the aforementioned data trimming/event compression/data loss) cannot do a retrospective analysis, so if you are combating a long, slow compromise, like the ones that have plagued large retail/utilities/energy sector/aviation/govt, you will have no idea how long the compromise has been present. Again, as the clean-up crew after such disasters, luckily I can chew through the old data in a Splunk instance and get them a picture of who/what/where/when, provided they have the raw data archived.

    During such events, the SIEM vendor is usually brought in and thoroughly embarrassed, because their product failed in the first place and then failed to handle the reaction.

    1. QRadar doesn’t trim events; it stores the whole event payload. Yes, it does compress data (without loss), but only to allow it to be kept for longer, and in any case the compression is completely transparent to the user. Also, QRadar can keep data for years. If you run out of disk space, you can just add Data Nodes (each Data Node can keep 100TB of data uncompressed, >500TB compressed) when and where you need them, enabling PBs of data to be kept online.

      Also, QRadar supports SQL/AQL queries directly on the data as well, enabling users to sift through all of this data in a common, well understood language.

      1. Of course we can store data on a Data Node. But the price of such a device is abusive… QRadar could at a minimum offer the possibility to filter out events at the agent side! For the moment, everything that comes into the Event Processor is paid for, even if you decide to drop the events!! This is unacceptable behavior to me.

  5. When you read this article, it seems clear that you have never used Splunk with Enterprise Security. I can agree that ArcSight is a great correlation tool, but it can only correlate on a small part of your data, leaving all the rest totally unusable on the horribly slow Loggers!
    And do you really think that we can rely only on correlation for security today? It will not allow you to catch APTs, while Splunk, with its good analytic capabilities, does a great job!

    1. Hi Robert,
      Thanks for your comment. I think you have not read the entire post carefully. I categorically mention under “What’s good” that Splunk’s data analysis capabilities are superior to any other product’s because of SPL. Also, under “What’s bad”, I clearly mention the words “Operational Security”. I am not sure about your experiences, but I can definitely tell you that many organizations need a tool that allows SIEM to be used for workflows, ticketing, case management, ITSM integration, ease of 24×7 monitoring, etc. In this realm, Splunk is not as mature as the other SIEM products.

      As far as your ArcSight comment goes, I am a big critic of their Logger product. It was supposed to compete with Splunk but could never even come close.
