- Internal Architecture: Under the hood, Splunk has two main services: the Splunk Daemon, written in C++, which handles data collection, indexing and search; and Splunk Web Services, a web application written using a combination of Python, AJAX, XML, XSLT etc., which provides the super intuitive graphical UI. Splunk also provides API access over REST and can integrate with any web framework needed. Splunk is one of the few products that still use C++ and Python instead of the clunky Java and its cousins, and this gives Splunk an edge when processing the large data volumes thrown at it.
- Data Architecture: Splunk has a unique, search-engine-like data architecture. In fact, some of the early development was based on the concepts of the path-breaking GFS (Google File System), which provided a lot of direction and research into flat-file storage, indexing and free-text search at speeds unmatched by a relational DB. Splunk went on to master the distributed file system architecture and built its own proprietary data store, which powers Splunk Enterprise today.
- Deployment Architecture: Splunk's deployment follows a true Big Data architecture of master and slave nodes, where the slaves are the search indexers and the master is a search head. Of course, you can run both nodes on the same physical server, but in a true distributed architecture you need a master and a slave. Read more at Big Data – What you need to know? to understand what Big Data is and how to try your hand at it.
- Typical Setup: Let's look at a typical architecture deployment of Splunk in distributed mode.
As you can see, there are three distinct components to this architecture:
- Log collectors, or Splunk Forwarders, are installed close to the log sources and forward all the logs to the Splunk indexers. This is similar to the log collectors in a SIEM. They are not great, but they are decent enough to get the job done.
- The Splunk indexers typically run only the Splunk Daemon service, which receives the data and indexes it based on a pre-defined syntax (akin to parsers, but much simpler and faster to process). The indexed data is then written to the Splunk data store. Each data store holds a set of indexes based on the volume of logs received, and can be configured for retention, hot/warm/cold storage tiers and so on. In big data terminology, these are the slave nodes.
- These indexers then use a process called the "Summarizer" (in big data terms, "MapReduce") to create a summary index across all the indexes available; a sketch of how this works follows this list.
- The Splunk search head, which serves as the single console to search across all data stores, uses the summary index to know which indexer (slave) node to query and which index to query. This is where the scalable search power of Splunk comes from. In the big data world, this is the master node.
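To make the summary index idea a little more concrete, here is a minimal SPL sketch, assuming a hypothetical web access log setup; the index names (web, summary_web), sourcetype and field names are made up for illustration. A scheduled search rolls raw events up into a summary index:

```
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| stats count AS hits BY host, status
| collect index=summary_web
```

Reports can then run against the much smaller summary instead of the raw events:

```
index=summary_web
| stats sum(hits) AS total_hits BY host
```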
What’s good about Splunk?
- Search, Search & Search: Splunk is arguably the best search engine for logs out there. We have started looking at ELK, Hadoop and other big data search engines, but for the moment Splunk rules the roost. The Splunk Search Processing Language (SPL) is the reason behind this power (see the example after this list). Searches can be run historically (on indexed data) or in real time (on data before indexing), and this is as good as log search can get. None of the SIEM products come close to the search power of Splunk. In other words, Splunk is to log data what SIEM is to event data.
- Fully customizable as far as search is concerned: Splunk lets us add scripts to search queries, provides field extraction for custom logs, and offers API, SDK and web framework support to achieve everything you would need for log management, investigations, reporting and alerting.
- Web Interface: Even though the UI is a subjective benefit, Splunk has one of the most pleasing interfaces we have seen in log management tools. It really is super easy and intuitive to use, with great visualization capabilities, dashboards, app widgets and whatnot. It puts the cool factor into a rather dull log analysis experience.
- No Parsing: Basically, Splunk is an "all you can eat" buffet for logs. It follows a "store now, parse later" approach, accepting any logs thrown at it without parsing or support issues. If it is a known log type, the indexes are added and updated appropriately; if it is not, the logs are still stored and indexed so they are searchable later. You can then use field extractions to build custom field parsing. This is one of the killer differentiators compared to traditional SIEM products: Splunk is a lot more forgiving and agnostic in log collection and storage, and does not require specialized connectors or collectors to do the job. This makes it a great log management product.
- Splunk Apps: Apps build on top of the search head to provide parsing, visualizations, reporting, metrics, saved searches, alerting and even SIEM-like capabilities. This, in my opinion, is the power of Splunk compared to other products in the market. There is an app store for Splunk Apps. Cool, isn't it? These apps are written not only by product vendors but also by the user community.
- Scalability: Splunk is a true big data architecture. It scales with the addition of indexers and search heads, and the ratio of search heads to indexers is a good 1:6, meaning that with one search head you can have six indexers. This is very attractive compared to other SIEM solutions in the market when it comes to scaling the log management layer.
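As a flavour of the SPL search power and field extraction mentioned above, here is a small illustrative search. The index, sourcetype, log format and field names are all assumptions for the sketch, not from any real deployment: it extracts source and destination IPs from raw firewall logs on the fly and reports the top talkers.

```
index=firewall sourcetype=syslog "denied"
| rex "src=(?<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+dst=(?<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
| stats count BY src_ip, dst_ip
| sort -count
| head 10
```

The same rex pattern can be saved as a permanent field extraction, so the fields become available to every search, report and dashboard without re-typing the regex.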
What’s bad?
- Not a SIEM: Splunk is not your traditional SIEM. Let me clarify further: a SIEM has several things in it that assist in performing security event management, monitoring, operations and workflow. In short, the keyword for SIEM is "Operational Security Management". Now the question is: can Splunk be a SIEM? The simple answer is yes, but the real answer lies in how much customisation and how much product expertise you have in store to make it a SIEM product.
- Poor Correlation: Splunk does not do any correlation out of the box, as it is not designed for that. However, it can be used to correlate events using the Splunk search language: you can do manual correlation with piped searches, lookup tables, scripted searches and so on (see the sketch after this list), but you need to be familiar with the language. You can also automate it with scheduled and real-time search triggers. Still, nothing comes out of the box. Anton blogs about Splunk correlation being far superior to ArcSight's (which, by the way, is the best correlation engine we have worked with), but honestly, we don't have real-life implementation experience to justify that.
- SIEM App: Splunk has an enterprise SIEM app that aids in SIEM-like functions, but it is definitely not a killer replacement for a SIEM product. It is very basic and does not do much out of the box.
- No Aggregation: Logs sent to Splunk are received as-is and written to the data store without aggregation. While this is a good thing for log collection and search performance, it is not good for underlying storage sizing. SIEM solutions have this capability, but Splunk does not, and this in turn affects scalability.
- Poor Compression: Many SIEM products achieve a compression ratio of 10:1. For Splunk, we have consistently seen around 4:1. While acceptable for smaller log volumes, this is very poor for larger ones. The main reason is that the indexes take up a lot of storage on top of the raw logs; while they enable greater search capabilities, they increase underlying storage and maintenance costs.
- Scalability: Even though scalability is one of the benefits of using Splunk for log management, there is a downside too. Add the lack of aggregation and compression, and you can see how it impacts scale. For example, each indexer can handle only 100–150 GB/day on good server hardware. In spite of what people might say about Splunk sizing and performance tuning, from years of personal use and experience we can safely say that for standard enterprise hardware this limit is as good as it gets. So assume you are looking at 1 TB/day: you would need 8 indexer servers and 2 search head servers for Splunk. If you were to take ArcSight or QRadar, however, you could do the same on two appliances with compression enabled (at a 10:1 compression ratio). From a management perspective, this leads to a larger footprint for Splunk than for other SIEM products.
- Price: Contrary to popular belief, Splunk can get very expensive very fast. For all the reasons mentioned above, it can cost far more than other SIEM vendors for large-scale data collection as well as SIEM functionality. In short: Be Cautious!!!
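To illustrate the kind of manual, search-driven correlation described under "Poor Correlation" above, here is a minimal SPL sketch. The index, sourcetype and field names (auth, action, src_ip, user) are hypothetical; the idea is to flag sources with several failed logins followed by a success, something a correlation engine would ship as a packaged rule:

```
index=auth sourcetype=linux_secure (action=failure OR action=success)
| stats count(eval(action=="failure")) AS failures,
        count(eval(action=="success")) AS successes BY src_ip, user
| where failures > 5 AND successes > 0
```

Saved as a scheduled search with an alert action, this behaves like a basic correlation rule, but you have to build and maintain every such rule yourself.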
Conclusion: In our opinion, Splunk is one of the most innovative log management tools out there. But as a SIEM, for day-to-day security management, monitoring, ticketing and so on, it has a lot of catching up to do. The ideal scenario is to use Splunk at the log management layer and a market-leading SIEM at the correlation, workflow and operational management layer. We have seen several successful implementations where Splunk serves as the log management tool and ArcSight or QRadar serves as the correlation engine. Best of both worlds!!!
Until next time – Ciao!!!
PS: Please feel free to add to the lists of what's good and bad based on your experience with Splunk, and we will be happy to update our post appropriately.