Category Archives: What you need to know?

Evaluating SIEM – What you need to know?


We at Infosecnirvana have published several posts on SIEM. As a product, SIEM has carved out a unique place in the IT Defense in Depth strategy and has helped many organizations detect and respond to security threats effectively and meet compliance needs quickly. Such an important product in the security space also carries a steep price, not only in dollar terms but also in the ongoing human effort needed to manage, maintain and generate value from it. It is therefore paramount that the right choice is made when selecting a SIEM. This blog post offers a set of product evaluation criteria, framed as questions customers should ask when evaluating a SIEM. The guide takes a vendor-agnostic approach.


Before selecting a SIEM vendor, we need to make sure that the product meets certain selection requirements. Often, IT organizations have only a vague idea of what they require from a SIEM and lack a solid understanding of the parameters to consider when selecting a product. The key requirement areas are as follows:

  1. Company & Product
  2. Architecture
  3. Installation & Configuration
  4. Event Collection
  5. Event Storage
  6. User Interface & User Experience
  7. Certifications & Training
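
One way to turn the requirement areas above into a comparable number per vendor is a weighted scorecard. The following is a minimal sketch in Python; the weights, the 0 to 5 rating scale and the vendor names are illustrative assumptions, not recommendations from this post.

```python
# Requirement areas from the list above. The weights are hypothetical and
# should be replaced with your organization's priorities; they sum to 1.0.
WEIGHTS = {
    "Company & Product": 0.15,
    "Architecture": 0.20,
    "Installation & Configuration": 0.10,
    "Event Collection": 0.20,
    "Event Storage": 0.10,
    "User Interface & User Experience": 0.20,
    "Certifications & Training": 0.05,
}


def weighted_score(ratings):
    """Return the weighted average of per-area ratings (each 0 to 5)."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[area] * ratings[area] for area in WEIGHTS)


def rank_vendors(all_ratings):
    """Return (vendor, score) pairs sorted from best to worst."""
    scores = {v: weighted_score(r) for v, r in all_ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up ratings for two placeholder vendors.
    example = {
        "Vendor A": dict.fromkeys(WEIGHTS, 3),
        "Vendor B": {**dict.fromkeys(WEIGHTS, 3), "Architecture": 5},
    }
    for vendor, score in rank_vendors(example):
        print(f"{vendor}: {score:.2f}")
```

The individual questions in each section below can be rated the same way and averaged into the per-area rating before the weights are applied.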

Company & Product:

A SIEM product, or any other software product for that matter, is only as good as the company that develops it. This is key because a company that is stable and has a long-term road map focuses better on product development and on building expertise. Hence, evaluating the company also becomes important when buying the product. Some of the key items to look for are:

  • Industry Focus, Market presence, Years of experience in the field
  • Financial Performance – Subjective Measurements over the past years
  • Marketplace opinions and reviews about the product and the company as a whole.
  • Do they have customer references for both the product as well as the company?
  • Analyst Reports for the last few years – Gartner, Forrester etcetera.
  • Know the Executive Leadership of the Company. Is the leadership team strong? Is it Trustworthy?
  • Do they have a track record of successful product launches, revisions, development etcetera?
  • Licensing and Pricing models
  • How strong is their product road map?
  • $$$$ spent on R&D for new products VERSUS Development on incremental growth of core product.
  • Is the vision of the product group forward thinking? Are they innovative?
  • How is the Company’s product support and services group? Is it a dedicated team in house or is it outsourced?
  • What are the product support & professional services options available? How focused is the management team in providing Support services? Is support available globally?
  • Is there expertise across the vendor’s partners (VAR, MSSP, Consulting Organizations) to support both basic and advanced consulting needs? How mature is the partnership or alliance relationship?


Architecture:

One of the key components of a product is the maturity of its architecture. The product should be capable of catering to IT infrastructure needs that vary from industry to industry and from enterprise to enterprise. Some of the key questions related to Architecture are listed below:

  • How flexible is the product deployment architecture? Can it be run as an Appliance, a Software standalone, a virtual appliance/machine or a SaaS?
  • Can the architecture be deployed in a way where individual data storage capability is available per business unit/location?
  • Does the architecture allow for full data replication for HA purposes? Is HA a built-in function or is additional equipment required?
  • Does the architecture allow for interoperability with Network Management devices, System Management devices etcetera?
  • Does the architecture support scalability? Is it modular enough to expand based on growth needs, storage needs and performance needs?
  • Does the product support granular Role based Access control for the underlying hardware & application software?
  • Does it meet the organization’s policy and standards compliance requirements?
  • Does the product have a secure data transmission between Event Collection, Event Storage and Event Correlation layers? Does it use encryption? If so, how strong?

Installation & Configuration: 

  • Can the Installation and Initial Configuration be handled by technical staff with minimal training?
  • Ease of set-up, Maturity of Product Documentation and Support to facilitate this effort?
  • Ease of post install maintenance, patching, routine tuning?
  • Ease of patch management of the product including the underlying data architecture.
  • How does the product or solution facilitate asset tracking?
  • From a log collection perspective, who are the supported Vendors, what Products and Versions are supported for integration?
  • How varied and comprehensive is the Data export feature (extract logs, alerts, raw data etcetera.)? Does it support CSV, PDF, HTML, Raw text etcetera?
  • Data Workflow Integration (Bidirectional access to information via external workflow tools?)
  • Email interface for report distribution, ease of customization of the email templates
  • Interface to 3rd party applications (ticketing/workflow application, existing business logging solutions etcetera.)

Event Collection:

  • Does the product have support for both Agent based Collection and Agent-less Collection?
  • For Agent systems, does the solution support Windows, Unix and Linux Platforms, File readers, XML readers, Structured and unstructured data etcetera?
  • For Agent-less, does the solution support Syslog, SNMP, SQL, ODBC/JDBC and API collection?
  • Is the Agent management function centralized or is it standalone?
  • Does it have any limitations in Input and Output Events Per Sec (EPS)?
  • Does it offer the following capabilities to ensure reliability and flexibility?
    1. Aggregation – Can the Agent aggregate similar information based on custom defined grouping values defined by the System Administrator to cater to the changing Event Collection requirements?
    2. Bandwidth Throttling – Can the Agent prioritize forwarding of events based on defined values such as event priority? Can it send events at a specific bandwidth rate?
    3. Filtering – Can the agent provide Include as well as Exclude criteria for filtering?
    4. Caching – Can the agent cache all the events in the event that the Log Store goes down? When forwarding the cache after failure does it intelligently throttle the events?
    5. Fail-over Capabilities – Can the agent send Log events to a different alternate data store when the principal data store is down? Can it do multiple destination forwarding?
    6. Transport Integrity – Can the agent encrypt the log transport to ensure confidentiality? What compression and encryption mechanisms are used?
    7. Health Monitoring – Can the agent send health messages and statistics?
  • For Microsoft Windows Event collection can the agent map the GUID/SUID to local registry/names/references for each event ID in the SYSTEM, SECURITY and all APPLICATION logs on the system?
  • Are Agents that rely on Event Source Vendor API’s to connect and collect information approved and/or certified by that event source Vendor?
  • Can the Agent follow dynamically changing folders and file names? For example in order to support event sources like IIS Web Logs or custom applications that create a log per “site/application” per logging interval?
  • Does the Agent support Database Administrator Logging (From both SQL and System / File Based Sources) for Oracle, MSSQL, MySQL and DB2?
  • Agent parsing and mapping customization. Can the agent’s parsing be modified to assist with custom log messages? Can the normalization or categorization schema be updated to support custom log messages? How is system default functionality affected if these are modified?
  • Can the Agent act as a NTP source for Source Event Logs or otherwise help in time synchronization for source event logs?
  • Time difference adjustment feature (to allow the logging system to cope with devices having inconsistent times)
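
To make the filtering, aggregation and caching questions above more concrete, here is a toy collection agent in Python. It is only a sketch: the event fields (`source`, `event_id`, `priority`), the class name `ToyAgent` and the `send` callback are hypothetical and do not describe any vendor’s agent.

```python
from collections import Counter, deque


class ToyAgent:
    """Toy log agent: filter, aggregate, forward, and cache on failure."""

    def __init__(self, send, include=None, exclude=None, cache_size=1000):
        self.send = send                  # callable that forwards one record
        self.include = include            # predicate: keep only matching events
        self.exclude = exclude            # predicate: drop matching events
        self.cache = deque(maxlen=cache_size)  # oldest records drop when full

    def _wanted(self, event):
        if self.include is not None and not self.include(event):
            return False
        if self.exclude is not None and self.exclude(event):
            return False
        return True

    def aggregate(self, events, group_by=("source", "event_id")):
        """Collapse events that share the grouping fields into one record."""
        counts = Counter(
            tuple(e[f] for f in group_by) for e in events if self._wanted(e)
        )
        return [
            {**dict(zip(group_by, key)), "count": n} for key, n in counts.items()
        ]

    def forward(self, records):
        """Send cached records first, then new ones; cache whatever fails."""
        pending = list(self.cache) + list(records)
        self.cache.clear()
        for i, record in enumerate(pending):
            try:
                self.send(record)
            except ConnectionError:
                # Destination is down: keep this record and the rest for later.
                self.cache.extend(pending[i:])
                break
```

When evaluating a real product, the questions to ask are how each of these behaviours is configured, what happens when the cache fills up, and how quickly the backlog is replayed once the Log Store returns.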

Event Storage:

  • Does the product allow storage of data locally, remotely in a SAN or NAS?
  • Is the data storage capable of compression? If so what is the rate of compression?
  • Is the storage architecture dependent on standard database or does it use proprietary architecture? If proprietary, does it have all the capabilities to meet storage security requirements?
  • Is Data Archival flexible? If so, is it built-in? What options does the product have?
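
The compression question above feeds directly into storage sizing. As a rough illustration, the arithmetic can be sketched as follows; the sustained EPS, the average event size, the retention period and the compression ratio are all assumed inputs that you would replace with measured or vendor-quoted figures.

```python
SECONDS_PER_DAY = 86_400


def storage_needed_gib(eps, avg_event_bytes, retention_days, compression_ratio):
    """Estimate on-disk storage in GiB for a sustained event rate.

    compression_ratio is raw size divided by stored size, e.g. 10 for 10:1.
    Index and replication overhead are deliberately ignored in this sketch.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bytes = eps * avg_event_bytes * SECONDS_PER_DAY * retention_days
    return raw_bytes / compression_ratio / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical inputs: 1000 EPS, 500-byte events, 90 days, 10:1 compression.
    print(f"{storage_needed_gib(1000, 500, 90, 10):.1f} GiB")
```

Running the same calculation with the compression rate each vendor quotes makes the storage cost difference between products easy to compare.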

User Interface & User Experience:

  • Is the interface user friendly or technical? Is the interface a standalone client console or a web console?
  • How is the performance of the User Interface?
  • Performance when searching for various data elements (IP addresses, usernames, event types, etcetera)
  • Performance when generating reports, query results, data extracts?
  • Will the product function to support the needs of the Tier based SOC Analysts, Incident Handlers, Responders?
  • Can access to data in the system be restricted according to access rights (i.e. business units can see “only their data”)?
  • Can the console present just the events a particular analyst is assigned to handle?
  • Does the interface allow easy access to actionable data? Does the interface organization require a steep learning curve? Can the analyst drive deeper analysis, directly or via tools, with a single action (right click and select)?
  • Is the data presented in a manner that makes sense to the analyst?
  • Can the analyst understand correlation actions?
  • Can the analyst easily change correlation actions?
  • Could the product act as an incident management tool, accepting case notes, and related information on an incident?
  • Does the Solution provide graphical Business reporting using visual aids, graphics, dashboards, template based documents etcetera?
  • How mature is the reporting capability? Does it match or exceed requirements?
  • Ease of development of new reports, customization of existing reports, tuning of generated reports, scheduling reports etcetera.
  • How easy is the accessibility to internal, centralized log sources, in normalized and raw form in case of reporting needs?
  • How customizable is the reporting query?
  • Does it have compliance packages to aid in compliance reporting?
  • Does the product allow users to perform Advanced Analysis – Statistical, Visual, Mathematical, empirical?
  • Does the product support the most difficult use-cases for correlation? (Multi Vendor, Multi-Event, Custom Application and Custom field correlation)
  • Can the system support “live”, “custom”, or “dynamic” threat feeds for live correlation and alerting? Threat Intelligence Feeds such as IP’s, Subnets, Domain, Files, Patterns, etcetera.
  • Does the concept of a “Hot-list” or comparison list exist? Automated Hot-list Trigger. A Hot-list can be a watch list or any other static form of data that can be used as a reference point.
  • Are Hot-list updates manual or automated?
  • Customizable real-time alerting based on specified criteria
  • Distributed search across multiple data stores
  • Functionality to initiate certain actions based on real-time alert (sending email/text message, executing script etcetera)
  • Is the software designed to track user input to understand how users interact with the system? Are there objective measures of feature use (and misuse)? How is this information mined? Is there any privileged user management content to audit and track privileged access?
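
The hot-list and threat feed questions above boil down to comparing event fields against reference data. A minimal sketch of that comparison, using only the Python standard library, is shown below; the field names `src_ip` and `dst_ip` and the sample entries are assumptions for illustration (the addresses used in the usage note come from documentation-reserved ranges).

```python
import ipaddress


class HotList:
    """Watch list of single IPs and subnets that events are compared against."""

    def __init__(self, entries):
        # A bare address parses as a single-host network; strict=False also
        # tolerates entries with host bits set, such as "203.0.113.5/24".
        self.networks = [ipaddress.ip_network(e, strict=False) for e in entries]

    def matches(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks)


def alerts_for(events, hot_list, fields=("src_ip", "dst_ip")):
    """Return the events where any of the given fields hits the hot list."""
    return [
        e for e in events
        if any(f in e and hot_list.matches(e[f]) for f in fields)
    ]
```

For example, `HotList(["198.51.100.7", "203.0.113.0/24"])` would flag an event whose `dst_ip` is `198.51.100.7`. A product with automated hot-list updates would rebuild this reference data from a feed instead of a hand-maintained list.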

Certifications & Training:

  • Training options for the product – Classroom? On-line? Mixed?
  • Certification path and criteria
  • Continuous training options available for new product releases, feature releases etcetera?

Conclusion: Phew!!! That is a long list of things to consider for SIEM evaluation, and I still feel there are many items missing from it. As mentioned at the beginning, this post is a guide to performing a SIEM evaluation and should be a great starting point for creating a checklist, floating a tender, requesting proposals and the like. Please feel free to add to it in the comments section or send me a message if you feel any more items need to be added.

Until next time!!! Ciao

Punching Hard – QRadar Security Intelligence Platform


Of late, at Infosecnirvana, we have been looking beyond ArcSight Enterprise Security Platform (ESP) to see whether any other SIEM products challenge, match or exceed its capability. One product that has caught our attention in recent times is the QRadar Security Intelligence Platform from Q1 Labs, which IBM acquired. IBM completed this purchase in 2011 and used it to jump-start its Security Systems Division, giving it a platform to compete against HP, which had jump-started its own Enterprise Security Products group by buying ArcSight in 2010. Both are competing hard in the marketplace and vying for the top spot, as evidenced in numerous SIEM vendor analyses and reports.

Gartner reports are something every company looks at before investing in a SIEM solution. What caught our attention about QRadar is how consistently it has climbed within the Leaders Quadrant for SIEM. Let’s take a look at the last 3 years of the Gartner Magic Quadrant to get an idea of QRadar’s rapid climb against ArcSight.


Looking at the graph more closely, even McAfee Nitro and Splunk are catching up in the Leaders Quadrant. However, in this post we will concentrate only on Q1 Labs QRadar, as it is by and large the biggest threat to ArcSight in terms of technology and capability, not to mention market share.

First things First:

The QRadar Integrated Security Solutions (QRadar) Platform is an integrated set of products for collecting, analysing, and managing enterprise Security Event information. The various components that are part of this Platform are:

  • QRadar Log Manager – log management solution for Event log collection & storage.
  • QRadar SIEM – Correlation engine
  • QRadar VM – Vulnerability scanner and management tool set available to integrate Event data to Vulnerability data. This provides on demand scans, rescans and vulnerability tracking.
  • QRadar QFlow – Network Behaviour Analysis & Anomaly detection using network flow data. QFlow provides payload information (up to Layer 7) in every detected event, which is a great value addition to Netflow data.
  • QRadar vFlow – Application Layer monitoring for both Physical & Virtual environment.

Key Strengths of QRadar: A few of the things that blew us away when we played around with IBM QRadar were:

  • Easy Setup – It was a breeze to install the product. There are few or no moving parts in the installation process. The console is Web based and fully functional. From a deployment and operations perspective, this comes across as a super easy, super quick solution to SIEM needs.
  • Value Out of the Box – QRadar comes packed with a lot of content Out of the box to get up and running. The Dashboards are already built for you, more than 1500 reports are waiting for you to just click and run, rules are categorized nicely under various Threat sections and immediately start firing “Offenses” (Correlation rule triggers are called so in IBM world), Network Flow and Packet data are available instantly under the same unified console when triggers are analysed and so on and so forth. We have never seen such quick turnaround times with any other SIEM product in recent times.
  • Completely Replicated Architecture – Full replication is available in the product and can be enabled with a click. This is something we were really impressed with. In major organisations replication is non-negotiable, and such an easy set up makes a strong case for the product.

Key Weaknesses of the Product: Having been ArcSight users for several years, this section is right up our alley. Some of the key weaknesses we saw with the product are:

  • Scale: In spite of all the ease of set up and value out of the box, scaling up with multiple tiers is a problem when compared against ArcSight. One caveat is that QRadar follows an appliance-based model: you can have several collector appliances, but you can have only one Manager or Console appliance to query them. This severely limits scalability in a multi-tier set up.
  • Multi-Tenancy: ArcSight has always been best suited for a Managed service implementation with its Customer tagging, zoning and overall multi-tenancy architecture. However, this is a big problem when it comes to QRadar. They don’t have such a capability today. However, we believe their product road map does talk about such features in the future, but we will have to bite our nails in anticipation.
  • Customization: One of the things which propelled ArcSight to land major defence and government contracts was its capability to customize almost everything except the core source code. When creating content like Use Cases, Rules, Reports, Third party integration etc., this customization capability comes in handy. Such customization and flexibility is seldom seen in any SIEM product out there. QRadar offers some of this customization, but the moment you go down that route, you will be disappointed by what it lets you do (read: no API).
  • Workflow: Another impressive thing about ArcSight is its content management workflow. It has a full-blown case management workflow, event handling workflow, Use Case workflow etc., whereas QRadar falls short because it has no comparably powerful workflow capabilities. Hopefully IBM will address this in future product releases.

Overall Comparison with ArcSight: ArcSight ESP has by far been the oldest and supposedly the most mature SIEM offering in the market, but honestly it is losing ground because it has not been seriously challenged so far. QRadar does exactly that. The key strengths and weaknesses above should give you an idea of where the product stands.

  • Most customers would love to get QRadar into their environment just for the ease of set up and out of the box value. ArcSight is still a pain to set up and to generate value from. Most failed ArcSight implementations have failed for one simple reason – complexity.
  • QRadar puts a lot of emphasis on a Network security based monitoring approach, whereas ArcSight takes an Identity based security monitoring approach. This is interesting because the Cyber security world is still split about what is key – “Identity based or Network Security based”. In our humble opinion, a mix of both is what really works.

In Conclusion: QRadar is definitely a wonderful product and a worthy competitor to ArcSight as the battle for the top prize plays out. As technology enthusiasts, we are eager to see how the market plays out, but one thing is for sure:

“QRadar Security Intelligence Platform is definitely Punching Hard”.

There you have it!!! Let me know what you guys think about these two products, which one you prefer and why. Comment below.

Big Data – What you need to know?

Big Data is the buzzword in IT circles nowadays. The major reason for this is the exploding “Netizen” base. Today everything happens online, and online data is estimated in zettabytes. The wealth of information one can carve out of online data is undeniably attractive to organizations for marketing and sales. Organizations like Google, Yahoo, Facebook, Amazon etc. process several petabytes of data on a daily basis, and many more are moving towards being able to collect, store and make sense of data on the Internet to further their interests. That is where “Big Data” has caught the imagination of people around the world. But what is Big Data, and how can you jump on this bandwagon? Fret not, for in this blog post you are going to find out all about it. The structure of this post is typical of our “What you need to know?” series. So let’s get started!!!

What is Data?
Data is anything that provides value in a structured or unstructured format. It is the lowest level of abstraction in computing terms because below it there are only binary digits. Data is typically stored in File Systems.

Introducing File Systems
File Systems are the basis of storing and accessing data on a hardware device. A File System is nothing but an abstraction layer of software/firmware that lets you store data in a structured format, remember that structure and, when queried, retrieve the data as quickly as possible. There are 2 major and common types of File Systems – Disk based (local access) and Network based (remote access). To give a simple example, FAT is a Windows Disk based File System whereas NFS is a Network based File System.

Even though both the file systems continue to dominate IT space, more and more relevance is given to Network based File Systems for obvious reasons like Distributed Data storage, redundancy, fault tolerance capabilities etc. This is the basis of “Big Data Tools and Technologies”.

Introducing DFS
Distributed File Systems are Network based File Systems that allow data to be shared across multiple machines across multiple networks. This makes it possible for multiple users on multiple machines to share files and storage resources. The client machines don’t have direct access to the Storage disk itself (as in a Disk based file system), but are able to interact with the Data using a File System protocol. One classic example of DFS is Microsoft SMB where All Windows machines are SMB Clients and access a common SMB Share on the File Server. But SMB suffers from issues pertaining to scalability and fault tolerance. This is where systems like Google File System – GFS (Google uses this in their search engine) and Hadoop Distributed File System – HDFS (Yahoo and others) come into prominence. What these File Systems do is provide a mechanism to effectively manage big data collection, storage and processing across multiple machine nodes.

Introducing HDFS:

Hadoop Distributed File System, or HDFS for short, is similar to the other Distributed File Systems discussed above, but it is significantly different as well. HDFS can be deployed on commodity hardware, is highly fault tolerant and is very capable of handling large data sets. HDFS was originally developed as part of the Apache Nutch project, an alternative search engine akin to Google. Some of the most prominent software players for HDFS are “Apache Hadoop”, “Greenplum”, Cloudera etc.

In this post, we will be looking at Log Collection and Management using the Hadoop Platform.

APACHE Hadoop: The Apache Hadoop architecture in a Nutshell consists of the following components:

  • HDFS is a Master Slave Architecture
  • Master Server is called a NameNode
  • Slave Servers are called DataNodes
  • Underlying Data Replication across Nodes
  • Interface Language – Java
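
As a rough mental model of the components above, the following toy Python simulation splits a file into fixed-size blocks and records, NameNode-style, which DataNodes hold each replica. It is not Hadoop code: the class name, the round-robin placement and the tiny block size are simplifying assumptions (real HDFS uses far larger blocks and rack-aware placement).

```python
from itertools import cycle


class ToyNameNode:
    """Keeps only metadata: which DataNodes hold which block of which file."""

    def __init__(self, datanodes, replication=3, block_size=4):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = {name: {} for name in datanodes}  # name -> {block_id: bytes}
        self.replication = replication
        self.block_size = block_size
        self.block_map = {}            # filename -> [(block_id, [datanode, ...])]
        self._next_node = cycle(datanodes)

    def put(self, filename, data):
        """Split data into blocks and store each block on several DataNodes."""
        blocks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        self.block_map[filename] = []
        for n, block in enumerate(blocks):
            block_id = f"{filename}#{n}"
            # Round-robin choice of distinct nodes for the replicas.
            targets = [next(self._next_node) for _ in range(self.replication)]
            for node in targets:
                self.datanodes[node][block_id] = block
            self.block_map[filename].append((block_id, targets))

    def get(self, filename, failed=()):
        """Reassemble a file, skipping DataNodes listed as failed."""
        out = b""
        for block_id, holders in self.block_map[filename]:
            alive = [h for h in holders if h not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            out += self.datanodes[alive[0]][block_id]
        return out
```

With three DataNodes and two replicas per block, a file can still be read in full after any single DataNode is marked as failed, which is the fault tolerance that replication buys.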

Installing Hadoop: Installation of Apache Hadoop is not a very easy task, but at the same time it is not too complex either. Understanding of the Hardware Requirements, Operating System Requirements and Java Programming Language can help you install Apache Hadoop without any issues. Installing Hadoop can be either a Single Node Installation or a Cluster Installation. For this post, we will look at only Single Node Installation steps:

  1. Install Oracle Java on your machine – Ubuntu
  2. Install OpenSSH Server
  3. Create a Hadoop Group and Hadoop User and set Key Based Login for SSH
  4. Download the Latest Distribution of Hadoop from
  5. Installation is just extracting the Hadoop files into a folder and editing some property files
  6. Provide the location for the JAVA home in the following file location- hadoop/conf/
  7. Create a working folder in Hadoop User Home Directory /home//tmp
  8. Add the relevant details about the host and the home directory to the following configuration elements in /hadoop/conf/core-site.xml
    conf/core-site.xml:
    hadoop.tmp.dir – A base for other temporary directories.
    fs.default.name – The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI’s scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI’s authority is used to determine the host, port, etc. for a filesystem.
  9. Then edit hadoop/conf/mapred-site.xml using a text editor and add the following configuration values (as with core-site.xml)
    conf/mapred-site.xml:
    mapred.job.tracker – The host and port that the MapReduce job tracker runs at. If “local”, then jobs are run in-process as a single map and reduce task.
  10. Open hadoop/conf/hdfs-site.xml using a text editor and add the following configuration:
    conf/hdfs-site.xml:
    dfs.replication – Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
  11. Before running the Hadoop installation, the most important step is to format the NameNode or Master Server. This is critical because without the NameNode, the DataNodes will not be set up. In a Single Node Installation, the NameNode and DataNodes reside on the same host, whereas in a Cluster Installation they reside on different hosts. To format the NameNode, run the following command – /hadoop/bin/hadoop namenode -format
  12. In order to start the Hadoop Instance, from hadoop/bin run ./ and Running the commands will start up Hadoop and when you query the Java Process, you should be able to see the following components of Hadoop Running:
  13. If you have successfully completed all the steps up to this point, you now have a Hadoop Single Node Instance running on your machine.
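
For reference, the three property files edited in steps 8 to 10 share one simple XML layout. The Python sketch below writes them with the standard library. The property names are the Hadoop 1.x ones that the descriptions in those steps belong to; every value (the temporary directory, `localhost`, the ports and the output folder) is a placeholder assumption that you must adapt to your own host and user.

```python
import os
import xml.etree.ElementTree as ET

# Placeholder values for a single-node setup; adjust them for your machine.
SITE_FILES = {
    "core-site.xml": {
        "hadoop.tmp.dir": "/tmp/hadoop-example",
        "fs.default.name": "hdfs://localhost:9000",
    },
    "mapred-site.xml": {
        "mapred.job.tracker": "localhost:9001",
    },
    "hdfs-site.xml": {
        "dfs.replication": "1",
    },
}


def build_configuration(properties):
    """Build a <configuration> element holding <property> name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def write_site_files(conf_dir):
    """Write the three site files into conf_dir and return their paths."""
    os.makedirs(conf_dir, exist_ok=True)
    paths = []
    for filename, properties in SITE_FILES.items():
        path = os.path.join(conf_dir, filename)
        tree = ET.ElementTree(build_configuration(properties))
        tree.write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Writing to a scratch folder first and diffing against the files shipped in the distribution is a safe way to check the result before copying anything into hadoop/conf.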

Getting Data in/out of Hadoop:

Once the installation is completed, the next thing to worry about is getting data in and out of the Hadoop File System. Typically, getting data into the system requires an API interface into HDFS, usually a Java or HTTP API. Tools like FluentD, Flume etc. help in getting data in and out of Hadoop. Both tools have plugins for receiving HTTP data, streaming data and Syslog data as well.

MapReduce: Hadoop and Big Data discussions are incomplete without talking about MapReduce. MapReduce is a programming framework that processes input data in two separate stages and works on key-value pairs throughout. The Map task takes one split (chunk) of the input and emits intermediate key-value pairs; the Reduce task then combines all the values that share a key into the final output. This framework is the powerhouse of Hadoop because it is built with parallelism in mind: Map tasks and Reduce tasks can both run in parallel on several machines, spreading the CPU and memory load across the cluster. In the Hadoop release described here, a central JobTracker schedules and tracks the Map and Reduce tasks, while TaskTrackers, which usually run on the same machines as the DataNodes, provide the processing resources. The NameNode only keeps track of where the data lives.
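
The map and reduce phases are easiest to see on the classic word count problem. The sketch below imitates the flow in plain Python on one machine; it does not use the Hadoop API, and the function names are our own.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here we just loop.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both stages can be spread across machines, which is where the parallelism comes from.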

Finally, Using Hadoop: Now that we know what drives Hadoop and how to get it installed, the easiest thing would be to start using it. Several examples of MapReduce jobs written in Java are available to aid in learning. There are also several related projects that aim to make the Hadoop ecosystem more scalable and mature. Some of them are:

  • HBase, a Bigtable-like structured storage system for Hadoop HDFS
  • Apache Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.
  • Hive, a data warehouse infrastructure which allows SQL-like ad hoc querying of data (in any format) stored in Hadoop
  • ZooKeeper is a high-performance coordination service for distributed applications.
  • Hama, a Google Pregel-like distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.
  • Mahout, scalable Machine Learning algorithms using Hadoop

Conclusion: We hope this post helped you understand the basic concepts of Big Data and set up a Hadoop Single Node Installation to play with. Please do post your thoughts on how Big Data is playing a major role in your organisations.