Argus fills a huge niche in many enterprise security systems by supporting audit for the network. Network audit was recognized as the key to trusted network enterprise security in the Red Book (NCSC-TG-005), published in 1987. Network Audit is completely different from other types of security measures, such as firewalling (mandatory access control) or IDS/IPS, where a system generates a notification or alarm when something of interest happens. The goal of Network Audit is to provide accountability for network use. While the Rainbow Series is no longer in the contemporary security spotlight, the concepts it developed are fundamental to trusted computing.
If done well, Network Audit can enable a large number of mechanisms: situational awareness, security, traffic engineering, accounting and billing, to name a few. Audit is effective when you have good audit data generation, collection, distribution, processing and management. Argus and argus's client programs provide the basics for each of these areas, so let's get started.
Effective network audit starts with sensor choice and deployment (coverage). Sensor choice is crucial, as the sensor needs to reliably and accurately convey the correct identifiers, attributes, time and metrics needed for the audit application. Not every audit application needs every possible metric, so there are choices. We think argus is the best network flow monitor for security. Some sites have complete packet capture at specific observation points, while others have Netflow data at key points in the network. We're going to stay with argus as the primary sensor for this discussion.
Network-based argus deployment depends on how you insert argus into the data path. Argus has been integrated into DIY open-source based routers (Quagga, OpenWRT), and that can provide you with a view into every interface. But most sites use commercial routers.
The predominant way that sites insert argus into the network data path is through router/switch port mirroring. This involves configuring the device to mirror both the input and output streams of a single interface to a monitor port. There are performance issues, but for most sites this is more than adequate to monitor the exterior border of the workgroup or enterprise network. Using this technique, argus can be deployed at many interfaces in the enterprise. But when building a network audit system, there is a need for some formalism in your deployment strategy, to ensure that the audit system sees all the traffic (ground truth). If you want to audit everything from the outside to the inside, you need a sensor on every interface that carries traffic in or out. If you miss one, then you don't have a comprehensive audit, you have a partial audit, and partial audits are not ultimately successful for security.
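As a minimal sketch of a sensor listening on a mirror port (the interface name, source identifier and port below are assumptions, not required values), argus might be started like this:

argus -d -e border-rtr-1 -i eth1 -P 561

Here -i names the interface attached to the monitor port, -e sets the source identifier carried in the records, -P is the port the sensor offers records on, and -d runs argus as a daemon; collectors such as radium() can then attach to port 561.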
Argus has been successfully deployed to most end systems (computers, and some specialty devices, such as Ericsson's ViPr video conference terminal), and this type of deployment strategy is very powerful for auditing the LAN. Many computers support multiple interfaces, so when deploying in end systems for audit purposes, be sure to monitor all the interfaces, if possible.
For most sites, the audit data will cover the enterprise/Internet border, but for some, it will involve generating data from hundreds of observation points. To collect the sensor data to a central point for storage, processing and archival, we use the program radium(). Radium() can attach to hundreds of argus and Netflow data sources, and distribute the resulting data stream to hundreds of programs of interest. Radium() can simply collect and distribute "primitive" data (unmodified argus sensor data), or it can process the primitive data that it receives, correcting for time, filtering, and labeling the data.
Using radium(), you can build a rather complex data flow framework that collects argus data in near realtime. You can also use radium() to retrieve data from sensors, or other repositories, on a scheduled basis, say hourly or daily. Radium() provides the ability to deliver files from remote sites on demand. A simple example of how useful this is, is the case where you deploy argus on a laptop and store the argus data on the laptop's native file system. Assuming the laptop leaves campus or corporate headquarters for a few days, when it returns, the radium() running on the laptop can serve up files from its local repository when asked by an authorized collector.
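As a sketch of the simplest collection case (the hostnames and ports are placeholders), a central radium() can attach to several remote argus sensors and re-serve the combined stream on its own port:

radium -d -S argus1.example.com:561 -S argus2.example.com:561 -P 562

Clients such as rasplit() or rasqlinsert() would then connect to port 562 on the collection host; more elaborate setups are usually driven from a radium.conf file rather than the command line.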
For Incident Response organizations, where you receive packet files as a part of the investigation of an incident, you can use rasplit() to insert the argus data you generate from the packet files into an incident-specific audit system. By giving each origination site its own ARGUS_MONITOR_ID, and using that unique ID when creating the argus records, you can build a rich incident network activity audit facility to assist in incident identification, correlation, analysis, planning, and tracking.
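As a hedged example of that workflow (the pcap name, source identifier and incident path are hypothetical), argus can read the packet file directly and rasplit() can file the resulting records into an incident archive:

argus -r site1-capture.pcap -e site1 -w - | rasplit -M time 5m -w /incidents/2012-0042/primitive/$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S

Repeating this per originating site, each with its own -e value, keeps the records separable by source inside the one incident repository.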
Audit information needs to be stored for processing and historical reference. Argus-clients support two fundamental strategies for storage: native file system support and MySQL. There are advantages and limitations to both mechanisms, and you can use both at the same time to get the best of both worlds.
Whether you are collecting a little data every now and then, collecting huge amounts of data (> 10GB/day), or collecting from a lot of sensors, a native file system repository is a good starting point. A native file system repository is one where all the collected "primitive" data is stored in files in a structured hierarchical system. This approach has a lot of advantages: ease of use, performance, familiar data management utilities for archiving, compressing and removal, and so on, but it lacks a lot of the sophisticated data assurance features you find in modern relational database management systems.
The best programs for establishing a native file system repository are rasplit() or rastream().
rasplit -M time 5m -w /argus/archive/primitive/$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S -S argus.data.source
Run in this fashion, rasplit() will open and close files as needed to write data into a probe-oriented, time-structured file system, where each file holds data for a 5 minute time period. Five minutes is chosen because it generates 288 files in each daily directory, which is good for performance on most Unix filesystems; many have serious performance problems when the number of files in a directory gets too big. If you don't like 5 minutes, and a lot of people do change this number, go to larger chunks, not smaller.
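Reading a day back out of this kind of archive is then just a matter of pointing a client at the right directory. As an illustration (the "sensor1" srcid directory and the filter are assumptions), ra() can read a whole day recursively and apply an ordinary filter expression:

ra -R /argus/archive/primitive/sensor1/2012/03/12 - tcp and port 22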
The program rasqlinsert() allows you to insert "primitive" argus data directly into MySQL database table(s). We use this very successfully when the number of flows is < 20M flows/day, which covers most workgroup systems and many, many enterprise networks.
rasqlinsert -m none -M time 1d -w mysql://user@localhost/argusData/argus_%Y_%m_%d -S argus.data.source
This appends argus data, as it's received, into MySQL tables in the argusData database that are named by year_month_day. The table schema that is used has ASCII columns plus the actual binary record in each row that is inserted. The average size of an entry is 500 bytes in this configuration, so 20M flows/day will result in 10GB daily tables.
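To sanity-check what the inserter has created, the standard mysql client is enough (the account and database names follow the example above):

mysql -u user argusData -e "SHOW TABLES LIKE 'argus_2012%'"

Each listed table corresponds to one day of collected primitive data.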
Now, if you want to read data from this repository, you need to specify a time bounds, which you can do by reading from a specific table, or by providing a time range on the command line. Say you are interested in analyzing the flows seen in the first 20 minutes, from 2 days ago.
rasql -t -2d+20m -M time 1d -r mysql://user@localhost/argusData/argus_%Y_%m_%d
This takes the time range and the table naming strategy, and figures out which table(s) from the local argusData database to read to provide the data you're looking for. You can write these records to a file, or pipe them to other programs, like racluster(), to generate a view that you want. See ra.1 for how to specify time filters.
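As a sketch of that piping (the aggregation and output fields are just one reasonable choice), the same query can feed racluster() directly to produce a per-service view:

rasql -t -2d+20m -M time 1d -r mysql://user@localhost/argusData/argus_%Y_%m_%d -w - | racluster -m saddr daddr proto dport -s stime saddr daddr proto dport pkts bytes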
The argus audit repository establishes an information system that contains status reports for all the network activity observed. All the utility of the Network Audit is extracted from this information system. From a security perspective, there are 3 basic things that the audit supports:
1. Network Forensics
2. Daily Custom Anomaly Detection Reports
3. Daily Operational Status Reports
Network Forensics is a complex analytic process, and we discuss this topic here. Basically, the process involves a lot of searching of the repository, to answer questions like "When was the first time we saw this IP address?", "Has any host accessed this host in the last 24 hours?", "Have we seen this pattern before?", or "When was the first time this string was used in a URL?". Some queries are constrained to specific time regions, but others encompass the entire repository. The type of network audit repository impacts how efficiently this kind of analysis progresses, so a well structured Network Forensics repository will use a combination of native file system and RDBMS support. But for simple queries, such as "What happened around 11:35 last night?", any repository strategy will do very well.
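As an example of the simplest class of search (the address and archive path are placeholders), the "when did we first see this IP?" question against a file system repository is just a filtered read of the archive, constrained with -t when you can bound the time:

ra -R /argus/archive/primitive - host 192.0.2.45

Against the MySQL repository, the same kind of filter goes on the end of a rasql() command that covers the relevant daily tables.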
Many sites develop their own custom security and operational status reports, which they generally run at night. These include reports on the numbers of scanners, how many internal hosts were accessed from the outside, reports of "odd" flows, the number of internal machines accessing machines on the outside, the top talkers in and out of the enterprise (to see if a machine starts leaking GBs of data), any accesses from foreign countries, and so on. This type of processing generally involves reading an entire day's worth of flow records, aggregating the data using various strategies, and then comparing the output with a set of filtering criteria, or an "expected" list.
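As a hedged sketch of a nightly "top talkers" report (the archive path, date and field choices are assumptions), a day of primitive data can be aggregated per source address and sorted by volume:

racluster -R /argus/archive/primitive/*/2012/03/12 -m saddr -w - | rasort -m bytes -s stime saddr pkts bytes

Comparing the head of that output against yesterday's report, or an expected-talkers list, is what most of these nightly scripts boil down to.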
Generally, if you are doing a small number of passes through an entire day's data, a native file system repository is the best architecture. However, you can create a temporary native file system repository from the MySQL database repository, if the time range isn't too large, so which repository you start with is really dependent on how comfortable you are with MySQL, at least that has been my experience.
For many sites, Network Audit is a very large data issue, and for some a real problem. There are sites that collect a few terabytes of flow data per day, so archive management is a really important issue. But even if you only collect a few hundred KB of data per day, archive management is important. A good rule of thumb for security is: keep it all, until you can't.
The biggest issue in archive management is size, which is the product of the data ingest rate (how many flows per day you are generating and collecting) and the data retention time (how long you are going to hold onto the data). The more data, the more resources needed to use it (storage and processing). While compression helps, it can cause more problems than it solves. Limiting retention (throwing data away) will limit the usefulness of the archive, so it will be a balancing act.
We recommend compressing all the files in the archive. All ra* programs can read either bz2 or gz files, so saving this space up front is very reasonable. But it does cost processing to uncompress a file every time you want to use it. Many sites keep the last 7 days' worth of data uncompressed, as these are the files most likely to be accessed by scripts and people during the week, but compress all the rest. Scripts that do this are very simple to build if you use a year/month/day style file system structure.
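A minimal version of such a script, assuming the /argus/archive/primitive layout used above, is a single find(1) invocation that gzips everything older than a week:

find /argus/archive/primitive -type f -name 'argus.*' -mtime +7 ! -name '*.gz' -exec gzip {} \;

Run nightly from cron, this keeps the current week readable at full speed while the rest of the archive stays compressed.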
Data retention is a big deal for some sites, as the amount of data they are generating per day is very large. With both repository types, deleting data from the repository is very simple: delete the directory for an entire day, or drop the MySQL database table. But many sites want staged management of their repository data. Working retention policies have included strategies like: "primitive" data online for 6 months, daily "matrix" data online forever, and "primitive" data on CD/tape for 2 years.
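As a hedged sketch of that staged policy (the paths, dates and matrix fields are placeholders), a day that is about to age out of the primitive archive can first be reduced to a daily IP matrix, and then the primitive data removed or its MySQL table dropped:

racluster -R /argus/archive/primitive/*/2011/09/12 -m saddr daddr -w /argus/archive/matrix/2011.09.12.matrix
find /argus/archive/primitive -type f -mtime +180 -delete
mysql -u user argusData -e "DROP TABLE argus_2011_09_12"

The matrix files stay small enough to keep online indefinitely, while the bulk primitive data follows the shorter retention schedule.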
How long you keep your data around determines the utility of the audit system. From a security perspective, the primitive data has value for at least 1 year. Many sites are notified of security issues up to 12 months after the incident. In some situations, such as Digital Rights Management issues, where university students downloaded copyright-protected material to their dorm rooms and then redistributed it through peer-to-peer networks, incidents have been reported 18 months after the fact.
The repository doesn't have to be "online" to be effective, and what is kept online doesn't have to be the original "primitive" data. These data retention issues are best determined through site policy on what the data is being retained for; the type of data retained can then be tailored to the policy.
Some sites are concerned that if they have a repository, someone, such as law enforcement, may ask for it, and they are not prepared for the potential consequences. This is not an issue that has come up yet for the sites that we are aware of, but if you do decide to have long retention times on your audit archive, consider developing policy on how you will want to share the information.