There are basically two archive strategies that we support with the current argus-client programs. The first is native file system based, where the data is stored as files in the hosts file system. The second is where all the data is stored as a set of MySQL tables. There are advantages and limitations to both mechanisms, and you can use both at the same time to get the best of both worlds.
This page describes issues we're working on for archive mangement and performance. Most of it relates to native file system support.
One of the important properties of an argus archive is to be able to 'fetch' argus data from the repository quickly. Usually that means getting some or all the data that you have for a given time period, in order to inspect the data for specific events, conditions, etc... Rarely do you want the entire archive, usually you want to know what was happening relative to an event, which has a time associated with it. Last monday around noon, yesterday at 11:14:20.15 am someone logged into a machine from ....., can you tell me .... I tried to access this machine an hour ago, and had some trouble, can you help me?
All of these archive queries have a time scope associated with them. This technology is provided to help you access argus data from your archive with specific time ranges. Our strategy to accomplish this is to generate a time index for argus data files using the utility rasqltimeindex(), and to read the data using the utility rasql().
The program for establishing a native file system respository time index is rasqltimeindex().
rasqltimeindex -r argus.file -w mysql://user@host/db
rasqltimeindex() will create a MySQL database table entry for every second that is 'covered' by an argus record in the file, and insert them into the 'Seconds' table in database 'db', creating the table if needed. Each row will have the seconds value, the probe identifier and the byte offsets for the beginning and end of all the records that SPAN the specific second in the file.
If you want to index an entire archive at one time, you can do this:
rasqltimeindex -R /path/to/the/archive -w mysql://user@host/db
This will cause rasqltimeindex() to recursively find all the argus data files in the file system under /path/to/the/archive, and it will index and insert any files that have not already been indexed.
Now, you can also filter the input to rasqltimeindex(), like any ra* program, and generate indexes of specific data, but because there is only one 'Seconds' table per database, the usefulness of this strategy is limited.
Indexing compressed data is supported. rasqltimeindex() will uncompress the file before indexing, so all the byte offsets will be correct. The filename that is stored in the 'Filename' table will have the compression extension included, and rasql() will test for the existence of both the compressed or uncompressed file when it goes to read the file.
Because rasql() will read the file starting at the specifed byte offsets (thus the performance improvements), rasql() will need to uncompress the file, and leave it in place, to do its file processing. This will require that rasql() have write access to the archive.
Data is read from argus archives that are supported by MySQL based tables, using the program rasql(). When given a time range expression, rasql() will use the tables described above, you just need to provide the -t time range option and the database where the indexes are stored:
rasql -t -10d+20s -r mysql://user@host/db
In this case we're looking for all the records in the database that span 00:00:00 - 00:00:20, 10 days ago. Used in this way, rasql() will search the Seconds table for all the seconds that occur between the start and end time range expression. From that data, it calculates byte offsets of all the files that are in the Filename table, and in table order, it will read all the data in each matching file, apply the same time filter to the records as they are read.
When the data is being held solely in a MySQL Database Repository, there is no need for a Seconds table index. But, many sites have a hybrid system, where data is held in a MySQL based repository for a time period, like 1 month, and then the data is rolled out of the database into a native file system repository for the remainder of the archive retention period. In this case, you will have a need to quickly access the data regardless of what repository strategy is being used.
rasql() supports time searching this hybrid strategy, and you don't have to worry about it. Lets say that you need to get some data from the archive that relates to a particular metwork, last week April 1st, around lunchtime.
rasql -t 2010/04/01.11:58:00+5m -M time 1d -r mysql://user@host/db/argus_%Y_%m_%d - net 1.2.3.0/24
This takes the time range, and the table naming strategy, and figures out which table(s) from the local argusData database to read. It also looks to see if there is a 'Seconds' table in that database, and if so, it will query the table to see if there are any files that also need to be search. You can write these records to a file, or pipe to other programs, like racluster() to generate a view that you want. See ra.1 for how to specify time filters.
Page Last Modified: 14:22:39 EDT 13 Mar 2012 ©Copyright 2000 - 2012 QoSient, LLC. All Rights Reserved.