FTimes

NAME

ftimes-proximo - Locate a group of dig hits within a specified byte range

SYNOPSIS

ftimes-proximo [-l limit] [-r range] {-G group=tag,tag[,tag[,...]][:range]|-g groups-file} -f {file|-}

DESCRIPTION

This utility locates a group of dig hits within a specified byte range. To work properly, the input must be sorted by 'hostname' (when present), 'name', and 'offset' in ascending order. Note that this utility does not sort the input -- that step can be done with ftimes-sortini(1). The input format can vary so long as it contains at least the 'name', 'tag', 'offset', and 'string' fields. The two most common formats are:

    name|type|tag|offset|string

and

    hostname|name|type|tag|offset|string|joiner

The first is produced by ftimes(1) and hipdig(1), and the second is produced by ftimes-dig2dbi(1). Each input record must contain a non-null tag value -- those that don't will be ignored. Generally, each tag should correspond to a unique dig string. However, tag overloading is allowed.
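The input requirements above can be checked before running the utility. The following is a minimal Python sketch, not part of FTimes: the function names are hypothetical, and it assumes that the first line is a header naming the fields, that field values contain no embedded '|' characters, that offsets are decimal integers, and that plain string comparison of names is an acceptable stand-in for whatever collation ftimes-sortini(1) uses.

```python
REQUIRED_FIELDS = {"name", "tag", "offset", "string"}

def parse_dig_records(text):
    """Parse pipe-delimited dig records, using the header line for field names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("|")
    missing = REQUIRED_FIELDS - set(header)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    records = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split("|")))
        if not fields.get("tag"):
            continue  # records with a null tag are ignored
        fields["offset"] = int(fields["offset"])  # assumption: decimal offsets
        records.append(fields)
    return records

def is_sorted(records):
    """Check ascending order by hostname (when present), name, and offset."""
    keys = [(r.get("hostname", ""), r["name"], r["offset"]) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

sample = "\n".join([
    "name|type|tag|offset|string",
    '"file"|normal|a1|10|a1',
    '"file"|normal||15|zz',
    '"file"|normal|b2|20|b2',
])
records = parse_dig_records(sample)
print(len(records), is_sorted(records))
```

The record with the empty tag is dropped during parsing, mirroring the rule that records without a tag value are ignored.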

This utility can also take its own output as input, thus providing a way to analyze groups of groups. In that case, the input must contain at least the 'name', 'group', 'footprint', and 'offset' fields.

Output is written to stdout in one of the following formats:

    name|group|ordered|proximity|gap|limit|range|window|footprint|offset|offsets|tags

    hostname|name|group|ordered|proximity|gap|limit|range|window|footprint|offset|offsets|tags

The breakdown of the output format is as follows:

hostname

Hostname of the subject system. This value is transferred directly from the input stream, but only if that field is present.

name

URL-encoded filename. This value is transferred directly from the input stream.

group

Name of the group (as defined on the command line or in a group config file) that was matched.

ordered

Boolean value (y/n) indicating whether the actual tag order matches the order specified in the group definition. If order is important, be sure to specify group definitions using the desired order.

proximity

A value from 0.00 to 1.00 indicating the relative proximity of the dig hits for a given group. This value is computed as follows:

    ( <limit> - <gap> ) / <limit>

where <gap> is the smaller of the specified limit (-l option) and the actual average gap.

gap

The average gap, in bytes, between adjacent dig hits.

limit

The largest average gap between dig hits for them to be considered close. As the actual gap approaches this number, proximity goes to zero.

range

The number of bytes between the lowest and highest dig offsets for a given match.

window

The number of bytes used to determine whether a given match is in range or not.

footprint

The number of bytes between the beginning of the first and end of the last dig hits (inclusive).

offset

The offset of the group hit. This corresponds to the lowest offset within a group for a given match (hit).

offsets

Comma delimited list of dig offsets in the order they were found.

tags

Comma delimited list of dig tags in the order they were found.
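To make the numeric fields above concrete, here is a minimal Python sketch of how they might be derived for a single match. It is an illustration, not the utility's actual code. The function names are hypothetical, and it assumes that the gap is the average difference between adjacent sorted offsets, and that the footprint runs from the lowest offset through the last byte of the hit with the highest offset.

```python
def proximity(gap, limit=100):
    """Compute (limit - gap) / limit, with the gap capped at the limit."""
    effective_gap = min(gap, limit)
    return (limit - effective_gap) / limit

def match_metrics(hits, limit=100):
    """Derive range, gap, footprint, and proximity for one match.

    hits is a list of (offset, length) pairs, one per dig hit, where
    length is the size of the matched string in bytes.
    """
    ordered = sorted(hits)
    first_offset, _ = ordered[0]
    last_offset, last_length = ordered[-1]
    span = last_offset - first_offset           # the 'range' field
    gap = span / (len(ordered) - 1)             # assumption: mean adjacent gap
    footprint = last_offset + last_length - first_offset
    return {
        "range": span,
        "gap": gap,
        "footprint": footprint,
        "proximity": proximity(gap, limit),
    }

# Four two-byte hits at offsets 20, 30, 40, and 50.
metrics = match_metrics([(20, 2), (30, 2), (40, 2), (50, 2)])
print(metrics)
```

Under these assumptions, any match whose average gap meets or exceeds the limit has a proximity of zero.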

The trigger event for generating an output record is a group match. Each time a member offset changes for a given group, the entire group is evaluated to see if the resulting set of offsets falls within the specified range. If that condition is met, then an output record is generated.

OPTIONS

-f {file|-}

Specifies the name of the input file. A value of '-' will cause the program to read from stdin.

-G group=tag,tag[,tag[,...]][:range]

Specifies a group definition where

group: The name of the group.
tag,tag[,tag[,...]]: A comma delimited list of two or more unique dig tags.
range: A decimal number or the word 'infinity'. The range is optional in a group definition.

-g groups-file

Specifies the name of a file containing one or more group definitions. The format is the same as that used for the -G option.

-l limit

Specifies the largest average gap between dig hits for them to be considered close. As the gap approaches this number, proximity goes to zero. The default limit is 100 bytes.

-r range

Specifies the range window as a decimal number of bytes or the word 'infinity'. If the latter is specified, then the range window is all bytes in a given file. This is useful if you simply want to determine whether or not all tags occur in a given file. The default range is 100 bytes.
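The group definition syntax used by the -G and -g options can be illustrated with a short Python sketch. This is not the utility's own parser: the function name is hypothetical, and it assumes that tags contain no ':' or ',' characters and that a definition without a range falls back to the -r value (100 bytes by default).

```python
import math

def parse_group(definition, default_range=100):
    """Split 'group=tag,tag[,tag[,...]][:range]' into (name, tags, range)."""
    name, sep, rest = definition.partition("=")
    if not sep or not name:
        raise ValueError("expected group=tag,tag[,...][:range]")
    tags_part, colon, range_part = rest.partition(":")
    tags = tags_part.split(",")
    # A group needs two or more unique, non-empty tags.
    if len(tags) < 2 or not all(tags) or len(set(tags)) != len(tags):
        raise ValueError("a group needs two or more unique tags")
    if not colon:
        window = default_range      # assumption: fall back to the -r value
    elif range_part == "infinity":
        window = math.inf
    else:
        window = int(range_part)
    return name, tags, window

print(parse_group("g_test=a1,b2,c3,d4:100"))
print(parse_group("g_any=a1,b2:infinity"))
```

Representing 'infinity' as math.inf lets a later range comparison work without a special case.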

CAVEATS

Group matching only maintains (i.e., remembers) the last offset of each group member. This means that there are cases where a single group could have multiple matches in a specified range, but only one is reported. For example, suppose you have the following group definition:

    g_test=a1,b2,c3,d4:100

Now suppose that you have the following dig records:

    name|type|tag|offset|string
    "file"|normal|a1|10|a1
    "file"|normal|b2|20|b2
    "file"|normal|c3|30|c3
    "file"|normal|a1|40|a1
    "file"|normal|d4|50|d4

In this case, one could say that the group matches twice within the specified range of 100 bytes: once for offsets 10, 20, 30, and 50, and once for offsets 20, 30, 40, and 50. Since this utility only maintains the last offset of each group member, only the second set of offsets is considered a match. This happens because the 'a1' offset is reset from 10 to 40 when the fourth record (not counting the header) is processed. Effectively, this means that given two potential matches within a specified range, the match where the offsets are the closest always wins.
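The behavior described above can be reproduced with a small Python simulation. This is a sketch of the last-offset bookkeeping only, not the utility's implementation: the function name is hypothetical, the records are assumed to belong to a single file, and a match is assumed to be in range when the distance between the lowest and highest offsets is less than or equal to the window.

```python
def find_matches(records, tags, window):
    """Track the last offset per tag and test the group on every change.

    records is a list of (tag, offset) pairs in input order.
    """
    last_offset = {}                # tag -> most recent offset seen
    matches = []
    for tag, offset in records:
        if tag not in tags:
            continue
        last_offset[tag] = offset   # an earlier offset for this tag is lost
        if len(last_offset) == len(tags):
            offsets = sorted(last_offset.values())
            if offsets[-1] - offsets[0] <= window:
                matches.append(offsets)
    return matches

records = [("a1", 10), ("b2", 20), ("c3", 30), ("a1", 40), ("d4", 50)]
matches = find_matches(records, ["a1", "b2", "c3", "d4"], 100)
print(matches)
```

The group is only complete once 'd4' arrives, and by then the 'a1' offset has already been replaced by 40, so the set containing offset 10 is never evaluated.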

AUTHOR

Klayton Monroe

HISTORY

This utility was initially written to perform proximity analysis in a case where we needed to identify last names in close proximity to their respective Social Security Numbers (SSN).

This utility first appeared in FTimes 3.9.0.

LICENSE

All documentation and code are distributed under the same terms and conditions as FTimes.