Revision $Id: ftimes-map-db-setup.base,v 1.2 2008/12/24 18:39:40 jay Exp $ Purpose This recipe demonstrates how to preprocess and load FTimes map data into a MySQL database. Motivation Often, it is useful to load information into a relational database so that a wide variety of analysis queries may be written once and run multiple times. Requirements Cooking with This recipe assumes that you are familiar with FTimes, Perl, and MySQL and that you already have a running MySQL engine. It also assumes that your Perl setup includes the Digest::MD5 module. Time to Implement Assuming that you have satisfied all the requirements/prerequisites, this recipe should take approximately 15 minutes to implement. Solution 1. If your map data is in the FTimes compressed format, you'll need to decode it. This can be accomplished with: # ftimes --decoder compressed-snapshot.map > snapshot.map 2. All map data must be processed prior to loading it into the database. In general, the following modifications need to be made in the following order: - A new column, 'joiner', must be added to the map data. This field is used as the primary key in the database. For each filename in the data set, create an MD5 hash and append that value to the end of the record using a '|' character as the delimiter. - Escape all backslash characters (i.e. '\') with an additional backslash character. Thus, the value '"c:\boot.ini"' becomes '"c:\\boot.ini"'. - Replace all NULL fields with the value `\N'. The Perl script, ftimes-map2dbi.pl, included in the src distribution may be used to preprocess map data. It has the following syntax: # ftimes-map2dbi.pl [-F] [-d db] [-h host] [-m max-rows] -f {file|-} Where '-F' forces existing output files to be overwritten, '-d' specifies the name of the target database, '-h' specifies the name of the subject system (i.e. the system on which the data was collected) and causes the script to to insert an extra column called hostname, '-m' specifies the maximum number of records that may exist in the target table, and '-f' specifies the input file: either an uncompressed map file or stdin. Two output files, having the extensions .sql and .dbi are produced by running this script. The SQL statements in the .sql file may be used to import the .dbi data into MySQL. Note: is ftimes-map2dbi.pl not included in the src distribution until after the 3.3.0 release. To obtain the most current revision, go to the FTimes cvs tree. Invoking this script with: # ftimes-map2dbi.pl -d host1 -f snapshot.map should produce two output files: snapshot.map.sql and snapshot.map.dbi. 3. Import the .dbi data using MySQL's command-line interface. Continuing with the example from step two, we have: # mysql host1 < snapshot.map.sql Note: MySQL typically requires data files to be world readable. This implies that all intermediate directories must also be world searchable. Alternatively, mysqlimport -- a command-line interface to the 'LOAD DATA' SQL statement -- can load the preprocessed map data. However, the target table must exist prior to import, and its name must match the name of the file being loaded upto the first period. Thus, to load the contents of snapshot.map.dbi, you should create a table called snapshot. Hint: use the CREATE statement in the .sql file to create the snapshot table. Once the table has been created, use the following command to import the data. # mysqlimport --fields-terminated-by='|' host1 snapshot.map.dbi Note: Unless mysqlimport supports an option to ignore one or more input lines, you should plan to remove the header line in snapshot.map.dbi prior to importing the map data. 4. Run SQL queries to analyze the loaded data. The queries given in Appendix 1 show how common analysis questions may be implemented in SQL. You may find it useful to append the ORDER BY clause to the end of various queries -- it has the affect of sorting the output based on the specified field. At first, you may want to limit output to ensure that the query is working as expected. This can be done with a LIMIT clause. To prevent potentially confusing line wraps, you should use a *very* wide xterm, or alternatively replace '*' with a specific list of columns. Closing Remarks We need more analysis queries -- hint, hint ;) Author This recipe was brought to you by Klayton Monroe. References Appendix 1 --- ftimes-map-queries.sql --- ###################################################################### # # Select records by MD5. # ###################################################################### SELECT * FROM snapshot WHERE md5 = "d41d8cd98f00b204e9800998ecf8427e"; SELECT * FROM snapshot WHERE md5 = "SPECIAL"; SELECT * FROM snapshot WHERE md5 = "SYMLINK"; SELECT * FROM snapshot WHERE md5 = "DIRECTORY"; SELECT * FROM snapshot WHERE md5 = 'SPECIAL' or md5 = 'DIRECTORY' or md5 = 'SYMLINK'; SELECT * FROM snapshot WHERE name RLIKE '/([.][sr]hosts|s?hosts.equiv)"$' AND md5 = '8ef32056e74261fa8e5ee3b8670f81de' ORDER BY hostname, name; # The file contains '+ +' SELECT name, mode, md5 FROM snapshot WHERE md5 = 'SPECIAL' or md5 = 'DIRECTORY' or md5 = 'SYMLINK'; ###################################################################### # # Select records by name. # ###################################################################### SELECT * FROM snapshot WHERE name RLIKE '[.]ssh/.*"$'; SELECT * FROM snapshot WHERE name RLIKE 'ssh_host(_rsa|_dsa)?_key([.]pub)?"$'; SELECT * FROM snapshot WHERE name RLIKE '([.]ssh/.*|ssh_host(_rsa|_dsa)?_key([.]pub)?)"$'; SELECT * FROM snapshot WHERE name RLIKE '(known_hosts2?|authorized_keys)"$'; WHERE name RLIKE 'ssh/[^/]*known_hosts2?' SELECT * FROM snapshot WHERE name RLIKE '[.](login|profile|cshrc|tcshrc|bash_profile)"$'; SELECT * FROM snapshot WHERE name RLIKE '[.](bash_history|history|sh_history)"$'; SELECT * FROM snapshot WHERE name RLIKE '[.]Xauthority"$'; SELECT * FROM snapshot WHERE name RLIKE '/([.][sr]hosts|s?hosts.equiv)"$'; SELECT * FROM snapshot WHERE name RLIKE '/([.][sr]hosts|s?hosts.equiv)"$' AND size > 0 ORDER BY hostname, name; SELECT * FROM snapshot WHERE name RLIKE '/([.][sr]hosts|s?hosts.equiv)"$' AND size = 4 ORDER BY hostname, name; # There's a good chance that this file contains '+ +' SELECT * FROM snapshot WHERE name RLIKE '[.]log"$'; SELECT * FROM snapshot WHERE name RLIKE '[.](bz|bz2|gz|jar|tar|tbz|tbz2|tgz|zip)"$'; SELECT hostname, uid, gid, right(mode, 4) as mode, size, md5, name FROM snapshot WHERE name RLIKE '[.](bz|bz2|gz|tar|tar[.]bz|tar[.]bz2|tar[.]gz|tbz|tbz2|tgz|zip)"$'; SELECT hostname, uid, gid, right(mode, 4) as mode, size, md5, name FROM snapshot WHERE LOCATE(LEFT(name, LENGTH(name)-1), '"/path/to/some/archive"') = 1 ORDER BY hostname, name; SELECT client_id, hostname, uid, gid, right(mode, 4) as mode, size, md5, name FROM snapshot WHERE locate(left(name,length(name)-1),'"/path/to/some/archive"') = 1 AND (mode RLIKE '^4[0-7]{3}[1357]$' OR mode RLIKE '^10[0-7]{3}[4567]$') AND hostname = 'some-host'ORDER BY hostname, name; SELECT conv(right(mode,1),10,8)&1 as x, client_id, hostname, uid, gid, right(mode, 4) as mode, size, md5, name FROM snapshot WHERE locate(left(name,length(name)-1),'"/path/to/some/archive"') = 1 AND (mode RLIKE '^4[0-7]{3}[1357]$' OR mode RLIKE '^10[0-7]{3}[4567]$') AND hostname = 'some-host'ORDER BY hostname, name; #egrep '/([.][sr]hosts|s?hosts.equiv|[.]netrc|[.]?my[_.]cnf|admpwd?|odbc[.]ini|known_hosts2?|authorized_keys)"' *.map '/(\.[sr]hosts|s?hosts.equiv|\.netrc|\.?my[_\.]cnf|admpwd?|odbc\.ini|known_hosts2?|authorized_keys|admpw|\.htpasswd|tnsnames\.ora|[.](login|profile|cshrc|tcshrc|bash_profile|Xauthority)|([.]ssh/.*|ssh_host(_rsa|_dsa)?_key([.]pub)?)|sshd_config|ssh_config|\.config|web\.xml|users\.lst)"' *.map ###################################################################### # # Select records by ctime (date/time/range). # ###################################################################### SELECT * FROM snapshot WHERE ctime LIKE '2002-01-01%' ORDER BY ctime desc; SELECT * FROM snapshot WHERE ctime LIKE '%19:00:00' ORDER BY ctime desc; SELECT * FROM snapshot WHERE ctime > '2002-01-01 12:00:00' and ctime < '2002-01-01 16:00:00' ORDER BY ctime desc; ###################################################################### # # Select records by [mac]time (date/time/range). # ###################################################################### SELECT * FROM snapshot WHERE mtime LIKE '2002-01-01%' or atime LIKE '2002-01-01%' or ctime LIKE '2002-01-01%' ORDER BY ctime desc; SELECT * FROM snapshot WHERE mtime LIKE '%19:00:00' or atime LIKE '%19:00:00' or ctime LIKE '%19:00:00' ORDER BY ctime desc; ###################################################################### # # Select records that have been url encoded. This could imply the # existence of non-printable characters, or it could be something # benign like 'c++' which gets encoded to 'c%2b%2b'. The second # query filters out a common false positive (i.e. '+'), but use # sparingly or with additional query contraints. # ###################################################################### SELECT * FROM snapshot WHERE name RLIKE '%[a-fA-F0-9]{2}'; SELECT * FROM snapshot WHERE name RLIKE '%[a-fA-F0-9]{2}' and name NOT RLIKE '%2[Bb]'; ###################################################################### # # Select records that have two or more adjacent characters in # their name. Characters of interest include dot and space # (i.e. '.' and '+'). Remember space characters are encoded # as pluses by FTimes (i.e. ' ' becomes '+'). # ###################################################################### SELECT * FROM snapshot WHERE name RLIKE '[.]{2,}'; SELECT * FROM snapshot WHERE name RLIKE '[.]{2,}' AND mode RLIKE '^4[0-7]{4}$'; SELECT hostname, uid, gid, name FROM snapshot WHERE name RLIKE '[.]{2,}' AND mode RLIKE '^4[0-7]{4}$'; SELECT * FROM snapshot WHERE name RLIKE '[+]{2,}'; SELECT * FROM snapshot WHERE name RLIKE '[+]{2,}' AND mode RLIKE '^4[0-7]{4}$'; SELECT hostname, uid, gid, name FROM snapshot WHERE name RLIKE '[+]{2,}' AND mode RLIKE '^4[0-7]{4}$'; This one combines special characters and multi -dots and -spaces. SELECT hostname, name, uid, gid, mode, size, md5 FROM snapshot WHERE name RLIKE '%([018-9A-Fa-f][0-9A-Fa-f]|7[CcFf])' OR name RLIKE '[.+]{2,}' ORDER BY hostname, name; ###################################################################### # # Select records that represent directories, regular files, or both # from '/dev/' or '/devices/'. # ###################################################################### SELECT * FROM snapshot WHERE name RLIKE '^"/dev(ices)?/' AND mode RLIKE '^4[0-7]{4}'; SELECT * FROM snapshot WHERE name RLIKE '^"/dev(ices)?/' AND mode RLIKE '^(10|4)[0-7]{4}'; SELECT * FROM snapshot WHERE name RLIKE '^"/dev(ices)?/' AND mode RLIKE '^(10|4)[0-7]{4}' AND size > 0; I tend to like this one b/c any suspicious parent dirnames will pop out as a result of running it. However, it does not catch the case where the suspicious dir is empty. SELECT * FROM snapshot WHERE name RLIKE '^"/dev(ices)?/' AND mode RLIKE '^10[0-7]{4}'; ###################################################################### # # Select various suid files. # ###################################################################### SELECT * FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$'; SELECT * FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND uid = 0; SELECT * FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND hostname = 'victim';; SELECT * FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND uid = 0 AND hostname = 'victim';; SELECT hostname, uid, gid, mode, name FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND uid = 0 ORDER BY hostname, name; SELECT hostname, uid, gid, right(mode, 4) as mode, name FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND uid = 0 ORDER BY hostname, name; SELECT hostname, count(mode) AS 'root-owned, world-executable suid files' FROM snapshot WHERE mode RLIKE '^10[4-7][0-7]{2}[1357]$' AND uid = 0 GROUP BY hostname; ###################################################################### # # Select various sgid files. # ###################################################################### SELECT * FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$'; SELECT * FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND uid = 0; SELECT * FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND hostname = 'victim'; SELECT * FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND uid = 0 AND hostname = 'victim'; SELECT hostname, uid, gid, mode, name FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND uid = 0 ORDER BY hostname, name; SELECT hostname, count(mode) AS "root-owned, world-executable sgid files" FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND uid = 0 GROUP BY hostname; SELECT hostname, uid, gid, right(mode, 4) as mode, name FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND gid <= 20 ORDER BY hostname, name; SELECT hostname, count(mode) AS 'root+20-owned, world-executable sgid files' FROM snapshot WHERE mode RLIKE '^10[2367][0-7]{2}[1357]$' AND gid <= 20 GROUP BY hostname; ###################################################################### # # Select various core files. # ###################################################################### SELECT * FROM snapshot WHERE name LIKE '%core%'; # Many hits... SELECT * FROM snapshot WHERE name LIKE '%core"'; # Any thing that ends with 'core'. SELECT * FROM snapshot WHERE name LIKE '%/core"'; # Any thing that ends with '/core'. SELECT * FROM snapshot WHERE name RLIKE '.*([.]|/)core"$'; SELECT * FROM snapshot WHERE name LIKE '%core%' AND size > 1000000; SELECT * FROM snapshot WHERE name LIKE '%/core"' AND uid=0 AND mode RLIKE '^.*[4-7]$'; ###################################################################### # # Select world writable directories (no sticky) and files with uid <= 20. # ###################################################################### SELECT md5, right(mode, 4) as mode, uid, gid, hostname, name FROM snapshot WHERE mode RLIKE '^4[0246][0-7]{2}[2367]$' AND uid <= 20 ORDER BY hostname, name; SELECT md5, right(mode, 4) as mode, uid, gid, hostname, name FROM snapshot WHERE mode RLIKE '^10[0246][0-7]{2}[2367]$' AND uid <= 20 ORDER BY hostname, name; SELECT hostname, count(mode) AS 'non-sticky, world-writable directories with uid <=20' FROM snapshot WHERE mode RLIKE '^4[0246][0-7]{2}[2367]$' AND uid <= 20 GROUP BY hostname; SELECT hostname, count(mode) AS 'world-writable files with uid <=20' FROM snapshot WHERE mode RLIKE '^10[0246][0-7]{2}[2367]$' AND uid <= 20 GROUP BY hostname; ###################################################################### # # Select world readable archives with uid <= 20. # ###################################################################### SELECT md5, right(mode, 4) as mode, uid, gid, hostname, name FROM snapshot WHERE name RLIKE '[.](tar|tgz|tbz2|zip|gz|bz|bz2)"$' AND mode RLIKE '^10[0-7]{3}[4567]$' AND uid <= 20 ORDER BY hostname, name; SELECT hostname, count(mode) AS 'world-readable archives with uid <=20' FROM snapshot WHERE name RLIKE '[.](tar|tgz|tbz2|zip|gz|bz|bz2)"$' AND mode RLIKE '^10[0-7]{3}[4567]$' AND uid <= 20 GROUP BY hostname; ###################################################################### # # Select world writable files that have well-known script extensions. # ###################################################################### SELECT md5, right(mode, 4) as mode, uid, gid, hostname, name FROM snapshot WHERE name RLIKE '[.](awk|sh|pl)"$' and mode RLIKE '10...[2367]'; --- ftimes-map-queries.sql --- The command provided here will extract the preceding sql statements into a separate file. However, the filename argument must match the actual name of this recipe for the command to work properly. sed -e '1,/^--- ftimes-map-queries.sql ---$/d; /^--- ftimes-map-queries.sql ---$/,$d' ftimes-map-db-setup.txt > ftimes-map-queries.sql