CheckWWW A1c          Copyright (c) 1999 Allegro Consultants, Inc.
                      Author: sieler@gmail.com

CheckWWW is a utility that helps to check for possible problems with
your web server.  It has two primary functions, each of which is more
fully documented later in this text:

   CHECKTREE   Recursively "walks" down a set of HTML pages,
               looking for bad links.

   SCANLOG     Scans a web server's access log file, looking for
               possible problems of two kinds:
                  - files that did not get fully transferred;
                  - files that are not bytestream files.

CheckWWW may be run interactively, or may be given an INFO string.

(See important note about log file format at the end of this help text)

Commands:

---------------------------
ALIAS [CLEAR | LIST | virtualdir [=] actualdir]

   Example:
      ALIAS /font/ /APACHE/FONTS
      ALIAS /cgi-bin/ /APACHE/PUB/cgi-bin

---------------------------
Exit

   Terminate program

---------------------------
CHECKtree [filename] [options]

   options:

      LP
         Sends most output to a printer, formal name LP.

      MAXDEPTH #   (default: MAXDEPTH 999)
         MAXDEPTH # means: don't go "deeper" than the specified
         value (i.e., in terms of recursion).
         A value of 1 means "top level only".

      [NO]PROGRESS   (default: PROGRESS)
         PROGRESS means: print a dot (.) every 1,000 records.

      [NO]QUIET   (default: QUIET)
         NOQUIET means: print each filename as we check it.

      [NO]SHOWALL
         SHOWALL is like SHOWFINAL, but it reports on every
         file seen.

      [NO]SHOWFinal
         SHOWFINAL means: produce a report at the end of the
         CHECKTREE showing all filenames that are "interesting".
         An "interesting" file is one that has any of the
         following attributes true:

            i = found without "/index.html"
            N = Not a byte stream file!
            - = file missing when we tried to open it
            # = URL referring to file had "#label" in it
            S = file transfer was apparently short
            W = Wide record file (> 132 bytes)
            s = possible security access problem
            o = Old file (not accessed in 90 days)
            f = Future date on file
            n = non-HTML file

         Note: SHOWFINAL implies QUIET.
         (This can be overridden by saying: SHOWFINAL NOQUIET)

      SHOWMOST
         Like SHOWFINAL, but only shows files that are more
         interesting (first seven flags).

      SHOWSOME
         Like SHOWFINAL, but only shows files that are the most
         interesting (first two flags).

      [NO]TIMES   (default: NOTIMES)
         TIMES means: report the CPU/elapsed time the CHECKTREE
         command took.

   The CHECKTREE command opens and reads the specified file
   (default: index.html) and looks for URLs pointing to local files.
   For each one found, it recursively opens the specified file
   and checks it.

   NOTE: CHECKTREE will "see" an HREF or IMG tag correctly only if
   the tag is formatted tightly (the way some PC-based HTML checkers
   insist is "correct").  A tag that is not formatted tightly would
   not be checked.

---------------------------
HELP

   Displays this information.

---------------------------
RESET (see SET)

---------------------------
ROOT [ | NONE]   (default: /WWW/WWW/ARPA/htdocs)

---------------------------
SCANlog [webserver_log_filename] [options]
   (default: /WWW/APACHE/logs/access_log)

   The SCANLOG command checks the access_log file, looking for
   entries that might indicate problems.

   One of the items it reports is the number of probes ("GETs")
   done by the Code Red and NIMDA worms (cf. www.cert.org).

   options:

      [NO]COUNTS   (default: NOCOUNTS)
         COUNTS means: produce a summary report showing all files
         "hit" (and the number of times they were hit).

      [NO]KSAM   (default: KSAM)
         KSAM means: create a session temporary nameless KSAM file
         to speed up operation.

      LP
         Sends most output to a printer, formal name LP.

      [NO]SHOWALL
         SHOWALL is like SHOWFINAL, but it reports on every
         file seen.

      [NO]SHOWCODERED   (default: NOSHOWCODERED)
         SHOWCODERED means: report each Code Red worm probe,
         showing the nodename (or IP address) of the offending
         requestor (instead of the actual URL requested).

      [NO]SHOWFinal
         SHOWFINAL means: produce a report at the end of the
         SCANLOG showing all filenames that are "interesting".
         An "interesting" file is one that has any of the
         following attributes true:

            i = found without "/index.html"
            N = Not a byte stream file!
            - = file missing when we tried to open it
            # = URL referring to file had "#label" in it
            S = file transfer was apparently short
            W = Wide record file (> 132 bytes)
            s = possible security access problem
            o = Old file (not accessed in 90 days)
            f = Future date on file
            n = non-HTML file

         Note: SHOWFINAL implies QUIET.
         (This can be overridden by saying: SHOWFINAL NOQUIET)

      SHOWMOST
         Like SHOWFINAL, but only shows files that are more
         interesting (first seven flags).

      [NO]SHOWNIMDA   (default: NOSHOWNIMDA)
         SHOWNIMDA means: report each NIMDA worm probe,
         showing the nodename (or IP address) of the offending
         requestor (instead of the actual URL requested).

      SHOWNONBS
         Like SHOWFINAL, but only shows files that are not
         bytestream files.

      SHOWSOME
         Like SHOWFINAL, but only shows files that are the most
         interesting (first two flags).

      SHOW404
         Causes 404 files to be listed as they are found.
         (404 means the web server couldn't find the file!)

      SUM404
         Like SHOWFINAL, but only shows files that were flagged
         as "404" in the access_log.
         (404 means the web server couldn't find the file!)

      [NO]SHOWALLNONBS   (default: NOSHOWALLNONBS)
         SHOWALLNONBS means: list non-ByteStream files as we
         encounter them.  Web servers other than QWEBS perform
         poorly when reading non-bytestream files.

         To convert a file to bytestream:
            tobyte -at ./foo.html ./foo.html
         or for binary files:
            tobyte ./foo.LZW ./foo.LZW

      [NO]SHOWSHORT   (default: NOSHOWSHORT)
         SHOWSHORT means: list files whose transfer amount
         appears to be short (i.e., files that were not fully
         transferred).

         NOTE: if a file was enlarged after the first time it was
         accessed in a given log file, it will incorrectly show
         as "short".  (E.g., if you've added text to an HTML
         document)

      [NO]SHOWUNrecognized   (default: NOSHOWUNRECOGNIZED)
         SHOWUNRECOGNIZED means: display every record that had
         an operation other than GET or POST.

      [NO]SHOWWORMS   (default: NOSHOWWORMS)
         Implies both [no]SHOWCODERED and [no]SHOWNIMDA.

      [NO]PROGRESS   (default: PROGRESS)
         PROGRESS means: print a dot (.) every 1,000 records.

      [NO]SETVAR   (default: NOSETVAR)
         SETVAR means: at end of a SCANLOG, create some CI
         variables with the prefix "CHECKWWW_".
         Try it and see (hint: SHOWVAR CHECKWWW@).

      STOPREC #   (default: unused)
         STOPREC # means: stop scanning (reading) the access log
         file after # records.  E.g., to only read the first
         100 records, use STOPREC 100

      [NO]TIMES   (default: NOTIMES)
         TIMES means: report the CPU/elapsed time a SCANLOG takes.

---------------------------
SET (and RESET) [options]

   SET and RESET allow you to set (and reset) various options:

      AUTOINDEX
         The AUTOINDEX option means: if the CHECK command sees a
         directory, it should automatically append "/index.html"
         and then try to open it.
         Default: SET AUTOINDEX

      DEBUG1
         Default: RESET DEBUG1

      KSAM
         Default: SET KSAM (assuming PARM=0)

      OLDDATES
         The OLDDATES option tells CHECKWWW to try to
         fetch/restore each file's access date during a CHECK
         operation.
         Default: SET OLDDATES

      PAGING
         Default: (batch)   RESET PAGING
                  (session) SET PAGING

      PROGRESS
         Default: SET PROGRESS (assuming PARM=0)

      ROOT | NONE
         Default: /WWW/WWW/ARPA/htdocs

      SETVAR
         Default: RESET SETVAR (assuming PARM=0)

      SHOWMISSING
         Default: RESET SHOWMISSING

      STRIP
         Default: SET STRIP (assuming PARM=0)

      TIMES
         Default: RESET TIMES (assuming PARM=0)

---------------------------
USEq filename

   Reads & executes CHECKWWW commands from the specified file.

      USE filename  ... displays the commands before execution.
      USEQ filename ... suppresses the command display.

---------------------------
//

   Terminate program

-----------------------------------------------------------------
A few options are initially set from the PARM value:

   PARM bit 15  (PARM =  1) : NOPROGRESS
   PARM bit 14  (PARM =  2) : SETVAR
   PARM bit 13  (PARM =  4) : NOKSAM
   PARM bit 12  (PARM =  8) : TIMES
   PARM bit 11  (PARM = 16) : NOSTRIP
   PARM bit 10  (PARM = 32) : BATCH

The BATCH flag tells CHECKWWW to behave as if it is being run in
batch.  This means that it won't try to paginate output.

-----------------------------------------------------------------
At startup, CHECKWWW looks for two CI string variables:

   CHECKWWW_USEQFILE   (checked first)
and
   CHECKWWW_USEFILE    (checked second, independently)

If either exists, an implied "USEQ filename" or "USE filename"
is done.

-----------------------------------------------------------------
Sample run, showing INFO and PARM:

   :checkwww scan, 3

   ...relatively quiet, scans /WWW/APACHE/logs/access_log
   and creates various CHECKWWW_@ variables.

   :checkwww scan, 2
   Scanning: /WWW/APACHE/logs/access_log...
   [checked 56,000 records]; 8,001 short...

   (I hit control-Y after about 56,000 records read)

   Read 56,107 records
   *partial* result codes:
      # 200 : 54,788
      # 302 : 147
      # 304 : 696
      # 401 : 130
      # 404 : 346
   Longest file name: 201
   # fully transmitted: 46,967
   # truncated/aborted: 8,030
   # missing files (fserr 52&457): 58
   # non-bytestream files: 0
   Total # MB: 964

   :showvar CHECKWWW@
   CHECKWWW_RSLT_200 = 54788
   CHECKWWW_RSLT_302 = 147
   CHECKWWW_RSLT_304 = 696
   CHECKWWW_RSLT_401 = 130
   CHECKWWW_RSLT_404 = 346
   CHECKWWW_RECORDS = 56107
   CHECKWWW_FULL = 46967
   CHECKWWW_SHORT = 8030
   CHECKWWW_MISSING = 58
   CHECKWWW_NONBS = 0

-----------------------------------------------------------------
NOTE:

The style of access log records currently handled by CheckWWW is
the original "standard".  However, some web servers have a
different format (and some of those allow you to choose to
generate the original format).

Here is a sample of the format expected:

   195.44.9.20 - - [01/Jul/1999:02:03:34 -0700]
   "GET /india/hp_office_new.jpg HTTP/1.0" 200 106121

(pretend the two lines above are actually just a single line of
text.)

If your web server generates a different format log file, feel free
to contact me (sieler@gmail.com) to discuss having CHECKWWW
support it.

-----------------------------------------------------------------
If CHECKWWW encounters a badly formed entry in the access log,
it will report it like the following example:

--------------------------------------------------------------
Expected "]" at end of timestamp [177120]:
pc13.allegrosupport.com - - [24/Sep/
                                    ^
nodename: pc13.allegrosupport.com
filename: -
username: -
timestamp:
get:
rslt: 0
# bytes: 0
--------------------------------------------------------------

On an HP 3000/968, it takes about 3 minutes to read an access_log
of 185,000 records.  Here's the result of one such SCANLOG:

   Read 185,625 records
   Result codes:
      # bad : 1
      # 200 : 180,800
      # 302 : 636
      # 304 : 1,908
      # 400 : 2
      # 401 : 292
      # 403 : 12
      # 404 : 1,973
   Longest file name: 273
   # fully transmitted: 95,395
   # truncated/aborted: 86,364
   # missing files (fserr 52 & 457): 193
   # non-bytestream files: 0
   KSAM info:
      # KSAM records: 1,141
      # KSAM hits: 180,618
      # KSAM misses: 1,327
      # KSAM too long: 7
   Total # MB: 3,401

-----------------------------------------------------------------
(updated 2001-03-21) 20150128
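
-----------------------------------------------------------------
To check whether your own log file matches the record format
described in the NOTE above, something like the following Python
sketch may help.  It is not CheckWWW's code: the names LOG_RE,
parse_record and tally_result_codes are made up for illustration,
and the pattern assumes every record has exactly the seven fields
shown in the sample line.  Records that do not match are counted
as "bad", loosely mirroring the "# bad" line in the SCANLOG output.

```python
import re
from collections import Counter

# Hypothetical pattern for the original "standard" access log record:
#   nodename ident username [timestamp] "request" rslt bytes
LOG_RE = re.compile(
    r'^(?P<nodename>\S+) (?P<ident>\S+) (?P<username>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<rslt>\d{3}) (?P<nbytes>\d+|-)$'
)


def parse_record(line):
    """Return a dict of fields, or None if the record is badly formed."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # The request is normally "OPERATION filename PROTOCOL".
    parts = rec["request"].split()
    rec["operation"] = parts[0] if parts else ""
    rec["filename"] = parts[1] if len(parts) > 1 else ""
    rec["rslt"] = int(rec["rslt"])
    # Some servers log "-" when no bytes were sent; treat that as 0.
    rec["nbytes"] = 0 if rec["nbytes"] == "-" else int(rec["nbytes"])
    return rec


def tally_result_codes(lines):
    """Count records per result code; unparsable records count as 'bad'."""
    counts = Counter()
    for line in lines:
        rec = parse_record(line)
        counts["bad" if rec is None else rec["rslt"]] += 1
    return counts
```

For example, tally_result_codes(open("access_log")) returns a
Counter keyed by result code, which can be compared against the
"Result codes" section that SCANLOG prints.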