Simple statistics from httpd log files
I wanted a simple way of analysing the access_log
generated by the Mosaic httpd server, so I wrote a
small program, reorder.c,
which prints, in any desired order, the name of the machine from which
the request came (m), the file that
was accessed (f), the day the page was accessed (d), and
the time (and date) the page was accessed (t).
The fields to output are selected by
the command line arguments: a sequence of `m', `f', `d', and `t',
separated by spaces.
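The field selection can be illustrated with a small shell function. This is only a sketch and not the code of reorder.c: the name reorder_sketch is made up, and it assumes that each log line is in Common Log Format, which may differ from what the server actually writes. It also leaves the file name as it appears in the log.

```sh
# Hypothetical sketch of the field selection, NOT the original reorder.c.
# Assumes Common Log Format lines such as:
#   host ident user [10/Oct/1995:13:55:36 +0100] "GET /path HTTP/1.0" 200 2326
reorder_sketch() {
    awk -v order="$*" '{
        val["m"] = $1                  # machine the request came from
        val["f"] = $7                  # file that was requested
        val["t"] = substr($4, 2)       # date and time, without the leading "["
        val["d"] = substr($4, 2, 11)   # date only: DD/Mon/YYYY
        n = split(order, key, " ")     # requested field letters, in order
        line = ""
        for (i = 1; i <= n; i++) {
            sep = (i > 1) ? " " : ""
            line = line sep val[key[i]]
        }
        print line
    }'
}
```

Given a log line for host.example.org requesting /docs/a.html, reorder_sketch f m prints `/docs/a.html host.example.org'.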
The file name is normalized: the endings
`/index.html', `/', and `/.' are cut off,
since with or without them the name refers to the same file.
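The normalization rule can be sketched as a one-line sed filter. The function name normalize_path is hypothetical, and this illustrates the rule as described rather than the way reorder.c implements it.

```sh
# Hypothetical helper: strip the endings `/index.html', `/.' and `/'
# from each path read on standard input (one path per line).
normalize_path() {
    sed -e 's|/index\.html$||' -e 's|/\.$||' -e 's|/$||'
}
```

With this sketch `/foo/index.html', `/foo/' and `/foo/.' all become `/foo'. The server root `/' ends up as an empty string; whether reorder.c treats the root specially is not stated.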
The program takes the access log file from the standard input.
To list the files accessed, sorted by the
number of different machines that accessed each one, use
reorder f m <access_log|sort|uniq|cut -d' ' -f1|uniq -c|sort -rn
To get the number of accesses per day, use
reorder d <access_log|sort|uniq -c
Per day
To express the counts as averages per day, I wrote a simple
per_day filter, which reads input
in the format produced by uniq -c and divides each
count by the value given as its argument.
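A filter of this kind can be sketched in a few lines of awk. The function name per_day_sketch and the output format (two decimals, followed by the rest of the line) are assumptions; the original per_day program may print its results differently. The argument is assumed to be a positive number.

```sh
# Hypothetical stand-in for the per_day filter; $1 is the number of days.
# Each input line has the form "<count> <rest>", as printed by uniq -c.
per_day_sketch() {
    awk -v days="$1" '{
        count = $1
        # Drop the leading blanks and the count; keep the rest verbatim.
        sub(/^[ \t]*[0-9]+[ \t]*/, "")
        printf "%.2f %s\n", count / days, $0
    }'
}
```

For example, per_day_sketch 4 turns the line `12 /foo' into `3.00 /foo'.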
The following returns the pages sorted by number of accesses,
with the counts expressed as accesses per day; the argument to
per_day is the number of distinct days found in the log:
reorder f m <access_log|sort|uniq|cut -d' ' -f1|uniq -c|sort -rn|per_day `reorder d <access_log|uniq -c|wc -l`
Automatically generating an HTML page
When I found out I could read the access_log file from my own account,
I decided to write a script that
generates a page with the
number of accesses for every page that was accessed
more than once. For this purpose I wrote a modified version
of the per_day filter, which
generates some HTML code.
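What such an HTML-producing filter might look like is sketched below. It is not the modified filter itself: the name per_day_html_sketch, the list markup and the wording of each entry are assumptions, and file names are printed without HTML escaping.

```sh
# Hypothetical HTML variant of a per-day filter; $1 is the number of days.
# Input lines have the form "<count> <file>", as printed by uniq -c.
per_day_html_sketch() {
    awk -v days="$1" '
        BEGIN { print "<ul>" }
        $1 > 1 {
            # Only files counted more than once are listed.
            count = $1
            sub(/^[ \t]*[0-9]+[ \t]*/, "")
            printf "<li>%s: %.2f accesses per day</li>\n", $0, count / days
        }
        END { print "</ul>" }
    '
}
```

The output is a bare list; a complete page would still need a title and the surrounding HTML, which the script calling the filter could print before and after it.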