Simple statistics from httpd log files

I wanted a simple way of analysing the access_log generated by the Mosaic httpd server. So I wrote a small program, reorder.c, which prints, for each request, the machine name from which the request came (m), the file that was accessed (f), the day the page was accessed (d), and the time and date the page was accessed (t), in any desired order. Which parts are printed is specified by the command line arguments: a sequence of `m', `f', `d', and `t', separated by spaces. File names are normalized, which means the following endings are cut off: `/index.html', `/', and `/.', as these all refer to the same file anyway. The program reads the access log from standard input.
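
The original reorder.c is not reproduced here, but the following minimal sketch shows the idea, assuming log lines of the common form host - - [dd/Mon/yyyy:hh:mm:ss zone] "GET /path ..." ... (the format the server actually writes may differ):

/* reorder.c -- a minimal sketch, assuming log lines of the form
 *   host - - [dd/Mon/yyyy:hh:mm:ss zone] "GET /path ..." ...
 * (the format the server actually writes may differ)
 */
#include <stdio.h>
#include <string.h>

/* Cut off `/index.html', `/.' and a trailing `/', so that all
 * spellings of the same page compare equal. */
static void normalize(char *file)
{
    size_t len = strlen(file);

    if (len >= 11 && strcmp(file + len - 11, "/index.html") == 0)
        file[len - 11] = '\0';
    else if (len >= 2 && strcmp(file + len - 2, "/.") == 0)
        file[len - 2] = '\0';
    else if (len > 1 && file[len - 1] == '/')
        file[len - 1] = '\0';
}

int main(int argc, char **argv)
{
    char line[4096];

    while (fgets(line, sizeof line, stdin)) {
        char machine[1024] = "", stamp[1024] = "", day[1024] = "";
        char file[2048] = "";
        char *lb, *rb, *colon, *quote;
        int i;

        /* machine name: the first field of the line */
        if (sscanf(line, "%1023s", machine) != 1)
            continue;

        /* time and date: the text between `[' and `]' */
        lb = strchr(line, '[');
        rb = lb ? strchr(lb, ']') : NULL;
        if (lb && rb) {
            snprintf(stamp, sizeof stamp, "%.*s", (int)(rb - lb - 1), lb + 1);
            /* day: the date part, up to the first `:' */
            colon = strchr(stamp, ':');
            snprintf(day, sizeof day, "%.*s",
                     colon ? (int)(colon - stamp) : (int)strlen(stamp), stamp);
        }

        /* file: the second word of the quoted request string */
        quote = strchr(line, '"');
        if (quote && sscanf(quote + 1, "%*s %2047s", file) == 1)
            normalize(file);

        /* print the requested parts in the order given on the command line */
        for (i = 1; i < argc; i++) {
            if (i > 1) putchar(' ');
            switch (argv[i][0]) {
            case 'm': fputs(machine, stdout); break;
            case 'f': fputs(file, stdout); break;
            case 'd': fputs(day, stdout); break;
            case 't': fputs(stamp, stdout); break;
            }
        }
        putchar('\n');
    }
    return 0;
}

With this, reorder f m prints a `file machine' pair for every request, which is what the pipelines below operate on.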

To get a list of the accessed files, sorted by the number of different machines that accessed each file, use

reorder f m < access_log | sort | uniq | cut -d' ' -f1 | uniq -c | sort -r

Here sort | uniq reduces the log to unique (file, machine) pairs, cut keeps only the file name, and uniq -c then counts the machines per file.

To get the number of accesses per day, use

reorder d < access_log | sort | uniq -c

Per day

To know how many accesses there were per day, I wrote a simple per_day filter, which reads a file in the format produced by uniq -c and divides the counts by the value given as an argument. The following returns the pages sorted by number of accesses, with the counts given in accesses per day:
reorder f m < access_log | sort | uniq | cut -d' ' -f1 | uniq -c | sort -r | per_day `reorder d < access_log | uniq -c | wc -l`
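
The backquoted command counts the number of distinct days that occur in the log. The per_day filter itself is only a few lines; the original is not reproduced here, but a minimal sketch, assuming the usual `count text' lines that uniq -c produces, could look like this:

/* per_day.c -- a minimal sketch of the per_day filter (assumed behaviour:
 * read "count text" lines as produced by uniq -c, divide each count by
 * the number of days given as the argument, and print the result). */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char text[4096];
    long count;
    double days;

    if (argc != 2 || (days = atof(argv[1])) <= 0) {
        fprintf(stderr, "usage: per_day days\n");
        return 1;
    }
    /* %[^\n] grabs the rest of the line after the count */
    while (scanf("%ld %4095[^\n]%*c", &count, text) == 2)
        printf("%8.2f %s\n", count / days, text);
    return 0;
}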

Automatically generating an HTML page

When I found out I could read the access_log file from my own account, I decided to write a script that generates a page with the number of accesses to each page that was accessed more than once. For this purpose I wrote a modified version of the per_day filter, which generates some HTML code.
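
The HTML-generating variant is also not reproduced here; as a hypothetical sketch (the name per_day_html and the exact markup are my assumptions), it could read the same `count page' input, skip pages accessed only once, and emit a list of links:

/* per_day_html.c -- hypothetical sketch of the HTML-generating variant:
 * same input as per_day, but pages accessed only once are skipped and
 * the output is an HTML list instead of plain text. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char page[4096];
    long count;
    double days;

    if (argc != 2 || (days = atof(argv[1])) <= 0) {
        fprintf(stderr, "usage: per_day_html days\n");
        return 1;
    }
    printf("<h1>Accesses per day</h1>\n<ul>\n");
    while (scanf("%ld %4095[^\n]%*c", &count, page) == 2)
        if (count > 1)      /* only pages accessed more than once */
            printf("<li>%.2f <a href=\"%s\">%s</a>\n",
                   count / days, page, page);
    printf("</ul>\n");
    return 0;
}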
