Summarising custom logs

The following uses lsof to find all log files held open for writing by the web server (access and error logs); it matches any regular file whose file descriptor is open in write mode, so logs written by other services will show up too. You will need to run this command first to save the file paths in the LOGS variable:

LOGS=$(lsof -ln | awk '$4 ~ /[0-9]w/ && $5 ~ /REG/ {FILE[$NF]++}END{for (i in FILE) print i}')
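
Before moving on, it is worth echoing the variable to check exactly what was picked up:

echo "$LOGS"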

Browser and robots.txt check

Now you can run the following to break down today's hits by user agent, for each access log that has seen at least 100 hits today (top ten agents per log):

for log in $(echo "$LOGS" | grep access); do HIT_COUNT=$(grep $(date "+%d/%b/%Y") $log -c); if [[ "$HIT_COUNT" -ge 100 ]]; then echo -e "\n$log - $HIT_COUNT total hits today\n"; grep $(date "+%d/%b/%Y") $log | awk -F \" '{USER[$(NF-1)]++}END{for (i in USER) print USER[i],i}' | sort -n | tail -10 ; fi; done

Example output:

/var/www/lukeslinux.co.uk/access_log - 41417 total hits today

537 Mozilla/5.0 (Linux; Android 5.1.1; SAMSUNG SM-G920I Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.2 Chrome/38.0.2125.102 Mobile Safari/537.36
688 Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
800 Links (2.8; Linux 2.6.32-573.7.1.el6.x86_64 x86_64; GNU C 4.4.7; dump)
917 Mozilla/5.0 (iPhone; CPU iPhone OS 9_2_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13D15 Safari/601.1
948 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
1077 -
1485 Rackspace Monitoring/1.1 (https://monitoring.api.rackspacecloud.com)
3425 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3458 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
20386 Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
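
In the output above, the 20386 hits from an ancient Firefox/6.0.2 user agent stand out as unlikely to be organic traffic. Once a suspect agent catches your eye, a natural follow-up is to see which IPs are sending it. The one-liner below is a rough sketch rather than part of the recipe above; it assumes the combined log format (client IP in the first field) and borrows the log path and agent string from the example:

grep $(date "+%d/%b/%Y") /var/www/lukeslinux.co.uk/access_log | grep "Firefox/6.0.2" | awk '{print $1}' | sort | uniq -c | sort -rn | head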


IP Check

The same loop again, this time keyed on the client IP and the request it made:

for log in $(echo "$LOGS" | grep access); do HIT_COUNT=$(grep $(date "+%d/%b/%Y") $log -c); if [[ "$HIT_COUNT" -ge 100 ]]; then echo -e "\n$log - $HIT_COUNT total hits today\n"; grep $(date "+%d/%b/%Y") $log | awk '{ if ( $2 !~ /^[0-9]/ ) REQ[$1" "$6" "$7]++; if ( $2 ~ /^[0-9]/ ) REQ[$1" "$2" "$7" "$8]++}END{for (i in REQ) print REQ[i],i}' | sort -n | tail -10 ; fi; done


Example output:

/var/log/nginx/lukeslinuxlessons.co.uk.access.log - 200 total hits today

2 180.76.15.157 "GET /
2 66.249.78.114 "GET /robots.txt
2 66.249.78.121 "POST /wp-admin/admin-ajax.php
2 88.119.179.121 "GET /wp-admin/admin-ajax.php?action=revslider_show_image&img=../wp-config.php
3 180.76.15.134 "GET /
3 94.236.7.190 "GET /favicon.ico
3 94.236.7.190 "GET /logrotate/
3 94.236.7.190 "POST /wp-admin/admin-ajax.php
7 113.190.128.197 "POST /xmlrpc.php
7 117.4.251.108 "POST /xmlrpc.php
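
The repeated POSTs to /xmlrpc.php above are a familiar sight on WordPress servers, as brute-force and pingback abuse usually arrive that way. As a follow-up sketch, reusing the date match and the log path from the example, you can rank every IP hitting that file before deciding what to block:

grep $(date "+%d/%b/%Y") /var/log/nginx/lukeslinuxlessons.co.uk.access.log | grep "POST /xmlrpc.php" | awk '{print $1}' | sort | uniq -c | sort -rn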

Finding Crawlers

Note: you will need to change the date range below, and make sure you have run the LOGS command at the top of this page first.

LC_ALL=C awk '/21\/Jul\/2016:20:4/ {REQ[FILENAME" "substr($0,index($0,$12))]++}END{for (i in REQ) print REQ[i],i}' $(echo "$LOGS" | grep access) | sort -rn | egrep -i "bot|crawl|spider|slurp" | head -25
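
If you would rather not edit the timestamp in the pattern by hand, one option is to build it from date at run time. The variant below is only a sketch along the same lines; it matches the current hour across all open access logs, and uses awk's index() instead of a regex so the slashes in the date need no escaping:

LC_ALL=C awk -v d="$(date '+%d/%b/%Y:%H')" 'index($0,d) {REQ[FILENAME" "substr($0,index($0,$12))]++}END{for (i in REQ) print REQ[i],i}' $(echo "$LOGS" | grep access) | sort -rn | egrep -i "bot|crawl|spider|slurp" | head -25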

Accurate number of Apache requests per hour

Note: change the date range and the log file location.

LC_ALL=C awk '/02\/Aug\/2016/ && $0 !~ /(\.js|\.png|\.jpg|\.css|\.ico) HTTP|.*Monitoring/' /var/log/nginx/exampledomain.com | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c
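
If hourly buckets are too coarse, keeping one more field from the timestamp gives a per-minute breakdown instead. This is the same pipeline otherwise, so treat it as a sketch with the same caveats:

LC_ALL=C awk '/02\/Aug\/2016/ && $0 !~ /(\.js|\.png|\.jpg|\.css|\.ico) HTTP|.*Monitoring/' /var/log/nginx/exampledomain.com | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' | sort -n | uniq -c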

Log requests for a specific date/time range

Note: change the date/time range and the log file location.

LC_ALL=C awk -F \" '/09\/Jun\/2016:(10:[12345]|11:[01])/ && $0 !~ /(.js|.png|.jpg|.css|.ico) HTTP/ {REQ[$2]++}END{for (i in REQ) print REQ[i],i}' /var/log/httpd/lexampledomain.com | sort -rn | head -50