=== Summarising custom logs ===
The following uses lsof to list every regular file that is currently open for writing, which includes the web server's access and error logs.
\\
Run this command first to save the list of log files in the LOGS variable:
LOGS=$(lsof -ln | awk '$4 ~ /[0-9]w/ && $5 ~ /REG/ {FILE[$NF]++}END{for (i in FILE) print i}')
\\
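As a rough sketch, the awk filter above can be wrapped in a function so that it is easier to read and to try out on sample input. The function name ''filter_open_logs'' is hypothetical and not part of the original command; it assumes the usual ''lsof -ln'' column layout (FD in column 4, TYPE in column 5, file name last).

```shell
# filter_open_logs (hypothetical name): read `lsof -ln` style output on stdin
# and print each unique regular file that is open for writing.
filter_open_logs() {
  awk '
    # Column 4 is the file descriptor plus mode (e.g. "7w"), column 5 the type.
    $4 ~ /[0-9]w/ && $5 ~ /REG/ { files[$NF]++ }
    # Print every file name once, in no particular order.
    END { for (f in files) print f }
  '
}
```

With that defined, ''LOGS=$(lsof -ln | filter_open_logs)'' should give the same list as the one-liner above.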
=== Browser and robot.txt check ===
Run the following command to list the ten most common user agents for each access log that has at least 100 hits today:
for log in $(echo "$LOGS" | grep access); do HIT_COUNT=$(grep $(date "+%d/%b/%Y") $log -c); if [[ "$HIT_COUNT" -ge 100 ]]; then echo -e "\n$log - $HIT_COUNT total hits today\n"; grep $(date "+%d/%b/%Y") $log | awk -F \" '{USER[$(NF-1)]++}END{for (i in USER) print USER[i],i}' | sort -n | tail -10 ; fi; done
\\
Example Output:
/var/www/lukeslinux.co.uk/access_log - 41417 total hits today
537 Mozilla/5.0 (Linux; Android 5.1.1; SAMSUNG SM-G920I Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.2 Chrome/38.0.2125.102 Mobile Safari/537.36
688 Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
800 Links (2.8; Linux 2.6.32-573.7.1.el6.x86_64 x86_64; GNU C 4.4.7; dump)
917 Mozilla/5.0 (iPhone; CPU iPhone OS 9_2_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13D15 Safari/601.1
948 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
1077 -
1485 Rackspace Monitoring/1.1 (https://monitoring.api.rackspacecloud.com)
3425 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3458 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
20386 Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
\\
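The counting step of the command above can be sketched on its own. It splits each line on double quotes and treats the last quoted field as the user agent, which assumes the combined log format. The function name ''top_user_agents'' and its optional limit argument are hypothetical additions for illustration.

```shell
# top_user_agents [N] (hypothetical name): read access log lines on stdin and
# print the N most frequent user agents (default 10), least frequent first.
top_user_agents() {
  # With '"' as the separator, the user agent is the second-to-last field,
  # because the line ends with a closing quote.
  awk -F'"' '
    { agents[$(NF-1)]++ }
    END { for (a in agents) print agents[a], a }
  ' | sort -n | tail -n "${1:-10}"
}
```

For example, ''grep "$(date '+%d/%b/%Y')" "$log" | top_user_agents'' would summarise today's hits for one log file.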
=== IP Check ===
The following command lists the ten most frequent combinations of IP address and request for each access log that has at least 100 hits today:
for log in $(echo "$LOGS" | grep access); do HIT_COUNT=$(grep $(date "+%d/%b/%Y") $log -c); if [[ "$HIT_COUNT" -ge 100 ]]; then echo -e "\n$log - $HIT_COUNT total hits today\n"; grep $(date "+%d/%b/%Y") $log | awk '{ if ( $2 !~ /^[0-9]/ ) REQ[$1" "$6" "$7]++; if ( $2 ~ /^[0-9]/ ) REQ[$1" "$2" "$7" "$8]++}END{for (i in REQ) print REQ[i],i}' | sort -n | tail -10 ; fi; done
\\
Example output:
/var/log/nginx/lukeslinuxlessons.co.uk.access.log - 200 total hits today
2 180.76.15.157 "GET /
2 66.249.78.114 "GET /robots.txt
2 66.249.78.121 "POST /wp-admin/admin-ajax.php
2 88.119.179.121 "GET /wp-admin/admin-ajax.php?action=revslider_show_image&img=../wp-config.php
3 180.76.15.134 "GET /
3 94.236.7.190 "GET /favicon.ico
3 94.236.7.190 "GET /logrotate/
3 94.236.7.190 "POST /wp-admin/admin-ajax.php
7 113.190.128.197 "POST /xmlrpc.php
7 117.4.251.108 "POST /xmlrpc.php
\\
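The awk part of the IP check handles two log layouts, which is easier to see when written out. This is only a sketch: the function name ''top_requests_by_ip'' is hypothetical, and the reading of the second branch (a format with one extra leading field, such as a virtual host name before the client address) is an assumption based on the field numbers used.

```shell
# top_requests_by_ip [N] (hypothetical name): read access log lines on stdin
# and print the N most frequent "address method path" combinations (default 10).
top_requests_by_ip() {
  awk '
    {
      if ($2 ~ /^[0-9]/)
        # Assumed layout: extra leading field, address in $2, request in $7 $8.
        req[$1" "$2" "$7" "$8]++
      else
        # Standard combined layout: address in $1, request in $6 $7.
        req[$1" "$6" "$7]++
    }
    END { for (r in req) print req[r], r }
  ' | sort -n | tail -n "${1:-10}"
}
```

It can be used in place of the inline awk, for example ''grep "$(date '+%d/%b/%Y')" "$log" | top_requests_by_ip''.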
=== Finding Crawlers ===
**Note:** Change the date and time in the pattern (the example matches 20:40 to 20:49 on 21 Jul 2016), and run the LOGS command from the top of this page first.
LC_ALL=C awk '/21\/Jul\/2016:20:4/ {REQ[FILENAME" "substr($0,index($0,$12))]++}END{for (i in REQ) print REQ[i],i}' $(echo "$LOGS" | grep access) | sort -rn | grep -Ei "bot|crawl|spider|slurp" | head -25
\\
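To avoid editing the awk pattern by hand, the timestamp can be passed in as an argument. The following is a sketch rather than a drop-in replacement: ''count_crawlers'' is a hypothetical name, it matches the timestamp as a plain string instead of a regex, it takes the user agent from the last quoted field instead of from field 12 onwards, and it does not prefix each result with the file name.

```shell
# count_crawlers PREFIX [FILE...] (hypothetical name): count user agents that
# look like crawlers among lines containing the timestamp PREFIX.
# Reads stdin when no files are given.
count_crawlers() {
  prefix=$1
  shift
  LC_ALL=C awk -F'"' -v p="$prefix" '
    # index() does a plain substring match, so "/" needs no escaping.
    index($0, p) { ua[$(NF-1)]++ }
    END { for (u in ua) print ua[u], u }
  ' "$@" | sort -rn | grep -Ei 'bot|crawl|spider|slurp' | head -n 25
}
```

For example, ''count_crawlers '21/Jul/2016:20:4' $(echo "$LOGS" | grep access)'' covers the same ten minutes as the command above.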
=== Accurate number of Apache requests per hour ===
**Note:** Change the date and the log file location.
LC_ALL=C awk '/02\/Aug\/2016/ && $0 !~ /\.(js|png|jpg|css|ico) HTTP|Monitoring/' /var/log/nginx/exampledomain.com | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c
\\
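The same hourly count can be sketched as a single awk pass that takes the date as an argument. The function name ''hits_per_hour'' is hypothetical, and the sketch assumes the combined log format, where the timestamp is the fourth whitespace-separated field. As above, requests for static files and hits from monitoring agents are left out.

```shell
# hits_per_hour DATE (hypothetical name): read access log lines on stdin and
# print "count HH:00" for each hour of DATE (e.g. 02/Aug/2016).
hits_per_hour() {
  LC_ALL=C awk -v d="$1" '
    # Match the date as a plain string, then drop static files and monitoring.
    index($0, "[" d ":") && $0 !~ /\.(js|png|jpg|css|ico) HTTP|Monitoring/ {
      # $4 looks like "[02/Aug/2016:10:12:13", so the hour is the second part.
      split($4, t, ":")
      hour[t[2]]++
    }
    END { for (h in hour) print hour[h], h ":00" }
  ' | sort -k2
}
```

For example, ''hits_per_hour '02/Aug/2016' < /var/log/nginx/exampledomain.com'' reads the same log as the command above.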
=== Log requests for Specific date/time range ===
**Note:** Change the date/time pattern and the log file location. As written, the pattern matches requests from 10:10 to 11:19 on 09 Jun 2016.
LC_ALL=C awk -F \" '/09\/Jun\/2016:(10:[12345]|11:[01])/ && $0 !~ /\.(js|png|jpg|css|ico) HTTP/ {REQ[$2]++}END{for (i in REQ) print REQ[i],i}' /var/log/httpd/exampledomain.com | sort -rn | head -50
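\\
As a sketch, the time window can be passed in as a regular expression instead of being edited inside the awk program. The function name ''requests_in_window'' and its optional limit argument are hypothetical. Because the pattern is passed as a string, the slashes in the date do not need escaping, but it should not contain backslashes, since awk processes escape sequences in ''-v'' values.

```shell
# requests_in_window REGEX [N] (hypothetical name): read access log lines on
# stdin and print the N most frequent request lines (default 50) whose
# timestamp matches REGEX, skipping requests for static files.
requests_in_window() {
  LC_ALL=C awk -F'"' -v re="$1" '
    # With a double quote as the separator, $2 is the full request line.
    $0 ~ re && $0 !~ /\.(js|png|jpg|css|ico) HTTP/ { req[$2]++ }
    END { for (r in req) print req[r], r }
  ' | sort -rn | head -n "${2:-50}"
}
```

For example, ''requests_in_window '09/Jun/2016:(10:[1-5]|11:[01])' < /var/log/httpd/exampledomain.com'' selects the same 10:10 to 11:19 window.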