Summary
GoAccess analyzes and displays information from Apache2's access.log file. However, the version in the Ubuntu repository is out of date and is not compiled with all the options needed to make the best use of it.
Building From Source
Several packages are needed to build GoAccess from source:
sudo apt-get install build-essential autoconf libncursesw5-dev libgeoip-dev libtokyocabinet-dev zlib1g-dev libbz2-dev libncurses-dev
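Before cloning, it can help to confirm that the build tools actually ended up on the PATH. The following is a minimal sketch, not part of the original procedure: `require_cmds` is a hypothetical helper name, and the command list in the usage line (`git`, `autoreconf`, `make`, `gcc`) is an assumption about what the packages above provide. Note that `git` is not in that package list and is assumed to be installed separately.

```shell
# Hypothetical helper: report any commands that are not on the PATH.
# Returns 0 if every command was found, 1 otherwise.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumed tool names):
# require_cmds git autoreconf make gcc || echo "install the missing tools first"
```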
The source code is then cloned:
cd ~
git clone https://github.com/allinurl/goaccess.git
cd goaccess
Finally, build and install:
autoreconf --force --install
./configure --enable-geoip --enable-utf8 --enable-tcb=btree --with-getline
make
sudo make install
Using GoAccess
In its simplest form, GoAccess can be run like this:
sudo goaccess --log-file=/var/log/apache2/access.log
There are many command-line options; see the man page for details. A more realistic invocation might look like this:
sudo goaccess \
--log-format=COMBINED \
--agent-list \
--hl-header \
--color-scheme=3 \
--real-os \
--ignore-panel=VIRTUAL_HOSTS \
--ignore-crawlers \
--with-mouse \
--log-file=/var/log/apache2/access.log
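Since the same set of options tends to be reused with different log files, one option is to wrap them in a small shell function. This is only a sketch under assumptions: `run_goaccess` is a hypothetical name, and `GOACCESS` is a hypothetical override variable introduced here so the command can be swapped (for example to prefix it with `sudo`). The options themselves are the ones from the command above.

```shell
# Hypothetical wrapper: keep the shared options in one place and take the
# log file as the only argument. GOACCESS is an assumed override variable.
run_goaccess() {
    # "$1" is expanded before "set --" replaces the positional parameters,
    # so the log file path survives as a single, properly quoted word.
    set -- \
        --log-format=COMBINED \
        --agent-list \
        --hl-header \
        --color-scheme=3 \
        --real-os \
        --ignore-panel=VIRTUAL_HOSTS \
        --ignore-crawlers \
        --with-mouse \
        --log-file="$1"
    # Left unquoted on purpose so a value like "sudo goaccess" splits in two.
    ${GOACCESS:-goaccess} "$@"
}

# Usage:
# GOACCESS="sudo goaccess"
# run_goaccess /var/log/apache2/access.log
```

Keeping each option as its own word avoids having to flatten a multi-line string into a command line later.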
My guess is that GoAccess is more useful on a very active web server. In my case, I'm monitoring a server with little traffic, so the fact that GoAccess can only monitor one file (typically /var/log/apache2/access.log) and not the rest of the rotated log files was a problem.
To get around this, I wrote a script that uses GoAccess's support for Tokyo Cabinet database files to process all the rotated log files first, and then starts GoAccess again with the normal /var/log/apache2/access.log filename. So even when the web server needs to reboot, I still see the full stats when I restart GoAccess.
#!/bin/bash -e
cmd=$(cat <<-HEREDOC
goaccess
--log-format=COMBINED
--agent-list
--hl-header
--color-scheme=3
--real-os
--db-path=/tmp/
--enable-panel=VISITORS
--enable-panel=REQUESTS
--enable-panel=REQUESTS_STATIC
--enable-panel=NOT_FOUND
--enable-panel=HOSTS
--enable-panel=OS
--enable-panel=BROWSERS
--enable-panel=VISIT_TIMES
--enable-panel=REFERRERS
--enable-panel=REFERRING_SITES
--enable-panel=KEYPHRASES
--enable-panel=GEO_LOCATION
--enable-panel=STATUS_CODES
--ignore-panel=VIRTUAL_HOSTS
--ignore-crawlers
--with-mouse
HEREDOC
)
cmd=$(echo ${cmd} | tr '\t\r\n' ' ')
# remove the old Tokyo Cabinet files (if any remain from last time)
sudo rm -f /tmp/*.tcb
# combine all the old access.log files into 1 file and process it first
filename=/tmp/$(basename $0).$$.tmp
sudo zcat -f /var/log/apache2/access.log.* >> ${filename}
sudo ${cmd} --keep-db-files --log-file=${filename} --output=report.json
rm -f ${filename} report.json
# now we'll re-load the previous content and tell it the name of the "real" file to monitor
sudo ${cmd} --load-from-disk --log-file=/var/log/apache2/access.log
# remove the Tokyo Cabinet files since we'll rebuild them next time anyway
sudo rm -f /tmp/*.tcb
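One detail about the combining step: the shell expands `access.log.*` in lexical order, so `access.log.10.gz` comes before `access.log.2.gz`. Whether GoAccess needs its input in chronological order is not something established here, so the following is only a sketch for the case where ordering matters. It assumes logrotate-style names (`access.log.1`, `access.log.2.gz`, ...) in which a higher number means an older file. The function names `list_rotated_oldest_first` and `combine_rotated` are hypothetical.

```shell
# Print rotated logs oldest first, assuming names like access.log.1,
# access.log.2.gz, ... where a higher number means an older file.
list_rotated_oldest_first() {
    local base=$1 f n tab
    tab=$(printf '\t')
    for f in "$base".[0-9]*; do
        [ -e "$f" ] || continue      # glob matched nothing
        n=${f#"$base".}              # strip "<base>." prefix, e.g. "2.gz"
        n=${n%%.*}                   # strip any suffix, e.g. "2"
        printf '%s\t%s\n' "$n" "$f"
    done | sort -t "$tab" -k1,1nr | cut -f2-
}

# Concatenate the rotated logs to stdout, oldest first.
# "zcat -f" decompresses gzip files and passes plain files through unchanged.
combine_rotated() {
    local base=$1 f
    list_rotated_oldest_first "$base" | while IFS= read -r f; do
        zcat -f -- "$f"
    done
}

# Usage:
# combine_rotated /var/log/apache2/access.log > /tmp/combined.log
```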