Analyze your website traffic the simple way

Introduction

Every site owner reaches the point where she wants to see whether and how her site gets visited. Often the weapon of choice is Google Analytics. It's so easy, isn't it? You just create an account, add a small JavaScript snippet to your page, and you can already see how many visitors arrive at your site.

But I think it's sometimes a bit too much. First of all, not everyone wants to create a Google account just to see how much traffic a site gets. Also, many customers today don't like having their habits tracked by Google. They want to leave a small digital footprint. The last point is, of course, performance. Even if the Google Analytics script isn't that large, it's still code that has to be fetched and evaluated.

To the rescue: every web server has something built in called an access log. It gets updated whenever a user visits your page or requests any resource on the web server. Now, this is one of those ugly files no one wants to read. It's a simple text file that lists all requests line by line in a not very readable format.
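To get a feeling for what is in that file, here is a minimal sketch that counts requests per path with awk. It assumes the default "combined" log format, where the requested path is the seventh whitespace-separated field; the function name top_paths and the sample line in the comment are made up for illustration.

```shell
# A line in the combined log format looks roughly like this (hypothetical data):
# 192.0.2.1 - - [10/Oct/2013:13:55:36 +0200] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"

# Print "count path" pairs for the given log file, most requested path first.
top_paths() {
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' "$1" |
        sort -k1,1nr -k2,2
}

# Example call:
# top_paths /var/log/nginx/access.log
```

This works for a quick look, but it quickly gets tedious once you also want browsers, operating systems or referrers.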

Note

My examples show the configuration for an nginx web server on an Ubuntu 12.04 system. This should be easily transferable to any other web server or system with the help of a search engine of your choice.

The result of this post

The result of this post is a readable HTML report generated from the access log. I set up an example here, where you can see what it looks like, so you can decide whether to read further or not.

Create simple visit reports with goaccess

This is where a small tool called goaccess comes into play. It parses the log files and creates a simple HTML page that presents the data in a nice way. The report shows grouped information such as the number of visitors per page, which static files get accessed most, and which browsers and operating systems were used to visit the site.

On the tool's website you'll find a short set of instructions on how to download and install it.

After installing goaccess, you can create the report file with the following command. It assumes that the log file is the default one created by nginx, and it writes the HTML file to your current directory. You can pass other paths if your configuration differs or if you want to create the report file somewhere else.

goaccess -f /var/log/nginx/access.log -a > report.html

Create the report file automatically

As you may have noticed, the HTML report file doesn't get updated when new requests hit the website. To handle this, I chose a very old and reliable Unix tool called cron. It's a handy little thing that tells the system to execute commands automatically at specific times. I added the following line to the /etc/crontab file:

* * * * * root goaccess -f /path/to/access/log/filename -a > /path/to/output/html/filename

This tells the system to create a new report file every minute. More information on how to configure a cron job can be found here. It could also make sense to create a new report file only once an hour or once a day if you don't need such an up-to-date report.
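One thing to keep in mind: redirecting with > empties the output file before goaccess has finished, so a visitor could briefly see a half-written report. A possible way around this, sketched below under the assumption that a small wrapper script is acceptable, is to write to a temporary file first and then move it into place. The function name update_report is made up for illustration; the goaccess call is the same one as in the crontab line.

```shell
# Build the report in a temporary file next to the target, then move it into place.
update_report() {
    log_file=$1
    output_file=$2
    tmp_file=$(mktemp "${output_file}.XXXXXX") || return 1
    if goaccess -f "$log_file" -a > "$tmp_file"; then
        # mktemp creates the file readable by its owner only,
        # so open it up for the web server before publishing it.
        chmod 644 "$tmp_file"
        mv "$tmp_file" "$output_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}

# A wrapper script would end with a call like this:
# update_report /path/to/access/log/filename /path/to/output/html/filename
```

The crontab line would then call the wrapper script instead of goaccess directly. For a less frequent schedule, the five time fields can be changed from * * * * * to 0 * * * * for once an hour or to 0 0 * * * for once a day at midnight.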

Access the report file

I have the report HTML file created in a log directory inside my site root, so I can easily make it accessible from the outside. This way I don't have to connect to the server via SSH to see the report.

Then I added a configuration for my web server to make the file accessible from the web. In my case I granted access to the log directory. I also turned off access logging for requests to that directory, so they don't show up in the report. Last but not least, I added HTTP basic authentication to the log location to prevent strangers from accessing these files.

To set up the users and their passwords for access to that directory, I followed these instructions.
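One common way to create such a password file, not necessarily the one from the linked instructions, is to let openssl generate the password hash. The sketch below assumes that openssl is installed; the function name add_htpasswd_user and the user name alice are made up for illustration.

```shell
# Append a "user:hash" line to an htpasswd-style file.
# Note: the password is passed as an argument here, so it may show up
# in the shell history and the process list. Fine for a sketch only.
add_htpasswd_user() {
    htpasswd_file=$1
    user=$2
    password=$3
    # -apr1 selects the Apache MD5 variant, which nginx can verify.
    hash=$(openssl passwd -apr1 "$password") || return 1
    printf '%s:%s\n' "$user" "$hash" >> "$htpasswd_file"
}

# Example call (needs write access to the file, e.g. as root):
# add_htpasswd_user /etc/nginx/.htpasswd alice 'a-long-secret'
```

The file path has to match the auth_basic_user_file setting in the web server configuration.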

location /log {
    access_log off;
    auth_basic 'Restricted';
    auth_basic_user_file /etc/nginx/.htpasswd;
}

After a restart of the web server, the files in that directory can be accessed by the users I created.

Final words

That's it. Now a nice report gets created automatically and can be accessed from the internet (with the right credentials).

I just want to add that I like Google Analytics very much. It's an awesome tool that comes in handy when you need detailed information about visits and want to analyze exactly how users interact with your page. But very often you only have a bunch of static pages and want to see how many people visit them. For that, you now have a small tool that gives you exactly this, without loading external scripts or creating accounts at external services.