Ad Blocker using Squid Proxy on Raspberry Pi

Submitted by nigel on Sunday 17th September 2017

As mentioned in my previous blog, "Raspberry Pi with Squid Proxy Server for Testing Hand-Held Devices when Developing in a Sandbox", it is possible (in fact quite usual) to use Squid to prevent user access to certain websites. These can include ad servers, which serve up annoying ad content when you are trying to read articles. This tutorial shows how to extend the Squid server we have already set up so that it blocks ads.

Ads are blocked by downloading a blacklist of prohibited URLs from a service provider. It has to be said that these lists depend on the community to contribute any URLs they discover, so there can never be 100% confidence that a list is complete.

Once the list is downloaded from the service provider, Squid needs to be told about it in its configuration, and the ban takes the form of an ACL (access control list). OK, now that we know what we need to achieve, let's start the process.

Let's log into the Raspberry Pi, switch to superuser, and download the blacklist.
Nigels-MacBook-Pro:Projects nigel$ ssh pi@192.168.0.201
pi@192.168.0.201's password: 
 
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Sep 10 14:33:42 2017 from 192.168.0.7
pi@raspberrypi:~ $ sudo su
root@raspberrypi:/home/pi# cd /etc/squid3/
root@raspberrypi:/etc/squid3# ls -las
total 832
  4 drwxr-xr-x   2 root root   4096 Sep  9 08:44 .
  4 drwxr-xr-x 112 root root   4096 Sep  9 11:17 ..
  4 -rw-r--r--   1 root root   1547 Dec 24  2016 errorpage.css
  4 -rw-r--r--   1 root root    421 Dec 24  2016 msntauth.conf
272 -rw-r--r--   1 root root 277585 Sep  9 08:44 squid.conf
272 -rw-r--r--   1 root root 277565 Sep  3 11:06 squid.conf.local-vm
272 -rw-r--r--   1 root root 277526 Sep  2 19:22 squid.conf.original
root@raspberrypi:/etc/squid3# curl -sS -L --compressed "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=nohtml&showintro=0&mimetype=plaintext" > ad_block.txt 
root@raspberrypi:/etc/squid3# ls -las
total 872
  4 drwxr-xr-x   2 root root   4096 Sep 17 15:20 .
  4 drwxr-xr-x 112 root root   4096 Sep  9 11:17 ..
 40 -rw-r--r--   1 root root  40042 Sep 17 15:20 ad_block.txt
  4 -rw-r--r--   1 root root   1547 Dec 24  2016 errorpage.css
  4 -rw-r--r--   1 root root    421 Dec 24  2016 msntauth.conf
272 -rw-r--r--   1 root root 277585 Sep  9 08:44 squid.conf
272 -rw-r--r--   1 root root 277565 Sep  3 11:06 squid.conf.local-vm
272 -rw-r--r--   1 root root 277526 Sep  2 19:22 squid.conf.original
root@raspberrypi:/etc/squid3#
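Before going any further, it can be worth a quick check that the download really is a list of hostnames and not, say, an HTML error page. The following is a minimal sketch, assuming the list holds one hostname per line; the helper name non_host_lines is made up for illustration and is not part of Squid or the list provider.

```shell
# Hypothetical helper: print every line of the given file that does NOT
# look like a bare hostname. Blank lines and '#' comment lines are ignored.
# Assumption: one hostname per line, letters/digits/hyphens separated by dots.
non_host_lines() {
    grep -v -E '^[[:space:]]*(#.*)?$' "$1" |
        grep -v -E '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)+$'
}
```

Running non_host_lines /etc/squid3/ad_block.txt should print nothing for a clean list. Anything it does print deserves a look before Squid is pointed at the file, although entries with unusual characters (underscores, for example) would also be flagged by this simple pattern.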
The blacklist is now in place - you can see it in the directory listing as the file ad_block.txt. The next job is to add it to the Squid configuration as an ACL. The new lines need to be placed above the http_access allow all line we put in the config in the previous blog, because Squid evaluates http_access rules in order and stops at the first one that matches.
## disable ads ( http://pgl.yoyo.org/adservers/ )
acl ads dstdom_regex "/etc/squid3/ad_block.txt"
http_access deny ads
You can use your favourite editor to add these lines to squid.conf - I used vi - and then restart Squid.
root@raspberrypi:/etc/squid3# vi squid.conf 
root@raspberrypi:/etc/squid3# service squid3 restart
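Because the order of the http_access lines matters, a small check like the one below can confirm that the deny rule really does come first. This is only a sketch: the function name deny_precedes_allow is invented for illustration, and it looks only at the two rules discussed here, not at any other allow rules that might appear earlier in your config.

```shell
# Hypothetical helper: succeed (exit 0) when the first "http_access deny ads"
# line appears before the first "http_access allow all" line in the file.
deny_precedes_allow() {
    conf=$1
    deny=$(grep -n -E '^[[:space:]]*http_access[[:space:]]+deny[[:space:]]+ads([[:space:]]|$)' "$conf" \
        | head -n 1 | cut -d: -f1)
    allow=$(grep -n -E '^[[:space:]]*http_access[[:space:]]+allow[[:space:]]+all([[:space:]]|$)' "$conf" \
        | head -n 1 | cut -d: -f1)

    # No deny rule at all: nothing is being blocked.
    [ -n "$deny" ] || return 1

    # With no "allow all" line there is nothing to shadow the deny rule;
    # otherwise the deny rule must sit on an earlier line.
    [ -z "$allow" ] || [ "$deny" -lt "$allow" ]
}
```

For example, deny_precedes_allow /etc/squid3/squid.conf && echo "order looks fine". Squid can also check the syntax of the file itself with squid3 -k parse.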
We are now ready to test our solution by landing on an ad-heavy website such as NME. There are two good ways of doing this - one is to watch the access.log in real time for TCP_DENIED records, and the other is to eyeball the site and see if the ads really are missing. Let's go for option one first.
root@raspberrypi:/etc/squid3# tail -f /var/log/squid3/access.log
# 
Looks like there is a bunch of them! To check, let's pipe the access log through wc (word count), looking for TCP_DENIED.
root@raspberrypi:/etc/squid3# cat /var/log/squid3/access.log | grep TCP_DENIED | wc
     44     440    5192
root@raspberrypi:/etc/squid3#
There we are - the first column of the wc output is the line count, so 44 requests have been blocked.
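A raw count does not say which ad servers were hit. The sketch below breaks the TCP_DENIED records down by host. It assumes Squid's default native log format, where the fourth field is the result code and the seventh is the URL; the function name denied_by_host is made up for this example.

```shell
# Hypothetical helper: count TCP_DENIED records per host in a Squid access log.
# Assumption: native log format (field 4 = result code, field 7 = URL).
denied_by_host() {
    awk '$4 ~ /^TCP_DENIED/ {
        url = $7
        sub("^[a-z]+://", "", url)   # drop the scheme, e.g. "http://"
        sub("[/:].*$", "", url)      # drop any path or port, e.g. ":443"
        count[url]++
    }
    END { for (host in count) print count[host], host }' "$1" | sort -rn
}
```

Calling denied_by_host /var/log/squid3/access.log | head lists the most frequently blocked hosts first, which is a handy way of confirming that the denials really are ad servers and not something you wanted to reach.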
Eyeball the Websites
NME

NME was totally successful - empty spaces in the screen where the blocked ads would've been seen. In this instance two ads have been blocked. 

Whisper

The native Android app Whisper was partially successful. The main ad shows a diagnostic that the ad failed to launch - I'm happy with that - but it failed to block the small ad at the bottom of the screen. 

Automating the Fetching of the Blacklist
The list should be refreshed occasionally - like every week - in case there are new entries. The best way of achieving this is to put the earlier curl command in a shell script and then add a cron entry.
/etc/squid3/newads.sh
#!/bin/bash
curl -sS -L --compressed "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=nohtml&showintro=0&mimetype=plaintext" > /etc/squid3/ad_block.txt 
 
## restart squid
/usr/sbin/service squid3 restart
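One weakness of redirecting curl straight into ad_block.txt is that a failed download leaves an empty or partial list behind. A more defensive variant, sketched below, downloads to a temporary file first and only swaps it in if it has some content. The function names fetch_list and install_list are invented for illustration, and the "non-empty" test is a deliberately simple assumption about what counts as a good download.

```shell
# Hypothetical helper: download the list to a given file.
# -f makes curl return an error on HTTP failures instead of saving the error page.
fetch_list() {
    curl -fsS -L --compressed "$1" -o "$2"
}

# Hypothetical helper: replace the live list only if the candidate file
# is non-empty; otherwise discard the candidate and leave the live list alone.
install_list() {
    candidate=$1
    live=$2
    if [ -s "$candidate" ]; then
        mv "$candidate" "$live"
        chmod 644 "$live"   # match the permissions seen in the directory listing
    else
        rm -f "$candidate"
        return 1
    fi
}

# Usage sketch (not executed here), with URL set to the list address used above:
#   tmp=$(mktemp /etc/squid3/ad_block.XXXXXX)
#   fetch_list "$URL" "$tmp" && install_list "$tmp" /etc/squid3/ad_block.txt \
#       && /usr/sbin/service squid3 restart
```

Chaining the steps with && means Squid is only restarted when a fresh list has actually been installed.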
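And now add an entry to the root crontab so that the script runs once a week in the early morning. Note that the entry below fires at 04:05 on day-of-week 1, which cron treats as Monday; use 0 in the last time field if you want Sunday morning.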
crontab -e
# Squid3 fetch ad blocker blacklist
5 4 * * 1 /etc/squid3/newads.sh  > /dev/null 2>&1