Ad Blocker using Squid Proxy on Raspberry Pi

Submitted by nigel on Sunday 17th September 2017

As mentioned in my previous blog, "Raspberry Pi with Squid Proxy Server for Testing Hand-Held Devices when Developing in a Sandbox", it is possible (in fact quite usual) to use Squid to stop users reaching certain websites. These can include ad servers, which serve up annoying ad content when you are trying to read articles. This tutorial shows how to extend the Squid server we have already set up so that it blocks ads.

Ad content is banned by downloading a blacklist of ad server host names from a service provider. Bear in mind that these lists rely on the community to contribute any ad servers they discover, so you can never be 100% confident that a list is complete.

Once the list has been downloaded, Squid needs to be told about it in its configuration. The ban takes the form of an acl (access control list) plus an http_access deny rule that uses it. Now that we know what we need to achieve, let's start.

Let's log into the Raspberry Pi, switch to superuser, and download the blacklist.
Nigels-MacBook-Pro:Projects nigel$ ssh pi@192.168.0.201
pi@192.168.0.201's password: 
 
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
 
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Sep 10 14:33:42 2017 from 192.168.0.7
pi@raspberrypi:~ $ sudo su
root@raspberrypi:/home/pi# cd /etc/squid3/
root@raspberrypi:/etc/squid3# ls -las
total 832
  4 drwxr-xr-x   2 root root   4096 Sep  9 08:44 .
  4 drwxr-xr-x 112 root root   4096 Sep  9 11:17 ..
  4 -rw-r--r--   1 root root   1547 Dec 24  2016 errorpage.css
  4 -rw-r--r--   1 root root    421 Dec 24  2016 msntauth.conf
272 -rw-r--r--   1 root root 277585 Sep  9 08:44 squid.conf
272 -rw-r--r--   1 root root 277565 Sep  3 11:06 squid.conf.local-vm
272 -rw-r--r--   1 root root 277526 Sep  2 19:22 squid.conf.original
root@raspberrypi:/etc/squid3# curl -sS -L --compressed "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=nohtml&showintro=0&mimetype=plaintext" > ad_block.txt 
root@raspberrypi:/etc/squid3# ls -las
total 872
  4 drwxr-xr-x   2 root root   4096 Sep 17 15:20 .
  4 drwxr-xr-x 112 root root   4096 Sep  9 11:17 ..
 40 -rw-r--r--   1 root root  40042 Sep 17 15:20 ad_block.txt
  4 -rw-r--r--   1 root root   1547 Dec 24  2016 errorpage.css
  4 -rw-r--r--   1 root root    421 Dec 24  2016 msntauth.conf
272 -rw-r--r--   1 root root 277585 Sep  9 08:44 squid.conf
272 -rw-r--r--   1 root root 277565 Sep  3 11:06 squid.conf.local-vm
272 -rw-r--r--   1 root root 277526 Sep  2 19:22 squid.conf.original
root@raspberrypi:/etc/squid3#
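Because curl saves whatever the server sends back, it can be worth checking that ad_block.txt really is a plain list of host names before pointing Squid at it. The function below is a minimal sketch; the name check_blacklist and the "one host name per line" rule are my assumptions based on the hostformat=nohtml option in the URL above, not something the list provider documents here.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check a downloaded blacklist file.
# Assumes one host name per line (hostformat=nohtml); blank lines are ignored.
check_blacklist() {
    local file="$1"
    # Fail if the file is missing or empty (e.g. a failed download)
    if [ ! -s "$file" ]; then
        echo "empty or missing: $file" >&2
        return 1
    fi
    # Fail if any non-blank line contains characters not allowed in a host name
    # (an HTML error page, for instance, would trip this)
    if grep -v '^[[:space:]]*$' "$file" | grep -q -v '^[A-Za-z0-9.-]*$'; then
        echo "unexpected content in: $file" >&2
        return 1
    fi
    # Report how many non-blank entries the list holds
    echo "$(grep -c '[^[:space:]]' "$file") entries"
}
```

Usage: check_blacklist /etc/squid3/ad_block.txt prints the number of entries, or an error on stderr with a non-zero exit status.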
The blacklist is now in place; you can see it in the directory listing as ad_block.txt. The next job is to reference it from the Squid configuration file, /etc/squid3/squid.conf, as an acl. These lines must go above the http_access allow all line we added in the previous blog, because Squid applies the first http_access rule that matches.
## disable ads ( http://pgl.yoyo.org/adservers/ )
acl ads dstdom_regex "/etc/squid3/ad_block.txt"
http_access deny ads
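One caveat with dstdom_regex: each line of the file is treated as an unanchored regular expression, and the dots in a plain host name match any character. A hypothetical entry such as ad.com would therefore also block bad.com or load.community.org. The sketch below is an optional alternative, not what this setup uses: it rewrites the plain list into anchored expressions that match a listed domain and its subdomains only. The function name to_anchored_regex and the output file ad_block_regex.txt are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: turn a plain host-name list (stdin) into anchored regular expressions
# for Squid's dstdom_regex, so "example.com" matches example.com and its
# subdomains but not e.g. "badexample.com".
to_anchored_regex() {
    # 1) drop blank lines, 2) escape literal dots,
    # 3) require start-of-name or a preceding dot, 4) anchor at the end
    sed -e '/^[[:space:]]*$/d' \
        -e 's/\./\\./g' \
        -e 's/^/(^|\\.)/' \
        -e 's/$/$/'
}
```

For example, to_anchored_regex < /etc/squid3/ad_block.txt > /etc/squid3/ad_block_regex.txt, then point the acl at the new file. Each entry doubleclick.net becomes (^|\.)doubleclick\.net$.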
You can use your favourite editor to add these lines to squid.conf (I used vi), then restart Squid.
root@raspberrypi:/etc/squid3# vi squid.conf
root@raspberrypi:/etc/squid3# service squid3 restart
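Since Squid stops at the first matching http_access rule, a deny ads line that ends up below http_access allow all has no effect. Running squid3 -k parse will catch syntax errors, but ordering is a logic problem, so a rough scripted check can help. This is a hypothetical helper that compares grep line numbers and assumes each rule appears once, written exactly as in this tutorial.

```shell
#!/bin/bash
# Hypothetical helper: succeed if "http_access deny ads" appears before
# "http_access allow all" in a squid.conf-style file.
# Commented-out lines (starting with '#') are ignored.
deny_before_allow() {
    local conf="$1" deny allow
    # Line number of the first uncommented occurrence of each rule
    deny=$(grep -nE '^[[:space:]]*http_access[[:space:]]+deny[[:space:]]+ads[[:space:]]*$' "$conf" | head -n 1 | cut -d: -f1)
    allow=$(grep -nE '^[[:space:]]*http_access[[:space:]]+allow[[:space:]]+all[[:space:]]*$' "$conf" | head -n 1 | cut -d: -f1)
    if [ -z "$deny" ] || [ -z "$allow" ]; then
        echo "rule not found" >&2
        return 2
    fi
    # Exit status 0 only when the deny rule comes first
    [ "$deny" -lt "$allow" ]
}
```

Usage: deny_before_allow /etc/squid3/squid.conf && echo "ordering OK".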
We are now ready to test our solution by visiting an ad-heavy website such as NME. There are two good ways of doing this: one is to watch access.log in real time for TCP_DENIED records, and the other is to eyeball the site and check that the ads really are missing. Let's go for option one first.
root@raspberrypi:/etc/squid3# tail -f /var/log/squid3/access.log
1505665437.619     38 192.168.0.4 TCP_MISS/200 630 GET http://pagead2.googlesyndication.com/activeview? - HIER_DIRECT/216.58.208.130 image/gif
1505665440.152   3632 192.168.0.4 TCP_MISS/200 4009 CONNECT bs.serving-sys.com:443 - HIER_DIRECT/82.199.68.72 -
1505665449.828   8361 192.168.0.4 TCP_MISS/200 630 CONNECT ml314.com:443 - HIER_DIRECT/34.252.181.159 -
1505665452.178     78 192.168.0.4 TCP_MISS/204 740 GET http://beacon.krxd.net/event.gif? - HIER_DIRECT/54.228.222.160 image/gif
1505665491.360  69091 192.168.0.4 TCP_MISS/200 746 CONNECT www.facebook.com:443 - HIER_DIRECT/31.13.90.36 -
1505665493.069  65946 192.168.0.4 TCP_MISS/200 5038 CONNECT match.adsrvr.org:443 - HIER_DIRECT/54.247.91.116 -
1505665502.280  65538 192.168.0.4 TCP_MISS/200 5552 CONNECT odr.mookie1.com:443 - HIER_DIRECT/35.157.149.45 -
1505665519.263  91582 192.168.0.4 TCP_MISS/200 3318 CONNECT bam.nr-data.net:443 - HIER_DIRECT/162.247.242.21 -
1505665544.650 120399 192.168.0.4 TCP_MISS/200 1227 CONNECT session.timecommerce.net:443 - HIER_DIRECT/54.192.2.48 -
1505665554.600      7 192.168.0.4 TCP_MISS/503 4491 GET http://pagead2.googlesyndication.com/activeview? - HIER_NONE/- text/html
1505665577.132      8 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665577.146      9 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665577.357    200 192.168.0.4 TCP_MISS/200 27785 GET http://www.nme.com/ - HIER_DIRECT/54.230.11.205 text/html
1505665577.396    249 192.168.0.4 TCP_MISS/200 252 GET http://uid1.vindicosuite.com/e/? - HIER_DIRECT/35.186.160.37 text/plain
1505665577.495      7 192.168.0.4 TCP_DENIED/403 4577 GET http://pagead2.googlesyndication.com/activeview? - HIER_NONE/- text/html
1505665577.567      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665577.615      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665577.748    337 192.168.0.4 TCP_MISS/200 364 GET http://www.summerhamster.com/bcn? - HIER_DIRECT/52.27.8.169 image/gif
1505665579.364     58 192.168.0.4 TCP_MISS/304 446 GET http://dnn506yrbagrg.cloudfront.net/pages/scripts/0025/8842.js - HIER_DIRECT/54.230.11.179 -
1505665579.380      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665579.669     69 192.168.0.4 TCP_MISS/200 576 GET http://mpp.vindicosuite.com/bg/? - HIER_DIRECT/130.211.103.172 application/javascript
1505665579.676      4 192.168.0.4 TCP_DENIED/403 4350 POST http://timeinc.demdex.net/event? - HIER_NONE/- text/html
1505665579.761      3 192.168.0.4 TCP_DENIED/403 4176 GET http://b.scorecardresearch.com/b? - HIER_NONE/- text/html
1505665579.785     79 192.168.0.4 TCP_MISS/304 341 GET http://img.en25.com/i/elqCfg.min.js - HIER_DIRECT/23.207.182.156 application/x-javascript
1505665579.794    197 192.168.0.4 TCP_MISS/200 8677 GET http://uid1.vindicosuite.com/js/tm.js? - HIER_DIRECT/35.186.160.37 application/x-javascript
1505665579.802     64 192.168.0.4 TCP_MISS/304 577 GET http://cdn.optimizely.com/js/103816086.js - HIER_DIRECT/2.21.189.139 text/javascript
1505665579.852      3 192.168.0.4 TCP_DENIED/403 6207 GET http://as.casalemedia.com/cygnus? - HIER_NONE/- text/html
1505665579.928     83 192.168.0.4 TCP_MISS/200 619 GET http://uconnect.tealiumiq.com/ulog/_error? - HIER_DIRECT/54.72.62.235 image/gif
1505665579.939      7 192.168.0.4 TCP_DENIED/403 10853 GET http://timeinc.demdex.net/event? - HIER_NONE/- text/html
1505665579.989      2 192.168.0.4 TCP_DENIED/403 3718 CONNECT googleads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665580.067      3 192.168.0.4 TCP_DENIED/403 6209 GET http://as.casalemedia.com/cygnus? - HIER_NONE/- text/html
1505665580.696      3 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665580.728      3 192.168.0.4 TCP_DENIED/403 4014 POST http://as.casalemedia.com/headerstats? - HIER_NONE/- text/html
1505665580.756      3 192.168.0.4 TCP_DENIED/403 4061 GET http://p.skimresources.com/px.gif? - HIER_NONE/- text/html
1505665580.758      2 192.168.0.4 TCP_DENIED/403 4061 GET http://p.skimresources.com/px.gif? - HIER_NONE/- text/html
1505665580.763      2 192.168.0.4 TCP_DENIED/403 4446 GET http://r.skimresources.com/api/? - HIER_NONE/- text/html
1505665580.829      2 192.168.0.4 TCP_DENIED/403 4014 POST http://as.casalemedia.com/headerstats? - HIER_NONE/- text/html
1505665581.101    222 192.168.0.4 TCP_MISS/200 413 GET http://s1642912926.t.eloqua.com/visitor/v200/svrGP? - HIER_DIRECT/142.0.160.13 image/gif
1505665581.101    225 192.168.0.4 TCP_MISS/200 252 GET http://uid1.vindicosuite.com/e/? - HIER_DIRECT/35.186.160.37 text/plain
1505665581.462     53 192.168.0.4 TCP_MISS/200 726 GET http://0914.global.ssl.fastly.net/ad2/script/x.js? - HIER_DIRECT/151.101.60.249 text/javascript
1505665581.492     44 192.168.0.4 TCP_MISS/200 759 GET http://0914.global.ssl.fastly.net/ad2/img/x.gif? - HIER_DIRECT/151.101.60.249 image/gif
1505665581.496     43 192.168.0.4 TCP_MISS/200 759 GET http://0914.global.ssl.fastly.net/ad2/img/x.gif? - HIER_DIRECT/151.101.60.249 image/gif
1505665581.661      2 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665581.666      2 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665581.732      2 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665582.039    204 192.168.0.4 TCP_MISS/204 657 GET http://gwiq.globalwebindex.net/gwiq/? - HIER_DIRECT/69.16.175.10 text/javascript
1505665582.099     90 192.168.0.4 TCP_MISS/200 8775 GET http://ksassets.timeincuk.net/wp/uploads/sites/55/2016/10/cropped-nme-site-icon-300x300.jpg - HIER_DIRECT/52.85.74.98 image/jpeg
1505665582.208      2 192.168.0.4 TCP_DENIED/403 4197 GET http://b.scorecardresearch.com/b? - HIER_NONE/- text/html
1505665582.230    114 192.168.0.4 TCP_MISS/304 534 GET http://ksassets.timeincuk.net/wp/uploads/sites/55/2016/10/cropped-nme-site-icon-300x300.jpg - HIER_DIRECT/52.85.74.98 -
1505665582.314    262 192.168.0.4 TCP_MISS/200 605 GET http://po.st/v1/status? - HIER_DIRECT/74.217.253.90 application/javascript
1505665582.414     71 192.168.0.4 TCP_MISS/200 410 GET http://p.po.st/p? - HIER_DIRECT/208.146.36.215 image/gif
1505665586.135      2 192.168.0.4 TCP_DENIED/403 3925 POST http://t.skimresources.com/api/link - HIER_NONE/- text/html
1505665592.588  13319 192.168.0.4 TCP_MISS/200 4157 CONNECT c.disquscdn.com:443 - HIER_DIRECT/104.16.77.166 -
1505665592.589  15014 192.168.0.4 TCP_MISS/200 154 CONNECT cdn.shopify.com:443 - HIER_DIRECT/151.101.62.110 -
1505665592.591  15018 192.168.0.4 TCP_MISS/200 154 CONNECT cdn.shopify.com:443 - HIER_DIRECT/151.101.62.110 -
1505665592.592  15460 192.168.0.4 TCP_MISS/200 213 CONNECT csi.gstatic.com:443 - HIER_DIRECT/216.58.201.163 -
1505665592.592  13326 192.168.0.4 TCP_MISS/200 3666 CONNECT disqus.com:443 - HIER_DIRECT/151.101.64.134 -
1505665601.000      4 192.168.0.4 TCP_DENIED/403 6210 GET http://as.casalemedia.com/cygnus? - HIER_NONE/- text/html
1505665601.173      4 192.168.0.4 TCP_DENIED/403 4014 POST http://as.casalemedia.com/headerstats? - HIER_NONE/- text/html
1505665601.173      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665603.359      8 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665603.359      3 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665603.557    150 192.168.0.4 TCP_MISS/200 27785 GET http://www.nme.com/ - HIER_DIRECT/54.230.11.205 text/html
1505665603.560    200 192.168.0.4 TCP_MISS/200 252 GET http://uid1.vindicosuite.com/e/? - HIER_DIRECT/35.186.160.37 text/plain
1505665603.720      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665603.889    173 192.168.0.4 TCP_MISS/200 364 GET http://www.summerhamster.com/bcn? - HIER_DIRECT/52.27.8.169 image/gif
1505665605.489     24 192.168.0.4 TCP_MISS/304 446 GET http://dnn506yrbagrg.cloudfront.net/pages/scripts/0025/8842.js - HIER_DIRECT/54.230.11.179 -
1505665605.595     24 192.168.0.4 TCP_MISS/304 341 GET http://img.en25.com/i/elqCfg.min.js - HIER_DIRECT/23.207.182.156 application/x-javascript
1505665605.601     34 192.168.0.4 TCP_MISS/200 576 GET http://mpp.vindicosuite.com/bg/? - HIER_DIRECT/130.211.103.172 application/javascript
1505665605.615      3 192.168.0.4 TCP_DENIED/403 4350 POST http://timeinc.demdex.net/event? - HIER_NONE/- text/html
1505665605.620      2 192.168.0.4 TCP_DENIED/403 4176 GET http://b.scorecardresearch.com/b? - HIER_NONE/- text/html
1505665605.679      2 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665605.686      4 192.168.0.4 TCP_DENIED/403 6207 GET http://as.casalemedia.com/cygnus? - HIER_NONE/- text/html
1505665605.734      6 192.168.0.4 TCP_DENIED/403 10875 GET http://timeinc.demdex.net/event? - HIER_NONE/- text/html
1505665605.745      2 192.168.0.4 TCP_DENIED/403 3718 CONNECT googleads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665605.748    207 192.168.0.4 TCP_MISS/200 8677 GET http://uid1.vindicosuite.com/js/tm.js? - HIER_DIRECT/35.186.160.37 application/x-javascript
1505665605.776     38 192.168.0.4 TCP_MISS/200 619 GET http://uconnect.tealiumiq.com/ulog/_error? - HIER_DIRECT/54.72.62.235 image/gif
1505665605.882      3 192.168.0.4 TCP_DENIED/403 4062 GET http://p.skimresources.com/px.gif? - HIER_NONE/- text/html
1505665605.887      3 192.168.0.4 TCP_DENIED/403 4062 GET http://p.skimresources.com/px.gif? - HIER_NONE/- text/html
1505665605.892      3 192.168.0.4 TCP_DENIED/403 4446 GET http://r.skimresources.com/api/? - HIER_NONE/- text/html
1505665605.998    111 192.168.0.4 TCP_MISS/200 413 GET http://s1642912926.t.eloqua.com/visitor/v200/svrGP? - HIER_DIRECT/142.0.160.13 image/gif
1505665606.169      2 192.168.0.4 TCP_DENIED/403 3727 CONNECT securepubads.g.doubleclick.net:443 - HIER_NONE/- text/html
1505665606.175      2 192.168.0.4 TCP_DENIED/403 4014 POST http://as.casalemedia.com/headerstats? - HIER_NONE/- text/html
1505665606.669    208 192.168.0.4 TCP_MISS/200 252 GET http://uid1.vindicosuite.com/e/? - HIER_DIRECT/35.186.160.37 text/plain
1505665606.788     22 192.168.0.4 TCP_MISS/200 728 GET http://0914.global.ssl.fastly.net/ad2/script/x.js? - HIER_DIRECT/151.101.60.249 text/javascript
1505665606.813     22 192.168.0.4 TCP_MISS/200 759 GET http://0914.global.ssl.fastly.net/ad2/img/x.gif? - HIER_DIRECT/151.101.60.249 image/gif
1505665606.817     24 192.168.0.4 TCP_MISS/200 759 GET http://0914.global.ssl.fastly.net/ad2/img/x.gif? - HIER_DIRECT/151.101.60.249 image/gif
1505665607.000      3 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
1505665607.003      2 192.168.0.4 TCP_DENIED/403 3709 CONNECT www.google-analytics.com:443 - HIER_NONE/- text/html
^C
root@raspberrypi:/etc/squid3# 
Looks like there are a bunch of them! To check, let's pipe the access log through grep for TCP_DENIED and then through wc (word count).
root@raspberrypi:/etc/squid3# cat /var/log/squid3/access.log | grep TCP_DENIED | wc
     44     440    5192
root@raspberrypi:/etc/squid3#
There we are: the first figure from wc is the line count, so 44 requests have been blocked.
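For a per-host picture of what was blocked, the denied entries can be grouped by destination. This sketch assumes Squid's default native log format, as in the output above, where the fourth field is the result code (e.g. TCP_DENIED/403) and the seventh is the URL, or host:port for CONNECT requests. The function name denied_by_host is illustrative.

```shell
#!/bin/bash
# Sketch: count TCP_DENIED requests per destination host.
# Reads a Squid access.log in native format on stdin:
# field 4 = result code/status, field 7 = URL (or host:port for CONNECT).
denied_by_host() {
    awk '$4 ~ /^TCP_DENIED/ {
            host = $7
            sub("^[A-Za-z]+://", "", host)   # drop the scheme, if any
            sub("[:/].*$", "", host)         # drop port, path and query
            print host
         }' | sort | uniq -c | sort -rn
}
```

Usage: denied_by_host < /var/log/squid3/access.log lists each blocked host with its count, busiest first.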
Eyeball the Websites
NME

NME was a complete success: there are empty spaces on the screen where the blocked ads would have appeared. In this instance two ads have been blocked. 

Whisper

The native Android app Whisper was partially successful. The main ad shows a diagnostic saying that the ad failed to launch, which I'm happy with, but the small ad at the bottom of the screen was not blocked. 

Automating the Fetching of the Blacklist
The list should be refreshed regularly, say once a week, to pick up new entries. A simple way to do this is to put the earlier curl command in a shell script and run it from cron.
/etc/squid3/newads.sh
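#!/bin/bash
# Download to a temporary file first so a failed fetch can't empty the live list
TMP=$(mktemp) || exit 1
curl -fsSL --compressed "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=nohtml&showintro=0&mimetype=plaintext" > "$TMP" || { rm -f "$TMP"; exit 1; }
# Replace the live list only if the download is non-empty, keeping it readable by Squid
[ -s "$TMP" ] && mv "$TMP" /etc/squid3/ad_block.txt && chmod 644 /etc/squid3/ad_block.txt
rm -f "$TMP"
 
## restart squid (full path, because cron runs with a minimal PATH)
/usr/sbin/service squid3 restart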
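The script above restarts Squid every week whether or not the list has changed. A possible refinement, not part of the original setup, is to download to a separate file and only replace the live list (and restart) when the contents differ. The helper name replace_if_changed and the file name /etc/squid3/ad_block.new are hypothetical; the comparison uses cmp -s.

```shell
#!/bin/bash
# Hypothetical helper: move NEW over OLD only when NEW is non-empty and differs.
# Returns 0 if OLD was replaced (so the caller should restart Squid), 1 otherwise.
replace_if_changed() {
    local new="$1" old="$2"
    # cmp -s also reports a difference when OLD does not exist yet
    if [ -s "$new" ] && ! cmp -s "$new" "$old"; then
        mv "$new" "$old"
        return 0
    fi
    rm -f "$new"   # unchanged or empty download: discard it
    return 1
}
```

Usage: replace_if_changed /etc/squid3/ad_block.new /etc/squid3/ad_block.txt && /usr/sbin/service squid3 restart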
Now make the script executable and add an entry to the root crontab. I have elected to run the script once a week, at 04:05 early on Sunday morning (day-of-week 0).
chmod +x /etc/squid3/newads.sh
crontab -e
# Squid3 fetch ad blocker blacklist
5 4 * * 0 /etc/squid3/newads.sh > /dev/null 2>&1
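Cron's day-of-week field runs from 0 (Sunday) to 6 (Saturday), with 7 also accepted for Sunday, so 1 means Monday. GNU date numbers weekdays the same way with %w, which gives a quick way to confirm what a field value means. The function name cron_dow is illustrative; the date used below is simply the date of this post.

```shell
#!/bin/bash
# Print the cron-style day-of-week number (0 = Sunday) and the day name
# for a given date, using GNU date. The C locale keeps the day name in English.
cron_dow() {
    LC_ALL=C date -d "$1" '+%w %A'
}
```

For example, cron_dow 2017-09-17 prints "0 Sunday".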