Version 0.1, 3 January 2005, http://www.neilvandyke.org/urlskip/
by Neil W. Van Dyke <neil@neilvandyke.org>
Copyright © 2005 Neil W. Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License [GPL] for details. For other license options and commercial consulting, contact the author.
The UrlSkip Scheme library provides a function that translates some of the Web URLs that might be used to track a user across sites, by removing intermediate HTTP redirectors or information that might identify the user. Such a function might be used as part of a privacy-enhancing Web browser, or to canonicalize or un-obfuscate URLs for Web analysis projects.
Note that UrlSkip is not intended to remove information used by “affiliate” referral programs to identify site operators that have sent users to a site. However, in some cases this affiliate ID information might be lost in the process of removing an intermediary URL that is used by a third party to track and profile users.
UrlSkip currently requires R5RS, the [uri.scm] library, and a particular
regular expression function. Therefore, UrlSkip currently works only with
PLT MzScheme, although it will be made more portable once uri.scm
is.
UrlSkip is released under the GPL license, unlike most of the author's other released Scheme libraries, which are LGPL.
The procedures in this section are used internally by the urlskip
procedure, and each corresponds to a particular HTTP server hostname. They
are exposed here mainly for purposes of documentation, and are likely to
change in future versions of UrlSkip. Each procedure accepts a uriobj and
yields either a string containing a simpler URL, or #f if no simpler
URL was determined.
UrlSkips http://ad.doubleclick.net:
Substring following ;;~sscs=%3f.
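The substring rule above can be illustrated with a minimal Python sketch. This is not UrlSkip's Scheme code: the function name skip_after_marker and the sample URL are hypothetical, and exact, case-sensitive matching of the marker is an assumption.

```python
from typing import Optional

# Marker text as given in the description above.
MARKER = ";;~sscs=%3f"


def skip_after_marker(url: str, marker: str = MARKER) -> Optional[str]:
    """Return the text after the first occurrence of marker, or None.

    None stands in for Scheme's #f ("no simpler URL was determined").
    Assumption: the marker is matched exactly and case-sensitively.
    """
    _, found, rest = url.partition(marker)
    return rest if found and rest else None


# Hypothetical tracking URL, for illustration only.
example = "http://ad.doubleclick.net/click;h=abc;;~sscs=%3fhttp://www.example.com/page"
print(skip_after_marker(example))
```

The sketch does not check the hostname. In UrlSkip that selection appears to happen before the per-host procedure is called, since each procedure corresponds to one hostname.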
UrlSkips http://click.linksynergy.com:
If path /fs-bin/stat, then query value RD_PARM1 or rd_parm1.
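As a rough illustration of this "path, then query value" pattern, here is a Python sketch using the standard urllib.parse module. The function name skip_linksynergy is hypothetical. Percent-decoding the extracted value and trying RD_PARM1 before rd_parm1 are assumptions, as the description does not say how UrlSkip handles either.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def skip_linksynergy(url: str) -> Optional[str]:
    """Return the redirect target carried in RD_PARM1 or rd_parm1, or None."""
    parts = urlsplit(url)
    if parts.hostname != "click.linksynergy.com" or parts.path != "/fs-bin/stat":
        return None
    # parse_qs percent-decodes values and drops empty ones by default.
    query = parse_qs(parts.query)
    for name in ("RD_PARM1", "rd_parm1"):  # assumed order of preference
        values = query.get(name)
        if values:
            return values[0]
    return None


# Hypothetical URL, for illustration only.
print(skip_linksynergy(
    "http://click.linksynergy.com/fs-bin/stat"
    "?id=abc&RD_PARM1=http%3A%2F%2Fwww.example.com%2Fitem"
))
```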
UrlSkips http://rds.yahoo.com:
Substring of the http URL following *-.
UrlSkips http://service.netmeans.com:
If path /bfast/click, then query value loc.
UrlSkips http://web.ask.com:
If path /redir, then query value bu.
UrlSkips http://www.amazon.com:
If path /exec/obidos/redirect, then remove all query values except for tag and path.
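Unlike the other rules, this one keeps the redirector URL and only trims its query. A minimal Python sketch of that filtering follows. The function name skip_amazon_redirect and the sample URL are hypothetical, and returning None when nothing was removed is an assumption based on the "#f if no simpler URL was determined" convention.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Query names to keep, as listed in the description above.
KEEP = ("tag", "path")


def skip_amazon_redirect(url: str) -> Optional[str]:
    """Drop every query value except tag and path; None if nothing changes."""
    parts = urlsplit(url)
    if parts.hostname != "www.amazon.com" or parts.path != "/exec/obidos/redirect":
        return None
    # Work on the raw "name=value" pairs so kept values are not re-encoded.
    pairs = [p for p in parts.query.split("&") if p]
    kept = [p for p in pairs if p.split("=", 1)[0] in KEEP]
    if len(kept) == len(pairs):
        return None  # assumption: an unchanged URL counts as "no simpler URL"
    return urlunsplit(parts._replace(query="&".join(kept)))


# Hypothetical URL, for illustration only.
print(skip_amazon_redirect(
    "http://www.amazon.com/exec/obidos/redirect"
    "?tag=example-20&creative=123&camp=456&path=tg/browse/-/1000"
))
```

Keeping tag matches the stated intent of not removing affiliate referral information.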
UrlSkips http://www.anrdoezrs.net:
Query value url.
UrlSkips http://www.commission-junction.com:
If path /track/track.dll, then query value URL.
UrlSkips http://www.google.com:
If path /pagead/iclk, then query value adurl.
If path /url, then query value q.
UrlSkips http://www.qksrv.net:
Query value loc or url.
The only real library interface is the urlskip procedure.
Accepts a URL uri and yields a URL that is either uri itself or a version of it simplified by UrlSkip. uri may be a string or a uriobj. If a simplified URL is yielded, it is always a string.
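The overall behavior described here (pick a per-host rule, fall back to the original URL) might be sketched in Python as follows. This is an illustration under stated assumptions, not the library's Scheme implementation. The names urlskip_sketch, query_value, and HANDLERS are hypothetical, only two of the hosts are included, values are percent-decoded, and a single pass is made with no attempt to unwrap nested redirectors.

```python
from typing import Callable, Dict, Optional, Sequence
from urllib.parse import SplitResult, parse_qs, urlsplit


def query_value(parts: SplitResult, names: Sequence[str],
                path: Optional[str] = None) -> Optional[str]:
    """Return the first present query value among names, or None.

    If path is given, the URL's path must match it exactly.
    """
    if path is not None and parts.path != path:
        return None
    query = parse_qs(parts.query)  # percent-decodes values (an assumption)
    for name in names:
        if query.get(name):
            return query[name][0]
    return None


# Hostname -> rule table; only two hosts shown, for brevity.
HANDLERS: Dict[str, Callable[[SplitResult], Optional[str]]] = {
    "web.ask.com": lambda p: query_value(p, ["bu"], path="/redir"),
    "www.anrdoezrs.net": lambda p: query_value(p, ["url"]),
}


def urlskip_sketch(url: str) -> str:
    """Return a simplified URL if a rule applies, else the URL unchanged."""
    parts = urlsplit(url)
    handler = HANDLERS.get(parts.hostname or "")
    simpler = handler(parts) if handler else None
    return simpler if simpler is not None else url


# Hypothetical URLs, for illustration only.
print(urlskip_sketch("http://web.ask.com/redir?bu=http%3A%2F%2Fwww.example.com%2F&x=1"))
print(urlskip_sketch("http://www.example.com/page?x=1"))
```

A table keyed by hostname mirrors the one-procedure-per-hostname layout of the per-host rules, and makes the fallback to the original URL explicit.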
The UrlSkip test suite can be enabled by editing the source code file and loading [Testeez]; the test suite is disabled by default.