Version 0.1, 18 August 2004, http://www.neilvandyke.org/uri-scm/
by
Neil W. Van Dyke
<neil@neilvandyke.org>
Copyright © 2004 Neil W. Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Lesser General Public License [LGPL] for details.
Note: This version of the library has endured some testing, and is being used in at least one production application, but be advised that some design refinements and API changes are expected. In particular, we are reconsidering the use of immutable strings and pairs, would like to add more of the extensibility of [UriFrame], and need to look at the forthcoming IETF-W3C standards.
uri.scm is a Scheme code library for parsing, representing, and
transforming Web Uniform Resource Identifiers (URI) [RFC2396], which
include Uniform Resource Locators (URL) and Uniform Resource Names (URN).
It supports absolute and relative URIs and URI references.
The library provides two separate interfaces, based on the two supported representations: a convenient verbatim string representation, and a parsed representation. From an interface standpoint, URIs are immutable objects, and all operations are functions yielding the same or new immutable URI objects. Functionality specific to individual URI schemes is generally outside the scope of this library, and is better supported via separate companion libraries.
This library has been designed after experience with [UriFrame], which was
specific to PLT Scheme and dependent on a heavyweight object system. This
library appears in some ways similar to the uri module of [SLIB],
but is intended to provide additional functionality.
The current version of this library is specific to PLT Scheme, but it has
been written with the intention of being portable shortly to most R5RS
Scheme implementations that support popular SRFIs. It officially requires
[SRFI-6], [SRFI-8], [SRFI-9], [SRFI-13] (for string-downcase),
[SRFI-16], [SRFI-23], and [SRFI-39]. It also requires some regular expression
operations that can be provided by the [Pregexp] library if the
implementation does not provide appropriate native operations. If the
implementation provides immutable strings and pairs, the library will take
advantage of them; on implementations that do not provide these, copying
and hopeful wishes will be used.
Several procedures to support escaping and unescaping of URI component
strings, as described in [RFC2396 sec. 2.4], are provided. Also provided
are escaping and unescaping procedures that also support + as an
encoding of a space character, as is used in some HTTP encodings of HTML
forms.
These procedures have multiple variants, concerning mutability of the strings they yield, and following the naming convention:
foo
foo-i
foo/shareok
Many applications will not call these procedures directly, since most of this library's interface automatically escapes and unescapes strings as appropriate.
| uri-escape str [start [end]] => string | Procedure |
| uri-escape-i str [start [end]] => string | Procedure |
| uri-escape/shareok str [start [end]] => string | Procedure |
|
Yields a URI-escaped encoding of string str. If start and end are given, then they designate the substring of str to use. All characters are escaped, except alphanumerics, minus, underscore, period, and tilde. For example:
(uri-escape "a = b/c + d") => "a%20%3D%20b%2Fc%20%2B%20d"
|
| uri-plusescape str [start [end]] => string | Procedure |
| uri-plusescape-i str [start [end]] => string | Procedure |
| uri-plusescape/shareok str [start [end]] => string | Procedure |
|
Like uri-escape, except that a space character is encoded as +. For example:
(uri-plusescape "a = b/c + d") => "a+%3D+b%2Fc+%2B+d"
|
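The escaping rule described above can be sketched outside the library. The following Python illustration is not part of uri.scm: the name `escape`, the `plus_for_space` flag, and the choice of UTF-8 for non-ASCII characters are all assumptions made for the example.

```python
# Characters left unescaped, per the rule stated above:
# alphanumerics, minus, underscore, period, and tilde.
UNRESERVED = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.~"
)


def escape(text, plus_for_space=False):
    """Percent-escape every character outside UNRESERVED.

    With plus_for_space=True, a space becomes "+" instead of "%20".
    Non-ASCII characters are escaped as UTF-8 bytes (an assumption).
    """
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        elif ch == " " and plus_for_space:
            out.append("+")
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(escape("a = b/c + d"))                       # a%20%3D%20b%2Fc%20%2B%20d
print(escape("a = b/c + d", plus_for_space=True))  # a+%3D+b%2Fc+%2B+d
```

The two printed strings match the uri-escape and uri-plusescape examples above.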
| uri-unescape str [start [end]] => string | Procedure |
| uri-unescape-i str [start [end]] => string | Procedure |
| uri-unescape/shareok str [start [end]] => string | Procedure |
|
Yields a URI-unescaped string from the encoding in string str. For example:
(uri-unescape "a%20b+c%20d") => "a b+c d"
|
| uri-unplusescape str [start [end]] => string | Procedure |
| uri-unplusescape-i str [start [end]] => string | Procedure |
| uri-unplusescape/shareok str [start [end]] => string | Procedure |
|
Like uri-unescape, except that + is decoded as a space character. For example:
(uri-unplusescape "a%20b+c%20d") => "a b c d"
|
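As a rough illustration of the two decoding modes, here is a Python sketch that is not taken from uri.scm. The name `unescape`, the `plus_is_space` flag, the UTF-8 decoding, and the decision to pass malformed `%` sequences through unchanged are assumptions of the example.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(text, plus_is_space=False):
    """Decode %XX escapes; optionally treat "+" as an encoded space."""
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i + 1:i + 3]
        if ch == "%" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            # A well-formed escape: emit the byte it encodes.
            raw.append(int(pair, 16))
            i += 3
            continue
        if ch == "+" and plus_is_space:
            ch = " "
        raw.extend(ch.encode("utf-8"))
        i += 1
    return raw.decode("utf-8", errors="replace")


print(unescape("a%20b+c%20d"))                      # a b+c d
print(unescape("a%20b+c%20d", plus_is_space=True))  # a b c d
```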
| char->uri-escaped-string chr => string | Procedure |
| char->uri-escaped-string-i chr => string | Procedure |
|
Yields a URI-escaped encoding of character chr. For example:
(char->uri-escaped-string #\/) => "%2F"
|
This section describes the "URI string" API, while the next section
describes the "URI object," (uriobj) API. All procedures in this
section yield URIs using immutable strings, and accept URIs as strings
(immutable or mutable) or as the opaque objects described in the next
section.
| display-uri uri port => undef | Procedure |
| display-uri/nofragment uri port => undef | Procedure |
|
Displays uri to output port port. For example:
(display-uri "http://s/foo#bar" (current-output-port))
-| http://s/foo#bar
(display-uri/nofragment "http://s/foo#bar" (current-output-port))
-| http://s/foo
|
| uri->string uri => string | Procedure |
|
Yields the full string representation of URI uri. Of course this is
not needed when using only the string representation of URI, but using this
procedure in libraries permits either representation to be accepted. For example:
(define my-uriobj (string->uriobj "http://www/"))
my-uriobj => #<uriobj>
(uri->string my-uriobj) => "http://www/"
(uri->string "http://www/") => "http://www/"
|
URI schemes are currently represented as lowercase Scheme symbols and associated data.
| ftp-uri-scheme => urischeme | Variable |
| gopher-uri-scheme => urischeme | Variable |
| http-uri-scheme => urischeme | Variable |
| https-uri-scheme => urischeme | Variable |
| imap-uri-scheme => urischeme | Variable |
| ipp-uri-scheme => urischeme | Variable |
| news-uri-scheme => urischeme | Variable |
| nfs-uri-scheme => urischeme | Variable |
| telnet-uri-scheme => urischeme | Variable |
|
Some common URI scheme symbols, as a convenience for Scheme code that must be portable to Scheme implementations with case-insensitive readers. For example, in some Scheme implementations:
'ftp => FTP
ftp-uri-scheme => ftp
|
| uri-scheme uri => urischeme | Procedure |
|
Yields the URI scheme of uri, if it has one. For example:
(uri-scheme "Http://www") => http
|
| register-uri-scheme-default-portnum sym portnum => undef | Procedure |
|
Registers integer portnum as the default port number for the server authority component of URI scheme sym. For example:
(define x-foo-uri-scheme (string->symbol "x-foo"))
(register-uri-scheme-default-portnum x-foo-uri-scheme 007)
(register-uri-scheme-default-portnum x-foo-uri-scheme 666)
error--> cannot change uri scheme default portnum: x-foo 7 666
|
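The register-once behavior shown in that example can be sketched as a small registry. This Python illustration is not the library's code: the class and method names are hypothetical, and allowing a repeat registration with the same value is an assumption the source does not confirm.

```python
class SchemeRegistry:
    """Maps URI scheme names to default port numbers (hypothetical helper)."""

    def __init__(self):
        self._default_ports = {}

    def register_default_portnum(self, scheme, portnum):
        existing = self._default_ports.get(scheme)
        if existing is not None and existing != portnum:
            # Mirrors the error text in the example above.
            raise ValueError(
                "cannot change uri scheme default portnum: "
                "{} {} {}".format(scheme, existing, portnum)
            )
        self._default_ports[scheme] = portnum

    def default_portnum(self, scheme):
        """Return the registered default port, or None if unregistered."""
        return self._default_ports.get(scheme)


registry = SchemeRegistry()
registry.register_default_portnum("x-foo", 7)
try:
    registry.register_default_portnum("x-foo", 666)
except ValueError as err:
    print(err)  # cannot change uri scheme default portnum: x-foo 7 666
```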
| register-uri-scheme-hierarchical sym => undef | Procedure |
|
Registers URI scheme sym as having a "hierarchical" form as described in [RFC2396 sec. 3].
|
| uri-fragment uri => string-or-f | Procedure |
| uri-fragment/escaped uri => string-or-f | Procedure |
|
Yields the fragment identifier component of URI (or URI reference)
uri as a string, or #f if uri has no fragment. For example:
(uri-fragment "foo#a%20b") => "a b"
(uri-fragment/escaped "foo#a%20b") => "a%20b"
|
| uri-without-fragment uri => string | Procedure |
|
Yields uri without the fragment component. For example:
(uri-without-fragment "http://w/#bar") => "http://w/"
|
| uri-with-fragment uri fragment => string | Procedure |
| uri-with-fragment/escaped uri fragment => string | Procedure |
|
Yields a URI that is like uri except with the fragment fragment
(or no fragment if fragment is #f). For example:
(uri-with-fragment "http://w/" "foo") => "http://w/#foo"
(uri-with-fragment "http://w/#foo" "bar") => "http://w/#bar"
(uri-with-fragment "http://w/#bar" #f) => "http://w/"
The /escaped variant takes a fragment that is already URI-escaped. For example:
(uri-with-fragment "foo" "a b") => "foo#a%20b"
(uri-with-fragment/escaped "foo" "a%20b") => "foo#a%20b"
|
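The fragment operations in this group amount to splitting at the first "#" and escaping or unescaping the part after it. A minimal Python sketch follows. It is not the library's implementation, and the function names and the `escaped` keyword are hypothetical stand-ins for the /escaped procedure variants.

```python
from urllib.parse import quote, unquote


def split_fragment(uri):
    """Return (uri-without-fragment, escaped-fragment-or-None)."""
    base, sep, frag = uri.partition("#")
    return base, (frag if sep else None)


def fragment(uri, escaped=False):
    """Fragment of uri, unescaped unless escaped=True; None if absent."""
    _, frag = split_fragment(uri)
    if frag is None or escaped:
        return frag
    return unquote(frag)


def with_fragment(uri, frag, escaped=False):
    """Replace (or, with frag=None, remove) the fragment of uri."""
    base, _ = split_fragment(uri)
    if frag is None:
        return base
    return base + "#" + (frag if escaped else quote(frag, safe=""))


print(fragment("foo#a%20b"))                # a b
print(fragment("foo#a%20b", escaped=True))  # a%20b
print(with_fragment("http://w/#foo", "bar"))  # http://w/#bar
print(with_fragment("foo", "a b"))            # foo#a%20b
```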
This and some of the following subsections concern "hierarchical" generic URI syntax as described in [RFC2396 sec. 3].
| uri-hierarchical? uri => boolean | Procedure |
|
Yields a Boolean value for whether or not the URI scheme of URI uri is known to have a "hierarchical" generic URI layout. For example:
(uri-hierarchical? "http://www/") => #t
(uri-hierarchical? "mailto://www/") => #f
(uri-hierarchical? "//www/") => #f
|
Several procedures extract the server authority values from URIs [RFC2396 sec. 3.2.2].
| uri-server-userinfo+host+portnum uri => (string-or-f, string-or-f, integer-or-f) | Procedure |
|
Yields three values for the server authority of URI uri: the userinfo
as a string (or #f), the host as a string (or #f), and the port number as an integer (or #f). For example:
(uri-server-userinfo+host+portnum "ftp://anon@ftp.foo.bar/")
=> "anon" "ftp.foo.bar" 21
|
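To make the three-value result concrete, here is a Python sketch that extracts the same parts and falls back to a scheme's default port when none is given, as the ftp example above suggests. The function name and the `DEFAULT_PORTS` table are assumptions of the sketch, and bracketed IPv6 hosts are not handled.

```python
from urllib.parse import urlsplit

# Illustrative defaults only; not the library's registered table.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443}


def server_userinfo_host_portnum(uri):
    """Return (userinfo, host, portnum), each None when unavailable."""
    parts = urlsplit(uri)
    if not parts.netloc:
        return (None, None, None)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, _, port = hostport.partition(":")
    if port.isdigit():
        portnum = int(port)
    else:
        # No explicit port: use the scheme's default, if one is known.
        portnum = DEFAULT_PORTS.get(parts.scheme)
    return (userinfo if at else None, host or None, portnum)


print(server_userinfo_host_portnum("ftp://anon@ftp.foo.bar/"))
# ('anon', 'ftp.foo.bar', 21)
```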
| uri-server-userinfo uri => string-or-f | Procedure |
| uri-server-host uri => string-or-f | Procedure |
| uri-server-portnum uri => integer-or-f | Procedure |
|
Yield the respective part of the server authority of uri. See the
discussion of uri-server-userinfo+host+portnum.
|
A parsed hierarchical path [RFC2396 sec. 3] is represented in
uri.scm as a tuple of a list of path segments and an upcount.
The list of path segments does not contain any "." or
".." relative components, as those are removed during parsing.
The upcount is either #f, meaning an absolute path, or an integer 0
or greater, meaning a relative path of that many levels "up." A path
segment without any parameters is represented as either a string or, if
empty, #f. For example:
(uri-path-upcount+segments "/a/b/") => #f ("a" "b" #f)
(uri-path-upcount+segments "/a/b/c") => #f ("a" "b" "c")
(uri-path-upcount+segments "/a/../../../b/c") => 2 ("b" "c")
and:
(uri-path-upcount+segments "/.") => #f ()
(uri-path-upcount+segments "/") => #f (#f)
(uri-path-upcount+segments ".") => 0 ()
(uri-path-upcount+segments "") => 0 (#f)
(uri-path-upcount+segments "./") => 0 (#f)
(uri-path-upcount+segments "..") => 1 ()
(uri-path-upcount+segments "/..") => 1 ()
(uri-path-upcount+segments "../") => 1 (#f)
A path segment with parameters is represented as a list, with the first
element a string or #f for the path name, and the remaining elements
strings for the parameters. For example:
(uri-path-segments "../../a/b;p1/c/d;p2;p3/;p4")
=> ("a" ("b" "p1") "c" ("d" "p2" "p3") (#f "p4"))
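One way to arrive at the upcount and segment lists in the examples above is sketched below in Python. This is a reconstruction from the documented examples, not the library's parser: the function names are hypothetical, `None` stands in for #f, and Python lists stand in for Scheme lists.

```python
def parse_segment(text):
    """Return a name, or [name, param, ...] when ";" parameters are present."""
    name, *params = text.split(";")
    name = name or None  # an empty name is represented as None (#f)
    return [name, *params] if params else name


def path_upcount_and_segments(path):
    """Return (upcount, segments); upcount is None for an absolute path."""
    absolute = path.startswith("/")
    upcount = None if absolute else 0
    segments = []
    for part in (path[1:] if absolute else path).split("/"):
        if part == ".":
            continue  # "." components are dropped during parsing
        if part == "..":
            if segments:
                segments.pop()  # cancel the preceding segment
            else:
                upcount = (upcount or 0) + 1  # climb one level "up"
            continue
        segments.append(parse_segment(part))
    return upcount, segments


print(path_upcount_and_segments("/a/b/"))             # (None, ['a', 'b', None])
print(path_upcount_and_segments("../a/../../b/./c"))  # (2, ['b', 'c'])
print(path_upcount_and_segments("../../a/b;p1/c/d;p2;p3/;p4"))
```

The last call yields an upcount of 2 and the same nested segment structure as the parameter example above.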
In the current version of uri.scm, parsed paths are actually
represented in reverse, which simplifies path resolution and permits list
tails to be shared among potentially large numbers of long paths. For
example (uripath is a concept of the "object URI" API):
(let ((base (string->uripath "/a/b/c/index.html")))
  (map (lambda (n)
         (resolved-uripath (string->uripath n) base))
       '("x.html" "y/y.html" "../z/z.html")))
=>
(("x.html" . #0=("c" . #1=("b" "a")))
("y.html" "y" . #0#)
("z.html" "z" . #1#))
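The tail sharing in that result can be imitated with linked cells. The Python sketch below is only an analogy for the reversed-list representation, not the library's code: `cons`, `from_list`, `to_list`, and `resolve` are hypothetical helpers, and the base path is assumed to be non-empty.

```python
def cons(head, tail):
    """A linked-list cell, standing in for a Scheme pair."""
    return (head, tail)


def from_list(items):
    """Build a linked list whose first cell holds items[0]."""
    cell = None
    for item in reversed(items):
        cell = cons(item, cell)
    return cell


def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out


def resolve(relative_segments, upcount, base_reversed):
    """Resolve a relative path against a reversed base path."""
    tail = base_reversed[1]  # drop the base's final segment
    for _ in range(upcount):
        if tail is None:
            break
        tail = tail[1]  # go one directory "up"
    for segment in relative_segments:
        tail = cons(segment, tail)  # new cells in front; the tail is shared
    return tail


base = from_list(["index.html", "c", "b", "a"])  # "/a/b/c/index.html" reversed
x = resolve(["x.html"], 0, base)
y = resolve(["y", "y.html"], 0, base)
z = resolve(["z", "z.html"], 1, base)
print(to_list(x), to_list(y), to_list(z))
print(x[1] is y[1][1])  # True: both reuse the same ("c" "b" "a") cells
```

Because resolution only adds cells at the front of a reversed list, no part of the base path is copied.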
| uri-path-upcount+segments uri => (integer-or-f, list-of-urisegment) | Procedure |
| uri-path-upcount+segments/reverse uri => (integer-or-f, list-of-urisegment) | Procedure |
|
Yields the path upcount and the segments of uri as two values. The
segments list should be considered immutable, as it might be shared
elsewhere. For example:
(uri-path-upcount+segments/reverse "../a/../../b/./c")
=> 2 ("c" "b")
(uri-path-upcount+segments "../a/../../b/./c")
=> 2 ("b" "c")
|
| uri-path-upcount uri => integer-or-f | Procedure |
| uri-path-segments uri => list-of-urisegment | Procedure |
| uri-path-segments/reverse uri => list-of-urisegment | Procedure |
|
See the documentation for uri-path-upcount+segments. For example:
(uri-path-upcount "../a/../../b/./c") => 2
(uri-path-segments "../a/../../b/./c") => ("b" "c")
(uri-path-segments/reverse "../a/../../b/./c") => ("c" "b")
|
| urisegment-name urisegment => string-or-f | Procedure |
| urisegment-params urisegment => list | Procedure |
| urisegment-name+params urisegment => (string-or-f, list) | Procedure |
| urisegment-has-params? urisegment => boolean | Procedure |
|
Yield the components of a parsed URI segment. The values should be considered immutable. For example:
(urisegment-name+params "foo") => "foo" ()
(urisegment-name+params #f) => #f ()
(urisegment-name+params '("foo" "p1" "p2")) => "foo" ("p1" "p2")
(urisegment-name+params '(#f "p1" "p2")) => #f ("p1" "p2")
|
This library provides support for parsing the URI query component [RFC2396
sec. 3.4], as attribute-value lists in the manner of http URI scheme
queries. Parsed queries are represented as association lists, in which the
car of each pair is the attribute name as a string, and the cdr
is either the attribute value as a string or #t if no value given.
All strings are URI-unescaped. For example:
(uri-query "?q=fiendish+scheme&case&x=&y=1%2B2")
=>
(("q" . "fiendish scheme") ("case" . #t) ("x" . "") ("y" . "1+2"))
| uri-query uri => uriquery | Procedure |
|
Yields the parsed attribute-value query of uri, if it has one. For example:
(uri-query "?x=42&y=1%2B2") => (("x" . "42") ("y" . "1+2"))
|
| uri-query-value uri attr => string-or-t-or-f | Procedure |
|
Yields the value of attribute attr in uri's query, or #f if attr is not present. For example:
(uri-query-value "?x=42&y=1%2B2" "y") => "1+2"
|
| uriquery-value uriquery attr => string-or-t-or-f | Procedure |
|
Yields the value of attribute attr in uriquery, or #f if attr is not present.
|
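The association-list form of a parsed query can be approximated in Python as a list of pairs. The sketch below is illustrative rather than the library's parser: `parse_query` and `query_value` are hypothetical names, `True` and `None` stand in for #t and #f, and skipping empty fields between "&" separators is an assumption.

```python
from urllib.parse import unquote_plus


def parse_query(query):
    """Parse "?a=1&b" into [("a", "1"), ("b", True)], unescaping strings."""
    if query.startswith("?"):
        query = query[1:]
    pairs = []
    for field in query.split("&"):
        if not field:
            continue  # assumption: ignore empty fields such as "a&&b"
        name, eq, value = field.partition("=")
        # An attribute without "=" has no value, which is recorded as True.
        pairs.append((unquote_plus(name), unquote_plus(value) if eq else True))
    return pairs


def query_value(pairs, attr):
    """Return the first value recorded for attr, or None if it is absent."""
    for name, value in pairs:
        if name == attr:
            return value
    return None


pairs = parse_query("?q=fiendish+scheme&case&x=&y=1%2B2")
print(pairs)
# [('q', 'fiendish scheme'), ('case', True), ('x', ''), ('y', '1+2')]
print(query_value(pairs, "y"))  # 1+2
```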
This subsection concerns resolving relative URIs.
| absolute-uri? uri => boolean | Procedure |
|
Yields a Boolean value for whether or not URI uri is known by the library's criteria to be absolute.
|
| resolved-uri uri base-uri => string | Procedure |
|
Yields a URI string that is URI uri possibly resolved with respect to URI base-uri, but not necessarily absolute. As an extension to [RFC2396] rules for resolution, base-uri may be a relative URI. For example:
(resolved-uri "x.html" "http://w/a/b/c.html")
=> "http://w/a/b/x.html"
(resolved-uri "//www:80/" "http:")
=> "http://www/"
|
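For comparison, standard relative resolution against an absolute base can be tried with Python's `urllib.parse.urljoin`, which takes the base first and the reference second (the reverse of resolved-uri's argument order). This is a separate implementation shown only for illustration. It does not provide the relative-base extension described above, and its treatment of default port numbers may differ from this library's.

```python
from urllib.parse import urljoin

base = "http://w/a/b/c.html"

# A sibling document: the last segment of the base path is replaced.
print(urljoin(base, "x.html"))       # http://w/a/b/x.html

# A ".." component removes one directory level before appending.
print(urljoin(base, "../z/z.html"))  # http://w/a/z/z.html
```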
| absolute-uri uri => string | Procedure |
|
Yields a URI that may be a variation on uri that has been forced to absolute (by, e.g., dropping relative path components, or supplying a missing path). The result might not be an absolute URI, however, due to limitations of the library or insufficient information in the URI. For example:
(absolute-uri "http://w/../a") => "http://w/a"
(absolute-uri "http://w") => "http://w/"
|
| normalized-uri uri => string | Procedure |
|
Yields a possibly "normalized" variation on URI uri, such as by consistent use of escaping. The exact behavior of this procedure will change in future versions of the library.
|
Note: The Object URI API is only sparsely documented, although many of its procedures have analogues in the String URI API, which is documented in the preceding section.
| uriobj? v | Procedure |
| string->uriobj str => uriobj | Procedure |
| string/base->uriobj str base-uri => uriobj | Procedure |
| string/base-uriobj->uriobj str base-uriobj => uriobj | Procedure |
| substring->uriobj str start end => uriobj | Procedure |
| substring/base->uriobj str start end base-uri => uriobj | Procedure |
| substring/base-uriobj->uriobj str start end base-uriobj => uriobj | Procedure |
| uri->uriobj uri => uriobj | Procedure |
| display-uriobj uriobj port => undef | Procedure |
| display-uriobj/nofragment uriobj port => undef | Procedure |
| uriobj->string uriobj => string | Procedure |
| uriobj->string/nofragment uriobj => string | Procedure |
| uriobj-scheme uriobj => urischeme | Procedure |
| uriobj-with-scheme uriobj urischeme => uriobj | Procedure |
| string->urischeme str => urischeme | Procedure |
| symbol->urischeme sym => urischeme | Procedure |
| urischeme->string urischeme => string | Procedure |
| urischeme-hierarchical? urischeme | Procedure |
| urischeme-default-portnum urischeme => integer-or-f | Procedure |
| uriobj-fragment uriobj => string-or-f | Procedure |
| uriobj-fragment/escaped uriobj => string-or-f | Procedure |
| uriobj-with-fragment uriobj fragment => uriobj | Procedure |
| uriobj-with-fragment/escaped uriobj fragment => uriobj | Procedure |
| uriobj-hierarchical? uriobj => boolean | Procedure |
| uriobj-uriserver uriobj => uriserver | Procedure |
| uriobj-uriserver+path+query uri => (uriserver, uripath, uriquery) | Procedure |
| uri-uriserver uri => uriserver | Procedure |
| uri-uriserver+uripath+uriquery uri => (uriserver, uripath, uriquery) | Procedure |
| uriobj-userinfo+host+portnum uriobj => (string-or-f, string-or-f, integer-or-f) | Procedure |
| uriobj-portnum uriobj => integer-or-f | Procedure |
| make-uriserver userinfo host portnum => uriserver | Procedure |
| make-uriserver/default-portnum userinfo host portnum default-portnum => uriserver | Procedure |
| make-or-reuse-uriserver userinfo host portnum base-uriserver => uriserver | Procedure |
| make-or-reuse-uriserver/default-portnum userinfo host portnum base-uriserver default-portnum => uriserver | Procedure |
| string->uriserver str => uriserver | Procedure |
| string/base->uriserver str base-uriserver => uriserver | Procedure |
| string/default-portnum->uriserver str default-portnum => uriserver | Procedure |
| string/base/default-portnum->uriserver str base-uriserver default-portnum => uriserver | Procedure |
| substring->uriserver str start end => uriserver | Procedure |
| substring/base->uriserver str start end base-uriserver => uriserver | Procedure |
| substring/default-portnum->uriserver str start end default-portnum => uriserver | Procedure |
| substring/base/default-portnum->uriserver str start end base-uriserver default-portnum => uriserver | Procedure |
| uriserver-userinfo uriserver => string-or-f | Procedure |
| uriserver-host uriserver => string-or-f | Procedure |
| uriserver-portnum uriserver => integer-or-f | Procedure |
| uriserver-userinfo+host+portnum uriserver => (string-or-f, string-or-f, integer-or-f) | Procedure |
| write-uriserver uriserver port | Procedure |
| uriserver-with-default-portnum uriserver default-portnum => uriserver | Procedure |
| resolved-uriserver uriserver base-uriserver => uriserver | Procedure |
| resolved-uriserver/default-portnum uriserver base-uriserver default-portnum => uriserver | Procedure |
| uri-path uri => uripath-or-f | Procedure |
| uri-path/noparams uri => uripath-or-f | Procedure |
| uriobj-uripath uriobj => uripath-or-f | Procedure |
| uriobj-uripath/noparams uriobj => uripath-or-f | Procedure |
| make-uripath upcount segments => uripath | Procedure |
| make-uripath/reverse upcount segments => uripath | Procedure |
| make-uripath/reverse/shareok upcount segments => uripath | Procedure |
| uripath-with-upcount uripath upcount => uripath | Procedure |
| string->uripath str => uripath | Procedure |
| string/base->uripath str base-uripath => uripath | Procedure |
| substring->uripath str start end => uripath | Procedure |
| substring/base->uripath str start end base-uripath => uripath | Procedure |
| uripath-upcount uripath => integer-or-f | Procedure |
| uripath-segments uripath => list | Procedure |
| uripath-segments/reverse uripath => list | Procedure |
| uripath-upcount+segments uripath => (integer-or-f, list) | Procedure |
| uripath-upcount+segments/reverse uripath => (integer-or-f, list) | Procedure |
| uripath-has-params? uripath => boolean | Procedure |
| write-uripath uripath port => undef | Procedure |
| write-uripath/leading-slash uripath port => undef | Procedure |
| uripath->string uripath => string | Procedure |
| uripath->string/leading-slash uripath => string | Procedure |
(uri-path-segments "//a/b") => ("b")
(uri-path-segments "/.//a/b") => (#f "a" "b")
(uripath->string (string->uripath "//b"))
=> "//b"
(uripath->string/leading-slash (string->uripath "//b"))
=> "/.//b"
(uripath->string/leading-slash (string->uripath "/a/b"))
=> "/a/b"
(uripath->string/leading-slash (string->uripath "/;p1/b"))
=> "/;p1/b"
|
| resolved-uripath uripath base-uripath => uripath | Procedure |
| absolute-uripath uripath => uripath | Procedure |
| uriobj-uriquery uriobj => uriquery | Procedure |
| string->uriquery str | Procedure |
| substring->uriquery str start end | Procedure |
| write-uriquery uriquery port => undef | Procedure |
| absolute-uriobj? uriobj => boolean | Procedure |
| resolved-uriobj uriobj base-uri => uriobj | Procedure |
| resolved-uriobj/base-uriobj uriobj base-uriobj => uriobj | Procedure |
| absolute-uriobj uriobj => uriobj | Procedure |
The uri.scm source code file defines a regression test suite for the
library itself, in procedure uri-internal:test. This test suite can
be disabled in the source code.