update.config

The update.config file controls how Traffic Server performs a scheduled update of specific local cache content. The file contains a list of URLs specifying objects that you want to schedule for update.

A scheduled update performs a local HTTP GET on the objects at the specific time or interval. You can control the following parameters for each specified object:

  • The URL
  • URL-specific request headers, which overrides the default
  • The update time and interval
  • The recursion depth

After you modify the update.config file, run the traffic_line -x command to apply changes. When you apply changes to one node in a cluster, Traffic Server automatically applies the changes to all other nodes in the cluster.

Supported Tag/Attribute Pairs

Scheduled update supports the following tag/attribute pairs when performing recursive URL updates:

  • <a href=" ">
  • <img src=" ">
  • <img href=" ">
  • <body background=" ">
  • <frame src=" ">
  • <iframe src=" ">
  • <fig src=" ">
  • <overlay src=" ">
  • <applet code=" ">
  • <script src=" ">
  • <embed src=" ">
  • <bgsound src=" ">
  • <area href=" ">
  • <base href=" ">
  • <meta content=" ">

Scheduled update is designed to operate on URL sets consisting of hundreds of input URLs (expanded to thousands when recursive URLs are included); it is not intended to operate on extremely large URL sets, such as those used by Internet crawlers.

Format

Each line in the update.config file uses the following format:

URL\request_headers\offset_hour\interval\recursion_depth\

The following list describes each field.

URL
HTTP-based URLs.
request_headers
Optional. A list of headers, separated by semicolons, passed in each GET request. You can define any request header that conforms to the HTTP specification; the default is no request header.
offset_hour
The base hour used to derive the update periods. The range is 00-23 hours.
interval
The interval (in seconds) at which updates should occur, starting at the offset hour.
recursion_depth
The depth to which referenced URLs are recursively updated, starting at the given URL. This field applies only to HTTP.

Examples

An example HTTP scheduled update is provided below:

http://www.company.com\User-Agent: noname user agent\13\3600\5\

The example specifies the URL and request headers, an offset hour of 13 (1 pm), an interval of one hour, and a recursion depth of 5. This would result in updates at 13:00, 14:00, 15:00, and so on. To schedule an update that occurs only once a day, use an interval value 86400 (i.e., 24 hours x 60 minutes x 60 seconds = 86400).

Specifying URL Regular Expressions (url_regex)

This section describes how to specify a url_regex. Entries of type url_regex within the configuration files use regular expressions to perform a match.

The following list provides examples to show how to create a valid url_regex.

x
Matches the character x
.
Match any character
^
Specifies beginning of line
$
Specifies end of line
[xyz]
A character class. In this case, the pattern matches either x, y, orz
[abj-oZ]
A character class with a range. This pattern matches a, b, any letter from j through o, or Z
[^A-Z]
A negated character class. For example, this pattern matches any character except those in the class.
r*
Zero or more r, where r is any regular expression.
r+
One or more r, where r is any regular expression.
r?
Zero or one r, where r is any regular expression.
r{2,5}
From two to five r, where r is any regular expression.
r{2,}
Two or more r, where r is any regular expression.
r{4}
Exactly four r, where r is any regular expression.
"[xyz]\"images"
The literal string [xyz]"images"
\X
If X is a, b, f, n, r, t, or v, then the ANSI-C interpretation of \x; otherwise, a literal X. This is used to escape operators such as *
\0
A NULL character
\123
The character with octal value 123
\x2a
The character with hexadecimal value 2a
(r)
Matches an r, where r is any regular expression. You can use parentheses to override precedence.
rs
The regular expression r, followed by the regular expression s
r|s
Either an r or an s
#<n>#
Inserts an end node, which causes regular expression matching to stop when reached. The value n is returned.

You can specify dest_domain=mydomain.com to match any host in mydomain.com. Likewise, you can specify dest_domain=. to match any request.