Workgroup: Network Working Group
Internet-Draft: draft-nottingham-plan-b-latest
Updates: 9309 (if approved)
Published: March 2026
Intended Status: Standards Track
Expires: 8 September 2026
Author: M. Nottingham

Application Directives in robots.txt

Abstract

This document defines a way for Web sites to express preferences about how their content is handled by specific applications in their robots.txt files.

About This Document

This note is to be removed before publishing as an RFC.

Status information for this document may be found at https://datatracker.ietf.org/doc/draft-nottingham-plan-b/.

Additional information can be found at https://mnot.github.io/I-D/.

Source for this draft and an issue tracker can be found at https://github.com/mnot/I-D/labels/plan-b.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 September 2026.

1. Introduction

Proprietary platforms built on the Internet often create choke points. As discussed in [CENTRALIZATION], the concentration of power thus formed is often difficult to mitigate using only technical mechanisms, but might be more effectively addressed through other means (e.g., legal regulation) with the assistance of technical accommodations. This document defines one such accommodation.

The Robots Exclusion Protocol [ROBOTS] allows Web site owners to "control how content served by their services may be accessed, if at all, by automatic clients known as crawlers." While this effectively directs cooperating crawlers' behavior when accessing a site, it does not address the use of the data obtained through crawling.

Experience has shown that while crawling a substantial portion of the Web does not tend to form a choke point, specific uses of crawled data can. In particular, Web search services can act in ways that are beneficial to the sites that they draw data from, directing traffic to them and thus promoting a healthy ecosystem, or they can be extractive, using crawled data to create resources without reference to the original sources.

This document defines a common mechanism for sites to express preferences for specific uses of their data by consuming services. Unlike [AIPREFS], it does not define a universal vocabulary; instead, it allows each consuming service to define its own bespoke controls, offering greater precision and avoiding definitional issues.

This mechanism is defined as a robots.txt extension. Its operation is separate from the control of crawling behaviour; it only controls the use of data once it is crawled.

For example, a site might wish to express that it does not want ExampleSearch to use its content with ExampleSearch's new "Widgets" feature. ExampleSearch has registered a "widgets" control, allowing the site to express this in its robots.txt file:

User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0

In this manner, sites can provide specific directives to applications that use their data, and legal regulators that wish to direct the behaviour of choke point services can mandate that they define appropriate directives for sites to use.
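
The example above can be sketched in code. The following is a non-normative illustration of how a consuming application (here, the hypothetical "examplesearch") might extract its own directives from a robots.txt file; it deliberately ignores User-Agent group scoping and uses only naive string splitting rather than a full Structured Fields parser.

```python
# Illustrative only: extract the App-Directives parameters registered to
# one application identifier from a robots.txt file. Group scoping and
# full RFC 9651 parsing are omitted for brevity.

ROBOTS_TXT = """\
User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0
"""

def directives_for(robots_txt: str, app_id: str) -> dict:
    """Return {directive_name: raw_value} for one application identifier."""
    found = {}
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() != "app-directives":
            continue
        # Each comma-separated List member is "app-id;param=value;..."
        for member in value.split(","):
            parts = [p.strip() for p in member.split(";")]
            if parts[0].lower() != app_id:
                continue
            for param in parts[1:]:
                key, _, raw = param.partition("=")
                found[key] = raw
    return found

print(directives_for(ROBOTS_TXT, "examplesearch"))  # {'widgets': '?0'}
```

Here "?0" is the Structured Fields encoding of Boolean false, so ExampleSearch would interpret this as a request to disable its "widgets" feature for this site.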

1.1. Creating New Application Directives

To allow a site to express its content treatment preferences for specific applications, an identifier for the application needs to be chosen (in the above example, 'examplesearch') and the syntax and semantics of its directives need to be defined (in the example above, 'widgets=?0' to enable or disable the 'widgets' feature).

This specification creates IANA registries for application identifiers and directives, to avoid collisions and to facilitate discovery. Some, but not all, applications that consume data obtained from the Web are expected to register controls for specific features (or for the application as a whole) in these registries.

However, this specification does not mandate registration. It is anticipated that legal authorities (especially competition regulators) could encourage or require certain applications to register appropriate directives for their features, and then enforce compliance with those directives.

This specification does not address what an appropriate directive might be, nor the process for determining it; it only provides a framework for their expression.

1.2. Interaction with AI Preferences

Application Directives complement the vocabulary described in [AIPREFS]. While AI Preferences are generic and potentially applicable to any non-browser content consumer, Application Directives are tightly scoped to the application and semantics defined in the appropriate registry entry.

AI Preferences apply even to unknown content uses and consumers, while Application Directives only apply to the nominated application. Therefore, they are anticipated to be used together: AI Preferences to set general policy (especially for cases like AI training), and Application Directives to fine-tune the behavior of specific applications.

Because Application Directives are a more specific, targeted mechanism, they can be considered to override applicable AI preferences that are attached in the same robots.txt file, in the case of any conflict. However, such an override is only applicable within the defined scope of the given directive(s)' semantics.

1.3. Interaction with the User-Agent Line

Because the robots.txt format requires all extensions to be scoped to a User-Agent line, nonsensical configurations are possible. For example:

User-Agent: ExampleSearch/1.0
Allow: /
App-Directives: someothersearch;foo=bar

Here, directives for SomeOtherSearch are limited to content retrieved by the ExampleSearch crawler and are thus unlikely to be applied by SomeOtherSearch.

Therefore, it is RECOMMENDED that App-Directives extensions always occur in a group with "User-Agent: *", for broadest applicability.

1.4. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. The App-Directives robots.txt Extension

This specification adds an App-Directives rule to the set of potential rules that can be included in a group in the robots.txt format.

The rule ABNF pattern from Section 2.2 of [ROBOTS] is extended as follows:

rule =/ app-directive-rule

app-directive-rule = *WS "app-directives" *WS ":" *WS
                     [ path-pattern 1*WS ] app-directives *WS EOL
app-directives     = <List syntax, per Section 3.1 of [FIELDS]>

Each group contains zero or more App-Directives rules. Each App-Directives rule consists of an optional path followed by Directives.

The path might be absent or empty; if a path is present, a SP or HTAB separates it from the Directives.

The Directives use the syntax defined for Lists in Section 3.1 of [FIELDS]. Each member of the list is a Token (Section 3.3.4 of [FIELDS]) corresponding to a registered application identifier, per Section 3.1. Parameters on each member (Section 3.1.2 of [FIELDS]) correspond to directives for that application as registered in Section 3.2.
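
As a non-normative illustration of the List structure above, the following sketch parses an App-Directives value into a mapping from application identifier to its parameters. A production implementation should use a real RFC 9651 parser; this simplification assumes no commas or semicolons appear inside String values.

```python
# Illustrative only: split a Structured Fields List of Tokens with
# Parameters into {application identifier: {directive: raw value}}.

def parse_app_directives(value: str) -> dict:
    result = {}
    for member in value.split(","):           # List members
        parts = [p.strip() for p in member.split(";")]
        app_id, params = parts[0], {}
        for param in parts[1:]:               # Parameters on the member
            key, sep, raw = param.partition("=")
            params[key] = raw if sep else ""  # bare key: Boolean true in SF
        result[app_id] = params
    return result

parsed = parse_app_directives("examplesearch;widgets=?0, someothersearch;foo=bar")
print(parsed)
# {'examplesearch': {'widgets': '?0'}, 'someothersearch': {'foo': 'bar'}}
```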

When multiple app-directive-rules with the same (character-for-character) path-pattern are present in a group, their app-directives are combined in the same manner as specified in Section 4.2 of [FIELDS]. As a result, this group:

User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0
App-Directives: someothersearch;foo=bar

is equivalent to this group:

User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0,someothersearch;foo=bar
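
The combining rule above can be demonstrated directly: joining the values of multiple App-Directives lines with a comma yields the same List as a single line, just as Section 4.2 of [FIELDS] specifies for field lines. The function name below is illustrative.

```python
# Illustrative only: combine the values of several App-Directives lines
# into one Structured Fields List value.

def combine_app_directives(lines: list[str]) -> str:
    """Join the values of App-Directives lines, in order, with commas."""
    values = []
    for line in lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "app-directives":
            values.append(value.strip())
    return ", ".join(values)

group_a = [
    "App-Directives: examplesearch;widgets=?0",
    "App-Directives: someothersearch;foo=bar",
]
group_b = ["App-Directives: examplesearch;widgets=?0, someothersearch;foo=bar"]

# Both groups yield the same combined List value.
print(combine_app_directives(group_a))
print(combine_app_directives(group_b))
```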

2.1. Applying Application Directives

Application directives apply solely to the applications that they identify; their presence or absence does not communicate or imply anything about the behaviour of other applications, and likewise makes no statements about the behavior of crawlers.

When applying directives, an application MUST merge identical groups (per Section 2.2.1 of [ROBOTS]) and choose the (possibly merged) group that matches its registered product token. If no group matches, the application MUST use the (possibly merged) group identified with "*".

When applying directives from a chosen group, an application MUST use those associated with the longest matching path-pattern, using the same path prefix matching rules as defined for Allow and Disallow. That is, the path prefix length is determined by counting the number of bytes in the encoded path.
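
The longest-match selection above can be sketched as follows. This is a non-normative simplification: it treats path-patterns as plain prefixes (ignoring "*" and "$" wildcards), and it assumes an absent or empty path applies site-wide.

```python
# Illustrative only: pick the directives whose path-pattern is the
# longest prefix match for the request path, measuring pattern length
# in encoded bytes as for Allow/Disallow.

def select_directives(rules, request_path: str):
    """rules: iterable of (path_pattern, directives) pairs."""
    best, best_len = None, -1
    for pattern, directives in rules:
        pattern = pattern or "/"        # assumed: absent path applies site-wide
        if request_path.startswith(pattern):
            length = len(pattern.encode("utf-8"))   # count encoded bytes
            if length > best_len:
                best_len, best = length, directives
    return best

rules = [
    ("/", "examplesearch;widgets=?1"),
    ("/private/", "examplesearch;widgets=?0"),
]
print(select_directives(rules, "/private/page.html"))  # widgets disabled
print(select_directives(rules, "/index.html"))         # widgets enabled
```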

Paths specified for App-Directive rules use the same percent-encoding rules as used for Allow/Disallow rules, as defined in Section 2.1 of [URI]. In particular, SP (U+20) and HTAB (U+09) characters need to be replaced with "%20" and "%09" respectively.
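
For instance, the percent-encoding of SP and HTAB can be produced with a standard URI-encoding routine; the example path below is hypothetical.

```python
# SP (U+20) and HTAB (U+09) in a path-pattern become %20 and %09.
from urllib.parse import quote

pattern = quote("/my docs/\tnotes", safe="/")
print(pattern)  # /my%20docs/%09notes
```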

The ordering of rules in a group carries no semantics. Thus, app-directives rules can be interleaved with other rules (including Allow and Disallow) without any change in their meaning.

3. IANA Considerations

3.1. The Application Identifiers Registry

IANA should create a new registry, the Application Identifiers Registry.

The registry contains the following fields:

  • Application Identifier: a Token identifying the application; see Section 3.3.4 of [FIELDS]

  • Product Token: an identifier for the crawler associated with the application in robots.txt; see Section 2.2.1 of [ROBOTS]

  • Change Controller: Name and contact details (e.g., e-mail)

Registrations are made with Expert Review (Section 4.5 of [RFC8126]). The Expert(s) should ensure that application identifiers are specific enough to identify the application and are not misleading as to the identity of the application or its controller.

3.2. The Application Directives Registry

IANA should create a new registry, the Application Directives Registry.

The registry contains the following fields:

  • Application Identifier: a value from the Application Identifiers Registry

  • Directive Name: an identifier for the directive; must be a Structured Fields key (see Section 4.1.1.3 of [FIELDS])

  • Directive Value Type: one of "Integer", "Decimal", "String", "Token", "Byte Sequence", "Boolean", "Date", or "Display String"; see Section 3.3 of [FIELDS]

  • Directive Description: A short description of the directive's semantics

  • Documentation URL: A URL to more complete documentation for the directive

  • Status: one of "active" or "deprecated"

Registrants in this registry MUST only register values for application identifiers that they control. The change controller for an entry in this registry is that of the corresponding application identifier.

New registrations are made with Expert Review (Section 4.5 of [RFC8126]). The Expert(s) will ensure that the change controller is correct (per above), that the value type is appropriate, and that the documentation URL is functioning.

4. Security Considerations

Like all uses of robots.txt, directives for applications are merely stated preferences; they have no technical enforcement mechanism. Likewise, because they are exposed to all clients of the Web site, they may expose information about the state of the application on the server, including sensitive paths.

5. References

5.1. Normative References

[FIELDS]
Nottingham, M. and P-H. Kamp, "Structured Field Values for HTTP", RFC 9651, DOI 10.17487/RFC9651, September 2024, <https://www.rfc-editor.org/rfc/rfc9651>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8126]
Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfc-editor.org/rfc/rfc8126>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
[ROBOTS]
Koster, M., Illyes, G., Zeller, H., and L. Sassman, "Robots Exclusion Protocol", RFC 9309, DOI 10.17487/RFC9309, September 2022, <https://www.rfc-editor.org/rfc/rfc9309>.
[URI]
Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005, <https://www.rfc-editor.org/rfc/rfc3986>.

5.2. Informative References

[AIPREFS]
Keller, P. and M. Thomson, "A Vocabulary For Expressing AI Usage Preferences", Work in Progress, Internet-Draft, draft-ietf-aipref-vocab-05, <https://datatracker.ietf.org/doc/html/draft-ietf-aipref-vocab-05>.
[CENTRALIZATION]
Nottingham, M., "Centralization, Decentralization, and Internet Standards", RFC 9518, DOI 10.17487/RFC9518, December 2023, <https://www.rfc-editor.org/rfc/rfc9518>.

Author's Address

Mark Nottingham
Melbourne
Australia