Network Working Group                                     M. Nottingham
Internet-Draft                                          26 October 2025
Updates: 9309 (if approved)
Intended status: Standards Track
Expires: 29 April 2026


                  Application Directives in robots.txt
                      draft-nottingham-plan-b-latest

Abstract

   This document defines a way for sites to express preferences about
   how their content is handled by specific applications in their
   robots.txt files.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-nottingham-plan-b/.
   Additional information can be found at https://mnot.github.io/I-D/.

   Source for this draft and an issue tracker can be found at
   https://github.com/mnot/I-D/labels/plan-b.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 29 April 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Creating New Application Directives
     1.2.  Interaction with AI Preferences
     1.3.  Interaction with the User-Agent Line
     1.4.  Notational Conventions
   2.  The App-Directives robots.txt Extension
     2.1.  Applying Application Directives
   3.  IANA Considerations
     3.1.  The Application Identifiers Registry
     3.2.  The Application Directives Registry
   4.  Security Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Author's Address

1.  Introduction

   The Robots Exclusion Protocol [ROBOTS] allows Web site owners to
   "control how content served by their services may be accessed, if
   at all, by automatic clients known as crawlers."

   While this provides an effective way to direct cooperating
   crawlers' behaviour when accessing a site, it does not consider
   what happens afterwards: in particular, what is done with the data
   that is obtained through crawling.  This has created tensions,
   especially when crawlers have more than one purpose, or when a
   purpose changes (for example, a search engine changes its user
   interface in a way that's undesirable to the site).

   [I-D.ietf-aipref-vocab] defines a universal vocabulary that
   describes how content should be handled by AI crawlers, and
   [I-D.ietf-aipref-attach] describes how that vocabulary should be
   attached to content in robots.txt and through other means.
   This allows sites to specify how their data should be handled in a
   manner that's separate from the question of how crawlers should
   behave when they access the site.

   However, it has become apparent that defining such a universal
   vocabulary necessitates a degree of imprecision, so that it can be
   broadly applicable both across different implementations and over
   time.  As a result, sites may not have an obvious way to state
   their preferences regarding specific behaviours.

   To address this shortcoming, this document defines a complementary
   mechanism: a robots.txt extension that allows sites to express
   preferences about how specific applications should behave in
   certain circumstances.

   For example, a site might wish to express that it does not want
   ExampleSearch to use its content with ExampleSearch's new "Widgets"
   feature.  ExampleSearch has registered a "widgets" control, so that
   the site can express this in its robots.txt file:

   User-Agent: *
   Allow: /
   App-Directives: examplesearch;widgets=?0

   In this manner, sites can provide specific directives to
   applications that wish to use their data.

1.1.  Creating New Application Directives

   To allow a site to express its preferences about how specific
   applications are to treat their content, an identifier for the
   application needs to be chosen (in the above example,
   'examplesearch') and the syntax and semantics of its directives
   need to be defined (in the example above, 'widgets=?0' to enable or
   disable the 'widgets' feature).

   This specification creates IANA registries for application
   identifiers and directives to facilitate easy discovery of these
   artefacts.  It is expected that applications that consume data
   obtained by crawling the Web will register specific controls for
   their features (including but not limited to the entire application
   itself) in these registries.

   However, this specification does not mandate registration.  It is
   expected that non-technical regulation (e.g., competition
   regulation) might play some role in encouraging or even requiring
   certain applications to register appropriate controls for their
   features.

1.2.  Interaction with AI Preferences

   Application Directives are complementary to the vocabulary
   described in [I-D.ietf-aipref-vocab].  Whereas the AI Preferences
   vocabulary is generic and potentially applicable to any application
   consuming a given piece of content, Application Directives are
   tightly scoped to the application and semantics defined in the
   appropriate registry entry.  In particular, AI Preferences are
   applicable even to unknown uses and consumers of content, whereas
   Application Directives do not apply to any application except the
   one nominated.

   Because of this, it is anticipated that they will often be used
   together: AI Preferences to set general policy about how content is
   treated, and Application Directives to fine-tune the behaviour of
   specific applications.

   Because Application Directives are a more specific, targeted
   mechanism, they can be considered to override applicable AI
   Preferences that are attached in the same robots.txt file, in the
   case of any conflict.  Such an override is only applicable,
   however, within the defined scope of the semantics of the given
   directive(s).
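   For illustration only, the following group sketches how the two
   mechanisms might appear together.  The Content-Usage line and the
   "train-ai=n" expression are placeholders standing in for whatever
   rule name and vocabulary values [I-D.ietf-aipref-attach] and
   [I-D.ietf-aipref-vocab] actually define; the App-Directives line is
   as defined in this document:

   User-Agent: *
   Allow: /
   Content-Usage: train-ai=n
   App-Directives: examplesearch;widgets=?0

   Here, the general preference applies to any consumer of the
   content, while the Application Directive additionally disables
   ExampleSearch's "Widgets" feature; in the event of a conflict
   within the scope of the "widgets" directive, the Application
   Directive would prevail.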
1.3.  Interaction with the User-Agent Line

   Because the robots.txt format requires that all extensions be
   scoped to a User-Agent line, it is possible for nonsensical things
   to be expressed.  For example:

   User-Agent: ExampleSearch/1.0
   Allow: /
   App-Directives: someothersearch;foo=bar

   Here, directives for SomeOtherSearch are limited to content
   retrieved by the ExampleSearch crawler, and so are unlikely to be
   applied by SomeOtherSearch.

   Therefore, it is RECOMMENDED that App-Directives extensions always
   occur in a group with "User-Agent: *", so that they are most
   broadly applicable.

1.4.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY",
   and "OPTIONAL" in this document are to be interpreted as described
   in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in
   all capitals, as shown here.

2.  The App-Directives robots.txt Extension

   This specification adds an App-Directives rule to the set of
   potential rules that can be included in a group for the robots.txt
   format.  The rule ABNF pattern from Section 2.2 of [ROBOTS] is
   extended as follows:

   rule =/ app-directive-rule

   app-directive-rule = *WS "app-directives" *WS ":" *WS
                        [ path-pattern 1*WS ] app-directives *WS EOL
   app-directives = <List; see Section 3.1 of [FIELDS]>

   Each group contains zero or more App-Directives rules.  Each App-
   Directives rule consists of a path and then Directives.  The path
   might be absent or empty; if a path is present, a SP or HTAB
   separates it from the Directives.

   The Directives use the syntax defined for Lists in Section 3.1 of
   [FIELDS].  Each member of the list is a Token (Section 3.3.4 of
   [FIELDS]) corresponding to a registered application identifier, per
   Section 3.1.  Parameters on each member (Section 3.1.2 of [FIELDS])
   correspond to directives for that application as registered in
   Section 3.2.

   When multiple App-Directives rules with the same (character-for-
   character) path-pattern are present in a group, their
   app-directives are combined in the same manner as specified in
   Section 4.2 of [FIELDS].  As a result, this group:

   User-Agent: *
   Allow: /
   App-Directives: examplesearch;widgets=?0
   App-Directives: someothersearch;foo=bar

   is equivalent to this group:

   User-Agent: *
   Allow: /
   App-Directives: examplesearch;widgets=?0,someothersearch;foo=bar

2.1.  Applying Application Directives

   Application Directives apply solely to the applications that they
   identify; their presence or absence does not communicate or imply
   anything about the behaviour of other applications, and likewise
   makes no statements about the behaviour of crawlers.

   When applying directives from a robots.txt file, an application
   MUST merge identical groups (per Section 2.2.1 of [ROBOTS]) and
   choose the (possibly merged) group that matches its registered
   product token.  If there is no matching group, the application MUST
   use the (possibly merged) group identified with "*".

   When applying directives from a chosen group, an application MUST
   use those associated with the longest matching path-pattern, using
   the same path prefix matching rules as defined for Allow and
   Disallow.  That is, the path prefix length is determined by
   counting the number of bytes in the encoded path.

   Paths specified for App-Directives rules use the same
   percent-encoding rules as used for Allow/Disallow rules, as defined
   in Section 2.1 of [URI].  In particular, SP (U+0020) and HTAB
   (U+0009) characters need to be replaced with "%20" and "%09"
   respectively.

   The ordering of rules in a group carries no semantics.  Thus,
   App-Directives rules can be interleaved with other rules (including
   Allow and Disallow) without any change in their meaning.
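   The following Python sketch illustrates the selection process
   described in this section: choosing the group that matches the
   application's product token (falling back to "*"), selecting the
   rules with the longest matching path-pattern, and combining
   App-Directives values that share that path-pattern.  It is a
   simplified illustration rather than a conformant implementation;
   the comments note where it departs from [ROBOTS] and [FIELDS].

# Illustrative sketch only: this is not a conformant robots.txt or
# Structured Fields [FIELDS] parser.  Group handling and product token
# matching are simplified, wildcard and "$" path-patterns are not
# supported, and directive parameters are returned as raw strings
# rather than as decoded Structured Fields values.

from urllib.parse import quote


def parse_groups(robots_txt):
    """Return {user-agent (lowercased): [(path-pattern, directives), ...]}."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            if seen_rule:
                # A rule has intervened, so this line starts a new group.
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif current_agents:
            seen_rule = True
            if name == "app-directives":
                # An optional path-pattern is separated from the Directives
                # by whitespace; this sketch assumes no spaces in Directives.
                parts = value.split(None, 1)
                if len(parts) == 2 and parts[0].startswith("/"):
                    path, directives = parts
                else:
                    path, directives = "", value
                for agent in current_agents:
                    # Keying by agent merges same-named groups, loosely
                    # following Section 2.2.1 of [ROBOTS].
                    groups.setdefault(agent, []).append((path, directives))
    return groups


def directives_for(groups, product_token, url_path):
    """Return {application identifier: raw parameters} for url_path,
    using only the rules on the longest matching path-pattern."""
    rules = groups.get(product_token.lower(), groups.get("*", []))
    encoded = quote(url_path, safe="/%")  # prefix length counts encoded bytes
    best_len, matched = -1, []
    for path, directives in rules:
        if encoded.startswith(path):
            if len(path) > best_len:
                best_len, matched = len(path), [directives]
            elif len(path) == best_len:
                matched.append(directives)  # same path-pattern: combine lists
    result = {}
    for member in ",".join(matched).split(","):
        member = member.strip()
        if member:
            app_id, _, params = member.partition(";")
            result[app_id.strip()] = params.strip()
    return result


if __name__ == "__main__":
    groups = parse_groups("""
User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0
App-Directives: /archive/ someothersearch;foo=bar
""")
    print(directives_for(groups, "ExampleSearch", "/index.html"))
    # {'examplesearch': 'widgets=?0'}
    print(directives_for(groups, "ExampleSearch", "/archive/2024"))
    # {'someothersearch': 'foo=bar'} -- only the longest match applies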
3.  IANA Considerations

3.1.  The Application Identifiers Registry

   IANA should create a new registry, the Application Identifiers
   Registry.  The registry contains the following fields:

   *  Application Identifier: _a Token identifying the application;
      see Section 3.3.4 of [FIELDS]_

   *  Product Token: _an identifier for the crawler associated with
      the application in robots.txt; see Section 2.2.1 of [ROBOTS]_

   *  Change Controller: _Name and contact details (e.g., e-mail)_

   Registrations are made with Expert Review (Section 4.5 of
   [RFC8126]).  The Expert(s) should ensure that application
   identifiers are specific enough to identify the application and are
   not misleading as to the identity of the application or its
   controller.

3.2.  The Application Directives Registry

   IANA should create a new registry, the Application Directives
   Registry.  The registry contains the following fields:

   *  Application Identifier: _a value from the Application
      Identifiers Registry_

   *  Directive Name: _an identifier for the directive; must be a
      Structured Fields key (see Section 4.1.1.3 of [FIELDS])_

   *  Directive Value Type: _one of "Integer", "Decimal", "String",
      "Token", "Byte Sequence", "Boolean", "Date", or "Display
      String"; see Section 3.3 of [FIELDS]_

   *  Directive Description: _A short description of the directive's
      semantics_

   *  Documentation URL: _A URL to more complete documentation for the
      directive_

   *  Status: _one of "active" or "deprecated"_

   Registrants in this registry MUST only register values for
   application identifiers that they control.  The change controller
   for an entry in this registry is that of the corresponding
   application identifier.

   New registrations are made with Expert Review (Section 4.5 of
   [RFC8126]).  The Expert(s) will ensure that the change controller
   is correct (per above), that the value type is appropriate, and
   that the documentation URL is functioning.

4.  Security Considerations

   Like all uses of robots.txt, directives for applications are merely
   stated preferences; they have no technical enforcement mechanism.

   Likewise, because they are exposed to all clients of the Web site,
   they may expose information about the state of the application on
   the server, including sensitive paths.

5.  References

5.1.  Normative References

   [FIELDS]   Nottingham, M. and P-H. Kamp, "Structured Field Values
              for HTTP", RFC 9651, DOI 10.17487/RFC9651, September
              2024, <https://www.rfc-editor.org/info/rfc9651>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [ROBOTS]   Koster, M., Illyes, G., Zeller, H., and L. Sassman,
              "Robots Exclusion Protocol", RFC 9309,
              DOI 10.17487/RFC9309, September 2022,
              <https://www.rfc-editor.org/info/rfc9309>.

   [URI]      Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/info/rfc3986>.

5.2.  Informative References

   [I-D.ietf-aipref-attach]
              Illyes, G. and M. Thomson, "Associating AI Usage
              Preferences with Content in HTTP", Work in Progress,
              Internet-Draft, draft-ietf-aipref-attach-03, 4 September
              2025, <https://datatracker.ietf.org/doc/html/draft-ietf-
              aipref-attach-03>.

   [I-D.ietf-aipref-vocab]
              Keller, P. and M. Thomson, "A Vocabulary For Expressing
              AI Usage Preferences", Work in Progress, Internet-Draft,
              draft-ietf-aipref-vocab-03, 4 September 2025,
              <https://datatracker.ietf.org/doc/html/draft-ietf-
              aipref-vocab-03>.
Author's Address

   Mark Nottingham
   Melbourne
   Australia
   Email: mnot@mnot.net
   URI:   https://www.mnot.net/