This document defines a way for sites to use their robots.txt files to express preferences about how their content is handled by specific applications.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-nottingham-plan-b/.¶
Further information can be found at https://mnot.github.io/I-D/.¶
Source for this draft and an issue tracker can be found at https://github.com/mnot/I-D/labels/plan-b.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 29 April 2026.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Robots Exclusion Protocol [ROBOTS] allows Web site owners to "control how content served by their services may be accessed, if at all, by automatic clients known as crawlers." While this provides an effective way to direct cooperating crawlers' behaviour when accessing a site, it does not consider what happens afterwards: in particular, what is done with the data that is obtained through crawling. This has created tensions, especially when crawlers have more than one purpose, or when a purpose changes (for example, a search engine changes its user interface in a way that's undesirable to the site).¶
[I-D.ietf-aipref-vocab] defines a universal vocabulary that describes how content should be handled by AI crawlers, and [I-D.ietf-aipref-attach] describes how that vocabulary should be attached to content in robots.txt and through other means. This allows sites to specify how their data should be handled in a manner that's separate from the question of how crawlers should behave when they access the site.¶
However, it has become apparent that defining such a universal vocabulary necessitates a degree of imprecision, so that it remains broadly applicable across different implementations and over time. As a result, sites may not have an obvious way to state their preferences regarding specific behaviours.¶
To address this shortcoming, this document defines a complementary mechanism: a robots.txt extension that allows sites to express preferences about how specific applications should behave in certain circumstances.¶
For example, a site might wish to express that it does not want ExampleSearch to use its content with ExampleSearch's new "Widgets" feature. ExampleSearch has registered a "widgets" control, so that the site can express this in its robots.txt file:¶
User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0¶
In this manner, sites can provide specific directives to applications that wish to use their data.¶
To allow a site to express its preferences about how specific applications are to treat its content, an identifier for the application needs to be chosen (in the above example, 'examplesearch'), and the syntax and semantics of its directives need to be defined (in the example above, the 'widgets' directive, whose Boolean value enables or disables the 'widgets' feature).¶
This specification establishes IANA registries for application identifiers and their directives to facilitate easy discovery of these artefacts. It is expected that applications that consume data obtained by crawling the Web will register controls for their specific features (up to and including the application as a whole) in these registries.¶
However, this specification does not mandate registration. It is expected that non-technical regulation (e.g., competition regulation) might play some role in encouraging or even requiring certain applications to register appropriate controls for their features.¶
Application Directives are complementary to the vocabulary described in [I-D.ietf-aipref-vocab]. Whereas the AI Preferences vocabulary is generic and potentially applicable to any application consuming a given piece of content, Application Directives are tightly scoped to the application and semantics defined in the appropriate registry entry.¶
In particular, AI Preferences are applicable even to unknown uses and consumers of content, whereas Application Directives do not apply to any application except the one nominated. Because of this, it is anticipated that they will often be used together: AI Preferences to set general policy about how content is treated, and Application Directives to fine-tune the behavior of specific applications.¶
Because Application Directives are a more specific, targeted mechanism, they can be considered to override applicable AI preferences that are attached in the same robots.txt file, in the case of any conflict. Such an override is only applicable, however, within the defined scope of the semantics of the given directive(s).¶
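As a non-normative sketch of this precedence, assuming that the relevant preferences and directives have already been parsed into simple dictionaries (the preference category name and the 'widgets' directive below are purely hypothetical):¶
# Non-normative sketch: an Application Directive, when present, overrides the
# generic AI preference, but only for the feature ("widgets") that it names.
# "example-category" is a placeholder, not a term from [I-D.ietf-aipref-vocab].
def widgets_permitted(ai_prefs: dict, app_directives: dict) -> bool:
    app = app_directives.get("examplesearch", {})
    if "widgets" in app:
        return bool(app["widgets"])        # the specific directive wins
    return bool(ai_prefs.get("example-category", True))  # fall back to general policy
¶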
Because the robots.txt format requires that all extensions be scoped to a User-Agent line, it is possible for nonsensical things to be expressed. For example:¶
User-Agent: ExampleSearch/1.0
Allow: /
App-Directives: someothersearch;foo=bar¶
Here, directives for SomeOtherSearch are limited to content retrieved by the ExampleSearch crawler, and so are unlikely to be applied by SomeOtherSearch.¶
Therefore, it is RECOMMENDED that App-Directives extensions always occur in a group with "User-Agent: *", so that they are most broadly applicable.¶
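For example, the directives for SomeOtherSearch shown above would be more broadly applicable if they were instead placed in the wildcard group:¶
User-Agent: *
Allow: /
App-Directives: someothersearch;foo=bar¶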
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This specification adds an App-Directives rule to the set of rules that can be included in a group in the robots.txt format.¶
The rule ABNF pattern from Section 2.2 of [ROBOTS] is extended as follows:¶
rule =/ app-directive-rule
app-directive-rule = *WS "app-directives" *WS ":" *WS
                     [ path-pattern 1*WS ] app-directives *WS EOL
app-directives = <List syntax, per Section 3.1 of FIELDS>
¶
Each group contains zero or more App-Directives rules. Each App-Directives rule consists of an optional path followed by Directives.¶
The path might be absent or empty; if a path is present, a SP or HTAB separates it from the Directives.¶
The Directives use the syntax defined for Lists in Section 3.1 of [FIELDS]. Each member of the list is a Token (Section 3.3.4 of [FIELDS]) corresponding to a registered application identifier, per Section 3.1. Parameters on each member (Section 3.1.2 of [FIELDS]) correspond to directives for that application as registered in Section 3.2.¶
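As a rough, non-normative sketch of how a consumer might interpret an App-Directives value (a complete implementation should use a full Structured Fields parser for [FIELDS]; this simplified version handles only Tokens with Boolean, Integer, or Token-valued parameters):¶
def parse_app_directives(value: str) -> dict:
    """Map each application identifier to its directive parameters."""
    directives = {}
    for member in value.split(","):            # List members (simplified split)
        parts = [p.strip() for p in member.strip().split(";")]
        app, params = parts[0], {}
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if raw == "":                      # bare parameter: Boolean true
                params[name] = True
            elif raw in ("?0", "?1"):          # sf-boolean
                params[name] = (raw == "?1")
            elif raw.lstrip("-").isdigit():    # sf-integer
                params[name] = int(raw)
            else:                              # leave Tokens/Strings as text
                params[name] = raw
        directives[app] = params
    return directives

# parse_app_directives("examplesearch;widgets=?0,someothersearch;foo=bar")
#   -> {"examplesearch": {"widgets": False}, "someothersearch": {"foo": "bar"}}
¶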
When multiple app-directive-rules with the same (character-for-character) path-pattern are present in a group, their app-directives are combined in the same manner as specified in Section 4.2 of [FIELDS]. As a result, this group:¶
User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0
App-Directives: someothersearch;foo=bar¶
is equivalent to this group:¶
User-Agent: *
Allow: /
App-Directives: examplesearch;widgets=?0,someothersearch;foo=bar¶
Application Directives apply solely to the applications that they identify; their presence or absence does not communicate or imply anything about the behaviour of other applications, and likewise makes no statement about the behaviour of crawlers.¶
When applying directives from a robots.txt file, an application MUST merge identical groups (per Section 2.2.1 of [ROBOTS]) and choose the (possibly merged) group that matches its registered product token. If there is no matching group, the application MUST use the (possibly merged) group identified with "*".¶
When applying directives from a chosen group, an application MUST use those associated with the longest matching path-pattern, using the same path prefix matching rules as defined for Allow and Disallow. That is, the path prefix length is determined by counting the number of bytes in the encoded path.¶
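A non-normative sketch of that selection, assuming the chosen group's App-Directives rules are available as (path-pattern, directives) pairs with percent-encoded path-patterns, and ignoring the optional special characters that [ROBOTS] permits in path-patterns:¶
def select_directives(rules, encoded_path: str):
    """Return the directives with the longest path-pattern prefix match."""
    best, best_len = None, -1
    for pattern, directives in rules:
        pattern = pattern or ""                # absent path matches everything
        plen = len(pattern.encode("utf-8"))    # prefix length counted in bytes
        if encoded_path.startswith(pattern) and plen > best_len:
            best, best_len = directives, plen
    return best

# select_directives([("", {"widgets": True}), ("/private/", {"widgets": False})],
#                   "/private/page.html")
#   -> {"widgets": False}
¶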
Paths specified for App-Directives rules use the same percent-encoding rules as used for Allow/Disallow rules, as defined in Section 2.1 of [URI]. In particular, SP (U+0020) and HTAB (U+0009) characters need to be replaced with "%20" and "%09" respectively.¶
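For instance, a path containing a space could be encoded as follows (a small sketch using Python's standard library):¶
from urllib.parse import quote

# SP (U+0020) becomes "%20" when the path is written into a rule.
quote("/my docs/report 1.html", safe="/")   # -> "/my%20docs/report%201.html"
¶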
The ordering of rules in a group carries no semantics. Thus, app-directives rules can be interleaved with other rules (including Allow and Disallow) without any change in their meaning.¶
IANA should create a new registry, the Application Identifiers Registry.¶
The registry contains the following fields:¶
Application Identifier: a Token identifying the application; see Section 3.3.4 of [FIELDS]¶
Product Token: an identifier for the crawler associated with the application in robots.txt; see Section 2.2.1 of [ROBOTS]¶
Change Controller: Name and contact details (e.g., e-mail)¶
Registrations are made with Expert Review (Section 4.5 of [RFC8126]). The Expert(s) should ensure that application identifiers are specific enough to identify the application and are not misleading as to the identity of the application or its controller.¶
IANA should create a new registry, the Application Directives Registry.¶
The registry contains the following fields:¶
Application Identifier: a value from the Application Identifiers Registry¶
Directive Name: an identifier for the directive; must be a Structured Fields key (see Section 4.1.1.3 of [FIELDS])¶
Directive Value Type: one of "Integer", "Decimal", "String", "Token", "Byte Sequence", "Boolean", "Date", or "Display String"; see Section 3.3 of [FIELDS]¶
Directive Description: A short description of the directive's semantics¶
Documentation URL: A URL to more complete documentation for the directive¶
Status: one of "active" or "deprecated"¶
Registrants in this registry MUST only register values for application identifiers that they control. The change controller for an entry in this registry is that of the corresponding application identifier.¶
New registrations are made with Expert Review (Section 4.5 of [RFC8126]). The Expert(s) will ensure that the change controller is correct (per above), that the value type is appropriate, and that the documentation URL is functioning.¶
Like all uses of robots.txt, directives for applications are merely stated preferences; they have no technical enforcement mechanism. Likewise, because they are exposed to all clients of the Web site, they may expose information about the state of the application on the server, including sensitive paths.¶