Parties and browsers

Living Document,


tl;dr

Don’t use the terms "first party" and "third party" in specs. Instead, use precisely-defined terms that map directly to browsers' actual understanding of security and privacy boundaries.

Introduction

When talking about web features with security or privacy impact, folks often talk about "first parties" and "third parties". Everyone sort of knows what we mean when we use these terms, but it turns out that we often mean different things, and what we each think these terms mean usually doesn’t map cleanly onto the technical mechanisms browsers actually use to distinguish different actors for security or privacy purposes. Given this, editors should avoid using the terms "first party" and "third party". Instead, they should use site, origin, and other precisely-defined terms that map directly to the relevant web security or privacy boundary they’re working with (see § 3 Proposal).

Note: I wrote this post as background/input into the "Improve definition of parties and trust relationships across W3C" breakout session at TPAC 2020.

1. First, second, and third parties

1.1. In colloquial usage

When you’re looking at a web page, your browser tells you which site you’re on via its location bar, a text field at the top of the browser window where you can type in a URL to visit or something to search for. Consider the case of someone who’s navigated to example.com:

┌──────────────────────────────────────────────────────────────────────────────┐
│┌───┐┌───┐         ┌──────────────────────────────────────┐                  ×│
││ ← ││ → │         │             example.com              │                   │
│└───┘└───┘         └──────────────────────────────────────┘                   │
├──────────────────────────────────────────────────────────────────────────────┤
│  _       __     __                             __                            │
│ | |     / /__  / /________  ____ ___  ___     / /_____     ____ ___  __  __  │
│ | | /| / / _ \/ / ___/ __ \/ __ `__ \/ _ \   / __/ __ \   / __ `__ \/ / / /  │
│ | |/ |/ /  __/ / /__/ /_/ / / / / / /  __/  / /_/ /_/ /  / / / / / / /_/ /   │
│ |__/|__/\___/_/\___/\____/_/ /_/ /_/\___/   \__/\____/  /_/ /_/ /_/\__, /    │
│                                                                   /____/     │
│                                                      __                      │
│                     ___  _  ______ _____ ___  ____  / /__                    │
│                    / _ \| |/_/ __ `/ __ `__ \/ __ \/ / _ \                   │
│                   /  __/>  </ /_/ / / / / / / /_/ / /  __/                   │
│                   \___/_/|_|\__,_/_/ /_/ /_/ .___/_/\___/                    │
│                                           /_/                                │
│                                   __         _ __       __                   │
│                    _      _____  / /_  _____(_) /____  / /                   │
│                   | | /| / / _ \/ __ \/ ___/ / __/ _ \/ /                    │
│                   | |/ |/ /  __/ /_/ (__  ) / /_/  __/_/                     │
│                   |__/|__/\___/_.___/____/_/\__/\___(_)                      │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
A browser window. The location bar contains the address of the website, example.com. The body of the page says "Welcome to my example website!"

Which party is which in this example?

1.2. In browsers and their tracking policies

This is more or less, but not quite, how these terms get used today in browser tracking policies. Here’s Mozilla’s Anti Tracking Policy's definition of "first party":

A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact. An intention to interact is characterized by a deliberate action, such as clicking a link, submitting a form, or reloading a page. Merely hovering over, muting, pausing, or closing a given piece of content does not constitute an intention to interact. Interactions with other parties are considered third-party, even if the user is transiently informed in context (for example, in the form of a redirect).

The WebKit Tracking Prevention Policy definition was based on Mozilla’s, but is a bit different:

A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization. In practice, we consider resources to belong to the same party if they are part of the same registrable domain: a public suffix plus one additional label. Example: site.example, www.site.example, and s.u.b.site.example are all the same party since site.example is their shared registrable domain.

They both define third party as any party that does not fall within the definition of first party above.
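To make WebKit's "public suffix plus one additional label" rule concrete, here is a minimal sketch of how a registrable domain might be computed and compared. The `PUBLIC_SUFFIXES` set is a tiny hypothetical stand-in for the real Public Suffix List (which is much larger and has wildcard and exception rules that this sketch ignores), and the function names `registrable_domain` and `same_party` are illustrative, not taken from any browser.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List (an assumption for
# illustration only; real rules include wildcards and exceptions).
PUBLIC_SUFFIXES = {"example", "com", "co.uk"}


def registrable_domain(host: str) -> Optional[str]:
    """Return the public suffix plus one additional label, or None."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before a shorter suffix could be.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None  # the host is itself a public suffix, or is unknown


def same_party(host_a: str, host_b: str) -> bool:
    """Same party, in the WebKit policy's sense: same registrable domain."""
    domain_a = registrable_domain(host_a)
    return domain_a is not None and domain_a == registrable_domain(host_b)
```

Under these assumptions, `site.example`, `www.site.example`, and `s.u.b.site.example` all map to the registrable domain `site.example`, so `same_party` treats them as one party, matching the policy's own example.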

1.3. In laws & regulations

N.B.: IANAL! Take this section with enough salt to worry your doctor.

The terms "first party", "second party", and "third party" arose centuries ago in contract law, and are used in modern privacy laws & regulations like Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

1.3.1. Contract law

A contract is an agreement that binds two parties, the "first party" and the "second party". Now, the "first" party isn’t more important than the "second"—they are equal parties to the contract, and these terms are themselves interchangeable. If you and I enter into a contract with each other, I probably think of myself as the first and you as the second, whereas for you, you’re the first and I’m the second.

A party not bound by the contract is a "third party" to that contract.

Note: While the terms "first party" and "third party" originate in contract law, people aren’t invoking law when they use the terms in the context of the web. They’re just using ordinary English terms that happen to have an etymology that traces back to law.

1.3.2. GDPR

GDPR defines "third party" in Article 4:

‘third party’ means a natural or legal person, public authority, agency or body other than the data subject, controller, processor and persons who, under the direct authority of the controller or processor, are authorised to process personal data;

1.3.3. CCPA

CCPA defines "third party" in §1798.140(w):

  1. “Third party” means a person who is not any of the following:
    1. The business that collects personal information from consumers under this title.
    2. A person to whom the business discloses a consumer’s personal information for a business purpose pursuant to a written contract, provided that the contract:
      1. Prohibits the person receiving the personal information from:
        1. Selling the personal information.
        2. Retaining, using, or disclosing the personal information for any purpose other than for the specific purpose of performing the services specified in the contract, including retaining, using, or disclosing the personal information for a commercial purpose other than providing the services specified in the contract.
        3. Retaining, using, or disclosing the information outside of the direct business relationship between the person and the business.
      2. Includes a certification made by the person receiving the personal information that the person understands the restrictions in subparagraph (A) and will comply with them.

1.4. Deficiencies of these definitions for spec work

The legal definitions of party-as-person only rarely correspond to the browser policies' notion of party-as-resource/website, which themselves don’t actually line up with site or origin as used in specs and implementations. To use terms that have such different meanings in different contexts risks constant confusion as people from one context make assumptions about the use of these terms in another context.

Another reason to avoid using terms from policy land in specs is that laws, policies, and regulations are always changing and evolving, and basic concepts of web architecture shouldn’t be dependent on which legal regime you happen to find yourself under in a certain time and place.

2. Sites and Origins

The concepts that browsers' security and privacy features are actually built on are sites and origins. (If you’re familiar with these terms, go ahead and skip to § 3 Proposal.)

2.1. What is a site?

A site is either an opaque origin or a (scheme, host) tuple. The host component is a registrable domain, which is sometimes (though somewhat incorrectly) called an eTLD+1. A registrable domain is what people usually mean by a "domain name": the part of a hostname that a domain registrar has allocated to somebody.

These URLs are serializations of sites:

These URLs are not serializations of sites:

2.2. What is an origin?

An origin is either an opaque origin or a (scheme, host, port, domain) tuple. https://example.com:1701 and https://another.example.com are both (serializations of) origins.
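As a rough illustration of the tuple form, the sketch below derives an origin from an http(s) URL and serializes it again. It is a simplification under stated assumptions: opaque origins are not modelled, the `domain` component is always left as `None`, and the names `Origin`, `origin_of`, and `serialize` are hypothetical helpers rather than anything defined by a spec.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


class Origin(NamedTuple):
    scheme: str
    host: str
    port: Optional[int]           # None when the scheme's default port is used
    domain: Optional[str] = None  # not modelled in this sketch; always None


def origin_of(url: str) -> Origin:
    """Build a tuple origin from an http(s) URL (opaque origins not handled)."""
    parts = urlsplit(url)
    port = parts.port
    if port == DEFAULT_PORTS.get(parts.scheme):
        port = None  # an explicit default port is the same as no port
    return Origin(parts.scheme, parts.hostname or "", port)


def serialize(origin: Origin) -> str:
    """Serialize as scheme://host, plus :port when it is not the default."""
    text = f"{origin.scheme}://{origin.host}"
    if origin.port is not None:
        text += f":{origin.port}"
    return text
```

With this sketch, `serialize(origin_of("https://example.com:1701/some/path"))` gives back `https://example.com:1701`: the path is dropped because it is not part of the origin.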

2.3. How 'site' and 'origin' get used today

Consider these URLs:

The first two URLs are same origin with one another—if you obtain an origin from each of them, you get the same origin: https://an.example.com.

The second and third URLs are same site with one another—if you obtain a site from each of them, you get the same site: https://example.com. But they are not same origin with one another, because the host components of their origins differ.

The third and fourth URLs are neither same origin nor same site with one another.
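The comparisons above can be sketched in code. The four URLs in `URLS` are hypothetical stand-ins chosen to reproduce the relationships just described; they are not necessarily the URLs this section was originally written around. `PUBLIC_SUFFIXES` is likewise a toy substitute for the Public Suffix List, and opaque origins are not handled.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical stand-in for the Public Suffix List (assumption, not real data).
PUBLIC_SUFFIXES = {"com", "org"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def registrable_domain(host: str) -> Optional[str]:
    labels = host.split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    parts = urlsplit(url)
    port = parts.port
    if port == DEFAULT_PORTS.get(parts.scheme):
        port = None  # a default port is treated the same as no port
    return (parts.scheme, parts.hostname or "", port)


def site_of(url: str) -> Tuple[str, str]:
    scheme, host, _port = origin_of(url)
    # Fall back to the full host when there is no registrable domain.
    return (scheme, registrable_domain(host) or host)


def same_origin(url_a: str, url_b: str) -> bool:
    return origin_of(url_a) == origin_of(url_b)


def same_site(url_a: str, url_b: str) -> bool:
    return site_of(url_a) == site_of(url_b)


# Illustrative URLs only (assumed for this example).
URLS = [
    "https://an.example.com/one",
    "https://an.example.com/two",
    "https://another.example.com/",
    "https://example.org/",
]
```

Here `same_origin(URLS[0], URLS[1])` holds, `same_site(URLS[1], URLS[2])` holds while `same_origin(URLS[1], URLS[2])` does not, and `URLS[2]` and `URLS[3]` are neither same origin nor same site.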

Depending on the security or privacy boundary being enforced, there are a number of things browsers might check:

In general, security boundaries are defined in terms of origins, not sites. For instance, the same-origin policy is the basic security policy of the web.

Privacy boundaries are typically defined in terms of sites, which is unfortunate, because the concept of site depends on the Public Suffix List, and the Public Suffix List is known to have a number of problems. But we’re probably stuck with this—for legacy reasons, whether or not a cookie is exposed to an origin depends on a schemelessly same site check, and tracking is primarily conducted with cookies.
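The difference between a same site check and a schemelessly same site check is small enough to show in a few lines. This sketch assumes sites have already been reduced to (scheme, host) tuples, where the host is the registrable domain; the `Site` alias and both function names are illustrative, and opaque origins are left out.

```python
from typing import Tuple

# A site as a (scheme, host) tuple; host is assumed to already be the
# registrable domain, computed elsewhere from the Public Suffix List.
Site = Tuple[str, str]


def same_site(a: Site, b: Site) -> bool:
    """Both the scheme and the host must match."""
    return a == b


def schemelessly_same_site(a: Site, b: Site) -> bool:
    """Only the host must match; the scheme is ignored."""
    return a[1] == b[1]


insecure: Site = ("http", "example.com")
secure: Site = ("https", "example.com")
```

In this model `insecure` and `secure` are schemelessly same site but not same site, which is why a check that ignores the scheme draws a looser boundary than one that does not.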

3. Proposal

The terms "first party" and "third party" are ambiguous and carry a lot of baggage from their usage in policy, law, and regulation, so they should not be used in specs. Given the sorts of checks browsers actually need to perform, here are a number of precise alternatives (developed in privacycg/storage-partitioning#16) that should be used instead:

An environment is same site when its top-level origin is null, or when its origin is same site with its top-level origin. An environment is cross site when it is not same site.

Instead of defining the terms first-party-site context and third-party-site context, the Storage Access API should talk about same site and cross site contexts.

An environment is same origin when its top-level origin is null, or when its origin is same origin with its top-level origin. An environment is cross origin when it is not same origin.
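The same site and same origin definitions for environments translate fairly directly into code. The sketch below is one possible model, not how any browser represents environments: `Origin` carries a precomputed `site_host` (its registrable domain, assumed to come from the Public Suffix List), and `None` stands in for a null top-level origin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    scheme: str
    host: str
    port: Optional[int]
    site_host: str  # registrable domain; assumed to be precomputed

    def same_origin(self, other: "Origin") -> bool:
        return (self.scheme, self.host, self.port) == (
            other.scheme, other.host, other.port)

    def same_site(self, other: "Origin") -> bool:
        return (self.scheme, self.site_host) == (other.scheme, other.site_host)


@dataclass(frozen=True)
class Environment:
    origin: Origin
    top_level_origin: Optional[Origin]  # None models a null top-level origin


def is_same_site(env: Environment) -> bool:
    top = env.top_level_origin
    return top is None or env.origin.same_site(top)


def is_cross_site(env: Environment) -> bool:
    return not is_same_site(env)


def is_same_origin(env: Environment) -> bool:
    top = env.top_level_origin
    return top is None or env.origin.same_origin(top)


def is_cross_origin(env: Environment) -> bool:
    return not is_same_origin(env)
```

For example, an environment whose origin is `https://cdn.example.com` under a top-level origin of `https://www.example.com` would be same site but cross origin in this model.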

An environment is strictly same site when it is same site and, if it has a parent, its parent is strictly same site. An environment is strictly cross site when it is not strictly same site.

Instead of saying "third party cookie", say "strictly cross site cookie."

An environment is strictly same origin when it is same origin and, if it has a parent, its parent is strictly same origin. An environment is strictly cross origin when it is not strictly same origin.
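The strict variants add a walk up the chain of parents. In this sketch each environment simply records whether it is same site and same origin (in the sense defined earlier in this section) as given booleans, so the code can focus on the recursion; the `Environment` class and its fields are a hypothetical model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    same_site: bool    # same site with its top-level origin, taken as given
    same_origin: bool  # same origin with its top-level origin, taken as given
    parent: Optional["Environment"] = None


def strictly_same_site(env: Environment) -> bool:
    if not env.same_site:
        return False
    return env.parent is None or strictly_same_site(env.parent)


def strictly_cross_site(env: Environment) -> bool:
    return not strictly_same_site(env)


def strictly_same_origin(env: Environment) -> bool:
    if not env.same_origin:
        return False
    return env.parent is None or strictly_same_origin(env.parent)


def strictly_cross_origin(env: Environment) -> bool:
    return not strictly_same_origin(env)


# A page embeds a cross site frame, which in turn embeds a frame from the
# top-level page's own site.
top = Environment(same_site=True, same_origin=True)
middle = Environment(same_site=False, same_origin=False, parent=top)
inner = Environment(same_site=True, same_origin=True, parent=middle)
```

Here `inner` is same site, because its origin matches the top-level origin's site, but it is strictly cross site, because its parent `middle` is cross site. That nested case is exactly what the strict definitions are meant to capture.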

Do we actually need to define strictly same origin and strictly cross origin?

Acknowledgments

Many thanks to Alice Boxhall, Anne van Kesteren, Dan Appelquist, David Singer, Eryn Wells, Jen Simmons, John Wilander, Maciej Stachowiak, and Wendy Seltzer for their helpful comments on earlier drafts.