The third-party cookie is dying, and Google is trying to create its replacement. No one should mourn the death of the cookie as we know it. For more than two decades, the third-party cookie has been the linchpin of a shadowy, seedy, multi-billion-dollar advertising-surveillance industry on the Web; phasing out tracking cookies and other persistent third-party identifiers is long overdue. However, as the foundations shift beneath the advertising industry, its biggest players are determined to land on their feet.
Google is leading the charge to replace the third-party cookie with a new suite of technologies to target ads on the Web. And some of its proposals show that it hasn’t learned the right lessons from the ongoing backlash to the surveillance business model. This post will focus on one of those proposals, Federated Learning of Cohorts (FLoC), which is perhaps the most ambitious—and potentially the most harmful.
FLoC is meant to be a new way to make your browser do the profiling that third-party trackers used to do themselves: in this case, boiling down your recent browsing activity into a behavioral label, and then sharing it with websites and advertisers. The technology will avoid the privacy risks of third-party cookies, but it will create new ones in the process. It may also exacerbate many of the worst non-privacy problems with behavioral ads, including discrimination and predatory targeting.
Google’s pitch to privacy advocates is that a world with FLoC (and other elements of the “privacy sandbox”) will be better than the world we have today, where data brokers and ad-tech giants track and profile with impunity. But that framing is based on a false premise that we have to choose between “old tracking” and “new tracking.” It’s not either-or. Instead of re-inventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads.
We stand at a fork in the road. Behind us is the era of the third-party cookie, perhaps the Web’s biggest mistake. Ahead of us are two possible futures.
In one, users get to decide what information to share with each site they choose to interact with. No one needs to worry that their past browsing will be held against them—or leveraged to manipulate them—when they next open a tab.
In the other, each user’s behavior follows them from site to site as a label, inscrutable at a glance but rich with meaning to those in the know. Their recent history, distilled into a few bits, is “democratized” and shared with dozens of nameless actors that take part in the service of each web page. Users begin every interaction with a confession: here’s what I’ve been up to this week, please treat me accordingly.
Users and advocates must reject FLoC and other misguided attempts to reinvent behavioral targeting. We implore Google to abandon FLoC and redirect its effort towards building a truly user-friendly Web.
What is FLoC?
In 2019, Google presented the Privacy Sandbox, its vision for the future of privacy on the Web. At the center of the project is a suite of cookieless protocols designed to satisfy the myriad use cases that third-party cookies currently provide to advertisers. Google took its proposals to the W3C, the standards-making body for the Web, where they have mainly been discussed in the Web Advertising Business Group, a body made up primarily of ad-tech vendors. In the intervening months, Google and other advertisers have proposed dozens of bird-themed technical standards: PIGIN, TURTLEDOVE, SPARROW, SWAN, SPURFOWL, PELICAN, PARROT… the list goes on. Seriously. Each of the “bird” proposals is designed to perform one of the functions in the targeted-advertising ecosystem that is currently done by cookies.
FLoC is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort,” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser would share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).
If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web.
Google’s proof of concept used the domains of the sites each user visited as the basis for grouping people together. It then used an algorithm called SimHash to create the groups. SimHash can be computed locally on each user’s machine, so there is no need for a central server to collect behavioral data. However, a central administrator could still have a role in enforcing privacy guarantees: in order to prevent any cohort from being too small (i.e., too identifying), Google proposes that a central actor count the number of users assigned to each cohort. If any cohorts are too small, they can be combined with other, similar cohorts until each one represents enough users.
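To make the clustering step concrete, here is a minimal TypeScript sketch of how SimHash can turn a browsing history into a short cohort label. Everything in it—the FNV-1a hash, the 8-bit output width, bare domains as input features—is an illustrative assumption, not a reproduction of Google’s implementation:

```typescript
// Minimal SimHash sketch: derive an 8-bit cohort ID from visited domains.
// The hash function (FNV-1a) and the output width are assumptions for
// illustration; Google's proof of concept is not specified at this level.

const COHORT_BITS = 8;

// 32-bit FNV-1a hash of a string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Each domain "votes" +1 or -1 on every bit position of its hash; the sign
// of each tally becomes one bit of the final cohort ID. Similar histories
// share most votes, so they tend to land in the same cohort.
function simhashCohort(domains: string[]): number {
  const tallies = new Array(COHORT_BITS).fill(0);
  for (const domain of domains) {
    const h = fnv1a(domain);
    for (let bit = 0; bit < COHORT_BITS; bit++) {
      tallies[bit] += (h >>> bit) & 1 ? 1 : -1;
    }
  }
  let cohort = 0;
  for (let bit = 0; bit < COHORT_BITS; bit++) {
    if (tallies[bit] > 0) cohort |= 1 << bit;
  }
  return cohort;
}

// Two users with mostly overlapping histories will usually collide:
console.log(simhashCohort(["news.example", "shoes.example", "travel.example"]));
console.log(simhashCohort(["news.example", "shoes.example", "recipes.example"]));
```

The thing to notice is that the whole computation is local to the browser. The privacy question is not where the label is computed, but what it reveals once the browser starts handing it out.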
For FLoC to be useful to advertisers, a user’s cohort will necessarily reveal information about their behavior.
According to the proposal, most of the specifics are still up in the air. The draft specification states that a user’s cohort ID will be available via JavaScript, but it’s unclear whether there will be any restrictions on who can access it, or whether the ID will be shared in any other ways. FLoC could perform clustering based on URLs or page content instead of domains; it could also use a federated-learning-based system (as the name FLoC implies) to generate the groups instead of SimHash. It’s also unclear exactly how many possible cohorts there will be. Google’s experiment used 8-bit cohort identifiers, meaning there were only 256 possible cohorts. In practice that number could be much higher; the documentation suggests a 16-bit cohort ID comprising 4 hexadecimal characters. The more cohorts there are, the more specific they will be; longer cohort IDs will mean that advertisers learn more about each user’s interests and have an easier time fingerprinting them.
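To illustrate how a page might read the label, here is a sketch based on the access method the draft documents describe, a promise-returning `document.interestCohort()`. Treat the shape of this API as an assumption—like the rest of the proposal, it could change, and who gets to call it is exactly the open question noted above:

```typescript
// Sketch of reading the cohort ID via the interestCohort() method from the
// FLoC draft. The API shape is assumed from the draft and may change.
interface InterestCohort {
  id: string;      // the cohort label, e.g. a few hex characters
  version: string; // which clustering configuration produced it
}

async function readCohort(): Promise<void> {
  const doc = document as Document & {
    interestCohort?: () => Promise<InterestCohort>;
  };
  if (doc.interestCohort) {
    const cohort = await doc.interestCohort();
    // Absent restrictions, any script on the page could forward this
    // label to an ad server alongside the rest of the request.
    console.log(`cohort ${cohort.id} (version ${cohort.version})`);
  }
}
```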
One thing that is specified is duration. FLoC cohorts will be re-calculated on a weekly basis, each time using data from the previous week’s browsing. This makes FLoC cohorts less useful as long-term identifiers, but it also makes them more potent measures of how users behave over time.
New privacy problems
FLoC is part of a suite intended to bring targeted ads into a privacy-preserving future. But the core design involves sharing new information with advertisers. Unsurprisingly, this also creates new privacy risks.
Fingerprinting
The first issue is fingerprinting. Browser fingerprinting is the practice of gathering many discrete pieces of information from a user’s browser to create a unique, stable identifier for that browser. EFF’s Cover Your Tracks project demonstrates how the process works: in a nutshell, the more ways your browser looks or acts differently from others’, the easier it is to fingerprint.
Google has promised that the vast majority of FLoC cohorts will comprise thousands of users each, so a cohort ID alone shouldn’t distinguish you from a few thousand other people like you. However, that still gives fingerprinters a massive head start. If a tracker starts with your FLoC cohort, it only has to distinguish your browser from a few thousand others (rather than a few hundred million). In information-theoretic terms, FLoC cohorts will contain several bits of entropy—up to 8 bits, in Google’s proof-of-concept trial. This information is even more potent given that it is unlikely to be correlated with other information the browser exposes. This will make it much easier for trackers to put together a unique fingerprint for FLoC users.
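A rough back-of-the-envelope calculation shows why. The population and cohort sizes below are assumptions chosen to match the “few thousand” and “few hundred million” figures above, not numbers from Google:

```typescript
// Back-of-the-envelope entropy arithmetic. The user counts are assumptions
// for illustration, not figures from Google's proposal.

// Bits revealed by a label = log2(population / anonymity-set size).
function entropyBits(population: number, anonymitySet: number): number {
  return Math.log2(population / anonymitySet);
}

const browsers = 300_000_000; // assumed "few hundred million" users
const cohortSize = 3_000;     // assumed "few thousand" users per cohort

// Knowing the cohort narrows 300 million browsers down to ~3,000,
// i.e. it hands a fingerprinter about 16.6 bits for free. (An 8-bit
// cohort ID, as in the proof of concept, caps this at 8 bits.)
console.log(entropyBits(browsers, cohortSize).toFixed(1), "bits from FLoC");

// Only ~11.6 more bits—from fonts, canvas, screen size, and so on—are
// needed to single out one browser among the remaining 3,000.
console.log(Math.log2(cohortSize).toFixed(1), "bits left to find");
```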
Google has acknowledged this as a challenge, but has pledged to solve it as part of its broader “Privacy Budget” plan for dealing with fingerprinting in the long term. Solving fingerprinting is an admirable goal, and the Privacy Budget is a promising avenue to pursue. But according to the FAQ, that plan is “an early stage proposal and does not yet have a browser implementation.” Meanwhile, Google is set to begin testing FLoC as early as this month.
Fingerprinting is notoriously difficult to stop. Browsers like Safari and Tor Browser have engaged in years-long wars of attrition against trackers, sacrificing large swaths of their own feature sets in order to reduce fingerprinting attack surfaces. Fingerprinting mitigation generally involves trimming away or restricting unnecessary sources of entropy—which is exactly what FLoC is. Google should not create new fingerprinting risks until it has figured out how to deal with existing ones.
Cross-context exposure
The second problem is…