
We don’t need no stinkin’ 3rd party cookies!

By Eric Picard (Originally published on AdExchanger.com)

I’ve been writing about some of the ethical issues with “opt-out” third-party tracking for a long time. It’s a practice that makes me extremely uncomfortable, which is not where I started out. You can read my opus on this topic here.

In this article, I want to go into detail about why third-party cookies aren’t needed by the ecosystem, and why doing away with them as a default setting is both acceptable and not nearly as harmful as many are claiming.

 

First order of business: What is a third-party cookie?

When a user visits a web page, they load a variety of content. Some of this content comes from the domain they’re visiting. (For simplicity’s sake, let’s call it Publisher.com.) Some comes from third parties that load content onto Publisher.com’s web site. (Let’s call it ContentPartner.com.) For example, you might visit a site about cooking, and the Food Network could provide some of the pictures or recipes that the publisher embeds in the page. Those pictures and recipes sit on servers controlled by the content partner and point to that partner’s domain.

When content providers deliver content to a browser, they have the opportunity to set a cookie. When you’re visiting Publisher.com’s page, it can set a first-party cookie because you’re visiting its web domain. In our example above, ContentPartner.com is also delivering content to your browser from within Publisher.com’s page, so the kind of cookie it can deliver is a third-party cookie. There are many legitimate reasons why both parties would drop a cookie on your browser.
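To make the distinction concrete, here’s a minimal sketch (in Python) of the classification a browser performs; the domain names echo the example above, and the shortcut of treating the last two hostname labels as the registrable domain is mine. Real browsers consult the Public Suffix List.

```python
# Minimal sketch: deciding whether a cookie is first- or third-party.
# Assumption: the registrable domain is just the last two hostname labels;
# real browsers use the Public Suffix List to get this right.

def registrable_domain(host: str) -> str:
    """Naive 'site' for a hostname, e.g. 'www.publisher.com' -> 'publisher.com'."""
    return ".".join(host.lower().split(".")[-2:])

def is_third_party(page_host: str, cookie_setting_host: str) -> bool:
    """A cookie is third-party when the server setting it does not share the
    registrable domain of the page the user is actually visiting."""
    return registrable_domain(page_host) != registrable_domain(cookie_setting_host)

# The page at Publisher.com pulls content from several servers:
print(is_third_party("www.publisher.com", "www.publisher.com"))          # False: first-party
print(is_third_party("www.publisher.com", "images.contentpartner.com"))  # True: third-party
print(is_third_party("www.publisher.com", "pixel.trackingvendor1.com"))  # True: third-party
```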

If this ended there, we probably wouldn’t have a problem. But this construct, which allows content from multiple domains to be assembled into one web page and which was really a matter of convenience when the web was first created, is the same mechanism the ad industry uses to drop tracking pixels and ads onto publishers’ web pages.

For example, you might visit Publisher.com and see an ad delivered by AdServer.com. And on every page of that site, you might load tracking pixels delivered by TrackingVendor1.com, TrackingVendor2.com, etc. In this case, only Publisher.com can set a first-party cookie. All the other vendors are setting third-party cookies.

There are many uses for third-party cookies that most people would have no issue with, but some uses of third-party cookies have privacy advocates up in arms. I’ll wave an ugly stick at this issue and just summarize it by saying: Companies that have no direct relationship with the user are tracking that user’s behavior across the entire web, creating profiles on him or her, and profiting off of that user’s behavior without his or her permission.

This column isn’t about whether that issue is ethical or acceptable, because allowing third-party cookies to be active by default is done at the whim of the browser companies. I’ve predicted for about five years that the trend would head toward all browsers blocking them by default. So far Safari (Apple’s browser) doesn’t allow third-party cookies by default, and Mozilla has announced that Firefox will block them by default in its next version.

Why I don’t think blocking third-party cookies is a problem

There are many scenarios where publishers legitimately need to deliver content from multiple domains. Sometimes several publishers are owned by one company, and they share central resources across those publishers, such as web analytics, ad serving, and content distribution networks (like Akamai). It has been standard practice in many of these cases for publishers to map their vendors against their domain, which by the way allows them to set first-party cookies as well.

How do they accomplish this? They set a ‘subdomain’ that is mapped to the third party’s servers. Here’s an example:

Publisher.com wants to use a web analytics provider but set cookies from its own domain. It creates a subdomain called WebAnalytics.Publisher.com using its Domain Name Server, or DNS. (I won’t get too technical, but DNS is the way the Internet maps domain names to IP addresses, the numeric identifiers of servers.) It’s honestly as simple as one of the publisher’s IT people opening up a web page that manages their DNS, creating a subdomain name, and pointing it at the vendor’s servers, either at a specific IP address or at the vendor’s host name. And that’s it.

This allows the third-party vendor to place first-party cookies onto the browser of the user visiting Publisher.com. This is a standard practice that is broadly used across the industry, and it’s critically important to the way that the Internet functions. There are many reasons vendors use subdomains, not just to set first-party cookies. For instance, this is standard practice in the web analytics space (except for Google Analytics) and for content delivery networks (CDNs).
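To illustrate why that mapped subdomain matters, here is a hedged sketch of the Set-Cookie header an analytics vendor could return once it answers for WebAnalytics.Publisher.com: because the response comes from a host under publisher.com, the cookie it scopes to that domain is first-party. The cookie name, value, and lifetime are hypothetical, not any vendor’s actual behavior.

```python
# Sketch: the analytics vendor's server, reachable at a publisher-owned
# subdomain, sets a cookie scoped to the publisher's registrable domain.
# Cookie name, value, and lifetime are illustrative, not a real vendor API.
from http import cookies

def analytics_response_headers(visitor_id: str) -> list[tuple[str, str]]:
    jar = cookies.SimpleCookie()
    jar["pub_analytics_id"] = visitor_id
    jar["pub_analytics_id"]["domain"] = ".publisher.com"      # first-party scope
    jar["pub_analytics_id"]["path"] = "/"
    jar["pub_analytics_id"]["max-age"] = 60 * 60 * 24 * 365   # one year
    return [("Set-Cookie", morsel.OutputString()) for morsel in jar.values()]

# A response from webanalytics.publisher.com (CNAME'd to the vendor) might carry:
for name, value in analytics_response_headers("a1b2c3d4"):
    print(f"{name}: {value}")
```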

So why doesn’t everybody just map subdomains and set first-party cookies?

First, let me say that while it is fairly easy for the publisher’s IT department to map a subdomain, it would be impractical for a demand-side platform (DSP) or other buy-side vendor to go out and have every existing publisher map a subdomain for them. Those focused on first-party data on the advertiser side will still have access to that data in this world. But for broader data sets, they’ll be picking up their targeting data via the exchange, pushed through by the publisher on the impression itself. For data management platforms (DMPs), given that this is their primary business, it is reasonable to map subdomains for each publisher and advertiser they work with.

Also, the thing that vendors like about third-party cookies is that by default they work across domains. That means a data company could set pixels on every publisher’s web site it could convince to place them, and then automatically track one cookie across every site the user visited. Switching to first-party cookies breaks that broad set of actions across multiple publishers into pockets of activity at the individual publisher level. There is no cheap, convenient way to map one user’s activity across multiple publishers. Only those companies that have a vested interest, the DMPs, will make that investment, and that will keep smaller vendors who can’t make the investment from participating.

But, is that so bad?

So does moving to first-party cookies break the online advertising industry?

Nope. But it does complicate things. Let me tell you about a broadly used practice in our industry – one that every single data company uses on a regular basis. It’s a practice that gets very little attention today but is pretty ubiquitous. It’s called cookie mapping.

Here’s how it works: Let’s say one vendor has its unique anonymous cookies tracking user behavior and creating big profiles of activity, and it wants to use that data on a different vendor’s servers. In order for this to work, the two vendors need to map together their profiles, finding unique users (anonymously) who are the same user across multiple databases. How this is done is extremely technical, and I’m not going to mangle it by trying to simplify the process. But at a very high level, it’s something like this:

Media Buyer wants to use targeting data on an exchange using a DSP. The DSP enables the buyer to access data from multiple data vendors. The DSP has its own cookies that it sets (today these are third-party cookies) on users when it runs ads. The DSP and the data vendor work with a specialist vendor to map together the DSP’s cookies and the data vendor’s cookies. These cookie mapping specialists (Experian and Acxiom are examples, but others provide this service as well) use a complex set of mechanisms to map together overlapping cookies between the two vendors. They also have privacy auditing processes in place to ensure that this is done in an ethical and safe way, so that none of the vendors gets access to personally identifiable data.
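At the risk of simplifying the part I won’t detail, here is a hedged sketch of the basic bookkeeping behind a cookie map: when one browser is observed carrying both vendors’ anonymous cookies (typically via a sync pixel), the pair of IDs is recorded, and the DSP can later translate its own ID into the data vendor’s ID at bid time. The function and table names are hypothetical.

```python
# Sketch of a cookie map between a DSP and a data vendor.
# IDs are anonymous, randomly assigned cookie values; names are illustrative.

dsp_to_vendor: dict[str, str] = {}

def record_sync(dsp_cookie_id: str, vendor_cookie_id: str) -> None:
    """Called when one browser is observed with both cookies, e.g. when the
    DSP's sync pixel redirects to the data vendor carrying the DSP's ID."""
    dsp_to_vendor[dsp_cookie_id] = vendor_cookie_id

def lookup_segments(dsp_cookie_id: str, vendor_profiles: dict[str, set[str]]) -> set[str]:
    """At bid time, translate the DSP's cookie into the vendor's cookie and
    pull whatever segments the vendor has attached to that anonymous profile."""
    vendor_id = dsp_to_vendor.get(dsp_cookie_id)
    return vendor_profiles.get(vendor_id, set()) if vendor_id else set()

# Example: one browser seen by both systems under different anonymous IDs.
record_sync(dsp_cookie_id="dsp-123", vendor_cookie_id="dv-987")
profiles = {"dv-987": {"auto intender", "frequent traveler"}}
print(lookup_segments("dsp-123", profiles))  # {'auto intender', 'frequent traveler'} (order may vary)
```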

Note that this same process is used between advertisers and publishers and their own DMPs so that first-party data from CRM and user registration databases can be mapped to behavioral and other kinds of data.

The trend for data companies in the last few years has been to move into DMP mode, working directly with the advertisers and publishers rather than trying to survive as third-party data providers. This move was intelligent – almost prescient of the change that is happening in the browser space right now.

My short take on this evolution

I feel that this change is both inevitable and positive. It puts more power back in the hands of publishers; it solidifies their value proposition as having a direct relationship with the consumer, and it will drive a lot more publisher investment in data management platforms and other big data technology. The last few years have seen a data asymmetry problem arise in which the buyers had more data available to them than the publishers did, and the publishers had no insight into the value of their own audience. They didn’t understand why the buyer was buying their audience. This will fall back into equilibrium in this new world.

Long tail publishers will need to rely on their existing vendors to ensure they can easily map a first-party cookie to a data pool; these solutions need to be built by companies that cater to long tail publishers, such as ad networks. The networks will need to work with their own DMP and data solutions to ensure that they’re mapping domains together on behalf of their long tail publishers and pushing that targeting data with the inventory into the exchanges. The other option for longer tail publishers is to license their content to larger publishers who can aggregate it into their sites. It will require some work, which also means ambiguity and stress. But certainly this is not insurmountable.

I also will say that first-party tracking is both ethical and justifiable. Third-party tracking without the user’s permission is an ethically challenging issue, and I’d argue that it’s not in the best interest of our industry to try to perpetuate it, especially since there are viable and acceptable alternatives.

That doesn’t mean switching off of third-party cookies is free or easy. But in my opinion, it’s the right way to do this for long-term viability.

What everyone should know about ad serving

By Eric Picard (Originally published in iMediaConnection.com)

Publisher-side ad servers such as DoubleClick for Publishers, Open AdStream, FreeWheel, and others are the most critical components of the ad industry. They’re ultimately responsible for coordinating all the revenue collected by the publisher, and they do an amazing amount of work.

Many people in the industry — especially on the business side of the industry — look at their ad server as mission critical, sort of in the way they look at the electricity provided by their power utility. Critical — but only in that it delivers ads. To ad operations or salespeople, the ad server is most often associated with how they use the user interface — really the workflow they interact with directly. But this is an oversight on their part.

The way that the ad server operates under the surface is actually something everyone in the industry should understand. Only by understanding some of the details of how these systems function can good business decisions be made.

Ad delivery

Ad servers by nature make use of several real-time systems, the most critical being ad delivery. But ad delivery is not a name that adequately describes what those systems do. An ad delivery system is really a decision engine. It examines an ad impression at the exact moment it is created (by a user visiting a page), reviews all the information about that impression, and makes the decision about which ad it should deliver. But the real question is this: How does that decision get made?

An impression could be thought of as a molecule made up of atoms. Each atom is an attribute that describes something about that impression. These atomic attributes can be simple media attributes, such as the page location that the ad is embedded into, the category of content that the page sits within, or the dimensions of the creative. They can be audience attributes such as demographic information taken from the user’s registration data or a third-party data company. They can be complex audience segments provided by a DMP, such as “soccer mom” (which is in itself almost a molecular object made up of the attributes of female, parent, and children in sports), and of course various other demographic and psychographic atomic attributes.
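One way to picture that molecule is as a flat bag of attribute key/value pairs handed to the decision engine at the moment the impression is born. This tiny sketch uses invented attribute names, not any ad server’s real schema.

```python
# Sketch: an impression as a bag of atomic attributes visible at decision time.
# Keys and values are illustrative only.
impression = {
    "placement": "homepage_leaderboard",   # media attribute: page location of the ad
    "content_category": "cooking",         # media attribute: category of the page
    "ad_size": "728x90",                   # media attribute: creative dimensions
    "gender": "female",                    # audience attribute from registration data
    "is_parent": True,                     # audience attribute from a data provider
    "segments": {"soccer mom"},            # composite segment supplied by a DMP
}
```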

When taken all together, these attributes define all the possible interpretations of that impression. The delivery engine now must decide (all within a few milliseconds) how to allocate that impression against available line items. This real-time inventory allocation is the most critical moment in the life of an impression. Most people in our industry have no understanding of what happens in that moment, which has led to many uninformed business, partnership, and vendor licensing decisions over the years, especially when it comes to operations, inventory management, and yield.

Real-time inventory allocation decides which line items will be matched against an impression. The way these decisions get made reflects the relative importance placed on them by the engineers who wrote the allocation rules. These, of course, are informed by business people who are responsible for yield and revenue, but the reality is that the tuning of allocation against a specific publisher’s needs is not possible in a large shared system. So the rules get tuned as best they can to match the overarching case that most customers face.

Inventory prediction

Well before an impression is generated and has to be allocated to line items in real time, inventory is sold in advance based on predictions of how much volume will exist in the future. We call these predicted impressions “avails” (for “available to sell”) in our industry, and they’re essentially the basis for how all guaranteed impressions are sold.

We’ll get back to the real-time allocation in a moment, but first let’s talk a bit about avails. The avails calculation done by another component of the ad server, responsible for inventory prediction, is one of the hardest computer science problems facing the industry today. Predicting how much inventory will exist is hard — and extremely complicated.

Imagine, if you will, that you’ve been asked to predict a different kind of problem than ad serving, perhaps traffic patterns on a state highway system. As you might imagine, predicting how many cars will be on the entire highway next month is probably not very hard to do with a pretty high degree of accuracy. There’s historical data going back years, month by month. So you could take a look at the month of April for the last five years, see if there’s any significant variance, and use a bit of somewhat sophisticated math to determine a confidence interval for how many cars will be on the highway in the month of April 2013.

But imagine that you now wanted to zoom into a specific location — let’s say the Golden Gate Bridge. And you wanted to break that prediction down further, let’s say Wednesday, April 3. And let’s say that we wanted to predict not only how many cars would be on the bridge that day, but how many cars with only one passenger. And further, we wanted to know how many of those cars were red and driven by women. And of those red, female-driven cars, how many of them are convertible sports cars? Between 2 and 3 p.m.

Even if you could get some kind of idea of how many matches you’ve had in the past, predicting at this level of granularity is very hard. Never mind that there are many outside factors that could affect this: some short-term factors, such as weather and sporting events, can help you get more accurate as you get closer in time to the event, while others, such as car accidents and earthquakes, are far less predictable.

This is essentially the same kind of problem as the avails prediction we face in the online advertising industry. Each time we layer one more bit of data (some defining attribute) onto our inventory definition, we make it harder and harder to predict with any accuracy how many of those impressions will exist. And because we’ve signed up for a guarantee that this inventory will exist, the engineers creating the algorithms that predict how much inventory will exist need to be very conservative in their estimates.
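A hedged sketch of what “conservative” can mean in practice: instead of quoting the average of the historical impression counts for a narrowly defined slice, the predictor might quote a low percentile, so the guaranteed volume is one the slice almost always delivers. The numbers and the percentile choice below are illustrative only.

```python
# Sketch: a deliberately conservative avails estimate for one narrow slice of
# inventory (e.g. "728x90, cooking pages, female, parent") from historical
# daily impression counts. Real systems are far more sophisticated.
from statistics import quantiles

def conservative_avails(daily_history: list[int], percentile: float = 0.10) -> int:
    """Quote a low percentile of history instead of the mean, so the
    guaranteed volume is one this slice almost always delivers."""
    cut_points = quantiles(daily_history, n=100)   # 1st..99th percentiles
    return int(cut_points[int(percentile * 100) - 1])

history = [11800, 12400, 9900, 13100, 12750, 10200, 12900, 11500, 12200, 10800]
print("average of history:       ", sum(history) // len(history))
print("conservative daily avails:", conservative_avails(history))
```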

When an ad campaign is booked by an account manager at the publisher, they “pull avails” based on their read of the RFP and media plan and try to find matching inventory. These avails are then reserved in the system (the system puts a hold, for a period of time, on the avails being sent back to the buyer) until the insertion order (I/O) is signed by the buyer. At this moment, a preliminary allocation of predicted avails (impressions that don’t exist yet) is made by a reservation system, which divvies out the avails among the various I/Os. This is another kind of allocation that the ad server does in advance of the campaign actually running live, and it has as much impact as (or even more than) the real-time allocation on overall yield.
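A minimal sketch of that hold-and-release bookkeeping, assuming a simple expiry window on unsigned holds; the class, field names, and two-week window are invented for illustration, not any ad server’s API.

```python
# Sketch: reserving predicted avails against an insertion order before it is
# signed. Names and the expiry window are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class AvailsPool:
    predicted: int                                    # predicted impressions for this slice
    holds: dict[str, tuple[int, datetime]] = field(default_factory=dict)

    def remaining(self, now: datetime) -> int:
        return self.predicted - sum(qty for qty, expires in self.holds.values()
                                    if expires > now)

    def place_hold(self, io_id: str, qty: int, now: datetime,
                   ttl: timedelta = timedelta(days=14)) -> bool:
        """Hold avails for a proposal; the hold lapses if the I/O isn't signed."""
        if qty > self.remaining(now):
            return False
        self.holds[io_id] = (qty, now + ttl)
        return True

pool = AvailsPool(predicted=1_000_000)
now = datetime.now()
print(pool.place_hold("IO-2013-042", 600_000, now))   # True: hold placed
print(pool.place_hold("IO-2013-043", 600_000, now))   # False: not enough avails left
```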

How real-time allocation decisions get made

Once a contract has been signed to guarantee that these impressions will in fact be delivered, it’s up to the delivery engine’s allocation system to decide which of the matching impressions to assign to which line items. The primary criterion used to make this decision is how far behind each matching line item is in delivering against its contract, which we call “starvation” (i.e., is the line item starving to death, or is it on track to fulfill its obligated impression volume?).

Because the engineers who wrote the avails prediction algorithms were conservative, the system generally has a lot of wiggle room when it comes to delivering against most line items that are not too complex. That means there are usually more impressions available at allocation time than were predicted ahead of time. So when none of the matching line items is starving, other decision criteria can be used. The clearest one is yield (i.e., of the line items available to allocate, which one will get me the most money for this impression?).
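Here’s a hedged sketch of that two-step decision, with a simple delivered-versus-expected pacing ratio standing in for starvation; the field names, the pacing rule, and the example numbers are invented for illustration.

```python
# Sketch: choosing a line item for one impression. Starving line items
# (behind their delivery goal) win first; otherwise the highest-paying
# eligible line item wins. Names and the pacing rule are illustrative.
from dataclasses import dataclass

@dataclass
class LineItem:
    name: str
    cpm: float               # what the line pays per 1,000 impressions
    goal: int                 # contracted impressions
    delivered: int            # impressions served so far
    elapsed_fraction: float   # how far through the flight we are (0..1)

    def pacing(self) -> float:
        """<1.0 means behind schedule (starving), >1.0 means ahead."""
        expected = max(self.goal * self.elapsed_fraction, 1)
        return self.delivered / expected

def choose(eligible: list[LineItem]) -> LineItem:
    starving = [li for li in eligible if li.pacing() < 1.0]
    if starving:
        return min(starving, key=LineItem.pacing)   # most behind goes first
    return max(eligible, key=lambda li: li.cpm)     # otherwise, highest yield

lines = [
    LineItem("soccer-mom sponsorship", cpm=12.0, goal=500_000, delivered=180_000, elapsed_fraction=0.5),
    LineItem("run-of-site branding", cpm=4.0, goal=2_000_000, delivered=1_100_000, elapsed_fraction=0.5),
]
print(choose(lines).name)   # the sponsorship is behind pace, so it wins
```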

Implications of real-time allocation and inventory prediction

There’s a tendency in our industry to think about ad inventory as if it “exists” ahead of time, but as we’ve just seen, an impression is ephemeral. It exists only for a few milliseconds in the brain of a computer that decides what ad to send to the user’s machine. Generally there are many ways that each impression could be fulfilled, and the systems involved have to make millions or billions of decisions every hour.

We tend to think about inventory in terms of premium and remnant, or through a variety of other lenses. But the reality is that before the inventory is sold or unsold, premium or remnant, or anything else, it gets run through this initial mechanism. In many cases, inventory that is extremely valuable gets allocated to very low-CPM impression opportunities, or even to remnant, because of factors having little to do with what that impression “is.”

There are many vendors in the space, but let’s chat for a moment about two groups of vendors: supply-side platforms (SSPs) and yield management companies.

Yield management firms focus on providing ways for publishers to increase yield on inventory (get more money from the same impressions), and most have different strategies. The two primary companies folks talk to me about these days are Yieldex and Maxifier. Yieldex focuses on the pre-allocation problem: the avails reservations done by account managers, as well as the inventory prediction problem. Yieldex also provides a lot of analytics capabilities and is going to factor significantly in the programmatic premium space as well. Maxifier focuses on the real-time allocation problem: it finds matches between avails and line items that drive yield up, and it improves matches on other performance metrics like click-through and conversions, as well as any other KPI the publisher tracks, such as viewability or even engagement. Maxifier does this while ensuring that campaigns deliver, since premium campaigns are paid on delivery but measured in many cases on performance. The company is also going to figure heavily into the programmatic premium space, but in a totally different way than Yieldex. In other words, neither company really competes with the other.

Google’s recent release of its dynamic allocation features for the ad exchange (sort of the evolution of the Admeld technology) also plays heavily into real-time allocation and yield decisions. Specifically, the company can compare every impression’s yield opportunity between guaranteed (premium) line items and the response from the DoubleClick Exchange (AdX) to determine on a per-impression basis which will pay the publisher more money. This is very close to what Maxifier does, but Maxifier does this across all SSPs and exchanges involved in the process. Publishers I’ve talked to using all of these technologies have gushed to me about the improvements they’ve seen.
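Reduced to a sketch, that per-impression comparison looks something like the following; the pacing guard is my assumption about how such systems protect guaranteed delivery (in line with the Maxifier point above), and the names and numbers are purely illustrative.

```python
# Sketch: per-impression arbitration between a guaranteed line item and the
# best exchange bid, each expressed as an effective CPM. The pacing guard is
# an assumption about how under-delivery is avoided; values are illustrative.
def pick_source(guaranteed_ecpm: float, exchange_bid_ecpm: float,
                guaranteed_on_pace: bool) -> str:
    # A guaranteed line that is behind its delivery goal keeps the impression
    # regardless of price, because under-delivery costs more than the spread.
    if not guaranteed_on_pace:
        return "guaranteed"
    return "exchange" if exchange_bid_ecpm > guaranteed_ecpm else "guaranteed"

print(pick_source(6.50, 9.75, guaranteed_on_pace=True))    # exchange wins on price
print(pick_source(6.50, 9.75, guaranteed_on_pace=False))   # guarantee must deliver
```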

SSPs are another animal altogether. While the yield vendors above are focused on increasing the value of premium inventory and/or maximizing yield between premium and exchange inventory (I think of this as pushing information into the ad server to increase value), the SSPs are given remnant inventory to optimize for yield among all the various venues for clearing remnant inventory. By forcing competition among ad networks, exchanges, and other vehicles, they can drive the price up on remnant inventory.

How to apply this article to your business decisions

I’ve had dozens of conversations with publishers about yield, programmatic premium, SSPs, and other vendors. The most important takeaway I can leave you with is that you should think about premium yield optimization as a totally different track than discussions about remnant inventory.

When it comes to remnant inventory, whoever gets the first “look” at the inventory is likely to provide the highest increase in yield. So when testing remnant options, you have to ensure that you’re testing each one in exactly the same way, never stacked beneath one another. Most SSPs and exchanges ultimately provide the same exact demand through slightly different lenses. This means that, barring some radical technical superiority (which none have shown me so far), the decision most likely will come down to ease of integration and, ultimately, customer service.