By Eric Picard (Originally published in iMediaConnection, April 11th, 2013)
Targeting data is ubiquitous in online advertising and has become something close to a “currency” in how we think about advertising. And I mean currency in the same way that we think about Nielsen ratings in TV or impression counts in digital display. We pay for inventory today in many cases based on a combination of the publisher, the content associated with the impression, and the data associated with a variety of elements: the IP address of the computer (lots of derived data comes from this), the context of the page, various content categories and quality metrics, and, of course, behavioral and other user-based targeting attributes.
But for all the vetting done by buyers of base media attributes, such as the publisher or the page or quality scores, there’s still very little understanding of where targeting data comes from. And even less when it comes to understanding how it should be valued and why. So this article is about just that topic: how targeting data is derived and how you should think about it from a value perspective.
Let’s get the basic stuff out of the way: anything derived from the IP address and user agent. When a browser visits a web page, it sends a bunch of data to the servers that it accesses. The two key attributes are the IP address and the user agent. The IP address is the simple one; it’s the address assigned to the user’s connection by their internet service provider so that the various servers the computer touches can send traffic back to it. An immense amount of information can be inferred from it, the key piece being the user’s geography.
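To make that concrete, here’s a minimal sketch (Python standard library only) of what any web server sees on every request; the port and the logging are placeholders, and real ad servers obviously do far more with these two fields.

```python
# Minimal WSGI app showing the two request attributes discussed above:
# the client IP address and the User-Agent string.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    ip = environ.get("REMOTE_ADDR", "")               # client IP as seen by the server
    user_agent = environ.get("HTTP_USER_AGENT", "")   # browser (or bot) identifier
    print(f"request from {ip} using {user_agent}")    # placeholder: log it somewhere real
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```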
There are lots of techniques used here, with varying degrees of “granularity.” But we’ll just leave it at the idea that companies have amassed lists of IP addresses assigned to specific geographic locations. It’s pretty accurate in most cases, but there are still scenarios where people are connected to the internet via private networks (such as a corporate VPN) that confuse the world by assigning users IP addresses in one location when they are actually in another. This was the classic problem with IP-address-based geography back in the days of dial-up, when most users showed up as residents of Reston, Va. (where AOL had its data centers). Today, with most users on broadband, the mapping is much more accurate and comprehensive.
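Mechanically, those amassed lists boil down to ranges of IP addresses mapped to places, looked up per request. Here’s a toy sketch of that idea; the two ranges and locations are illustrative placeholders, whereas commercial geo databases hold millions of ranges and far finer detail.

```python
import bisect
import ipaddress

# Hypothetical, hand-made range table: (start, end, location), sorted by start.
RANGES = [
    (int(ipaddress.ip_address("8.8.8.0")),   int(ipaddress.ip_address("8.8.8.255")),   "US"),
    (int(ipaddress.ip_address("81.2.69.0")), int(ipaddress.ip_address("81.2.69.255")), "GB/London"),
]
STARTS = [r[0] for r in RANGES]

def geo_lookup(ip: str) -> str:
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1      # last range starting at or before this IP
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return "unknown"

print(geo_lookup("81.2.69.160"))  # -> "GB/London"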
As important as geography itself are the various mappings that are done against location. Claritas PRIZM and other derived-data products use geography to map a variety of attributes to the user browsing the page. These techniques have moved out of traditional media (especially direct-response mailing lists) into digital and are quite useful. The only issue is that the further down the chain of assumptions used to derive attributes, the more muddled things become. Statistically the data is still relevant, but on a per-user basis it is potentially completely inaccurate. That shouldn’t stop you from using this information, nor should you devalue it; just be clear that there’s a margin of error here.
The user agent is an identifier for the browser itself, which can be used to target users of specific browsers but also to identify non-browser activity that chooses to identify itself. For instance, various web crawlers such as search engines identify themselves to the server delivering a web page, and ad servers know not to count those ad impressions as human. This assumes good behavior on the part of the programmers, and sometimes “real” user agents are spoofed when the intent is to create fake impressions. A malicious ad network or other bad actor will sometimes do this to create fake traffic and drive revenue.
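The filtering itself can be as naive as a substring check against known crawler names, along these lines; the token list is a placeholder, and, as noted above, a spoofed user agent sails right past this kind of check.

```python
# Illustrative only: flag self-identifying crawlers so their ad impressions
# are not counted as human traffic.
BOT_TOKENS = ("googlebot", "bingbot", "slurp", "crawler", "spider", "bot")

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # True
print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/112"))  # False
```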
Crawled data
There’s a whole class of data that’s derived by sending a robot to a web page, crawling through the content on the page, and classifying that content based on all sorts of analysis. This mechanism is how Google, Bing, and other search engines classify the web. Contextual targeting systems like AdSense classify web pages into keywords that can be matched by ad sales systems. And quality companies, like Trust Metrics and others, scan pages and use hundreds or thousands of criteria to score and rank the page: everything from ensuring that the page doesn’t contain porn or hate speech to analyzing the amount of white space around images and ads and the number of ads on a page.
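In the smallest possible terms, a crawler fetches the page, strips the markup, and buckets the text by keyword. The sketch below does exactly that with the Python standard library; the URL, category names, and vocabularies are placeholders, and production classifiers use vastly richer models than raw keyword counts.

```python
# A toy crawler/classifier: fetch a page, extract its text, count category keywords.
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen
import re

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

CATEGORIES = {"finance": {"stock", "market", "earnings"},
              "autos":   {"car", "engine", "dealer"}}

def classify(url: str) -> Counter:
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    scores = Counter()
    for category, vocab in CATEGORIES.items():
        scores[category] = sum(1 for w in words if w in vocab)
    return scores

print(classify("https://example.com"))
```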
User targeting
Beyond the basics of browser, IP, and page content, the world is much less simple. Rather than diving into methodologies and trying to simplify a complex problem, I’ll simply list and summarize the options here:
Registration data: Publishers used to require registration in order to access their content and, in that process, request a bunch of data such as address, demographics, psychographics, and interests. This process fell out of favor for many publishers over the years, but it’s coming back hard. Many folks in our industry are cynical about registration data, using their own experiences and feelings to discount the validity of user registration data. But in reality, this data is highly accurate; even for large portals, it is often higher than 70 percent accurate, and for news sites and smaller publishers, it’s much more accurate.
Interestingly, the use of co-registration through Facebook, Twitter, LinkedIn, and others is making this data much more accurate. One of the most valuable things about registration data is that it creates a permanent link between a user and the publisher that lives beyond the cookie. Subsequently captured data from various sessions is extremely accurate even if the user fudged his or her registration information.
First-party behavioral data: Publishers and advertisers have a great advantage over third parties in that they have a direct relationship with the user. This gives them incredible opportunities to create deeply refined targeting segments based on interest, behavior, and especially on custom-created content such as showcases, contests, and other registration-driven experiences. Once a publisher or advertiser creates a profile of a user, it has the means to track and store very rich targeting data, much richer in theory than a third party could easily create. For instance, you might imagine that Yahoo Finance benefits greatly from registered users who track their stock portfolios via the site. Similarly, users searching for autos, travel, and other vertical-specific information create immense targeting value.
Publishers curbed their internal targeting efforts years ago after finding that third-party data companies were buying targeted campaigns on their sites, and that their high-cost, high-value targeting data was leaking away to those third parties in the process. But the world has shifted again, and both publishers and advertisers are benefiting greatly from the data management platforms (DMPs) that are now common on the market. The move toward first-party cookies as the standard for data collection is further strengthening publishers’ positions. Targeted content categories and contests are another area where publishers and advertisers have a huge advantage over third parties.
Creating custom content or contests with the intent of deriving audience data that is extremely vertical or particularly valuable is easy when you have a direct relationship with the user. You might imagine that Amazon has a huge lead in the market when it comes to valuing users by vertical product interest. Similarly, big publishers can segment users into buckets based on their interest in numerous topics, and those buckets can be used to extrapolate value.
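Stripped to its bones, a first-party profile of this kind is just page-view (or registration) events keyed to a known user and rolled up into interest segments. Here is a toy sketch of that idea; the user ID, section names, and the three-view threshold are all placeholders rather than any publisher’s actual logic.

```python
# A toy first-party profile store: page views from a logged-in user are rolled
# up into interest counts that can later become targeting segments.
from collections import defaultdict

profiles = defaultdict(lambda: defaultdict(int))  # user_id -> {interest: view_count}

def record_page_view(user_id: str, section: str) -> None:
    profiles[user_id][section] += 1

def segments(user_id: str, min_views: int = 3) -> list:
    # A user joins a segment once they have shown repeated interest.
    return [s for s, n in profiles[user_id].items() if n >= min_views]

for section in ["finance", "finance", "finance", "autos"]:
    record_page_view("user-123", section)

print(segments("user-123"))  # -> ['finance']
```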
Third-party data: There are many methods used to track and value users based on third-party cookies (those pesky cookies set by companies that generally don’t have a direct relationship with the user and that track them across websites). Luckily there are lots of articles out there (including many I’ve written) on how this works. But to quickly summarize: Third-party data companies generally rely on third-party cookies that are set on numerous websites across the internet via tracking pixels. These pixels are literally just a 1×1-pixel image (sometimes called a “clear pixel”), or even a simple no-image JavaScript call to the third-party server, that allows the third party to set and/or read a cookie on the user’s browser. These cookies are extremely useful to data companies in tracking users because the same cookie can be accessed on any website, on any domain, across sessions, and sometimes across years.
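The server side of that pixel is surprisingly small. The sketch below shows the general shape: return a 1×1 transparent GIF and set (or read back) the tracking cookie on the way out. The cookie name, the one-year lifetime, and the port are placeholders, not any particular vendor’s implementation, and the cross-site behavior only arises when this runs on a domain other than the publisher’s.

```python
# A sketch of the "tracking pixel" mechanism: serve a 1x1 GIF and set/read a cookie.
import base64
import uuid
from http.cookies import SimpleCookie
from wsgiref.simple_server import make_server

# The smallest transparent GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

def pixel_app(environ, start_response):
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    # Reuse the existing ID if the browser already has our cookie; otherwise mint one.
    uid = cookies["uid"].value if "uid" in cookies else str(uuid.uuid4())
    headers = [
        ("Content-Type", "image/gif"),
        # Served from a third-party domain, this Set-Cookie is what lets a data
        # company recognize the same browser across many publishers' sites.
        ("Set-Cookie", f"uid={uid}; Max-Age=31536000; Path=/"),
    ]
    start_response("200 OK", headers)
    return [PIXEL]

if __name__ == "__main__":
    make_server("", 8080, pixel_app).serve_forever()
```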
Unfortunately for the third-party data companies, third-party cookies have recently come under intense scrutiny: Apple’s Safari doesn’t allow them by default, and Firefox has announced that its next version will block third-party cookies by default as well. This means that companies relying exclusively on third-party cookies will see their audience coverage erode and will need to fall back on other methods of tracking and profiling users. Note that these companies all use anonymous cookies and work hard to be safe and fair in their use of data. But the reality is that this method is becoming harder for companies to use.
By following users across websites, these companies can amass large, comprehensive profiles of users, which lets advertising be targeted against them much more precisely and lets more money be made from those ad impressions.
Read more at http://www.imediaconnection.com/content/33972.asp#qakIxCXJbl9KpiG3.99