Some search engines have added their own commands to the rules governing how search engine bots behave. The Automated Content Access Protocol (ACAP) proposal, unveiled Thursday by a consortium of publishers at the global headquarters of The Associated Press, seeks to have those extra commands – and more – apply across the board. With the ACAP commands, sites could try to limit how long search engines retain copies in their indexes, for instance, or tell the crawler not to follow any of the links that appear within a Web page.
[SOURCE: Associated Press, AUTHOR: Anick Jesdanun]
ACAP: Don’t Look Here! Please Don’t Look Here! Or Else?
http://lauren.vortex.com/archive/000333.html
Greetings. If the publishing industry’s Automated Content Access
Protocol (ACAP) project ( http://www.the-acap.org ) has been flying
under your radar during its gestation period of the last year or so,
don’t feel too bad. Unless you’re a serious follower of the complex
dance between publishers and search engines, you probably didn’t
even know that the publishing industry wants to “extend” the
venerable Robots Exclusion Protocol (robots.txt) with a
complex new system to control search engines’ access to and use of
content. Here’s an excellent new Associated Press article
regarding the ACAP announcement (AP is a member of the ACAP project):
( http://tinyurl.com/yre3pf ).
Very briefly, ACAP defines a detailed new system for sites to
tell search engines what they may or may not index, and adds
mechanisms to specify that certain materials may only be held in
indexes for limited periods of time, or only displayed in thumbnail
form, or subjected to other restrictions, conceivably including
"don't index this material unless you've paid us first."
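To make that concrete, here is a rough sketch of how such
directives might sit alongside a conventional robots.txt file. The
ACAP-* field names and parameter syntax shown are my own
illustrative guesses at the flavor of the announced design, not
text quoted from the ACAP specification:

   # Classic robots.txt: a blunt per-path allow/deny switch
   User-agent: *
   Disallow: /private/

   # ACAP-style extensions (hypothetical field names and
   # parameters, for illustration only)
   ACAP-crawler: *
   ACAP-disallow-crawl: /private/
   ACAP-allow-crawl: /news/
   # e.g. purge from the index after 7 days, thumbnails only
   ACAP-allow-index: /news/ time-limit=7d
   ACAP-allow-preview: /news/ max-size=thumbnail

The key difference is that each ACAP-style field carries ongoing
usage conditions that a crawler would be expected to track and
enforce long after a page is fetched, which is exactly where the
compliance burden discussed below comes in.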
The official ACAP line is that this system will encourage the
availability of more materials on the Internet. In fact, it is
generally acknowledged that robots.txt is a relatively simplistic
mechanism, and various new search enhancement systems (such as
Sitemaps, originated by Google in 2005) have indeed been evolving.
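For comparison, the existing machinery really is minimal. A
robots.txt file can only allow or deny crawling by path (plus,
more recently, point to a Sitemap), as in this sketch using
example.com as a placeholder:

   User-agent: *
   Disallow: /drafts/
   Sitemap: http://www.example.com/sitemap.xml

and a minimal Sitemap file is little more than a list of URLs with
optional hints:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
       <loc>http://www.example.com/news/story.html</loc>
       <lastmod>2007-11-29</lastmod>
     </url>
   </urlset>

Neither mechanism says anything about retention periods, display
formats, or payment; that gap is what ACAP proposes to fill.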
While ACAP claims broad participation by the publishing industry
(true enough), its assertion that major search engines are on board
is somewhat problematic: none of the search engine biggies that
spring immediately to mind has signed up to date or agreed to abide
by ACAP specifications when crawling sites (they all basically say
that they are evaluating the standard).
I’m certainly in favor of publishers being able to control their
content. On the other hand, my general view has long been that you
shouldn’t put materials online (when not protected by explicit user
access controls such as passwords or certificate-based control
mechanisms) unless you're willing for the public to access them in
a reasonably open manner.
This philosophy doesn’t mean that you’re giving up your copyrights
by placing information on the Web. But I would suggest that
attempting to shift onto search engines a complicated onus of
responsibility for exactly how available materials may be handled
appears questionable, and permeated with considerable risks to the
Internet at large.
Though ACAP is currently a voluntary standard, it seems reasonable
to assume that future attempts will be made to give it some force
of law and associated legal standing in court cases involving
search engine use and display of indexed materials.
The fundamental ACAP structure appears to be weighted in a manner
that provides the bulk of potential benefits to the publishing
entities and the greater part of risks to the search engines. One
possible outcome of this skewing is that search engines, concerned
about the litigation risks of failing to comply exactly with any
given set of ACAP directives, might restrict their indexing of the
sites involved in rather broad strokes. This could
result in a dramatic *decrease* in available materials for the
public, rather than the ACAP group’s suggested increase.
In essence, rather than the current framework where entities putting
materials on the Web take responsibility for what they’ve placed
online, the ACAP structure would appear to allow pretty much
anything to be placed online with far fewer concerns on the part
of publishers, who would presumably feel secure in the knowledge that
it would be up to search engines to obey ACAP or face possible
lawsuits and related actions.
Boiled down to the bottom line, I can't help but sense that the
shift in responsibility that appears to be inherent in ACAP could
lead to an entirely new wave of litigation and information
restrictions, enriching lawyers to be sure, but quite possibly
proving a significant negative development for Internet users in
general.
It's too early in the ACAP life cycle to make any truly definitive
calls regarding its benefits, risks, or even basic viability.
But I believe that an “orange alert” is in order. ACAP is a
potentially major development with possibly wide-ranging impacts on
an exceedingly broad range of activities that are common on the
Internet today. At the very least, we should all be watching events
in this area with great care and with considerable healthy
skepticism.
–Lauren–
Lauren Weinstein
lauren@vortex.com or lauren@pfir.org
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren