NEWS WEB SITES SEEK MORE SEARCH CONTROL

Some search engines have added their own commands to the rules governing how search engine bots behave. The Automated Content Access Protocol (ACAP) proposal, unveiled Thursday by a consortium of publishers at the global headquarters of The Associated Press, seeks to have those extra commands – and more – apply across the board. With the ACAP commands, sites could try to limit how long search engines retain copies in their indexes, for instance, or tell the crawler not to follow any of the links that appear within a Web page.


[SOURCE: Associated Press, AUTHOR: Anick Jesdanun]
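
For reference, some of the engine-specific commands the article alludes to already exist. Google's robots meta tags, for instance, can expire a page from results and block link-following or cached copies; the snippet below is illustrative only (the date is a placeholder):

    <!-- Google-specific robots meta tags: stop showing the page after a
         given date, and don't follow links or keep a cached copy -->
    <meta name="googlebot" content="unavailable_after: 25-Aug-2008 15:00:00 EST">
    <meta name="robots" content="nofollow, noarchive">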


One thought on “NEWS WEB SITES SEEK MORE SEARCH CONTROL”

  1. ACAP: Don’t Look Here! Please Don’t Look Here! Or Else?

    http://lauren.vortex.com/archive/000333.html

    Greetings. If the publishing industry’s Automated Content Access
    Protocol (ACAP) project ( http://www.the-acap.org ) has been flying
    under your radar during its gestation period of the last year or so,
    don’t feel too bad. Unless you’re a serious follower of the complex
    dance between publishers and search engines, you probably didn’t
    even know that the publishing industry wants to “extend” the
    venerable Robots Exclusion Protocol (robots.txt) with a
    complex new system to control search engines’ access to and use of
    content. Here’s an excellent new Associated Press article
    regarding the ACAP announcement (AP is a member of the ACAP project):
    ( http://tinyurl.com/yre3pf ).

    Very briefly, ACAP defines a detailed new system for sites to
    tell search engines what they may or may not index, and adds
    mechanisms to specify that certain materials may only be held in
    indexes for limited periods of time, or only displayed in thumbnail
    form, or other restrictions — conceivably such as “don’t index this
    material unless you’ve paid us first.”
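
    To make that concrete, ACAP is proposed as a set of prefixed fields
    layered onto robots.txt. The directive names in this sketch are
    illustrative guesses at the flavor of the proposal, not quotations
    from the ACAP specification:

        # Hypothetical ACAP-style robots.txt extensions -- field names
        # here are illustrative, not copied from the actual ACAP spec
        ACAP-crawler: *
        ACAP-allow-crawl: /news/
        ACAP-disallow-follow: /news/archive/
        # purge indexed copies after a week; images as thumbnails only
        ACAP-time-limit: /news/ 7days
        ACAP-allow-present-thumbnail: /images/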

    The official ACAP line is that this system will encourage the
    availability of more materials on the Internet. In fact, it is
    generally acknowledged that robots.txt is a relatively simplistic
    mechanism, and various new search enhancement systems (such as
    Sitemaps, originated by Google in 2005) have indeed been evolving.
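
    For comparison, the sketch below shows how simple those existing
    mechanisms are: classic robots.txt offers little more than allow/deny
    by path, with a few widely adopted (but non-standard) extensions
    layered on top. The paths and URL are placeholders:

        # Classic Robots Exclusion Protocol: path-based allow/deny only
        User-agent: *
        Disallow: /private/

        # Later engine-added extensions (not in the original protocol)
        Crawl-delay: 10
        Sitemap: http://www.example.com/sitemap.xml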

    While ACAP claims broad participation by the publishing industry
    (true enough), its assertion that major search engines are on board
    is somewhat problematic, given that none of the search engine
    biggies that would spring immediately to mind have signed up to
    date, or agreed to abide by ACAP specifications when crawling sites
    (they all basically say that they are evaluating the standard).

    I’m certainly in favor of publishers being able to control their
    content. On the other hand, my general view has long been that you
    shouldn’t put materials online (when not protected by explicit user
    access controls such as passwords or certificate-based control
    mechanisms) unless you’re willing for the public to access them in
    a reasonably open manner.

    This philosophy doesn’t mean that you’re giving up your copyrights
    by placing information on the Web. But I would suggest that
    attempting to shift onto search engines a complicated onus of
    responsibility for exactly how available materials may be handled
    appears questionable, and fraught with considerable risks to the
    Internet at large.

    Though ACAP is currently a voluntary standard, it might be assumed
    that future attempts will be made to give it some force of law and
    associated legal standing in court cases involving search engine use
    and display of indexed materials.

    The fundamental ACAP structure appears to be weighted in a manner
    that provides the bulk of potential benefits to the publishing
    entities and the greater part of risks to the search engines. One
    possible outcome of this skewing could be that search engines,
    concerned regarding litigation risks associated with not complying
    exactly with any given set of ACAP directives, might restrict their
    indexing of any involved sites in rather broad strokes. This could
    result in a dramatic *decrease* in available materials for the
    public, rather than the ACAP group’s suggested increase.

    In essence, rather than the current framework where entities putting
    materials on the Web take responsibility for what they’ve placed
    online, the ACAP structure would appear to allow pretty much
    anything to be placed online with far fewer concerns on the part of
    publishers, who would presumably feel secure in the knowledge
    it would be up to search engines to obey ACAP or face possible
    lawsuits and related actions.

    Boiled down to the bottom line, I can’t help but sense that the
    intended shift in responsibility that appears to be associated with
    ACAP could lead to an entirely new wave of litigation and possible
    information restrictions, enriching lawyers to be sure, but quite
    possibly representing a significant negative development for
    Internet users in general.

    It’s too early in the ACAP life cycle to make any truly definitive
    calls regarding the benefits, risks, or even basic viability of
    ACAP itself.

    But I believe that an “orange alert” is in order. ACAP is a
    potentially major development with possibly wide-ranging impacts on
    an exceedingly broad range of activities that are common on the
    Internet today. At the very least, we should all be watching events
    in this area with great care and with considerable healthy
    skepticism.

    –Lauren–
    Lauren Weinstein
    lauren@vortex.com or lauren@pfir.org
    Tel: +1 (818) 225-2800
    http://www.pfir.org/lauren
