What Are Internet Cookies?

Most Internet cookies are incredibly simple, but they are one of those
things that have taken on a life of their own. Cookies started receiving
tremendous media attention back in 2000 because of Internet privacy
concerns, and the debate still rages.

On the other hand, cookies provide capabilities that make the Web much
easier to navigate. The designers of almost every major site use them
because they provide a better user experience and make it much easier to
gather accurate information about the site's visitors.

In this article, we will take a look at the basic technology behind cookies,
as well as some of the features they enable.

Cookie Basics

In April of 2000 I read an in-depth article on Internet privacy in a large,
respected newspaper, and that article contained a definition of cookies.
Paraphrasing, the definition went like this:

Cookies are programs that Web sites put on your hard disk. They sit on
your computer gathering information about you and everything you do on the
Internet, and whenever the Web site wants to it can download all of the
information the cookie has collected. *[wrong]*

Definitions like that are fairly common in the press. The problem is, none
of that information is correct. Cookies are not programs, and they cannot
run like programs do. Therefore, they cannot gather any information on their
own. Nor can they collect any personal information about you from your
machine.

Here is a valid definition of a cookie: A cookie is a piece of text that a
Web server can store on a user's hard disk. Cookies allow a Web site to
store information on a user's machine and later retrieve it. The pieces of
information are stored as *name-value pairs*.

For example, a Web site might generate a unique ID number for each visitor
and store the ID number on each user's machine using a cookie file.

If you use Microsoft's Internet Explorer to browse the Web, you can see all
of the cookies that are stored on your machine. The most common place for
them to reside is in a directory called *c:\windows\cookies*. When I look in
that directory on my machine, I find 165 files. Each file is a *text file*
that contains name-value pairs, and there is one file for each Web site that
has placed cookies on my machine.

You can see in the directory that each of these files is a simple, normal
text file. You can see which Web site placed the file on your machine by
looking at the file name (the information is also stored inside the file).
You can open each file by clicking on it.

For example, I have visited *goto.com*, and the site has placed a cookie on
my machine. The cookie file for goto.com contains the following information:

UserID A9A3BECE0563982D www.goto.com/

*Goto.com* has stored on my machine a single name-value pair. The name of
the pair is *UserID*, and the value is *A9A3BECE0563982D*. The first time I
visited goto.com, the site assigned me a unique ID value and stored it on my
machine.

(Note that there probably are several other values stored in the file after
the three shown above. That is housekeeping information for the browser.)
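
To make the name-value idea concrete, here is a minimal Python sketch that reads the goto.com pair with the standard library's `http.cookies` module. It assumes the pair is written in the `name=value` form used in HTTP headers, not the on-disk layout of the browser's cookie file, which varies from browser to browser.

```python
from http.cookies import SimpleCookie

# The goto.com pair, written in HTTP header form. This form is an
# assumption for illustration; the on-disk file layout is browser-specific.
raw_header = "UserID=A9A3BECE0563982D"

cookie = SimpleCookie()
cookie.load(raw_header)

# Each entry is plain data: a name mapped to a string value.
pairs = {name: morsel.value for name, morsel in cookie.items()}
print(pairs)
```

Nothing in the parsed result can execute; it is just a dictionary of strings.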

*Amazon.com* stores a bit more information on my machine. When I look at the
cookie file Amazon has created on my machine, it contains the following:

session-id-time 954242000 amazon.com/
session-id 002-4135256-7625846 amazon.com/
x-main eKQIfwnxuF7qtmX52x6VWAXh@Ih6Uo5H amazon.com/
ubid-main 077-9263437-9645324 amazon.com/

It appears that Amazon stores a main user ID, an ID for each session, and
the time the session started on my machine (as well as an x-main value,
which could be anything).

The vast majority of sites store just one piece of information -- a *user
ID* -- on your machine. But a site can store many name-value pairs if it
wants to.

A name-value pair is simply a named piece of data. It is not a program, and
it cannot "do" anything. A Web site can retrieve only the information that
it has placed on your machine. It cannot retrieve information from other
cookie files, nor any other information from your machine.

How Does Cookie Data Move?

As you saw in the previous section, cookie data is simply name-value pairs
stored on your hard disk by a Web site. That is all cookie data is. The Web
site stores the data, and later it receives it back. A Web site can only
receive the data it has stored on your machine. It cannot look at any other
cookie, nor anything else on your machine.
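
The "a site only gets back what it stored" rule can be pictured with a toy browser-side store. This is only a sketch: the class name `ToyCookieJar` and its methods are made up for illustration, and real browsers apply more elaborate domain and path matching than a simple dictionary lookup.

```python
from collections import defaultdict


class ToyCookieJar:
    """Hypothetical browser-side store: one set of name-value pairs per site."""

    def __init__(self):
        self._by_site = defaultdict(dict)

    def store(self, site, name, value):
        # Called when a site sends a name-value pair to the browser.
        self._by_site[site][name] = value

    def pairs_for(self, site):
        # Only the pairs that this same site stored are ever returned.
        return dict(self._by_site.get(site, {}))


jar = ToyCookieJar()
jar.store("goto.com", "UserID", "A9A3BECE0563982D")
jar.store("amazon.com", "session-id", "002-4135256-7625846")

print(jar.pairs_for("goto.com"))     # goto.com sees only its own pair
print(jar.pairs_for("example.org"))  # a site that stored nothing gets nothing
```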

The data moves in the following manner:

- If you type the URL of a Web site into your browser, your browser
sends a request to the Web site for the page (see How Web Servers
Work <http://computer.howstuffworks.com/web-server.htm> for a
discussion). For example, if you type the URL
*http://www.amazon.com* into your browser, your browser will contact
Amazon's server and request its home page.

- When the browser does this, it will look on your machine for a
cookie file that Amazon has set. If it finds an Amazon cookie file, your
browser will send all of the name-value pairs in the file to Amazon's server
along with the URL. If it finds no cookie file, it will send no cookie data.

- Amazon's Web server receives the cookie data and the request for a
page. If name-value pairs are received, Amazon can use them.

- If no name-value pairs are received, Amazon knows that you have not
visited before. The server creates a new ID for you in Amazon's database and
then sends name-value pairs to your machine in the header for the Web
page <http://computer.howstuffworks.com/web-page.htm> it sends. Your
machine stores the name-value pairs on your hard disk.

- The Web server can change name-value pairs or add new pairs whenever
you visit the site and request a page.
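
The server's side of the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Amazon or any particular site does it: the function name `handle_request`, the cookie name `UserID`, and the in-memory set standing in for the site's database are all hypothetical.

```python
import uuid
from http.cookies import SimpleCookie


def handle_request(cookie_header, known_ids):
    """Return (user_id, set_cookie_header_or_None) for one page request.

    cookie_header is the raw name-value text the browser sent (or None);
    known_ids is a set standing in for the site's database of issued IDs.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("UserID")
    if morsel is not None and morsel.value in known_ids:
        # Pairs came back, so this browser has visited before.
        return morsel.value, None

    # No usable pairs: create a new ID and send it in the response header.
    user_id = uuid.uuid4().hex
    known_ids.add(user_id)
    reply = SimpleCookie()
    reply["UserID"] = user_id
    return user_id, reply.output()


database = set()
first_id, first_header = handle_request(None, database)
second_id, second_header = handle_request("UserID=" + first_id, database)
print(first_header)   # a Set-Cookie header carrying the new ID
print(second_header)  # None: the returning browser already has its ID
```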

There are other pieces of information that the server can send with the
name-value pair. One of these is an *expiration date*. Another is a *path*
(so that the site can associate different cookie values with different parts
of the site).
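
Here is a small sketch of how a server might attach those two extras to a pair, again using Python's `http.cookies`. The date and the `/books` path are invented values chosen only to show the shape of the header.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["UserID"] = "A9A3BECE0563982D"

# Hypothetical attributes: when the browser should discard the pair,
# and which part of the site it applies to.
cookie["UserID"]["expires"] = "Wed, 01 Jan 2031 00:00:00 GMT"
cookie["UserID"]["path"] = "/books"

header = cookie.output()
print(header)
```

With a path like this, the browser would send the pair back only for pages under that part of the site.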

*You have control over this process.* You can set an option in your browser
so that the browser informs you every time a site sends name-value pairs to
you. You can then accept or deny the values.

How Do Web Sites Use Cookies?

Cookies evolved because they solve a big problem for the people who
implement Web sites. In the broadest sense, a cookie allows a site to store
*state information* on your machine. This information lets a Web site
remember what *state* your browser is in. An ID is one simple piece of state
information -- if an ID exists on your machine, the site knows that you have
visited before. The state is, "Your browser has visited the site at least
one time," and the site knows your ID from that visit.

Web sites use cookies in many different ways. Here are some of the most
common examples:

- Sites can *accurately determine how many people actually visit the
site.* It turns out that because of proxy
servers <http://computer.howstuffworks.com/firewall4.htm>,
caching <http://computer.howstuffworks.com/cache.htm>,
concentrators <http://computer.howstuffworks.com/vpn10.htm> and so on,
the only way for a site to accurately count visitors is to set a
cookie with a unique ID for each visitor. Using cookies, sites can
determine:
   - How many visitors arrive
   - How many are new vs. repeat visitors
   - How often a visitor has visited

The way the site does this is by using a *database*. The first time a
visitor arrives, the site creates a new ID in the database and sends the ID
as a cookie. The next time the user comes back, the site can increment a
counter associated with that ID in the database and know how many times that
visitor returns.

- Sites can *store user preferences* so that the site can look
different for each visitor (often referred to as *customization*). For
example, if you visit *msn.com*, it offers you the ability to "change
content/layout/color." It also allows you to enter your zip code and get
customized weather information. When you enter your zip code, the following
name-value pair gets added to MSN's cookie file:

WEAT CC=NC%5FRaleigh%2DDurham&REGION= www.msn.com/

Since I live in Raleigh, NC, this makes sense.

Most sites seem to store preferences like this in the site's database
and store nothing but an ID as a cookie, but storing the actual values in
name-value pairs is another way to do it (we'll discuss later why this
approach has lost favor).

- E-commerce sites <http://computer.howstuffworks.com/ecommerce.htm> can
implement things like *shopping carts* and *"quick checkout" options*. The
cookie contains an ID and lets the site keep track of you as you add
different things to your cart. Each item you add to your shopping cart is
stored in the site's database along with your ID value. When you check out,
the site knows what is in your cart by retrieving all of your selections
from the database. It would be impossible to implement a convenient shopping
mechanism without cookies or something like them.
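
The pattern behind these examples, where the cookie carries only an ID and everything else lives on the server, can be sketched like this. The dictionaries stand in for the site's database, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the site's database tables,
# both keyed by the ID that arrives in the visitor's cookie.
visit_counts = defaultdict(int)
carts = defaultdict(list)


def record_visit(user_id):
    """Increment and return the visit counter for this cookie ID."""
    visit_counts[user_id] += 1
    return visit_counts[user_id]


def add_to_cart(user_id, item):
    """Store a selected item against this cookie ID."""
    carts[user_id].append(item)


def checkout(user_id):
    """Return the items stored for this ID and empty the cart."""
    return carts.pop(user_id, [])


record_visit("A9A3BECE0563982D")
visits = record_visit("A9A3BECE0563982D")
add_to_cart("A9A3BECE0563982D", "book")
add_to_cart("A9A3BECE0563982D", "lamp")
order = checkout("A9A3BECE0563982D")
print(visits, order)
```

Note that the browser never holds the cart itself; losing the cookie loses only the key that points to it.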

In all of these examples, note that what the database stores is the things
you have selected from the site, the pages you have viewed on the site, the
information you have given to the site in online forms, and so on. All of
the information is stored in the site's database, and in most cases, a
cookie containing your unique ID is all that is stored on your computer.

Problems with Cookies

Cookies are not a perfect state mechanism, but they certainly make a lot of
things possible that would be impossible otherwise. Here are several of the
things that make cookies imperfect.

- *People often share machines* - Any machine that is used in a public
area, and many machines used in an office environment or at home, are shared
by multiple people. Let's say that you use a public machine (in a library,
for example) to purchase something from an on-line store. The store will
leave a cookie on the machine, and someone could later try to purchase
something from the store using your account. Stores usually post large
warnings about this problem, and that is why. Even so, mistakes can happen.
For example, I once used my wife's machine to purchase something from
Amazon. Later, she visited Amazon and clicked the "one-click" button, not
realizing that it really does allow the purchase of a book in exactly one
click.

On something like a Windows NT machine or a UNIX machine that uses
*accounts* properly, this is not a problem. The accounts separate all
of the users' cookies. Other operating systems are much more relaxed about
accounts, and on those machines shared cookies remain a problem.

If you try the example above on a public machine, and if other people
using the machine have visited HowStuffWorks, then the history URL may show
a very long list of files.

- *Cookies get erased* - If you have a problem with your browser and
call tech support, probably the first thing that tech support will ask you
to do is to erase all of the temporary Internet files on your machine. When
you do that, you lose all of your cookie files. Now when you visit a site
again, that site will think you are a new user and assign you a new cookie.
This tends to skew the site's record of new versus return visitors, and it
also can make it hard for you to recover previously stored preferences. This
is why sites ask you to *register* in some cases -- if you register
with a user name and a password, you can log in, even if you lose your cookie
file, and restore your preferences. If preference values are stored directly
on the machine (as in the MSN weather example above), then recovery is
impossible. That is why many sites now store all user information in a
central database and store only an ID value on the user's machine.

If you erase your cookie file for *HowStuffWorks* and then revisit the
history URL in the previous section, you will find that *HowStuffWorks*
has no history for you. The site has to create a new ID and cookie
file for you, and that new ID has no data stored against it in the database.
(Also note that the HowStuffWorks Registration
System <http://computer.howstuffworks.com/register.htm> allows you to
reset your history list whenever you like.)

- *Multiple machines* - People often use more than one machine during
the day. For example, I have a machine in the office, a machine at home and
a laptop <http://computer.howstuffworks.com/laptop.htm> for the road.
Unless the site is specifically engineered to solve the problem, I will have
a different cookie file on each of the three machines. Any site that I visit
from all three machines will track me as three separate users. It can be annoying
to set preferences three times. Again, a site that allows registration and
stores preferences centrally may make it easy for me to have the same
account on three machines, but the site developers must plan for this when
designing the site.

If you visit the history URL demonstrated in the previous section from
one machine and then try it again from another, you will find that your
history lists are different. This is because the server created two IDs for
you, one on each machine.

There are probably not any easy solutions to these problems, except asking
users to register and storing everything in a central database.

When you register with the HowStuffWorks registration system, the problem is
solved in the following way: The site remembers your cookie value and stores
it with your registration information. If you take the time to log in from
any other machine (or a machine that has lost its cookie files), then the
server will modify the cookie file on that machine to contain the ID
associated with your registration information. You can therefore have
multiple machines with the same ID value.
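
A rough sketch of that idea in Python follows. It is not the actual HowStuffWorks code: the dictionary, the function names, and the sample IDs are invented, and password checking is left out to keep the example short.

```python
# Hypothetical registration table: user name -> cookie ID saved at sign-up.
registered_ids = {}


def register(user_name, cookie_id):
    """Remember the cookie ID this user had when registering."""
    registered_ids[user_name] = cookie_id


def login(user_name, cookie_id_on_this_machine):
    """Return the ID the cookie on this machine should hold after login.

    A registered user gets the saved ID back, replacing whatever ID this
    machine happened to have; an unknown user keeps the current one.
    """
    return registered_ids.get(user_name, cookie_id_on_this_machine)


register("pat", "ID-FROM-OFFICE-MACHINE")
id_at_home = login("pat", "ID-FROM-HOME-MACHINE")
print(id_at_home)
```

After the login, the server would send the returned ID back to the browser as the new cookie value, so both machines end up pointing at the same database record.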

Why the Fury Around Cookies?

If you have read the article to this point, you may be wondering why there
has been such an uproar in the media about cookies and Internet privacy. You
have seen in this article that cookies are benign text files, and you have
also seen that they provide lots of useful capabilities on the Web.

There are two things that have caused the strong reaction around cookies:

- The first is something that has plagued consumers for decades. Let's
say that you purchase something from a traditional mail order catalog. The
catalog company has your name, address and phone number from your order, and
it also knows what items you have purchased. It can *sell your
information* to others who might want to sell similar products to you.
That is the fuel that makes *telemarketing* and *junk mail* possible.

On a Web site, the site can track not only your purchases, but also
the pages that you read, the ads that you click on, etc. If you then
purchase something and enter your name and address, the site potentially
knows much more about you than a traditional mail order company does. This
makes *targeting* much more precise, and that makes a lot of people
uncomfortable.

Different sites have different policies. HowStuffWorks has a strict privacy
policy and does not sell or share any personal information about our readers
with any third party except in cases where you specifically tell us to do so
(for example, in an opt-in e-mail program). We do aggregate information and
distribute it. For example, if a reporter asks us how many visitors
HowStuffWorks has or which page on the site is the most popular, we create
those aggregate statistics from data in the database.

- The second is unique to the Internet. There are certain
infrastructure providers that can actually create cookies that are visible
on multiple sites. DoubleClick is the most famous example of this. Many
companies use DoubleClick to serve ad banners on their sites. DoubleClick
can place small (1x1 pixel) GIF files on the site that allow DoubleClick to
load cookies on your machine. DoubleClick can then track your movements
across multiple sites. It can potentially see the search strings that you
type into search engines (due more to the way some search engines implement
their systems, not because anything sinister is intended). Because it can
gather so much information about you from multiple sites, DoubleClick can
form very rich *profiles*. These are still anonymous, but they are rich.

DoubleClick then went one step further. By acquiring a company,
DoubleClick threatened to link these rich anonymous profiles back to name
and address information -- it threatened to personalize them, and then sell
the data. That began to look very much like spying to most people, and that
is what caused the uproar.

DoubleClick and companies like it are in a unique position to do this
sort of thing, because they serve ads on so many sites. *Cross-site
profiling* is not a capability available to individual sites, because
cookies are site specific.
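
To see why serving content on many sites matters, consider this toy model of an ad server. It is an illustration only, not DoubleClick's system: the names, the sample ID, and the example.com-style addresses are all made up, and it assumes the ad server learns which page embedded its image, which typically happens through the request's referrer information.

```python
from collections import defaultdict

# Hypothetical ad-server log: its own cookie ID -> pages that embedded its ads.
profiles = defaultdict(list)


def serve_ad(ad_cookie_id, embedding_page):
    """Record that the browser holding this ad cookie loaded embedding_page."""
    profiles[ad_cookie_id].append(embedding_page)


# The same browser visits two unrelated sites that both carry the ad image.
# Each request goes to the ad server, so its one cookie comes back both times.
serve_ad("abc123", "http://news.example/story")
serve_ad("abc123", "http://shop.example/cart")

print(profiles["abc123"])
```

Neither of the two sites can read the other's cookies, but the ad server sees its own cookie on both and can tie the visits together.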
