Notes for Web Server Capabilities Demo by Max Moroz

Note: I suggest that you read the first two sections; browse the rest only if you are interested in those particular topics.

What Is A Web Server?

When you type a URL, and get back the content of the page, the page is served to you. A web server is a computer that serves web pages to visitors.

The software that runs on a web server is also sometimes called a web server. In that sense, "web server" belongs to the following list: word processor, spreadsheet, web browser, web server. There are dozens of different web server software packages, just as there are dozens of different word processors. In both cases, only a few of them are really popular; in the case of web servers, these are Apache, Microsoft IIS, and Netscape Enterprise. Unlike the word processor and spreadsheet markets, where Microsoft at present has no real competition, the web server market is currently hotly contested.

Compare two word processors: Microsoft Word and Notepad. The former is enormously more powerful. But the latter is free, starts much faster, and occupies 100 times less space on disk. So, both are useful in different cases.

With web servers, the situation is similar. Amazon's criteria for a web server are (a) reliability, (b) security, (c) speed, (d) scalability, and (e) cost. If you want to set up a trial web server to test if it is going to help your business, and to gain some experience before committing to a certain EC strategy, your criteria are more likely to be: (a) simplicity, (b) compatibility with your existing systems, (c) cost.

A regular computer connected to the Internet can be a web server. You can even run other applications on it at the same time! For example, you can take the same notebook that you carry to class, and make it a web server by installing some web server software on it. As long as you don't get too many hits, your visitors will not notice anything.

Cookies

A cookie is a small piece of information (usually about 100 bytes) that is put on a visitor's hard drive when he visits a web site. That cookie is useful only for one thing: it is later sent back to the web server whenever that person goes to the same URL again. Actually, it is possible to make a cookie so that it is sent to all subpages of that URL (which makes sense: otherwise, a web site would lose track of a visitor the moment he leaves the home page...).

When cookies are sent, they are set to expire, i.e. disappear, either at a certain time, or at the end of the current browser session. Often, cookies are set to expire in 2025 or later, with the intent of keeping them for as long as possible. Cookies also get deleted by the browser, if too much space is wasted by all the cookies combined (which is very rare). Finally, a cookie can be deleted by the user; however, it takes some skill and is different in Netscape and Internet Explorer.

For Netscape, the cookies are very easy to see and delete: just go to the directory netscape/users/yourname/ and look at the file cookies.txt. There, you see for each cookie:

The web server will see the name and the value of the cookie, whenever appropriate (it is not interested in the other parts of the cookie, since they are used just to determine whether to show this cookie to this web server!).
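The round trip described above can be sketched in a few lines of Python using the standard http.cookies module. This is only an illustration under stated assumptions: the cookie name visitor_id, its value, and the helper names build_set_cookie and read_cookies are invented here; they are not taken from any particular web server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie


def build_set_cookie(name, value, path="/", expires=None):
    """Return the value of a Set-Cookie header for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # "/" means: send it back for every subpage
    if expires is not None:
        # Without an expiry date the cookie lasts only for the browser session.
        cookie[name]["expires"] = format_datetime(expires, usegmt=True)
    return cookie[name].OutputString()


def read_cookies(cookie_header):
    """Parse the Cookie header the browser sends back into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    far_future = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print("Set-Cookie:", build_set_cookie("visitor_id", "abc123", expires=far_future))
    print(read_cookies("visitor_id=abc123; theme=dark"))
```

Note that the path and expiry date travel only from the server to the browser; what comes back on later visits is just the name and the value.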

How the cookies are used is completely up to the web server. The main applications are:

If you like to know more, browse this excellent article about cookies (which also has additional resources).

WN Server Built-In Search

WN has a built-in search feature. To use it, go to a URL of the following type:

http://www.anderson.ucla.edu/search=synopsis, or

http://www.anderson.ucla.edu/search=context,

i.e., indicate in the URL only the directory you are interested in (and not the actual file), and append search=... to it. In the examples above we start the search at the home directory of the Anderson school; of course, you could specify any subdirectory if you want a more restricted search.

There are several types of searches, but the most useful ones are "synopsis" and "context". Synopsis lets you search through the titles and keywords of all the web pages, both in the directory you specify and in its subdirectories. Context lets you search through the full text of all pages in the directory you specify, but not in any of its subdirectories. The latter restriction is enforced to prevent a big load on the web server: imagine someone asking the web server to do a full-text search of all the pages in the main Anderson school directory and below. Accessing all those thousands of files, and looking through each of them, may well take many minutes, during which time the web server will perform its main function with considerable delay.

WN Search & Keeping Information Private

WN's built-in search doesn't let anyone see a file that they could not see in a usual way, by typing in the URL.

So, suppose I have a file secret.htm in my personal home page directory that I intend to keep private to myself and a class friend. I don't make any links to this file from anywhere, so I hope it will remain hidden. Then someone goes to http://personal.anderson.ucla.edu/max.moroz/search=context, and does a search on some very common combination of characters, e.g. "the". The search results are likely to include this "hidden" file. Before we complain about this security hole, we should note that if someone, instead of doing a search, simply typed http://personal.anderson.ucla.edu/max.moroz/secret.htm, they would also have seen this file (i.e., the file would have been served to them)! In other words, the file was not really hidden from the world in the first place. (You can try it -- I've put that file up just for this purpose!).

Now, it is true that a visitor is unlikely to guess the path and name of my secret files (of course, I'll give them unusual names, which may be hard even for me to remember). So, the search does in some sense open otherwise hidden territory. But if I am in any way serious about not letting people see something, I must not rely on such logic as "they sure wouldn't guess the filename". There are standard, and much more reliable, ways to hide information from the outside world; and they don't take any more time or skill than picking filenames that no one can guess. Normally, everything is done in UNIX by setting proper file permissions (which determine who can read and write files, and run programs). However, in the Anderson network, running webify resets all permissions so as to allow anyone in the world to read everything under your web-personal directory, and anyone with an account at the Anderson school to read everything under your web-internal directory. Therefore, don't put in those two directories, or in their subdirectories, any files that you want kept private.
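To make the file-permission idea concrete, here is a minimal Python sketch, assuming a UNIX system where nothing (such as webify) resets permissions afterwards. The function names make_private and is_world_readable are made up for this illustration; the effect matches the shell command chmod 600.

```python
import os
import stat
import tempfile


def make_private(path):
    """Allow only the file's owner to read and write it (mode 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_world_readable(path):
    """Report whether every user on the system may read the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    # A throwaway file stands in for secret.htm.
    with tempfile.NamedTemporaryFile(suffix=".htm", delete=False) as handle:
        handle.write(b"<html>private notes</html>")
    make_private(handle.name)
    print(handle.name, "world-readable:", is_world_readable(handle.name))
    os.remove(handle.name)
```

A web server that runs under a different user account than the file's owner cannot read such a file, so it cannot serve it, no matter whether the visitor guessed the name or found it through a search.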

CGI Mechanism

CGI (Common Gateway Interface) is a mechanism for passing information between the visitor of a web page and software on the web server. This mechanism, luckily, became a standard long before Microsoft and Netscape had a chance to set their own standards to help their own web servers take the lead. Although CGI will still have a role to play, its ground is being taken over by new techniques, primarily the Java language, and also JavaScript, Active HTML, Dynamic HTML, and more. Comprehensive standards are now almost impossible to adopt, and will probably remain so until someone actually conquers the whole market and makes its products de facto standards, as Microsoft did with the PC operating system market.

Note: my next demo will compare CGI with Java and JavaScript. If you can't wait until then, take a look at this comparison chart.

Coming back to CGI, assume that I manage Netscape's web server, which of course runs Netscape Enterprise software. In addition, I have other software on this computer, which allows a variety of services to be provided to my visitors. Ann is visiting my web site. The following items cause information flow from Ann to my computer:

All this information is received by the Enterprise software, but is not processed by it; instead it gets passed on to whatever software I designated. How is the software designated? Simple:

If Ann goes to a URL that is a CGI file, e.g. http://form.netscape.com/directory/cgi-bin/community.cgi?cp=let04rmdy&SRC=DIRECTORY, the software is just what you see: it is contained in the file cgi-bin/community.cgi. This file is not an HTML page; it is an executable, i.e. a file that can be run (like wsftp.exe or setup.exe). The text after the question mark is an instruction for the program.
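As a rough idea of what such a program does, here is a minimal sketch of a CGI program written in Python. Under the CGI standard, the web server places the text after the question mark in the environment variable QUERY_STRING, and the program writes a header, a blank line, and then the page. The function name handle_request is my own, and the page it builds (a list of the parameters it received) is purely illustrative; it is not what community.cgi actually does.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build the full CGI response (header, blank line, body) as a string."""
    # The web server puts the text after the question mark into QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    rows = "".join(
        "<li>{} = {}</li>".format(escape(name), escape(", ".join(values)))
        for name, values in params.items()
    )
    body = "<html><body><ul>{}</ul></body></html>".format(rows)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a web server, os.environ carries the request details.
    print(handle_request(os.environ), end="")
```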

If Ann submits a form, the software is specified in the same way, but it is done in the HTML source code of the page like this:

<form action="http://form.netscape.com/directory/cgi-bin/register" method="post">

Ann can see this, if she wishes, through the menu item "View/Page Source" in the Netscape browser.

If Ann clicks on a so-called server-side clickable image, the software is also specified in the HTML code. Note that there are several types of images on the web:

When Ann clicks on a client-side image, her click is processed immediately by her own computer, e.g., by sending her to another URL. When she clicks on a server-side image, her click coordinates go to the web server, which passes them on via the CGI mechanism to the appropriate program, which in return sends back whatever it considers fit, e.g., a link to another web page. Thus, the CGI mechanism plays a role in server-side images, but not in client-side images. Client-side images are becoming more popular, so this use of CGI will gradually disappear.
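For a server-side image, the browser appends the click position to the URL as ?x,y, so the program receives something like "42,17". The sketch below shows one way a program might turn that into a redirect. The regions, the page names, and the function names are all hypothetical, chosen only to illustrate the idea for an imaginary 200x100 menu picture.

```python
# Hypothetical regions of a 200x100 menu image: (left, top, right, bottom) -> URL.
REGIONS = [
    ((0, 0, 99, 99), "/products.htm"),
    ((100, 0, 199, 99), "/contact.htm"),
]
DEFAULT_URL = "/index.htm"


def parse_click(query_string):
    """Turn the 'x,y' text sent for a server-side image click into two ints."""
    x_text, y_text = query_string.split(",")
    return int(x_text), int(y_text)


def resolve_click(query_string):
    """Find the page that belongs to the region the visitor clicked on."""
    x, y = parse_click(query_string)
    for (left, top, right, bottom), url in REGIONS:
        if left <= x <= right and top <= y <= bottom:
            return url
    return DEFAULT_URL


def handle_click(query_string):
    """CGI response that redirects the browser to the chosen page."""
    return "Location: {}\r\n\r\n".format(resolve_click(query_string))


if __name__ == "__main__":
    print(handle_click("150,50"), end="")
```

A client-side image does the same lookup inside the browser, which is why it needs no round trip to the server.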

What information does my software get about Ann? First, all the cookies that my web server previously stored on Ann's computer. Second, the regular information that you saw in the Privacy Demo: the URL Ann is viewing, Ann's host IP address, her web browser, and the page she came from. Finally, the information specific to the item that caused the CGI to operate: if Ann submitted a form, all the data from the form; if Ann clicked on an image map, the coordinates of the point she clicked on; if Ann went to a URL that I marked as CGI, the part of the URL after the question mark, if any.
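Here is a hedged sketch of how a CGI program might collect these pieces. The environment variable names (HTTP_COOKIE, REMOTE_ADDR, HTTP_USER_AGENT, HTTP_REFERER, QUERY_STRING, REQUEST_METHOD, CONTENT_LENGTH) are the standard CGI ones; the function name describe_visitor and all the sample values in the demo are invented for illustration.

```python
import io
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def describe_visitor(environ, body_stream):
    """Collect what a CGI program can learn about one request."""
    cookies = SimpleCookie()
    cookies.load(environ.get("HTTP_COOKIE", ""))

    form = {}
    if environ.get("REQUEST_METHOD") == "POST":
        # Form data arrives on standard input; CONTENT_LENGTH says how much.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(body_stream.read(length).decode("ascii"))

    return {
        "cookies": {name: morsel.value for name, morsel in cookies.items()},
        "ip_address": environ.get("REMOTE_ADDR"),
        "browser": environ.get("HTTP_USER_AGENT"),
        "came_from": environ.get("HTTP_REFERER"),
        "query": parse_qs(environ.get("QUERY_STRING", "")),
        "form": form,
    }


if __name__ == "__main__":
    # A simulated form submission; the values are invented for illustration.
    body = b"name=Ann&product=widget"
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "HTTP_COOKIE": "visitor_id=abc123",
        "REMOTE_ADDR": "192.0.2.10",
        "HTTP_USER_AGENT": "Mozilla/4.0",
        "HTTP_REFERER": "http://www.example.com/index.htm",
        "QUERY_STRING": "",
    }
    for key, value in describe_visitor(environ, io.BytesIO(body)).items():
        print(key, "=", value)
```

A real CGI program would pass os.environ and sys.stdin.buffer instead of the simulated request used in the demo.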

What does a program do once it receives all this information? Well, it can put Ann into our customer database, as long as Ann filled in the registration form correctly. Or it can initiate a shipment of our product, after checking that Ann's credit card is valid. Or it can send back an HTML page with specs and pricing information about a product Ann inquired about (using a product search feature of our web site). Or it can simply redirect Ann to a web page of an affiliated company that should handle Ann's requests further.

The simplest use of the CGI mechanism is the web counter. For simple counters, no information from the visitor is used. The counter is just incremented by 1 every time anyone accesses it, and sends back a nice picture containing the current number (you probably noticed that all counters are graphical). You may ask: how does the counter know that a visitor came to a site? After all, you neither submit a form, nor click on a graphical menu, nor go to a special URL that contains cgi-bin or cgi in it.

The answer is easy if you know a little about regular images in HTML. To put an image on your page, you say something like:

<IMG SRC="http://personal.anderson.ucla.edu/johndoe/mypicture.gif">

Now, if instead of mypicture.gif you put the name of a program (according to the CGI rules), e.g.,

<IMG SRC="http://personal.anderson.ucla.edu/johndoe/mycounter.cgi">,

the browser would obtain the image from that program. So, the browser itself would go to the URL http://personal.anderson.ucla.edu/johndoe/mycounter.cgi, thus activating the CGI process.
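A counter program along these lines might look like the sketch below. It is an illustration only: the file name mycounter.txt and the function names are hypothetical, and the picture is produced as SVG text simply to keep the example free of image libraries, whereas the counters described here send back a bitmap picture such as a GIF.

```python
import os
import tempfile


def increment_counter(path):
    """Add 1 to the number stored in the file and return the new total."""
    try:
        with open(path) as handle:
            count = int(handle.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first visit: the file does not exist yet
    count += 1
    with open(path, "w") as handle:
        handle.write(str(count))
    return count


def render_counter(count):
    """Draw the number as a small SVG picture (a stand-in for a GIF)."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="80" height="20">'
        '<rect width="80" height="20" fill="black"/>'
        '<text x="40" y="15" fill="lime" text-anchor="middle" '
        'font-family="monospace">{:06d}</text></svg>'
    ).format(count)


def handle_request(counter_path):
    """CGI response: an image header followed by the picture itself."""
    picture = render_counter(increment_counter(counter_path))
    return "Content-Type: image/svg+xml\r\n\r\n" + picture


if __name__ == "__main__":
    # Hypothetical location of the file that remembers the count.
    path = os.path.join(tempfile.gettempdir(), "mycounter.txt")
    print(handle_request(path), end="")
```

The sketch does not lock the file, so two visitors arriving at the same instant could be counted as one; a counter on a busy site would need to guard against that.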

To see one of the lengthiest CGI uses, look at the URL that Dejanews goes to after you click on the search button. dnquery.xp in that URL is the name of the program that will do the search; everything after the question mark is data going to that program, which includes your search string and the options you set. The format is the standard one for CGI: name1=value1&name2=value2&name3=value3 etc., where nameX is the name of one of the parameters, and valueX is the value that parameter gets. It is a bit difficult to read, since it has no spaces, and also because some characters (such as space, quote marks, etc.) are represented by their ASCII codes in hexadecimal (i.e., instead of the quotes, you see %22).
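The name=value format and the hexadecimal codes can be tried out with Python's standard urllib.parse module. In this small sketch the parameter names query and maxhits are made up; they are not the ones Dejanews actually uses, and build_query and read_query are just illustrative helper names.

```python
from urllib.parse import parse_qs, quote, urlencode


def build_query(params):
    """Encode a dict as name1=value1&name2=value2, escaping unsafe characters."""
    # quote_via=quote writes a space as %20; the default would write it as "+".
    return urlencode(params, quote_via=quote)


def read_query(query_string):
    """Decode a query string back into a dict of name -> value."""
    return {name: values[0] for name, values in parse_qs(query_string).items()}


if __name__ == "__main__":
    # Hypothetical parameter names, not the ones Dejanews actually uses.
    query = build_query({"query": '"web server"', "maxhits": "20"})
    print(query)
    print(read_query(query))
```

Decoding reverses the process, so the program on the server sees the search string exactly as the visitor typed it, quotes and spaces included.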