How a webpage makes it to the client, from the webserver (simplified)


Step 1
The user specifies which server and which port to connect to. For example, a user of Internet Explorer would type www.ilopia.com, and since Internet Explorer defaults to port 80 when no port is typed, the webbrowser knows where to connect.
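
The defaulting rule in Step 1 can be sketched in a few lines of Python. This is only an illustration, not how any particular browser does it: the helper name host_and_port and the DEFAULT_PORTS table are assumptions made for this example, and treating a bare name as plain HTTP is a simplification.

```python
from urllib.parse import urlsplit

# Assumed default ports per scheme (80 for HTTP as described above, 443 for HTTPS).
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(address: str) -> tuple[str, int]:
    """Return (host, port) for an address as the user might type it."""
    # A bare "www.ilopia.com" carries no scheme, so assume plain HTTP.
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    # Use the explicit port if one was typed, else the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(host_and_port("www.ilopia.com"))       # ('www.ilopia.com', 80)
print(host_and_port("www.ilopia.com:8080"))  # ('www.ilopia.com', 8080)
```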

Step 2
The browser must now find out the IP address of the server to connect to. This is not specific to a webbrowser-to-webserver connection; it happens for all communication over the Internet that uses domain names. The browser will typically first ask the operating system for this information, which finds the IP address one way or another (cached on the local machine, cached in the ISP's DNS server, or, in the worst case, by asking a top-level DNS server).
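
As a rough sketch of "asking the system", Python's socket.getaddrinfo hands the name to the operating system's resolver, which consults its caches and DNS servers as needed. The helper name resolve_ipv4 is made up for this example, and the lookup is restricted to IPv4 only to keep the output short.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) pair.
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# "localhost" is used so the example does not depend on an external name.
print(resolve_ipv4("localhost"))
```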

Step 3
The client now makes the connection to the server. This is done using the IP address and port only. So far, the webserver has no idea what the client making the request wants.
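
A minimal sketch of Step 3, assuming a local listening socket as a stand-in for the webserver (the connect helper is a name chosen for this example). Note that only an IP and a port are involved; nothing about HTTP or host names has been sent yet.

```python
import socket


def connect(ip: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to ip:port; no HTTP has been spoken yet."""
    return socket.create_connection((ip, port), timeout=timeout)


# Stand-in for a webserver: a listening socket on a free local port.
with socket.create_server(("127.0.0.1", 0)) as listener:
    ip, port = listener.getsockname()
    with connect(ip, port) as client:
        server_side, _addr = listener.accept()
        with server_side:
            # The connection exists, but the server has received no request.
            print("connected to", client.getpeername())
```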

Step 4
The client now sends a request message, using the HTTP protocol. The simplest request would look like:

GET /index.html HTTP/1.1

This request asks for the page index.html, but it does not specify any Host header. So if the webserver is configured with host headers only, it does not know what to send to a client making this request. According to the RFC, the server must reply with 400 Bad Request to an HTTP/1.1 request like this. A valid request looks like this:

GET /index.html HTTP/1.1
Host: www.ilopia.com
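
The valid request above can be assembled programmatically. The sketch below uses a hypothetical helper build_request; what it shows is that lines end in CRLF and that an empty line terminates the header section.

```python
def build_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request that includes the Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "",  # the empty line marks the end of the headers
        "",
    ]
    # HTTP uses CRLF line endings.
    return "\r\n".join(lines).encode("ascii")


print(build_request("/index.html", "www.ilopia.com"))
# b'GET /index.html HTTP/1.1\r\nHost: www.ilopia.com\r\n\r\n'
```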

Step 5
The server looks at the request message and takes action. For the latter request, it uses the Host header that was sent and checks whether any site matches that host header. If one does, it serves index.html from that site's home folder. If www.ilopia.com is not configured as a host header on the webserver, it uses the default home folder, if any (this is the IIS default behaviour and might not be true for other webservers).
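
The selection logic in Step 5 might be sketched as a simple table lookup. This is not IIS code: the SITES table, the folder paths, and the home_folder function are all invented for illustration, and the port-stripping below ignores IPv6 literals.

```python
# Hypothetical site table: Host header value -> home folder.
SITES = {
    "www.ilopia.com": "/sites/ilopia",
    "www.microsoft.com": "/sites/not-microsoft",
}
DEFAULT_HOME = "/sites/default"  # would be None if no default site is configured


def home_folder(host_header, sites=SITES, default=DEFAULT_HOME):
    """Pick the home folder for a request based purely on its Host header."""
    if host_header is None:
        return None  # HTTP/1.1 request without Host: answer 400 Bad Request
    # Host names are case-insensitive, and the header may carry a ":port" suffix.
    name = host_header.strip().lower().split(":", 1)[0]
    # Fall back to the default home folder when no site matches.
    return sites.get(name, default)


print(home_folder("www.ilopia.com"))    # /sites/ilopia
print(home_folder("unknown.example"))   # /sites/default
```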

Step 6
The server now responds to the request by sending some header information, followed by the content of the requested webpage.
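
In its simplest form, the response in Step 6 is a status line, a few header fields, an empty line, and then the page content. The build_response helper below is an assumed name for this sketch, and a real server sends more headers than these two.

```python
def build_response(body: bytes, content_type: str = "text/html") -> bytes:
    """Build a minimal HTTP 200 response: status line, headers, empty line, body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Length: {len(body)}\r\n"  # tells the client where the body ends
        f"Content-Type: {content_type}\r\n"
        "\r\n"  # the empty line separates headers from content
    )
    return head.encode("ascii") + body


print(build_response(b"<html></html>").decode("ascii"))
```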

Demonstration


So, let's make a demonstration here (which actually works at the moment I write this, but might not work later...). Let's request a page from the server www.ilopia.com.

telnet www.ilopia.com 80
GET /index.html HTTP/1.1 <ENTER>
Host: www.microsoft.com <ENTER>

After one more <ENTER> on an empty line, which ends the request, the server replies with this information:

HTTP/1.1 200 OK
Content-Length: 55
Content-Type: text/html
Last-Modified: Sat, 06 Mar 2004 21:24:38 GMT
Accept-Ranges: bytes
ETag: "f2ed676ec13c41:4a5"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 06 Mar 2004 21:39:43 GMT

<html>
<body>
This is NOT Microsoft
</body>
</html>
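
A reply like the one above splits at the first empty line into a header section and a body. The sketch below uses a made-up helper parse_response and a shortened copy of the reply; it assumes the body lines were sent with CRLF line endings, which makes the body exactly as long as the Content-Length header says.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, reason, headers, body)."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body


# Shortened copy of the reply shown above (some headers omitted).
RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 55\r\n"
       b"Content-Type: text/html\r\n"
       b"Server: Microsoft-IIS/6.0\r\n"
       b"\r\n"
       b"<html>\r\n<body>\r\nThis is NOT Microsoft\r\n</body>\r\n</html>")

code, reason, headers, body = parse_response(RAW)
print(code, reason, headers["server"])                # 200 OK Microsoft-IIS/6.0
print(len(body) == int(headers["content-length"]))    # True
```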

Yes, we actually asked for the Host www.microsoft.com, and got a reply back. But this is in no way related to the www.microsoft.com you are familiar with. So, what happened?

We connected to the server www.ilopia.com on port 80, and the name was looked up to be 217.208.8.97. We then sent a request for the page index.html with www.microsoft.com as the Host header. The webserver does not care whether the domain name www.microsoft.com resolves to the same IP as the webserver. All the webserver cares about is that there was a request for this host, which is either configured on the webserver or not. It does not try to look the name up in any way using external resources. And since I have a host header for www.microsoft.com on this server, the client got back a page!
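
Since www.ilopia.com may no longer answer this way, the same effect can be reproduced locally. The sketch below is an assumption-laden stand-in, not the original setup: it starts a small Python test server on 127.0.0.1 (not IIS), with a hypothetical PAGES table playing the role of the host header configuration, and then connects by IP while claiming the Host www.microsoft.com.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of Host header values to page bodies.
PAGES = {
    "www.ilopia.com": b"<html><body>This is ilopia</body></html>",
    "www.microsoft.com": b"<html><body>This is NOT Microsoft</body></html>",
}


class HostHeaderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the Host header decides which site answers; no DNS lookup happens.
        body = PAGES.get(self.headers.get("Host", ""))
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(ip, port, host_header, path="/index.html"):
    """Connect by IP and port, then claim an arbitrary Host in the request."""
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host_header}\r\n"
               "Connection: close\r\n\r\n")
    with socket.create_connection((ip, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while chunk := conn.recv(4096):  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HostHeaderHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
ip, port = server.server_address
reply = fetch(ip, port, "www.microsoft.com")
print(reply.decode("iso-8859-1"))
server.shutdown()
server.server_close()
```

The client never resolves www.microsoft.com; the name only travels as text inside the request, and the server matches it against its own table.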

The log file on the webserver would now look like:

#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2004-03-06 22:01:59
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken 
2004-03-06 22:01:59 W3SVC2122583390 ILOPIA 217.208.8.97 GET /index.html - 80 - 192.168.0.5 HTTP/1.1 - - - www.microsoft.com 200 0 0 302 53 16645


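Since the request carried no header fields other than Host, the cs(User-Agent), cs(Cookie) and cs(Referer) columns are logged as dashes. The cs-uri-query and cs-username columns are dashes as well, because there was no query string and no authenticated user.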
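
To read such a log entry, the names on the #Fields line can be paired with the values on the entry line. The sketch below assumes values never contain spaces, so splitting on whitespace is enough; parse_log_line is a name chosen for this example.

```python
FIELDS_LINE = ("#Fields: date time s-sitename s-computername s-ip cs-method "
               "cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version "
               "cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status "
               "sc-substatus sc-win32-status sc-bytes cs-bytes time-taken ")
LOG_LINE = ("2004-03-06 22:01:59 W3SVC2122583390 ILOPIA 217.208.8.97 GET "
            "/index.html - 80 - 192.168.0.5 HTTP/1.1 - - - www.microsoft.com "
            "200 0 0 302 53 16645")


def parse_log_line(fields_line: str, log_line: str) -> dict:
    """Pair each field name from the #Fields directive with its logged value."""
    names = fields_line[len("#Fields:"):].split()
    values = log_line.split()
    if len(names) != len(values):
        raise ValueError("log line does not match the #Fields directive")
    return dict(zip(names, values))


entry = parse_log_line(FIELDS_LINE, LOG_LINE)
print(entry["cs-host"], entry["sc-status"])  # www.microsoft.com 200
print([name for name, value in entry.items() if value == "-"])
# ['cs-uri-query', 'cs-username', 'cs(User-Agent)', 'cs(Cookie)', 'cs(Referer)']
```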

Links

Host in RFC 2616
RFC 2616
