Web Technologies/2021-2022/Laboratory 1

Communication architectures

 * Client-server (e.g. Messenger, IRC, WhatsApp, Netflix) - centralized
 * Peer-to-peer (e.g. BitTorrent, Blockchain, Gnutella, Kazaa) - decentralized

About client-server architectures
Most inter-process communication uses the client-server model. These terms refer to the two processes which communicate with each other. One of the two processes, the client, connects to the other process, the server, typically to make a request for information. A good analogy is a person who makes a phone call to another person.

NOTE: The client needs to know of the existence of and the address of the server, but the server does not need to know the address of (or even the existence of) the client prior to the connection being established.

NOTE: Once a connection is established, both sides can send and receive information.

The system calls for establishing a connection are somewhat different for the client and the server, but both involve the basic construct of a socket, i.e. one end of an inter-process communication channel. Each of the two processes must establish its own socket.
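The two roles can be sketched in Python on a single machine. This is a minimal illustration, not a complete server: the loopback address 127.0.0.1, port 0 (which asks the operating system for any free port) and the names serve_once and run_demo are assumptions made for the example. It shows that the client must know the server's address in advance, while the server learns the client's address only when accept() returns, and that both sides can then send and receive.

```python
import socket
import threading


def serve_once(server_sock, seen):
    """Accept a single client, remember its address, answer its message."""
    # The server learns the client's address only at this point.
    conn, client_addr = server_sock.accept()
    with conn:
        seen.append(client_addr)
        request = conn.recv(1024)          # small message, one recv is enough here
        conn.sendall(b"server got: " + request)


def run_demo():
    # Server side: create a socket, bind it to an address, start listening.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))     # port 0: let the OS choose (assumption)
    server_sock.listen(1)
    server_addr = server_sock.getsockname()

    seen = []
    worker = threading.Thread(target=serve_once, args=(server_sock, seen))
    worker.start()

    # Client side: it has to know the server's address beforehand.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client_sock:
        client_sock.connect(server_addr)
        client_sock.sendall(b"hello")      # client -> server
        reply = client_sock.recv(1024)     # server -> client
        client_addr = client_sock.getsockname()

    worker.join()
    server_sock.close()
    return reply, seen[0], client_addr


if __name__ == "__main__":
    print(run_demo())
```

The server runs in a thread only so that both ends fit in one script; in the lab the client and the server would normally be two separate programs.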

There are two main kinds of sockets: TCP (Transmission Control Protocol) sockets, which are connection-oriented, and UDP (User Datagram Protocol) sockets, which are datagram-oriented.


 * TCP incurs an overhead for establishing a connection, but guarantees that the arrival of our packets is confirmed and that they are delivered in the correct order.
 * UDP has no connection overhead, but we do not know whether our packets have arrived or whether they arrived in the correct order (such mechanisms have to be implemented by the developers on top of UDP).
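As an illustration of the UDP side, the sketch below sends one datagram over the loopback interface and has the receiver answer with an application-level acknowledgement, which is the kind of mechanism the developer has to add on top of UDP. The function name udp_round_trip, the "ack:" prefix, the 2-second timeout and the use of 127.0.0.1 with an OS-chosen port are assumptions made for the example.

```python
import socket


def udp_round_trip(payload):
    """Send one datagram and wait for a hand-made acknowledgement."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))        # port 0: let the OS choose (assumption)
    receiver.settimeout(2.0)               # UDP may lose packets, so never wait forever

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.settimeout(2.0)
    try:
        # No connect/accept step: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())

        # recvfrom returns the data together with the sender's address.
        data, sender_addr = receiver.recvfrom(1024)

        # UDP itself confirms nothing, so the application acknowledges explicitly.
        receiver.sendto(b"ack:" + data, sender_addr)
        ack, _ = sender.recvfrom(1024)
    finally:
        sender.close()
        receiver.close()
    return data, ack


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

If the datagram or the acknowledgement were lost, recvfrom would raise a timeout error; a real application would then decide whether to resend.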

Most Internet services rely on the client-server communication model.

In what follows we present the steps required for both the client and the server to create sockets and communicate through them.

Testing client/server using GNU NetCat (nc) on UNIX systems:
You can use the GNU NetCat (nc) tool for testing TCP and UDP clients and servers.

Starting a TCP server with netcat:
$ nc -l 8081

Connecting to a TCP server with a netcat client:
$ nc localhost 8081

Starting a UDP server with netcat:
$ nc -u -l 8081

Connecting to a UDP server with a netcat client:
$ nc -u localhost 8081

Links:


 * Berkeley sockets
 * Internet sockets
 * OSI model
 * TCP/IP
 * TCP
 * Internet Address
 * Internet Port
 * DNS

TCP/IP
TCP/IP is the set of protocols used for communication on the Internet (and on other similar networks).

Its name comes from:


 * Transmission Control Protocol - TCP
 * Internet Protocol - IP

TCP is characterised by state: the two endpoints maintain a connection, and the data sent is acknowledged by the receiver. This is in contrast with UDP (User Datagram Protocol), which gives no guarantee that a message has arrived at its destination.

The TCP/IP model is composed of four layers: Link layer, Internet layer, Transport layer and Application layer. In contrast, the OSI (Open Systems Interconnection) model is made up of seven layers: Physical, Data link, Network, Transport, Session, Presentation and Application.

TCP and UDP work at the Transport layer.

URI -- Uniform Resource Identifier
A URI is used to identify resources. URIs are divided into the following subclasses:


 * URL (Uniform Resource Locator) - denotes a resource by its exact location, encoding the access method and its parameters.
 * URN (Uniform Resource Name) - denotes a resource by uniquely identifying it, independently of its location.

URL syntax
URL: http://<host>[:<port>]/[<path>][?<query>]

URN syntax
Specification: RFC 1630.



Example:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose
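The components in the example can also be extracted programmatically. The following sketch uses Python's standard urllib.parse module; the helper name split_uri and the shape of the returned dictionary are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def split_uri(uri):
    """Break a URI into the components named in the example."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,           # None when there is no authority part
        "port": parts.port,               # None when no port is given
        "path": parts.path,
        "query": parse_qs(parts.query),   # maps each parameter to a list of values
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
    print(split_uri("urn:example:animal:ferret:nose"))
```

For the URN there is no authority, so the host and port are None and everything after the scheme ends up in the path.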

Design criteria
(Quoted from RFC 1630)


 * Extensible: new naming schemes may be added later.
 * Complete: It is possible to encode any naming scheme.
 * Printable: It is possible to express any URI using 7-bit ASCII characters so that URIs may, if necessary, be passed using pen and ink.

HTTP
HTTP (Hypertext Transfer Protocol) is an application-level protocol for distributed, collaborative, hypermedia information systems.

It was the foundation for the creation of the World Wide Web (WWW) in 1990 by Tim Berners-Lee.

The HTTP/1.1 standard (RFC 2616) was released in June 1999. Among its features are persistent connections, pipelining, virtual hosting and chunked transfer encoding.

Actors

 * User agent: a client application which contacts a server on behalf of the user, for example:
    * download client
    * web browser
    * web spider

 * Server: a server application which receives requests and answers them.

 * Proxy: a server application that receives requests and decides whether to serve them itself or to pass them on to the real server, possibly through a chain of servers. It may modify the requests and responses it transfers. Common kinds:
    * caching proxy
    * anonymizing proxy
    * transparent proxy
    * reverse proxy

Protocol
Specifications: RFC 2616.

Request:

    [method] [resource] [version]
    [header]: [value]

Example request:

    GET /index.html HTTP/1.1
    Host: www.example.com

Response:

    [version] [status] [message]
    [header]: [value]

    [body]...

Example response:

    HTTP/1.1 200 OK
    Date: Mon, 23 May 2005 22:38:34 GMT
    Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Etag: "3f80f-1b6-3e1cb03b"
    Accept-Ranges: bytes
    Content-Length: 438
    Connection: close<CRLF>
    Content-Type: text/html; charset=UTF-8<CRLF>
    <CRLF>
    <Content ...>
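The message layout above can be turned into bytes and back with a few string operations. The sketch below is one possible way to do it, not a complete HTTP implementation: the function names build_request and parse_response are hypothetical, header names are lower-cased for easier lookup, and features such as repeated headers or chunked bodies are ignored.

```python
def build_request(method, resource, host, headers=None):
    """Assemble a request: request line, headers, then an empty line."""
    lines = [f"{method} {resource} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into status line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), message, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "www.example.com"))
    print(parse_response(
        b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n"
        b"Content-Type: text/html; charset=UTF-8\r\n\r\nhello"
    ))
```

The bytes returned by build_request are what would be passed to a socket's sendall, and parse_response would be applied to the bytes collected with recv.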

Request methods
HEAD - used to retrieve only the headers of the response. Useful for requesting the meta-information without the actual content.

GET - used to retrieve both the meta-information and the content of the resource. It is the most used method. It should be a safe method, i.e. it should have no side effects.

POST - used to send some data to be processed (for example, as the result of filling in and submitting a form).

PUT - used to replace a resource.

DELETE - used to remove a resource.

TRACE - used to debug or diagnose a request. The server should echo the received request back to the client.

OPTIONS - used to identify the capabilities of the server.

CONNECT - used to establish a tunnel to the target server through a proxy (for example, for HTTPS traffic).

Status codes

 * 2xx -- Success
 * 200 -- OK
 * 201 -- Created
 * 202 -- Accepted
 * 3xx -- Redirection
 * 301 -- Moved permanently
 * 302 -- Found (called "Moved temporarily" in HTTP/1.0)
 * 4xx -- Client error
 * 400 -- Bad request
 * 401 -- Unauthorised
 * 403 -- Forbidden
 * 404 -- Not found
 * 405 -- Method not allowed
 * 5xx -- Server error
 * 500 -- Internal server error
 * 501 -- Not implemented
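Since the first digit of a status code determines its class, a client can classify any code, including ones it has never seen, with an integer division. The sketch below illustrates this; the helper name describe_status is hypothetical, the 1xx class is included for completeness even though it is not listed above, and Python's http.HTTPStatus enumeration is used only to look up the standard reason phrases.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code):
    """Return the class of a status code and its standard reason phrase."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered values known to Python.
        phrase = "(no standard phrase)"
    return status_class, phrase


if __name__ == "__main__":
    for code in (200, 301, 404, 500, 599):
        print(code, describe_status(code))
```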

Headers
Headers carry the meta-information of a request or response: they describe characteristics of the connection and of the data sent or received. Some common headers, each followed by an example value:


 * Accept

Accept: text/plain


 * Accept-Charset

Accept-Charset: iso-8859-5


 * Accept-Encoding

Accept-Encoding: compress, gzip


 * Accept-Language

Accept-Language: da


 * Content-Encoding

Content-Encoding: gzip


 * Content-Language

Content-Language: da


 * Content-Length

Content-Length: 348


 * Content-Type

Content-Type: text/html; charset=utf-8


 * Host

Host: www.w3.org


 * If-Modified-Since

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT


 * Last-Modified

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT


 * Server

Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)


 * User-Agent

User-Agent: Mozilla/5.0 (Linux; X11; UTF-8)
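Some headers work in pairs. As a sketch of how If-Modified-Since and Last-Modified relate, the code below parses both date values with the standard email.utils module and decides whether a server would send the full resource (200) or answer 304 Not Modified. The function name conditional_status is hypothetical and the decision rule is deliberately simplified; a real server takes more conditions into account.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Pick the status for a conditional GET from the two date headers."""
    requested = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    # Changed after the date the client already has: send it again (200).
    # Otherwise the client's cached copy is still valid (304 Not Modified).
    return 200 if modified > requested else 304


if __name__ == "__main__":
    print(conditional_status(
        "Sat, 29 Oct 1994 19:43:31 GMT",   # If-Modified-Since sent by the client
        "Tue, 15 Nov 1994 12:45:26 GMT",   # Last-Modified known to the server
    ))
```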

Links:


 * HTTP -- Wikipedia
 * List of headers -- Wikipedia
 * List of status codes -- Wikipedia
 * MIME type

Python's socket module

 * socket
 * socket.AF_UNIX
 * socket.AF_INET
 * socket.AF_INET6
 * socket.SOCK_STREAM
 * socket.SOCK_DGRAM
 * socket.socket
 * socket.bind
 * socket.listen
 * socket.accept
 * socket.close
 * socket.recv
 * socket.send
 * socket.sendall
 * socket.setsockopt
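The calls listed above can be combined as in the following sketch of a server that sends back, in upper case, whatever a single client sends. It is only an illustration: the names start_upper_server and ask_server, the loopback address, the OS-chosen port and the 4096-byte buffer are assumptions. Because TCP is a byte stream, recv may return only part of the data, so both sides read in a loop until recv returns an empty result, which means the peer has finished sending.

```python
import socket
import threading


def start_upper_server(host="127.0.0.1", port=0):
    """Start a one-client server in a background thread; return its address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow the address to be reused immediately after a restart.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))              # port 0: let the OS choose (assumption)
    server.listen(5)
    address = server.getsockname()

    def handle_one_client():
        conn, _client_addr = server.accept()
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:              # empty result: the client finished sending
                    break
                conn.sendall(chunk.upper())  # sendall keeps sending until all bytes are out
        server.close()

    thread = threading.Thread(target=handle_one_client, daemon=True)
    thread.start()
    return address, thread


def ask_server(address, message):
    """Send a message and collect the complete answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(address)
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)    # tell the server that no more data follows
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:                  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    addr, worker = start_upper_server()
    print(ask_server(addr, b"hello"))
    worker.join()
```

A server meant to handle several clients would call accept in a loop instead of once.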

Miscellaneous - connecting to remote machines through digital certificates
During the following labs you may need to connect to remote machines in order to publish your web pages and projects. Because password authentication requires you to enter a username and password each time, below is an easier way, which is based on digital certificates (an SSH key pair). After following the instructions below you will be able to connect from any Linux machine that has the generated private key:


 * cd ~/.ssh
 * ssh-keygen -t rsa
 * choose no passphrase when asked and accept the default filename of id_rsa
 * scp id_rsa.pub <user>@<host>:.ssh/authorized_keys
 * provide your password when asked and that’s the last time you’ll have to do it!

If you wish to connect to several remote machines you can reuse the created id_rsa.pub and copy it to each of them as indicated above.

Exercises

 * Create a simple chat application such that:
 * you have one client and one server
 * the client (human user) must send messages to the server (computer) which in turn will respond to them automatically:
 * For example:
 * client: hello
 * server: hi! what's your name?
 * client: John
 * server: nice to meet you John!
 * client: ...


 * Implement a simple HTTP client application (bonus - for lab 3) which:
 * takes a URL as a command-line argument
 * parses the given URL to obtain all the needed information, or uses the default values for the missing information
 * contacts the specified web server
 * requests the resource
 * interprets the received status line
 * prints the response body
 * handles the most common errors that can be encountered

IMPORTANT: It is forbidden to use an existing HTTP library or class; you should implement the HTTP protocol yourself. You may use the URL class for parsing the argument.

HINT: Use a socket connection to retrieve data (client).

Alexandru Munteanu, 26-09-2021, alexandru.munteanu@e-uvt.ro