TCP/IP

 

Introduction to TCP/IP
Summary: TCP and IP were developed by a Department of Defense (DOD) research project to connect a number of different networks designed by different vendors into a network of networks (the "Internet"). It was initially successful because it delivered a few basic services that everyone needs (file transfer, electronic mail, remote logon) across a very large number of client and server systems. Several computers in a small department can use TCP/IP (along with other protocols) on a single LAN. The IP component provides routing from the department to the enterprise network, then to regional networks, and finally to the global Internet. On the battlefield a communications network will sustain damage, so the DOD designed TCP/IP to be robust and automatically recover from any node or phone line failure. This design allows the construction of very large networks with less central management. However, because of the automatic recovery, network problems can go undiagnosed and uncorrected for long periods of time.
As with all other communications protocols, TCP/IP is composed of layers:
- IP - is responsible for moving packets of data from node to node. IP forwards each packet based on a four byte destination address (the IP number). The Internet authorities assign ranges of numbers to different organizations. The organizations assign groups of their numbers to departments. IP operates on gateway machines that move data from department to organization to region and then around the world.
 - TCP - is responsible for verifying the correct delivery of data from client to server. Data can be lost in the intermediate network. TCP adds support to detect errors or lost data and to trigger retransmission until the data is correctly and completely received.
 - Sockets - is a name given to the package of subroutines that provide access to TCP/IP on most systems.
 
Network of Lowest Bidders
The  Army puts out a bid on a computer and DEC wins the bid. The Air Force  puts out a bid and IBM wins. The Navy bid is won by Unisys. Then the  President decides to invade Grenada and the armed forces discover that  their computers cannot talk to each other. The DOD must build a  "network" out of systems each of which, by law, was delivered by the  lowest bidder on a single contract. 
 The  Internet Protocol was developed to create a Network of Networks (the  "Internet"). Individual machines are first connected to a LAN (Ethernet  or Token Ring). TCP/IP shares the LAN with other uses (a Novell file  server, Windows for Workgroups peer systems). One device provides the  TCP/IP connection between the LAN and the rest of the world. 
To ensure that all types of systems from all vendors can communicate, TCP/IP is absolutely standardized on the LAN. However, larger networks based on long distances and phone lines are more volatile. In the US, many large corporations would wish to reuse large internal networks based on IBM's SNA. In Europe, the national phone companies traditionally standardize on X.25. However, the sudden explosion of high speed microprocessors, fiber optics, and digital phone systems has created a burst of new options: ISDN, frame relay, FDDI, Asynchronous Transfer Mode (ATM). New technologies arise and become obsolete within a few years. With cable TV and phone companies competing to build the National Information Superhighway, no single standard can govern citywide, nationwide, or worldwide communications.
The  original design of TCP/IP as a Network of Networks fits nicely within  the current technological uncertainty. TCP/IP data can be sent across a  LAN, or it can be carried within an internal corporate SNA network, or  it can piggyback on the cable TV service. Furthermore, machines  connected to any of these networks can communicate to any other network  through gateways supplied by the network vendor. 
Addresses
Each  technology has its own convention for transmitting messages between two  machines within the same network. On a LAN, messages are sent between  machines by supplying the six byte unique identifier (the "MAC"  address). In an SNA network, every machine has Logical Units with their  own network address. DECNET, Appletalk, and Novell IPX all have a scheme  for assigning numbers to each local network and to each workstation  attached to the network. 
On  top of these local or vendor specific network addresses, TCP/IP assigns  a unique number to every workstation in the world. This "IP number" is a  four byte value that, by convention, is expressed by converting each  byte into a decimal number (0 to 255) and separating the bytes with a  period. For example, the PC Lube and Tune server is 130.132.59.234. 
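The dotted notation described above can be sketched in a few lines of Python. The function names `to_dotted` and `from_dotted` are hypothetical helpers chosen for this illustration; the only address used is the PC Lube and Tune server from the text.

```python
def to_dotted(value: int) -> str:
    """Render a four byte IP number as decimal bytes separated by periods."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("an IP number must fit in four bytes")
    # Split the value into four bytes, most significant byte first.
    return ".".join(str(byte) for byte in value.to_bytes(4, "big"))


def from_dotted(text: str) -> int:
    """Convert dotted notation back into the underlying four byte value."""
    parts = [int(part) for part in text.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("expected four decimal numbers between 0 and 255")
    return int.from_bytes(bytes(parts), "big")


if __name__ == "__main__":
    number = from_dotted("130.132.59.234")
    print(hex(number))        # the same four bytes written in hexadecimal
    print(to_dotted(number))  # back to the dotted form
```

The dotted form is only a convention for people. The value that travels in each packet is the four byte number itself.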
An  organization begins by sending electronic mail to  Hostmaster@INTERNIC.NET requesting assignment of a network number. It is  still possible for almost anyone to get assignment of a number for a  small "Class C" network in which the first three bytes identify the  network and the last byte identifies the individual computer. The author  followed this procedure and was assigned the numbers 192.35.91.* for a  network of computers at his house. Larger organizations can get a "Class  B" network where the first two bytes identify the network and the last  two bytes identify each of up to 64 thousand individual workstations.  Yale's Class B network is 130.132, so all computers with IP address  130.132.*.* are connected through Yale. 
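As a rough sketch of the split described above, the following Python separates an address into its network part and its computer part. The helper names `split_address` and `host_capacity` are assumptions made for this example, and the caller states how many leading bytes belong to the network (two for Class B, three for Class C).

```python
def split_address(address: str, network_bytes: int) -> tuple[str, str]:
    """Split a dotted address into (network part, host part)."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four bytes in dotted notation")
    if not 1 <= network_bytes <= 3:
        raise ValueError("network part must be one to three bytes")
    return ".".join(parts[:network_bytes]), ".".join(parts[network_bytes:])


def host_capacity(network_bytes: int) -> int:
    """Count the distinct values the remaining host bytes can hold."""
    return 256 ** (4 - network_bytes)


if __name__ == "__main__":
    # Yale's Class B network: two bytes of network, two bytes of workstation.
    print(split_address("130.132.59.234", 2))
    # Two host bytes give the "64 thousand" figure quoted in the text.
    print(host_capacity(2))
```

The count returned by `host_capacity` is the raw number of byte combinations; a real network usually reserves a few of those values for special purposes.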
The  organization then connects to the Internet through one of a dozen  regional or specialized network suppliers. The network vendor is given  the subscriber network number and adds it to the routing configuration  in its own machines and those of the other major network suppliers. 
There  is no mathematical formula that translates the numbers 192.35.91 or  130.132 into "Yale University" or "New Haven, CT." The machines that  manage large regional networks or the central Internet routers managed  by the National Science Foundation can only locate these networks by  looking each network number up in a table. There are potentially  thousands of Class B networks, and millions of Class C networks, but  computer memory costs are low, so the tables are reasonable. Customers  that connect to the Internet, even customers as large as IBM, do not  need to maintain any information on other networks. They send all  external data to the regional carrier to which they subscribe, and the  regional carrier maintains the tables and does the appropriate routing. 
New  Haven is in a border state, split 50-50 between the Yankees and the Red  Sox. In this spirit, Yale recently switched its connection from the  Middle Atlantic regional network to the New England carrier. When the  switch occurred, tables in the other regional areas and in the national  spine had to be updated, so that traffic for 130.132 was routed through  Boston instead of New Jersey. The large network carriers handle the  paperwork and can perform such a switch given sufficient notice. During a  conversion period, the university was connected to both networks so  that messages could arrive through either path. 
Subnets
Although the individual subscribers do not need to tabulate network numbers or provide explicit routing, it is convenient for most Class B networks to be internally managed as a much smaller and simpler version of the larger network organizations. It is common to subdivide the two bytes available for internal assignment into a one byte department number and a one byte workstation ID.

The enterprise network is built using commercially available TCP/IP router boxes. Each router has small tables with 255 entries to translate the one byte department number into the selection of a destination Ethernet connected to one of the routers. Messages to the PC Lube and Tune server (130.132.59.234) are sent through the national and New England regional networks based on the 130.132 part of the number. Arriving at Yale, the 59 department ID selects an Ethernet connector in the C&IS building. The 234 selects a particular workstation on that LAN. The Yale network must be updated as new Ethernets and departments are added, but it is not affected by changes outside the university or the movement of machines within the department.
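The department lookup can be pictured as a small table keyed by the third byte of the address. The sketch below is illustrative only: the table contents, the label for the Ethernet, and the function name `route_within_enterprise` are hypothetical, and a real router box keeps this information in its own configuration format.

```python
# Hypothetical table: one byte department number -> destination Ethernet.
DEPARTMENT_ETHERNETS = {
    59: "C&IS building Ethernet",
}


def route_within_enterprise(address, network="130.132",
                            table=DEPARTMENT_ETHERNETS):
    """Return (ethernet, workstation) for a local address, or None if the
    address belongs to some other network and must leave the enterprise."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four bytes in dotted notation")
    if ".".join(parts[:2]) != network:
        return None  # not ours: hand it to the regional carrier
    department, workstation = int(parts[2]), int(parts[3])
    if department not in table:
        raise LookupError(f"no Ethernet configured for department {department}")
    return table[department], workstation


if __name__ == "__main__":
    print(route_within_enterprise("130.132.59.234"))
    print(route_within_enterprise("192.35.91.7"))
```

Adding a department means adding one table entry. Nothing outside the enterprise has to change, which is the point made above.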
An Uncertain Path
Every time a message arrives at an IP router, it makes an individual decision about where to send it next. There is no concept of a session with a preselected path for all traffic. Consider a company with facilities in New York, Los Angeles, Chicago and Atlanta. It could build a network from four phone lines forming a loop (NY to Chicago to LA to Atlanta to NY). A message arriving at the NY router could go to LA via either Chicago or Atlanta. The reply could come back the other way.
How does the router make a decision between routes? There is no correct answer. Traffic could be routed by the "clockwise" algorithm (go NY to Atlanta, LA to Chicago). The routers could alternate, sending one message to Atlanta and the next to Chicago. More sophisticated routing measures traffic patterns and sends data through the least busy link.
If one phone line in this network breaks down, traffic can still reach its destination through a roundabout path. After losing the NY to Chicago line, data can be sent NY to Atlanta to LA to Chicago. This provides continued service though with degraded performance. This kind of recovery is the primary design feature of IP. The loss of the line is immediately detected by the routers in NY and Chicago, but somehow this information must be sent to the other nodes. Otherwise, LA could continue to send NY messages through Chicago, where they arrive at a "dead end." Each network adopts some Router Protocol which periodically updates the routing tables throughout the network with information about changes in route status.
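The four-city loop is small enough to model directly. The sketch below is not a Router Protocol; it is a plain breadth-first search, used here only to show that a path still exists after one line fails. The names `LOOP` and `find_path` are made up for this example.

```python
from collections import deque

# The four phone lines forming the loop described in the text.
LOOP = [("NY", "Chicago"), ("Chicago", "LA"), ("LA", "Atlanta"), ("Atlanta", "NY")]


def find_path(links, start, goal):
    """Return the shortest list of cities from start to goal, or None."""
    neighbors = {}
    for a, b in links:  # every line carries traffic in both directions
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {start: None}  # how each city was first reached
    queue = deque([start])
    while queue:
        city = queue.popleft()
        if city == goal:
            path = []
            while city is not None:  # walk back to the start
                path.append(city)
                city = previous[city]
            return path[::-1]
        for nxt in neighbors.get(city, []):
            if nxt not in previous:
                previous[nxt] = city
                queue.append(nxt)
    return None  # no surviving route


if __name__ == "__main__":
    print(find_path(LOOP, "NY", "Chicago"))
    surviving = [link for link in LOOP if link != ("NY", "Chicago")]
    print(find_path(surviving, "NY", "Chicago"))
```

With all four lines up, the search returns the direct NY to Chicago hop. With that line removed, it returns the roundabout path through Atlanta and LA. A real network reaches the same result only after the routing update has propagated to every router.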
If the size of the network grows, then the complexity of the routing updates will increase as will the cost of transmitting them. Building a single network that covers the entire US would be unreasonably complicated. Fortunately, the Internet is designed as a Network of Networks. This means that loops and redundancy are built into each regional carrier. The regional network handles its own problems and reroutes messages internally. Its Router Protocol updates the tables in its own routers, but no routing updates need to propagate from a regional carrier to the NSF spine or to the other regions (unless, of course, a subscriber switches permanently from one region to another).
Undiagnosed Problems
IBM designs its SNA networks to be centrally managed. If any error occurs, it is reported to the network authorities. By design, any error is a problem that should be corrected or repaired. IP networks, however, were designed to be robust. In battlefield conditions, the loss of a node or line is a normal circumstance. Casualties can be sorted out later on, but the network must stay up. So IP networks are robust. They automatically (and silently) reconfigure themselves when something goes wrong. If there is enough redundancy built into the system, then communication is maintained.
In 1975 when SNA was designed, such redundancy would have been prohibitively expensive, or it might have been argued that only the Defense Department could afford it. Today, however, simple routers cost no more than a PC. Still, the TCP/IP design philosophy that "errors are normal and can be largely ignored" produces problems of its own.
Data traffic is frequently organized around "hubs," much like airline traffic. One could imagine an IP router in Atlanta routing messages for smaller cities throughout the Southeast. The problem is that data arrives without a reservation. Airline companies experience the problem around major events, like the Super Bowl. Just before the game, everyone wants to fly into the city. After the game, everyone wants to fly out. Imbalance occurs on the network when something new gets advertised. Adam Curry announced the server at "mtv.com" and his regional carrier was swamped with traffic the next day. The problem is that messages come in from the entire world over high speed lines, but they go out to mtv.com over what was then a slow speed phone line.
Occasionally a snow storm cancels flights and airports fill up with stranded passengers. Many go off to hotels in town. When data arrives at a congested router, there is no place to send the overflow. Excess packets are simply discarded. It becomes the responsibility of the sender to retry the data a few seconds later and to persist until it finally gets through. This recovery is provided by the TCP component of the Internet protocol.
TCP was designed to recover from node or line failures where the network propagates routing table changes to all router nodes. Since the update takes some time, TCP is slow to initiate recovery. The TCP algorithms are not tuned to optimally handle packet loss due to traffic congestion. Instead, the traditional Internet response to traffic problems has been to increase the speed of lines and equipment in order to stay ahead of growth in demand.
TCP treats the data as a stream of bytes. It logically assigns a sequence number to each byte. The TCP packet has a header that says, in effect, "This packet starts with byte 379642 and contains 200 bytes of data." The receiver can detect missing or incorrectly sequenced packets. TCP acknowledges data that has been received and retransmits data that has been lost. The TCP design means that error recovery is done end-to-end between the Client and Server machine. There is no formal standard for tracking problems in the middle of the network, though each network has adopted some ad hoc tools.
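To make the byte numbering concrete, here is a minimal sketch of how a receiver could spot a hole in the stream. It is a simplification and not the actual TCP algorithm: each packet is reduced to a (starting byte, length) pair, the function name `missing_ranges` is hypothetical, and the wraparound of real sequence numbers is ignored.

```python
def missing_ranges(packets, expected_start):
    """Return the byte ranges not yet received, as (first, end) pairs
    where `end` is one past the last missing byte.

    packets: iterable of (sequence_number, length) pairs, in any order.
    expected_start: sequence number of the first byte the receiver wants.
    """
    gaps = []
    next_byte = expected_start
    for start, length in sorted(packets):  # put packets back in stream order
        if start > next_byte:
            gaps.append((next_byte, start))  # bytes skipped over: lost or late
        next_byte = max(next_byte, start + length)
    return gaps


if __name__ == "__main__":
    # "This packet starts with byte 379642 and contains 200 bytes of data."
    # The 200 bytes that should follow never arrive, but a later packet does.
    received = [(380042, 200), (379642, 200)]
    print(missing_ranges(received, 379642))
```

In this example the receiver holds bytes 379642 through 379841 and 380042 through 380241, so the reported gap covers the 200 bytes starting at 379842. The receiver would acknowledge only the data up to that gap, and the sender would eventually retransmit the missing bytes.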
Need to Know
There are three levels of TCP/IP knowledge. Those who administer a regional or national network must design a system of long distance phone lines, dedicated routing devices, and very large configuration files. They must know the IP numbers and physical locations of thousands of subscriber networks. They must also have a formal network monitor strategy to detect problems and respond quickly.
Each  large company or university that subscribes to the Internet must have  an intermediate level of network organization and expertise. A half  dozen routers might be configured to connect several dozen departmental  LANs in several buildings. All traffic outside the organization would  typically be routed to a single connection to a regional network  provider. 
However,  the end user can install TCP/IP on a personal computer without any  knowledge of either the corporate or regional network. Three pieces of  information are required: 
- The IP address assigned to this personal computer
 - The part of the IP address (the subnet mask) that distinguishes other machines on the same LAN (messages can be sent to them directly) from machines in other departments or elsewhere in the world (which are sent to a router machine)
 - The IP address of the router machine that connects this LAN to the rest of the world.
 
In the case of the PCLT server, the IP address is 130.132.59.234. Since the first three bytes designate this department, a "subnet mask" is defined as 255.255.255.0 (255 is the largest byte value and represents the number with all bits turned on). It is a Yale convention (which we recommend to everyone) that the router for each department have station number 1 within the department network. Thus the PCLT router is 130.132.59.1, and the PCLT server is configured with the values:
- My IP address: 130.132.59.234
 - Subnet mask: 255.255.255.0
 - Default router: 130.132.59.1
 
The  subnet mask tells the server that any other machine with an IP address  beginning 130.132.59.* is on the same department LAN, so messages are  sent to it directly. Any IP address beginning with a different value is  accessed indirectly by sending the message through the router at  130.132.59.1 (which is on the departmental LAN).
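The decision the subnet mask drives can be sketched with a bitwise AND. The function name `next_hop` is an assumption made for this illustration, and the two destination addresses in the usage lines are made-up examples. Python's standard `ipaddress` module is used only to turn dotted notation into an integer.

```python
import ipaddress


def next_hop(my_ip: str, subnet_mask: str, default_router: str,
             destination: str) -> str:
    """Return the address the message should be handed to on the LAN."""
    mine = int(ipaddress.IPv4Address(my_ip))
    mask = int(ipaddress.IPv4Address(subnet_mask))
    dest = int(ipaddress.IPv4Address(destination))
    # Bits that are on in the mask select the network and department part.
    if (mine & mask) == (dest & mask):
        return destination   # same LAN: send directly
    return default_router    # elsewhere: let the router forward it


if __name__ == "__main__":
    config = ("130.132.59.234", "255.255.255.0", "130.132.59.1")
    print(next_hop(*config, "130.132.59.17"))  # hypothetical machine on the LAN
    print(next_hop(*config, "192.35.91.7"))    # hypothetical outside machine
```

The first call returns the destination itself because its first three bytes match the server's. The second returns 130.132.59.1 because the masked bytes differ, so the message goes to the router.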
