=head1 NAME DXSpiderWeb Orthogonal Communications Protocol $Revision$ =head1 SYNOPSIS ,,,,,|,... =head1 ABSTRACT For many years DX Clusters have used a protocol which was designed for a non-looped tree of nodes. This has probably never, reliably, been achieved in practice; certainly not recently. This document describes a complete replacement for that protocol. It allows a fully looped network, is inherently extensible and should be simple to implement (especially in perl). All implementations of this protocol shall B use this protocol for inter-node communications. =head1 DESCRIPTION This protocol is encoded in UTF8 with HTTP style escaping. It is designed to be an extensible basis for any type of one to many "instant" line-based communications tasks. This protocol is designed to be flood routed in a meshed network in as efficient a manner as possible. Each message consists of a L and a L. The two sections are separated with the '|' character and the whole message is terminated in the standard RFC/Internet manner with the ascii characters. It follows that these characters (as well as a small number of other reserved characters) can only be sent escaped. This is described further in the L. Most of this document is concerned with the L, however some L which all implementation should issue and must accept are described. =head1 Routing Section The application that implements this protocol is essentially a line oriented message router. One line equals one message. Each line is effectively a datagram. It is assumed that nodes are connected to each other using a "reliable" streaming protocol such as TCP/IP or AX25. Having said that: in context, messages in this protocol could be multi/broadcast, either "as is" or wrapped in some other framing protocol. Because this is an unreliable, best effort, "please route my packets through your node" protocol, there is no guarantee that a message will get to the other side of a mesh of nodes. There may be a discontinuity either caused by outage or deliberate filtering. However, as it is envisaged that most messages will be flood routed or, in the case of directed messages (those that have L and/or L fields) down some/most/all interfaces showing a route for that direction, it is unlikely that messages will be lost in practice. =head2 Field Description Only the first three fields in the L are compulsory and indicate that this is a broadcast to be sent to all nodes coming from the L. If the message needs to be identified as coming from a user on a node, then the L field is added. Adding a L and/or L field will restrict the destinations or recipients that receive this message. The L field is incremented on receipt of a message on a node. Fields are separated by the comma ',' character with the last field required followed by the vertical bar '|' character. If trailing fields are missed out then superfluous commas can also be left out. If intervening fields are missing then no space needs to be left for the separating comma. The characters allowed in the routing section are restricted. Any invalid characters in any field will cause the whole message to be silently dropped. More detailed descriptions of the fields follow: =over =item B This is a compulsory field. It is the name of the originating node. The field can contain up to 12 characters in the set [-A-Z0-9_] in any order. Higher layers may restrict this further. The field must not be changed by any other node. =item B This is a compulsory field. It is a 10 hexadecimal digit string which consists of a day no (1-31), a flag to indicate NTP syncronisation in use, seconds within that day (0-86399) [total of 6 hex digits] that are concatenated with a sequence number (0-65535) [4 hex digits] making the total of 10 hexadecimal digits. The date portion is constructed as: my $date = ((((gmtime)[3] < 1) | $ntpflag) < 18) | (time % 86400); The sequence number is simply an unsigned short (or 16 bit) number starting at 0. Each message originated at this node will increment the sequence number. =item B This is a compulsory field. It is the number of hops from the originating node. It is incremented immediately on receipt and before determining its value. So the originating node sends a message with a L of 0, the neighbouring nodes must increment this field before passing it on to higher layers for onward processing. Implementations may have an upper limit to this field and may silently drop incoming messages with a L count greater than the limit. =item B This field is optional. It is the identifier of the originating user. If it is missing then the message is assumed to come from the originating node itself. It can consist of up to 12 characters in the set [-A-Z0-9_] in any order. Higher layers may restrict this further. =item B This field is optional. It is a string of up to 12 characters in the set [-A-Z0-9_] in any order. This field is used either to indicate particular node destination or to differentiate this broadcast in some way by making this message as a member of a L. Any message can be sent down any L. The names of Ls and their usage is entirely up to the implementor. It is assumed that node names can be differentiated from user names and L names. If the field is set to a particular node destination, it will be routed (rather than broadcast) to that node. However, any intervening nodes are free to duplicate the message and send it down more than one, likely looking, interface - depending on any network policies that may pertain. =item B This field is optional. It is a string of up to 12 characters in the set [-A-Z0-9_] in any order. Higher layers may restrict this further. Conventionally this field is used to indicate the user to whom this message is directed. In an ideal world the L field will be set, by the originating node, to the identifier of the node on which this user resides. If the L field is not set then this message will be broadcast. However, should a node become apparent (on route) then nodes are free to fill in the L field and proceed with a more directed approach. If it becomes apparent (on route) that there may be more than one possible L destination for a L then a node may duplicate the message (keeping the same L) and route it onwards. Because of the L inherent in the system, it is indeterminate as to which destination will receive the message. It is possible for all or just some destinations to receive the message. The tuple (L, L) will determine uniqueness. This field can, in the case where L is set to the name of a node, be set to a L. If this is the case then this will cause this message to be sent to a L on the L node only. =back =head2 Channel Channels are a concept very similar to that on IRC. It is a way of segregating data flows in a network. In principle, subject to local policy or application requirements, any data (or L) can be sent down any channel. It is up to the implementation whether to use this feature or not. =head2 Routing It is assumed that nodes will be connected in a looped network with more than one route available (in many cases) to another node. In anycase, most traffic is not directed, but broadcast to all users on all nodes. Each message is uniquely identified by the (L,L) tuple. The basic system will learn which interfaces can see what nodes by looking at the tuple and merging that with the L count. Each interface remembers the latest L with the lowest L for each L that arrives on that interface. It also remembers the number of messages for that L that has been received on that interface. Any message for onward broadcast is duplicated and sent out on all interfaces that it did not come in on. Any message that is directed to a particular node will be sent out on the "best" interface based on routing information gathered so far. If there is more than one possible route then, depending on network or local policy, the message may be duplicated and sent on other interfaces as well. =head2 DeDuplication On receipt of a message, its unique tuple (L,L) is checked against a hash table. If it exists: the message is silently dropped. If it does not exist in the hash table then the tuple is added. The hash table is periodically cleaned, removing tuples that have expired. The length of time a tuple remains in the hash table is implementation dependant but could easily be several days, if required. This mechanism only ensures that a message broadcast around the network travels the least distance and through the fewest nodes possible. It is up to higher layers to make sure that data carried is not, itself, duplicated! =head2 Examples # on link startup from GB7BAA (both sides hello) GB7TLH,3D02350001,0,GB7BAA|HELLO,Aranea,1.2,24.123 GB7BAA,3D02355421,1,GB7TLH|HELLO,Aranea,1.1,23.245 # on user startup to GB7TLH GB7TLH,3D042506F2,0,G1TLH|HELLO,PClient,1.3 # on user disconnection GB7TLH,3D9534F32D,0,G1TLH|BYE # a talk (actually 'text') message to a user (some distance away # from the origin node) GB7TLH,3D03450019,3,G1TLH,GB7BAA,G8TIC|T,Hiya Mike what's happening? # a talk/chat/text message to a channel or group GB7TLH,0413525F23,2,G1TLH,VHF|T,2m is opening on MS # a ping to find the whereabouts and distance of a user from a node # the hex number on the end is the ping ID GB7TLH,1512346543,0,,,G7BRN|PING,9F4D # the same from a user on GB7TLH GB7TLH,1512346543,0,G1TLH,,G7BRN|PING,23 # this effectively asks whether the user is on-line on a particular node GB7TLH,1512346543,0,G1TLH,GB7DJK,G7BRN|PING,35DE # A possible reply, same ID as ping followed by the no of hops on the # ping that was received GB7DJK,1512450534,3,G7BRN,GB7TLH,G1TLH|PONG,35DE,3 =head1 Command Section The L of the message contains the actual data being passed. It is called the Command Section because all commands are identified with a L which is implemented by the software using this protocol. The L is separated from its data by a comma ','. All fields in any subsequent data shall be separated by a comma ','. All fields shall be HTTP encoded such that reserved characters (comma ',', vertical bar '|', percent '%', equals '=' and non printable characters less than 127 (or %7F in hex) [including newline and carraige return] are tranlated to their two hex digit equivalent preceeded by the percent '%' character. For example: "%0D%0A" is "". "hello%2C there" is "hello, there" This is not standard CSV, fields are not quoted (delimited with either ' or "). All national characters above 127 are UTF8 encoded in the standard perl 5.8.x way. It follows that all (perl) programs that are written according to this specification must say: use UTF8; A message (or line) is terminated with 0x0d 0x0a. Incoming messages must be accepted even when terminated with just . Care must be taken to make sure that fields have any reserved characters encoded. In particular: it is perfectly permissible to have characters in a field - so long as they are escaped. Fields come in two styles: either simple fields (just containing data) or B=B pairs. Each pair must be separated from the next by a comma ','. The B must consist of the set of characters [a-z0-9_] (ie lowercase letters, digits and underscore), with a leading letter. The B must be HTTP encoded as specified above and can otherwise contain any character. There is no maximum size specified for a message. It is up to each implimentation to enforce one (if only for their own protection). =head2 Tag The L consists of string of uppercase letters and digits, starting with a leading, uppercase, letter. Tags should be as short as is meaningful. Valid tags would be: DX PC23 ANN Invalid tags include: 1AAA dx Ann =head2 Standard Commands There are a number of L which must be accepted by all implementations. =over =item B,,,, Command sent on connection to another node. Both sides send their information to the other. All the possible arguments are optional, although some of the arguments should be sent in order to help diagnose problems. This command is broadcast. =item B, Command sent to all connections when the software is shutting down. This is sent by the node just before shutdown occurs. This is really only used to help the network prune its routing tables. It isn't a requirement. The field is optional. =item B,, Command sent when a node has disconnected from this node. This message is sent when an interface shuts down. It need not be sent if a L from an interface for that node has just been received. This command should be broadcast. The is mandatory and is the name of the interface that has just disconnected. =item B, Command to send a ping to a node or user. This command is used both by the software and users to determine a) whether a node or user exists and b) how good the path is between them. =item B,, Command to reply to a successful ping =back =head1 AUTHOR Dirk Koopman, G1TLH, Edjk@tobit.co.ukE =head1 COPYRIGHT AND LICENSE Copyright 2004 by Dirk Koopman, G1TLH This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. $Revision$ =cut