X-Git-Url: http://gb7djk.dxcluster.net/gitweb/gitweb.cgi?a=blobdiff_plain;f=techdoc%2Fprotocol.pod;h=426241a4efb68f25e54352c7633b7a038cf8a974;hb=refs%2Fheads%2Fnewdisc;hp=d429d044eace8e7c4b028c7e0e50389909db619b;hpb=3c74f791e2f4c3e534c837f3a7d5ca596c76a5ee;p=spider.git diff --git a/techdoc/protocol.pod b/techdoc/protocol.pod index d429d044..426241a4 100644 --- a/techdoc/protocol.pod +++ b/techdoc/protocol.pod @@ -1,85 +1,198 @@ +# -*- perl -*- =head1 NAME -DXSpiderWeb Orthogonal Communications Protocol $Revision$ +Aranea Orthogonal Communications Protocol + +$Revision$ =head1 SYNOPSIS - ,,,,,|,... + ,,,[,]|,... =head1 ABSTRACT For many years DX Clusters have used a protocol which was designed -for a non-looped tree of nodes. This has probably never, reliably, -been achieved in practice; certainly not recently. This document -describes a complete replacement for that protocol. It allows a +for a non-looped tree ofLs. This environment has probably never, reliably, +been achieved in practice; certainly not recently. + +There have always been loops, sometimes bringing the network to its +knees. In modern usage, both in order to get some resilience and also +to expedite information flow, we use internet based, deliberately +looped networks with filtering. Whilst this works, after a fashion, there +are all sorts of problems that the current PC protocol can never +address. + +This document +describes a complete replacement for the PC protocol. It allows a fully looped network, is inherently extensible and should be simple to implement (especially in perl). -All implementations of this protocol shall B use this protocol +All implementations shall use b this protocol for inter-node communications. =head1 DESCRIPTION -This protocol is encoded in UTF8 with HTTP style escaping. It is -designed to be an extensible basis for any type of one to many +This protocol is +designed to be an extensible basis for any type of one -> many "instant" line-based communications tasks. 
-This protocol is designed to be flood routed in a meshed network in -as efficient a manner as possible. +The protocol is designed to be flood routed in a meshed network in +as efficient a manner as possible. The reason we have chosen this +mechanism is that most L need to be broadcast to allLs. + +Experience has shown thatLs will appear and (more infrequently) +disappear without much (or any) notice. +Therefore, the constantly changing and uncoordinated +nature of the network doesn't lend itself to fixed routing policies. Therefore, +whilst metrics and routing tables (more like routing hint tables) will be +built up over time, an aggressive aging algorithm will also be employed to prevent +a lot of stale routing information being retained. + +Having said that: where routes have +been learned through past traffic, and this data is recent, then direct routing should be used. +Those L that could be routed (likely to be mainly single line one to +one "talk" L) that, anyway, +happen sufficiently infrequently that, should they need to be flood routed +(because no route has been learned yet), it is a small cost overall. + +=head1 Messages + +A message is a single line of UTF8 encoded and HTTP escaped text +terminated in the standard internet manner with a . Each message consists of a L and a L. -The two sections are separated with the '|' character and the whole -message is terminated in the standard RFC/Internet manner with the -ascii characters. It follows that these -characters (as well as a small number of other reserved characters) +The two sections are separated with the '|' character. +It follows that these +characters (as well as non-printable characters, , and +a small number of other reserved characters) can only be sent escaped. This is described further in the -L. +L and L. Most of this document is concerned with the L, however -some L which all implementation should issue and +some L which all implementations should issue and must accept are described. 
+=head1 Applications + +In the past messaging applications such as DX Cluster software have maintained +a fairly strict division between Ls and Ls. This protocol attempts +to get away from that by deliberately blurring (or, in some cases, removing) +any distinction between the two. + +Applications that use this protocol are essentially all peers and therefore the +only real difference between Ls and Ls is that a node has one or more +listeners running that will, +potentially, allow incoming connections from other Ls, Ls or Ls. These +routable entities are called Ls. + +Any application that is a sink and/or source of data; is capable of obeying +the protocol message construction rules and understands how to deduplicate incoming messages +correctly can operate as a routeable entity or L in this protocol. It is called an L. + +An L is called a L if it accepts connections from Ls and is +prepared to route messages on their behalf to other Ls or Ls. In addition it +may provide some other, usually simpler, interface (eg simple telnet access) for direct user access. Acting +in the protocol, on their behalf. + +The concept of an L has been invented because modern clients are +capable of being more intelligent than simple +character based connections such as telnet or ax25. They also wish to be able to +distinguish between the various classes of message, such as: DX spots, +announces, talk, logging info etc. It is a pain to have to do it, as now, +by trying to make sense of the (slightly different for each piece of node +software) human readable "user" version of the output. Far better to pass on +regular, specified, easily computer decodable versions of the message, +(i.e. in this protocol) and leave +the human presentation to the application. + +It also helps to modularise the various interfaces that may be implemented such +as the legacy, character based connections of existing PC protocol based nodes. 
+They should be treated +as local clients, in fact as Ls, B as peers in this protocol. It is likely that, in order +to do this, some extra Ls will need to be defined at application level. + +=head1 Definitions + +In this document we use a number of terms that need to be defined. + +=head2 Terminal + +A L is a routable entity, in other words: a callsign or service that can be routed +to, that lives at one or a few Ls. + +=head2 User + +A L is a connection to a L (that allows such connections) +that does not occur in protocol. All Ls shall be identified with a name +of up to 12 characters in the set [-0-9A-Z_]. All messages have to be routed via the +L to which this L is connected. + +=head2 Endpoint + +An L is a connection to a L that uses the protocol. From a routing point of +view, it is indistinguishable from a L. The L is responsible for creating and decoding +well formed protocol messages. An L does not route beyond the immediate L(s) to +which it is connected. It may also be a L connected to a L which provides some +addressable service (such as a database) that can be queried. + +=head2 Node + +A L is connected to other Ls. It is responsible for routing messages in protocol +from other Ls or Ls, whether directly connected or not. Optionally, a L +may provide other interfaces, such as direct L connections or legacy PC protocol speaking +DX Clusters. + +=head2 Channel + +A L is a L address that is not a L. It is (unless qualified by a L) +broadcast on all a Ls interfaces unless prevented by some filtering or other local policy on +that L. + +=head2 Service + +A L is an application that either plugs into or connects as an L to a L. It is an +application that, in effect, is a database. In other words: queries are sent to the L and it sends +back a reply. + =head1 Routing Section The application that implements this protocol is essentially a line oriented message router. One line equals one message. Each line is effectively a datagram. 
-It is assumed that nodes are connected to +It is assumed thatLs are connected to each other using a "reliable" streaming protocol such as TCP/IP or -AX25. Having said that: in context, messages in this protocol could be +AX25. Having said that: in context, L in this protocol could be multi/broadcast, either "as is" or wrapped in some other framing protocol. -Because this is an unreliable, best effort, "please route my packets -through your node" protocol, there is no guarantee that a message -will get to the other side of a mesh of nodes. There may be a +Although the physical transport between Ls is reliable, the actual message +is unreliable, because this is an unreliable, best effort, "please route my packets +through your node" protocol. There is no guarantee that a message +will get to the other side of a mesh of Ls. There may be a discontinuity either caused by outage or deliberate filtering. -However, as it is envisaged that most messages will be flood routed or, -in the case of directed messages (those that have L and/or -L fields) down some/most/all interfaces showing a route for that -direction, it is unlikely that messages will be lost in practice. +However, as it is envisaged that most L will be flood routed or, +in the case of directed L (those that have a L containing a L of some kind) +down some/most/all interfaces showing a route for that +direction, it is unlikely that L will be lost in practice. -=head2 Field Description +Assuming that there is a path between all the Ls in a network, then it is guaranteed +that a message will be delivered everywhere, eventually. It is possible (indeed likely) that +copies of a message +will arrive at Ls more than once. Ls are responsible for deduplicating those messages +using the information in the L. -Only the first three fields in the L are compulsory -and indicate that this is a broadcast to be sent to all nodes coming -from the L. 
If the message needs to be identified as coming -from a user on a node, then the L field is added. +=head2 Field Description -Adding a L and/or L field will restrict the destinations -or recipients that receive this message. +All fields in the L are compulsory except the L field. If it is missing +so is the separating comma. The L field is incremented on receipt of a message on a node. Fields are separated by the comma ',' character with the last field required followed by the vertical bar '|' character. -If trailing fields are missed out then superfluous commas can also -be left out. If intervening fields are missing then no space needs -to be left for the separating comma. - The characters allowed in the routing section are restricted. Any invalid characters in any field will cause the whole message to be silently dropped. @@ -91,11 +204,42 @@ More detailed descriptions of the fields follow: =item B This is a compulsory field. It is the name of the originating node. -The field can contain up to 12 characters in the set [-A-Z0-9_] in +The field can contain up to 12 characters in the set [-A-Z0-9_/] in any order. Higher layers may restrict this further. The field must not be changed by any other node. +=item B + +This is the Group (or Channel) to be used for this data. It is compulsory. + +It is a string of up to 12 characters +in the set [-A-Z0-9_/] in any order. + +Optionally, for extra routing to +a particular L connected at a specific L, or even a +particular L in a L, +it may have another 12 character +string in the same set, concatenated with the first string. The two strings are separated by a ':' +character. For example: + + DX # the DX group + GB7DJK # the node GB7DJK + G1TLH # the user or endpoint G1TLH + GB7DJK:G1TLH # the user G1TLH at GB7DJK + DX:G1TLH # the user G1TLH in the DX group + +This field can contain either a L or some other string which is interpreted +as broadcastable group address. 
Any message that has a L that is not recognised as a L must +be broadcast. + +This means that messages to callsigns, for whom no specific routing information is available, +will be found by means of a broadcast. Hopefully this will cause some kind of activity o.b.o +that callsign will allow routing tables to be gathered that narrow down the scope of any future +message to that callsign through the network. + +Remember that not all Ls may pass every L field, depending on local policy. + =item B This is a compulsory field. It is a 10 hexadecimal digit string which @@ -107,7 +251,7 @@ that are concatenated with a sequence number (0-65535) The date portion is constructed as: - my $date = ((((gmtime)[3] < 1) | $ntpflag) < 18) | (time % 86400); + my $date = ((((gmtime)[3] << 1) | $ntpflag) << 18) | (time % 86400); The sequence number is simply an unsigned short (or 16 bit) number starting at 0. @@ -126,79 +270,17 @@ neighbouring nodes must increment this field before passing it on to higher layers for onward processing. Implementations may have an upper limit to this field and may -silently drop incoming messages with a L count greater than the +silently drop incoming L with a L count greater than the limit. -=item B - -This field is optional. It is the identifier of the originating -user. If it is missing then the message is -assumed to come from the originating node itself. - -It can consist of up to 12 characters in the set [-A-Z0-9_] -in any order. Higher layers may restrict this further. - -=item B - -This field is optional. It is a string of up to 12 characters -in the set [-A-Z0-9_] in any order. - -This field is used either to indicate particular node destination -or to differentiate this broadcast in some way by making this -message as a member of a L. Any message can be sent -down any L. The names of Ls and their usage -is entirely up to the implementor. - -It is assumed that node names can be differentiated from user -names and L names. 
- -If the field is set to a particular node destination, it will -be routed (rather than broadcast) to that node. However, any -intervening nodes are free to duplicate the message and send -it down more than one, likely looking, interface - depending on any -network policies that may pertain. - -=item B - -This field is optional. It is a string of up to 12 characters -in the set [-A-Z0-9_] in any order. Higher layers may restrict -this further. - -Conventionally this field is used to indicate the user to whom -this message is directed. In an ideal world the L field -will be set, by the originating node, to the identifier of the node -on which this user resides. - -If the L field is not set then this message will be -broadcast. However, should a node become apparent (on route) -then nodes are free to fill in the L field and proceed -with a more directed approach. - -If it becomes apparent (on route) that there may be more than -one possible L destination for a L then a node -may duplicate the message (keeping the same L) and -route it onwards. Because of the L inherent in -the system, it is indeterminate as to which destination will -receive the message. It is possible for all or just some -destinations to receive the message. The tuple (L, -L) will determine uniqueness. +=item B -This field can, in the case where L -is set to the name of a node, be set to a L. If this -is the case then this will cause this message to be sent to -a L on the L node only. +The L field is optional. When present, it represents a L at +the originating L. If it is missing then either it is not relevant or it +is assumed to be the L. =back -=head2 Channel - -Channels are a concept very similar to that on IRC. It is a -way of segregating data flows in a network. In principle, subject -to local policy or application requirements, any data (or -L) can be sent down any channel. - -It is up to the implementation whether to use this feature or not. 
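The field rules described above can be sketched in a few lines. This is an illustration only, written in Python rather than the Perl the document favours, and it is not taken from the Aranea source. The names `make_msgid`, `parse_message` and `ROUTING_RE` are hypothetical. It assumes the routing section is ordered origin, group, message id, hop count and optional user (as the examples in this document suggest), that the message id is six uppercase hex digits of date followed by four of sequence number, and that the hop count is written in decimal.

```python
import re
import time

# Assumed character set and length limit for names (from the field text).
NAME = r"[-A-Z0-9_/]{1,12}"

# Assumed field order: origin,group[:qualifier],msgid,hops[,user]
ROUTING_RE = re.compile(
    r"(?P<origin>" + NAME + r")"
    r",(?P<group>" + NAME + r"(?::" + NAME + r")?)"
    r",(?P<msgid>[0-9A-F]{10})"
    r",(?P<hops>[0-9]+)"
    r"(?:,(?P<user>" + NAME + r"))?"
)


def make_msgid(seq, ntp_synced=False, now=None):
    """Build a 10 hex digit message id from the date word and a sequence."""
    now = int(time.time()) if now is None else int(now)
    day = time.gmtime(now).tm_mday  # day of month, as (gmtime)[3] in Perl
    # Same arithmetic as the Perl expression: day, NTP flag, seconds of day.
    date = (((day << 1) | int(ntp_synced)) << 18) | (now % 86400)
    # Assumption: 6 hex digits of date followed by 4 of sequence number.
    return "%06X%04X" % (date, seq & 0xFFFF)


def parse_message(line):
    """Split one line into routing fields; return None if it must be dropped."""
    routing, sep, command = line.rstrip("\r\n").partition("|")
    match = ROUTING_RE.fullmatch(routing)
    if not sep or match is None:
        return None  # invalid routing sections are silently dropped
    fields = match.groupdict()
    fields["hops"] = int(fields["hops"])
    fields["command"] = command
    return fields
```

A receiver following this sketch would call `parse_message` on each incoming line, discard anything that returns `None`, and use the origin and message id pair for deduplication.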
- =head2 Routing It is assumed that nodes will be connected in a looped network with @@ -212,7 +294,7 @@ tuple. The basic system will learn which interfaces can see what nodes by looking at the tuple and merging that with the L count. Each interface remembers the latest L with the lowest L for each L that arrives on that interface. It also remembers -the number of messages for that L that has been received on +the number of L for that L that has been received on that interface. Any message for onward broadcast is duplicated and sent out on all @@ -244,45 +326,64 @@ duplicated! =head2 Examples # on link startup from GB7BAA (both sides hello) - GB7TLH,3D02350001,0,GB7BAA|HELLO,Aranea,bld=24.123 - GB7BAA,3D02355421,1,GB7TLH|HELLO,Aranea,bld=23.245 + GB7TLH,ROUTE,3D02350001,0|HELLO,Aranea,1.2,24.123 + GB7BAA,ROUTE,3D02355421,1|HELLO,Aranea,1.1,23.245 # on user startup to GB7TLH - GB7TLH,3D042506F2,0,G1TLH|HELLO,PClient,ver=1.03 + GB7TLH,ROUTE,3D042506F2,0,G1TLH|HELLO,PClient,1.3 # on user disconnection - GB7TLH,3D9534F32D,0,G1TLH|BYE + GB7TLH,ROUTE,3D9534F32D,0,G1TLH|BYE # a talk (actually 'text') message to a user (some distance away # from the origin node) - GB7TLH,3D03450019,3,G1TLH,GB7BAA,G8TIC|T,Hiya Mike what's happening? + GB7TLH,G8TIC,3D03450019,3,G1TLH|T,Hiya Mike whats happening? 
- # a talk/chat/text message to a channel or group - GB7TLH,0413525F23,2,G1TLH,VHF|T,2m is opening on MS + # a talk/chat/text message to a Group + GB7TLH,VHF,0413525F23,2,G1TLH|T,2m is opening on MS # a ping to find the whereabouts and distance of a user from a node # the hex number on the end is the ping ID - GB7TLH,1512346543,0,,,G7BRN|PING,9F4D - - # the same from a user on GB7TLH - GB7TLH,1512346543,0,G1TLH,,G7BRN|PING,23 + GB7TLH,G7BRN,1512346543,0,G1TLH|PING,9F4D # this effectively asks whether the user is on-line on a particular node - GB7TLH,1512346543,0,G1TLH,GB7DJK,G7BRN|PING,35DE + GB7TLH,GB7BAA:G7BRN,1512346543,0,G1TLH|PING,35DE # A possible reply, same ID as ping followed by the no of hops on the - # ping that was received - GB7DJK,1512450534,3,G7BRN,GB7TLH,G1TLH|PONG,35DE,3 + # ping that was received thus telling you how far away it is. + GB7BAA,G1TLH,1512450534,3,G7BRN|PONG,35DE,3 =head1 Command Section The L of the message contains the actual data being passed. It is called the Command Section because all commands -are identified with a L which is implemented by -the software using this protocol. +are identified with a L each of which is implemented by +the software using this protocol. Each (usually) is followed by one +or more L. + +=head2 Tag + +The L consists of string of uppercase letters and digits, starting +with a leading, uppercase, letter. Tags should be as short as is meaningful. + +Valid tags would be: + + DX + PC23 + ANN -The L is separated from its data by a comma ','. All fields +Invalid tags include: + + 1AAA + dx + Ann + +The L is separated from its data L by a comma ','. + +=head2 Fields + +All fields in any subsequent data shall be separated by a comma ','. 
All fields shall be HTTP encoded such that reserved characters (comma ',', @@ -290,7 +391,7 @@ vertical bar '|', percent '%', equals '=' and non printable characters less than 127 (or %7F in hex) -[including newline and carraige return] are tranlated to +[including newline and carraige return] are translated to their two hex digit equivalent preceeded by the percent '%' character. For example: @@ -308,7 +409,7 @@ are written according to this specification must say: use UTF8; A message (or line) is terminated with -0x0d 0x0a. Incoming messages must be accepted even when terminated +0x0d 0x0a. Incoming L must be accepted even when terminated with just . Care must be taken to make sure that fields have any reserved characters @@ -325,23 +426,6 @@ specified above and can otherwise contain any character. There is no maximum size specified for a message. It is up to each implimentation to enforce one (if only for their own protection). -=head2 Tag - -The L consists of string of uppercase letters and digits, starting -with a leading, uppercase, letter. Tags should be as short as is meaningful. - -Valid tags would be: - - DX - PC23 - ANN - -Invalid tags include: - - 1AAA - dx - Ann - =head2 Standard Commands There are a number of L which must be accepted by @@ -351,23 +435,59 @@ all implementations. =item B -Command sent on connection to another node. + HELLO,,,, + +Command sent on connection to another node. Both sides send their information +to the other. All the possible arguments are optional, although some of the +arguments should be sent in order to help diagnose problems. This command is +broadcast. -=item B +=item B -Command sent to voluntarily disconnect a connection. + BYE, + +Command sent to all connections when the software is shutting down. This is sent +by the node just before shutdown occurs. This is really only used to help the +network prune its routing tables. It isn't a requirement. The field +is optional. 
 =item B -Command sent when a node has disconnected from this node. + DISC,, + +Command sent when a node has disconnected from this node. This message is sent when +an interface shuts down. It need not be sent if a L from an interface for +that node has just been received. This command should be broadcast. + +The is mandatory and is the name of the interface that has just +disconnected. =item B -Command to send a ping to a node or user. + PING,, + +Command to send a ping to a node or user. This command is used both by the software +and users to determine a) whether a node or user exists and b) how good the path is +between them. + +The is a unique string which is usually the hexadecimal equivalent of an +integer that is incremented every time it is used. But it can be anything that +will identify this ping using the tuple (L,) as unique. =item B -Command to reply to a successful ping + PONG,,, + +Command to reply to a ping. This is sent as a reply to an incoming ping command. +The is the one supplied and the is the number of +hops it took for the ping to arrive. + +=item B + + T, + +All implementations must be able to send "text" (encoded as specified in +L). There would be little point in doing all this otherwise! =back @@ -377,7 +497,7 @@ Dirk Koopman, G1TLH, E<lt>djk@tobit.co.ukE<gt> =head1 COPYRIGHT AND LICENSE -Copyright 2004 by Dirk Koopman, G1TLH +Copyright 2004-2005 by Dirk Koopman, G1TLH This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
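The command section rules in this diff (tag syntax, comma separated fields, percent escaping of reserved characters) could be exercised with something like the following. It is a minimal sketch in Python under stated assumptions, not the reference implementation. The function names are hypothetical, and "non printable" is taken here to mean code points below 0x20 plus 0x7F, which is an interpretation rather than something the text spells out.

```python
import re

# Reserved characters named in the text: comma, vertical bar, percent, equals.
RESERVED = ",|%="

# A tag is uppercase letters and digits, starting with an uppercase letter.
TAG_RE = re.compile(r"[A-Z][A-Z0-9]*")


def escape_field(text):
    """Percent-escape reserved and (assumed) non-printable characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in RESERVED or code < 0x20 or code == 0x7F:
            out.append("%%%02X" % code)  # two hex digits after '%'
        else:
            out.append(ch)  # other characters, including UTF8, pass through
    return "".join(out)


def unescape_field(text):
    """Reverse escape_field by decoding every %XX sequence."""
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def build_command(tag, *fields):
    """Join a tag and its escaped fields into a command section."""
    if not TAG_RE.fullmatch(tag):
        raise ValueError("invalid tag: %r" % tag)
    return ",".join([tag] + [escape_field(f) for f in fields])


def parse_command(section):
    """Split a command section into (tag, [unescaped fields])."""
    tag, *fields = section.split(",")
    return tag, [unescape_field(f) for f in fields]
```

For instance, `build_command("T", "Hiya, Mike")` escapes the embedded comma so that the receiver still sees a single text field after `parse_command`.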