=head1 NAME

DXSpiderWeb Orthogonal Communications Protocol

=head1 SYNOPSIS

 <Origin>,<TimeSeq>,<Hop>,<FrmUser>,<To>,<ToUser>|<Tag>,<Data>...

=head1 ABSTRACT

For many years DX Clusters have used a protocol that was designed
for a non-looped tree of nodes. Such a tree has probably never been
reliably achieved in practice; certainly not recently. This document
describes a complete replacement for that protocol. It allows a
fully looped network, is inherently extensible and should be simple
to implement (especially in Perl).

All implementations of this protocol shall B<only> use this protocol
for inter-node communications. 

=head1 DESCRIPTION

This protocol is encoded in UTF-8 with HTTP-style escaping. It is
designed to be an extensible basis for any type of one-to-many
"instant" line-based communications task.

This protocol is designed to be flood routed in a meshed network in
as efficient a manner as possible.

The protocol consists of a L<Routing Section> and a L<Command Section>. 
The two sections are separated with the '|' character. 

Most of this document is concerned with the L<Routing Section>; however,
some L<Standard Commands> which all implementations should issue and
must accept are described.

=head2 Routing Section

The application that implements this protocol is essentially a
line-oriented message router. One line equals one message. Each line
is effectively a datagram.

It is assumed that nodes are connected to each other using a
"reliable" streaming protocol such as TCP/IP or AX25. Having said
that, in context, elements of the protocol could be multicast or
broadcast, either "as is" or wrapped in some other framing protocol.

Because this is an unreliable, best effort, "please route my packets
through your node" protocol, there is no guarantee that a message
will get to the other side of a mesh of nodes. There may be a
discontinuity either caused by outage or deliberate filtering. 

However, it is envisaged that most messages will be flood routed or,
in the case of directed messages (those that have a L<To> or
L<ToUser> field), sent down all interfaces showing a route in that
direction, so it is unlikely that messages will be lost in practice.

=head3 Field Description

Only the first three fields in the L<Routing Section> are compulsory.
On their own they indicate a broadcast, coming from the L<Origin>,
to be sent to all nodes. If the message needs to be identified as
coming from a user on a node, then the L<FrmUser> field is added.

Adding a L<To> and/or L<ToUser> field will restrict the destinations
or recipients that receive this message. 

The L<Hop> field is incremented on receipt of a message on a node.

Fields are separated by the comma ',' character. The last field
present is followed by the vertical bar '|' character.

If trailing fields are omitted then their commas can also be left
out. If an intervening field is omitted then it is simply left
empty: its separating comma stays, with nothing (not even a space)
in the field's place.

The characters allowed in the routing section are restricted. Any 
invalid characters in any field will cause the whole message to be
silently dropped.
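
As an illustration of the rules above, here is a minimal Perl sketch
of how a receiver might split and validate one line. The name
C<parse_line> and the keys of the returned hash are invented for this
example. The field patterns come from the field descriptions in this
document. Three things are assumptions: that an omitted intervening
field arrives as an empty string between two commas, that L<Hop> is
written in decimal, and that hexadecimal digits may be in either case.

  use strict;
  use warnings;

  # Hypothetical helper: split one line into its routing fields and
  # command section. Returns a hash ref, or nothing if the message
  # should be silently dropped.
  sub parse_line {
      my ($line) = @_;

      # The first '|' separates the routing and command sections.
      my $bar = index $line, '|';
      return if $bar < 0;
      my $routing = substr $line, 0, $bar;
      my $command = substr $line, $bar + 1;

      # A limit of -1 keeps empty fields instead of discarding them.
      my @f = split /,/, $routing, -1;
      return if @f < 3 || @f > 6;

      my ($origin, $timeseq, $hop, $frmuser, $to, $touser) = @f;
      my $name = qr/\A[-A-Z0-9_]{1,12}\z/;

      return unless $origin  =~ $name;
      return unless $timeseq =~ /\A[0-9A-Fa-f]{10}\z/;
      return unless $hop     =~ /\A[0-9]+\z/;    # assumed decimal
      for my $opt ($frmuser, $to, $touser) {
          next unless defined $opt && length $opt;   # absent or empty
          return unless $opt =~ $name;
      }

      return {
          origin  => $origin,  timeseq => $timeseq, hop    => $hop,
          frmuser => $frmuser, to      => $to,      touser => $touser,
          command => $command,
      };
  }

A caller would treat an empty return value as "drop this message" and
otherwise hand the hash to the routing code.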

More detailed descriptions of the fields follow:

=over

=item Origin

This is a compulsory field. It is the name of the originating node.
The field can contain up to 12 characters in the set [-A-Z0-9_] in
any order. Higher layers may restrict this further.

The field must not be changed by any other node.

=item TimeSeq

This is a compulsory field. It is a string of 10 hexadecimal digits.
The first 6 digits encode the day of the month (1-31) together with
the number of seconds within that day (0-86399). The last 4 digits
are a sequence number (0-65535).

The date portion is constructed as follows, reading the clock once
so that the day and the seconds always agree:

  my $now  = time;
  my $date = ((gmtime $now)[3] << 18) | ($now % 86400);

The sequence number is simply an unsigned 16 bit number starting
at 0.

Each message originated at this node increments the sequence
number.

=item Hop

This is a compulsory field. It is the number of hops from the
originating node. It is incremented immediately on receipt, before
its value is examined.

So the originating node sends a message with a L<Hop> of 0 and each
neighbouring node must increment this field before passing the
message on to higher layers for onward processing.

Implementations may have an upper limit to this field and may
silently drop incoming messages with a L<Hop> count greater than the
limit.

=item FrmUser

This field is optional. It is the identifier of the originating
user.  If it is missing then the message is 
assumed to come from the originating node itself. 

It can consist of up to 12 characters in the set [-A-Z0-9_] 
in any order. Higher layers may restrict this further.

=item To

This field is optional. It is a string of up to 12 characters 
in the set [-A-Z0-9_] in any order. 

This field is used either to indicate a particular destination node
or to differentiate this broadcast in some way by making this
message a member of a L<Channel>. Any message can be sent
down any L<Channel>. The names of L<Channel>s and their usage
are entirely up to the implementor.

It is assumed that node names can be differentiated from user
names and L<Channel> names.

If the field is set to a particular destination node, the message
will be routed (rather than broadcast) to that node. However, any
intervening nodes are free to duplicate the message and send
it down more than one likely-looking interface, depending on any
network policies that may pertain.

=item ToUser

This field is optional. It is a string of up to 12 characters
in the set [-A-Z0-9_] in any order. Higher layers may restrict 
this further.

Conventionally this field is used to indicate the user to whom
this message is directed. In an ideal world the L<To> field
will be set, by the originating node, to the identifier of the node
on which this user resides. 

If the L<To> field is not set then this message will be
broadcast. However, should a destination node become apparent (en
route) then nodes are free to fill in the L<To> field and proceed
with a more directed approach.

If it becomes apparent (en route) that there may be more than
one possible L<To> destination for a L<ToUser> then a node
may duplicate the message (keeping the same L<TimeSeq>) and
route it onwards. Because of the L<DeDuplication> inherent in
the system, it is indeterminate which destination will
receive the message. It is possible for all or just some
destinations to receive the message. The tuple (L<Origin>,
L<TimeSeq>) will determine uniqueness.

This field can, in the case where L<To>
is set to the name of a node, be set to a L<Channel>. If this
is the case then this will cause this message to be sent to
a L<Channel> on the L<To> node only.
 
=back 
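
The field descriptions above can be sketched from the sending side
as follows. This is only an illustration: the names C<next_timeseq>
and C<routing_section> are invented here, the optional C<$now>
argument exists purely to make the sketch testable, and the use of
uppercase hexadecimal digits and the wrap of the sequence number back
to 0 after 65535 are assumptions rather than stated rules.

  use strict;
  use warnings;

  my $seq = 0;    # per-node sequence counter, starts at 0

  # Hypothetical helper: return the next 10 hex digit TimeSeq.
  sub next_timeseq {
      my ($now) = @_;
      $now = time unless defined $now;

      # Day of month in the top bits, seconds within the day below.
      my $date = ((gmtime $now)[3] << 18) | ($now % 86400);
      my $ts   = sprintf '%06X%04X', $date, $seq;

      # Assumption: the 16 bit counter wraps to 0 after 65535.
      $seq = ($seq + 1) & 0xFFFF;
      return $ts;
  }

  # Hypothetical helper: build a routing section, leaving out
  # trailing optional fields (and their commas) that are not set.
  sub routing_section {
      my ($origin, $timeseq, $hop, @optional) = @_;  # FrmUser, To, ToUser
      my @fields = ($origin, $timeseq, $hop,
                    map { defined $_ ? $_ : '' } @optional);
      pop @fields while @fields > 3 && $fields[-1] eq '';
      return join(',', @fields) . '|';
  }

An originating node would call C<routing_section> with a L<Hop> of 0
and append the command section to the result.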

=head3 Channel

Channels are a concept very similar to that on IRC. They are a
way of segregating data flows in a network. In principle, subject
to local policy or application requirements, any data (or
L<Command Section>) can be sent down any channel.

It is up to the implementation whether to use this feature or not.  

=head3 Routing

It is assumed that nodes will be connected in a looped network with
more than one route available (in many cases) to another node.

In any case, most traffic is not directed, but broadcast to all users
on all nodes.

Each message is uniquely identified by the (L<Origin>,L<TimeSeq>) 
tuple. The basic system will learn which interfaces can see what nodes
by looking at the tuple and merging that with the L<Hop> count. 
Each interface remembers the latest L<TimeSeq> with the lowest L<Hop>
for each L<Origin> that arrives on that interface. It also remembers
the number of messages for that L<Origin> that have been received on
that interface.

Any message for onward broadcast is duplicated and sent out on all
interfaces that it did not come in on. 

Any message that is directed to a particular node will be sent out on
the "best" interface based on routing information gathered so far. If there
is more than one possible route then, depending on network or local
policy, the message may be duplicated and sent on other interfaces
as well.
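
The learning and forwarding rules above might be sketched in Perl
like this. Everything named here (C<%seen>, C<learn>,
C<out_interfaces>) is invented for the example, and two choices are
assumptions: "best" is taken to mean the interface with the lowest
recorded L<Hop>, and a stored route is only replaced by one that is
no longer than it. A real implementation would also need to age out
stale routes and apply local policy.

  use strict;
  use warnings;

  my %seen;   # $seen{$interface}{$origin} = { timeseq, hop, count }

  # Record what an interface has just shown us about an Origin.
  sub learn {
      my ($if, $origin, $timeseq, $hop) = @_;
      my $r = $seen{$if}{$origin}
          ||= { timeseq => $timeseq, hop => $hop, count => 0 };
      $r->{count}++;

      # Keep the lowest Hop and the latest TimeSeq seen with it.
      if ($hop <= $r->{hop}) {
          $r->{timeseq} = $timeseq;
          $r->{hop}     = $hop;
      }
      return $r;
  }

  # Interfaces to send on for a message that arrived on $in_if.
  sub out_interfaces {
      my ($in_if, $to, @all_ifs) = @_;
      my @others = grep { $_ ne $in_if } @all_ifs;

      # Broadcast: flood to every interface except the incoming one.
      return @others unless defined $to && length $to;

      # Directed: assumed "best" is the fewest hops among the
      # interfaces that have seen this node as an Origin.
      my @routes =
          sort { $seen{$a}{$to}{hop} <=> $seen{$b}{$to}{hop} }
          grep { $seen{$_} && exists $seen{$_}{$to} } @others;

      # No known route (or To is a Channel): fall back to flooding.
      return @routes ? ($routes[0]) : @others;
  }

Returning more than one element of C<@routes> would give the
duplicated, multi-interface behaviour that policy may allow.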
  
=head3 DeDuplication

On receipt of a message, its unique tuple (L<Origin>,L<TimeSeq>) is
checked against a hash table. If it exists: the message is silently
dropped. If it does not exist in the hash table then the tuple is
added.

The hash table is periodically cleaned, removing tuples that
have expired. The length of time a tuple remains in the hash table
is implementation dependent but could easily be several days, if
required.

This mechanism only ensures that a message broadcast around the network
travels the least distance and through the fewest nodes possible. It
is up to higher layers to make sure that data carried is not, itself,
duplicated! 
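
A minimal Perl sketch of such a hash table follows. The names
C<%dedup>, C<$LIFETIME>, C<is_new> and C<clean> are invented for the
example, and the two day lifetime is an arbitrary assumption. Note
that because L<TimeSeq> carries only the day of the month, a lifetime
approaching a month would presumably risk treating a new message as
a duplicate of an old one.

  use strict;
  use warnings;

  my %dedup;                   # "Origin,TimeSeq" => time first seen
  my $LIFETIME = 2 * 86400;    # assumed expiry: two days in seconds

  # True if the message is new (and records it), false if it is a
  # duplicate that should be silently dropped.
  sub is_new {
      my ($origin, $timeseq, $now) = @_;
      $now = time unless defined $now;

      # A comma cannot occur in an Origin, so the key is unambiguous.
      my $key = "$origin,$timeseq";
      return 0 if exists $dedup{$key};
      $dedup{$key} = $now;
      return 1;
  }

  # Call periodically: forget tuples older than $LIFETIME and
  # return the number of tuples that remain.
  sub clean {
      my ($now) = @_;
      $now = time unless defined $now;
      delete @dedup{ grep { $now - $dedup{$_} > $LIFETIME } keys %dedup };
      return scalar keys %dedup;
  }

A receiver would call C<is_new> once per incoming message, before any
routing or higher layer processing.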
 
=head2 Command Section

The Command Section of the message contains the actual data being
passed. It is called the Command Section because all commands
are identified with a L<Command Tag> which is implemented by 
the software using this protocol.

=head3 Command Tag

The Command Tag consists of a string of uppercase letters and digits,
starting with an uppercase letter. Tags should be as short as is
meaningful.

Valid tags would be:

 DX
 PC23
 ANN

Invalid tags include:

 1AAA
 dx
 Ann
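
The tag rule can be expressed as a single pattern. The following
Perl sketch is illustrative only: C<valid_tag> and C<split_command>
are invented names, and splitting the tag from its data at the first
comma is inferred from the layout shown in the SYNOPSIS.

  use strict;
  use warnings;

  # Hypothetical helper: true if $tag is a well formed Command Tag,
  # that is an uppercase letter followed by uppercase letters or
  # digits.
  sub valid_tag {
      my ($tag) = @_;
      return (defined $tag && $tag =~ /\A[A-Z][A-Z0-9]*\z/) ? 1 : 0;
  }

  # Hypothetical helper: split a command section into (tag, data).
  # Returns an empty list if the tag is not valid.
  sub split_command {
      my ($command) = @_;
      my ($tag, $data) = split /,/, $command, 2;
      return unless valid_tag($tag);
      return ($tag, defined $data ? $data : '');
  }

With this pattern C<DX>, C<PC23> and C<ANN> are accepted while
C<1AAA>, C<dx> and C<Ann> are rejected.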

There are a number of standard commands which must be accepted by 
all implementations.

=head1 AUTHOR

Dirk Koopman, G1TLH, E<lt>djk@tobit.co.ukE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2004 by Dirk Koopman, G1TLH

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut