TCPCam Protocol Description

---[ Revision history ]

YYYY-MM-DD
----------
2006-06-28: Initial version of the document

---[ General description ]

The TCPCam protocol is a point-to-point video+audio conference protocol designed
to be simple to implement and deploy. It works by transmitting audio, video
and control frames over a single TCP connection. At least one of the two
hosts involved in the conference must have a TCP port open to the outside
for the connection between the hosts to be possible.

The protocol uses the JPEG[1] image compression algorithm to compress video
images and the SPEEX[2] encoder to compress audio.

---[ Transport layer ]

In a TCPCam session two hosts are involved at the same time. The first has the
TCP port number 7766 open in LISTEN mode, accepting connections (Server mode).
The second host connects to that port (Client mode). A TCPCam implementation
should work in both Server and Client mode.

The current implementation sets the TCP socket send buffer of both ends to 8192
bytes, so that not too much delay is introduced if the TCP link between
the two ends is not fast enough.
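
The send buffer setting can be sketched as follows. This is only an
illustration using the standard SO_SNDBUF socket option. The helper name
make_tcpcam_socket is hypothetical and not part of the protocol, and the
operating system may round or adjust the requested size.

```python
import socket

SEND_BUFFER_BYTES = 8192  # value given in the text above


def make_tcpcam_socket():
    """Create a TCP socket with a small send buffer (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request a small send buffer so that little data sits queued in the
    # kernel; the OS is free to adjust the value it actually uses.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SEND_BUFFER_BYTES)
    return sock
```

The same call would be made on both the listening and the connecting side.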

---[ Frame format ]

The TCP connection is used in order to transport frames of different types
containing audio, video and control data. This is the format of the frames.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Frame Type          |          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   /                              DATA                             /
   /                                                               /
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Both the Frame Type and Total Length fields are unsigned 16 bit integers encoded
in network byte order (big endian).
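
A minimal sketch of encoding and decoding this header with Python's struct
module is shown here. The function names are illustrative only. The sketch
ASSUMES that Total Length counts the 4-byte header plus the data, which the
text above does not state explicitly.

```python
import struct

# Two unsigned 16-bit integers in network byte order: Frame Type, Total Length.
HEADER = struct.Struct("!HH")


def pack_frame(frame_type, data=b""):
    """Build one frame: header followed by the data bytes."""
    # ASSUMPTION: Total Length includes the 4-byte header.
    total_length = HEADER.size + len(data)
    if total_length > 0xFFFF:
        raise ValueError("frame too large for a 16-bit length field")
    return HEADER.pack(frame_type, total_length) + data


def unpack_frame(buf):
    """Return (frame_type, data, remaining_bytes), or None if buf is incomplete."""
    if len(buf) < HEADER.size:
        return None
    frame_type, total_length = HEADER.unpack_from(buf)
    if total_length < HEADER.size:
        raise ValueError("invalid total length")
    if len(buf) < total_length:
        return None  # wait for more bytes from the TCP stream
    return frame_type, buf[HEADER.size:total_length], buf[total_length:]
```

Because TCP is a byte stream, a receiver would typically append incoming
bytes to a buffer and call unpack_frame repeatedly until it returns None.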

---[ Frame types ]

WELCOME FRAME (type: 0x00)

    It is sent by the Server side once the Client connects in order to tell the
    Client that the connection was accepted and can continue.

BUSY FRAME (type: 0x01)

    It is sent by the Server side once the Client connects in order to tell the
    Client that the Server is already involved in a conference and is not able
    to handle a second connection.

AUDIO FRAME (type: 0x02)

    Contains a narrowband or wideband audio frame encoded using the Speex
    encoder. This frame can contain only a single audio frame. It is not
    allowed to put multiple Speex frames in a single TCPCam audio frame.

IMGDATA FRAME (type: 0x03)

    Contains part of a JPEG image. TCPCam sends images by compressing them in
    JPEG format (the same format used to store actual JPEG files on disk,
    including the full header), then splitting the entire image into frames no
    bigger than 512 bytes (including the header).

IMGEND FRAME (type: 0x04)

    This frame contains no data. It is only used to tell the other end that the
    last IMGDATA frame sent was the last part of the last image transmitted.
    When this frame is received, the receiver knows that the image input
    buffer (composed of the data of one or more IMGDATA frames) contains a full
    image that is ready to be decoded and shown to the user.
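
The IMGDATA/IMGEND exchange can be sketched as below. The names
image_to_frames and ImageAssembler are hypothetical. As an ASSUMPTION, the
Total Length field is taken to count the 4-byte header, so each IMGDATA frame
carries at most 508 bytes of image data.

```python
import struct

IMGDATA, IMGEND = 0x03, 0x04
HEADER = struct.Struct("!HH")  # Frame Type, Total Length (big endian)
MAX_FRAME = 512                # maximum frame size, header included


def image_to_frames(jpeg_bytes):
    """Split one JPEG into IMGDATA frames followed by a single IMGEND frame."""
    chunk = MAX_FRAME - HEADER.size
    frames = []
    for i in range(0, len(jpeg_bytes), chunk):
        part = jpeg_bytes[i:i + chunk]
        # ASSUMPTION: Total Length counts the header as well as the data.
        frames.append(HEADER.pack(IMGDATA, HEADER.size + len(part)) + part)
    frames.append(HEADER.pack(IMGEND, HEADER.size))
    return frames


class ImageAssembler:
    """Receiver side: collect IMGDATA payloads until IMGEND arrives."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, frame_type, data):
        """Return the complete JPEG on IMGEND, otherwise None."""
        if frame_type == IMGDATA:
            self.buffer += data
        elif frame_type == IMGEND:
            image, self.buffer = bytes(self.buffer), bytearray()
            return image
        return None
```

The bytes returned by feed are a complete JPEG file image and can be handed
to any JPEG decoder.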

---[ Control Flow ]

An implementation should always check if the kernel is ready to send more data
via the TCP socket, and use the following rules in order to send frames:

    - If the socket is ready to send more data and there are audio frames in
      queue, send audio data. Image data is not sent even if in queue.

    - If the socket is ready to send more data and there are NOT audio frames
      in queue, send image data.

    - If the socket is NOT ready to send more data and the output audio frames
      queue is longer than 1/2 second of audio, discard the queue.

These simple rules make sure that audio priority is higher than video priority
on the TCP channel, since it is much better to have slow video than hard to
understand audio.
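
The three rules can be sketched as a small scheduler. The class and
parameter names are hypothetical, and how socket readiness is detected (for
example with select) is left to the caller. The limit is expressed in frames:
ASSUMING 20 ms of audio per Speex frame, 1/2 second corresponds to 25 frames.

```python
from collections import deque


class SendScheduler:
    """Pick the next frame to write, giving audio priority over image data."""

    def __init__(self, max_audio_frames=25):
        self.audio = deque()   # encoded AUDIO frames waiting to be sent
        self.image = deque()   # IMGDATA/IMGEND frames waiting to be sent
        # ASSUMPTION: 25 frames of 20 ms each is about 1/2 second of audio.
        self.max_audio_frames = max_audio_frames

    def next_frame(self, socket_writable):
        """Return the frame to send now, or None if nothing should be sent."""
        if socket_writable:
            if self.audio:
                return self.audio.popleft()   # rule 1: audio goes first
            if self.image:
                return self.image.popleft()   # rule 2: image only if no audio
            return None
        # Rule 3: the socket is blocked; drop audio that is already too old.
        if len(self.audio) > self.max_audio_frames:
            self.audio.clear()
        return None
```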

Image data should always be as recent as possible. If an image is already
present in the send buffer, but no part of this image has been sent yet,
and a new image is available, it is a good rule to discard the old image
and populate the image output buffer with the new one.
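
A minimal sketch of this replacement rule follows. The class name is
hypothetical. The text does not say what to do with a new image while the old
one is only partially sent, so the sketch ASSUMES the new image is refused
until the one in flight has been completely sent.

```python
class ImageOutputBuffer:
    """Hold the frames of at most one outgoing image."""

    def __init__(self):
        self.frames = []   # frames of the image currently queued
        self.sent = 0      # how many of those frames were already sent

    def offer(self, new_frames):
        """Try to queue a new image; return True if it was accepted."""
        partially_sent = 0 < self.sent < len(self.frames)
        if partially_sent:
            # ASSUMPTION: finish the image in flight, skip the new one.
            return False
        # Either the buffer is empty, fully sent, or holds an image that
        # was never started: replace it with the more recent image.
        self.frames = list(new_frames)
        self.sent = 0
        return True

    def next_frame(self):
        """Return the next unsent frame, or None when nothing is left."""
        if self.sent >= len(self.frames):
            return None
        frame = self.frames[self.sent]
        self.sent += 1
        return frame
```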

---[ Notes ]

[1] JPEG: http://www.jpeg.org
[2] SPEEX: http://www.speex.org

---[ Author ]

This document was written by Salvatore Sanfilippo (antirez at gmail dot com)
and is released under the GPL license.