Video Conferencing

“That’s one small step for man, one giant leap for mankind.” Such were the words that travelled through space all the way from the Moon to our television sets as frames of Armstrong setting foot on the lunar surface flashed across 600 million screens. It was indeed a technological milestone. But have you ever paused to consider that people were sitting in their living rooms watching a mortal reach an alien land and send back live pictures from there? Well before that feat, the technology to transmit video had been developed and was in itself a technological achievement. About half a century later, two-way communication involving real-time streaming of video and audio is becoming commonplace, available on almost every smartphone.

Video conferencing is a technology by means of which two or more parties situated in different geographical locations can watch and converse with each other by means of two-way transmission of video and audio data in near real-time.

Fig. 1: A Representational Image of Video Conferencing Technology Offering Two-Way Transmission of Audio and Video Data

Once a high-profile technology owned by a famous few, video conferencing has made inroads into every middle-class home that has a healthy broadband connection. It can be a simple point-to-point conversation between two persons, or a multipoint conference between many at different locations. When video conferencing is carried over IP-based telephone networks, it is also called VVoIP (Video and Voice over Internet Protocol). Video conferencing can be regarded as almost a subset of the IP Multimedia Subsystem (IMS).

History

One of the first forms of videoconferencing evolved in the first half of the 20th century. It involved two closed-circuit television systems transmitting analog data over a coaxial cable pair, and it found much use in the German postal system in the years before World War II. NASA’s space exploration missions and television news channels actively used Ultra High Frequency (UHF) and Very High Frequency (VHF) band pairs in simplex mode to transfer data from one location to another, including through satellites. But the equipment was complex and costly, and hence not viable for consumer-grade production. AT&T was the first to research the possibility of videoconferencing over an ordinary telephone network in the 1950s, but the research did not bear much fruit because the supporting technology was lacking: the bandwidth and digital transmission techniques needed to achieve high enough bit rates, even for slow-scan video, did not exist then, and the lack of image compression tools resulted in poor video quality. Even the ‘Picturephone’ of the 1970s failed to make an impact because of its cost. The scene changed with the development of digital networks like ISDN in the 1980s. These networks assured a minimum bit rate of about 128 kbps and led to the development of some of the world’s first commercial videoconferencing systems.

PictureTel Corp. sold its videoconferencing systems to many companies in 1984. In the very same year, William J. Tobin’s company developed a teleconferencing circuit board which was not only capable of up to 30 frames per second but was also small enough to fit into a standard personal computer and much cheaper than a dedicated solution. Tobin also filed a patent for a codec for full-motion videoconferencing, first demonstrated at AT&T Bell Labs in 1986. The 1990s saw rapid developments on this front, and a gradual shift from proprietary equipment to standards-based technology promoted public availability. The development of IP-based media conferencing, more efficient video compression algorithms and the evolution of digital networks made videoconferencing possible on desktops and personal computers. Project DIANE (Diversified Information and Assistance Network), a partnership between PictureTel and IBM started in 1992, developed the first community service usage of this technology, which over the next 15 years grew into a vast network of public service and distance education encompassing schools, libraries, museums etc. The year 1995 saw several firsts in digital video conferencing: the CU-SeeMe client was used by World News Now for the first live television broadcast carried over the internet, on Thanksgiving Day, and the first public video conference linked a techno fair in San Francisco with Cape Town. The technology was later used at the Winter Olympics as well.

The start of the new century saw the advent of video telephony through free internet services like Skype and similar online variants, which offered low-quality but also low-cost videoconferencing. The first HD video conferencing system, capable of 30 frames per second at HD resolution, was brought to market in 2005 by LifeSize Communications. By the end of the decade, video telephony had started to make inroads into handheld mobile devices and smartphones.

Codecs are the underlying technology that makes videoconferencing tick. Codecs encode the audio and video streams into compact packets which can be transferred over a data network. In the absence of any codec, the raw audio and video streams captured by the videoconferencing devices would require enormous amounts of bandwidth to be transmitted over the network. More on that later, but first, let us take a look at the block-level components required for video conferencing, from the transmitting end to the receiving end.
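Before moving on, the bandwidth claim above can be put in rough numbers. The sketch below is only a back-of-the-envelope calculation: the frame size, colour depth and frame rate are assumed example values, not figures from any particular system, and the 128 kbps figure is the ISDN rate mentioned in the History section.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


# Assumed example values: 640x480 frames, 24 bits per pixel, 30 frames per second.
raw_bps = raw_video_bitrate(640, 480, 24, 30)

# ISDN bit rate quoted in the History section (about 128 kbps).
isdn_bps = 128_000

# How much the stream would have to shrink to fit on such a link.
compression_ratio = raw_bps / isdn_bps

print(f"raw video: {raw_bps / 1e6:.1f} Mbit/s")
print(f"compression needed for a 128 kbit/s link: about {compression_ratio:.0f}:1")
```

Under these assumptions the raw stream is a little over 221 Mbit/s, which is why a codec is indispensable even on fast links.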

Components of a V. C. system

Components of a Generic Video Conferencing System:

Video Input: Webcams connected to computers or video cameras to capture the motion of participants.

Audio Input: Microphones to convert the voice of participants into an electrical signal which is then converted into a digital signal during processing.

Processing Unit: A data processing unit performs the function of converting the data into a packet stream for transmission on the transmitting end, and for receiving the network data and converting it into a presentable format on the receiving end.

Transmission Medium: The communication channel over which the data is transmitted from one place to another. It can be a telephone network or a digital internet broadband network. The network also might contain firewalls which are designed to block any kind of unwanted network traffic. Appropriate modules like Session Border Controllers are usually used on the network to detect various kinds of packets and to allow the videoconference packets to pass.

Output Unit: The output terminals are connected to the receiver unit to present the output in a suitable format. These are usually a monitor or screen for displaying the video and speakers to deliver the sound from the other end.

Usually, all the components are present both in the transmission location and the receiving location as communication is bidirectional.

Classification

Classification of Video Conferencing Systems:

There can be two broad classifications of video conferencing equipment:

1. Standalone dedicated systems.

2. Desktop Systems.

Standalone Dedicated Systems: These are devices made for the sole purpose of video conferencing, with every necessary component packaged into a single board or console which is connected, by wire or wirelessly, to a high-quality camera. These cameras are often called PTZ cameras for their ability to pan, tilt and zoom. Omnidirectional microphones are used to gather sound from all directions. Traditionally, a conference was understood to involve many people at once, but as the technology penetrated deeper into the market, variants catering to different group sizes have cropped up. The consoles built for large groups are usually non-portable and very expensive, and are suited to large rooms like auditoriums. Smaller variants of the same technology are available for conference rooms. The smallest videoconferencing devices are usually portable and intended for use by individuals; their camera and audio equipment is built in, as in the case of smartphones.

Desktop Systems: An ordinary desktop computer can be used for video conferencing by connecting a webcam and a microphone to it and installing videoconferencing software. The codec needs to be installed as part of the software to support the transmission and reception of the data.

Architecture

Architecture of Video Conferencing System:

Fig. 2: A Diagram Representing Architecture of a Video Conferencing System

Components like the camera, the microphone and the transmission medium are the basic necessities for supporting video conferencing, but the core of the system lies inside the data processing unit. The video conferencing process is handled in layers, with each layer interacting with the layers immediately above and below it and performing a certain essential function. These layers can be termed (from top to bottom) the User Interface, the Conference Control Layer, the Signalling Plane and the Media Plane. A comparison can be drawn between these layers and the seven layers of the OSI model.

User Interface: The end user is not concerned with all the machinery at work inside the data processing unit. The customer wants a user-friendly interface which any literate person can use to set up and start a call. The instructions and code inside the system have to be presented to the user as a black box, with all the complexity of the system summed up in a few buttons and switches that are easy to use and understand. This simplification is done by the user interface. It acts as a bridge between the inner, complex world of bits and bytes and the external world of humans. These interfaces may be graphical or voice-interactive; most of us have encountered both types at some point. This layer is used for scheduling and setting up the call. Every configurable option of the system is presented to the user through this interface, which ultimately affects the operation of the lower layers of the console.

Conference Control Layer: Resource allocation, management and routing are performed at this layer. The creation, scheduling and session management of a conference, the addition and removal of participants, and the tear-down of the conference all take place here.

Signalling Plane: This is a central part of the layered structure, as the protocols responsible for setting up a successful video call run at this layer. The protocols take the form of a protocol stack that signals the various endpoints to connect or tear down. The major protocols that have been or are being used for video conferencing are H.323 and the Session Initiation Protocol (SIP). The negotiation of session parameters and the control of incoming and outgoing signals are done at this layer.

Media Plane: The mixing and streaming of audio and video takes place at this layer. It is roughly analogous to the 4th layer of the OSI model, i.e. the Transport layer. The protocols running at this layer are the User Datagram Protocol (UDP), the Real-Time Transport Protocol (RTP) and the Real-Time Transport Control Protocol (RTCP). RTP packets, carried over UDP, deliver the media along with header information such as the payload type (which identifies the codec in use), a sequence number and a timestamp, while RTCP is more of a quality-control protocol that reports reception statistics such as packet loss and jitter.
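To make the role of RTP in the media plane a little more concrete, here is a minimal sketch of packing and unpacking the 12-byte fixed RTP header as laid out in RFC 3550. The function names and the example values (payload type 96, the SSRC, the timestamp) are illustrative assumptions; real stacks also handle CSRC lists, header extensions and padding, which are left out here.

```python
import struct


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no CSRC entries, no extension)."""
    version = 2
    byte0 = version << 6  # padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(data):
    """Read the fixed header fields back out of the first 12 bytes."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical example values; payload type 96 is in the dynamic range.
header = pack_rtp_header(payload_type=96, sequence=1, timestamp=90000, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
print(len(header), fields)
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to play the media out at the right pace.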

Protocols

Popular Protocols:

Fig. 3: A Block Diagram Representing Popular Protocols of Layers of a Video Conferencing System

Session Initiation Protocol: It is a widely used application-layer protocol for communication sessions like voice and video calls over the Internet Protocol, and it is capable of running over TCP, UDP and SCTP. It is a text-based protocol that borrows HTTP’s request/response model along with most of its header fields and rules. Typical requests are REGISTER, INVITE, ACK, CANCEL and BYE, while responses fall into classes such as Success and Redirection. The function of each request can be easily inferred from its name. SIP helps in creating, modifying and tearing down sessions between calling parties. Though it has many features of SS7 signalling, SIP is a peer-to-peer protocol implemented at the end points of the network, in contrast to SS7 signalling, which is implemented in the core systems. It is more popular than other VoIP protocols such as MEGACO and H.323 because it has its roots in the IP network community (it was designed by the IETF), which makes it more native to IP networks than protocols rooted in the telecommunications industry (designed by the ITU). A SIP deployment involves many network entities: the User Agent at the end point (such as a SIP phone), a Proxy server for routing requests, a Registrar for registering the URIs (Uniform Resource Identifiers) of devices, a Session Border Controller for NAT traversal and a Gateway to interface the SIP network with other networks.
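Because SIP is text based, a request can be sketched as a handful of header lines. The example below builds a minimal INVITE with no message body and reads its start line back. The addresses, the Call-ID, the tag and the helper names are hypothetical, and a real call would normally also carry a session description in the body; this is only meant to show the HTTP-like shape of a request.

```python
def build_invite(caller, callee, host, call_id, branch, tag):
    """Assemble a minimal SIP INVITE request with an empty body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",              # request line: method, URI, version
        f"Via: SIP/2.0/UDP {host};branch={branch}",  # where responses should be sent
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_start_line(message):
    """Split the first line of a request into method, URI and version."""
    method, uri, version = message.split("\r\n", 1)[0].split(" ")
    return method, uri, version


# Hypothetical parties and identifiers, for illustration only.
invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.org",
    host="pc33.example.com",
    call_id="a84b4c76e66710@pc33.example.com",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
)
print(invite)
print(parse_start_line(invite))
```

A BYE or REGISTER request has the same layout with a different method name in the request line and in the CSeq header.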

Fig. 4: A Figure Demonstrating Session Border Controller for NAT Traversal and a Gateway to Interface the SIP Networks to Other Network

H.323 Protocol: It is a recommendation from the ITU for voice and video conferencing that is widely deployed by equipment manufacturers for real-time internet applications. It is a part of the ITU-T H.32x series of protocols, which also covers multimedia communication over ISDN, PSTN and 3G networks. It uses the H.225.0 protocol for Registration, Admission and Status (RAS) signalling between the user equipment and a gatekeeper in the network and for call signalling, the H.245 control protocol for multimedia communication, and the Real-Time Transport Protocol (RTP) for sending and receiving media between entities.

Network Topologies

Network Topologies of Videoconferencing Systems:

The participants in a conference (when there are more than two) may each be at a different location. In such a scenario, all the participating units have to be kept in synchronization. There are two ways this may be done. In the first, each unit makes an individual direct link to every other unit and maintains those connections throughout the session. This method places a heavy burden on the terminal and network equipment and incurs significant bandwidth and cost. Its apparent advantage is the selectivity it gives each user for setting up ad-hoc connections. Also, there is no single point of failure: because the topology is a mesh, a broken link between two participants, say A and B, does not disturb any other connection. The video relayed between points is of better quality because there is no central manager throttling bandwidth. This decentralized multipoint architecture uses H.323 standards.

The other architecture, which takes the load off the terminal equipment, uses a Multipoint Control Unit (MCU). An MCU performs the function of a bridge, interconnecting the calls from different sources. The terminal equipment may call the MCU, or the MCU may initiate the connection to all the parties. Thus, the topology changes to that of a star. The MCU can be purely software, or a combination of hardware and software. It can be logically divided into two main modules: a Multipoint Controller and a Multipoint Processor. The controller works on the signalling plane: it controls conference creation, closing etc., negotiates with every unit in the conference and controls resources. The mixing and handling of media from each terminal is done by the Multipoint Processor, which resides in the media plane. It creates the data stream for each terminal and redirects it to the destination end point. The presence of a central manager can help shape the bandwidth used on each link.
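The difference in load between the two topologies can be seen with a little counting. The sketch below assumes the simplest case, in which every endpoint in a mesh sends its own media directly to every other endpoint, while in a star each endpoint keeps a single connection to the MCU; the function names are made up for this illustration.

```python
def mesh_links(participants):
    """Direct links needed when every endpoint connects to every other one."""
    return participants * (participants - 1) // 2


def star_links(participants):
    """Links needed when every endpoint connects only to a central MCU."""
    return participants


def mesh_uplink_streams(participants):
    """Copies of its own media each endpoint must send in a full mesh."""
    return participants - 1


for n in (2, 4, 8, 16):
    print(
        f"{n:2d} participants: mesh links = {mesh_links(n):3d}, "
        f"star links = {star_links(n):2d}, "
        f"uplink streams per mesh endpoint = {mesh_uplink_streams(n):2d}"
    )
```

The number of mesh links grows roughly with the square of the number of participants, whereas the star grows linearly, at the price of making the MCU a central point that everything depends on.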

When bandwidth is at a premium, a technique called Voice-Activated Switching (VAS) can be used. In such a conference, when the party at one location is speaking, only that party is made visible to the other participants. Problems may arise if more than one person starts talking simultaneously: it becomes a contention problem, in which the one with the loudest voice is given preference. The other mode is Continuous Presence mode, where the MCU combines the video streams from all the end points into a common stream and transmits it to all the end points. This way, every participant can see everyone else simultaneously. These unified images are often called ‘layouts’. In both cases, however, the voice is transmitted to every endpoint in full-duplex mode.
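The loudest-voice rule of voice-activated switching can be sketched in a few lines. This is a simplified illustration rather than how any particular MCU does it: it assumes each endpoint supplies a short, non-empty list of audio samples per interval, measures loudness as RMS level, and uses a made-up silence threshold so that background noise does not switch the picture.

```python
import math


def rms(samples):
    """Root-mean-square level of one interval of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_active_speaker(frames, current=None, threshold=500.0):
    """Return the endpoint whose video should be shown to everyone.

    frames    -- dict mapping endpoint name to its samples for this interval
    current   -- endpoint shown so far (kept if everybody is quiet)
    threshold -- assumed silence level; anything below it is ignored
    """
    levels = {name: rms(samples) for name, samples in frames.items()}
    loudest = max(levels, key=levels.get)
    if levels[loudest] < threshold:
        return current  # nobody is really speaking, so do not switch
    return loudest


# Hypothetical interval: B and C talk at the same time, C is louder.
frames = {
    "A": [20, -15, 10, -25],
    "B": [1500, -1400, 1600, -1550],
    "C": [3000, -3100, 2900, -3050],
}
print(pick_active_speaker(frames, current="A"))
```

Practical systems usually add some hold time before switching so that the picture does not flicker between speakers.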

Codec (Coder-Decoder):

Even with the best available networks and bandwidths, it would be impractical to send video in its uncompressed form. So, some kind of video compression has to be in place to reduce the size of the bit stream to be transmitted. This is achieved with the use of a codec (Coder-Decoder). Video compression can follow two approaches. The first is to find the information that is repetitive and replace it with a shortcut before transmission. The shortcut is then replaced by the original information at the receiving end, restoring the video to its original form (just like a macro in programming). The other approach involves eliminating unimportant data from the frames, so that only the information perceptible to the human eye is retained. This can drastically reduce the size of the digital data to be transmitted, but if applied too aggressively it can result in very poor quality video. Two major methods have been employed to achieve video compression while minimizing both losses and size:

– Block Based Compression: Each frame of the video, which is a single image, is divided into small blocks of pixels, and the algorithm then keeps track of how the values in each block vary from frame to frame over time.

– Object Based Compression: More advanced codec algorithms classify the objects in the frames and keep track of movable and stationary objects. Thus less data may be used to store the information of stationary objects, and more detail provided for the moving objects. Such techniques are more efficient than the simpler block based compression methods.

In order to standardize the compression methods, the Moving Picture Experts Group has come up with several standards like the MPEG-4 standard.
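The block based idea described above can be illustrated with a toy change detector. This is a deliberately simplified sketch and not the algorithm of MPEG-4 or any other real codec: frames are assumed to be small grids of grayscale values, the block size and threshold are arbitrary, and a block counts as changed when the sum of absolute pixel differences against the previous frame exceeds the threshold. Only changed blocks would then need to be sent.

```python
def changed_blocks(prev, curr, block=2, threshold=10):
    """Return the (top, left) corners of blocks that differ between two frames."""
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Sum of absolute differences over the pixels of this block.
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            )
            if sad > threshold:
                changed.append((top, left))
    return changed


# Hypothetical 4x4 frames: a flat background with one pixel that changes.
prev_frame = [[10] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[2][3] = 200

blocks = changed_blocks(prev_frame, curr_frame)
total_blocks = (4 // 2) * (4 // 2)
print(f"{len(blocks)} of {total_blocks} blocks changed: {blocks}")
```

In this example only one of the four blocks has to be retransmitted, which is the kind of saving that makes a mostly static scene, such as a person at a desk, cheap to send.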

Standards

Standards for Video Conferencing:

The ITU has three sets of standards for video conferencing to bridge the divide between different methods:

1. ITU H.320: A standard for video conferencing over PSTN and ISDN lines, popular in Europe. It is a recommendation on protocol suites, formally titled Narrow-band Visual Telephone Systems and Terminal Equipment. It contains protocols like H.221 and H.230, audio codecs like G.711 and video codecs like H.261 and H.263. Standards like H.323 have also been used in videoconferencing systems, but newer protocols like the Session Initiation Protocol (SIP) are becoming more popular because they work across different forms of communication like voice, data, instant messaging etc.

2. ITU H.264 Scalable Video Coding: A compression standard that achieves highly error-resilient video streams over a normal IP network without the need for any QoS-enabled lines. This standard has brought the technology of videoconferencing to the masses by enabling video conferencing from a simple desktop terminal.

3. ITU V.80: A standard made compatible with the H.324 standard for video conferencing over POTS (Plain Old Telephone Service).

Issues in Implementation

Issues in Implementation:

Acoustic Echo Cancellation: AEC is an algorithm that detects and suppresses echoes, i.e. sound from the loudspeaker that re-enters the microphone after reflecting off the surroundings with some delay. If left unchecked, echoes cause problems like the speaker hearing his own voice. As the intensity of the echo increases, one may hear reverberation, and in a much aggravated condition the feedback may cause a howling effect. Most professional video conferencing systems employ AEC for best performance.
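The article does not say which algorithm a given system uses for AEC. One common textbook approach is an adaptive filter, and the sketch below uses the normalized least-mean-squares (NLMS) variant purely as an illustration. Everything in it is an assumption: the echo path is a made-up three-tap response, the far-end signal is random noise, and there is no near-end speech or background noise, which a real canceller would have to cope with.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of far_end from mic.

    Returns the residual signal and the learned filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def energy(signal):
    return sum(v * v for v in signal)


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]

# Hypothetical room response: the microphone hears a delayed, attenuated copy.
room = [0.6, 0.3, 0.1]
mic = []
for n in range(len(far_end)):
    echo = sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
    mic.append(echo)

residual, weights = nlms_echo_cancel(far_end, mic)
print("echo energy (last 200 samples):    ", energy(mic[-200:]))
print("residual energy (last 200 samples):", energy(residual[-200:]))
print("learned weights:", [round(w, 3) for w in weights])
```

In this idealized setting the filter weights approach the assumed room response and the residual echo fades away; with real rooms the echo path is far longer and keeps changing, so practical cancellers are considerably more elaborate.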

Security: Poorly configured video conferencing systems are easy targets for hackers looking to trespass on a company’s online premises. Sensitive information may be exchanged over a video conferencing session and, if not secured properly, might fall into the wrong hands. The security and integrity of such data is so important that many countries have laws to enforce it, like the Health Insurance Portability and Accountability Act (HIPAA) and the Sarbanes-Oxley Act (2002) of the United States. Common encryption methods are 56-bit DES and 128-bit AES.

Besides, many people argue that the following three issues hinder video conferencing from becoming an everyday technology.

1. Eye Contact: Eye contact is essential to building a one-to-one conversation when video is available. However, video conferencing systems may give the impression that a person is avoiding eye contact by looking elsewhere when he has in fact been looking at the screen all the time, because the camera is not where the screen is. This problem is partially resolved by placing the camera in the screen itself. Much research is also going on into image processing that achieves stereo reconstruction of the image to remove any such parallax effect.

2. Camera Consciousness: Being aware of being on camera has a psychological effect on people and many times impairs communication rather than making it clearer.

3. Latency: Apart from large bandwidth requirements, a small round-trip time is required to reduce delays between frames. Any delay beyond 150-300 ms becomes noticeable and distracting.
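The latency point can be made concrete with a simple delay budget. The stage names and millisecond figures below are invented for illustration only and are not measurements of any system; the sketch merely adds up the stages a frame passes through and compares the total with the lower end of the 150-300 ms range quoted above.

```python
NOTICEABLE_MS = 150  # lower end of the 150-300 ms range quoted above


def end_to_end_delay_ms(stages):
    """Total delay from camera to remote screen, in milliseconds."""
    return sum(stages.values())


# Hypothetical per-stage delays in milliseconds (illustrative values only).
stages = {
    "capture": 15,
    "encode": 20,
    "network": 60,
    "jitter_buffer": 40,
    "decode": 10,
    "render": 15,
}

total = end_to_end_delay_ms(stages)
worst = max(stages, key=stages.get)
print(f"total delay: {total} ms (threshold {NOTICEABLE_MS} ms)")
print(f"largest contributor: {worst} ({stages[worst]} ms)")
print("noticeable" if total > NOTICEABLE_MS else "within budget")
```

With these assumed numbers the budget is already slightly exceeded, which shows how little room there is for network delay once capture, coding and buffering are accounted for.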

Other than these, mass adoption of videoconferencing is low because of the following probable causes:

1. Complexity: Most users are not technical and look for a simple interface.

2. Lack of interoperability: Many video conferencing systems cannot interconnect without an intervening gateway. Software solutions can seldom connect to hardware solutions. Different systems use different standards, and hence additional configuration is required when connecting dissimilar systems.

3. Bandwidth and quality of service: Most broadband connections on offer have dissimilar upload and download speeds. Upload speeds are often much lower than download speeds, and this poses a bottleneck to the success of videoconferencing.

4. Expense: The dedicated videoconferencing system is not the only thing that incurs an initial cost when setting up a conferencing facility; the rooms in which it will be installed need special consideration too, such as their acoustics and reverberation.

Applications & Conclusion

Applications:

Deaf, hard-of-hearing and mute people find videoconferencing a particularly attractive means of communicating with each other in sign language, as it eliminates the need for any ‘in between’ proxy.

Education seems to be the field which has benefited the most from videoconferencing. Videoconferencing has given students an opportunity to learn by participating in two-way communication forums. Through videoconferencing, students are now able to visit other parts of the world virtually, speak and interact with their peers, and use other educational facilities through virtual field trips.

It is a highly useful technology for real-time telemedicine and telenursing applications such as diagnosis, consulting and transmission of medical images. Rural areas in particular can benefit from videoconferencing, as experts no longer need to visit a remote place. Instead, the patients may contact the doctor over a videoconferencing interface.

Businesses with distributed locations have been using videoconferencing internally to conduct meetings between employees at various locations to achieve a closer synchronisation of operation throughout the company.

Since 2007, a new concept of press videoconferencing has evolved which allows journalists to attend conferences in some other part of the world without leaving their office premises. The IMF has many journalists on its registration list for closed conferences of this type.

Videoconferencing has opened an option for the law and order enforcing agencies to allow witnesses to testify on a videoconference if they are reluctant to attend the physical legal setting due to any reason like psychological stress. The pros and cons are still debated as it might be a violation of certain laws.

Conclusion:

Handheld and portable telephony has been spearheading the video conferencing campaign for some time now. Today, if a small child knows how to make a video call, the credit goes to the smartphone his parents own and to the cellular service provider. The next contributors are the internet and computers. These can be considered the predecessors of the telecom companies in popularizing this technology, and major groundwork was done by this market segment. But with computing power shrinking to fit in the palm of the consumer in the form of powerful smartphones and tablets, the mobile computing platform seems to be taking the steam out of the desktop and laptop market segment.

Major telecom providers in the US have successfully completed their switchover to LTE (3.5G+). Japan and European nations have been using video conferencing for a long time now. Indian telecom operators have been slashing the prices of their 3G plans, and a few are even introducing 4G into their telecom circles. As more and more telecom providers embrace technologies like 3G and beyond, a major part of the population will soon be able to converse over a video conference call.