Multi-party conversations are known from the PSTN world. The function is often provided by a company's PBX, but it is also possible to use a commercial service. A central part of the service is the Multipoint Conference Unit (MCU). In order to use the service, a session leader must reserve the session at the MCU. Every MCU has a different interface for doing so. MCUs in the IP Telephony world usually offer a web-based form for this.
At the scheduled time of the conference, each user calls the phone number of the MCU. Because an MCU can support multiple sessions at the same time, the user must then dial a number that identifies the session to join. A password is required to prevent uninvited parties from joining the conference. Some MCUs can set up the conference themselves by dialling all the parties. To do so, the MCU needs to know everyone's phone number in advance.
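The dial-in procedure can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the interface of any real MCU: the class `ConferenceBridge`, its methods `reserve` and `join`, and the session number and password values are all hypothetical.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Session:
    password: str
    participants: Set[str] = field(default_factory=set)


class ConferenceBridge:
    """Hypothetical model of the MCU's reservation and admission steps."""

    def __init__(self) -> None:
        self.sessions: Dict[str, Session] = {}

    def reserve(self, session_number: str, password: str) -> None:
        # The session leader reserves the session before the conference.
        self.sessions[session_number] = Session(password)

    def join(self, caller: str, session_number: str, password: str) -> bool:
        # The dialled session number selects one of the concurrent sessions.
        session = self.sessions.get(session_number)
        if session is None:
            return False
        # Constant-time comparison of the supplied password.
        if not hmac.compare_digest(session.password.encode(), password.encode()):
            return False
        session.participants.add(caller)
        return True
```

A caller who supplies an unknown session number or a wrong password is rejected, and only admitted callers appear in the session's participant set.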
The main function of the MCU starts at this point in the conference: it receives the audio signals of every party in the session, mixes the sources and sends each party the mix of all sources except its own. This happens in real time, so everyone hears everyone else. This way of conversing has its own rules of interaction between the parties. If a party wants to speak, it should be clear that the previous party has finished speaking. When collisions occur, it is useful if the session leader gives the floor to one of those who wish to speak. These aspects have been investigated in the socio-cultural field, but are not part of this cookbook.
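The mixing step can be illustrated with a minimal sketch. It assumes that every party delivers one frame of equally many 16-bit linear PCM samples per mixing interval. The function name `mix_minus` and the frame representation are assumptions made for this example, and a real MCU would also have to handle decoding, jitter and timing.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767  # range of a signed 16-bit sample


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each party, the sum of all other parties' samples."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all sources once ...
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    # ... then remove each party's own contribution and clip to 16 bits.
    mixed = {}
    for party, samples in frames.items():
        mixed[party] = [
            max(PCM_MIN, min(PCM_MAX, total[i] - samples[i]))
            for i in range(length)
        ]
    return mixed
```

Summing once and subtracting each party's own samples avoids computing a separate sum for every listener. The clipping is the simplest possible way to keep the result within the sample range.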
Now that the general functionality is known, more details can be given on IP Telephony MCUs. An MCU can be obtained either in hardware or in software. Many Gatekeepers are equipped with built-in MCU software functionality. In the case of a hardware MCU, the main interface is an Ethernet connection. From a functional point of view, users cannot approach the MCU directly over IP. No matter whether H.323 or SIP is used to set up regular two-party calls, a user can only dial in to the MCU through a Gatekeeper or SIP proxy. Parties that use a PSTN phone can also join a conference by means of the IP-to-PSTN gateway.
Modern MCUs can support both audio and video. The calling parties must support the audio and video codecs that the MCU has on board. Some MCUs can transcode between codecs, thus enabling users with different codecs to join the same conference. If video is also distributed by an MCU, the video streams cannot be mixed like the audio streams. One way to implement the distribution is to transmit to all users only the video signal of the source that has the loudest audio signal at that point in time. This is audio switching mode. There are other options, such as chair-controlled mode, in which the chair can lock the video (and possibly the audio too) on one participant, or presentation mode, in which one participant is chosen, both audio and video are locked on that presenter, and the rest of the audience can only listen. Some MCUs offer a Continuous Presence mode, in which the video signals are displayed in a matrix that shows all users to every user. Modern MCUs support other layouts as well.
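Audio switching can be sketched as follows. The sketch assumes that loudness is measured as the RMS level of each party's current audio frame. The `margin` parameter, which keeps the current source on screen unless another party is clearly louder, is an assumption added for this example to prevent rapid switching and is not something the text above prescribes. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional


def rms(samples: List[int]) -> float:
    """Root-mean-square level of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_video_source(
    frames: Dict[str, List[int]],
    current: Optional[str] = None,
    margin: float = 1.5,
) -> Optional[str]:
    """Pick the party whose video is sent to everyone."""
    if not frames:
        return current
    levels = {party: rms(samples) for party, samples in frames.items()}
    loudest = max(levels, key=levels.get)
    # Assumed hysteresis: stay on the current source unless the loudest
    # party exceeds it by the given margin.
    if current in levels and levels[loudest] < margin * levels[current]:
        return current
    return loudest
```

With `margin=1.0` the function always switches to the loudest party, which is the plain behaviour described above.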
An alternative to using an MCU in the IP world is to use IP Multicast. In this case, all parties transmit their audio (and video) over a Multicast channel. All users must tune in to the channels of everyone else. This means that the total amount of data traffic increases with every user that joins the conference. Unfortunately, few networks support Multicast.
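The growth in traffic can be made concrete with a small calculation. It assumes audio only, at 64 kbit/s per stream (the G.711 payload rate, ignoring packet overhead). The function names are made up for this illustration.

```python
AUDIO_KBPS = 64  # assumed per-stream rate (G.711 payload, no packet overhead)


def multicast_receive_kbps(parties: int, stream_kbps: int = AUDIO_KBPS) -> int:
    """With Multicast, each party receives one stream from every other party."""
    return max(parties - 1, 0) * stream_kbps


def mcu_receive_kbps(parties: int, stream_kbps: int = AUDIO_KBPS) -> int:
    """With an MCU, each party receives a single mixed stream."""
    return stream_kbps if parties > 1 else 0


for n in (2, 5, 10):
    print(n, multicast_receive_kbps(n), mcu_receive_kbps(n))
```

Under these assumptions a party in a ten-party Multicast conference receives 576 kbit/s of audio, whereas with an MCU it receives 64 kbit/s regardless of the number of parties.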