4.22.1.2. Proxy behavior

The SIP proxy allows SIP signaling (accepting SIP messages on the TCP port 5060) and the dynamic RTP traffic through the firewall without compromising the security of the firewall and the defended network. Ports are dynamically opened through the firewall based on information received in the signaling traffic. The signaling part of the protocol is inspected on the application level for protocol conformance: the SIP proxy enforces the standards, protecting the network from attacks that violate the protocol. This is especially important because SIP clients and even servers are rarely designed with security in mind, and many of them have security issues. As an application-level gateway, Zorp parses, checks, and rebuilds every passing signaling request and response. The actual (audio, video, etc.) communication is not inspected; it is forwarded through Zorp on the kernel level using stateful packet filtering. These connections are handled as related UDP connections. Furthermore, it is possible to perform NAT and connection marking (see the description of the SIP proxy classes for details).

When packets arrive at the port the SIP proxy is listening on, basic access control is performed based on the source IP address of the packets. Every request and response is inspected on the application level (Layer 7 in the OSI model). The requests and responses, including protocol elements like headers, are parsed and strictly checked for conformance with the SIP standards. The SIP proxy understands and enforces the SIP protocol as described in RFC 3261. The syntax and length of the various protocol elements (e.g. the length of lines, headers, and requests) are checked to repel attacks based on malformed messages, such as buffer overflow attacks. The relation of arriving packets to other packets and to previous communication is also inspected. Packets that do not conform to the logic and workflow of the protocol (e.g. responses without requests) are rejected. This step is important because SIP uses random ports for transferring the actual communication data (the RTP stream, e.g. voice or video); without it, covert channels could be opened through the firewall between arbitrary machines, instead of only the intended VoIP communication between the two SIP endpoints (i.e. the caller and the callee).
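The length checks and the "no response without a request" rule described above can be illustrated with a minimal Python sketch. It is not Zorp's implementation: the limits MAX_LINE_LENGTH and MAX_HEADERS, the SipCheckError exception, and the TransactionTracker class are hypothetical names chosen for illustration, and matching on Call-ID and CSeq is a simplification of the transaction matching rules in RFC 3261.

```python
MAX_LINE_LENGTH = 1024  # hypothetical limit, not a Zorp default
MAX_HEADERS = 64        # hypothetical limit, not a Zorp default


class SipCheckError(Exception):
    """Raised when a message fails a conformance check."""


def parse_message(raw: bytes):
    """Split a SIP message into its start line and headers, enforcing limits.

    Simplified: header folding and compact header names are not handled.
    """
    try:
        head = raw.split(b"\r\n\r\n", 1)[0].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise SipCheckError("message is not valid UTF-8") from exc
    lines = head.split("\r\n")
    if any(len(line) > MAX_LINE_LENGTH for line in lines):
        raise SipCheckError("line too long")
    if len(lines) - 1 > MAX_HEADERS:
        raise SipCheckError("too many headers")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise SipCheckError("malformed header line")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


class TransactionTracker:
    """Remembers forwarded requests so that stray responses can be rejected."""

    def __init__(self):
        self.pending = set()

    def check(self, raw: bytes) -> str:
        start_line, headers = parse_message(raw)
        try:
            key = (headers["call-id"], headers["cseq"])
        except KeyError as exc:
            raise SipCheckError(f"missing mandatory header: {exc}") from exc
        if start_line.startswith("SIP/2.0 "):
            # A status line marks a response: it must answer a known request.
            if key not in self.pending:
                raise SipCheckError("response without request")
        else:
            self.pending.add(key)
        return start_line
```

A caller would create one TransactionTracker per session, pass every signaling message to check(), and drop any message for which SipCheckError is raised.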

The payload (SDP) part of the communication is parsed as well and modified if network address translation (NAT) is used. In this case, the addresses and dynamic ports used by the RTP traffic stream have to be modified accordingly. After all these sanity checks, the policy settings of the firewall are consulted. Address and media type filtering is performed (e.g. to allow only voice traffic to/from specific addresses). Network address translation is also performed at this step if required.
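As an illustration of the SDP handling described above, the following Python sketch rewrites the connection address and media ports of an SDP body and rejects media types that the policy does not allow. It is a simplified sketch under stated assumptions, not Zorp's code: the function rewrite_sdp and its parameters are hypothetical, only IPv4 "c=" lines and plain "m=" ports are handled, and the address in the "o=" line is left untouched.

```python
def rewrite_sdp(sdp: str, nat_address: str, port_map: dict, allowed_media: set) -> str:
    """Return the SDP body with NAT applied and media types filtered.

    nat_address   -- address to advertise instead of the internal one
    port_map      -- maps original RTP ports to translated ports
    allowed_media -- media types permitted by the policy, e.g. {"audio"}
    """
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Connection data: replace the advertised address.
            line = f"c=IN IP4 {nat_address}"
        elif line.startswith("m="):
            # Media description: "m=<media> <port> <proto> <formats>"
            media, port, rest = line[2:].split(" ", 2)
            if media not in allowed_media:
                raise ValueError(f"media type not permitted: {media}")
            new_port = port_map.get(int(port), int(port))
            line = f"m={media} {new_port} {rest}"
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

For example, with port_map = {49170: 20000} and allowed_media = {"audio"}, an "m=audio 49170 RTP/AVP 0" line is rewritten to use port 20000, while an "m=video ..." line causes the message to be rejected.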

Access control on the RTP stream part of the protocol is performed separately. This is important because RTP and signaling streams can have different access control settings. If SIP servers or a SIP proxy are used on some part of the network, the signaling and the RTP streams originate from different sources. (In such a situation, the signaling originates from the proxy, but the RTP stream arrives directly from the actual client. However, such a situation could also be used to initiate covert channels.)
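The idea of separate access control for signaling and RTP can be sketched in Python as two independent source-address lists. The names SIGNALING_ACL, RTP_ACL, and permitted, as well as the addresses, are hypothetical and serve only to illustrate the scenario above, where signaling comes from a SIP server and RTP comes directly from the clients.

```python
import ipaddress

# Hypothetical policy: signaling is accepted only from the SIP server,
# while RTP is accepted from the client network.
SIGNALING_ACL = [ipaddress.ip_network("192.0.2.10/32")]
RTP_ACL = [ipaddress.ip_network("198.51.100.0/24")]


def permitted(source: str, acl) -> bool:
    """Return True if the source address falls within any network of the ACL."""
    addr = ipaddress.ip_address(source)
    return any(addr in net for net in acl)
```

With these lists, a client at 198.51.100.23 may send RTP but not signaling, and the server at 192.0.2.10 may send signaling but not RTP.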

The proxy supports the use of secondary sessions as described in Section 2.2, Secondary sessions.