SIP: Session Initiation Protocol
The signaling protocol that sets up, manages, and tears down voice and video calls over IP networks. SIP handles the call control while RTP carries the actual media.
- Type: Application Layer
- Port: 5060 (UDP/TCP) / 5061 (TLS)
- Transport: UDP / TCP
- Standard: RFC 3261
What is SIP?
SIP (Session Initiation Protocol) is a text-based signaling protocol used to set up, modify, and tear down communication sessions over IP networks. Defined in RFC 3261 (published in 2002 by the IETF), SIP was designed as a simpler, more flexible alternative to the ITU's H.323 standard. It borrows heavily from HTTP's request-response model: SIP messages are plain text with headers, methods, and status codes that will look immediately familiar to anyone who has worked with web protocols.
SIP itself does not carry voice or video data. Its only job is signaling: finding the other party, negotiating media capabilities through SDP (Session Description Protocol), and managing the session lifecycle. The actual audio and video streams travel over RTP (Real-time Transport Protocol), which runs separately between the endpoints. This separation of signaling and media is one of SIP's core design principles, and it allows each layer to be optimized independently.
Today, SIP is the dominant VoIP signaling protocol worldwide. It is supported by major VoIP platforms, including Asterisk, FreeSWITCH, Twilio, Vonage, and Cisco Unified Communications Manager. SIP trunking has largely replaced legacy ISDN PRI and T1 lines for connecting enterprise phone systems to the PSTN. Microsoft Teams, Zoom, and many video conferencing platforms support SIP for PSTN connectivity and room-system interoperability. This makes SIP one of the most widely deployed application-layer protocols for real-time communication.
How SIP Works: Call Setup and Teardown
A SIP call begins when the calling party (the User Agent Client, or UAC) sends an INVITE request. This INVITE contains an SDP body describing what media the caller can send and receive, including codec preferences, IP addresses, and port numbers. The INVITE travels through one or more SIP proxy servers, which use DNS lookups and their location service databases to route the request to the correct destination.
The callee's proxy server forwards the INVITE to the target device (the User Agent Server, or UAS). The UAS immediately sends back a 100 Trying response to stop retransmissions, followed by 180 Ringing once the phone starts ringing. When the user picks up, the UAS sends a 200 OK containing its own SDP answer, which completes the media negotiation. The caller then sends an ACK to confirm receipt, and the three-way INVITE handshake is complete.
At this point, RTP media flows directly between the two endpoints. The proxy servers are no longer in the media path. This peer-to-peer media flow reduces latency and keeps proxy servers from becoming bottlenecks. When either party wants to end the call, they send a BYE request. The other side responds with 200 OK, and both endpoints stop sending RTP packets. The entire session, from INVITE to BYE, is called a SIP dialog, identified by the Call-ID, From tag, and To tag.
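The message sequence described above can be traced with a small state machine, seen from the caller's side. This is a minimal sketch rather than a real SIP stack: the state names, the event strings, and the `CallState`, `next_state`, and `run_call` helpers are all hypothetical, and retransmissions, CANCEL, and error responses are left out.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"          # INVITE sent, no response yet
    PROCEEDING = "proceeding"    # 1xx received (100 Trying, 180 Ringing)
    ANSWERED = "answered"        # 200 OK received, ACK not yet sent
    CONFIRMED = "confirmed"      # ACK sent, RTP media can flow
    TERMINATING = "terminating"  # BYE sent, waiting for 200 OK
    TERMINATED = "terminated"


# (current state, event) -> next state, from the caller's point of view.
TRANSITIONS = {
    (CallState.IDLE, "send INVITE"): CallState.CALLING,
    (CallState.CALLING, "recv 100"): CallState.PROCEEDING,
    (CallState.CALLING, "recv 180"): CallState.PROCEEDING,
    (CallState.PROCEEDING, "recv 180"): CallState.PROCEEDING,
    (CallState.CALLING, "recv 200"): CallState.ANSWERED,
    (CallState.PROCEEDING, "recv 200"): CallState.ANSWERED,
    (CallState.ANSWERED, "send ACK"): CallState.CONFIRMED,
    (CallState.CONFIRMED, "send BYE"): CallState.TERMINATING,
    (CallState.CONFIRMED, "recv BYE"): CallState.TERMINATED,
    (CallState.TERMINATING, "recv 200"): CallState.TERMINATED,
}


def next_state(state, event):
    """Apply one signaling event, rejecting anything out of sequence."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in state {state.name}")


def run_call(events):
    """Feed a list of events through the state machine from IDLE."""
    state = CallState.IDLE
    for event in events:
        state = next_state(state, event)
    return state
```

Feeding it `["send INVITE", "recv 100", "recv 180", "recv 200", "send ACK"]` leaves the call in `CONFIRMED`, the point at which media starts flowing.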
SIP Architecture: The Trapezoid Model
SIP defines several logical entities that work together to route and manage calls. A User Agent Client (UAC) initiates requests, while a User Agent Server (UAS) receives and responds to them. In practice, every SIP phone or softphone acts as both UAC and UAS depending on whether it is making or receiving a call.
A SIP Proxy Server routes requests on behalf of user agents. Proxies examine the Request-URI, consult DNS and their location service, and forward the request toward the destination. A Registrar is a server that accepts REGISTER requests. When a SIP phone boots up, it sends a REGISTER message to its registrar with its current IP address and port. The registrar stores this binding in the Location Service database, which proxies query when routing calls. A Redirect Server responds to requests with a 3xx redirect, telling the UAC to try a different URI instead of forwarding the request itself.
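The registrar and location service relationship can be illustrated with an in-memory table. This is a sketch under stated assumptions: the `LocationService` class and its method names are hypothetical, and a real registrar would also authenticate the REGISTER and persist the bindings.

```python
import time


class LocationService:
    """In-memory address-of-record -> contact bindings (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._bindings = {}  # aor -> {contact URI: expiry timestamp}
        self._clock = clock

    def register(self, aor, contact, expires=3600):
        """Store or refresh a binding; expires=0 removes it, as in REGISTER."""
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = self._clock() + expires

    def lookup(self, aor):
        """Return the unexpired contacts a proxy could forward a request to."""
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)
```

A phone booting up corresponds to `register("sip:bob@example.com", "sip:bob@192.168.2.200:5060")`, and a proxy routing an INVITE corresponds to `lookup("sip:bob@example.com")`. An empty result is the situation in which a proxy would typically answer 404 or 480.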
The classic SIP architecture forms a trapezoid shape. Alice's phone connects to her outbound proxy. Bob's phone registers with his proxy. When Alice calls Bob, her INVITE goes up to her proxy, across to Bob's proxy (resolved via DNS SRV records), and down to Bob's phone. The signaling path follows the trapezoid, but the RTP media flows directly between Alice and Bob, bypassing the proxies entirely. This is why SIP scales well: proxies only handle lightweight text-based signaling, not bandwidth-intensive media streams.
SIP Messages and Status Codes
SIP is a text-based protocol, similar to HTTP. Every SIP message is either a request (with a method name) or a response (with a numeric status code). Requests and responses share the same header format: a start line followed by headers, a blank line, and an optional body (usually SDP).
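The start line, headers, blank line, and body layout described above is simple enough to split by hand. The following is a minimal sketch, not a compliant parser: `parse_sip_message` is a hypothetical helper, and it ignores header line folding and compact header names.

```python
def parse_sip_message(raw):
    """Split a SIP message into start line fields, headers, and body."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]

    # Headers are kept as a list of pairs because some (such as Via)
    # can legitimately appear more than once.
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))

    # Responses start with the protocol version, requests with a method.
    if start_line.startswith("SIP/2.0 "):
        _version, code, reason = start_line.split(" ", 2)
        fields = {"type": "response", "status": int(code), "reason": reason}
    else:
        method, uri, _version = start_line.split(" ", 2)
        fields = {"type": "request", "method": method, "uri": uri}

    return {**fields, "headers": headers, "body": body}
```

Given a request that begins with `INVITE sip:bob@example.com SIP/2.0`, the result has `type` set to `"request"` and `method` set to `"INVITE"`. A message that begins with `SIP/2.0 180 Ringing` yields a response with `status` 180.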
SIP Request Methods
| Method | Purpose |
|---|---|
| INVITE | Initiates a new session or modifies an existing one (re-INVITE). |
| ACK | Confirms receipt of a final response to an INVITE. |
| BYE | Terminates an established session. |
| CANCEL | Cancels a pending INVITE before the callee answers. |
| REGISTER | Registers the user agent's current contact address with a registrar. |
| OPTIONS | Queries a server's capabilities without establishing a session. |
| INFO | Sends mid-session signaling information (e.g., DTMF digits). |
| UPDATE | Modifies session parameters before the session is established. |
| REFER | Asks the recipient to issue a request (used for call transfer). |
| SUBSCRIBE | Requests notifications about an event (e.g., presence, voicemail). |
| NOTIFY | Delivers event notifications to a subscriber. |
SIP Response Code Classes
| Code | Class | Meaning |
|---|---|---|
| 100 | Provisional | Trying. The proxy received the request and is working on it. |
| 180 | Provisional | Ringing. The callee's device is alerting the user. |
| 183 | Provisional | Session Progress. Used to send early media (ringback tone over RTP). |
| 200 | Success | OK. The request was successful. For INVITE, the callee answered. |
| 301 | Redirection | Moved Permanently. The user can be reached at a new URI. |
| 302 | Redirection | Moved Temporarily. Try an alternate URI for this request. |
| 401 | Client Error | Unauthorized. The request requires authentication credentials. |
| 403 | Client Error | Forbidden. The server understood the request but refuses to fulfill it. |
| 404 | Client Error | Not Found. The user does not exist at the specified domain. |
| 408 | Client Error | Request Timeout. The server could not produce a response in time. |
| 486 | Client Error | Busy Here. The callee's device is busy and cannot take the call. |
| 500 | Server Error | Internal Server Error. The server encountered an unexpected condition. |
| 503 | Server Error | Service Unavailable. The server is temporarily overloaded or in maintenance. |
| 603 | Global Failure | Decline. The callee explicitly rejected the call at all locations. |
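The class of a response follows directly from the first digit of its status code, so the table above reduces to a small lookup. This is an illustrative sketch; `status_class` and `is_final` are hypothetical helper names.

```python
STATUS_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def status_class(code):
    """Map a SIP status code (100-699) to its class name."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_final(code):
    """Final responses (2xx-6xx) end the transaction; 1xx responses do not."""
    return status_class(code) != "Provisional"
```

A client that receives a code it does not recognize can still act on the class alone, for example by treating an unknown 4xx code as a generic client error.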
SIP Request and Response Examples
Because SIP is text-based, you can read its messages directly, unlike binary protocols that require hex decoding. Below are realistic SIP INVITE and 200 OK messages with the key headers explained.
SIP INVITE Request
Alice at 192.168.1.100 initiates a call to Bob at example.com. The INVITE carries an SDP body (omitted here for clarity) describing Alice's media capabilities.
Request (UAC to Proxy)
INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776
Max-Forwards: 70
To: Bob <sip:bob@example.com>
From: Alice <sip:alice@example.com>;tag=1928301774
Call-ID: a84b4c76e66710@192.168.1.100
CSeq: 314159 INVITE
Contact: <sip:alice@192.168.1.100>
Content-Type: application/sdp
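Content-Length: 142
Key Headers:
- Via: records the transport and address the response should be sent back to. The branch parameter identifies the transaction and, in RFC 3261, always starts with z9hG4bK.
- Max-Forwards: a hop limit that each proxy decrements. 70 is the recommended initial value.
- To / From: the logical recipient and originator of the request. The From tag is set by the caller; the To header has no tag yet because no dialog exists.
- Call-ID: a unique identifier shared by every message in this call.
- CSeq: a sequence number plus the method name, used to order requests and match responses to them.
- Contact: the URI where Alice's device can be reached directly for later requests.
- Content-Type / Content-Length: the type and size in bytes of the message body (here, the SDP offer).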
SIP 200 OK Response
Bob's phone answers the call and sends a 200 OK back through the proxy chain. The response includes Bob's SDP answer, completing the offer/answer negotiation.
Response (UAS to UAC via Proxy)
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776
To: Bob <sip:bob@example.com>;tag=a6c85cf
From: Alice <sip:alice@example.com>;tag=1928301774
Call-ID: a84b4c76e66710@192.168.1.100
CSeq: 314159 INVITE
Contact: <sip:bob@192.168.2.200>
Content-Type: application/sdp
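Content-Length: 131
Key Details:
- Via, From, Call-ID, and CSeq are copied unchanged from the INVITE, so the response can follow the same path back and be matched to the request.
- To now carries a tag (a6c85cf) added by Bob's user agent. Together with the From tag and the Call-ID, it identifies the dialog.
- Contact gives Bob's direct address, which Alice uses as the target for the ACK and for later requests such as BYE.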
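The dialog identifier (Call-ID, From tag, To tag) can be pulled out of the 200 OK above with a few lines of code. This is a simplified sketch: `header_tag` and `dialog_id` are hypothetical helpers, headers are assumed to be already parsed into a dict, and the regular expression does not handle every quoting form the SIP grammar allows.

```python
import re

# Matches the ";tag=..." parameter that follows the address in From/To.
TAG_RE = re.compile(r";\s*tag=([^;\s>]+)")


def header_tag(value):
    """Return the tag parameter of a From/To header value, or None."""
    match = TAG_RE.search(value)
    return match.group(1) if match else None


def dialog_id(headers):
    """Build (Call-ID, From tag, To tag) from a dict of header values."""
    return (
        headers["Call-ID"],
        header_tag(headers["From"]),
        header_tag(headers["To"]),
    )
```

Applied to the INVITE, the To tag comes back as `None`, which reflects that the dialog is only fully identified once the callee has responded.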
SDP: Session Description Protocol
SDP (defined in RFC 4566, since obsoleted by RFC 8866) is not a protocol in the traditional sense. It is a text format for describing multimedia session parameters. SIP carries SDP inside the message body, with the Content-Type header set to application/sdp. SDP is how two SIP endpoints agree on which codecs to use, which IP addresses and ports to send media to, and whether the media flow should be sendrecv, sendonly, recvonly, or inactive.
The negotiation follows the offer/answer model defined in RFC 3264. The caller includes an SDP offer in the INVITE body, listing all the codecs and media types it supports. The callee responds with an SDP answer in the 200 OK, selecting a subset of the offered codecs. Once both sides have exchanged their SDP, they know exactly where to send RTP packets and which codec to use for encoding.
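The "selecting a subset" step of the offer/answer model can be sketched as a filter over the offer's m= line. This is an illustration under assumptions: `answer_media_line` is a hypothetical helper, `local_port` is a made-up value, and payload types are matched by number only, which is valid for static types such as 0 and 8 but not for dynamic types (96 to 127), where the rtpmap names must be compared.

```python
def answer_media_line(offer_line, supported, local_port):
    """Build the answer's m= line from the offer's m= line.

    offer_line: for example "m=audio 49170 RTP/AVP 0 8 101"
    supported:  set of payload type strings the answerer can handle
    local_port: RTP port the answerer will listen on
    """
    media, _offer_port, proto, *payload_types = offer_line[2:].split()

    # Keep only the offered formats the answerer supports, in offer order.
    accepted = [pt for pt in payload_types if pt in supported]

    if not accepted:
        # RFC 3264: a stream is rejected by setting its port to zero.
        # The line must still list at least one format.
        return f"m={media} 0 {proto} {payload_types[0]}"
    return f"m={media} {local_port} {proto} {' '.join(accepted)}"
```

With the offer `m=audio 49170 RTP/AVP 0 8 101` and an answerer that supports only payload types 8 and 101, the answer line lists just those two formats on the answerer's own port.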
Key SDP Fields
v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.100
s=VoIP Call
c=IN IP4 192.168.1.100
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
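a=sendrecv
Field Breakdown:
- v=0: the SDP protocol version.
- o=: the origin line, giving the username, session ID, session version, network type, address type, and address of the session creator.
- s=: the session name.
- c=: the connection address where media should be sent.
- t=0 0: the start and stop times. Two zeros mean the session is not bounded in time.
- m=audio 49170 RTP/AVP 0 8 101: an audio stream on port 49170 using the RTP/AVP profile, offering payload types 0, 8, and 101 in order of preference.
- a=rtpmap: maps each payload type to a codec name and clock rate. Here 0 is PCMU (G.711 mu-law), 8 is PCMA (G.711 A-law), and 101 is telephone-event (DTMF digits).
- a=sendrecv: the endpoint will both send and receive media.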
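Since SDP is a sequence of `key=value` lines, the fields an endpoint needs in order to start sending RTP can be extracted with a short loop. This is a minimal sketch, not a full SDP parser: `parse_sdp` is a hypothetical helper, and it does not distinguish session-level from media-level attributes.

```python
def parse_sdp(sdp):
    """Extract connection address, media lines, rtpmap entries, direction."""
    result = {
        "connection": None,
        "media": [],
        "rtpmap": {},
        "direction": "sendrecv",  # SDP's default when no direction is given
    }
    for line in sdp.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            result["connection"] = value.split()[2]
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            result["media"].append(
                {"media": media, "port": int(port), "proto": proto, "formats": formats}
            )
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            result["rtpmap"][payload_type] = encoding
        elif key == "a" and value in ("sendrecv", "sendonly", "recvonly", "inactive"):
            result["direction"] = value
    return result
```

Run against the example above, this yields the connection address 192.168.1.100, one audio stream on port 49170, and a payload map in which type 0 resolves to `PCMU/8000`.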
SIP vs H.323 Comparison
H.323 was the first major VoIP signaling standard, published by the ITU in 1996. SIP came later (RFC 2543 in 1999, then RFC 3261 in 2002) and eventually displaced H.323 in most new deployments. Here is how they compare.
| Feature | SIP | H.323 |
|---|---|---|
| Encoding | Text-based (like HTTP) | Binary (ASN.1 PER encoding) |
| Standards Body | IETF | ITU-T |
| Complexity | Lightweight, modular design | Monolithic, complex specification suite |
| Extensibility | Easy to extend with new headers and methods | Requires formal ITU standardization process |
| NAT Traversal | Challenging, solved with STUN/TURN/ICE | Challenging, solved with H.460 extensions |
| Adoption | Dominant in nearly all modern VoIP deployments | Legacy, declining. Still found in older video systems |
| Call Setup Speed | 1.5 round trips (typical INVITE transaction) | 6-7 round trips (multiple channel negotiations) |
| Codec Negotiation | SDP offer/answer model | H.245 channel negotiation protocol |
| Interoperability | Wide support across vendors and open source | Good within H.323 ecosystem, poor outside it |
Key Features of SIP
- Text-based and human-readable: SIP messages can be read and debugged with a simple packet capture. No special decoder needed.
- Transport-agnostic: SIP runs over UDP, TCP, TLS (SIPS), and even WebSocket for browser-based clients. UDP on port 5060 is the default, and TLS on port 5061 is used for encrypted signaling.
- URI-based addressing: SIP addresses look like email addresses (sip:user@domain), making them easy to provision and route using DNS.
- Forking: a proxy can fork an INVITE to multiple registered devices simultaneously. The first device to answer wins. This is how "ring all phones" works in office phone systems.
- Presence and instant messaging: SIP supports SUBSCRIBE/NOTIFY for presence status and the MESSAGE method for instant messaging (RFC 3428).
- SIP trunking: replaces physical ISDN PRI and T1 lines with IP-based connections to the PSTN, significantly reducing telephony costs.
- WebRTC interoperability: WebRTC does not mandate a signaling protocol, but it reuses the SDP offer/answer model, so its signaling maps naturally onto SIP. SIP-to-WebRTC gateways (like Opalvoip) allow browser-based clients to call standard SIP phones.
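The URI-based addressing mentioned in the list above can be illustrated by splitting a SIP URI into its parts. This is a simplified sketch: `parse_sip_uri` is a hypothetical helper, the regular expression does not cover passwords, IPv6 literals, or URI headers, and the fallback to port 5060 or 5061 assumes no DNS SRV lookup is performed.

```python
import re

SIP_URI_RE = re.compile(
    r"^(?P<scheme>sips?):"          # sip: or sips:
    r"(?:(?P<user>[^@;:]+)@)?"      # optional user part
    r"(?P<host>[^;:?]+)"            # host name or IPv4 address
    r"(?::(?P<port>\d+))?"          # optional explicit port
    r"(?P<params>(?:;[^;?]+)*)"     # optional ;name=value parameters
)


def parse_sip_uri(uri):
    """Split a SIP or SIPS URI into scheme, user, host, port, and params."""
    match = SIP_URI_RE.match(uri)
    if not match:
        raise ValueError(f"not a SIP URI: {uri!r}")

    params = {}
    for item in match.group("params").split(";"):
        if item:
            name, _, value = item.partition("=")
            params[name.lower()] = value

    # Default ports as given in this article: 5060 for sip, 5061 for sips.
    default_port = 5061 if match.group("scheme") == "sips" else 5060
    return {
        "scheme": match.group("scheme"),
        "user": match.group("user"),
        "host": match.group("host"),
        "port": int(match.group("port") or default_port),
        "params": params,
    }
```

For `sip:bob@example.com` this returns the user `bob`, the host `example.com`, and the default port 5060. The host part is what a proxy would then resolve through DNS.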
Common Use Cases for SIP
- Enterprise VoIP phone systems: Cisco Unified Communications Manager, Avaya Aura, and open-source PBXes like Asterisk and FreeSWITCH all use SIP as their primary signaling protocol.
- SIP trunking: replacing ISDN PRI lines with IP-based connections from providers like Twilio, Vonage, Bandwidth, and Lumen. A single SIP trunk can carry hundreds of concurrent calls.
- Video conferencing: many video conferencing platforms use SIP for room system interoperability. Opalvoip, Cisco, and Poly endpoints all speak SIP natively.
- Unified communications: Microsoft Teams uses SIP internally for PSTN connectivity through Direct Routing. Cisco Webex and Zoom also use SIP for gateway integration.
- Contact center solutions: SIP enables intelligent call routing, IVR integration, and agent endpoint management in platforms like Genesys, Five9, and Amazon Connect.
- WebRTC signaling: while WebRTC uses its own APIs in the browser, the backend signaling infrastructure often translates to and from SIP for interoperability with the existing telephone network.
- IoT and M2M communication: lightweight SIP stacks are used in embedded devices for push-to-talk, intercom systems, and industrial voice communication.
Frequently Asked Questions About SIP
What is the difference between SIP and RTP?
SIP and RTP serve completely different purposes. SIP is a signaling protocol that establishes, modifies, and terminates sessions. It handles the "phone call control" part: finding the other party, negotiating codecs, and hanging up. RTP (Real-time Transport Protocol) carries the actual audio and video media after SIP has set up the call. Think of SIP as the operator who connects your call, and RTP as the phone line that carries your voice.
What port does SIP use?
SIP uses port 5060 for unencrypted signaling over UDP or TCP. For encrypted signaling (SIPS), it uses port 5061 over TLS. These are the IANA-registered defaults. Some deployments use non-standard ports, but 5060 and 5061 are the universal standard that firewalls and SBCs (Session Border Controllers) expect.
Is SIP secure?
Plain SIP over UDP or TCP provides no encryption. The signaling messages, including caller identity and call metadata, are transmitted in clear text. To secure SIP signaling, you use SIPS (SIP over TLS on port 5061). To encrypt the media stream, you use SRTP (Secure RTP) instead of plain RTP. Most enterprise deployments and SIP trunking providers support TLS and SRTP. SIP also supports digest authentication (similar to HTTP Digest) for verifying user identity during REGISTER and INVITE transactions.
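The digest authentication mentioned above follows the same computation as HTTP Digest. The sketch below shows the MD5 variant as specified for HTTP in RFC 2617; `digest_response` is a hypothetical helper name, and other algorithms and the `auth-int` quality of protection are left out.

```python
import hashlib


def _md5_hex(text):
    """Hex-encoded MD5 of a string, as used by Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest `response` value for an Authorization header."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = _md5_hex(f"{method}:{uri}")                 # what is being asked
    if qop == "auth":
        # With qop=auth, the nonce count and client nonce are mixed in.
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    # Without qop, the older form uses only the server nonce.
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")
```

In a SIP exchange, the server's 401 Unauthorized supplies the realm and nonce, and the client repeats the REGISTER or INVITE with the computed value. The password itself never crosses the network, but the rest of the message remains in clear text unless TLS is used.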
What is SIP trunking?
SIP trunking replaces traditional analog or ISDN phone lines with an IP-based connection between your PBX and a telephony service provider. Instead of physical T1 or PRI circuits, your PBX sends SIP INVITE messages over the internet (or a private MPLS connection) to the provider, who then routes the call to the PSTN. SIP trunking is significantly cheaper than legacy lines, supports flexible capacity scaling, and allows geographic number portability. Providers like Twilio, Bandwidth, and Lumen offer SIP trunking services.
What is the difference between SIP and VoIP?
VoIP (Voice over IP) is a broad term for any technology that transmits voice over IP networks. SIP is one specific protocol used to set up VoIP calls. VoIP also involves RTP for media transport, codecs like G.711 and Opus for audio encoding, and potentially other signaling protocols like H.323 or proprietary alternatives. SIP is the most widely used VoIP signaling protocol, but VoIP existed before SIP and can technically work without it.
Can SIP handle video calls?
Yes. SIP is media-agnostic. The SDP body inside SIP messages can describe audio, video, or any other media type. For video, the SDP includes an m=video line with video codecs like H.264 or VP8. The video call setup process is identical to audio: SDP offer/answer negotiation inside the INVITE/200 OK exchange. Cisco, Poly, and other video endpoint vendors rely on SIP for establishing video conferences between room systems.
Related Protocols
- UDP: the default transport for SIP signaling on port 5060. Preferred for its low latency and minimal overhead.
- TCP: used for SIP when messages exceed the UDP MTU or when reliable delivery is required.
- TLS: provides encryption for SIP signaling (SIPS) on port 5061.
- HTTP: SIP's design was modeled after HTTP, sharing request-response semantics and text-based header formatting.
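The choice between UDP and TCP noted in the list above can be expressed as a size check. RFC 3261 (section 18.1.1) requires a congestion-controlled transport such as TCP when a request is within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown. The sketch below encodes that rule; `choose_transport` is a hypothetical helper, and the exact boundary handling is an assumption.

```python
UDP_SIZE_LIMIT = 1300     # bytes, used when the path MTU is unknown
MTU_SAFETY_MARGIN = 200   # bytes of headroom required below a known MTU


def choose_transport(message, path_mtu=None):
    """Return "UDP" or "TCP" for a request, based on its encoded size."""
    size = len(message.encode("utf-8"))
    if path_mtu is not None:
        # Known MTU: switch to TCP when the message gets too close to it.
        return "TCP" if size > path_mtu - MTU_SAFETY_MARGIN else "UDP"
    # Unknown MTU: fall back to the fixed 1300-byte threshold.
    return "TCP" if size > UDP_SIZE_LIMIT else "UDP"
```

A typical INVITE with a short SDP body stays well under the limit and goes out over UDP, while a request carrying many headers or a large body would be sent over TCP instead.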