SIP: Session Initiation Protocol

The signaling protocol that sets up, manages, and tears down voice and video calls over IP networks. SIP handles the call control while RTP carries the actual media.

Type: Application Layer
Port: 5060 (UDP/TCP) / 5061 (TLS)
Transport: UDP / TCP
Standard: RFC 3261

What is SIP?

SIP (Session Initiation Protocol) is a text-based signaling protocol used to set up, modify, and tear down communication sessions over IP networks. Defined in RFC 3261 (published in 2002 by the IETF), SIP was designed as a simpler, more flexible alternative to the ITU's H.323 standard. It borrows heavily from HTTP's request-response model. SIP messages are plain text with headers, methods, and status codes that will look immediately familiar to anyone who has worked with web protocols.

SIP itself does not carry voice or video data. Its only job is signaling: finding the other party, negotiating media capabilities through SDP (Session Description Protocol), and managing the session lifecycle. The actual audio and video streams travel over RTP (Real-time Transport Protocol), which runs separately between the endpoints. This separation of signaling and media is one of SIP's core design principles, and it allows each layer to be optimized independently.

Today, SIP is the dominant VoIP signaling protocol worldwide. It is supported by major VoIP platforms, including Asterisk, FreeSWITCH, Twilio, Vonage, and Cisco Unified Communications Manager. SIP trunking has largely replaced legacy ISDN PRI and T1 lines for connecting enterprise phone systems to the PSTN. Microsoft Teams, Zoom, and many video conferencing platforms use SIP for PSTN connectivity and room system interoperability. Billions of VoIP calls are established through SIP every day, making it one of the most widely deployed application-layer protocols in existence.

How SIP Works: Call Setup and Teardown

A SIP call begins when the calling party (the User Agent Client, or UAC) sends an INVITE request. This INVITE contains an SDP body describing what media the caller can send and receive, including codec preferences, IP addresses, and port numbers. The INVITE travels through one or more SIP proxy servers, which use DNS lookups and their location service databases to route the request to the correct destination.

The callee's proxy server forwards the INVITE to the target device (the User Agent Server, or UAS). Each hop that receives the INVITE can immediately send back a 100 Trying response, which stops the previous hop from retransmitting. The UAS then sends 180 Ringing once the phone starts ringing. When the user picks up, the UAS sends a 200 OK containing its own SDP answer, which completes the media negotiation. The caller then sends an ACK to confirm receipt, and the three-way INVITE handshake is complete.

At this point, RTP media flows directly between the two endpoints. The proxy servers are not in the media path. This peer-to-peer media flow reduces latency and keeps proxy servers from becoming bottlenecks. When either party wants to end the call, they send a BYE request. The other side responds with 200 OK, and both endpoints stop sending RTP packets. The entire session, from INVITE to BYE, is called a SIP dialog, identified by the Call-ID, From tag, and To tag.

[Figure: SIP Call Setup and Teardown. Participants: Caller (UAC), SIP Proxy, Callee (UAS). Phases: setup, media, teardown.]
SIP call flow: INVITE initiates the session, provisional responses (100 Trying, 180 Ringing) provide progress, 200 OK confirms the call, ACK completes the handshake, RTP carries the media, and BYE tears down the session.
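
The call flow above can be sketched as a small state machine seen from the caller's side. This is a simplified illustration, not the transaction state machine from RFC 3261: the state names, the event strings, and the `run_call` helper are all assumptions made for this example.

```python
# Hypothetical, simplified caller-side (UAC) view of the call flow.
# State and event names are illustrative, not taken from RFC 3261.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100"): "proceeding",
    ("calling", "recv 180"): "proceeding",
    ("proceeding", "recv 180"): "proceeding",
    ("calling", "recv 200"): "answered",
    ("proceeding", "recv 200"): "answered",
    ("answered", "send ACK"): "confirmed",      # RTP media flows in this state
    ("confirmed", "send BYE"): "terminating",
    ("confirmed", "recv BYE"): "terminated",
    ("terminating", "recv 200"): "terminated",
}


def run_call(events, state="idle"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
        visited.append(state)
    return visited


# The sequence shown in the call flow figure, from the caller's perspective.
FLOW = ["send INVITE", "recv 100", "recv 180", "recv 200",
        "send ACK", "send BYE", "recv 200"]

if __name__ == "__main__":
    print(" -> ".join(run_call(FLOW)))
```

Rejecting unexpected events, rather than ignoring them, makes it easier to spot out-of-order messages when reading a capture against this model.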

SIP Architecture: The Trapezoid Model

SIP defines several logical entities that work together to route and manage calls. A User Agent Client (UAC) initiates requests, while a User Agent Server (UAS) receives and responds to them. In practice, every SIP phone or softphone acts as both UAC and UAS depending on whether it is making or receiving a call.

A SIP Proxy Server routes requests on behalf of user agents. Proxies examine the Request-URI, consult DNS and their location service, and forward the request toward the destination. A Registrar is a server that accepts REGISTER requests. When a SIP phone boots up, it sends a REGISTER message to its registrar with its current IP address and port. The registrar stores this binding in the Location Service database, which proxies query when routing calls. A Redirect Server responds to requests with a 3xx redirect, telling the UAC to try a different URI instead of forwarding the request itself.
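
To make the registrar and location service relationship concrete, here is a minimal in-memory sketch. The `LocationService` class and its method names are hypothetical, and it stores a single contact per address-of-record, whereas real registrars can hold several bindings per user.

```python
import time


class LocationService:
    """Toy in-memory binding table (hypothetical, for illustration only)."""

    def __init__(self):
        # address-of-record -> (contact URI, expiry timestamp)
        self._bindings = {}

    def register(self, aor, contact, expires=3600, now=None):
        """Store or refresh a binding, as a registrar does on REGISTER."""
        now = time.time() if now is None else now
        if expires == 0:
            # An expiry of 0 removes the binding (unregister).
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact for an AOR, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


if __name__ == "__main__":
    service = LocationService()
    service.register("sip:bob@biloxi.example.com", "sip:bob@192.168.2.200:5060")
    print(service.lookup("sip:bob@biloxi.example.com"))
```

A proxy would call something like `lookup` when an INVITE arrives for a user in its domain and forward the request to the returned contact.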

The classic SIP architecture forms a trapezoid shape. Alice's phone connects to her outbound proxy. Bob's phone registers with his proxy. When Alice calls Bob, her INVITE goes up to her proxy, across to Bob's proxy (resolved via DNS SRV records), and down to Bob's phone. The signaling path follows the trapezoid, but the RTP media flows directly between Alice and Bob, bypassing the proxies entirely. This is why SIP scales well: proxies only handle lightweight text-based signaling, not bandwidth-intensive media streams.

[Figure: SIP Trapezoid Architecture. Alice (UAC, sip:alice@atlanta.example.com) connects to Proxy Server A (atlanta.example.com); Bob (UAS, sip:bob@biloxi.example.com) connects to Proxy Server B (biloxi.example.com). Legend: SIP signaling path, RTP media (peer-to-peer), DNS lookup.]
The SIP trapezoid: signaling messages route through proxy servers for address resolution and policy, while RTP media flows directly between endpoints for lowest latency.

SIP Messages and Status Codes

SIP is a text-based protocol, similar to HTTP. Every SIP message is either a request (with a method name) or a response (with a numeric status code). Requests and responses share the same header format: a start line followed by headers, a blank line, and an optional body (usually SDP).
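
Because the layout is so regular, a rough parser fits in a few lines. The sketch below is an assumption-laden simplification: the `parse_sip_message` name is made up for this example, and it ignores header line folding, compact header names, and headers that appear more than once (such as multiple Via lines).

```python
def parse_sip_message(raw):
    """Split a SIP message into start line, headers, and body (simplified)."""
    # Headers and body are separated by a blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]

    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as Via contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "start_line": start_line,
        # Responses begin with the protocol version, requests with a method.
        "is_response": start_line.startswith("SIP/2.0 "),
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    raw = (
        "INVITE sip:bob@example.com SIP/2.0\r\n"
        "Max-Forwards: 70\r\n"
        "CSeq: 314159 INVITE\r\n"
        "\r\n"
    )
    print(parse_sip_message(raw)["headers"])
```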

SIP Request Methods

Method | Purpose
INVITE | Initiates a new session or modifies an existing one (re-INVITE).
ACK | Confirms receipt of a final response to an INVITE.
BYE | Terminates an established session.
CANCEL | Cancels a pending INVITE before the callee answers.
REGISTER | Registers the user agent's current contact address with a registrar.
OPTIONS | Queries a server's capabilities without establishing a session.
INFO | Sends mid-session signaling information (e.g., DTMF digits).
UPDATE | Modifies session parameters before the session is established.
REFER | Asks the recipient to issue a request (used for call transfer).
SUBSCRIBE | Requests notifications about an event (e.g., presence, voicemail).
NOTIFY | Delivers event notifications to a subscriber.

SIP Response Code Classes

Code | Class | Meaning
100 | Provisional | Trying. The proxy received the request and is working on it.
180 | Provisional | Ringing. The callee's device is alerting the user.
183 | Provisional | Session Progress. Used to send early media (ringback tone over RTP).
200 | Success | OK. The request was successful. For INVITE, the callee answered.
301 | Redirection | Moved Permanently. The user can be reached at a new URI.
302 | Redirection | Moved Temporarily. Try an alternate URI for this request.
401 | Client Error | Unauthorized. The request requires authentication credentials.
403 | Client Error | Forbidden. The server understood the request but refuses to fulfill it.
404 | Client Error | Not Found. The user does not exist at the specified domain.
408 | Client Error | Request Timeout. The server could not produce a response in time.
486 | Client Error | Busy Here. The callee's device is busy and cannot take the call.
500 | Server Error | Internal Server Error. The server encountered an unexpected condition.
503 | Server Error | Service Unavailable. The server is temporarily overloaded or in maintenance.
603 | Global Failure | Decline. The callee explicitly rejected the call at all locations.
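
The class of a response follows from its first digit, so code that handles responses often branches on the class rather than on individual codes. A minimal sketch, with hypothetical helper names:

```python
# Class names as used in the table above, keyed by the first digit of the code.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code):
    """Return the class name for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid SIP status code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """Provisional (1xx) responses report progress; everything else is final."""
    return response_class(code) != "Provisional"


if __name__ == "__main__":
    for code in (100, 180, 200, 302, 486, 503, 603):
        print(code, response_class(code), "final" if is_final(code) else "provisional")
```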

SIP Request and Response Examples

Because SIP is text-based, you can read its messages directly, unlike binary protocols that require hex decoding. Below are realistic SIP INVITE and 200 OK messages with the key headers explained.

SIP INVITE Request

Alice at 192.168.1.100 initiates a call to Bob at example.com. The INVITE carries an SDP body (omitted here for clarity) describing Alice's media capabilities.

Request (UAC to Proxy)

INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776
Max-Forwards: 70
To: Bob <sip:bob@example.com>
From: Alice <sip:alice@example.com>;tag=1928301774
Call-ID: a84b4c76e66710@192.168.1.100
CSeq: 314159 INVITE
Contact: <sip:alice@192.168.1.100>
Content-Type: application/sdp
Content-Length: 142

Key Headers:

Via = Transport, sender IP, and branch ID for routing responses back
Max-Forwards = Hop limit (like TTL), decremented by each proxy
To / From = Logical identities of the called and calling parties
Call-ID = Globally unique identifier for this call dialog
CSeq = Sequence number and method, used to order transactions
Contact = Direct URI where the UAC can be reached for subsequent requests
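
As a rough illustration of how these headers fit together, the sketch below assembles an INVITE as a string. The `build_invite` function and its parameters are hypothetical. The only fixed detail it relies on is that RFC 3261 branch IDs begin with the magic cookie `z9hG4bK`; the way the random identifiers are generated here is just one possible choice.

```python
import uuid


def build_invite(caller, callee, local_ip, local_port=5060, sdp=""):
    """Assemble a minimal INVITE over UDP (illustrative, not a full UAC)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]    # unique per transaction
    from_tag = uuid.uuid4().hex[:10]              # identifies the caller's dialog side
    call_id = f"{uuid.uuid4().hex}@{local_ip}"    # unique per call
    user = caller.split(":", 1)[1].split("@", 1)[0]
    body = sdp.encode("utf-8")

    lines = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
    ]
    if sdp:
        lines.append("Content-Type: application/sdp")
    # Content-Length counts the bytes of the body, not the characters.
    lines.append(f"Content-Length: {len(body)}")
    return "\r\n".join(lines) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    print(build_invite("sip:alice@example.com", "sip:bob@example.com", "192.168.1.100"))
```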

SIP 200 OK Response

Bob's phone answers the call and sends a 200 OK back through the proxy chain. The response includes Bob's SDP answer, completing the offer/answer negotiation.

Response (UAS to UAC via Proxy)

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776
To: Bob <sip:bob@example.com>;tag=a6c85cf
From: Alice <sip:alice@example.com>;tag=1928301774
Call-ID: a84b4c76e66710@192.168.1.100
CSeq: 314159 INVITE
Contact: <sip:bob@192.168.2.200>
Content-Type: application/sdp
Content-Length: 131

Key Details:

To tag = Added by the UAS. Together with From tag and Call-ID, it uniquely identifies the dialog
Via = Copied from the request so the response follows the same path back
Contact = Bob's direct address, used for subsequent in-dialog requests (BYE, re-INVITE)
CSeq = Matches the original INVITE, confirming which request this responds to
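
The dialog identifier can be pulled out of these headers with a little string handling. In the sketch below, `header_tag` and `dialog_id` are hypothetical helpers. The local and remote tags swap roles depending on which side is looking: for the caller the From tag is local, for the callee the To tag is local.

```python
import re


def header_tag(value):
    """Extract the tag parameter from a To or From header value, if present."""
    match = re.search(r";\s*tag=([^;\s>]+)", value)
    return match.group(1) if match else None


def dialog_id(headers, role="uac"):
    """Return (Call-ID, local tag, remote tag) as seen by the given role."""
    local, remote = ("From", "To") if role == "uac" else ("To", "From")
    return (headers["Call-ID"], header_tag(headers[local]), header_tag(headers[remote]))


if __name__ == "__main__":
    # Headers from the 200 OK example above.
    ok_headers = {
        "To": "Bob <sip:bob@example.com>;tag=a6c85cf",
        "From": "Alice <sip:alice@example.com>;tag=1928301774",
        "Call-ID": "a84b4c76e66710@192.168.1.100",
    }
    print(dialog_id(ok_headers, role="uac"))
    print(dialog_id(ok_headers, role="uas"))
```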

SDP: Session Description Protocol

SDP (defined in RFC 4566, since obsoleted by RFC 8866) is not a protocol in the traditional sense. It is a text format for describing multimedia session parameters. SIP carries SDP in the message body, with the Content-Type header set to application/sdp. SDP is how two SIP endpoints agree on which codecs to use, which IP addresses and ports to send media to, and whether the media flow should be sendrecv, sendonly, recvonly, or inactive.

The negotiation follows the offer/answer model defined in RFC 3264. The caller includes an SDP offer in the INVITE body, listing all the codecs and media types it supports. The callee responds with an SDP answer in the 200 OK, selecting a subset of the offered codecs. Once both sides have exchanged their SDP, they know exactly where to send RTP packets and which codec to use for encoding.
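
The codec selection step can be sketched as an ordered intersection. The `select_codecs` function is a hypothetical helper, and keeping the offerer's preference order is one common policy rather than the only valid one.

```python
def select_codecs(offered, supported):
    """Return the offered codecs the answerer also supports, in the offerer's order."""
    # Compare case-insensitively so "pcma/8000" matches "PCMA/8000".
    supported_lower = {codec.lower() for codec in supported}
    return [codec for codec in offered if codec.lower() in supported_lower]


if __name__ == "__main__":
    offer = ["PCMU/8000", "PCMA/8000", "telephone-event/8000"]
    local = ["opus/48000/2", "PCMA/8000", "telephone-event/8000"]
    print(select_codecs(offer, local))
```

If the result is empty, the two sides share no codec. In practice the callee would then reject the offer, commonly with a 488 Not Acceptable Here response.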

Key SDP Fields

v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.100
s=VoIP Call
c=IN IP4 192.168.1.100
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=sendrecv

Field Breakdown:

v=0 = SDP version (always 0)
o= = Origin: username, session ID, version, network type, address type, address
s= = Session name (human-readable label)
c= = Connection data: the IP address where the endpoint expects to receive media
t=0 0 = Timing: 0 0 means the session is unbounded (no scheduled start/stop)
m=audio 49170 RTP/AVP 0 8 101 = Media line: audio on port 49170 using RTP, offering payload types 0, 8, and 101
a=rtpmap:0 PCMU/8000 = Payload type 0 is G.711 mu-law at 8 kHz
a=sendrecv = The endpoint will both send and receive media
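
The fields above are simple enough to extract with plain string operations. The following sketch assumes a single audio media line, as in the example. The `parse_sdp_audio` name and the shape of the returned dictionary are choices made for this illustration.

```python
def parse_sdp_audio(sdp):
    """Extract connection address, port, payload types, and rtpmap entries."""
    result = {
        "connection": None,
        "port": None,
        "payload_types": [],
        "rtpmap": {},
        "direction": "sendrecv",  # the default when no direction attribute is present
    }
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=IN IP4 192.168.1.100 -> the last token is the address
            result["connection"] = line.split()[-1]
        elif line.startswith("m=audio"):
            # m=audio <port> <proto> <payload type> ...
            _media, port, _proto, *formats = line[2:].split()
            result["port"] = int(port)
            result["payload_types"] = [int(fmt) for fmt in formats]
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(None, 1)
            result["rtpmap"][int(payload_type)] = encoding
        elif line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            result["direction"] = line[2:]
    return result


if __name__ == "__main__":
    example = "\r\n".join([
        "v=0",
        "o=alice 2890844526 2890844526 IN IP4 192.168.1.100",
        "s=VoIP Call",
        "c=IN IP4 192.168.1.100",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0 8 101",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
        "a=rtpmap:101 telephone-event/8000",
        "a=sendrecv",
    ])
    print(parse_sdp_audio(example))
```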

SIP vs H.323 Comparison

H.323 was the first major VoIP signaling standard, published by the ITU in 1996. SIP came later (RFC 2543 in 1999, then RFC 3261 in 2002) and eventually displaced H.323 in most new deployments. Here is how they compare.

Feature | SIP | H.323
Encoding | Text-based (like HTTP) | Binary (ASN.1 PER encoding)
Standards Body | IETF | ITU-T
Complexity | Lightweight, modular design | Monolithic, complex specification suite
Extensibility | Easy to extend with new headers and methods | Requires formal ITU standardization process
NAT Traversal | Challenging, solved with STUN/TURN/ICE | Challenging, solved with H.460 extensions
Adoption | Dominant in nearly all modern VoIP deployments | Legacy, declining. Still found in older video systems
Call Setup Speed | 1.5 round trips (typical INVITE transaction) | 6-7 round trips (multiple channel negotiations)
Codec Negotiation | SDP offer/answer model | H.245 channel negotiation protocol
Interoperability | Wide support across vendors and open source | Good within H.323 ecosystem, poor outside it

Key Features of SIP

  • Text-based and human-readable: SIP messages can be read and debugged with a simple packet capture. No special decoder needed.
  • Transport-agnostic: SIP runs over UDP, TCP, TLS (SIPS), and even WebSocket for browser-based clients. UDP on port 5060 is the default, and TLS on port 5061 is used for encrypted signaling.
  • URI-based addressing: SIP addresses look like email addresses (sip:user@domain), making them easy to provision and route using DNS.
  • Forking: a proxy can fork an INVITE to multiple registered devices simultaneously. The first device to answer wins. This is how "ring all phones" works in office phone systems.
  • Presence and instant messaging: SIP supports SUBSCRIBE/NOTIFY for presence status and the MESSAGE method for instant messaging (RFC 3428).
  • SIP trunking: replaces physical ISDN PRI and T1 lines with IP-based connections to the PSTN, significantly reducing telephony costs.
  • WebRTC interoperability: WebRTC leaves signaling to the application but reuses the SDP offer/answer model that SIP relies on. SIP-to-WebRTC gateways (like Opalvoip) allow browser-based clients to call standard SIP phones.
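
To illustrate the URI-based addressing point from the list above, the sketch below splits a SIP URI into its parts and builds the DNS SRV name a client might query for the domain. Both function names are hypothetical, and the parser is deliberately simplified: it drops URI parameters and headers and does not handle IPv6 literals.

```python
def parse_sip_uri(uri):
    """Split a sip: or sips: URI into scheme, user, host, and port (simplified)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    # Drop URI parameters (;transport=tcp) and headers (?subject=...).
    rest = rest.split(";", 1)[0].split("?", 1)[0]
    user, _, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme.lower(),
        "user": user or None,
        "host": host,
        "port": int(port) if port else None,
    }


def srv_name(domain, scheme="sip", transport="udp"):
    """Build a DNS SRV owner name such as _sip._udp.example.com."""
    return f"_{scheme}._{transport}.{domain}"


if __name__ == "__main__":
    parts = parse_sip_uri("sip:bob@biloxi.example.com")
    print(parts)
    print(srv_name(parts["host"]))
```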

Common Use Cases for SIP

  • Enterprise VoIP phone systems: Cisco Unified Communications Manager, Avaya Aura, and open-source PBXes like Asterisk and FreeSWITCH all use SIP as their primary signaling protocol.
  • SIP trunking: replacing ISDN PRI lines with IP-based connections from providers like Twilio, Vonage, Bandwidth, and Lumen. A single SIP trunk can carry hundreds of concurrent calls.
  • Video conferencing: many video conferencing platforms use SIP for room system interoperability. Opalvoip, Cisco, and Poly endpoints all speak SIP natively.
  • Unified communications: Microsoft Teams uses SIP internally for PSTN connectivity through Direct Routing. Cisco Webex and Zoom also use SIP for gateway integration.
  • Contact center solutions: SIP enables intelligent call routing, IVR integration, and agent endpoint management in platforms like Genesys, Five9, and Amazon Connect.
  • WebRTC signaling: while WebRTC uses its own APIs in the browser, the backend signaling infrastructure often translates to and from SIP for interoperability with the existing telephone network.
  • IoT and M2M communication: lightweight SIP stacks are used in embedded devices for push-to-talk, intercom systems, and industrial voice communication.

Frequently Asked Questions About SIP

What is the difference between SIP and RTP?

SIP and RTP serve completely different purposes. SIP is a signaling protocol that establishes, modifies, and terminates sessions. It handles the "phone call control" part: finding the other party, negotiating codecs, and hanging up. RTP (Real-time Transport Protocol) carries the actual audio and video media after SIP has set up the call. Think of SIP as the operator who connects your call, and RTP as the phone line that carries your voice.

What port does SIP use?

SIP uses port 5060 for unencrypted signaling over UDP or TCP. For encrypted signaling (SIPS), it uses port 5061 over TLS. These are the IANA-registered defaults. Some deployments use non-standard ports, but 5060 and 5061 are the universal standard that firewalls and SBCs (Session Border Controllers) expect.

Is SIP secure?

Plain SIP over UDP or TCP provides no encryption. The signaling messages, including caller identity and call metadata, are transmitted in clear text. To secure SIP signaling, you use SIPS (SIP over TLS on port 5061). To encrypt the media stream, you use SRTP (Secure RTP) instead of plain RTP. Most enterprise deployments and SIP trunking providers support TLS and SRTP. SIP also supports digest authentication (similar to HTTP Digest) for verifying user identity during REGISTER and INVITE transactions.
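
For a sense of what digest authentication computes, here is a sketch of the basic MD5 response calculation in its legacy form without the qop parameter, as inherited from HTTP Digest. The function name and arguments are illustrative. Real deployments usually add qop, cnonce, and nonce-count handling, and MD5 is considered weak by modern standards.

```python
import hashlib


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest response value (legacy form, no qop)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")   # secret part, never sent
    ha2 = _md5_hex(f"{method}:{uri}")                  # binds the request method and URI
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")            # value sent in the Authorization header


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(digest_response("alice", "example.com", "secret",
                          "REGISTER", "sip:example.com", "abc123"))
```

The server performs the same calculation with its stored credentials and compares the results, so the password itself never crosses the network.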

What is SIP trunking?

SIP trunking replaces traditional analog or ISDN phone lines with an IP-based connection between your PBX and a telephony service provider. Instead of physical T1 or PRI circuits, your PBX sends SIP INVITE messages over the internet (or a private MPLS connection) to the provider, who then routes the call to the PSTN. SIP trunking is significantly cheaper than legacy lines, supports flexible capacity scaling, and allows geographic number portability. Providers like Twilio, Bandwidth, and Lumen offer SIP trunking services.

What is the difference between SIP and VoIP?

VoIP (Voice over IP) is a broad term for any technology that transmits voice over IP networks. SIP is one specific protocol used to set up VoIP calls. VoIP also involves RTP for media transport, codecs like G.711 and Opus for audio encoding, and potentially other signaling protocols like H.323 or proprietary alternatives. SIP is the most widely used VoIP signaling protocol, but VoIP existed before SIP and can technically work without it.

Can SIP handle video calls?

Yes. SIP is media-agnostic. The SDP body inside SIP messages can describe audio, video, or any other media type. For video, the SDP includes an m=video line with video codecs like H.264 or VP8. The video call setup process is identical to audio: SDP offer/answer negotiation inside the INVITE/200 OK exchange. Cisco, Poly, and other video endpoint vendors rely on SIP for establishing video conferences between room systems.

Related Protocols

  • UDP: the default transport for SIP signaling on port 5060. Preferred for its low latency and minimal overhead.
  • TCP: used for SIP when messages exceed the UDP MTU or when reliable delivery is required.
  • TLS: provides encryption for SIP signaling (SIPS) on port 5061.
  • HTTP: SIP's design was modeled after HTTP, sharing request-response semantics and text-based header formatting.