SIP and RTP Basics: A 7-Step Guide

You’ll grasp SIP fast by mapping its layers (syntax, transport, transactions, users) and roles: UAs at the edge, proxies, redirects, registrars, and location servers in the core. See how INVITE triggers setup, 100 Trying calms retransmits, and 180 Ringing signals alerting. Negotiate media via SDP offer/answer, rtpmap/fmtp/ptime, and handle 488 failures. Control sessions with ACK, re-INVITE hold/resume, REFER transfers, and UPDATE. Secure signaling/media with TLS/SRTP, strong ciphers, and sane firewall rules—then build confidence step by step.

Key Takeaways

  • SIP sets up, modifies, and tears down sessions; RTP carries the actual media once a session is established.
  • Call flow: INVITE → 100 Trying → 180 Ringing → 200 OK → ACK, then media starts on negotiated ports.
  • SDP offer/answer negotiates codecs, ports, and directions; rtpmap and fmtp define payload formats and parameters.
  • RTP streams use dynamic UDP ports; SRTP secures media with keying via SDES, MIKEY, or ZRTP.
  • Proxies route SIP, registrars map users to contacts; TLS on 5061 protects signaling, SRTP protects media.

SIP Architecture and Core Components

Although SIP is often discussed as a signaling “protocol,” its architecture is a coordinated system of roles and layers that cleanly separate concerns. You’ll work with four layers: syntax/encoding (ABNF), transport (UDP/TCP/SCTP), the transaction layer, and the transaction user. The transaction layer enforces transaction state management—matching requests to responses, handling retransmits, and timing out cleanly.

At the edge, User Agents combine UAC and UAS roles, negotiate SDP, and maintain dialog state via Call-ID, tags, and sequence numbers. In the core, Proxy Servers route requests; they may be stateless or stateful depending on transaction needs. Redirect Servers return 3xx targets without relaying. Registrar Servers record contacts, while Location Servers provide reachability data. SBCs, acting as B2BUAs, deliver topology hiding and security, NAT traversal, and policy enforcement across domains. SIP works alongside RTP and SDP in a packet-switched environment to enable interoperability across multimedia sessions.
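
To make the dialog state concrete, here is a minimal sketch of how a User Agent might derive a dialog identifier from the Call-ID and the From/To tags. The function names (parse_headers, tag_of, dialog_id) are hypothetical, and the parser is deliberately simplified: it ignores compact header forms and repeated headers, which a real SIP stack must handle.

```python
from typing import Dict, Optional, Tuple


def parse_headers(message: str) -> Dict[str, str]:
    """Return the headers of a SIP message keyed by lowercase name.

    Simplification: repeated headers (e.g. several Via lines) keep only
    the last value, and compact header forms are not expanded.
    """
    head = message.split("\r\n\r\n", 1)[0]
    headers: Dict[str, str] = {}
    for line in head.split("\r\n")[1:]:  # skip the request/status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def tag_of(header_value: str) -> Optional[str]:
    """Extract the ;tag= parameter from a From or To header value."""
    params = header_value.rsplit(">", 1)[-1]  # ignore URI params inside <...>
    for param in params.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "tag":
            return val.strip()
    return None


def dialog_id(message: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (Call-ID, From tag, To tag), the triple that identifies a dialog."""
    h = parse_headers(message)
    return h["call-id"], tag_of(h["from"]), tag_of(h["to"])
```

An initial INVITE has no To tag yet, so the third element is None until the callee answers with a response that adds one.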

Call Setup Flow: From INVITE to Ringing

With the roles and layers in place, you can watch a call begin with a single message: the INVITE. Your User Agent Client sends an INVITE toward the callee’s SIP URI, carrying the caller’s identity in the From header and an SDP offer listing supported media. Proxies validate the request, consult the location server, and forward it, sometimes across several hops. The first stateful hop replies 100 Trying, which stops INVITE retransmissions over UDP and confirms the request is being processed. As the callee’s User Agent Server alerts the user, it returns 180 Ringing along the reverse path. You hear ringback; the dialog is still an early dialog, and media normally does not flow until the call is answered. Plan for security and call quality from this first exchange, since the INVITE already exposes identities and media capabilities. Taken together, the SIP call flow is the exchange of signaling messages that tells the system when to ring, when to connect, how to route the call, and when to end it.

Step  Message        Purpose
1     INVITE         Initiates session setup
2     100 Trying     Stops retransmits; processing
3     Proxy routing  Locate/forward to callee
4     180 Ringing    Device alerting; provisional
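
The steps in the table can be sketched as a small caller-side state tracker. This is an illustration only: the state names and the on_response/trace helpers are assumptions, not part of any SIP library, and redirects (3xx) are lumped in with failures for brevity.

```python
from typing import Iterable, List


def on_response(state: str, status: int) -> str:
    """Advance the caller-side view of a call for one response to the INVITE."""
    if state in ("answered", "failed"):
        return state  # final states do not change
    if status == 100:
        # 100 Trying: a hop has the request, so INVITE retransmits can stop.
        return state if state == "alerting" else "proceeding"
    if 100 < status < 200:
        return "alerting"  # e.g. 180 Ringing
    if 200 <= status < 300:
        return "answered"  # the caller must now send ACK
    return "failed"  # 3xx-6xx final responses (simplified)


def trace(statuses: Iterable[int]) -> List[str]:
    """Return the state reached after each response, starting from 'calling'."""
    state = "calling"
    seen = []
    for status in statuses:
        state = on_response(state, status)
        seen.append(state)
    return seen
```

Feeding it the sequence 100, 180, 200 walks through proceeding, alerting, and answered, which mirrors the ladder you would see in a packet capture.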

Media Negotiation With SDP and RTP Basics

Once the INVITE and provisional responses fly, SDP takes center stage to define what you’ll actually exchange. You craft an offer per RFC 3264, listing media streams via m= lines, transport (e.g., RTP/AVP), ports, and codecs. The answerer picks compatible options and returns an SDP answer; session parameters are set only where both sides intersect. The SDP Offer/Answer model also enables negotiating codec compatibility, ensuring at least one common codec is present to avoid 488 Not Acceptable Here failures.

Use rtpmap to bind payload types to codecs and clock rates, fmtp for codec parameters, and ptime to tune packetization. Include telephone-event for DTMF. Defaults live at the session level, overridden per-media as needed. Capability negotiation (RFC 5939) lets you advertise multiple configurations up front.

Plan for SDP negotiation failures: a 488 Not Acceptable Here indicates no common codec or parameters. Expect RTP quality problems, such as choppy or missing audio, from mismatched ptime, payload types, or ports.
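
A rough sketch of the codec intersection follows. It assumes every payload type in the offer has an a=rtpmap line (static types such as 0 and 8 may legally omit it) and a single audio stream; parse_rtpmap, common_codecs, and answer_status are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def parse_rtpmap(sdp: str) -> Dict[int, str]:
    """Map payload type -> 'ENCODING/CLOCK' from the a=rtpmap lines."""
    codecs: Dict[int, str] = {}
    for line in sdp.splitlines():
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            codecs[int(pt)] = encoding.strip().upper()
    return codecs


def common_codecs(offer_sdp: str, supported: List[str]) -> List[Tuple[int, str]]:
    """Return the offered (payload type, encoding) pairs the answerer supports."""
    wanted = {name.upper() for name in supported}
    return [(pt, enc) for pt, enc in parse_rtpmap(offer_sdp).items() if enc in wanted]


def answer_status(offer_sdp: str, supported: List[str]) -> int:
    """Return 200 if at least one real audio codec is shared, else 488."""
    shared = common_codecs(offer_sdp, supported)
    # telephone-event carries DTMF only, so it does not count as an audio codec.
    has_audio = any(not enc.startswith("TELEPHONE-EVENT") for _, enc in shared)
    return 200 if has_audio else 488
```

An answerer that supports only G.729 against an offer of PCMU and PCMA gets an empty intersection, which is exactly the case that should surface as 488.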

Session Management: ACK, Hold, Transfer, and Updates

SDP has set the media terms; now the signaling must keep the session stable and adaptable. You finalize setup with SIP’s three-way handshake: INVITE, a 2xx final response, then ACK. Every final response to an INVITE is acknowledged: the UAC itself sends the ACK for a 2xx end to end, while the transaction layer generates the ACK for 3xx–6xx responses hop by hop. Non-INVITE methods complete without an ACK. The ACK usually carries no SDP because negotiation is already done; the exception is a delayed offer, where the INVITE has no SDP, the 200 OK carries the offer, and the ACK carries the answer. For resilience, pair strict error handling with failover mechanisms across proxies and UAS/UAC logic. Throughout, SIP handles the control plane while RTP carries the media.

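Hold and resume: send a re-INVITE with SDP a=sendonly or a=inactive; the peer answers with the complementary direction (a=recvonly for sendonly, a=inactive for inactive). Resume with a re-INVITE that restores a=sendrecv. Session timers (RFC 4028) detect sessions whose far end has silently disappeared and so prevent zombie states; PRACK (RFC 3262) serves a different purpose, making provisional responses reliable.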
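
The direction handling for hold can be sketched as a lookup table. This assumes a single media stream and takes the first direction attribute found; the names COMPLEMENT, offered_direction, answer_direction, and is_hold are illustrative only.

```python
# Direction the answerer returns for each offered direction (RFC 3264 rules,
# assuming the answerer itself has not placed the call on hold).
COMPLEMENT = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def offered_direction(sdp: str) -> str:
    """Return the first direction attribute in the SDP, defaulting to sendrecv."""
    for line in sdp.splitlines():
        attr = line.strip()
        if attr.startswith("a=") and attr[2:] in COMPLEMENT:
            return attr[2:]
    return "sendrecv"  # the default when no direction attribute is present


def answer_direction(offer_sdp: str) -> str:
    """Return the direction to place in the SDP answer."""
    return COMPLEMENT[offered_direction(offer_sdp)]


def is_hold(offer_sdp: str) -> bool:
    """True when the offerer stops receiving media, i.e. a hold request."""
    return offered_direction(offer_sdp) in ("sendonly", "inactive")
```

A resume re-INVITE either carries a=sendrecv or omits the attribute entirely, and both cases map back to sendrecv here.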

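Transfer: use REFER (RFC 3515). Transfers are blind or attended; the Refer-To header indicates the new destination, and NOTIFY messages report the status of the resulting call. Attended transfers typically add a Replaces header (RFC 3891) inside the Refer-To URI.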
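
As a sketch of a blind transfer, the snippet below builds the dialog-level headers of a REFER and interprets the status line that arrives in a NOTIFY body (message/sipfrag). The helper names and all URIs are hypothetical, and the request is intentionally incomplete: Via, Max-Forwards, and Contact are left to whatever SIP stack actually sends it.

```python
def build_refer(remote_target: str, call_id: str, from_header: str,
                to_header: str, cseq: int, refer_to: str) -> str:
    """Build the in-dialog part of a REFER request (transport headers omitted)."""
    lines = [
        f"REFER {remote_target} SIP/2.0",
        f"Call-ID: {call_id}",
        f"From: {from_header}",
        f"To: {to_header}",
        f"CSeq: {cseq} REFER",
        f"Refer-To: <{refer_to}>",  # the transfer destination
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def transfer_outcome(sipfrag: str) -> str:
    """Classify the status line carried in a NOTIFY body, e.g. 'SIP/2.0 200 OK'."""
    status = int(sipfrag.split()[1])
    if status < 200:
        return "pending"
    if status < 300:
        return "succeeded"
    return "failed"
```

The transferor typically keeps the original call up until a NOTIFY reports success, then sends BYE.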

UPDATE (RFC 3311): modify session parameters without changing dialog state, for example early media or codec changes. Unlike re-INVITE, it can be sent before the initial INVITE completes.

Secure Transport: Ports, TLS, and Best Practices

Two layers lock down SIP sessions: TLS for signaling and SRTP for media. You run SIP over TLS on port 5061 (unencrypted SIP over UDP or TCP uses 5060), then negotiate SRTP over dynamic UDP ports (the range is vendor-specific; 16384–32767 is a common default). Guarantee a secure signaling path; SDES keys travel inside the SDP, and many systems reject SRTP if SIPS isn’t active.

Harden signaling with TLS ciphers that support PFS and validate certificates rigorously. For media encryption configuration, advertise SRTP in SDP and confirm message authentication, especially for SRTCP. Avoid NULL ciphers. SRTP uses HMAC-SHA1 by default to authenticate packets and protect integrity, and a replay list keyed on the packet index (sequence number plus rollover counter) rejects replayed packets.
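
To illustrate the authentication step, the following sketch computes an SRTP-style tag with Python’s standard hmac module. It assumes the default HMAC-SHA1 transform with an 80-bit tag, computed over the packet followed by the 32-bit rollover counter (ROC); the function names are hypothetical, the key is a dummy value, and encryption and key derivation are out of scope here.

```python
import hashlib
import hmac
import struct


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int, tag_len: int = 10) -> bytes:
    """Return the truncated HMAC-SHA1 tag over packet || ROC.

    tag_len=10 gives the 80-bit tag of the *_HMAC_SHA1_80 suites.
    """
    mac = hmac.new(auth_key, packet + struct.pack("!I", roc), hashlib.sha1)
    return mac.digest()[:tag_len]


def verify_tag(auth_key: bytes, packet: bytes, roc: int, tag: bytes) -> bool:
    """Check a received tag in constant time."""
    expected = srtp_auth_tag(auth_key, packet, roc, len(tag))
    return hmac.compare_digest(expected, tag)
```

Because the ROC is part of the authenticated data but not transmitted in each packet, sender and receiver must agree on it; a mismatch shows up as authentication failures rather than garbled audio.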

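Derive SRTP keys via compatible key exchange protocols: SDES (in SDP), MIKEY, or ZRTP. Master keys generate the session keys; protect them, because one leaked master key compromises every key derived from it. The derivation is one-way, so a leaked session key does not expose the master key or other session keys, but forward secrecy in the usual sense depends on the key exchange (ZRTP, for example, uses ephemeral Diffie-Hellman).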
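
For SDES, the master key and salt travel in an a=crypto line of the SDP. The sketch below generates and parses such a line for the AES_CM_128_HMAC_SHA1_80 suite, assuming a 16-byte master key followed by a 14-byte master salt; make_crypto_line and parse_crypto_line are hypothetical names, and optional lifetime/MKI fields are parsed past but not interpreted.

```python
import base64
import secrets
from typing import Tuple


def make_crypto_line(tag: int = 1, suite: str = "AES_CM_128_HMAC_SHA1_80") -> str:
    """Build an SDES a=crypto line with fresh random keying material."""
    material = secrets.token_bytes(30)  # 16-byte master key + 14-byte master salt
    inline = base64.b64encode(material).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"


def parse_crypto_line(line: str) -> Tuple[int, str, bytes, bytes]:
    """Return (tag, suite, master_key, master_salt) from an a=crypto line."""
    tag, suite, rest = line[len("a=crypto:"):].split(" ", 2)
    key_params = rest.split(" ")[0]       # drop any session parameters
    _method, _, info = key_params.partition(":")
    material = base64.b64decode(info.split("|")[0])  # ignore lifetime / MKI
    return int(tag), suite, material[:16], material[16:]
```

Since this line is readable by anyone who can see the SDP, it only makes sense when the signaling itself is protected by TLS.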

Configure firewalls to permit negotiated SRTP ranges while restricting admin and signaling ports. Disable insecure fallbacks.

Frequently Asked Questions

How Does NAT Affect SIP Signaling and RTP Media Paths?

NAT breaks SIP because endpoints embed private addresses in Via, Contact, and SDP, and because the NAT drops packets sent to ports with no open mapping; RTP suffers the same way when UDP flows are blocked. You mitigate with NAT traversal techniques: rport/received, STUN/TURN, symmetric RTP, and media relays such as RTP proxies.
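
One way a server can spot the problem is to compare the address advertised in the SDP with the address the packet actually came from. The sketch below does that for IPv4 using the standard ipaddress module; the helper names are hypothetical, and the check is a heuristic, not a replacement for ICE or a media relay.

```python
import ipaddress
from typing import Optional

# RFC 1918 private ranges, listed explicitly so documentation and other
# special-purpose ranges are not treated as private.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def advertised_address(sdp: str) -> Optional[str]:
    """Return the IPv4 address from the first c= line, if any."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            return line.split()[2]
    return None


def is_rfc1918(address: str) -> bool:
    """True if the string is an IP address inside an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:  # e.g. a hostname in the c= line
        return False
    return any(ip in net for net in RFC1918)


def needs_media_fixup(sdp: str, source_ip: str) -> bool:
    """True when the SDP advertises a private address that differs from the
    address the signaling arrived from, a common sign of a NAT in the path."""
    addr = advertised_address(sdp)
    return addr is not None and is_rfc1918(addr) and addr != source_ip
```

When the check fires, a server would typically apply symmetric RTP (send media back to the observed source) or anchor the stream on a relay.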

What Debugging Tools Help Troubleshoot SIP and RTP Issues?

Use Wireshark for packet capture analysis and RTP stream review; SIP Workbench for ladder diagrams; VoIPmonitor for MOS, jitter buffer monitoring, and alerts; SIPp/StarTrinity for load reproduction; PacketSafari/QXIP exports for collaboration; enable PCAP exports and TLS/SRTP decryption.

How Do STUN, TURN, and ICE Assist With Connectivity?

They enable connectivity by discovering public addresses (STUN), relaying when direct paths fail (TURN), and testing candidate pairs (ICE). Choose servers close to your endpoints to minimize latency; ICE then improves reliability and saves relay bandwidth through prioritized checks, triggered checks, and fallback to relays only when direct paths fail.
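
The prioritized checks rest on a simple formula for candidate priority. The sketch below uses the formula and the recommended type preferences from the ICE specification (host 126, peer reflexive 110, server reflexive 100, relay 0); the function name and the default local preference of 65535 (a single-interface host) are assumptions for illustration.

```python
# Recommended type preferences from the ICE specification.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component_id: int = 1,
                       local_preference: int = 65535) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (
        (2 ** 24) * TYPE_PREFERENCE[cand_type]
        + (2 ** 8) * local_preference
        + (256 - component_id)
    )


def order_candidates(types):
    """Sort candidate types from most to least preferred."""
    return sorted(types, key=candidate_priority, reverse=True)
```

Direct host paths therefore get checked first and TURN relays last, which is how ICE keeps relay bandwidth as a fallback rather than the default.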

How Are SIP Codecs Chosen for Call Quality and Bandwidth?

You choose SIP codecs by balancing MOS targets against available bandwidth. Prioritize G.722 for wideband quality, G.711 for standard clarity, and G.729 or iLBC when bandwidth is constrained. Evaluate latency, jitter, loss, PSTN compatibility, and call concurrency.
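
A back-of-the-envelope calculation helps when sizing links for concurrent calls. This sketch assumes 40 bytes of IPv4, UDP, and RTP headers per packet and ignores layer 2 framing, SRTP tags, and silence suppression; ip_bandwidth_kbps is a hypothetical helper.

```python
def ip_bandwidth_kbps(codec_kbps: float, ptime_ms: float = 20,
                      overhead_bytes: int = 40) -> float:
    """Per-direction IP bandwidth of one RTP stream in kbit/s.

    overhead_bytes = 20 (IPv4) + 8 (UDP) + 12 (RTP); layer 2 is excluded.
    """
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


def calls_per_link(link_kbps: float, codec_kbps: float, ptime_ms: float = 20) -> int:
    """How many one-way streams of this codec fit in the given bandwidth."""
    return int(link_kbps // ip_bandwidth_kbps(codec_kbps, ptime_ms))
```

With 20 ms packetization, G.711 at 64 kbit/s comes to 80 kbit/s on the wire and G.729 at 8 kbit/s comes to 24 kbit/s, so header overhead matters far more for low-bitrate codecs.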

What Causes One-Way Audio and How to Fix It?

One-way audio stems from NAT/firewall blocks, bad SDP addresses, SBC anchoring errors, codec mismatches, or network congestion problems. Fix by disabling SIP ALG, correcting NAT, validating SDP, anchoring media, prioritizing RTP, testing paths, and resolving audio quality issues with QoS.

Conclusion

You’ve mapped SIP’s building blocks, traced call setup, and seen how SDP aligns codecs so RTP can flow. You can manage sessions with ACKs, holds, transfers, and mid-call updates. You also know which ports matter and how TLS hardens transport. Apply this rigor: validate headers, monitor state machines, negotiate conservatively, and lock down signaling paths. When issues arise, inspect SIP traces and RTP stats. With these fundamentals, you’ll diagnose faster, optimize quality, and deploy securely at scale.

Greg Steinig

Gregory Steinig is Vice President of Sales at SPARK Services, leading direct and channel sales operations. Previously, as VP of Sales at 3CX, he drove exceptional growth, scaling annual recurring revenue from $20M to $167M over four years. With over two decades of enterprise sales and business development experience, Greg has a proven track record of transforming sales organizations and delivering breakthrough results in competitive B2B technology markets. He holds a Bachelor's degree from Texas Christian University and is Sandler Sales Master Certified.
