Introduction to Live Streaming Pipeline
- Author: Ryan
1. Streaming and Distribution Principles
A typical live streaming pipeline starts on the capture side, where audio and video are pushed to a server or CDN over a streaming protocol. The CDN then transcodes and repackages the stream into multiple delivery formats (e.g., RTMP, HTTP-FLV, HLS, LLS) for viewers to pull and play.
Modern protocols (like SRT, RIST, E‑RTMP) and standard ones (RTMP, HLS, HTTP‑FLV, RTC) serve different purposes: some prioritize ultra-low latency, while others focus on compatibility and large-scale delivery.
Capture and Push: The broadcaster typically uses RTMP to push streams, but SRT, RIST, or E‑RTMP can also be used. Upstream network quality affects push latency and stability.
SRT (Secure Reliable Transport): Based on UDP + ARQ retransmission; supports AES encryption and firewall traversal; typical latency ≈120 ms. Ideal for cross-network, high-reliability transmission.
RIST (Reliable Internet Stream Transport): Based on RTP/UDP + NACK; open standard focused on multi-vendor compatibility and low latency.
E‑RTMP (Enhanced RTMP): Extends RTMP with support for modern features such as AV1/HEVC, nanosecond timestamps, and multi-track audio.
Server / Transcoding / CDN: Handles stream distribution and protocol conversion. Each protocol has a different processing model: HLS requires segmenting and indexing (introducing batch latency); RTMP transmits frame by frame (lower latency).
Receive protocol stream → Transcode into multiple bitrates (H.264, H.265, AV1, etc.)
Output formats on demand: FLV/HTTP‑FLV (latency of a few seconds), HLS/LL-HLS (cross-platform), MPEG‑DASH/LL‑DASH, WebRTC (real-time interaction)
CDN handles segment caching (.ts/.m4s), index distribution (.m3u8/.mpd), and edge protocol relay.
Playback Client: The player downloads, decodes, and renders the stream. Buffer size and decoding delay determine final latency. Players often buffer a few frames or seconds to smooth network fluctuations, which introduces extra delay.
2. Latency Composition and GOP Buffer Strategy
Total live latency is made up of three parts:
Production Side: Capture (10–30 ms), encoding (50–200 ms), preprocessing
Network/CDN: Transmission (≳ network RTT), segmentation & distribution (HLS segment duration × playlist length)
Playback Side: Player buffer (1–3 s), decoding, rendering, post-processing
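As a rough illustration of how these three parts add up, the sketch below sums them for one frame-by-frame configuration and one segmented configuration. The class name, field names, and sample values are assumptions picked from the ranges listed above, not measurements, and the model ignores decoding and rendering time.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Hypothetical end-to-end latency model; all fields are illustrative."""
    capture_ms: float
    encode_ms: float
    network_rtt_ms: float
    segment_s: float        # 0 for frame-by-frame delivery such as RTMP
    playlist_segments: int  # segments held in the playlist before playback
    player_buffer_s: float

    def total_seconds(self) -> float:
        # Production side: capture plus encoding.
        production = (self.capture_ms + self.encode_ms) / 1000
        # Network/CDN: transmission plus segment duration x playlist length.
        network = self.network_rtt_ms / 1000 + self.segment_s * self.playlist_segments
        # Playback side: only the player buffer is modeled here.
        return production + network + self.player_buffer_s


# Assumed sample values taken from the ranges above.
rtmp_like = LatencyBudget(20, 100, 50, 0.0, 0, 1.0)
hls_like = LatencyBudget(20, 100, 50, 4.0, 3, 1.0)

print(f"frame-by-frame: {rtmp_like.total_seconds():.2f} s")
print(f"segmented:      {hls_like.total_seconds():.2f} s")
```

With these assumed inputs the segmentation term dominates the segmented case, which is why segment duration and playlist length are usually the first things to tune.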
The GOP (Group of Pictures) size is the number of frames from one keyframe (IDR frame) to the next, i.e., the maximum frame count in a group. For example, at 30 fps a 60-frame GOP spans 2 s.
Clients usually buffer 1–2s of content to mitigate jitter. To balance startup speed and latency, two GOP buffer strategies are common:
Upper-Bound Priority (Low Latency): No historical I-frame buffering; player waits for the next I-frame to begin playback (minimal delay).
Lower-Bound Priority (Anti-Jitter): Buffers complete GOPs and starts playback immediately using the cached I-frame ("instant start"), at the cost of increased delay (≈1 GOP length).
Using SRS as an example, the gop_cache config toggles this strategy:
- gop_cache off: enables upper-bound priority (low-latency playback),
- gop_cache on: allows fast startup using buffered GOPs.
Choose per scenario: enable the GOP cache to avoid a black screen at startup on poor networks, or disable it to reduce delay when the downstream network is stable.
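The trade-off between the two strategies can be made concrete with a small simulation. This is a simplified model, not SRS code: it assumes keyframes arrive at exact multiples of the GOP duration, and the function name `join_stats` and its return keys are hypothetical.

```python
def join_stats(join_time_s: float, gop_s: float, gop_cache: bool) -> dict:
    """Estimate startup wait and added latency for a viewer joining mid-stream.

    Assumes keyframes arrive at t = 0, gop_s, 2 * gop_s, ... (idealized).
    """
    # Time elapsed since the most recent keyframe when the viewer joins.
    offset = join_time_s % gop_s
    if gop_cache:
        # Cached GOP: playback starts at once from the cached keyframe,
        # so the viewer is behind live by the elapsed part of the GOP.
        return {"startup_wait_s": 0.0, "added_latency_s": offset}
    # No cache: the player must wait for the next keyframe to arrive.
    wait = 0.0 if offset == 0 else gop_s - offset
    return {"startup_wait_s": wait, "added_latency_s": 0.0}


# A viewer joins 3.5 s into a stream with 2 s GOPs.
print(join_stats(3.5, 2.0, gop_cache=True))
print(join_stats(3.5, 2.0, gop_cache=False))
```

In this model the cache trades up to one GOP of extra delay for an immediate first frame, which matches the ≈1 GOP cost described above.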
3. Protocol Mechanisms and Use Cases
RTMP (Real-Time Messaging Protocol): Originally developed by Macromedia (now Adobe) for Flash; carries audio/video packets in FLV format over a long-lived TCP connection.
Pros: Low latency (1–3s), widely supported by capture devices and legacy players, can pass restrictive firewalls when tunneled over HTTP (RTMPT), typically on port 80.
Cons: Requires a plugin or dedicated player (modern browsers no longer support Flash); TCP retransmission can cause delay to accumulate on poor networks.
Use Cases: Interactive streams, PC-based streaming.
HTTP-FLV: Streams FLV over a long-lived HTTP response. Works similarly to RTMP: audio/video is packaged into FLV tags and transmitted over HTTP.
Pros: Uses standard HTTP (passes through most firewalls), supports HTTPS, mobile-friendly.
Cons: Still based on TCP; requires a player or JS engine with HTTP-FLV support (not playable natively in browsers).
Use Cases: Web-based live streaming.
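To show what packaging audio/video into FLV tags looks like on the wire, here is a minimal sketch of the FLV file header and tag framing, following the layout in the published FLV file format specification. The function names are hypothetical, and the two-byte payload is a placeholder rather than real encoded video.

```python
import struct

FLV_TAG_AUDIO = 8
FLV_TAG_VIDEO = 9
FLV_TAG_SCRIPT = 18


def flv_file_header(has_audio: bool = True, has_video: bool = True) -> bytes:
    """Return the 9-byte FLV header followed by PreviousTagSize0 (always 0)."""
    flags = (0x04 if has_audio else 0) | (0x01 if has_video else 0)
    # Signature "FLV", version 1, flags, header length 9, PreviousTagSize0.
    return b"FLV" + bytes([1, flags]) + struct.pack(">I", 9) + struct.pack(">I", 0)


def flv_tag(tag_type: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Frame one payload as an FLV tag plus its trailing PreviousTagSize."""
    header = (
        bytes([tag_type])                                # TagType
        + len(payload).to_bytes(3, "big")                # DataSize (24 bit)
        + (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")   # Timestamp, low 24 bits
        + bytes([(timestamp_ms >> 24) & 0xFF])           # TimestampExtended
        + (0).to_bytes(3, "big")                         # StreamID, always 0
    )
    tag = header + payload
    # PreviousTagSize = 11-byte tag header + payload length.
    return tag + struct.pack(">I", len(tag))


# An HTTP-FLV response body is the file header followed by a stream of tags.
stream = flv_file_header() + flv_tag(FLV_TAG_VIDEO, 40, b"\x17\x00")
print(len(stream), stream[:3])
```

A server would keep appending tags to the same HTTP response for as long as the viewer stays connected.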
HLS (HTTP Live Streaming): Apple’s segmented streaming protocol over HTTP. The server slices video into TS or CMAF segments and generates an M3U8 playlist.
Pros: Fully HTTP-based, firewall-traversable, CDN-friendly, simple client implementation.
Cons: Intrinsically high latency due to segment-based delivery; standard delay is typically 5–15s. LL-HLS reduces latency to 3–5s via shorter segments and partial segment loading.
Use Cases: Video-on-demand and general-purpose live streaming.
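The segment-plus-index model can be sketched by generating a live media playlist for a sliding window of segments. The helper name `live_playlist`, the `seg` file prefix, and the segment durations are assumptions for illustration; the tags themselves are standard HLS playlist tags.

```python
import math
from typing import Sequence


def live_playlist(first_seq: int, durations: Sequence[float], prefix: str = "seg") -> str:
    """Build a minimal live HLS media playlist for a window of TS segments."""
    # Target duration must be an integer no smaller than any segment duration.
    target = math.ceil(max(durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for i, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{prefix}{first_seq + i}.ts")
    # No #EXT-X-ENDLIST: its absence tells the player the stream is still live.
    return "\n".join(lines) + "\n"


window = [4.0, 4.0, 3.96]
print(live_playlist(100, window))
# A player that starts at the oldest listed segment is roughly this far behind live.
print(f"window length: {sum(window):.2f} s")
```

Because a player cannot fetch a segment until it is fully written and listed, the window length gives a feel for why segment duration and playlist length drive HLS latency.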
LLS (Low-Latency Streaming): Refers to proprietary low-latency solutions using fine-grained segments (e.g., CMAF/FLV hybrid) and push-based delivery.
Pros: Sub-second latency, real-time delivery at scale.
Cons: Vendor-locked, requires specific SDKs/players, higher deployment cost.
Use Cases: Large-scale live events requiring low latency.
RTC (WebRTC): UDP-based real-time communication protocol using SRTP.
Pros: Ultra-low point-to-point latency (<500ms), ideal for video conferencing or co-hosting.
Cons: Requires complex signaling and NAT traversal; not natively scalable (requires SFU/MCU); browser and hardware compatibility varies.
Use Cases: 1-on-1 or small group low-latency interactions; not suitable for CDN broadcasting.
Protocol | Container | Transport | Latency | Advantages | Disadvantages | Use Cases |
---|---|---|---|---|---|---|
RTMP | FLV | TCP | 1–3s | Low latency, CDN-friendly, widely compatible | Needs special player, TCP delay under poor network | PC streaming, interaction |
HTTP-FLV | FLV | TCP(HTTP) | 1–3s | HTTP-compatible, mobile-friendly | Not supported by default in browsers | Web-based live streaming |
HLS | TS/CMAF | TCP(HTTP) | 5–15s | Great compatibility, scalable | High latency, poor real-time performance | VOD and general live |
LLS | CMAF | TCP(HTTP) | <0.5s | Ultra-low latency, end-to-end sync | Vendor-locked, requires specific SDK/player | High-concurrency, low-latency |
RTC | RTP | UDP/SRTP | <0.5s | Real-time, P2P | Poor scalability, hard to distribute via CDN | Co-hosting, conferencing |
4. Transcoding Profiles
Encoder Options:
H.264: Most widely compatible, suitable for SD/HD.
H.265: Higher compression (≈50% bitrate reduction at same quality), suitable for 4K/8K, but requires client support.
For users on poor networks, a Narrowband HD profile may be offered, which maintains resolution while reducing bitrate.
Example transcoding profiles (actual strategy may vary by vendor):
Profile | Resolution | FPS | H.264 Bitrate | H.265 Bitrate | Description |
---|---|---|---|---|---|
origin | - | - | - | - | Original stream |
uhd | 3840×2160 | 60fps | 6Mbps | 3Mbps | Ultra HD 4K |
hd | 1920×1080 | 45fps | 4Mbps | 2Mbps | Full HD 1080p |
zuhd | 1920×1080 | 45fps | 2Mbps | 1Mbps | Narrowband HD (low-bitrate FHD) |
sd | 1280×720 | 30fps | 2Mbps | 1Mbps | HD 720p |
ld | 960×540 | 25fps | 1Mbps | 500kbps | SD 540p |
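One way a player or scheduler might use such a table is to pick the highest profile whose bitrate fits the measured bandwidth. The sketch below is an assumption about how selection could work, not a vendor's algorithm: the `pick_profile` name, the 0.8 headroom factor, and the rule of preferring `zuhd` over `sd` at equal bitrate are all illustrative.

```python
# (name, H.264 kbps, H.265 kbps), highest quality first, taken from the table.
# zuhd is listed before sd so that, at equal bitrate, the 1080p narrowband
# profile wins (an assumed preference).
PROFILES = [
    ("uhd", 6000, 3000),
    ("hd", 4000, 2000),
    ("zuhd", 2000, 1000),
    ("sd", 2000, 1000),
    ("ld", 1000, 500),
]


def pick_profile(bandwidth_kbps: float, codec: str = "h264", headroom: float = 0.8) -> str:
    """Return the best profile whose bitrate fits within a share of bandwidth."""
    # Keep some headroom so throughput fluctuations do not stall playback.
    budget = bandwidth_kbps * headroom
    column = 1 if codec == "h264" else 2
    for profile in PROFILES:
        if profile[column] <= budget:
            return profile[0]
    # Nothing fits: fall back to the lowest profile.
    return PROFILES[-1][0]


for kbps in (10000, 3000, 500):
    print(kbps, pick_profile(kbps), pick_profile(kbps, codec="h265"))
```

At the same bandwidth, the H.265 column generally allows a higher profile, which reflects the bitrate savings noted above, provided the client can decode it.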