19147 Commits (060ac76257a8c1f7370a8a571821c1d73377701f)
 

Author SHA1 Message Date
pukkandan be6202f12b
Subtitle extraction from streaming media manifests #247
Authored by fstirlitz
Modified from: https://github.com/ytdl-org/youtube-dl/pull/6144

Closes: #73
Fixes:
https://github.com/ytdl-org/youtube-dl/issues/6106
https://github.com/ytdl-org/youtube-dl/issues/14977
https://github.com/ytdl-org/youtube-dl/issues/21438
https://github.com/ytdl-org/youtube-dl/issues/23609
https://github.com/ytdl-org/youtube-dl/issues/28132

Might also fix (untested):
https://github.com/ytdl-org/youtube-dl/issues/15424
https://github.com/ytdl-org/youtube-dl/issues/18267
https://github.com/ytdl-org/youtube-dl/issues/23899
https://github.com/ytdl-org/youtube-dl/issues/24375
https://github.com/ytdl-org/youtube-dl/issues/24595
https://github.com/ytdl-org/youtube-dl/issues/27899

Related:
https://github.com/ytdl-org/youtube-dl/issues/22379
https://github.com/ytdl-org/youtube-dl/pull/24517
https://github.com/ytdl-org/youtube-dl/pull/24886
https://github.com/ytdl-org/youtube-dl/pull/27215

Notes:
* The functions `extractor.common._extract_..._formats` are still kept for compatibility
* Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles`
* Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats
* AES support is untested
* The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players
    * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`.
        Once the unnecessary headers are stripped out of this, it becomes a valid self-contained ttml file
    * The ttml subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit>
* Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools
    * Unlike the ttml files, the XML fragments of these cannot be dumped using `ffmpeg`
    * The WebVTT subs extracted from DASH can be parsed by <https://github.com/gpac/gpac>
    * But the validity of those extracted from ISM is untested
4 years ago
Felix S e8f834cd8d [threeqsdn] Extract subtitles from streaming manifests 4 years ago
Felix S e0e624ca7f [canvas] Extract subtitles from streaming manifests 4 years ago
Felix S ec4f374c05 [wat] Extract subtitles from streaming manifests 4 years ago
Felix S c811e8d8bd [atresplayer] Extract subtitles from streaming manifests 4 years ago
Felix S b2cd5da460 [francetv] Extract subtitles from the HLS manifest 4 years ago
Felix S 2de3b21e05 [uplynk] Extract subtitles from HLS manifests 4 years ago
Felix S 4bed436371 [twitter] Extract subtitles from HLS manifests 4 years ago
Felix S efe9dba595 [srgssr] Extract subtitles from HLS manifests 4 years ago
Felix S 47f4203dd3 [nytimes] Extract subtitles from HLS manifests 4 years ago
Felix S 015c10aeec [roosterteeth] Use common code for subtitle extraction 4 years ago
Felix S a00d781b73 [elonet] Use common code for subtitle extraction 4 years ago
Felix S 0c541b563f [tv4] Extract subtitles from streaming manifests 4 years ago
Felix S 64a5cf7929 [byutv] Extract subtitles from streaming manifests 4 years ago
Felix S 7a450a3b1c [generic] Extract subtitles from direct SSTR manifest links 4 years ago
Felix S 7de27caf16 [generic] Extract subtitles from direct DASH manifest links 4 years ago
Felix S c26326c1be [generic] Extract subtitles from direct HLS manifest links 4 years ago
Felix S 66a1b8643a [downloader/ism] Support muxing TTML subtitles 4 years ago
Felix S 15828bcf25 [downloader/hls] Handle MPEG-2 PES timestamp overflow 4 years ago
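
MPEG-2 PES timestamps are 33-bit counters on a 90 kHz clock, so they wrap around roughly every 26.5 hours. The following is a minimal sketch of one way such a wrap could be compensated. The name `unwrap_pts` and the half-range heuristic are assumptions made for illustration, not the downloader's actual code.

```python
PTS_BITS = 33
PTS_MODULUS = 1 << PTS_BITS  # raw timestamps wrap at 2**33 ticks
PTS_HZ = 90000               # ticks per second of the PES clock


def unwrap_pts(timestamps):
    """Yield extended timestamps that keep increasing across a wrap.

    Hypothetical helper: a backwards jump of more than half the
    counter range is treated as an overflow of the 33-bit counter.
    """
    offset = 0
    previous = None
    for raw in timestamps:
        if previous is not None and previous - raw > PTS_MODULUS // 2:
            offset += PTS_MODULUS  # one more full wrap has happened
        previous = raw
        yield raw + offset


if __name__ == '__main__':
    raw_values = [PTS_MODULUS - 10, PTS_MODULUS - 1, 5, 100]
    for ticks in unwrap_pts(raw_values):
        print(ticks, ticks / PTS_HZ)
```

Small backwards jumps (for example from reordered packets) stay untouched under this heuristic; only a jump larger than half the range counts as a wrap.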
Felix S 333217f43e [downloader/hls] Remove duplicate cues using a sliding window of candidates 4 years ago
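
Cues that span a segment boundary are commonly repeated in consecutive HLS WebVTT segments. A sliding window of recently seen cues is one way to drop such repeats without keeping the whole stream in memory. The sketch below assumes a simplified `Cue` tuple and a fixed window size; both are hypothetical and may differ from what the commit does.

```python
from collections import deque, namedtuple

# Hypothetical, simplified cue representation (times in seconds).
Cue = namedtuple('Cue', ['start', 'end', 'text'])


def dedup_cues(cues, window_size=8):
    """Yield cues, skipping any that equal one of the last few emitted.

    Only the most recent `window_size` emitted cues are remembered,
    so memory use stays bounded regardless of stream length.
    """
    window = deque(maxlen=window_size)
    for cue in cues:
        if cue in window:
            continue  # duplicate of a recently emitted cue
        window.append(cue)
        yield cue


if __name__ == '__main__':
    stream = [Cue(0, 2, 'a'), Cue(2, 4, 'b'), Cue(2, 4, 'b'), Cue(4, 6, 'c')]
    for cue in dedup_cues(stream):
        print(cue)
```

The window size is a trade-off: a larger window catches repeats that arrive further apart, while a smaller one costs less per cue.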
Felix S 4a2f19abbd [downloader/hls] Assemble single-file WebVTT subtitles from HLS segments 4 years ago
Felix S 5fbcebed8c [test] Test SSTR manifest parsing 4 years ago
Felix S becdc7f82c [test] Test subtitle extraction from DASH manifests 4 years ago
Felix S 73b9088a1c [test] Test subtitle extraction from HLS manifests 4 years ago
Felix S f6a1d69a87 [extractor/common] Extend _extract_akamai_formats to also extract subtitle tracks 4 years ago
Felix S fd76a14259 [extractor/common, downloader/ism] Extract SSTR subtitle tracks
_parse_ism_formats was extended into _parse_ism_formats_and_subtitles;
all direct users were updated, though _extract_ism_formats was left
as a compatibility wrapper.

The SSTR downloader was also modified in order to prepare for muxing
subtitle streams, although no support for any subtitle codecs was
added in this commit.
4 years ago
Felix S 171e59edd4 [extractor/common] Extract DASH subtitle tracks
_extract_mpd_formats and _parse_mpd_formats were extended into
_…_formats_and_subtitles; wrappers with old names are provided
for compatibility.
4 years ago
Felix S a0c3b2d5cf [extractor/common] Extract HLS subtitle tracks
_extract_m3u8_formats is renamed to _extract_m3u8_formats_and_subtitles
and extended to handle subtitle tracks instead of skipping them;
a wrapper with the old name is provided for compatibility.

_parse_m3u8_formats is likewise renamed and extended, but without adding
the compatibility wrapper; the test suite is adjusted to test the enhanced
method instead.
4 years ago
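
The compatibility-wrapper pattern described in this commit message can be sketched as follows. The class name, the argument list, and the `(formats, subtitles)` return shape are assumptions for illustration; the stand-in method body does no real manifest parsing.

```python
class ExtractorSketch:
    """Hypothetical stand-in for an extractor base class."""

    def _extract_m3u8_formats_and_subtitles(self, manifest_url, video_id):
        # Stand-in result: a real implementation would download and
        # parse the manifest instead of fabricating entries.
        formats = [{'url': manifest_url, 'format_id': 'hls-0'}]
        subtitles = {'en': [{'url': manifest_url + '#subs', 'ext': 'vtt'}]}
        return formats, subtitles

    def _extract_m3u8_formats(self, *args, **kwargs):
        # Old name kept for existing callers: same arguments,
        # but the subtitle tracks are discarded.
        formats, _subtitles = self._extract_m3u8_formats_and_subtitles(
            *args, **kwargs)
        return formats


if __name__ == '__main__':
    extractor = ExtractorSketch()
    print(extractor._extract_m3u8_formats('https://example.com/a.m3u8', 'id1'))
```

Callers that only need formats keep working unchanged, while updated callers can switch to the extended method and receive both values.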
Felix S 19bb39202d [extractor/common] Generalise _merge_subtitles
This allows modifying a subtitles dictionary in-place.
4 years ago
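
A minimal sketch of what merging into an existing subtitles dictionary could look like. The function name `merge_subtitles`, the `target` keyword, and the language-to-list-of-tracks layout are assumptions for illustration rather than the signature from the commit.

```python
def merge_subtitles(*dicts, target=None):
    """Merge subtitle dictionaries of the form {lang: [track, ...]}.

    If `target` is given, it is modified in place and returned;
    otherwise a new dictionary is created.
    """
    if target is None:
        target = {}
    for subtitles in dicts:
        for lang, tracks in subtitles.items():
            # Append to the language's track list, creating it if needed.
            target.setdefault(lang, []).extend(tracks)
    return target


if __name__ == '__main__':
    collected = {'en': [{'url': 'a.vtt'}]}
    merge_subtitles({'en': [{'url': 'b.vtt'}], 'fr': [{'url': 'c.vtt'}]},
                    target=collected)
    print(collected)
```

Passing `target` lets a caller accumulate tracks from several manifests into one dictionary without reassigning it after each merge.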
Felix S d4553567d2 [downloader/ism] Prevent writing the header again when resuming an interrupted download 4 years ago
Felix S 4d49884c58 [downloader/fragment] Allow persisting extra state when a download is interrupted 4 years ago
Felix S 5873d4ccdd [utils] Improve bug_report_message
Add an optional argument specifying the text that should go before
the message.
4 years ago
Hadi0609 db9a564b6a
[zee5] Fix extraction for some URLs (#279)
Closes: #278
4 years ago
Felix S c72967d5de
[mediasite] Generalize URL pattern (#275)
Authored by: fstirlitz
4 years ago
pukkandan 598d185db1
Fix case sensitivity of format selector
Bug introduced in f8d4ad9ab0
4 years ago
pukkandan b982cbdd0e
[limelight] Obey `allow_unplayable_formats` 4 years ago
pukkandan 6a04a74e8b
[FormatSort] Fix for when some formats have quality and others don't 4 years ago
pukkandan 88728713c8
Py2 compatibility for `FileNotFoundError` 4 years ago
CXwudi 6b1d8c1e30
[niconico] Fix title and thumbnail extraction (#273)
Authored by: CXwudi
4 years ago
Ashish 87c3d06271
[Mxplayer] Add MxplayerShowIE (#270)
Authored by: Ashish0804
4 years ago
pukkandan 915f911e36
[utils] Encode URLs in `YoutubeDLCookieProcessor`
Closes #263
4 years ago
pukkandan cf9d6cfb0c
[tubi] Raise "no video formats" error when video url is empty
Related: #266
4 years ago
pukkandan bbed5763f1
[francetvinfo] Improve video id extraction
Closes #261
4 years ago
pukkandan ca0b91b39e
[version] update :ci skip all 4 years ago
pukkandan 0cf0571560
Release 2021.04.22 4 years ago
pukkandan e58c22a0f6
[documentation] Fix typos 4 years ago
pukkandan e4bdd3377d
[ci] Disable fail-fast 4 years ago
pukkandan 0b2e9d2c30
[lazy_extractor] Do not load plugins 4 years ago
pukkandan 1bdae7d312
Update to ytdl-commit-7e8b3f9
[youtube] Remove unused code
7e8b3f9439
4 years ago
Felix S a471f21da6
[mildom] Remove proxy (#260)
Closes #251
Makes 2cff495997, ab406a1c0e, #252 obsolete

Authored by: fstirlitz
4 years ago