Commit Graph

596 Commits (19dc10b986eeda47975a0e77e84df20ad3b59062)

Author SHA1 Message Date
dirkf f66372403f [InfoExtractor] Rework and improve JWPlayer extraction
* use traverse_obj() and _search_json()
* support playlist `.load({**video1},{**video2}, ...)`
* support transform_source=... for _extract_jwplayer_data()
11 months ago
dirkf 7216fa2ac4 [InfoExtractor] Add `_search_json()`
* uses the error diagnostic to truncate the JSON string
* may be confused by non-C-Pythons
11 months ago
dirkf 1fd8f802b8 [InfoExtractor] Correctly resolve BaseURL in DASH manifest
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
12 months ago
dirkf 4eaeb9b2c6 [InfoExtractor] Support byte range for DASH
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
12 months ago
dirkf c58b655a9e [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) 12 months ago
dirkf 640d39f03a [InfoExtractor] Support some warning and `._downloader` shortcut methods from yt-dlp 1 year ago
dirkf a25e9f3c84 [compat] Use `compat_open()` 2 years ago
dirkf b2ba24bb02 [InfoExtractor] Add `_match_valid_url()` class method and refactor
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
2 years ago
dirkf b2741f2654 [InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp
* add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386
  thanks selfisekai
* add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921,
  thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from
  222a230871,
  thanks Lesmiscore
* update extractors in PR using above, fix tests.
2 years ago
dirkf 1e8ccdd2eb [InfoExtractor] Support groups in `_search_regex()`, etc 2 years ago
dirkf 42b098dd79 [InfoExtractor] Handle unquoted values in OpenGraph searches 2 years ago
dirkf 604762a9f8 [common:jwplayer] Improve jwplayer extraction and parsing (#31000)
* don't crash parser if jwplayer_data is invalid (empty, or no formats)
* use `label` in `sources[n]` as `format_id`
* relax `jwplayer().setup(...)` RE (also rework PR #27274 enhancement)
* detect more manifest formats in _parse_jwplayer_formats() (from PR #29596)
* improve metadata extraction (from PR #25433)
* remember URLs in a set
* use parse_resolution() in format
* extract filesize in format (from yt-dlp)

Co-authored-by: kikuyan <kikuyan@users.noreply.github.com>
Co-authored-by: martin54 <martin54@users.noreply.github.com>
2 years ago
dirkf 11b284c81f [Common:JWPlayer] Fix x1000 scaling error
See https://github.com/yt-dlp/yt-dlp/issues/5106#issuecomment-1264625161
2 years ago
Sergey M․ 70d0d4f9be [compat] Use more conventional name for compat SimpleCookie 4 years ago
Remita Amine 162bf9e10a [compat] add compat_SimpleCookie 4 years ago
Remita Amine 6beb1ac65b [extractor/common] keep support for non standard JSON-LD VideoObject author values 4 years ago
Remita Amine e165f5641f [extractor/common] fix JSON-LD VideoObject author extraction 4 years ago
Remita Amine 1df2596f81 [extractor/common] fix _get_cookies method for python 2 (#20673, #23256, #20326, closes #28640) 4 years ago
Sergey M․ 477bff6906 Introduce release_timestamp meta field (refs #28386) 4 years ago
Remita Amine 67299f23d8 [youtube] Rewrite Extractor
- improve format sorting
- remove unused code (swf parsing, ...)
- fix series metadata extraction
- fix trailer video extraction
- improve error reporting
- extract video location
4 years ago
Remita Amine 22feed08a1 [common] remove unwanted query params from unsigned akamai manifest URLs 4 years ago
Sergey M․ 1727541315 [extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306) 4 years ago
Sergey M․ eae19a4473 [extractor/common] Document duration meta field for playlists 4 years ago
Sergey M․ 5a1fbbf8b7 [extractor/common] Fix inline HTML5 media tags processing and add test (closes #27345) 4 years ago
Sergey M․ 91dd25fe1e [extractor/common] Add support for dl8-* media tags (closes #27283) 4 years ago
Sergey M․ 06bf2ac20f [extractor/common] Eliminate media tag name regex duplication 4 years ago
Sergey M․ 6ad0d8781e [extractor/common] Fix media type extraction for HTML5 media tags in start/end form 4 years ago
Remita Amine da4304609d [extractor/common] improve Akamai HTTP formats extraction 4 years ago
Remita Amine 664dd8ba85 [extractor/common] improve Akamai HTTP format extraction
- Allow m3u8 manifest without an additional audio format
- Fix extraction for qualities starting with a number
Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
4 years ago
Remita Amine 193422e12a [extractor/common] add generic support for akamai http format extraction 4 years ago
Josh Soref 71ddc222ad Fix typos (#27084)
* spelling: authorization
* spelling: brightcove
* spelling: creation
* spelling: exceeded
* spelling: exception
* spelling: extension
* spelling: extracting
* spelling: extraction
* spelling: frontline
* spelling: improve
* spelling: length
* spelling: listsubtitles
* spelling: multimedia
* spelling: obfuscated
* spelling: partitioning
* spelling: playlist
* spelling: playlists
* spelling: restriction
* spelling: services
* spelling: split
* spelling: srmediathek
* spelling: support
* spelling: thumbnail
* spelling: verification
* spelling: whitespaces

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
4 years ago
Sergey M․ c7178f0f7a [extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400, refs #24151, refs #25617, refs #25618, refs #25586, refs #26068, refs #27072) 4 years ago
Sergey M․ ce5b904050 [extractor/common] Relax interaction count extraction in _json_ld 4 years ago
Sergey M․ ad06b99dd4 [extractor/common] Extract author as uploader for VideoObject in _json_ld 4 years ago
Sergey M․ f8c7bed133 [extractor/common] Handle ssl.CertificateError in _request_webpage (closes #26601)
ssl.CertificateError is raised on some python versions <= 3.7.x
4 years ago
Sergey M․ 6c22cee673 [extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776)
To always ensure cookie name and value are bytestrings on python 2.
5 years ago
Sergey M․ 4433bb0245 [extractor/common] Extract multiple JSON-LD entries 5 years ago
Sergey M․ 13b08034b5 [extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667) 5 years ago
Sergey M․ 7947a1f7db Remove no longer needed compat_str around geturl 5 years ago
Sergey M․ e2f8bf5888 [extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152) 5 years ago
Remita Amine 5ef62fc4ce [dailymotion] improve extraction
- extract http formats included in m3u8 manifest
- fix user extraction(closes #3553)(closes #21415)
- add support for User Authentication(closes #11491)
- fix password protected videos extraction(closes #23176)
- respect age limit option and family filter cookie value(closes #18437)
- handle video url playlist query param
- report allowed countries for geo-restricted videos
5 years ago
Sergey M․ 7360c06fac [extractor/common] Add data, headers and query to all major extract methods preserving standard order for potential future use 5 years ago
Remita Amine f81dd65ba2 [extractor/common] clean jwplayer description HTML tags 5 years ago
Remita Amine 3ec86619e3 [common] initialize headers param with empty dict 5 years ago
Remita Amine 57033e35e5 [common] fix typo 5 years ago
Remita Amine b6139cb0c3 [common] pass headers to _extract_(m3u8|mpd)_formats methods 5 years ago
Sergey M․ 25e911a968 [extractor/common] Make _is_valid_url more relaxed 5 years ago
Petr Vaněk 5e1c39ac85 [extractor/common] Fix typo in thumbnails resolution description (#21817) 6 years ago
Sergey M․ f856816b94 [extractor/common] Strip src attribute for HTML5 entries code (closes #18485, closes #21169) 6 years ago
Sergey M․ ce2fe4c01c [extractor/common] Add doc string for _apply_first_set_cookie_header 6 years ago