542 Commits (034f5fb5ee37e9edf90f7e82f48af985cdd17901)

Author SHA1 Message Date
Sergey M․ b0d21deda9 [extractor/common] Auto calculate tbr when missing
Yen Chi Hsuan 77f785076f [common] Keep full codec name from m3u8 manifests
See . This is for consistency between YouTube and HLS formats.
Yen Chi Hsuan 0b26ba3fc8 [extractor/common] Allow passing more parameters to _search_json_ld
Sergey M․ 4ca2a3cf3c [extractor/common] Add initial support for JSON-LD metadata extraction into info_dict
Jakub Wilk dfb1b1468c Fix typos
Closes .
Sergey M 3f3343cd3e Merge pull request from dstftw/introduce-chapter-and-series-fields
Introduce chapter and series fields
Sergey M․ 27bfd4e526 [extractor/common] Introduce number fields for chapters and series
Philipp Hagemeister 32f9036447 [ccc] Add language information to formats
Sergey M․ 7109903e61 [extractor/common] Document chapter and series fields
Sergey M․ 7e5edcfd33 Simplify formats accumulation for f4m/m3u8/smil formats
Now all _extract_*_formats routines return a list
remitamine 39d60b715a Merge pull request from remitamine/sort
[common] lower (m3u8,rtmp,rtsp) format preference only if required program is not available
remitamine d497a201ca [common] use specific variable for protocol preference in _sort_formats
remitamine 8d29e47f54 [common] simplify the use of _extract_m3u8_formats and _extract_f4m_formats
Sergey M․ 9b9c5355e4 Rename error_to_str to error_to_compat_str
Sergey M․ 7f8b271465 Properly convert errors to strings
Sergey M․ dd85e4d707 [extractor/common] Properly decode error string on python 2 (Closes , closes , closes , closes )
Sergey M․ 62d231c004 [extractor/common] Clarify duration can be float
Sergey M․ 5c2266df4b Switch codebase to use sanitized_Request instead of
compat_urllib_request.Request

[downloader/dash] Use sanitized_Request

[downloader/http] Use sanitized_Request

[atresplayer] Use sanitized_Request

[bambuser] Use sanitized_Request

[bliptv] Use sanitized_Request

[brightcove] Use sanitized_Request

[cbs] Use sanitized_Request

[ceskatelevize] Use sanitized_Request

[collegerama] Use sanitized_Request

[extractor/common] Use sanitized_Request

[crunchyroll] Use sanitized_Request

[dailymotion] Use sanitized_Request

[dcn] Use sanitized_Request

[dramafever] Use sanitized_Request

[dumpert] Use sanitized_Request

[eitb] Use sanitized_Request

[escapist] Use sanitized_Request

[everyonesmixtape] Use sanitized_Request

[extremetube] Use sanitized_Request

[facebook] Use sanitized_Request

[fc2] Use sanitized_Request

[flickr] Use sanitized_Request

[4tube] Use sanitized_Request

[gdcvault] Use sanitized_Request

[extractor/generic] Use sanitized_Request

[hearthisat] Use sanitized_Request

[hotnewhiphop] Use sanitized_Request

[hypem] Use sanitized_Request

[iprima] Use sanitized_Request

[ivi] Use sanitized_Request

[keezmovies] Use sanitized_Request

[letv] Use sanitized_Request

[lynda] Use sanitized_Request

[metacafe] Use sanitized_Request

[minhateca] Use sanitized_Request

[miomio] Use sanitized_Request

[meovideo] Use sanitized_Request

[mofosex] Use sanitized_Request

[moniker] Use sanitized_Request

[mooshare] Use sanitized_Request

[movieclips] Use sanitized_Request

[mtv] Use sanitized_Request

[myvideo] Use sanitized_Request

[neteasemusic] Use sanitized_Request

[nfb] Use sanitized_Request

[niconico] Use sanitized_Request

[noco] Use sanitized_Request

[nosvideo] Use sanitized_Request

[novamov] Use sanitized_Request

[nowness] Use sanitized_Request

[nuvid] Use sanitized_Request

[played] Use sanitized_Request

[pluralsight] Use sanitized_Request

[pornhub] Use sanitized_Request

[pornotube] Use sanitized_Request

[primesharetv] Use sanitized_Request

[promptfile] Use sanitized_Request

[qqmusic] Use sanitized_Request

[rtve] Use sanitized_Request

[safari] Use sanitized_Request

[sandia] Use sanitized_Request

[shared] Use sanitized_Request

[sharesix] Use sanitized_Request

[sina] Use sanitized_Request

[smotri] Use sanitized_Request

[sohu] Use sanitized_Request

[spankwire] Use sanitized_Request

[sportdeutschland] Use sanitized_Request

[streamcloud] Use sanitized_Request

[streamcz] Use sanitized_Request

[tapely] Use sanitized_Request

[tube8] Use sanitized_Request

[tubitv] Use sanitized_Request

[twitch] Use sanitized_Request

[twitter] Use sanitized_Request

[udemy] Use sanitized_Request

[vbox7] Use sanitized_Request

[veoh] Use sanitized_Request

[vessel] Use sanitized_Request

[vevo] Use sanitized_Request

[viddler] Use sanitized_Request

[videomega] Use sanitized_Request

[viewster] Use sanitized_Request

[viki] Use sanitized_Request

[vk] Use sanitized_Request

[vodlocker] Use sanitized_Request

[voicerepublic] Use sanitized_Request

[wistia] Use sanitized_Request

[xfileshare] Use sanitized_Request

[xtube] Use sanitized_Request

[xvideos] Use sanitized_Request

[yandexmusic] Use sanitized_Request

[youku] Use sanitized_Request

[youporn] Use sanitized_Request

[youtube] Use sanitized_Request

[patreon] Use sanitized_Request

[extractor/common] Remove unused import

[nfb] PEP 8
Sergey M․ 019839faaa [extractor/common] Use baseURL from f4m manifest for recursive manifest extraction
Sergey M 30eecc6a04 Merge pull request from jaimeMF/xml_attrib_unicode
Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (…
Sergey M․ dbd82a1d4f [extractor/common] Fix m3u8 extraction on failure
Sergey M․ dc519b5421 [extractor/common] Make ie_key and IE_NAME return unicode string
Jaime Marquínez Ferrándiz 36e6f62cd0 Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x ()
Attributes aren't unicode objects, so they couldn't be directly used in info_dict fields (for example '--write-description' doesn't work with bytes).
remitamine 3711304510 [extractor/common] get the redirected m3u8_url in _extract_m3u8_formats
Jaime Marquínez Ferrándiz 865d1fbafc [extractor/common] Remove unused import
Sergey M․ 943a1e24b8 [extractor/common] Use more generic URLError in _is_valid_url
Sergey M․ 02835c6bf4 [extractor/common] Document repost_count
Sergey M․ 448ef1f31c [extractor/common] Allow angle brackets in attributes in _og_regexes ()
Sergey M․ 7a6d76a64d [extractor/common] Require closing quote in _og_regexes (Closes )
E.g. do not match `property='og:video:type'` when `og:video` is requested.
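The effect of requiring the closing quote can be illustrated with a small sketch. The helper name `og_property_re` and its flag are hypothetical and are not the project's actual `_og_regexes`; only the `og:video` / `og:video:type` case comes from the commit message.

```python
import re


def og_property_re(prop, require_closing_quote=True):
    """Build a pattern for property='og:<prop>' (hypothetical helper)."""
    closing = r'["\']' if require_closing_quote else ''
    return r'property=["\']og:%s%s' % (re.escape(prop), closing)


tag = "<meta property='og:video:type' content='video/mp4'>"

# Without the closing quote, 'og:video' also matches as a prefix of 'og:video:type'.
loose_match = re.search(og_property_re('video', require_closing_quote=False), tag)
# With the closing quote, only the exact property name matches.
strict_match = re.search(og_property_re('video'), tag)

print(bool(loose_match), bool(strict_match))
```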
Sergey M․ 4180a3d8b7 [extractor/common] Allow quoteless content attribute in og regexes (Closes )
Yen Chi Hsuan 57935b2564 [extractor/common] Allow HTML5 unquoted attribute values
Fixes 

HTML5 allows unquoted attribute values. See the "Unquoted attribute value
syntax" section [1] for more information

[1] http://www.w3.org/TR/html5/syntax.html
Sergey M․ 4bba371644 [YoutubeDL] Autocalculate ext for subtitles when missing
Sergey M․ e5851b963a [extractor/common] Make f4m extraction for SMIL non fatal
Sergey M․ 4de6131090 [extractor/common] Add fatal to _extract_f4m_formats
Sergey M․ 3a1341a7bc [extractor/common] Make m3u8 extraction for SMIL non fatal
Sergey M․ c78e48177c [extractor/common] Check validity of direct URLs
Sergey M․ 647eab4541 [extractor/common] Extract upload date from SMIL
Sergey M․ 1e5bcdec02 [extractor/common] Extract images from SMIL
Sergey M․ e7d8e98a9f [extractor/common] Allow float bitrates
Sergey M․ 8aab976bbd [extractor/common] Document release_date field
Sergey M․ c430802e32 [extractor/common] Add raise_geo_restricted
Sergey M․ 586f1cc532 [extractor/common] Skip html comment tags (Closes )
Sergey M․ 73eb13dfc7 [extractor/common] Case insensitive inputs extraction
Sergey M․ be0e5dbd83 [extractor/common] Extract submit inputs
Sergey M․ 43e7d3c945 [extractor/common] Add raise_login_required
Jaime Marquínez Ferrándiz 8c97f81943 [common] Follow convention of using 'cls' in classmethods
Yen Chi Hsuan f738dd7b7c [common] Remove debugging codes
Yen Chi Hsuan 912e0b7e46 [common] Add _merge_subtitles()
Yen Chi Hsuan 03bc7237ad [common] _parse_smil_subtitles: accept `lang` as the subtitle language
Sergey M․ 5cdefc4625 [extractor/common] Add more subtitle mime types for guess when ext is missing
Sergey M․ ce00af8767 [extractor/common] Add default subtitles lang
Yen Chi Hsuan f877c6ae5a [theplatform] Use InfoExtractor._parse_smil_formats()
Sergey M․ e64b756943 [extractor/common] Interactive TFA code input
Sergey M․ 201ea3ee8e [extractor/common] Improve _hidden_inputs
Sergey M․ 8b9848ac56 [extractor/common] Expand meta regex
Sergey M․ 942acef594 [extractor/common] Extract _parse_xspf
Sergey M․ 98044462b1 [extractor/common] Use playlist id as default title
Sergey M․ e0b9d78fab [extractor/common] Clarify playlists can have description field
Sergey M․ 8d6765cf48 [extractor/generic] Add generic support for xspf playlist extraction
Sergey M. d5d7bdaeb5 Merge pull request from dstftw/improve-generic-smil-support
Improve generic SMIL support
Sergey M․ 5b0c40da24 [extractor/common] Expand meta regex
Sergey M․ 17712eeb19 [extractor/common] Extract namespace parse routine
Sergey M․ 41c3a5a7be [extractor/common] Fix python 3
Sergey M․ a107193e4b [extractor/common] Extract f4m and m3u8 formats, subtitles and info
remitamine 799207e838 [viewster] extract the api auth token
Closes .
Sergey M․ 864f24bd2c [extractor/common] Add _meta_regex and clarify tags field
Purdea Andrei 5316bf7487 Documented tags as a possible dict key
Sergey M․ 10952eb2cf [extractor/common] Consistent URL spelling
Jaime Marquínez Ferrándiz 297a564bee [youtube] Extract end_time
Jaime Marquínez Ferrándiz 7c80519cbf [youtube] Extract start_time
From the 't=*' in the url.
Currently youtube-dl doesn't use the value, but it was requested for the mpv plugin.
Sergey M․ 74fe23ec35 [extractor/common] Style
Yen Chi Hsuan a38436e889 [extractor/common] Add 'transform_source' parameter to _extract_f4m_formats()
Sergey M․ 31c746e5dc [extractor/common] Keep going if some media_url is missing
Sergey M․ 70f0f5a8ca [extractor/common] Recursively extract child f4m manifests
Sergey M․ cc357c4db8 [extractor/common] Properly handle full URLs
Sergey M․ 97f4aecfc1 [extractor/common] Handle malformed f4m manifests
Sergey M․ cf61d96df0 [extractor/common] Add _form_hidden_inputs
Sergey M․ f8da79f828 [extractor/common] Improve _form_hidden_inputs and rename to _hidden_inputs
Sergey M․ 27713812a0 [extractor/common] Add method for extracting form hidden input fields as dict
Yen Chi Hsuan 13af92fdc4 [common] Add 'fatal' to _extract_m3u8_formats
Sergey M․ 5414623791 [extractor/common] Remove superfluous line
Sergey M․ c342041fba [extractor/common] Use NO_DEFAULT from utils
Yen Chi Hsuan 621ed9f5f4 [common] Add note and errnote field for _extract_m3u8_formats
Sergey M․ baa43cbaf0 [extractor/common] Relax valid url check verbosity
Yen Chi Hsuan c1c924abfe [utils,common] Merge format_srt_time and _subtitles_timecode
format_srt_time uses a comma as the delimiter between seconds and
milliseconds while _subtitles_timecode uses a dot. All .srt examples I
found on the Internet use a comma, so I use a comma in the merged
version. See http://matroska.org/technical/specs/subtitles/srt.html and
http://devel.aegisub.org/wiki/SubtitleFormats/SRT
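A minimal sketch of an SRT-style timecode formatter that uses a comma between seconds and milliseconds, as the commit message describes. The function name `srt_timecode` is an assumption for illustration and is not necessarily the name used in the codebase.

```python
def srt_timecode(seconds):
    """Format a duration in seconds as HH:MM:SS,mmm (comma before milliseconds)."""
    # Work in integer milliseconds to avoid float rounding in the fields.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3600 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    secs, ms = divmod(rem, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, secs, ms)


print(srt_timecode(3661.5))
```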
Yen Chi Hsuan 05d5392cda [common] Ignore subtitles in m3u8
Sergey M․ 74f728249f [extractor/common] Fallback to empty string for (yet) missing `format_id` in `_sort_formats` (Closes )
Jaime Marquínez Ferrándiz 2ddcd88129 Remove code that was only used by the Grooveshark extractor
zouhair cf0649f8b7 Typo: twice "the the" to "the"
Sergey M․ 3ded7bac16 [extractor/common] Add ability to specify custom field preference for `_sort_formats`
Jaime Marquínez Ferrándiz 08f2a92c9c InfoExtractor._search_regex: Suggest updating when the regex is not found (suggested in )
Reuse the same message from ExtractorError
Yen Chi Hsuan c9a779695d [extractor/common] Add the encoding parameter
The QQMusic info extractor needs a forced encoding to work correctly.
Sergey M․ 830d53bfae [utils] Add `video_title` for `url_result`
Sergey M․ e21a55abcc [extractor/common] Remove f4m section
It's now provided by `f4m_id`
Sergey M․ 4a34f69ea6 [extractor/common] Add subtitles timecode formatter
Sergey M․ f207019ce5 [extractor/common] Remove 'm3u8' from quality selection URL
Sergey M․ 8dc9d361c2 [extractor/common] Fix format_id when `last_media` is None and always include `m3u8_id` if present
The rationale behind `m3u8_id` was to resolve duplicates when processing several m3u8 playlists within the same media that give equal resulting `format_id`'s,
e.g. `youtube-dl http://www.rts.ch/play/tv/passe-moi-les-jumelles/video/la-fee-des-bois-mustang-les-chemins-du-vent?id=3854925 -F`
Philipp Hagemeister a0bb7c5593 [extractor/common] Improve m3u format IDs ()
Sergey M․ 2f0f6578c3 [extractor/common] Assume non HTTP(S) URLs valid
Philipp Hagemeister 72a406e7aa [extractor/common] Pass in video_id ()
Antti Ajanki 6f4ba54079 [extractor/common] Extract HTTP (possibly f4m) URLs from a .smil file
Antti Ajanki 637570326b [extractor/common] Extract the first of a seq of videos in a .smil file
Jaime Marquínez Ferrándiz bfc993cc91 Merge branch 'subtitles-rework'
(Closes PR )
Sergey M․ 9fe6ef7ab2 [extractor/common] Fix preference for m3u8 quality selection URL
Philipp Hagemeister 8fb3ac3649 PEP8: W503
Philipp Hagemeister 77b2986b5b [extractor/common] Recognize Indian censorship ()
Jaime Marquínez Ferrándiz 9868ea4936 [extractor/common] Simplify subtitles handling methods
Initially I was going to use a single method for handling both subtitles and automatic captions, that's why I used the 'list_subtitles' and the 'subtitles' variables.
Philipp Hagemeister fa15607773 PEP8 fixes
Jaime Marquínez Ferrándiz 4cd95bcbc3 [twitch:stream] Prefer the 'source' format (fixes )
Sergey M․ 4069766c52 [extractor/common] Test URLs with GET
Jaime Marquínez Ferrándiz 360e1ca5cc [youtube] Convert to new subtitles system
The automatic captions are stored in the 'automatic_captions' field, which is used if no normal subtitles are found for a specific language.
Jaime Marquínez Ferrándiz c84dd8a90d [YoutubeDL] store the subtitles to download in the 'requested_subtitles' field
We need to keep the original subtitles information, so that the '--load-info' option can be used to list or select the subtitles again.
We'll also be able to have a separate field for storing the automatic captions info.
Jaime Marquínez Ferrándiz a504ced097 Improve subtitles support
For each language the extractor builds a list with the available formats sorted (like for video formats), then YoutubeDL selects one of them using the '--sub-format' option which now allows giving the format preferences (for example 'ass/srt/best').
For each format the 'url' field can be set so that we only download the contents if needed, or if the contents needs to be processed (like in crunchyroll) the 'data' field can be used.

The reasons for this change are:
* We weren't checking that the format given with '--sub-format' was available, checking it in each extractor would be repetitive.
* It makes it easy to support giving a format preference.
* The subtitles were automatically downloaded in the extractor, but I think that if you use for example the '--dump-json' option you want to finish as fast as possible.

Currently only the ted extractor has been updated, but the old system still works.
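The selection step described in this commit message could look roughly like the sketch below. It assumes each language's formats are dicts with an 'ext' key, sorted from worst to best, and that 'best' means the last entry; the function name `select_sub_format` is hypothetical.

```python
def select_sub_format(available, preference='best'):
    """Pick one subtitle format from `available` using a preference
    string such as 'ass/srt/best' (illustrative sketch only)."""
    if not available:
        return None
    for wanted in preference.split('/'):
        if wanted == 'best':
            # Assumption: the list is sorted worst-to-best.
            return available[-1]
        for fmt in available:
            if fmt['ext'] == wanted:
                return fmt
    # None of the requested formats is available.
    return None


formats = [
    {'ext': 'vtt', 'url': 'http://example.com/sub.vtt'},
    {'ext': 'srt', 'url': 'http://example.com/sub.srt'},
]
print(select_sub_format(formats, 'ass/srt/best')['ext'])
```

Centralizing the check this way means an unavailable format is detected in one place rather than in each extractor.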
Philipp Hagemeister 03cd72b007 [extractor/common] Move up filesize
filesize and tbr should correlate, so it doesn't make sense to treat them differently.
Jaime Marquínez Ferrándiz 6ca7732d5e [extractor/common] Fix link to external documentation
Jaime Marquínez Ferrándiz 2d30521ab9 [youtube] Extract average rating (closes )
Philipp Hagemeister 9650885be9 [escapist] Filter video differently (Fixes )
Philipp Hagemeister 7e5db8c930 [options] Add --no-color
Philipp Hagemeister 3a5bcd0326 [extractor/common] Wrap extractor errors (Fixes )
For now, we just wrap some common errors. More may follow. We do not want to catch actual programming errors in the extractors, such as 1 // 0.
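A sketch of the idea: wrap a narrow set of expected exception types and let genuine programming errors propagate. Which exceptions count as "common" is an assumption here (KeyError and StopIteration are used purely as examples), and the `ExtractorError` class and `extract` wrapper below are simplified stand-ins, not the project's code.

```python
class ExtractorError(Exception):
    """Simplified stand-in for the project's error type."""

    def __init__(self, msg, cause=None):
        super().__init__(msg)
        self.cause = cause


def extract(real_extract, url):
    """Call real_extract(url), wrapping only selected error types."""
    try:
        return real_extract(url)
    except ExtractorError:
        # Already wrapped: pass through unchanged.
        raise
    except (KeyError, StopIteration) as e:
        # Assumption: these are treated as "common" extraction failures.
        raise ExtractorError('An extractor error has occurred.', cause=e)
    # Anything else (e.g. ZeroDivisionError from 1 // 0) is not caught.


def missing_key(url):
    return {}['title']


try:
    extract(missing_key, 'http://example.com/')
except ExtractorError as err:
    print(type(err.cause).__name__)
```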
Naglis Jonaitis 69319969de [extractor/common] Add new helper method _family_friendly_search
Philipp Hagemeister 1e1896f2de [extractor/common] Correct sort order.
We should look at height and width before ext_preference.
Sergey M․ 3900eec27c [extractor/common] Fix 2.0 manifest extraction (Closes )
Sergey M․ 60ca389c64 [extractor/common] Prefix f4m/m3u8 entries with identifier
Philipp Hagemeister 9bb8e0a3f9 [wsj] Add new extractor (Fixes )
Philipp Hagemeister 1a6373ef39 [sort_formats] Prefer bitrate over video size
720p @ 1000KB/s looks way better than 1080p @ 500KB/s
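The preference can be pictured as a sort key in which bitrate comes before height. This is only a sketch: the real `_sort_formats` considers many more fields, and the worst-to-best ordering and the use of -1 for missing values are assumptions.

```python
def format_sort_key(f):
    """Sort key that ranks total bitrate ahead of video height."""
    tbr = f.get('tbr')
    height = f.get('height')
    return (
        tbr if tbr is not None else -1,
        height if height is not None else -1,
    )


formats = [
    {'format_id': '1080p-low', 'height': 1080, 'tbr': 500},
    {'format_id': '720p-high', 'height': 720, 'tbr': 1000},
]
formats.sort(key=format_sort_key)  # worst first, best last
best = formats[-1]
print(best['format_id'])
```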
Philipp Hagemeister 995029a142 [nerdist] Add new extractor (Fixes )
Philipp Hagemeister b04b885271 [extractor/common] Document all protocol values
Sergey M․ 96a53167fa [common] Generalize URLs' HTTP errors pre-testing
Philipp Hagemeister 3dee7826e7 [rtl2] PEP8, simplify, make rtmp tests run ()
Philipp Hagemeister cfb56d1af3 Add --list-thumbnails
Jaime Marquínez Ferrándiz e1554a407d [extractors] Use http_headers for setting the User-Agent and the Referer
Philipp Hagemeister 121c09c7be Merge remote-tracking branch 'Dineshs91/f4m-2.0'
Philipp Hagemeister 6271f1cad9 [youtube|ffmpeg] Automatically correct video with non-square pixels (Fixes )
Philipp Hagemeister ff21a8e0ee Merge remote-tracking branch 'Tithen-Firion/master'
Philipp Hagemeister dd622d7c4e [netzkino] Add new extractor (Fixes )
Philipp Hagemeister bec2248141 [InfoExtractor/common] Correct and test meta tag matching
Philipp Hagemeister 0590062925 Respect age_limit when listing extractors (Fixes )
Philipp Hagemeister e65566a9cc [youtube] Correct handling when DASH manifest is not necessary to find all formats
Sergey M․ 6c6f1408f2 [extractor/common] Allow multiline content tags
Jaime Marquínez Ferrándiz 5d3808524d [extractor/common] Update docstring: replace FileDownloader with YoutubeDL
Philipp Hagemeister bf94e38d3d Merge remote-tracking branch 'Tithen-Firion/hsw-update'
Philipp Hagemeister f5e43bc695 [vine] Provide alt_title (Fixes )
Sergey M․ e89a2aabed [extractor/common] Add generic SMIL formats extraction routine
Philipp Hagemeister f58766ce5c [extractor/common] Document ie_key in url results
Sergey M․ acf5cbfe93 [extractor/common] Add description to playlist_result
Philipp Hagemeister b82f815f37 Allow iterators for playlist result entries
Tithen-Firion ebb6419960 [common] Split _download_json
Add ability for extractor to use _parse_json
Tithen-Firion 995ad69c54 [common] Add new parameters for _download_webpage
Philipp Hagemeister 810fb84d5e pep8 and minor beautification all around
Jaime Marquínez Ferrándiz 42939b6129 [youtube] Use a cookie for setting the language
This way, we don't have to do an additional request
Philipp Hagemeister 4e262a8838 [generic] Detect direct video links (Fixes , )
Jouke Waleson 9e1a5b8455 PEP8: applied even more rules
Jouke Waleson 5f6a1245ff PEP8 applied
Philipp Hagemeister fed5d03260 [extractor/common] Document _type values (Motivated by )
Philipp Hagemeister aff2f4f4f5 [arte] Clean up format sorting mess
We now use our standard sorting facilities. As a side effect, it's finally possible to download German videos from French URLs and vice versa.
Philipp Hagemeister 711ede6e1b [heise] Fix description, thumbnail and format ID
Philipp Hagemeister 8c25f81bee [util] Move compatibility functions out of util
utils is large enough without these compatibility functions.

Everything that is present in newer versions of Python (i.e. with dev Python it's just an import) goes into compat.py .
Everything else (i.e. youtube-dl-specific helpers) goes into utils.py .
Philipp Hagemeister 2c8e03d937 Sort formats by fps as well
Philipp Hagemeister fbb21cf528 [youtube] Add formats 298, 299 (Fixes )
Philipp Hagemeister 81515ad9f6 [extractor/common] Improve m3u8 output
Philipp Hagemeister 23be51d8ce [generic] Handle audio streams that do not implement HEAD (Fixes )
Philipp Hagemeister c64ed2a310 [viddler] Use API
Philipp Hagemeister 1ede5b2481 [glide] Simplify
dinesh 7a47d07c6d [extractor/common] href attribute added
dinesh 34e48bed3b [extractor/common] Added support for f4m manifest Version 2.0
Sergey M․ 5f58165def [extractor/common] Fix dumping requests with long file abspath on Windows
Philipp Hagemeister d838b1bd4a [utils] Default age_limit to None
If we can't parse it, it means we don't have any information, not that the content is unrestricted.
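A sketch of the behaviour described: an unparseable value yields None ("no information") rather than 0 ("unrestricted"). The accepted input syntax (one or two digits with an optional '+') is an assumption made for this example.

```python
import re


def parse_age_limit(s):
    """Return the age limit as an int, or None when it cannot be parsed."""
    if s is None:
        return None
    m = re.match(r'^(\d{1,2})\+?$', str(s).strip())
    # None means "unknown", which is different from 0 ("suitable for all").
    return int(m.group(1)) if m else None


print(parse_age_limit('18+'), parse_age_limit('n/a'))
```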
Philipp Hagemeister e7b6d12254 [utils] Improve and test js_to_json
Philipp Hagemeister b14f3a4c1d [golem] Simplify ()
Philipp Hagemeister ed9266db90 [common] Add new helper function _match_id
Philipp Hagemeister f4b1c7adb8 [muenchentv] Move live title generation to common
Philipp Hagemeister f0b5d6af74 [vevo] Support 1080p videos (Fixes )
Philipp Hagemeister 7267bd536f [muenchentv] Add support (Fixes )
Sergey M․ 9ebf22b7d9 [common] Improve codecs extraction from m3u8
Philipp Hagemeister daebaab692 [extractor/common] Correct typo
Philipp Hagemeister 3524cc25ca [sportdeutschland] Add support for more plain videos
Philipp Hagemeister f1a9d64eea [extractor/common] Modernize
Philipp Hagemeister da9ec3b932 [musicvault] Add extractor (Fixes )
Philipp Hagemeister 704df56da7 [sportdeutschland] add new extractor
Philipp Hagemeister b252735910 [extractor/common] Generate better f4m format IDs
Philipp Hagemeister 9480d1a566 Merge remote-tracking branch 'riking/twofactor'
Philipp Hagemeister d769be6c96 [grooveshark,http] Make HTTP POST downloads work
Philipp Hagemeister a36819731b [escapist] Add support for og:video:url (Fixes )
riking 165250ff5e Remove debug prints
riking 83317f6938 [youtube] Add two-factor account signin (TOTP only)
Additional work is required to prompt the user for the SMS or phone call codes, as there is no framework currently to prompt the user during an extraction operation.

Fixes 
Jaime Marquínez Ferrándiz f036a6328e [extractor/common] _extract_f4m_formats: Use more specific messages when downloading the manifest
Jaime Marquínez Ferrándiz 31bb8d3f51 [bloomberg] Extract the available formats (closes )
It uses a helper method in the InfoExtractor class.
The downloader will pick the requested formats using the bitrate in the info dict.
Philipp Hagemeister c3415d1bac [extractor/common] PEP8
Philipp Hagemeister b090af5922 [vube] Fix comment count
Philipp Hagemeister 1a30deca50 [teachertube] Fix title and playlist recognition
Philipp Hagemeister 9732d77ed2 [snotr] PEP8 and minor fixes ()
Philipp Hagemeister 40c696e5c6 [screencast] Add support for more video types ()
Philipp Hagemeister 4094b6e36d [vodlocker] PEP8, generalization, and simplification ()
Jaime Marquínez Ferrándiz 78338f71ca [livestream:original] Add support for folder urls (closes )
The webpage only contains shortened links for the videos, since the server
doesn't support HEAD requests, we use a specific extractor for them.
Philipp Hagemeister d551980823 [spiegeltv] Simplify and PEP8
Philipp Hagemeister ad3bc6acd5 Document and test categories ()
Philipp Hagemeister 5afa7f8bee [extractor/common] --write-pages: Correct file name if video_id is None
Philipp Hagemeister 57c7411f46 [mixcloud] Shed API dependency ()
Philipp Hagemeister c1bce22f23 [extractor/common] Protect against long video IDs and URLs
Philipp Hagemeister 2099125333 [soundcloud/generic] Add support for playlists
Philipp Hagemeister 28746fbd59 [bilibili] Add preliminary support ()
The URL http://www.bilibili.tv/video/av636603/index_2.html does not work yet.
Anisse Astier ec0fafbb19 [extractor/common] fallback on utf-8 when charset is not found
fixes 
Philipp Hagemeister b6cfde99b7 Only mention websense URL once
Philipp Hagemeister 2410c43d83 Detect Websense censorship (Fixes )
Philipp Hagemeister 38d63d846e [extractor/common] Clarify preference key in formats
Philipp Hagemeister 955c451456 Rename upload_timestamp to timestamp
Philipp Hagemeister 9d2ecdbc71 [vevo] Centralize timestamp handling
Philipp Hagemeister 5a25f39653 Correct extractor documentation
Philipp Hagemeister 9f62eaf4ef [canal13cl] Add test and improve extraction ()
Philipp Hagemeister 0afef30b23 Add display_id field
Philipp Hagemeister 81c2f20b53 [youtube] Correct invalid JSON (Fixes )
dst c1206423c4 Fix extraction of og content in single quotes
Jaime Marquínez Ferrándiz 0c708f11cb [bloomberg] Fix ooyala url extraction
Added a helper method to InfoExtractor for searching the ‘twitter:player’ meta property.
Now the OoyalaIE also recognizes the ‘ec’ parameter in the url as the embed code.
Philipp Hagemeister 7e8caf30c0 Throw an error if no video formats are found
Philipp Hagemeister db1f388878 [huffpost] Add support
Jaime Marquínez Ferrándiz 944d65c762 [extractor/common] Encode the url when calculating the md5 with `--write-pages` option
This doesn’t cause any problem in python 2.*, but on python 3 the `md5` function only accepts bytes.
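The Python 3 behaviour mentioned here is easy to reproduce with the standard library. The helper name `url_digest` and the choice of UTF-8 are assumptions for illustration.

```python
import hashlib


def url_digest(url):
    """Return the hex MD5 of a URL; the text is encoded to bytes first."""
    return hashlib.md5(url.encode('utf-8')).hexdigest()


try:
    hashlib.md5('http://example.com/')  # str, not bytes
    passed_str = True
except TypeError:
    # On Python 3, hashing a str raises TypeError.
    passed_str = False

print(passed_str, len(url_digest('http://example.com/')))
```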
Philipp Hagemeister 1394ce65b4 [youtube] Add new formats (Fixes )
Philipp Hagemeister 50317b111d Merge branch 'youtube-dash-manifest'
Conflicts:
	youtube_dl/extractor/youtube.py
Philipp Hagemeister 9d4288b2d4 [extractor/common] Clarify when and when not we generate the filename
Philipp Hagemeister b60016e831 Deal with implicitly UTF-16 decoded webpages
These webpages don't specify an encoding and rely on the BOM
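One way to handle such pages is to check for a UTF-16 byte order mark before falling back to a default. The sketch below is an assumption about the general approach, not the project's code; the function name `decode_webpage` and the UTF-8 fallback are illustrative choices.

```python
import codecs


def decode_webpage(data, declared_encoding=None):
    """Decode page bytes, using a UTF-16 BOM when no encoding is declared."""
    if declared_encoding:
        encoding = declared_encoding
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to pick the byte order and strips it.
        encoding = 'utf-16'
    else:
        encoding = 'utf-8'
    return data.decode(encoding, 'replace')


page = codecs.BOM_UTF16_LE + '<html>ok</html>'.encode('utf-16-le')
print(decode_webpage(page))
```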
Philipp Hagemeister dd27fd1739 [youtube] Download DASH manifest
If given, download and parse the DASH manifest file, in order to get ultra-HQ formats.
Fixes 
Philipp Hagemeister 3ec05685f7 [extractor/common] Limit --write-pages filename to 200 chars
This avoids problems with very long URLs.
Philipp Hagemeister 9933b57430 [pornhub] Use centralized sorting
Philipp Hagemeister 3d3538e422 [khanacademy] Add support (Fixes )
Philipp Hagemeister 5d73273f6f [orf] Use new extraction method (Fixes )
Philipp Hagemeister 9887c9b2d6 [jpopsuki] Simplify
Philipp Hagemeister 08d13955dd [wistia] Prefer original video format above all others
We could also set up a formula which would weigh filesize/bitrate and vcodec/acodec (say, 1GB h264 < 3 GB MPEG2 < 2 GB h264), but that would get really messy real soon.
Philipp Hagemeister 5d4f3985be Document that format_id field should be present
Philipp Hagemeister 7217e148fb [yahoo] Use centralized sorting, and add tbr field
Philipp Hagemeister c7deaa4c74 [zdf] Use centralized sorting
Philipp Hagemeister e6812ac99d [spiegel] Use centralized sorting
Philipp Hagemeister 4bcc7bd1f2 Add temporary _sort_formats helper function
Philipp Hagemeister f49d89ee04 Add a resolution field and improve general --list-formats output
Philipp Hagemeister f45f96f8f8 [myvideo] Use RTMP instead of RTMPT (Fixes )
Philipp Hagemeister 1538eff6d8 [bliptv] Remove support for direct downloads
This is now handled by the generic IE
Philipp Hagemeister aa94a6d315 [aparat] Add support (Fixes )
Jaime Marquínez Ferrándiz c0d0b01f0e [generic] Detect ooyala videos (fixes )
Philipp Hagemeister 46374a56b2 [youtube] Do not warn for videos with allow_rating=0
This fixes 
Test video: http://www.youtube.com/watch?v=gi2uH3YxohU
Itay Brandes 87a28127d2 _search_regex's "isatty" call fails with Py2exe's
_search_regex calls the sys.stderr.isatty() function for unix systems.

Py2exe uses a custom Stderr() stream which doesn't have an `isatty()`
function, leading to its crash.

Fixed easily by checking that it's a unix system first.
Philipp Hagemeister d67b0b1596 Reorder info_dict documentation
Philipp Hagemeister c0ba0f4859 Document duration field
Philipp Hagemeister e2b38da931 [mtv] Fixup incorrectly encoded XML documents
Philipp Hagemeister 7cc3570e53 Add fatal=False parameter to _download_* functions.
This allows us to simplify the calls in the youtube extractor even further.
Philipp Hagemeister 19e3dfc9f8 [9gag] Like/dislike count ()
Philipp Hagemeister aaebed13a8 [smotri] Simplify
Philipp Hagemeister 2a275ab007 [zdf] Use _download_xml
Philipp Hagemeister 79d09f47c2 Merge branch 'opener-to-ydl'
Philipp Hagemeister c059bdd432 Remove quality_name field and improve zdf extractor
Philipp Hagemeister 02dbf93f0e [zdf/common] Use API in ZDF extractor.
This also comes with a lot of extra format fields
Fixes 
Philipp Hagemeister e03db0a077 Merge branch 'master' into opener-to-ydl
Jaime Marquínez Ferrándiz 267ed0c5d3 [collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring (fixes )
Uses a new helper method in InfoExtractor: _download_xml
Philipp Hagemeister 7012b23c94 Match --download-archive during playlist processing (Fixes )
Philipp Hagemeister dca0872056 Move the opener to the YoutubeDL object.
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like .
Philipp Hagemeister 5904088811 Add support for tou.tv (Fixes )
Philipp Hagemeister 91c7271aab Add automatic generation of format note based on bitrate and codecs
Jaime Marquínez Ferrándiz 78fb87b283 Don't accept '>' inside the content attribute in OpenGraph regexes
Jaime Marquínez Ferrándiz ab2d524780 Improve the OpenGraph regex
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
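The two points above can be sketched as a pair of patterns, one per attribute order, with `[^>]` keeping each match inside a single tag. The helper names `og_regexes` and `og_search` are hypothetical and the patterns are simplified relative to whatever the project actually uses.

```python
import re


def og_regexes(prop):
    """Return two patterns: property-before-content and content-before-property."""
    property_re = r'property=["\']og:%s["\']' % re.escape(prop)
    content_re = r'content=(?:"([^">]*)"|\'([^\'>]*)\')'
    # [^>]+? never crosses a '>' so both attributes must be in the same tag.
    template = r'<meta[^>]+?%s[^>]+?%s'
    return [
        template % (property_re, content_re),
        template % (content_re, property_re),
    ]


def og_search(prop, html):
    """Return the content of the first matching og:<prop> tag, or None."""
    for regex in og_regexes(prop):
        m = re.search(regex, html)
        if m:
            return next(g for g in m.groups() if g is not None)
    return None


print(og_search('description', '<meta content="A talk" property="og:description">'))
```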
Philipp Hagemeister eb0a839866 [common] Simplify og_search_property
Marcin Cieślak a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
Jaime Marquínez Ferrándiz 9103bbc5cd Add the 'webpage_url' field to info_dict
The URL of the video page; it must allow the result to be reproduced.
It's automatically set by YoutubeDL if it's missing.
Philipp Hagemeister b5d0d817bc Remove superfluous space
Philipp Hagemeister ebc14f251c Merge remote-tracking branch 'origin/master'
Philipp Hagemeister d41e6efc85 New debug option --write-pages
Filippo Valsorda 8ffa13e03e [Instagram] get the non-https link, as they are serving Akamai cert from an instagram.com domain
Jaime Marquínez Ferrándiz 55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
Jaime Marquínez Ferrándiz 8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
Philipp Hagemeister 416a5efce7 fix typos
Philipp Hagemeister 8dbe9899a9 Allow users to specify an age limit (fixes )
With these changes, users can now restrict what videos are downloaded by the intended audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
Philipp Hagemeister 2f5865cc6d Clarify that url and ext are optional when formats is given ()
Philipp Hagemeister deefc05b88 Document formats (for )
Jaime Marquínez Ferrándiz 0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
Philipp Hagemeister f143d86ad2 [sohu] Handle encoding, and fix tests
Philipp Hagemeister 6d69d03bac Merge remote-tracking branch 'origin/reuse_ies'
Philipp Hagemeister 2eabb80254 [addanime] improve
Jaime Marquínez Ferrándiz 9e9c164052 Merge pull request from jaimeMF/subtitles_rework
Subtitles rework
Philipp Hagemeister 79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
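A minimal sketch of this kind of caching: compile each class's URL pattern once and store it on the class. The class names below are made up for the example, and storing the compiled pattern in a `_VALID_URL_RE` class attribute is an assumption about how such a cache could be kept.

```python
import re


class BaseExtractor(object):
    """Hypothetical base class; subclasses define _VALID_URL."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # Check cls.__dict__ (not hasattr) so each subclass caches its own
        # compiled pattern instead of inheriting a parent's.
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        return cls._VALID_URL_RE.match(url) is not None


class ExampleIE(BaseExtractor):
    _VALID_URL = r'https?://example\.com/video/(?P<id>\d+)'


print(ExampleIE.suitable('http://example.com/video/42'))
```

The memory cost is one compiled pattern object per extractor class, which matches the "minimal memory overhead" trade-off the message mentions.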
Jaime Marquínez Ferrándiz 5d51a883c2 Use a dictionary for storing the subtitles
The errors while getting the subtitles are reported as warnings; if no subtitles are found, return an empty dict.
Philipp Hagemeister f38de77f6e Use unescapeHTML for OpenGraph properties
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
Philipp Hagemeister b9d3e1635f Strip hash info from URL when making requests (Fixes )
Philipp Hagemeister 3c4e6d8337 Improve OpenGraph property matching
Jaime Marquínez Ferrándiz 44dbe89035 Use re.DOTALL by default when searching OpenGraph properties
Jaime Marquínez Ferrándiz 46720279c2 InfoExtractor: add some helper methods to extract OpenGraph info
Philipp Hagemeister 690e872c51 Remove video_result helper method
Calling it was more complex than actually including the type in the video info
Jaime Marquínez Ferrándiz 56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes )
When an IE is added to the list, it's also added to a dictionary. When an IE is requested it first looks in the dictionary and if there's no instance it will create a new one.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
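The list-plus-dictionary scheme described above might be sketched as follows. All names here (`ExtractorRegistry`, `DummyIE`, `initialize`) are illustrative stand-ins rather than the project's classes.

```python
class DummyIE(object):
    """Stand-in extractor that counts how often it is really initialized."""
    init_calls = 0

    def __init__(self):
        self._ready = False

    @classmethod
    def ie_key(cls):
        return cls.__name__[:-2]

    def initialize(self):
        # Expensive setup (e.g. logging in) happens once per instance.
        if not self._ready:
            type(self).init_calls += 1
            self._ready = True


class ExtractorRegistry(object):
    def __init__(self):
        self._ies = []            # ordered list of extractors
        self._ie_instances = {}   # ie_key -> shared instance

    def add_info_extractor(self, ie):
        self._ies.append(ie)
        self._ie_instances[ie.ie_key()] = ie

    def get_info_extractor(self, ie_class):
        # Reuse the existing instance; create and register one only if absent.
        ie = self._ie_instances.get(ie_class.ie_key())
        if ie is None:
            ie = ie_class()
            self.add_info_extractor(ie)
        return ie


registry = ExtractorRegistry()
a = registry.get_info_extractor(DummyIE)
b = registry.get_info_extractor(DummyIE)
a.initialize()
b.initialize()
print(a is b, DummyIE.init_calls)
```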
Philipp Hagemeister d93e4dcbb7 Merge branch 'master' of github.com:rg3/youtube-dl
Philipp Hagemeister 73e79f2a1b [3sat] Add support (Fixes )
Jaime Marquínez Ferrándiz fc79158de2 VimeoIE: authentication support (closes ) and add a method in the base InfoExtractor to get the login info
Philipp Hagemeister 0f81866329 Add --list-extractor-descriptions (human-readable list of IEs)
Philipp Hagemeister f3d294617f Document view_count (Closes )
Filippo Valsorda 98bcd2834a improve generic and encrypted signature error messages
Philipp Hagemeister 3c25b9abae Remove useless headers
Philipp Hagemeister d6983cb460 Fix generic class move (add all files)