Commit Graph

294 Commits (2b4f5e68d1517bcadac4b25ecbac3b143104b1c4)

Author SHA1 Message Date
Philipp Hagemeister 02dbf93f0e [zdf/common] Use API in ZDF extractor.
This also comes with a lot of extra format fields
Fixes 
Philipp Hagemeister e03db0a077 Merge branch 'master' into opener-to-ydl
Jaime Marquínez Ferrándiz 267ed0c5d3 [collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring (fixes )
Uses a new helper method in InfoExtractor: _download_xml
Philipp Hagemeister 7012b23c94 Match --download-archive during playlist processing (Fixes )
Philipp Hagemeister dca0872056 Move the opener to the YoutubeDL object.
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like .
Philipp Hagemeister 5904088811 Add support for tou.tv (Fixes )
Philipp Hagemeister 91c7271aab Add automatic generation of format note based on bitrate and codecs
Jaime Marquínez Ferrándiz 78fb87b283 Don't accept '>' inside the content attribute in OpenGraph regexes
Jaime Marquínez Ferrándiz ab2d524780 Improve the OpenGraph regex
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
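To illustrate the two-regex approach described in ab2d524780, here is a minimal sketch. It is not the code from that commit: `og_regexes` and `og_search` are hypothetical names, and the patterns are only one plausible way to reject '>' and to accept either attribute order.

```python
import re


def og_regexes(prop):
    # Hypothetical helper: one pattern per attribute order.
    # '>' is excluded everywhere so a match cannot run past the end of the tag.
    content_re = r'content=(?:"([^>"]+?)"|\'([^>\']+?)\')'
    property_re = r'property=[\'"]og:%s[\'"]' % re.escape(prop)
    return [
        r'<meta[^>]+?%s[^>]+?%s' % (property_re, content_re),
        r'<meta[^>]+?%s[^>]+?%s' % (content_re, property_re),
    ]


def og_search(prop, html):
    # Try "property before content" first, then the reverse order.
    for pattern in og_regexes(prop):
        m = re.search(pattern, html, flags=re.DOTALL)
        if m:
            # Exactly one of the two quote-style groups is set.
            return next(g for g in m.groups() if g is not None)
    return None
```

With this sketch, `og_search('description', html)` returns the content value for either attribute order, and returns `None` when the content attribute contains '>'.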
Philipp Hagemeister eb0a839866 [common] Simplify og_search_property
Marcin Cieślak a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
Jaime Marquínez Ferrándiz 9103bbc5cd Add the 'webpage_url' field to info_dict
The URL of the video page; it must allow the result to be reproduced.
It's automatically set by YoutubeDL if it's missing.
Philipp Hagemeister b5d0d817bc Remove superfluous space
Philipp Hagemeister ebc14f251c Merge remote-tracking branch 'origin/master'
Philipp Hagemeister d41e6efc85 New debug option --write-pages
Filippo Valsorda 8ffa13e03e [Instagram] get the non-https link, as they are serving an Akamai cert from an instagram.com domain
Jaime Marquínez Ferrándiz 55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
Jaime Marquínez Ferrándiz 8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
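The default described in 8c51aa6506 could be sketched as follows. This is an illustration only: `default_format` is a hypothetical helper, and both the parenthesised rendering of the note and the omission of a missing note are assumptions, since the commit message gives only the template.

```python
TEMPLATE = '{format_id} - {width}x{height}{format_note}'


def default_format(fmt):
    # An explicitly provided 'format' always wins; the template is only a default.
    if fmt.get('format'):
        return fmt['format']
    note = fmt.get('format_note')
    return TEMPLATE.format(
        format_id=fmt['format_id'],
        width=fmt['width'],
        height=fmt['height'],
        # Assumption: the note is shown in parentheses, or omitted if absent.
        format_note=' (%s)' % note if note else '',
    )
```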
Philipp Hagemeister 416a5efce7 fix typos
Philipp Hagemeister 8dbe9899a9 Allow users to specify an age limit (fixes )
With these changes, users can now restrict what videos are downloaded by the intended audience, by specifying their age with --age-limit YEARS.
Add rudimentary support in youtube, pornotube, and youporn.
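The check behind --age-limit in 8dbe9899a9 might look roughly like the sketch below. `is_allowed` is a hypothetical name, and treating a video without an 'age_limit' value as suitable for everyone is an assumption, not something the commit message states.

```python
def is_allowed(info, viewer_age=None):
    # No age given on the command line: nothing is filtered.
    if viewer_age is None:
        return True
    # Assumption: videos that declare no limit are treated as limit 0.
    return viewer_age >= info.get('age_limit', 0)
```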
Philipp Hagemeister 2f5865cc6d Clarify that url and ext are optional when formats is given ()
Philipp Hagemeister deefc05b88 Document formats (for )
Jaime Marquínez Ferrándiz 0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
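A minimal sketch of charset detection that accepts both quote styles, as in 0d75ae2ce3. The regex and the `guess_charset` name are illustrative assumptions, not the pattern used in that commit.

```python
import re

# The optional ["'] lets the value be double-quoted, single-quoted or unquoted.
CHARSET_RE = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def guess_charset(head_bytes, default='utf-8'):
    # Search the raw bytes, since the encoding is not known yet.
    m = CHARSET_RE.search(head_bytes)
    return m.group(1).decode('ascii') if m else default
```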
Philipp Hagemeister f143d86ad2 [sohu] Handle encoding, and fix tests
Philipp Hagemeister 6d69d03bac Merge remote-tracking branch 'origin/reuse_ies'
Philipp Hagemeister 2eabb80254 [addanime] improve
Jaime Marquínez Ferrándiz 9e9c164052 Merge pull request from jaimeMF/subtitles_rework
Subtitles rework
Philipp Hagemeister 79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
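The caching idea in 79cb25776f can be sketched like this: compile each extractor's URL pattern once and keep it on the class. The class, the example URL pattern and the `_VALID_URL_RE` attribute name are assumptions made for illustration.

```python
import re


class Extractor(object):
    # Example pattern only; real extractors define their own.
    _VALID_URL = r'https?://(?:www\.)?example\.com/watch/(?P<id>\d+)'

    @classmethod
    def suitable(cls, url):
        # Check cls.__dict__ rather than hasattr() so that a subclass with a
        # different _VALID_URL never reuses its parent's compiled pattern.
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        return cls._VALID_URL_RE.match(url) is not None
```

The memory cost is one compiled pattern per extractor class, which matches the trade-off the commit message describes.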
Jaime Marquínez Ferrándiz 5d51a883c2 Use a dictionary for storing the subtitles
Errors while getting the subtitles are reported as warnings; if no subtitles are found, return an empty dict.
Philipp Hagemeister f38de77f6e Use unescapeHTML for OpenGraph properties
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
Philipp Hagemeister b9d3e1635f Strip hash info from URL when making requests (Fixes )
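For illustration, the fragment stripping in b9d3e1635f can be expressed with the standard library in Python 3. `strip_fragment` is a hypothetical name, and the commit itself may implement this differently.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    # The fragment ("#...") is never sent to the server, so drop it
    # before building the request.
    return urldefrag(url).url
```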
Philipp Hagemeister 3c4e6d8337 Improve OpenGraph property matching
Jaime Marquínez Ferrándiz 44dbe89035 Use re.DOTALL by default when searching OpenGraph properties
Jaime Marquínez Ferrándiz 46720279c2 InfoExtractor: add some helper methods to extract OpenGraph info
Philipp Hagemeister 690e872c51 Remove video_result helper method
Calling it was more complex than actually including the type in the video info
Jaime Marquínez Ferrándiz 56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes )
When an IE is added to the list, it's also added to a dictionary. When an IE is requested it first looks in the dictionary and if there's no instance it will create a new one.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
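The instance reuse described in 56c7366547 could be sketched as below. `Registry` and `DemoIE` are hypothetical stand-ins, and keying the dictionary by class name is an assumption. The point is only that a second request returns the same object, so one-time setup such as a login runs once.

```python
class DemoIE(object):
    init_calls = 0  # counts how often the expensive setup actually ran

    def __init__(self):
        self._ready = False

    def initialize(self):
        # Stand-in for a one-time step such as logging in.
        if not self._ready:
            DemoIE.init_calls += 1
            self._ready = True


class Registry(object):
    def __init__(self):
        self._ies = []        # ordered list of extractor instances
        self._instances = {}  # class name -> instance

    def add(self, ie):
        # Adding to the list also records the instance in the dictionary.
        self._ies.append(ie)
        self._instances[type(ie).__name__] = ie

    def get(self, ie_class):
        # Reuse an existing instance; create and register one only if missing.
        ie = self._instances.get(ie_class.__name__)
        if ie is None:
            ie = ie_class()
            self.add(ie)
        return ie
```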
Philipp Hagemeister d93e4dcbb7 Merge branch 'master' of github.com:rg3/youtube-dl
Philipp Hagemeister 73e79f2a1b [3sat] Add support (Fixes )
Jaime Marquínez Ferrándiz fc79158de2 VimeoIE: authentication support (closes ) and add a method in the base InfoExtractor to get the login info
Philipp Hagemeister 0f81866329 Add --list-extractor-descriptions (human-readable list of IEs)
Philipp Hagemeister f3d294617f Document view_count (Closes )
Filippo Valsorda 98bcd2834a improve generic and encrypted signature error messages
Philipp Hagemeister 3c25b9abae Remove useless headers
Philipp Hagemeister d6983cb460 Fix generic class move (add all files)