499 Commits (6bb0fbf9fb60047087b44c6e147ed071fe0dc41f)

Author SHA1 Message Date
Yen Chi Hsuan 622d19160b [utils] Clarify Python versions affected by buggy struct module 9 years ago
Yen Chi Hsuan efbed08dc2 [utils] Encode hostnames before passing to urllib
With IDN (Internationalized Domain Name) and a proxy, non-ascii URLs
are passed down to urllib/urllib2, causing UnicodeEncodeError

Fixes #8890
9 years ago
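A minimal Python 3 sketch of the idea described above: convert a non-ASCII hostname to its ASCII (IDNA) form before the URL reaches urllib. The helper name `encode_hostname` is hypothetical and this is not the project's actual code; it ignores IPv6 literals for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_hostname(url):
    """Return url with its hostname IDNA-encoded (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return url
    # The stdlib 'idna' codec turns each non-ASCII label into its xn-- form.
    netloc = host.encode('idna').decode('ascii')
    # Rebuild the netloc, keeping the port and any userinfo.
    if parts.port is not None:
        netloc += ':%d' % parts.port
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += ':' + parts.password
        netloc = userinfo + '@' + netloc
    return urlunsplit(parts._replace(netloc=netloc))
```

Plain ASCII URLs pass through unchanged, so a helper like this could be applied unconditionally.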
Jaime Marquínez Ferrándiz 782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 9 years ago
Jaime Marquínez Ferrándiz 09fc33198a utils: lookup_unit_table: Use a stricter regex
In parse_count multiple units start with the same letter, so it would match different units depending on the order they were sorted when iterating over them.
9 years ago
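The ordering problem described above can be illustrated with a sketch, assuming a table that maps unit strings to multipliers. All units go into one alternation and a word boundary is required after the unit, so 'k' cannot match the first letter of 'kk' whatever the iteration order. The table contents are illustrative only, and this is not necessarily the code in the commit.

```python
import re


def lookup_unit_table(unit_table, s):
    """Parse strings like '1.5k' using a {unit: multiplier} table (sketch)."""
    units_re = '|'.join(re.escape(u) for u in unit_table)
    m = re.match(
        r'(?P<num>[0-9]+(?:[.,][0-9]+)?)\s*(?P<unit>%s)\b' % units_re, s)
    if not m:
        return None
    num = float(m.group('num').replace(',', '.'))
    return int(num * unit_table[m.group('unit')])


# Illustrative table: 'k' and 'kk' share a first letter.
table = {'k': 1000, 'kk': 10 ** 6, 'm': 10 ** 6}
```

With the trailing `\b`, an input such as '1.5kk' backtracks from 'k' to 'kk' instead of stopping at the first alternative that happens to match.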
Sergey M․ 810c10baa1 [utils] Use compat_xpath 9 years ago
Sergey M․ c5229f3926 [utils] PEP 8 9 years ago
remitamine 83548824c2 Merge pull request #8092 from bpfoley/twitter-thumbnail
[utils] Add extract_attributes for extracting html tag attributes
9 years ago
Sergey M․ 2f7ae819ac [utils] PEP 8 9 years ago
Sergey M․ fb47597b09 [bbc] Generalize unit table lookup and add parse_count 9 years ago
Yen Chi Hsuan 25cb05bda9 [utils] Remove codec2ext
This function was originally used for determining file extensions of DASH
formats. Now in DASH, ext is determined by mime_type. See #8766 for more
information.
9 years ago
Yen Chi Hsuan 6d210f2090 [utils] Add more codecs to codec2ext
BBC uses avc3. Here's an example (thanks to @remitamine for this example)

http://rdmedia.bbc.co.uk/dash/ondemand/bbb/2/client_manifest-common_init.mpd

See also https://trac.ffmpeg.org/ticket/5217
9 years ago
Yen Chi Hsuan 19a17d4623 [utils] Add codec2ext 9 years ago
Jaime Marquínez Ferrándiz 3233a68fbb [utils] update_url_query: Encode the strings in the query dict
The test case with {'test': '第二行тест'} was failing on python 2 (the non-ascii characters were replaced with '?').
9 years ago
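A rough Python 3 sketch of a query-updating helper, for illustration only. It assumes the behaviour implied by the commit titles (merge a dict into the existing query string) and relies on `urlencode`, which on Python 3 percent-encodes non-ASCII text as UTF-8; the Python 2 workaround from the commit is not reproduced here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def update_url_query(url, query):
    """Return url with the items of `query` added to or replacing its params."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # existing params, values as lists
    params.update(query)            # new values override old ones
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))
```

Non-ASCII values survive a round trip through `parse_qs`, which is the property the failing test case was checking.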
remitamine 1255733945 Merge pull request #8739 from remitamine/update_url_params
[utils] add update_url_query function to create or update query string params
9 years ago
remitamine 38f9ef31dc [utils] add update_url_query function 9 years ago
Yen Chi Hsuan 0cae023b24 Merge branch 'jython-support'
Closes #8302
9 years ago
Yen Chi Hsuan 8ee239e921 [utils] Jython support - handle filenames correctly
Now test:youtube downloads
9 years ago
Brian Foley 8bb56eeeea [utils] Add extract_attributes for extracting html tag attributes
This is much more robust than just using regexps, and handles all
the common scenarios, such as empty/no values, repeated attributes,
entity decoding, mixed case names, and the different possible value
quoting schemes.
9 years ago
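One way to get the robustness described above is to let the standard library's HTML parser handle the tag instead of a regexp. The following Python 3 sketch assumes that approach; the class name is hypothetical and the code is not claimed to match the commit.

```python
from html.parser import HTMLParser


class _AttributeParser(HTMLParser):
    """Records the attributes of the most recent start tag (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases names and decodes entities in values;
        # attributes without a value arrive as None.
        self.attrs = dict(attrs)


def extract_attributes(html_element):
    """Return a dict of the attributes of a single HTML start tag."""
    parser = _AttributeParser()
    parser.feed(html_element)
    parser.close()
    return parser.attrs
```

Because the result is built with `dict(attrs)`, a repeated attribute keeps its last value.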
remitamine e07237f640 [utils] remove check for val from find_xpath_attr 9 years ago
Yen Chi Hsuan 5eb6bdced4 [utils] Multiple changes to base_n()
1. Renamed to encode_base_n()
2. Allow tables longer than 62 characters
3. Raise ValueError instead of AssertionError for invalid input data
4. Return the first character in the table instead of '0' for number 0
5. Add tests
9 years ago
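The listed behaviours can be sketched as follows. This is an illustrative Python version written from the five points above, not a copy of the project's function; the rejection of negative numbers is an extra assumption added to keep the loop finite.

```python
def encode_base_n(num, n, table=None):
    """Encode a non-negative integer in base n using the given digit table."""
    full_table = ('0123456789abcdefghijklmnopqrstuvwxyz'
                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    if not table:
        table = full_table[:n]
    # Custom tables may be longer than 62 characters; only a base that
    # the table cannot represent is rejected.
    if n > len(table):
        raise ValueError('base %d exceeds table length %d' % (n, len(table)))
    if num < 0:  # assumption: negative input is treated as invalid
        raise ValueError('num must be non-negative')
    if num == 0:
        return table[0]  # first table character, not a hard-coded '0'
    ret = ''
    while num:
        ret = table[num % n] + ret
        num //= n
    return ret
```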
Yen Chi Hsuan 680079be39 [utils] Relaxing regex in decode_packed_codes for vidzi 9 years ago
Yen Chi Hsuan f52354a889 [utils] Move codes for handling eval() from iqiyi.py 9 years ago
Yen Chi Hsuan 59f898b7a7 [utils] Merge base_n functions 9 years ago
Yen Chi Hsuan 481888294d [utils] Add base36 for use in Vidzi 9 years ago
Yen Chi Hsuan 81bdc8fdf6 [utils] Move base62 to utils 9 years ago
Sergey M․ f160785c5c [utils] Remove AM/PM from unified_strdate patterns 9 years ago
Yen Chi Hsuan b95dc034ca [utils] Implement cache for OnDemandPagedList 9 years ago
remitamine cafcf657a4 add more subtitles mime types to mimetype2ext and fix the platform subtitle extraction 9 years ago
Yen Chi Hsuan c1c05c67ea [utils] Jython support - disable setproctitle() until ctypes is complete 9 years ago
Yen Chi Hsuan 399a76e67b [utils] Jython support: tolerate missing fcntl module 9 years ago
Jaime Marquínez Ferrándiz 765ac263db [utils] mimetype2ext: return 'm4a' for 'audio/mp4' (fixes #8620)
The youtube extractor was using 'mp4' for them, therefore filters like 'bestaudio[ext=m4a]' stopped working (94278f7202 broke it).
9 years ago
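A small sketch of the kind of lookup involved, assuming a full-type override is checked before falling back to the subtype. Apart from 'audio/mp4' mapping to 'm4a', which the commit title states, the table entries are illustrative.

```python
def mimetype2ext(mt):
    """Map a MIME type to a file extension (illustrative sketch)."""
    if mt is None:
        return None
    # Full-type overrides come first so that audio-only MP4 gets 'm4a'.
    full_map = {'audio/mp4': 'm4a'}
    if mt in full_map:
        return full_map[mt]
    subtype = mt.rpartition('/')[2]
    # Illustrative subtype aliases; anything else falls through unchanged.
    subtype_map = {'3gpp': '3gp', 'x-flv': 'flv'}
    return subtype_map.get(subtype, subtype)
```

With such a mapping, a format filter like 'bestaudio[ext=m4a]' can match audio-only MP4 streams again.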
Yen Chi Hsuan 5bc880b988 [utils] Add OHDave's RSA encryption function 9 years ago
Sergey M․ 611c1dd96e [refactor] Single quotes consistency 9 years ago
Sergey M․ d800609c62 [refactor] Do not specify redundant None as second argument in dict.get() 9 years ago
Sergey M․ 9c7b38981c [utils] Bump Firefox version in User-Agent
Old version number causes Youtube not to serve some formats in ytplayer.config
9 years ago
Sergey M․ 8411229bd5 [utils] Allow dot in strip_jsonp 9 years ago
Sergey M․ 86296ad2cd [utils] Add ability to control skipping false values in dict_get 9 years ago
Sergey M․ cbecc9b903 [utils] Add dict_get convenience method 9 years ago
Jaime Marquínez Ferrándiz 87de7069b9 [utils] dfxp2srt: make TTMLPElementParser inherit from object
For consistency between python 2 and 3.
9 years ago
remitamine 2b14cb566f [utils] fix dfxp2srt text extraction (fixes #8055) 9 years ago
Yen Chi Hsuan a0d8d704df [utils] Reorder items in mimetype2ext alphabetically 9 years ago
Yen Chi Hsuan f6861ec96f [utils] Add more items to mimetype2ext (#8293)
These are used in Youtube formats
9 years ago
remitamine 6ec6cb4e95 Revert "fix typos"
This reverts commit 36a0e46c39.
9 years ago
remitamine 36a0e46c39 fix typos 9 years ago
Jakub Wilk dfb1b1468c Fix typos
Closes #8200.
9 years ago
Sergey M․ a7aaa39863 [utils] Extract known extensions for reuse 9 years ago
Yen Chi Hsuan c047270c02 [utils] Remove Content-encoding from headers after decompression
With cn_verification_proxy, our http_response() is called twice, once from
PerRequestProxyHandler.proxy_open() and once from normal
YoutubeDL.urlopen(). As a result, for proxies honoring Accept-Encoding, the
following bug occurs:

$ youtube-dl -vs --cn-verification-proxy https://secure.uku.im:993 "test:letv"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-vs', '--cn-verification-proxy', 'https://secure.uku.im:993', 'test:letv']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.12.23
[debug] Git HEAD: 97f18fa
[debug] Python version 3.5.1 - Linux-4.3.3-1-ARCH-x86_64-with-arch-Arch-Linux
[debug] exe versions: ffmpeg 2.8.4, ffprobe 2.8.4, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.letv.com/ptv/vplay/22005890.html
[Letv] 22005890: Downloading webpage
[Letv] 22005890: Downloading playJson data
ERROR: Unable to download JSON metadata: Not a gzipped file (b'{"') (caused by OSError('Not a gzipped file (b\'{"\')',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/common.py", line 330, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/YoutubeDL.py", line 1886, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 471, in open
    response = meth(req, response)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 773, in http_response
    raise original_ioerror
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 761, in http_response
    uncompressed = io.BytesIO(gz.read())
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
9 years ago
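The failure mode above can be reproduced in isolation. In this sketch, a hypothetical `decode_body` step (not the project's handler) decompresses a gzip body and drops the Content-Encoding header, so that a second pass over the same response does nothing.

```python
import gzip
import io


def decode_body(headers, body):
    """Gunzip body if the headers say so, then drop the header (sketch)."""
    if headers.get('Content-Encoding', '') == 'gzip':
        body = gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb').read()
        # Without this step, a second handler would try to gunzip
        # already-decompressed data and fail with "Not a gzipped file".
        headers = {k: v for k, v in headers.items()
                   if k != 'Content-Encoding'}
    return headers, body


raw = b'{"ok": true}'
headers, body = decode_body({'Content-Encoding': 'gzip'}, gzip.compress(raw))
# The second pass is a no-op because the header is gone.
headers, body = decode_body(headers, body)
```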
Sergey M․ 9b9c5355e4 Rename error_to_str to error_to_compat_str 9 years ago
Sergey M․ 8e60dc7526 [utils] Add encode_compat_str 9 years ago
Sergey M․ fdae235858 [utils] Add error_to_str 9 years ago
Yen Chi Hsuan db2fe38b55 [utils] Support alternative timestamp format in TTML
Fixes #7608
9 years ago
Yen Chi Hsuan d631d5f9f2 [utils] Fix TTML conversion
Tolerate invalid timestamps (closes #7909)
9 years ago
Sergey M․ 31b2051e21 [utils] Add remove_quotes 9 years ago
Yen Chi Hsuan 992fc9d6e1 [utils] Refactor handle_youtubedl_headers for future extension 9 years ago
Yen Chi Hsuan 0424ec307b [utils] Correct docstring of YoutubeDLHandler 9 years ago
Yen Chi Hsuan 87f0e62d94 [utils] Separate codes for handling Youtubedl-* headers 9 years ago
Sergey M․ 67dda51722 Rename compat_urllib_request_Request to sanitized_Request and move to utils 9 years ago
Sergey M․ 9cb9a5df77 [utils] Check ext with trailing slash against the list of known extensions 9 years ago
Sergey M․ 3e12bc583a [utils] Improve determine_ext (Closes #7593) 9 years ago
Sergey M․ 7e1f5447e7 [utils] Improve encode_dict 9 years ago
Sergey M․ 7a3f0c00ad [utils] Style 9 years ago
Sergey M․ 7aefc49c40 [utils] Skip invalid/non HTML entities (Closes #7518) 9 years ago
Jaime Marquínez Ferrándiz 6a75040278 [utils] unified_strdate: Return None if the date format can't be recognized (fixes #7340)
This issue was introduced with ae12bc3ebb, it returned 'None'.
9 years ago
Sergey M․ c90d16cf36 [utils:sanitize_path] Disallow trailing whitespace in path segment (Closes #7332) 9 years ago
Sergey M 30eecc6a04 Merge pull request #7296 from jaimeMF/xml_attrib_unicode
Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (…
9 years ago
Sergey M․ ae12bc3ebb [utils] Make unified_strdate always return unicode string 9 years ago
Sergey M․ 578c074575 [utils] Support list of xpath in xpath_element 9 years ago
Sergey M․ 52c3a6e49d [utils] Improve parse_iso8601 9 years ago
Jaime Marquínez Ferrándiz f78546272c [compat] compat_etree_fromstring: also decode the text attribute
Deletes parse_xml from utils, because it also does it.
9 years ago
Jaime Marquínez Ferrándiz 36e6f62cd0 Use a wrapper around xml.etree.ElementTree.fromstring in python 2.x (#7178)
Attributes aren't unicode objects, so they couldn't be directly used in info_dict fields (for example '--write-description' doesn't work with bytes).
9 years ago
Sergey M․ d01949dc89 [utils:js_to_json] Fix bad escape in double quoted strings 9 years ago
Yen Chi Hsuan 1e399778ee [letv] Fix extraction
Using data URIs for passing the decrypted M3U8 manifest, which is
supported by ffmpeg only.
9 years ago
Sergey M․ af98f8ff37 [utils] Return default on fail in int_or_none 9 years ago
Sergey M․ caf80631f0 [utils] Do not fail in float_or_none on non-numeric data 9 years ago
Sergey M․ 1812afb7b3 [utils] Do not fail in int_or_none on non-numeric data (Closes #7175) 9 years ago
Sergey M․ 5a1a2e9454 [utils] Fix kwargs on old python 2 (Closes #6905) 9 years ago
Sergey M․ e28034c5ac [utils] Comment cookie processing until result from travis and some more testing 9 years ago
Sergey M․ 266e466ee4 [utils] Simplify cookie processor 9 years ago
Sergey M․ 1639282434 [utils] Add encode_dict 9 years ago
Sergey M․ ad72917274 [utils] Add issue URL in comment for #6457 9 years ago
Sergey M․ a6420bf50c [utils] Add cookie processor for cookie correction (Closes #6769) 9 years ago
Sergey M․ 66e289bab4 [utils] Generalize cli option converters 9 years ago
Sergey M․ 8e636da499 [utils] Improve xpath_text 9 years ago
Sergey M․ 5d2354f177 [utils] Relax attribute key assert 9 years ago
Sergey M․ a41fb80ce1 [utils] Add xpath_element and xpath_attr 9 years ago
Sergey M․ e5e78797e6 [utils] Strict HTTP responses (Closes #6727) 9 years ago
Sergey M․ 5a4d9ddb21 [utils] Percent-encode redirect URL of Location header (Closes #6457) 10 years ago
Sergey M․ 51f267d9d4 [YoutubeDL:utils] Move percent encode non-ASCII URLs workaround to http_request and simplify (Closes #6457) 10 years ago
Sergey M․ ee114368ad [utils] Make value optional for find_xpath_attr
This allows selecting particular attributes by name but without specifying the value and similar to xpath syntax `[@attrib]`
10 years ago
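For illustration, an optional value can be mapped onto the two predicate forms that `xml.etree.ElementTree` supports, `[@attrib]` and `[@attrib='value']`. This is a sketch of the idea rather than the project's code, and it does not escape quotes inside `val`.

```python
import xml.etree.ElementTree as ET


def find_xpath_attr(node, xpath, key, val=None):
    """Find the first element matching xpath that has attribute `key`.

    If val is given, the attribute must also equal it.
    """
    if val is None:
        predicate = '[@%s]' % key
    else:
        # Assumes val contains no single quote.
        predicate = "[@%s='%s']" % (key, val)
    return node.find(xpath + predicate)


doc = ET.fromstring('<root><n/><n x="a"/><n y="c"/><n y="d"/></root>')
```

Here `find_xpath_attr(doc, './/n', 'y')` selects the first element carrying a `y` attribute, whatever its value.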
Raphael Michel 2c7ed24796 Remove redundant (and wrong) class parameters 10 years ago
Yen Chi Hsuan 9c29bc69f7 [utils] Improve parse_duration
Now dots are parsed. For example '87 Min.'
10 years ago
Sergey M․ bf42a9906d [utils] Add default value for xpath_text 10 years ago
Yen Chi Hsuan 4eb10f6621 [utils] Add ISO3166Utils 10 years ago
Yen Chi Hsuan 4e33577173 [utils] Support ttaf1 namespace in TTML
It's found in bbc.co.uk. See #6038
10 years ago
Yen Chi Hsuan 396726244a [utils/ffmpeg] Move ISO 639 related codes to utils 10 years ago
Yen Chi Hsuan ecee572411 [yahoo] Add support for closed captions (closes #5714) 10 years ago
Yen Chi Hsuan 1b0427e6c4 [utils] Support TTML without default namespace
In a strict sense such TTML is invalid, but Yahoo uses it.
10 years ago
Yen Chi Hsuan c1c924abfe [utils,common] Merge format_srt_time and _subtitles_timecode
format_srt_time uses a comma as the delimiter between seconds and
milliseconds while _subtitles_timecode uses a dot. All .srt examples I
found on the Internet uses a comma, so I use a comma in the merged
version. See http://matroska.org/technical/specs/subtitles/srt.html and
http://devel.aegisub.org/wiki/SubtitleFormats/SRT
10 years ago
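A minimal sketch of an SRT timecode formatter using the comma delimiter discussed above. The function name is hypothetical and the merged function in the commit may differ in detail.

```python
def srt_timecode(seconds):
    """Format a time in seconds as HH:MM:SS,mmm for .srt files (sketch)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    secs, ms = divmod(ms, 1000)
    # SRT separates seconds and milliseconds with a comma, not a dot.
    return '%02d:%02d:%02d,%03d' % (hours, minutes, secs, ms)
```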
Yen Chi Hsuan 7dff03636a [utils] Support 'dur' field in TTML 10 years ago
Yen Chi Hsuan d39e0f05db [utils] Remove sanitize_url_path_consecutive_slashes()
This function is used only in SohuIE, which is updated to use a new
extraction logic.
10 years ago