Commit Graph

18863 Commits (eb55bad5a0c1af9388301ffbf17845ee53a41635)
 

Author SHA1 Message Date
Sergey M․ ce5b904050
[extractor/common] Relax interaction count extraction in _json_ld 4 years ago
Sergey M․ ad06b99dd4
[extractor/common] Extract author as uploader for VideoObject in _json_ld 4 years ago
JChris246 540b9f5164
[pornhub] Fix view count extraction (#26621) (refs #26614) 4 years ago
Jody Bruchon fd87f42378 Randomize the ArchiveTree the proper Python way
Signed-off-by: Jody Bruchon <jody@jodybruchon.com>
4 years ago
Tom-Oliver Heidel 53d50142e8 [skip travis] Update issue templates 4 years ago
Tom-Oliver Heidel c71700dbe4
Merge pull request #125 from jbruchon/master
Keep download archive in memory for better performance
4 years ago
Jody Bruchon 2459b6e1cf Style revisions 4 years ago
Jody Bruchon 4f0150dcec Merge remote-tracking branch 'upstream/master' 4 years ago
Unknown 35d3b674c7 [hotstar] regex the second. 4 years ago
Jody Bruchon a4d834fb3e Fix wrong variable in position swap corrupting archive list
It's always a simple error in the end, you know?

Signed-off-by: Jody Bruchon <jody@jodybruchon.com>
4 years ago
Jody Bruchon fda63a4e87 Randomize archive order before populating search tree
This doesn't result in an elegant, perfectly balanced search tree,
but it's absolutely good enough. This commit completely mitigates
the worst-case scenario where the archive file is sorted.

Signed-off-by: Jody Bruchon <jody@jodybruchon.com>
4 years ago
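
The effect described in this commit message can be illustrated with a small sketch. It is not the code from the commit; `bst_height` and the sample archive entries are hypothetical names chosen for the illustration. It builds an unbalanced binary search tree from already-sorted keys, then again from the same keys after shuffling, and compares the resulting heights.

```python
import random


def bst_height(keys):
    """Insert keys into an unbalanced BST (dicts as nodes) and return its height."""
    root = None
    height = 0
    for key in keys:
        if root is None:
            root = {'key': key, 'left': None, 'right': None}
            height = 1
            continue
        node, depth = root, 1
        while True:
            if key == node['key']:
                break  # duplicate entry, nothing to insert
            side = 'left' if key < node['key'] else 'right'
            if node[side] is None:
                node[side] = {'key': key, 'left': None, 'right': None}
                height = max(height, depth + 1)
                break
            node = node[side]
            depth += 1
    return height


# Hypothetical archive lines, already in sorted order (the worst case).
entries = ['youtube %05d' % i for i in range(500)]
sorted_height = bst_height(entries)

# Shuffle a copy before insertion; a fixed seed keeps the run repeatable.
shuffled = list(entries)
random.Random(0).shuffle(shuffled)
shuffled_height = bst_height(shuffled)

print(sorted_height, shuffled_height)
```

With sorted input every key goes to the right of the previous one, so the tree is as deep as the list is long. After shuffling, the tree is not perfectly balanced but is far shallower, which matches the "good enough" claim in the message.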
Stefan Pöschel 6e65a2a67e
[downloader/hls] Fix incorrect end byte in Range HTTP header for media segments with EXT-X-BYTERANGE (#24512) (closes #14748)
The end of the byte range is the first byte that is NOT part of the
range to be downloaded. So don't include it in the requested HTTP
download range, as this additional byte leads to a broken TS packet and
subsequently to, e.g., visible video corruption.

Fixes #14748.
4 years ago
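
The off-by-one described above can be sketched as follows. This is an illustration under assumptions, not the downloader's actual code: `byterange_to_interval` and `range_header` are hypothetical helpers. The sketch treats an `EXT-X-BYTERANGE` value of the form `length[@offset]` as a half-open interval and converts it to an HTTP `Range` header, whose last-byte position is inclusive.

```python
def byterange_to_interval(value, previous_end=0):
    """Turn 'length[@offset]' into a half-open (start, end) byte interval.

    When the offset is omitted, the sub-range starts where the previous
    one ended, which the caller passes in as previous_end.
    """
    length, _, offset = value.partition('@')
    start = int(offset) if offset else previous_end
    return start, start + int(length)


def range_header(start, end):
    """Build an HTTP Range value for [start, end); 'end' itself is excluded."""
    if end <= start:
        raise ValueError('empty or negative byte range')
    # HTTP byte ranges are inclusive on both sides, hence end - 1.
    return 'bytes=%d-%d' % (start, end - 1)


start, end = byterange_to_interval('1880@3760')
print(range_header(start, end))
```

Requesting `bytes=%d-%d % (start, end)` instead would fetch one byte too many, which is the extra byte the commit message blames for the broken TS packet.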
Jody Bruchon 1d74d8d9f6 Try to mitigate the problem of loading a fully sorted archive
Sorted archives turn the binary tree into a linked list and make
things horribly slow. This is an incomplete mitigation for this
issue.
4 years ago
Sergey M․ f8c7bed133
[extractor/common] Handle ssl.CertificateError in _request_webpage (closes #26601)
ssl.CertificateError is raised on some Python versions <= 3.7.x
4 years ago
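
A minimal sketch of the kind of handling this commit describes, assuming a request helper that wraps network failures in its own exception type. `fetch`, `FetchError`, and `failing_opener` are hypothetical names, not the extractor's real API. The point is that `ssl.CertificateError` is caught alongside `urllib.error.URLError`, so a certificate failure that arrives unwrapped is reported the same way as other download errors.

```python
import ssl
import urllib.error
import urllib.request


class FetchError(Exception):
    """Hypothetical wrapper for any failure while requesting a page."""


def fetch(url, opener=urllib.request.urlopen, timeout=10):
    """Open url and convert network and certificate errors to FetchError."""
    try:
        return opener(url, timeout=timeout)
    except (urllib.error.URLError, ssl.CertificateError) as err:
        raise FetchError('Unable to download %s: %s' % (url, err)) from err


def failing_opener(url, timeout=None):
    """Stand-in opener that simulates an unwrapped certificate failure."""
    raise ssl.CertificateError('hostname mismatch (simulated)')


try:
    fetch('https://example.invalid/', opener=failing_opener)
except FetchError as err:
    message = str(err)
    print(message)
```

The stub opener keeps the example offline; with the default opener the same `except` clause would apply to a real request.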
Sergey M․ cdc55e666f
[downloader/http] Improve timeout detection when reading block of data (refs #10935) 4 years ago
Ori Avtalion 86b7c00adc
[downloader/http] Retry download when urlopen times out (#26603) (refs #10935) 4 years ago
Jody Bruchon 1de7ea76f8 Remove recursion in at_insert() 4 years ago
Jody Bruchon a5029645ae Remove debugging print statements 4 years ago
Jody Bruchon ecdec1913f Keep download archive in memory for better performance
The old behavior was to open and scan the entire archive file for
every single video download. This resulted in horrible performance
for archives of any remotely large size, especially since all new
video IDs are appended to the end of the archive. For anyone who
uses the archive feature to maintain archives of entire video
playlists or channels, this meant that all such lists with newer
downloads would have to scan close to the end of the archive file
before the potential download was rejected. For archives with tens
of thousands of lines, this easily resulted in millions of line
reads and checks over the course of scanning a single channel or
playlist that had been seen previously.

The new behavior in this commit is to preload the archive file
into a binary search tree and scan the tree instead of constantly
scanning the file on disk for every file. When a new download is
appended to the archive file, it is also added to this tree. The
performance is massively better using this strategy over the more
"naive" line-by-line archive file parsing strategy.

The only negative consequence of this change is that the archive
in memory will not be synchronized with the archive file on disk.
Running multiple instances of the program at the same time that
all use the same archive file may result in duplicate archive
entries or duplicated downloads. This is unlikely to be a serious
issue for the vast majority of users. If the instances are not
likely to try to download identical video IDs then this should
not be a problem anyway; for example, having two instances pull
two completely different YouTube channels at once should be fine.

Signed-off-by: Jody Bruchon <jody@jodybruchon.com>
4 years ago
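
The strategy described in this commit message can be sketched roughly as below. This is an illustration under assumptions rather than the committed code: `ArchiveNode`, `tree_insert`, `tree_contains`, and `InMemoryArchive` are hypothetical names, and the archive lines are made-up examples. The file is read once into an unbalanced binary search tree, lookups walk the tree, and each new entry is appended to the file and inserted into the tree. Insertion is iterative, so a deep tree cannot hit the recursion limit.

```python
import os
import tempfile


class ArchiveNode:
    """One node of an unbalanced binary search tree keyed by archive line."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def tree_insert(root, key):
    """Insert key without recursion and return the (possibly new) root."""
    if root is None:
        return ArchiveNode(key)
    node = root
    while True:
        if key == node.key:
            return root  # already present
        branch = 'left' if key < node.key else 'right'
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, ArchiveNode(key))
            return root
        node = child


def tree_contains(root, key):
    """Return True if key is stored somewhere in the tree."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


class InMemoryArchive:
    """Load the archive file once, then answer lookups from memory."""

    def __init__(self, path):
        self.path = path
        self.root = None
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                for line in f:
                    entry = line.strip()
                    if entry:
                        self.root = tree_insert(self.root, entry)

    def __contains__(self, entry):
        return tree_contains(self.root, entry)

    def record(self, entry):
        """Append entry to the file on disk and mirror it in the tree."""
        with open(self.path, 'a', encoding='utf-8') as f:
            f.write(entry + '\n')
        self.root = tree_insert(self.root, entry)


# Usage with a throwaway file and made-up entries.
archive_path = os.path.join(tempfile.mkdtemp(), 'archive.txt')
with open(archive_path, 'w', encoding='utf-8') as f:
    f.write('youtube aaa111\n')

archive = InMemoryArchive(archive_path)
seen_before = 'youtube aaa111' in archive
seen_new = 'youtube bbb222' in archive
archive.record('youtube bbb222')
seen_after_record = 'youtube bbb222' in archive
```

As the message notes, a second process using the same file would not see entries recorded by the first one after startup, because each instance only consults its own in-memory copy.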
SeonjaeHyeon 217e517384
[naver] Add support for live videos 4 years ago
Unknown 7ac0ba50ce [hotstar] regex fix 4 years ago
Unknown fe84e2a391 [skip travis] winver 4 years ago
Unknown 17cb02d0c6 bump version 2020.09.16 4 years ago
Unknown 78895bd3a1 [Core] hls manifests, dynamic mpd 4 years ago
Tom-Oliver Heidel 08676fb591 Merge branch 'Zocker1999NET-ext/remuxe-video' 4 years ago
Tom-Oliver Heidel cd93279de8 Merge branch 'ext/remuxe-video' of https://github.com/Zocker1999NET/youtube-dl into Zocker1999NET-ext/remuxe-video 4 years ago
Tom-Oliver Heidel 89233ccbfb
Merge pull request #110 from JensTimmerman/patch-5
Update README.md
4 years ago
Jens Timmerman 8a92dee72c
Update README.md
cleanup + typo fix
4 years ago
Tom-Oliver Heidel 04e2a14b65 Merge branch 'tpikonen-elonet' 4 years ago
Tom-Oliver Heidel c11c64f318 Merge branch 'elonet' of https://github.com/tpikonen/youtube-dl into tpikonen-elonet 4 years ago
Tom-Oliver Heidel 4c7d0c13e1 Merge branch 'fix-mitele' of https://github.com/DjMoren/youtube-dl 4 years ago
Tom-Oliver Heidel b4b3a22dae Merge branch 'DjMoren-fix-mitele' 4 years ago
Tom-Oliver Heidel acdb1a4ec6 Merge branch 'arbitrary-merges' of https://github.com/fstirlitz/youtube-dlc 4 years ago
Unknown 1985f657e5 Merge branch 'ytdl-org-master' 4 years ago
felix d03cfdce1b Support arbitrary stream merges
With this change, the merge operator may join any number of media streams,
video or audio. The streams are downloaded in the order specified.

Also, fix the metadata post-processor so that it doesn't leave out
any streams.
4 years ago
Unknown e69dd78090 merge ytdl-master 4 years ago
Tom-Oliver Heidel 0e0b56a290
Merge pull request #105 from JensTimmerman/patch-3
Update README.md
4 years ago
Sergey M․ e8c5d40bc8
release 2020.09.14 4 years ago
Sergey M․ ca7ebc4e5e
[ChangeLog] Actualize
[ci skip]
4 years ago
Sergey M․ bff857a8af
[postprocessor/embedthumbnail] Fix issues (closes #25717)
* Fix WebP with wrong extension processing
* Fix embedding of thumbnails with % character in path
4 years ago
Alex Merkel a31a022efd
[postprocessor/embedthumbnail] Add support for non jpeg/png thumbnails (closes #25687) 4 years ago
Jens Timmerman 893afc2ca8
Update README.md 4 years ago
Sergey M․ 45f6362464
[rtlnl] Extend _VALID_URL for new embed URL schema 4 years ago
Derek Land 97f34a48d7
[rtlnl] Extend _VALID_URL (#26549) (closes #25821) 4 years ago
Daniel Peukert ea74e00b3a
[youtube] Fix empty description extraction (#26575) (closes #26006) 4 years ago
Sergey M․ 06cd4cdb25
[srgssr] Extend _VALID_URL (closes #26555, closes #26556, closes #26578) 4 years ago
Sergey M․ da2069fb22
[googledrive] Use redirect URLs for source format (closes #18877, closes #23919, closes #24689, closes #26565) 4 years ago
Tom-Oliver Heidel 3796554609
Merge pull request #102 from blackjack4494/gdcvault-fix
[gdcvault] fix extractor
4 years ago
Unknown 4b819d1454 flake8 4 years ago
Unknown 10bbf2c48d [skip travis] bump version 4 years ago