mirror of https://github.com/yt-dlp/yt-dlp
fix youtube music metadata extraction
fixed the metadata extraction regex's catastrophic backtracking, made it faster on all inputs, and added proper support for artists using the middle dot character.

and now, a rant about properly checking your work and learning how to do shit before you publish changes:

simulated atomic groups did not make the regex faster - you added a newline. simulated atomic groups are always (guaranteed!) slower than normal groups, and removing them from the old regex makes that regex faster: https://regex101.com/r/8Ssf2h/3
this is fairly obvious to anyone who has actually learned how regexes are matched.

the fix is to add a delimiter to the start of the expression: https://regex101.com/r/XqqucW/1
without (?:\n|^), the regex attempts to find a match starting at every possible title character (which is virtually every location). it will then attempt to extend this until it can't do so. for the string "hello", it would have to check "hello", "ello", "llo", "lo", and "o". this is what backtracking is, and it causes quadratic performance in the number of input characters.
again, this is fairly obvious to anyone who has actually learned how regexes are matched.

i really hope the next person to "improve" this actually takes the time to review their changes before pushing them.

pull/13896/head
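The quadratic cost described in the message can be illustrated with a rough sketch. The function below is a simplified model written for this illustration. It is not Python's `re` engine and not the actual yt-dlp pattern. The name `scan_steps` and its `anchored` flag are hypothetical. It assumes a pattern that greedily consumes title characters (anything but a newline) and then fails because the rest of the expression never matches, and it counts only the characters consumed, not the constant-cost check of the leading delimiter.

```python
def scan_steps(text, anchored):
    """Count characters consumed by a simplified left-to-right matcher.

    Models a pattern that consumes every non-newline character from a
    start position and then fails, so the search moves on to the next
    start position. With anchored=True, a start position is only tried
    at the beginning of the text or right after a newline, which mimics
    a leading (?:\\n|^) delimiter.
    """
    steps = 0
    for start in range(len(text)):
        if anchored and start > 0 and text[start - 1] != "\n":
            continue  # the delimiter rejects this start position up front
        pos = start
        while pos < len(text) and text[pos] != "\n":
            steps += 1  # one title character consumed
            pos += 1
    return steps


# "hello" unanchored: "hello", "ello", "llo", "lo", "o" -> 5+4+3+2+1
print(scan_steps("hello", anchored=False))    # 15
print(scan_steps("hello", anchored=True))     # 5
print(scan_steps("a" * 100, anchored=False))  # 5050
print(scan_steps("a" * 100, anchored=True))   # 100
```

In this model the unanchored count for a single line of n characters is n(n+1)/2, while the anchored count is n, which is the difference between quadratic and linear growth that the message attributes to the added delimiter.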
parent
71f30921a2
commit
1b4d0401e4