segments: Use longest segment in bytes if duration is the same

We occasionally see corrupted segments that are slightly smaller in bytes but report the same metadata as the full segments. When durations are equal, prefer the largest file, as it's likely the least corrupt.
6 years ago · bb05e37ae4
parent b516917e62
commit bb05e37ae4
1 changed file with 4 additions and 3 deletions
--- a/common/common/segments.py
+++ b/common/common/segments.py
@@ -243,11 +243,12 @@ def best_segments_by_start(hour):
 					start_time, ", ".join(map(str, segments))
 				))
 				# We've observed some cases where the same segment (with the same hash) will be reported
-				# with different durations (generally at stream end). Prefer the longer duration,
+				# with different durations (generally at stream end). Prefer the longer duration (followed by longest size),
 				# as this will ensure that if hashes are different we get the most data, and if they
 				# are the same it should keep holes to a minimum.
-				# If same duration, we have to pick one, so pick highest-sorting hash just so we're consistent.
-				full_segments = [max(full_segments, key=lambda segment: (segment.duration, segment.hash))]
+				# If same duration and size, we have to pick one, so pick highest-sorting hash just so we're consistent.
+				sizes = {segment: os.stat(segment.path).st_size for segment in segments}
+				full_segments = [max(full_segments, key=lambda segment: (segment.duration, sizes[segment], segment.hash))]
 			yield full_segments[0]
 			continue
 		# no full segments, fall back to measuring partials.
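
For illustration, the tie-break this commit introduces can be sketched outside the codebase as follows. This is a minimal sketch, not the project's code: the `Segment` namedtuple, `pick_best`, and `make_segment` are hypothetical stand-ins, and only the `path`, `duration`, and `hash` fields seen in the diff are assumed. It compares candidates by duration first, then on-disk size, then hash.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the real segment type; only the fields used in the diff.
Segment = namedtuple("Segment", ["path", "duration", "hash"])

def pick_best(candidates):
    """Pick one segment: longest duration, then largest file, then highest hash."""
    # Stat each file once up front so the comparison key does no I/O.
    sizes = {segment: os.stat(segment.path).st_size for segment in candidates}
    return max(candidates, key=lambda s: (s.duration, sizes[s], s.hash))

def make_segment(directory, name, payload, duration, hash_value):
    # Helper for the demo only: writes a file and wraps it in a Segment.
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    return Segment(path, duration, hash_value)

with tempfile.TemporaryDirectory() as tmp:
    # Same reported duration, but one copy is 10 bytes short.
    truncated = make_segment(tmp, "a.ts", b"x" * 90, 2.0, "ffff")
    complete = make_segment(tmp, "b.ts", b"x" * 100, 2.0, "0000")
    best = pick_best([truncated, complete])
    best_name = os.path.basename(best.path)
```

Here the larger file is chosen even though the truncated copy has the higher-sorting hash, because size is compared before hash. The hash only matters when both duration and size are equal.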