»Core Development>Code coverage>Lib/heapq.py

# Python code coverage for Lib/heapq.py

#countcontent
1n/a"""Heap queue algorithm (a.k.a. priority queue).
2n/a
3n/aHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
4n/aall k, counting elements from 0. For the sake of comparison,
5n/anon-existing elements are considered to be infinite. The interesting
6n/aproperty of a heap is that a[0] is always its smallest element.
7n/a
8n/aUsage:
9n/a
10n/aheap = [] # creates an empty heap
11n/aheappush(heap, item) # pushes a new item on the heap
12n/aitem = heappop(heap) # pops the smallest item from the heap
13n/aitem = heap[0] # smallest item on the heap without popping it
14n/aheapify(x) # transforms list into a heap, in-place, in linear time
15n/aitem = heapreplace(heap, item) # pops and returns smallest item, and adds
16n/a # new item; the heap size is unchanged
17n/a
18n/aOur API differs from textbook heap algorithms as follows:
19n/a
20n/a- We use 0-based indexing. This makes the relationship between the
21n/a index for a node and the indexes for its children slightly less
22n/a obvious, but is more suitable since Python uses 0-based indexing.
23n/a
24n/a- Our heappop() method returns the smallest item, not the largest.
25n/a
26n/aThese two make it possible to view the heap as a regular Python list
27n/awithout surprises: heap[0] is the smallest item, and heap.sort()
28n/amaintains the heap invariant!
29n/a"""
30n/a
31n/a# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
32n/a
34n/a
35n/a[explanation by FranÃ§ois Pinard]
36n/a
37n/aHeaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
38n/aall k, counting elements from 0. For the sake of comparison,
39n/anon-existing elements are considered to be infinite. The interesting
40n/aproperty of a heap is that a[0] is always its smallest element.
41n/a
42n/aThe strange invariant above is meant to be an efficient memory
43n/arepresentation for a tournament. The numbers below are `k', not a[k]:
44n/a
45n/a 0
46n/a
47n/a 1 2
48n/a
49n/a 3 4 5 6
50n/a
51n/a 7 8 9 10 11 12 13 14
52n/a
53n/a 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
54n/a
55n/a
56n/aIn the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
57n/aa usual binary tournament we see in sports, each cell is the winner
58n/aover the two cells it tops, and we can trace the winner down the tree
59n/ato see all opponents s/he had. However, in many computer applications
60n/aof such tournaments, we do not need to trace the history of a winner.
61n/aTo be more memory efficient, when a winner is promoted, we try to
62n/areplace it by something else at a lower level, and the rule becomes
63n/athat a cell and the two cells it tops contain three different items,
64n/abut the top cell "wins" over the two topped cells.
65n/a
66n/aIf this heap invariant is protected at all time, index 0 is clearly
67n/athe overall winner. The simplest algorithmic way to remove it and
68n/afind the "next" winner is to move some loser (let's say cell 30 in the
69n/adiagram above) into the 0 position, and then percolate this new 0 down
70n/athe tree, exchanging values, until the invariant is re-established.
71n/aThis is clearly logarithmic on the total number of items in the tree.
72n/aBy iterating over all items, you get an O(n ln n) sort.
73n/a
74n/aA nice feature of this sort is that you can efficiently insert new
75n/aitems while the sort is going on, provided that the inserted items are
76n/anot "better" than the last 0'th element you extracted. This is
77n/aespecially useful in simulation contexts, where the tree holds all
78n/aincoming events, and the "win" condition means the smallest scheduled
79n/atime. When an event schedule other events for execution, they are
80n/ascheduled into the future, so they can easily go into the heap. So, a
81n/aheap is a good structure for implementing schedulers (this is what I
82n/aused for my MIDI sequencer :-).
83n/a
84n/aVarious structures for implementing schedulers have been extensively
85n/astudied, and heaps are good for this, as they are reasonably speedy,
86n/athe speed is almost constant, and the worst case is not much different
87n/athan the average case. However, there are other representations which
88n/aare more efficient overall, yet the worst cases might be terrible.
89n/a
90n/aHeaps are also very useful in big disk sorts. You most probably all
91n/aknow that a big sort implies producing "runs" (which are pre-sorted
92n/asequences, which size is usually related to the amount of CPU memory),
93n/afollowed by a merging passes for these runs, which merging is often
94n/avery cleverly organised[1]. It is very important that the initial
95n/asort produces the longest runs possible. Tournaments are a good way
96n/ato that. If, using all the memory available to hold a tournament, you
97n/areplace and percolate items that happen to fit the current run, you'll
98n/aproduce runs which are twice the size of the memory for random input,
99n/aand much better for input fuzzily ordered.
100n/a
101n/aMoreover, if you output the 0'th item on disk and get an input which
102n/amay not fit in the current tournament (because the value "wins" over
103n/athe last output value), it cannot fit in the heap, so the size of the
104n/aheap decreases. The freed memory could be cleverly reused immediately
105n/afor progressively building a second heap, which grows at exactly the
106n/asame rate the first heap is melting. When the first heap completely
107n/avanishes, you switch heaps and start a new run. Clever and quite
108n/aeffective!
109n/a
110n/aIn a word, heaps are useful memory structures to know. I use them in
111n/aa few applications, and I think it is good to keep a `heap' module
112n/aaround. :-)
113n/a
114n/a--------------------
115n/a[1] The disk balancing algorithms which are current, nowadays, are
116n/amore annoying than clever, and this is a consequence of the seeking
117n/acapabilities of the disks. On devices which cannot seek, like big
118n/atape drives, the story was quite different, and one had to be very
119n/aclever to ensure (far in advance) that each tape movement will be the
120n/amost effective possible (that is, will best participate at
121n/a"progressing" the merge). Some tapes were even able to read
122n/abackwards, and this was also used to avoid the rewinding time.
123n/aBelieve me, real good tape sorts were quite spectacular to watch!
124n/aFrom all times, sorting has always been a Great Art! :-)
125n/a"""
126n/a
127n/a__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
128n/a 'nlargest', 'nsmallest', 'heappushpop']
129n/a
131n/a """Push item onto heap, maintaining the heap invariant."""
132n/a heap.append(item)
133n/a _siftdown(heap, 0, len(heap)-1)
134n/a
136n/a """Pop the smallest item off the heap, maintaining the heap invariant."""
137n/a lastelt = heap.pop() # raises appropriate IndexError if heap is empty
138n/a if heap:
139n/a returnitem = heap[0]
140n/a heap[0] = lastelt
141n/a _siftup(heap, 0)
142n/a return returnitem
143n/a return lastelt
144n/a
146n/a """Pop and return the current smallest value, and add the new item.
147n/a
148n/a This is more efficient than heappop() followed by heappush(), and can be
149n/a more appropriate when using a fixed-size heap. Note that the value
150n/a returned may be larger than item! That constrains reasonable uses of
151n/a this routine unless written as part of a conditional replacement:
152n/a
153n/a if item > heap[0]:
154n/a item = heapreplace(heap, item)
155n/a """
156n/a returnitem = heap[0] # raises appropriate IndexError if heap is empty
157n/a heap[0] = item
158n/a _siftup(heap, 0)
159n/a return returnitem
160n/a
162n/a """Fast version of a heappush followed by a heappop."""
163n/a if heap and heap[0] < item:
164n/a item, heap[0] = heap[0], item
165n/a _siftup(heap, 0)
166n/a return item
167n/a
169n/a """Transform list into a heap, in-place, in O(len(x)) time."""
170n/a n = len(x)
171n/a # Transform bottom-up. The largest index there's any point to looking at
172n/a # is the largest with a child index in-range, so must have 2*i + 1 < n,
173n/a # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
174n/a # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
175n/a # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
176n/a for i in reversed(range(n//2)):
177n/a _siftup(x, i)
178n/a
180n/a """Maxheap version of a heappop."""
181n/a lastelt = heap.pop() # raises appropriate IndexError if heap is empty
182n/a if heap:
183n/a returnitem = heap[0]
184n/a heap[0] = lastelt
185n/a _siftup_max(heap, 0)
186n/a return returnitem
187n/a return lastelt
188n/a
190n/a """Maxheap version of a heappop followed by a heappush."""
191n/a returnitem = heap[0] # raises appropriate IndexError if heap is empty
192n/a heap[0] = item
193n/a _siftup_max(heap, 0)
194n/a return returnitem
195n/a
197n/a """Transform list into a maxheap, in-place, in O(len(x)) time."""
198n/a n = len(x)
199n/a for i in reversed(range(n//2)):
200n/a _siftup_max(x, i)
201n/a
202n/a# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
203n/a# is the index of a leaf with a possibly out-of-order value. Restore the
204n/a# heap invariant.
206n/a newitem = heap[pos]
207n/a # Follow the path to the root, moving parents down until finding a place
208n/a # newitem fits.
209n/a while pos > startpos:
210n/a parentpos = (pos - 1) >> 1
211n/a parent = heap[parentpos]
212n/a if newitem < parent:
213n/a heap[pos] = parent
214n/a pos = parentpos
215n/a continue
216n/a break
217n/a heap[pos] = newitem
218n/a
219n/a# The child indices of heap index pos are already heaps, and we want to make
220n/a# a heap at index pos too. We do this by bubbling the smaller child of
221n/a# pos up (and so on with that child's children, etc) until hitting a leaf,
222n/a# then using _siftdown to move the oddball originally at index pos into place.
223n/a#
224n/a# We *could* break out of the loop as soon as we find a pos where newitem <=
225n/a# both its children, but turns out that's not a good idea, and despite that
226n/a# many books write the algorithm that way. During a heap pop, the last array
227n/a# element is sifted in, and that tends to be large, so that comparing it
228n/a# against values starting from the root usually doesn't pay (= usually doesn't
229n/a# get us out of the loop early). See Knuth, Volume 3, where this is
230n/a# explained and quantified in an exercise.
231n/a#
232n/a# Cutting the # of comparisons is important, since these routines have no
233n/a# way to extract "the priority" from an array element, so that intelligence
234n/a# is likely to be hiding in custom comparison methods, or in array elements
235n/a# storing (priority, record) tuples. Comparisons are thus potentially
236n/a# expensive.
237n/a#
238n/a# On random arrays of length 1000, making this change cut the number of
239n/a# comparisons made by heapify() a little, and those made by exhaustive
240n/a# heappop() a lot, in accord with theory. Here are typical results from 3
241n/a# runs (3 just to demonstrate how small the variance is):
242n/a#
243n/a# Compares needed by heapify Compares needed by 1000 heappops
244n/a# -------------------------- --------------------------------
245n/a# 1837 cut to 1663 14996 cut to 8680
246n/a# 1855 cut to 1659 14966 cut to 8678
247n/a# 1847 cut to 1660 15024 cut to 8703
248n/a#
249n/a# Building the heap by using heappush() 1000 times instead required
250n/a# 2198, 2148, and 2219 compares: heapify() is more efficient, when
251n/a# you can use it.
252n/a#
253n/a# The total compares needed by list.sort() on the same lists were 8627,
254n/a# 8627, and 8632 (this should be compared to the sum of heapify() and
255n/a# heappop() compares): list.sort() is (unsurprisingly!) more efficient
256n/a# for sorting.
257n/a
259n/a endpos = len(heap)
260n/a startpos = pos
261n/a newitem = heap[pos]
262n/a # Bubble up the smaller child until hitting a leaf.
263n/a childpos = 2*pos + 1 # leftmost child position
264n/a while childpos < endpos:
265n/a # Set childpos to index of smaller child.
266n/a rightpos = childpos + 1
267n/a if rightpos < endpos and not heap[childpos] < heap[rightpos]:
268n/a childpos = rightpos
269n/a # Move the smaller child up.
270n/a heap[pos] = heap[childpos]
271n/a pos = childpos
272n/a childpos = 2*pos + 1
273n/a # The leaf at pos is empty now. Put newitem there, and bubble it up
274n/a # to its final resting place (by sifting its parents down).
275n/a heap[pos] = newitem
276n/a _siftdown(heap, startpos, pos)
277n/a
279n/a 'Maxheap variant of _siftdown'
280n/a newitem = heap[pos]
281n/a # Follow the path to the root, moving parents down until finding a place
282n/a # newitem fits.
283n/a while pos > startpos:
284n/a parentpos = (pos - 1) >> 1
285n/a parent = heap[parentpos]
286n/a if parent < newitem:
287n/a heap[pos] = parent
288n/a pos = parentpos
289n/a continue
290n/a break
291n/a heap[pos] = newitem
292n/a
294n/a 'Maxheap variant of _siftup'
295n/a endpos = len(heap)
296n/a startpos = pos
297n/a newitem = heap[pos]
298n/a # Bubble up the larger child until hitting a leaf.
299n/a childpos = 2*pos + 1 # leftmost child position
300n/a while childpos < endpos:
301n/a # Set childpos to index of larger child.
302n/a rightpos = childpos + 1
303n/a if rightpos < endpos and not heap[rightpos] < heap[childpos]:
304n/a childpos = rightpos
305n/a # Move the larger child up.
306n/a heap[pos] = heap[childpos]
307n/a pos = childpos
308n/a childpos = 2*pos + 1
309n/a # The leaf at pos is empty now. Put newitem there, and bubble it up
310n/a # to its final resting place (by sifting its parents down).
311n/a heap[pos] = newitem
312n/a _siftdown_max(heap, startpos, pos)
313n/a
315n/a '''Merge multiple sorted inputs into a single sorted output.
316n/a
317n/a Similar to sorted(itertools.chain(*iterables)) but returns a generator,
318n/a does not pull the data into memory all at once, and assumes that each of
319n/a the input streams is already sorted (smallest to largest).
320n/a
321n/a >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
322n/a [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
323n/a
324n/a If *key* is not None, applies a key function to each element to determine
325n/a its sort order.
326n/a
327n/a >>> list(merge(['dog', 'horse'], ['cat', 'fish', 'kangaroo'], key=len))
328n/a ['dog', 'cat', 'fish', 'horse', 'kangaroo']
329n/a
330n/a '''
331n/a
332n/a h = []
333n/a h_append = h.append
334n/a
335n/a if reverse:
336n/a _heapify = _heapify_max
337n/a _heappop = _heappop_max
338n/a _heapreplace = _heapreplace_max
339n/a direction = -1
340n/a else:
341n/a _heapify = heapify
342n/a _heappop = heappop
343n/a _heapreplace = heapreplace
344n/a direction = 1
345n/a
346n/a if key is None:
347n/a for order, it in enumerate(map(iter, iterables)):
348n/a try:
349n/a next = it.__next__
350n/a h_append([next(), order * direction, next])
351n/a except StopIteration:
352n/a pass
353n/a _heapify(h)
354n/a while len(h) > 1:
355n/a try:
356n/a while True:
357n/a value, order, next = s = h[0]
358n/a yield value
359n/a s[0] = next() # raises StopIteration when exhausted
360n/a _heapreplace(h, s) # restore heap condition
361n/a except StopIteration:
362n/a _heappop(h) # remove empty iterator
363n/a if h:
364n/a # fast case when only a single iterator remains
365n/a value, order, next = h[0]
366n/a yield value
367n/a yield from next.__self__
368n/a return
369n/a
370n/a for order, it in enumerate(map(iter, iterables)):
371n/a try:
372n/a next = it.__next__
373n/a value = next()
374n/a h_append([key(value), order * direction, value, next])
375n/a except StopIteration:
376n/a pass
377n/a _heapify(h)
378n/a while len(h) > 1:
379n/a try:
380n/a while True:
381n/a key_value, order, value, next = s = h[0]
382n/a yield value
383n/a value = next()
384n/a s[0] = key(value)
385n/a s[2] = value
386n/a _heapreplace(h, s)
387n/a except StopIteration:
388n/a _heappop(h)
389n/a if h:
390n/a key_value, order, value, next = h[0]
391n/a yield value
392n/a yield from next.__self__
393n/a
394n/a
395n/a# Algorithm notes for nlargest() and nsmallest()
396n/a# ==============================================
397n/a#
398n/a# Make a single pass over the data while keeping the k most extreme values
399n/a# in a heap. Memory consumption is limited to keeping k values in a list.
400n/a#
401n/a# Measured performance for random inputs:
402n/a#
403n/a# number of comparisons
404n/a# n inputs k-extreme values (average of 5 trials) % more than min()
405n/a# ------------- ---------------- --------------------- -----------------
406n/a# 1,000 100 3,317 231.7%
407n/a# 10,000 100 14,046 40.5%
408n/a# 100,000 100 105,749 5.7%
409n/a# 1,000,000 100 1,007,751 0.8%
410n/a# 10,000,000 100 10,009,401 0.1%
411n/a#
412n/a# Theoretical number of comparisons for k smallest of n random inputs:
413n/a#
414n/a# Step Comparisons Action
415n/a# ---- -------------------------- ---------------------------
416n/a# 1 1.66 * k heapify the first k-inputs
417n/a# 2 n - k compare remaining elements to top of heap
418n/a# 3 k * (1 + lg2(k)) * ln(n/k) replace the topmost value on the heap
419n/a# 4 k * lg2(k) - (k/2) final sort of the k most extreme values
420n/a#
421n/a# Combining and simplifying for a rough estimate gives:
422n/a#
423n/a# comparisons = n + k * (log(k, 2) * log(n/k) + log(k, 2) + log(n/k))
424n/a#
425n/a# Computing the number of comparisons for step 3:
426n/a# -----------------------------------------------
427n/a# * For the i-th new value from the iterable, the probability of being in the
428n/a# k most extreme values is k/i. For example, the probability of the 101st
429n/a# value seen being in the 100 most extreme values is 100/101.
430n/a# * If the value is a new extreme value, the cost of inserting it into the
431n/a# heap is 1 + log(k, 2).
432n/a# * The probability times the cost gives:
433n/a# (k/i) * (1 + log(k, 2))
434n/a# * Summing across the remaining n-k elements gives:
435n/a# sum((k/i) * (1 + log(k, 2)) for i in range(k+1, n+1))
436n/a# * This reduces to:
437n/a# (H(n) - H(k)) * k * (1 + log(k, 2))
438n/a# * Where H(n) is the n-th harmonic number estimated by:
439n/a# gamma = 0.5772156649
440n/a# H(n) = log(n, e) + gamma + 1 / (2 * n)
441n/a# http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#Rate_of_divergence
442n/a# * Substituting the H(n) formula:
443n/a# comparisons = k * (1 + log(k, 2)) * (log(n/k, e) + (1/n - 1/k) / 2)
444n/a#
445n/a# Worst-case for step 3:
446n/a# ----------------------
447n/a# In the worst case, the input data is reversed sorted so that every new element
448n/a# must be inserted in the heap:
449n/a#
450n/a# comparisons = 1.66 * k + log(k, 2) * (n - k)
451n/a#
452n/a# Alternative Algorithms
453n/a# ----------------------
454n/a# Other algorithms were not used because they:
455n/a# 1) Took much more auxiliary memory,
456n/a# 2) Made multiple passes over the data.
457n/a# 3) Made more comparisons in common cases (small k, large n, semi-random input).
458n/a# See the more detailed comparison of approach at:
459n/a# http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest
460n/a
462n/a """Find the n smallest elements in a dataset.
463n/a
464n/a Equivalent to: sorted(iterable, key=key)[:n]
465n/a """
466n/a
467n/a # Short-cut for n==1 is to use min()
468n/a if n == 1:
469n/a it = iter(iterable)
470n/a sentinel = object()
471n/a if key is None:
472n/a result = min(it, default=sentinel)
473n/a else:
474n/a result = min(it, default=sentinel, key=key)
475n/a return [] if result is sentinel else [result]
476n/a
477n/a # When n>=size, it's faster to use sorted()
478n/a try:
479n/a size = len(iterable)
480n/a except (TypeError, AttributeError):
481n/a pass
482n/a else:
483n/a if n >= size:
484n/a return sorted(iterable, key=key)[:n]
485n/a
486n/a # When key is none, use simpler decoration
487n/a if key is None:
488n/a it = iter(iterable)
489n/a # put the range(n) first so that zip() doesn't
490n/a # consume one too many elements from the iterator
491n/a result = [(elem, i) for i, elem in zip(range(n), it)]
492n/a if not result:
493n/a return result
494n/a _heapify_max(result)
495n/a top = result[0][0]
496n/a order = n
497n/a _heapreplace = _heapreplace_max
498n/a for elem in it:
499n/a if elem < top:
500n/a _heapreplace(result, (elem, order))
501n/a top = result[0][0]
502n/a order += 1
503n/a result.sort()
504n/a return [r[0] for r in result]
505n/a
506n/a # General case, slowest method
507n/a it = iter(iterable)
508n/a result = [(key(elem), i, elem) for i, elem in zip(range(n), it)]
509n/a if not result:
510n/a return result
511n/a _heapify_max(result)
512n/a top = result[0][0]
513n/a order = n
514n/a _heapreplace = _heapreplace_max
515n/a for elem in it:
516n/a k = key(elem)
517n/a if k < top:
518n/a _heapreplace(result, (k, order, elem))
519n/a top = result[0][0]
520n/a order += 1
521n/a result.sort()
522n/a return [r[2] for r in result]
523n/a
525n/a """Find the n largest elements in a dataset.
526n/a
527n/a Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
528n/a """
529n/a
530n/a # Short-cut for n==1 is to use max()
531n/a if n == 1:
532n/a it = iter(iterable)
533n/a sentinel = object()
534n/a if key is None:
535n/a result = max(it, default=sentinel)
536n/a else:
537n/a result = max(it, default=sentinel, key=key)
538n/a return [] if result is sentinel else [result]
539n/a
540n/a # When n>=size, it's faster to use sorted()
541n/a try:
542n/a size = len(iterable)
543n/a except (TypeError, AttributeError):
544n/a pass
545n/a else:
546n/a if n >= size:
547n/a return sorted(iterable, key=key, reverse=True)[:n]
548n/a
549n/a # When key is none, use simpler decoration
550n/a if key is None:
551n/a it = iter(iterable)
552n/a result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
553n/a if not result:
554n/a return result
555n/a heapify(result)
556n/a top = result[0][0]
557n/a order = -n
558n/a _heapreplace = heapreplace
559n/a for elem in it:
560n/a if top < elem:
561n/a _heapreplace(result, (elem, order))
562n/a top = result[0][0]
563n/a order -= 1
564n/a result.sort(reverse=True)
565n/a return [r[0] for r in result]
566n/a
567n/a # General case, slowest method
568n/a it = iter(iterable)
569n/a result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
570n/a if not result:
571n/a return result
572n/a heapify(result)
573n/a top = result[0][0]
574n/a order = -n
575n/a _heapreplace = heapreplace
576n/a for elem in it:
577n/a k = key(elem)
578n/a if top < k:
579n/a _heapreplace(result, (k, order, elem))
580n/a top = result[0][0]
581n/a order -= 1
582n/a result.sort(reverse=True)
583n/a return [r[2] for r in result]
584n/a
585n/a# If available, use C implementation
586n/atry:
587n/a from _heapq import *
588n/aexcept ImportError:
589n/a pass
590n/atry:
591n/a from _heapq import _heapreplace_max
592n/aexcept ImportError:
593n/a pass
594n/atry:
595n/a from _heapq import _heapify_max
596n/aexcept ImportError:
597n/a pass
598n/atry:
599n/a from _heapq import _heappop_max
600n/aexcept ImportError:
601n/a pass
602n/a
603n/a
604n/aif __name__ == "__main__":
605n/a
606n/a import doctest
607n/a print(doctest.testmod())