# Python code coverage for Lib/heapq.py

| # | count | content |
|---|---|---|

1 | n/a | """Heap queue algorithm (a.k.a. priority queue). |

2 | n/a | |

3 | n/a | Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for |

4 | n/a | all k, counting elements from 0. For the sake of comparison, |

5 | n/a | non-existing elements are considered to be infinite. The interesting |

6 | n/a | property of a heap is that a[0] is always its smallest element. |

7 | n/a | |

8 | n/a | Usage: |

9 | n/a | |

10 | n/a | heap = [] # creates an empty heap |

11 | n/a | heappush(heap, item) # pushes a new item on the heap |

12 | n/a | item = heappop(heap) # pops the smallest item from the heap |

13 | n/a | item = heap[0] # smallest item on the heap without popping it |

14 | n/a | heapify(x) # transforms list into a heap, in-place, in linear time |

15 | n/a | item = heapreplace(heap, item) # pops and returns smallest item, and adds |

16 | n/a | # new item; the heap size is unchanged |

17 | n/a | |

18 | n/a | Our API differs from textbook heap algorithms as follows: |

19 | n/a | |

20 | n/a | - We use 0-based indexing. This makes the relationship between the |

21 | n/a | index for a node and the indexes for its children slightly less |

22 | n/a | obvious, but is more suitable since Python uses 0-based indexing. |

23 | n/a | |

24 | n/a | - Our heappop() method returns the smallest item, not the largest. |

25 | n/a | |

26 | n/a | These two make it possible to view the heap as a regular Python list |

27 | n/a | without surprises: heap[0] is the smallest item, and heap.sort() |

28 | n/a | maintains the heap invariant! |

29 | n/a | """ |
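The usage summary in the docstring above can be exercised directly; a minimal sketch using only the public API:

```python
import heapq

heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)       # push items one at a time

assert heap[0] == 1                   # smallest item is always at index 0
smallest = heapq.heappop(heap)        # pop the smallest item
assert smallest == 1

data = [9, 7, 8]
heapq.heapify(data)                   # transform a list into a heap, in-place
assert data[0] == 7

returned = heapq.heapreplace(data, 10)  # pop smallest, add new item
assert returned == 7
assert len(data) == 3                 # heap size is unchanged
print(heap, data)
```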

30 | n/a | |

31 | n/a | # Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger |

32 | n/a | |

33 | n/a | __about__ = """Heap queues |

34 | n/a | |

35 | n/a | [explanation by François Pinard] |

36 | n/a | |

37 | n/a | Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for |

38 | n/a | all k, counting elements from 0. For the sake of comparison, |

39 | n/a | non-existing elements are considered to be infinite. The interesting |

40 | n/a | property of a heap is that a[0] is always its smallest element. |

41 | n/a | |

42 | n/a | The strange invariant above is meant to be an efficient memory |

43 | n/a | representation for a tournament. The numbers below are `k', not a[k]: |

44 | n/a | |

45 | n/a | 0 |

46 | n/a | |

47 | n/a | 1 2 |

48 | n/a | |

49 | n/a | 3 4 5 6 |

50 | n/a | |

51 | n/a | 7 8 9 10 11 12 13 14 |

52 | n/a | |

53 | n/a | 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |

54 | n/a | |

55 | n/a | |

56 | n/a | In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In |

57 | n/a | a usual binary tournament we see in sports, each cell is the winner |

58 | n/a | over the two cells it tops, and we can trace the winner down the tree |

59 | n/a | to see all opponents s/he had. However, in many computer applications |

60 | n/a | of such tournaments, we do not need to trace the history of a winner. |

61 | n/a | To be more memory efficient, when a winner is promoted, we try to |

62 | n/a | replace it by something else at a lower level, and the rule becomes |

63 | n/a | that a cell and the two cells it tops contain three different items, |

64 | n/a | but the top cell "wins" over the two topped cells. |

65 | n/a | |

66 | n/a | If this heap invariant is protected at all times, index 0 is clearly |

67 | n/a | the overall winner. The simplest algorithmic way to remove it and |

68 | n/a | find the "next" winner is to move some loser (let's say cell 30 in the |

69 | n/a | diagram above) into the 0 position, and then percolate this new 0 down |

70 | n/a | the tree, exchanging values, until the invariant is re-established. |

71 | n/a | This is clearly logarithmic on the total number of items in the tree. |

72 | n/a | By iterating over all items, you get an O(n ln n) sort. |
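Iterating heappop() over a heapified list yields items in sorted order, which is exactly the O(n log n) sort just described; a short sketch:

```python
import heapq

def heapsort(iterable):
    """Sort by heapifying once, then popping the smallest item n times."""
    h = list(iterable)
    heapq.heapify(h)               # O(n) bottom-up heap construction
    return [heapq.heappop(h) for _ in range(len(h))]  # n pops, O(log n) each

print(heapsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```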

73 | n/a | |

74 | n/a | A nice feature of this sort is that you can efficiently insert new |

75 | n/a | items while the sort is going on, provided that the inserted items are |

76 | n/a | not "better" than the last 0'th element you extracted. This is |

77 | n/a | especially useful in simulation contexts, where the tree holds all |

78 | n/a | incoming events, and the "win" condition means the smallest scheduled |

79 | n/a | time. When an event schedules other events for execution, they are |

80 | n/a | scheduled into the future, so they can easily go into the heap. So, a |

81 | n/a | heap is a good structure for implementing schedulers (this is what I |

82 | n/a | used for my MIDI sequencer :-). |
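The scheduler use described above is commonly modelled as (time, event) tuples on a heap, so the "win" condition is the smallest scheduled time. A brief sketch (the event names are purely illustrative):

```python
import heapq

schedule = []  # heap of (scheduled_time, event_name) tuples
heapq.heappush(schedule, (10, 'note-off'))
heapq.heappush(schedule, (2, 'note-on'))
heapq.heappush(schedule, (7, 'tempo-change'))

# Events come off the heap in time order; a handler may push future events.
now, event = heapq.heappop(schedule)
if event == 'note-on':
    heapq.heappush(schedule, (now + 3, 'note-off'))  # scheduled into the future

order = []
while schedule:
    order.append(heapq.heappop(schedule))
print(order)  # [(5, 'note-off'), (7, 'tempo-change'), (10, 'note-off')]
```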

83 | n/a | |

84 | n/a | Various structures for implementing schedulers have been extensively |

85 | n/a | studied, and heaps are good for this, as they are reasonably speedy, |

86 | n/a | the speed is almost constant, and the worst case is not much different |

87 | n/a | than the average case. However, there are other representations which |

88 | n/a | are more efficient overall, yet the worst cases might be terrible. |

89 | n/a | |

90 | n/a | Heaps are also very useful in big disk sorts. You most probably all |

91 | n/a | know that a big sort implies producing "runs" (which are pre-sorted |

92 | n/a | sequences, whose size is usually related to the amount of CPU memory), |

93 | n/a | followed by merging passes for these runs; the merging is often |

94 | n/a | very cleverly organised[1]. It is very important that the initial |

95 | n/a | sort produces the longest runs possible. Tournaments are a good way |

96 | n/a | to do that. If, using all the memory available to hold a tournament, you |

97 | n/a | replace and percolate items that happen to fit the current run, you'll |

98 | n/a | produce runs which are twice the size of the memory for random input, |

99 | n/a | and much better for input fuzzily ordered. |

100 | n/a | |

101 | n/a | Moreover, if you output the 0'th item on disk and get an input which |

102 | n/a | may not fit in the current tournament (because the value "wins" over |

103 | n/a | the last output value), it cannot fit in the heap, so the size of the |

104 | n/a | heap decreases. The freed memory could be cleverly reused immediately |

105 | n/a | for progressively building a second heap, which grows at exactly the |

106 | n/a | same rate the first heap is melting. When the first heap completely |

107 | n/a | vanishes, you switch heaps and start a new run. Clever and quite |

108 | n/a | effective! |
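The run-generation scheme described above is classic replacement selection. A simplified sketch, assuming the simplest variant: items that "win" over the last output are deferred to the next run rather than grown into a second in-place heap, so the current heap shrinks exactly as the text describes:

```python
import heapq
from itertools import islice

def runs(items, memory=4):
    """Replacement selection: emit sorted runs ~2x the heap size on random input."""
    it = iter(items)
    heap = list(islice(it, memory))    # fill available "memory" with a tournament
    heapq.heapify(heap)
    deferred = []                      # items that cannot join the current run
    run = []
    for value in it:
        if heap and value >= heap[0]:  # fits the current run: replace and percolate
            run.append(heapq.heapreplace(heap, value))
        else:                          # "wins" over the last output: defer it,
            if heap:                   # and the heap decreases by one slot
                run.append(heapq.heappop(heap))
            deferred.append(value)
    while heap:                        # flush the remainder of the first run
        run.append(heapq.heappop(heap))
    yield run
    if deferred:                       # real implementations recurse here
        yield sorted(deferred)

for run in runs([5, 3, 8, 1, 9, 2, 7, 4], memory=3):
    print(run)  # [3, 5, 8, 9] then [1, 2, 4, 7]
```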

109 | n/a | |

110 | n/a | In a word, heaps are useful memory structures to know. I use them in |

111 | n/a | a few applications, and I think it is good to keep a `heap' module |

112 | n/a | around. :-) |

113 | n/a | |

114 | n/a | -------------------- |

115 | n/a | [1] The disk balancing algorithms which are current, nowadays, are |

116 | n/a | more annoying than clever, and this is a consequence of the seeking |

117 | n/a | capabilities of the disks. On devices which cannot seek, like big |

118 | n/a | tape drives, the story was quite different, and one had to be very |

119 | n/a | clever to ensure (far in advance) that each tape movement will be the |

120 | n/a | most effective possible (that is, will best participate in |

121 | n/a | "progressing" the merge). Some tapes were even able to read |

122 | n/a | backwards, and this was also used to avoid the rewinding time. |

123 | n/a | Believe me, real good tape sorts were quite spectacular to watch! |

124 | n/a | From all times, sorting has always been a Great Art! :-) |

125 | n/a | """ |

126 | n/a | |

127 | n/a | __all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge', |

128 | n/a | 'nlargest', 'nsmallest', 'heappushpop'] |

129 | n/a | |

130 | n/a | def heappush(heap, item): |

131 | n/a | """Push item onto heap, maintaining the heap invariant.""" |

132 | n/a | heap.append(item) |

133 | n/a | _siftdown(heap, 0, len(heap)-1) |

134 | n/a | |

135 | n/a | def heappop(heap): |

136 | n/a | """Pop the smallest item off the heap, maintaining the heap invariant.""" |

137 | n/a | lastelt = heap.pop() # raises appropriate IndexError if heap is empty |

138 | n/a | if heap: |

139 | n/a | returnitem = heap[0] |

140 | n/a | heap[0] = lastelt |

141 | n/a | _siftup(heap, 0) |

142 | n/a | return returnitem |

143 | n/a | return lastelt |

144 | n/a | |

145 | n/a | def heapreplace(heap, item): |

146 | n/a | """Pop and return the current smallest value, and add the new item. |

147 | n/a | |

148 | n/a | This is more efficient than heappop() followed by heappush(), and can be |

149 | n/a | more appropriate when using a fixed-size heap. Note that the value |

150 | n/a | returned may be larger than item! That constrains reasonable uses of |

151 | n/a | this routine unless written as part of a conditional replacement: |

152 | n/a | |

153 | n/a | if item > heap[0]: |

154 | n/a | item = heapreplace(heap, item) |

155 | n/a | """ |

156 | n/a | returnitem = heap[0] # raises appropriate IndexError if heap is empty |

157 | n/a | heap[0] = item |

158 | n/a | _siftup(heap, 0) |

159 | n/a | return returnitem |
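The conditional-replacement pattern shown in the docstring is how a bounded "n largest" buffer is typically kept; a small sketch:

```python
import heapq

# Keep the 3 largest values seen so far in a min-heap of size 3.
stream = [4, 1, 7, 3, 8, 5, 2, 9]
largest = stream[:3]
heapq.heapify(largest)
for item in stream[3:]:
    if item > largest[0]:             # only replace when item beats the smallest
        heapq.heapreplace(largest, item)
print(sorted(largest))                # [7, 8, 9]
```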

160 | n/a | |

161 | n/a | def heappushpop(heap, item): |

162 | n/a | """Fast version of a heappush followed by a heappop.""" |

163 | n/a | if heap and heap[0] < item: |

164 | n/a | item, heap[0] = heap[0], item |

165 | n/a | _siftup(heap, 0) |

166 | n/a | return item |
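heappushpop() fuses the push and the pop into at most one sift, returning the item itself whenever it would be the one popped anyway; a quick check of the semantics:

```python
import heapq

heap = [2, 4, 6]
heapq.heapify(heap)
# Equivalent to heappush() followed by heappop(), with fewer comparisons:
assert heapq.heappushpop(heap, 1) == 1   # item <= heap[0]: returned untouched
assert heapq.heappushpop(heap, 5) == 2   # otherwise the old smallest comes back
assert sorted(heap) == [4, 5, 6]
```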

167 | n/a | |

168 | n/a | def heapify(x): |

169 | n/a | """Transform list into a heap, in-place, in O(len(x)) time.""" |

170 | n/a | n = len(x) |

171 | n/a | # Transform bottom-up. The largest index there's any point to looking at |

172 | n/a | # is the largest with a child index in-range, so must have 2*i + 1 < n, |

173 | n/a | # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so |

174 | n/a | # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is |

175 | n/a | # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. |

176 | n/a | for i in reversed(range(n//2)): |

177 | n/a | _siftup(x, i) |

178 | n/a | |

179 | n/a | def _heappop_max(heap): |

180 | n/a | """Maxheap version of a heappop.""" |

181 | n/a | lastelt = heap.pop() # raises appropriate IndexError if heap is empty |

182 | n/a | if heap: |

183 | n/a | returnitem = heap[0] |

184 | n/a | heap[0] = lastelt |

185 | n/a | _siftup_max(heap, 0) |

186 | n/a | return returnitem |

187 | n/a | return lastelt |

188 | n/a | |

189 | n/a | def _heapreplace_max(heap, item): |

190 | n/a | """Maxheap version of a heappop followed by a heappush.""" |

191 | n/a | returnitem = heap[0] # raises appropriate IndexError if heap is empty |

192 | n/a | heap[0] = item |

193 | n/a | _siftup_max(heap, 0) |

194 | n/a | return returnitem |

195 | n/a | |

196 | n/a | def _heapify_max(x): |

197 | n/a | """Transform list into a maxheap, in-place, in O(len(x)) time.""" |

198 | n/a | n = len(x) |

199 | n/a | for i in reversed(range(n//2)): |

200 | n/a | _siftup_max(x, i) |

201 | n/a | |

202 | n/a | # 'heap' is a heap at all indices >= startpos, except possibly for pos. pos |

203 | n/a | # is the index of a leaf with a possibly out-of-order value. Restore the |

204 | n/a | # heap invariant. |

205 | n/a | def _siftdown(heap, startpos, pos): |

206 | n/a | newitem = heap[pos] |

207 | n/a | # Follow the path to the root, moving parents down until finding a place |

208 | n/a | # newitem fits. |

209 | n/a | while pos > startpos: |

210 | n/a | parentpos = (pos - 1) >> 1 |

211 | n/a | parent = heap[parentpos] |

212 | n/a | if newitem < parent: |

213 | n/a | heap[pos] = parent |

214 | n/a | pos = parentpos |

215 | n/a | continue |

216 | n/a | break |

217 | n/a | heap[pos] = newitem |

218 | n/a | |

219 | n/a | # The child indices of heap index pos are already heaps, and we want to make |

220 | n/a | # a heap at index pos too. We do this by bubbling the smaller child of |

221 | n/a | # pos up (and so on with that child's children, etc) until hitting a leaf, |

222 | n/a | # then using _siftdown to move the oddball originally at index pos into place. |

223 | n/a | # |

224 | n/a | # We *could* break out of the loop as soon as we find a pos where newitem <= |

225 | n/a | # both its children, but it turns out that's not a good idea, even though |

226 | n/a | # many books write the algorithm that way. During a heap pop, the last array |

227 | n/a | # element is sifted in, and that tends to be large, so that comparing it |

228 | n/a | # against values starting from the root usually doesn't pay (= usually doesn't |

229 | n/a | # get us out of the loop early). See Knuth, Volume 3, where this is |

230 | n/a | # explained and quantified in an exercise. |

231 | n/a | # |

232 | n/a | # Cutting the # of comparisons is important, since these routines have no |

233 | n/a | # way to extract "the priority" from an array element, so that intelligence |

234 | n/a | # is likely to be hiding in custom comparison methods, or in array elements |

235 | n/a | # storing (priority, record) tuples. Comparisons are thus potentially |

236 | n/a | # expensive. |

237 | n/a | # |

238 | n/a | # On random arrays of length 1000, making this change cut the number of |

239 | n/a | # comparisons made by heapify() a little, and those made by exhaustive |

240 | n/a | # heappop() a lot, in accord with theory. Here are typical results from 3 |

241 | n/a | # runs (3 just to demonstrate how small the variance is): |

242 | n/a | # |

243 | n/a | # Compares needed by heapify Compares needed by 1000 heappops |

244 | n/a | # -------------------------- -------------------------------- |

245 | n/a | # 1837 cut to 1663 14996 cut to 8680 |

246 | n/a | # 1855 cut to 1659 14966 cut to 8678 |

247 | n/a | # 1847 cut to 1660 15024 cut to 8703 |

248 | n/a | # |

249 | n/a | # Building the heap by using heappush() 1000 times instead required |

250 | n/a | # 2198, 2148, and 2219 compares: heapify() is more efficient, when |

251 | n/a | # you can use it. |

252 | n/a | # |

253 | n/a | # The total compares needed by list.sort() on the same lists were 8627, |

254 | n/a | # 8627, and 8632 (this should be compared to the sum of heapify() and |

255 | n/a | # heappop() compares): list.sort() is (unsurprisingly!) more efficient |

256 | n/a | # for sorting. |
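The comparison counts quoted above can be reproduced with a wrapper type that counts `__lt__` calls; a rough sketch (exact numbers vary with the random data, so only the rough magnitudes are meaningful):

```python
import heapq
import random

class Counted:
    """Wraps a value and counts every __lt__ comparison made on any instance."""
    calls = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

random.seed(0)
data = [Counted(random.random()) for _ in range(1000)]

Counted.calls = 0
heapq.heapify(data)
heapify_compares = Counted.calls       # roughly 1.7k on random input (table above)

Counted.calls = 0
while data:
    heapq.heappop(data)
pop_compares = Counted.calls           # roughly 8.7k for 1000 exhaustive heappops

print(heapify_compares, pop_compares)
```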

257 | n/a | |

258 | n/a | def _siftup(heap, pos): |

259 | n/a | endpos = len(heap) |

260 | n/a | startpos = pos |

261 | n/a | newitem = heap[pos] |

262 | n/a | # Bubble up the smaller child until hitting a leaf. |

263 | n/a | childpos = 2*pos + 1 # leftmost child position |

264 | n/a | while childpos < endpos: |

265 | n/a | # Set childpos to index of smaller child. |

266 | n/a | rightpos = childpos + 1 |

267 | n/a | if rightpos < endpos and not heap[childpos] < heap[rightpos]: |

268 | n/a | childpos = rightpos |

269 | n/a | # Move the smaller child up. |

270 | n/a | heap[pos] = heap[childpos] |

271 | n/a | pos = childpos |

272 | n/a | childpos = 2*pos + 1 |

273 | n/a | # The leaf at pos is empty now. Put newitem there, and bubble it up |

274 | n/a | # to its final resting place (by sifting its parents down). |

275 | n/a | heap[pos] = newitem |

276 | n/a | _siftdown(heap, startpos, pos) |

277 | n/a | |

278 | n/a | def _siftdown_max(heap, startpos, pos): |

279 | n/a | 'Maxheap variant of _siftdown' |

280 | n/a | newitem = heap[pos] |

281 | n/a | # Follow the path to the root, moving parents down until finding a place |

282 | n/a | # newitem fits. |

283 | n/a | while pos > startpos: |

284 | n/a | parentpos = (pos - 1) >> 1 |

285 | n/a | parent = heap[parentpos] |

286 | n/a | if parent < newitem: |

287 | n/a | heap[pos] = parent |

288 | n/a | pos = parentpos |

289 | n/a | continue |

290 | n/a | break |

291 | n/a | heap[pos] = newitem |

292 | n/a | |

293 | n/a | def _siftup_max(heap, pos): |

294 | n/a | 'Maxheap variant of _siftup' |

295 | n/a | endpos = len(heap) |

296 | n/a | startpos = pos |

297 | n/a | newitem = heap[pos] |

298 | n/a | # Bubble up the larger child until hitting a leaf. |

299 | n/a | childpos = 2*pos + 1 # leftmost child position |

300 | n/a | while childpos < endpos: |

301 | n/a | # Set childpos to index of larger child. |

302 | n/a | rightpos = childpos + 1 |

303 | n/a | if rightpos < endpos and not heap[rightpos] < heap[childpos]: |

304 | n/a | childpos = rightpos |

305 | n/a | # Move the larger child up. |

306 | n/a | heap[pos] = heap[childpos] |

307 | n/a | pos = childpos |

308 | n/a | childpos = 2*pos + 1 |

309 | n/a | # The leaf at pos is empty now. Put newitem there, and bubble it up |

310 | n/a | # to its final resting place (by sifting its parents down). |

311 | n/a | heap[pos] = newitem |

312 | n/a | _siftdown_max(heap, startpos, pos) |

313 | n/a | |

314 | n/a | def merge(*iterables, key=None, reverse=False): |

315 | n/a | '''Merge multiple sorted inputs into a single sorted output. |

316 | n/a | |

317 | n/a | Similar to sorted(itertools.chain(*iterables)) but returns a generator, |

318 | n/a | does not pull the data into memory all at once, and assumes that each of |

319 | n/a | the input streams is already sorted (smallest to largest). |

320 | n/a | |

321 | n/a | >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) |

322 | n/a | [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25] |

323 | n/a | |

324 | n/a | If *key* is not None, applies a key function to each element to determine |

325 | n/a | its sort order. |

326 | n/a | |

327 | n/a | >>> list(merge(['dog', 'horse'], ['cat', 'fish', 'kangaroo'], key=len)) |

328 | n/a | ['dog', 'cat', 'fish', 'horse', 'kangaroo'] |

329 | n/a | |

330 | n/a | ''' |

331 | n/a | |

332 | n/a | h = [] |

333 | n/a | h_append = h.append |

334 | n/a | |

335 | n/a | if reverse: |

336 | n/a | _heapify = _heapify_max |

337 | n/a | _heappop = _heappop_max |

338 | n/a | _heapreplace = _heapreplace_max |

339 | n/a | direction = -1 |

340 | n/a | else: |

341 | n/a | _heapify = heapify |

342 | n/a | _heappop = heappop |

343 | n/a | _heapreplace = heapreplace |

344 | n/a | direction = 1 |

345 | n/a | |

346 | n/a | if key is None: |

347 | n/a | for order, it in enumerate(map(iter, iterables)): |

348 | n/a | try: |

349 | n/a | next = it.__next__ |

350 | n/a | h_append([next(), order * direction, next]) |

351 | n/a | except StopIteration: |

352 | n/a | pass |

353 | n/a | _heapify(h) |

354 | n/a | while len(h) > 1: |

355 | n/a | try: |

356 | n/a | while True: |

357 | n/a | value, order, next = s = h[0] |

358 | n/a | yield value |

359 | n/a | s[0] = next() # raises StopIteration when exhausted |

360 | n/a | _heapreplace(h, s) # restore heap condition |

361 | n/a | except StopIteration: |

362 | n/a | _heappop(h) # remove empty iterator |

363 | n/a | if h: |

364 | n/a | # fast case when only a single iterator remains |

365 | n/a | value, order, next = h[0] |

366 | n/a | yield value |

367 | n/a | yield from next.__self__ |

368 | n/a | return |

369 | n/a | |

370 | n/a | for order, it in enumerate(map(iter, iterables)): |

371 | n/a | try: |

372 | n/a | next = it.__next__ |

373 | n/a | value = next() |

374 | n/a | h_append([key(value), order * direction, value, next]) |

375 | n/a | except StopIteration: |

376 | n/a | pass |

377 | n/a | _heapify(h) |

378 | n/a | while len(h) > 1: |

379 | n/a | try: |

380 | n/a | while True: |

381 | n/a | key_value, order, value, next = s = h[0] |

382 | n/a | yield value |

383 | n/a | value = next() |

384 | n/a | s[0] = key(value) |

385 | n/a | s[2] = value |

386 | n/a | _heapreplace(h, s) |

387 | n/a | except StopIteration: |

388 | n/a | _heappop(h) |

389 | n/a | if h: |

390 | n/a | key_value, order, value, next = h[0] |

391 | n/a | yield value |

392 | n/a | yield from next.__self__ |
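The docstring demonstrates key= but not reverse=; with reverse=True each input must be sorted largest-to-smallest, and the merged output is descending. A quick demonstration:

```python
import heapq

# Inputs to merge(reverse=True) must each be sorted in descending order.
merged = list(heapq.merge([9, 5, 1], [8, 4], reverse=True))
print(merged)  # [9, 8, 5, 4, 1]
```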

393 | n/a | |

394 | n/a | |

395 | n/a | # Algorithm notes for nlargest() and nsmallest() |

396 | n/a | # ============================================== |

397 | n/a | # |

398 | n/a | # Make a single pass over the data while keeping the k most extreme values |

399 | n/a | # in a heap. Memory consumption is limited to keeping k values in a list. |

400 | n/a | # |

401 | n/a | # Measured performance for random inputs: |

402 | n/a | # |

403 | n/a | # number of comparisons |

404 | n/a | # n inputs k-extreme values (average of 5 trials) % more than min() |

405 | n/a | # ------------- ---------------- --------------------- ----------------- |

406 | n/a | # 1,000 100 3,317 231.7% |

407 | n/a | # 10,000 100 14,046 40.5% |

408 | n/a | # 100,000 100 105,749 5.7% |

409 | n/a | # 1,000,000 100 1,007,751 0.8% |

410 | n/a | # 10,000,000 100 10,009,401 0.1% |

411 | n/a | # |

412 | n/a | # Theoretical number of comparisons for k smallest of n random inputs: |

413 | n/a | # |

414 | n/a | # Step Comparisons Action |

415 | n/a | # ---- -------------------------- --------------------------- |

416 | n/a | # 1 1.66 * k heapify the first k-inputs |

417 | n/a | # 2 n - k compare remaining elements to top of heap |

418 | n/a | # 3 k * (1 + lg2(k)) * ln(n/k) replace the topmost value on the heap |

419 | n/a | # 4 k * lg2(k) - (k/2) final sort of the k most extreme values |

420 | n/a | # |

421 | n/a | # Combining and simplifying for a rough estimate gives: |

422 | n/a | # |

423 | n/a | # comparisons = n + k * (log(k, 2) * log(n/k) + log(k, 2) + log(n/k)) |

424 | n/a | # |

425 | n/a | # Computing the number of comparisons for step 3: |

426 | n/a | # ----------------------------------------------- |

427 | n/a | # * For the i-th new value from the iterable, the probability of being in the |

428 | n/a | # k most extreme values is k/i. For example, the probability of the 101st |

429 | n/a | # value seen being in the 100 most extreme values is 100/101. |

430 | n/a | # * If the value is a new extreme value, the cost of inserting it into the |

431 | n/a | # heap is 1 + log(k, 2). |

432 | n/a | # * The probability times the cost gives: |

433 | n/a | # (k/i) * (1 + log(k, 2)) |

434 | n/a | # * Summing across the remaining n-k elements gives: |

435 | n/a | # sum((k/i) * (1 + log(k, 2)) for i in range(k+1, n+1)) |

436 | n/a | # * This reduces to: |

437 | n/a | # (H(n) - H(k)) * k * (1 + log(k, 2)) |

438 | n/a | # * Where H(n) is the n-th harmonic number estimated by: |

439 | n/a | # gamma = 0.5772156649 |

440 | n/a | # H(n) = log(n, e) + gamma + 1 / (2 * n) |

441 | n/a | # http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#Rate_of_divergence |

442 | n/a | # * Substituting the H(n) formula: |

443 | n/a | # comparisons = k * (1 + log(k, 2)) * (log(n/k, e) + (1/n - 1/k) / 2) |

444 | n/a | # |

445 | n/a | # Worst-case for step 3: |

446 | n/a | # ---------------------- |

447 | n/a | # In the worst case, the input data is reverse-sorted so that every new element |

448 | n/a | # must be inserted in the heap: |

449 | n/a | # |

450 | n/a | # comparisons = 1.66 * k + log(k, 2) * (n - k) |

451 | n/a | # |

452 | n/a | # Alternative Algorithms |

453 | n/a | # ---------------------- |

454 | n/a | # Other algorithms were not used because they: |

455 | n/a | # 1) Took much more auxiliary memory, |

456 | n/a | # 2) Made multiple passes over the data. |

457 | n/a | # 3) Made more comparisons in common cases (small k, large n, semi-random input). |

458 | n/a | # See the more detailed comparison of approaches at: |

459 | n/a | # http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest |

460 | n/a | |

461 | n/a | def nsmallest(n, iterable, key=None): |

462 | n/a | """Find the n smallest elements in a dataset. |

463 | n/a | |

464 | n/a | Equivalent to: sorted(iterable, key=key)[:n] |

465 | n/a | """ |

466 | n/a | |

467 | n/a | # Short-cut for n==1 is to use min() |

468 | n/a | if n == 1: |

469 | n/a | it = iter(iterable) |

470 | n/a | sentinel = object() |

471 | n/a | if key is None: |

472 | n/a | result = min(it, default=sentinel) |

473 | n/a | else: |

474 | n/a | result = min(it, default=sentinel, key=key) |

475 | n/a | return [] if result is sentinel else [result] |

476 | n/a | |

477 | n/a | # When n>=size, it's faster to use sorted() |

478 | n/a | try: |

479 | n/a | size = len(iterable) |

480 | n/a | except (TypeError, AttributeError): |

481 | n/a | pass |

482 | n/a | else: |

483 | n/a | if n >= size: |

484 | n/a | return sorted(iterable, key=key)[:n] |

485 | n/a | |

486 | n/a | # When key is None, use simpler decoration |

487 | n/a | if key is None: |

488 | n/a | it = iter(iterable) |

489 | n/a | # put the range(n) first so that zip() doesn't |

490 | n/a | # consume one too many elements from the iterator |

491 | n/a | result = [(elem, i) for i, elem in zip(range(n), it)] |

492 | n/a | if not result: |

493 | n/a | return result |

494 | n/a | _heapify_max(result) |

495 | n/a | top = result[0][0] |

496 | n/a | order = n |

497 | n/a | _heapreplace = _heapreplace_max |

498 | n/a | for elem in it: |

499 | n/a | if elem < top: |

500 | n/a | _heapreplace(result, (elem, order)) |

501 | n/a | top = result[0][0] |

502 | n/a | order += 1 |

503 | n/a | result.sort() |

504 | n/a | return [r[0] for r in result] |

505 | n/a | |

506 | n/a | # General case, slowest method |

507 | n/a | it = iter(iterable) |

508 | n/a | result = [(key(elem), i, elem) for i, elem in zip(range(n), it)] |

509 | n/a | if not result: |

510 | n/a | return result |

511 | n/a | _heapify_max(result) |

512 | n/a | top = result[0][0] |

513 | n/a | order = n |

514 | n/a | _heapreplace = _heapreplace_max |

515 | n/a | for elem in it: |

516 | n/a | k = key(elem) |

517 | n/a | if k < top: |

518 | n/a | _heapreplace(result, (k, order, elem)) |

519 | n/a | top = result[0][0] |

520 | n/a | order += 1 |

521 | n/a | result.sort() |

522 | n/a | return [r[2] for r in result] |

523 | n/a | |

524 | n/a | def nlargest(n, iterable, key=None): |

525 | n/a | """Find the n largest elements in a dataset. |

526 | n/a | |

527 | n/a | Equivalent to: sorted(iterable, key=key, reverse=True)[:n] |

528 | n/a | """ |

529 | n/a | |

530 | n/a | # Short-cut for n==1 is to use max() |

531 | n/a | if n == 1: |

532 | n/a | it = iter(iterable) |

533 | n/a | sentinel = object() |

534 | n/a | if key is None: |

535 | n/a | result = max(it, default=sentinel) |

536 | n/a | else: |

537 | n/a | result = max(it, default=sentinel, key=key) |

538 | n/a | return [] if result is sentinel else [result] |

539 | n/a | |

540 | n/a | # When n>=size, it's faster to use sorted() |

541 | n/a | try: |

542 | n/a | size = len(iterable) |

543 | n/a | except (TypeError, AttributeError): |

544 | n/a | pass |

545 | n/a | else: |

546 | n/a | if n >= size: |

547 | n/a | return sorted(iterable, key=key, reverse=True)[:n] |

548 | n/a | |

549 | n/a | # When key is None, use simpler decoration |

550 | n/a | if key is None: |

551 | n/a | it = iter(iterable) |

552 | n/a | result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)] |

553 | n/a | if not result: |

554 | n/a | return result |

555 | n/a | heapify(result) |

556 | n/a | top = result[0][0] |

557 | n/a | order = -n |

558 | n/a | _heapreplace = heapreplace |

559 | n/a | for elem in it: |

560 | n/a | if top < elem: |

561 | n/a | _heapreplace(result, (elem, order)) |

562 | n/a | top = result[0][0] |

563 | n/a | order -= 1 |

564 | n/a | result.sort(reverse=True) |

565 | n/a | return [r[0] for r in result] |

566 | n/a | |

567 | n/a | # General case, slowest method |

568 | n/a | it = iter(iterable) |

569 | n/a | result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)] |

570 | n/a | if not result: |

571 | n/a | return result |

572 | n/a | heapify(result) |

573 | n/a | top = result[0][0] |

574 | n/a | order = -n |

575 | n/a | _heapreplace = heapreplace |

576 | n/a | for elem in it: |

577 | n/a | k = key(elem) |

578 | n/a | if top < k: |

579 | n/a | _heapreplace(result, (k, order, elem)) |

580 | n/a | top = result[0][0] |

581 | n/a | order -= 1 |

582 | n/a | result.sort(reverse=True) |

583 | n/a | return [r[2] for r in result] |

584 | n/a | |

585 | n/a | # If available, use C implementation |

586 | n/a | try: |

587 | n/a | from _heapq import * |

588 | n/a | except ImportError: |

589 | n/a | pass |

590 | n/a | try: |

591 | n/a | from _heapq import _heapreplace_max |

592 | n/a | except ImportError: |

593 | n/a | pass |

594 | n/a | try: |

595 | n/a | from _heapq import _heapify_max |

596 | n/a | except ImportError: |

597 | n/a | pass |

598 | n/a | try: |

599 | n/a | from _heapq import _heappop_max |

600 | n/a | except ImportError: |

601 | n/a | pass |

602 | n/a | |

603 | n/a | |

604 | n/a | if __name__ == "__main__": |

605 | n/a | |

606 | n/a | import doctest |

607 | n/a | print(doctest.testmod()) |