Insertion Sort and Merge Sort

Why bother with sorting data? This lecture discusses that question and then goes into two different sorting methods. You will notice that many search algorithms assume sorted data, especially with the large data volumes that are common today.

Transcript:

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So today's lecture is on sorting. We'll be talking about specific sorting algorithms today. I want to start by motivating why we're interested in sorting, which should be fairly easy. Then I want to discuss a particular sorting algorithm that's called insertion sort. That's probably the simplest sorting algorithm you can write, it's five lines of code. It's not the best sorting algorithm that's out there and so we'll try and improve it. We'll also talk about merge sort, which is a divide and conquer algorithm and that's going to motivate the last thing that I want to spend time on, which is recurrences and how you solve recurrences. Typically the recurrences that we'll be looking at in double o six are going to come from divide and conquer problems like merge sort but you'll see this over and over.

So let's talk about why we're interested in sorting. There's some fairly obvious applications like if you want to maintain a phone book, you've got a bunch of names and numbers corresponding to a telephone directory and you want to keep them in sorted order so it's easy to search, mp3 organizers, spreadsheets, et cetera. So there's lots of obvious applications. There's also some interesting problems that become easy once items are sorted. One example of that is finding a median.

So let's say that you have a bunch of items in an array a zero through n and a zero through n contains n numbers and they're not sorted. When you sort, you turn this into b 0 through n, where if it's just numbers, then you may sort them in increasing order or decreasing order. Let's just call it increasing order for now. Or if they're records, and they're not numbers, then you have to provide a comparison function to determine which record is smaller than another record. And that's another input that you have to have in order to do the sorting.

So it doesn't really matter what the items are as long as you have the comparison function. Think of it as less than or equal to. And if you have that and it's straightforward, obviously, to check that 3 is less than 4, et cetera. But it may be a little more complicated for more sophisticated sorting applications.

But the bottom line is that if you have your algorithm that takes a comparison function as an input, you're going to be able to, after a certain amount of time, get B 0 n. Now if you wanted to find the median of the set of numbers that were originally in the array A, what would you do once you have the sorted array B?

AUDIENCE: Isn't there a more efficient algorithm for median?

PROFESSOR: Absolutely. But this is sort of a side effect of having a sorted list. If you happen to have a sorted list, there's many ways that you could imagine building up a sorted list. One way is you have something that's completely unsorted and you run insertion sort or merge sort. Another way would be to maintain a sorted list as you're getting items put into the list.

So if you happened to have a sorted list and you need to have this sorted list for some reason, the point I'm making here is that finding the median is easy. And it's easy because all you have to do is look at-- depending on whether n is odd or even-- look at B of n over 2. That would give you the median because you'd have a bunch of numbers that are less than that and the equal set of numbers that are greater than that, which is the definition of median.

So this is not necessarily the best way, as you pointed out, of finding the median. But it's constant time if you have a sorted list. That's the point I wanted to make.

There are other things that you could do. And this came up in Erik's lecture, which is the notion of binary search-- finding an element in an array-- a specific element. You have a list of items-- again a 0 through n. And you're looking for a specific number or item.

You could, obviously, scan the array, and that would take you linear time to find this item. If the array happened to be sorted, then you can find this in logarithmic time using what's called binary search. Let's say you're looking for a specific item. Let's call it k. Binary search, roughly speaking, would work like-- you go compare k to, again, B of n over 2, and decide, given that B is sorted, you get to look at 1/2 of the array.

If B of n over 2 is not exactly k, then-- well, if it's exactly k you're done. Otherwise, you look at the left half. You do your divide and conquer paradigm. And you can do this in logarithmic time. So keep this in mind, because binary search is going to come up in today's lecture and again in other lectures.

It's really a great paradigm of divide and conquer-- probably the simplest. And it, essentially, takes something that's linear-- a linear search-- and turns it into logarithmic search. So those are a couple of problems that become easy if you have a sorted list. And there's some not so obvious applications of sorting-- for example, data compression.

If you wanted to compress a file, one of the things that you could do is to-- and it's a set of items-- you could sort the items. And that automatically finds duplicates. And you could say, if I have 100 items that are all identical, I'm going to compress the file by representing the item once and, then, having a number associated with the frequency of that item-- similar to what document distance does.

Document distance can be viewed as a way of compressing your initial input. Obviously, you lose the works of Shakespeare or whatever it was. And it becomes a bunch of words and frequencies. But it is something that compresses the input and gives you a different representation. And so people use sorting as a subroutine in data compression.

Computer graphics uses sorting. Most of the time, when you render scenes in computer graphics, you have many layers corresponding to the scenes. It turns out that, in computer graphics, most of the time you're actually rendering front to back because, when you have a big opaque object in front, you want to render that first, so you don't have to worry about everything that's occluded by this big opaque object. And that makes things more efficient. And so you keep things sorted front to back, most of the time, in computer graphics rendering.

But some of the time, if you're worried about transparency, you have to render things back to front. So typically, you have sorted lists corresponding to the different objects in both orders-- both increasing order and decreasing order. And you're maintaining that. So sorting is a real important subroutine in pretty much any sophisticated application you look at.

So it's worthwhile to look at the variety of sorting algorithms that are out there. And we're going to do some simple ones, today. But if you go and look at Wikipedia and do a Google search, there's all sorts of sorts like cocktail sort, and bitonic sort, and what have you. And there's reasons why each of these sorting algorithms exist. Because in specific cases, they end up winning on types of inputs or types of problems.

So let's take a look at our first sorting algorithm. I'm not going to write code but it will be in the notes. And it is in your document distance Python files. But I'll just give you pseudocode here and walk through what insertion sort looks like because the purpose of describing this algorithm to you is to analyze its complexity. We need to do some counting here, with respect to this algorithm, to figure out how fast it's going to run in and what the worst case complexity is.

So what is insertion sort? For i equals 1, 2, through n, given an input to be sorted, what we're going to do is we're going to insert A of i in the right position. And we're going to assume that we are sort of midway through the sorting process, where we have sorted A 0 through i minus 1. And we're going to expand this to this array to have i plus 1 elements. And A of i is going to get inserted into the correct position.

And we're going to do this by pairwise swaps down to the correct position for the number that is initially in A of i. So let's go through an example of this. We're going to sort in increasing order. Just have six numbers. And initially, we have 5, 2, 4, 6, 1, 3. And we're going to take a look at this.

And you start with the index 1, or the second element, because the very first element-- it's a single element and it's already sorted by definition. But you start from here. And this is what we call our key. And that's essentially a pointer to where we're at, right now. And the key keeps moving to the right as we go through the different steps of the algorithm.

And so what you do is you look at this and you have-- this is A of i. That's your key. And you have A of 0 to 0, which is 5. And since we want to sort in increasing order, this is not sorted. And so we do a swap.

So what this would do in this step is to do a swap. And we would go obtain 2, 5, 4, 6, 1, 3. So all that's happened here, in this step-- in the very first step where the key is in the second position-- is one swap happened.

Now, your key is here, at item 4. Again, you need to put 4 into the right spot. And so you do pairwise swaps. And in this case, you have to do one swap. And you get 2, 4, 5. And you're done with this iteration.

So what happens here is you have 2, 4, 5, 6, 1, 3. And now, the key is over here, at 6. Now, at this point, things are kind of easy, in the sense that you look at it and you say, well, I know this part is already started. 6 is greater than 5. So you have to do nothing.

So there's no swaps that happen in this step. So all that happens here is you're going to move the key to one step to the right. So you have 2, 4, 5, 6, 1, 3. And your key is now at 1. Here, you have to do more work. Now, you see one aspect of the complexity of this algorithm-- given that you're doing pairwise swaps-- the way this algorithm was defined, in pseudocode, out there, was I'm going to use pairwise swaps to find the correct position.

So what you're going to do is you're going to have to swap first 1 and 6. And then you'll swap-- 1 is over here. So you'll swap this position and that position. And then you'll swap-- essentially, do 4 swaps to get to the point where you have 1, 2, 4, 5, 6, 3. So this is the result. 1, 2, 4, 5, 6, 3.

And the important thing to understand, here, is that you've done four swaps to get 1 to the correct position. Now, you could imagine a different data structure where you move this over there and you shift them all to the right. But in fact, that shifting of these four elements is going to be computed in our model as four operations, or four steps, anyway. So there's no getting away from the fact that you have to do four things here. And the way the code that we have for insertion sort does this is by using pairwise swaps.

So we're almost done. Now, we have the key at 3. And now, 3 needs to get put into the correct position. And so you've got to do a few swaps. This is the last step. And what happens here is 3 is going to get swapped with 6. And then 3 needs to get swapped with 5. And then 3 needs to get swapped with 4. And then, since 3 is greater than 2, you're done. So you have 1, 2, 3, 4, 5, 6.

And that's it. So, analysis. How many steps do I have?

AUDIENCE: n squared?

PROFESSOR: No, how many steps do I have? I guess that wasn't a good question. If I think of a step as being a movement of the key, how many steps do I have? I have theta n steps. And in this case, you can think of it as n minus 1 steps, since you started with 2. But let's just call it theta n steps, in terms of key positions.

And you're right. It is n square because, at any given step, it's quite possible that I have to do theta n work. And one example is this one, right here, where I had to do four swaps. And in general, you can construct a scenario where, towards the end of the algorithm, you'd have to do theta n work. But if you had a list that was reverse sorted. You would, essentially, have to do, on an average n by two swaps as you go through each of the steps. And that's theta n.

So each step is theta n swaps. And when I say swaps, I could also say each step is theta n compares and swaps. And this is going to be important because I'm going to ask you an interesting question in a minute. But let me summarize. What I have here is a theta n squared algorithm. The reason this is a theta n squared algorithm is because I have theta n steps and each step is theta n.

When I'm counting, what am I counting it terms of operations? The assumption here-- unspoken assumption-- has been that an operation is a compare and a swap and they're, essentially, equal in cost.

And in most computers, that's true. You have a single instruction and, say, the x86 or the MIPS architecture that can do a compare, and the same thing for swapping registers. So perfectly reasonably assumption that compares and swaps for numbers have exactly the same cost. But if you had a record and you were comparing records, and the comparison function that you used for the records was in itself a method call or a subroutine, it's quite possible that all you're doing is swapping pointers or references to do the swap, but the comparison could be substantially more expensive.

Most of the time-- and we'll differentiate if it becomes necessary-- we're going to be counting comparisons in the sorting algorithms that we'll be putting out. And we'll be assuming that either comparison swaps are roughly the same or that compares are-- and we'll say which one, of course-- that compares are substantially more expensive than swaps. So if you had either of those cases for insertion sort, you have a theta n squared algorithm. You have theta n squared compares and theta n squared swaps.

Now, here's a question. Let's say that compares are more expensive than swaps. And so, I'm concerned about the theta n squared comparison cost. I'm not as concerned, because of the constant factors involved, with the theta n squared swap cost.

This is a question question. What's a simple fix-- change to this algorithm that would give me a better complexity in the case where compares are more expensive, or I'm only looking at the complexity of compares. So the theta whatever of compares. Anyone? Yeah, back there.

AUDIENCE: [INAUDIBLE]

PROFESSOR: You could compare with the middle. What did I call it? I called it something. What you just said, I called it something.

AUDIENCE: Binary search.

PROFESSOR: Binary search. That's right. Two cushions for this one. So you pick them up after lecture. So you're exactly right. You got it right. I called it binary search, up here.

And so you can take insertion sort and you can sort of trivially turn it into a theta n log n algorithm if we are talking about n being the number of compares. And all you have to do to do that is to say, you know what, I'm going to replace this with binary search. And you can do that-- and that was the key observation-- because A of 0 through i minus 1 is already sorted. And so you can do binary search on that part of the array.

So let me just write that down. Do a binary search on A of 0 through i minus 1, which is already sorted. And essentially, you can think of it as theta log i time, and for each of those steps. And so then you get your theta n log n theta n log n in terms of compares.

Does this help the swaps for an array data structure? No, because binary search will require insertion into A of 0 though i minus 1. So here's the problem. Why don't we have a full-fledged theta n log n algorithm, regardless of the cost of compares or swaps? We don't quite have that. We don't quite have that because we need to insert our A of i into the right position into A of 0 through i minus 1.

You do that if you have an array structure, it might get into the middle. And you have to shift things over to the right. And when you shift things over to the right, in the worst case, you may be shifting a lot of things over to the right. And that gets back to worst case complexity of theta n.

So a binary search in insertion sort gives you theta n log n for compares. But it's still theta n squared for swaps. So as you can see, there's many varieties of sorting algorithms. We just looked at a couple of them. And they were both insertion sort.

The second one that I just put up is, I guess, technically called binary insertion sort because it does binary search. And the vanilla insertion sort is the one that you have the code for in the doc dis program, or at least one of the doc dis files.

So let's move on and talk about a different algorithm. So what we'd like to do, now-- this class is about constant improvement. We're never happy. We always want to do a little bit better. And eventually, once we run out of room from an asymptotic standpoint, you take these other classes where you try and improve constant factors and get 10%, and 5%, and 1%, and so on, and so forth.

But we'll stick to improving asymptotic complexity. And we're not quite happy with binary insertion sort because, in the case of numbers, our binary insertion sort has theta n squared complexity, if you look at swaps. So we'd like to go find an algorithm that is theta n log n.

And I guess, eventually, we'll have to stop. But Erik will take care of that. There's a reason to stop. It's when you can prove that you can't do any better. And so we'll get to that, eventually.

So merge sort is also something that you've probably seen. But there probably will be a couple of subtleties that come out as I describe this algorithm that, hopefully, will be interesting to those of you who already know merge sort. And for those of you who don't, it's a very pretty algorithm.

It's a standard recursion algorithm-- recursive algorithm-- similar to a binary search. What we do, here, is we have an array, A. We split it into two parts, L and R. And essentially, we kind of do no work, really.

In terms of the L and R in the sense that we just call, we keep splitting, splitting, splitting. And all the work is done down at the bottom in this routine called merge, where we are merging a pair of elements at the leaves. And then, we merge two pairs and get four elements. And then we merge four tuples of elements, et cetera, and go all the way up.

So while I'm just saying L terms into L prime, out here, there's no real explicit code that you can see that turns L into L prime. It happens really later. There's no real sorting code, here. It happens in the merge routine. And you'll see that quite clearly when we run through an example.

So you have L and R turn into L prime and R prime. And what we end up getting is a sorted array, A. And we have what's called a merge routine that takes L prime and R prime and merges them into the sorted array.

So at the top level, what you see is split into two, and do a merge, and get to the sorted array. The input is of size n. You have two arrays of size n over 2. These are two sorted arrays of size n over 2. And then, finally, you have a sorted array of size n.

So if you want to follow the recursive of execution of this in a small example, then you'll be able to see how this works. And we'll do a fairly straightforward example with 8 elements. So at the top level-- before we get there, merge is going to assume that you have two sorted arrays, and merge them together. That's the invariant in merge sort, or for the merge routine. It assumes the inputs are sorted-- L and R. Actually I should say, L prime and R prime.

So let's say you have 20, 13, 7, and 2. You have 12, 11, 9, and 1. And this could be L prime. And this could be R prime. What you have is what we call a two finger algorithm.

And so you've got two fingers and each of them point to something. And in this case, one of them is pointing to L. My left finger is pointing to L prime, or some element L prime. My right finger is pointing to some element in R prime.

And I'm going to compare the two elements that my fingers are pointing to. And I'm going to choose, in this case, the smaller of those elements. And I'm going to put them into the sorted array.

So start out here. Look at that and that. And I compared 2 and 1. And which is smaller? 1 is smaller. So I'm going to write 1 down. This is a two finger algo for merge. And I put 1 down.

When I put 1 down, I had to cross out 1. So effectively, what happens is-- let me just circle that instead of crossing it out. And my finger moves up to 9. So now I'm pointing at 2 and 9. And I repeat this step.

So now, in this case, 2 is smaller. So I'm going to go ahead and write 2 down. And I can cross out 2 and move my finger up to 7. And so that's it. I won't bore you with the rest of the steps.

It's essentially walking up. You have a couple of pointers and you're walking up these two arrays. And you're writing down 1, 2, 7, 9, 11, 12, 13, 20. And that's your merge routine.

And all of the work, really, is done in the merge routine because, other than that, the body is simply a recursive call. You have to, obviously, split the array. But that's fairly straightforward.

If you have an array, A 0 through n-- and depending on whether n is odd or even-- you could imagine that you set L to be A 0 n by 2 minus 1, and R similarly. And so you just split it halfway in the middle. I'll talk about that a little bit more. There's a subtlety associated with that that we'll get to in a few minutes.

But to finish up in terms of the computation of merge sort. This is it. The merge routine is doing most, if not all, of the work. And this two finger algorithm is going to be able to take two sorted arrays and put them into a single sorted array by interspersing, or interleaving, these elements. And what's the complexity of merge if I have two arrays of size n over 2, here? What do I have?

AUDIENCE: n.

PROFESSOR: n. We'll give you a cushion, too. theta n complexity. So far so good.

I know you know the answer as to what the complexity of merge sort is. But I'm guessing that most of you won't be able to prove it to me because I'm kind of a hard guy to prove something to. And I could always say, no, I don't believe you or I don't understand.

The complexity-- and you've said this before, in class, and I think Erik's mentioned it-- the overall complexity of this algorithm is theta n log n And where does that come from? How do you prove that?

And so what we'll do, now, is take a look at merge sort. And we'll look at the recursion tree. And we'll try and-- there are many ways of proving that merge sort is theta n log n.

The way we're going to do this is what's called proof by picture. And it's not an established proof technique, but it's something that is very helpful to get an intuition behind the proof and why the result is true. And you can always take that and you can formalize it and make this something that everyone believes. And we'll also look at substitution, possibly in section tomorrow, for recurrence solving.

So where we're right now is that we have a divide and conquer algorithm that has a merge step that is theta n. And so, if I just look at this structure that I have here, I can write a recurrence for merge sort that looks like this. So when I say complexity, I can say T of n, which is the work done for n items, is going to be some constant time in order to divide the array.

So this could be the part corresponding to dividing the array. And there's going to be two problems of size n over 2. And so I have 2 T of n over 2. And this is the recursive part. And I'm going to have c times n, which is the merge part. And that's some constant times n, which is what we have, here, with respect to the theta n complexity.

So you have a recurrence like this and I know some of you have seen recurrences in 6.042. And you know how to solve this. What I'd like to do is show you this recursion tree expansion that, not only tells you how to solve this occurrence, but also gives you a means of solving recurrences where, instead of having c of n, you have something else out here. You have f of n, which is a different function from the linear function. And this recursion tree is, in my mind, the simplest way of arguing the theta n log n complexity of merge sort.

So what I want to do is expand this recurrence out. And let's do that over here. So I have c of n on top. I'm going to ignore this constant factor because c of n dominates. So I'll just start with c of n. I want to break things up, as I do the recursion.

So when I go c of n, at the top level-- that's the work I have to do at the merge, at the top level. And then when I go down to two smaller problems, each of them is size n over 2. So I do c times n divided by 2 [INAUDIBLE]. So this is just a constant c. I didn't want to write thetas up here. You could. And I'll say a little bit more about that later. But think of this cn as representing the theta n complexity. And c is this constant.

So c times n, here. c times n over 2, here. And then when I keep going, I have c times n over 4, c times n over 4, et cetera, and so on, and so forth. And when I come down all the way here, n is eventually going to become 1-- or essentially a constant-- and I'm going to have a bunch of c's here.

So here's another question, that I'd like you to answer. Someone tell me what the number of levels in this tree are, precisely, and the number of leaves in this tree are, precisely.

AUDIENCE: The number of levels is log n plus 1.

PROFESSOR: Log n plus 1. Log to the base 2 plus 1. And the number of leaves? You raised your hand back there, first. Number of leaves.

AUDIENCE: I think n.

PROFESSOR: Yeah, you're right. You think right. So 1 plus log n and n leaves. When n becomes 1, how many of them do you have? You're down to a single element, which is, by definition, sorted. And you have n leaves.

So now let's add up the work. I really like this picture because it's just so intuitive in terms of getting us the result that we're looking for. So you add up the work in each of the levels of this tree. So the top level is cn. The second level is cn because I added 1/2 and 1/2, cn, cn. Wow. What symmetry.

So you're doing the same amount of work modulo the constant factors, here, with what's going on with the c1, which we've ignored, but roughly the same amount of work in each of the levels. And now, you know how many levels there are. It's 1 plus log n.

So if you want to write an equation for T of n, it's 1 plus log n times c of n, which is theta of n log n. So I've mixed in constants c and thetas. For the purposes of this description, they're interchangeable. You will see recurrences that look like this, in class. And things like that.

Don't get confused. It's just a constant multiplicative factor in front of the function that you have. And it's just a little easier, I think, to write down these constant factors and realize that the amount of work done is the same in each of the leaves. And once you know the dimensions of this tree, in terms of levels and in terms of the number of leaves, you get your result.

So we've looked at two algorithm, so far. And insertion sort, if you talk about numbers, is theta n squared for swaps. Merge sort is theta n log n. Here's another interesting question. What is one advantage of insertion sort over merge sort?

AUDIENCE: [INAUDIBLE]

PROFESSOR: What does that mean?

AUDIENCE: You don't have to move elements outside of [INAUDIBLE].

PROFESSOR: That's exactly right. That's exactly right. So the two guys who answered the questions before with the levels, and you. Come to me after class.

So that's a great answer. It's in-place sorting is something that has to do with auxiliary space. And so what you see, here-- and it was a bit hidden, here. But the fact of the matter is that you had L prime and R prime. And L prime and R prime are different from L and R, which were the initial halves of the inputs to the sorting algorithm.

And what I said here is, we're going to dump this into A. That's what this picture shows. This says sorted array, A. And so you had to make a copy of the array-- the two halves L and R-- in order to do the recursion, and then to take the results and put them into the sorted array, A.

So you needed-- in merge sort-- you needed theta n auxiliary space. So merge sort, you need theta n extra space. And the definition of in-place sorting implies that you have theta 1-- constant-- auxiliary space.

The auxiliary space for insertion sort is simply that temporary variable that you need when you swap two elements. So when you want to swap a couple of registers, you gotta store one of the values in a temporary location, override the other, et cetera. And that's the theta 1 auxiliary space for insertion sort.

So there is an advantage of the version of insertion sort we've talked about, today, over merge sort. And if you have a billion elements, that's potentially something you don't want to store in memory. If you want to do something really fast and do everything in cache or main memory, and you want to sort billions are maybe even trillions of items, this becomes an important consideration.

I will say that you can reduce the constant factor of the theta n. So in the vanilla scheme, you could imagine that you have to have a copy of the array. So if you had n elements, you essentially have n extra items of storage. You can make that n over 2 with a simple coding trick by keeping 1/2 of A.

You can throw away one of the L's or one of the R's. And you can get it down to n over 2. And that turns out-- it's a reasonable thing to do if you have a billion elements and you want to reduce your storage by a constant factor. So that's one coding trick.

Now it turns out that you can actually go further. And there's a fairly sophisticated algorithm that's sort of beyond the scope of 6.006 that's an in-place merge sort. And this in-place merge sort is kind of impractical in the sense that it doesn't do very well in terms of the constant factors.

So while it's in-place and it's still theta n log n. The problem is that the running time of an in-place merge sort is much worse than the regular merge sort that uses theta n auxiliary space.

So people don't really use in-place merge sort. It's a great paper. It's a great thing to read. Its analysis is a bit sophisticated for double 0 6. So we wont go there. But it does exist. So you can take merge sort, and I just want to let you know that you can do things in-place.

In terms of numbers, some experiments we ran a few years ago-- so these may not be completely valid because I'm going to actually give you numbers-- but merge sort in Python, if you write a little curve fit program to do this, is 2.2n log n microseconds for a given n.

So this is the merge sort routine. And if you look at insertion sort, in Python, that's something like 0.2 n square microseconds. So you see the constant factors here.

If you do insertion sort in C, which is a compiled language, then, it's much faster. It's about 20 times faster. It's 0.01 n squared microseconds. So a little bit of practice on the side. We do ask you to write code. And this is important. The reason we're interested in algorithms is because people want to run them.

And what you can see is that you can actually find an n-- so regardless of whether you're Python or C, this tells you that asymptotic complexity is pretty important because, once n gets beyond about 4,000, you're going to see that merge sort in Python beats insertion sort in C.

So the constant factors get subsumed beyond certain values of n. So that's why asymptotic complexity is important. You do have a factor of 20, here, but that doesn't really help you in terms of keeping an n square algorithm competitive. It stays competitive for a little bit longer, but then falls behind.

That's what I wanted to cover for sorting. So hopefully, you have a sense of what happens with these two sorting algorithms. We'll look at a very different sorting algorithm next time, using heaps, which is a different data structure.

The last thing I want to do in the couple minutes I have left is give you a little more intuition as to recurrence solving based on this diagram that I wrote up there. And so we're going to use exactly this structure. And we're going to look at a couple of different recurrences that I won't really motivate in terms of having a specific algorithm, but I'll just write out the recurrence. And we'll look at the recursion tree for that. And I'll try and tease out of you the complexity associated with these recurrences of the overall complexity.

So let's take a look at T of n equals 2 T of n over 2 plus c n squared. Let me just call that c-- no need for the brackets. So constant c times n squared.

So if you had a crummy merge routine, and it was taking n square, and you coded it up wrong. It's not a great motivation for this recurrence, but it's a way this recurrence could have come up.

So what does this recursive tree look like? Well it looks kind of the same, obviously. You have c n square; you have c n square divided by 4; c n square divided by 4; c n square divided by 16, four times. Looking a little bit different from the other one.

The levels and the leaves are exactly the same. Eventually n is going to go down to 1. So you will see c all the way here. And you're going to have n leaves. And you will have, as before, 1 plus log n levels. Everything is the same.

And this is why I like this recursive tree formulation so much because, now, all I have to do is add up the work associated with each of the levels to get the solution to the recurrence. Now, take a look at what happens, here. c n square; c n square divided by 2; c n square divided by 4. And this is n times c. So what does that add up to?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah, exactly. Exactly right. So if you look at what happens, here, this dominates. All of the other things are actually less than that. And you said bounded by two c n square because this part is bounded by c n square and I already have c n square up at the top.

So this particular algorithm that corresponds to this crummy merge sort, or wherever this recurrence came from, is a theta n squared algorithm. And in this case, all of the work done is at the root-- at the top level of the recursion. Here, there was a roughly equal amount of work done in each of the different levels. Here, all of the work was done at the root.

And so to close up shop, here, let me just give you real quick a recurrence where all of the work is done at the leaves, just for closure. So if I had, magically, a merge routine that actually happened in constant time, either through buggy analysis, or because of it was buggy, then what does the tree look like for that?

And I can think of this as being theta 1. Or I can think of this as being just a constant c. I'll stick with that. So I have c, c, c. Woah, I tried to move that up. That doesn't work.

So I have n leaves, as before. And so if I look at what I have, here, I have c at the top level. I have 2c, and so on and so forth. 4c. And then I go all the way down to nc.

And so what happens here is this dominates. And so, in this recurrence, the whole thing runs in theta n. So the solution to that is theta n. And what you have here is all of the work being done at the leaves.

We're not going to really cover this theorem that gives you a mechanical way of figuring this out because we think the recursive tree is a better way of looking at. But you can see that, depending on what that function is, in terms of the work being done in the merge routine, you'd have different versions of recurrences. I'll stick around, and people who answered questions, please pick up you cushions. See you next time.


Source: Srini Devadas, https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-3-insertion-sort-merge-sort/
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.

Last modified: Thursday, January 25, 2024, 12:37 PM