Problem Overview
You are designing a data structure that receives numbers one by one from a stream.
At any moment, you must be able to return the median of all numbers seen so far.
Definition of median:
-
If the count is odd → the middle value
-
If the count is even → the average of the two middle values
This problem is frequently asked by Google because it tests data structures, balancing logic, and real-time computation.
Example
Operations:
addNum(1)
addNum(2)
findMedian()
addNum(3)
findMedian()
State evolution:
[1] → median = 1
[1, 2] → median = (1 + 2) / 2 = 1.5
[1, 2, 3] → median = 2
Approach 1: Sort After Every Insertion
Idea:
-
Keep all numbers in a list
-
Sort the list after each insertion
-
Pick the middle value
Disadvantages:
-
Sorting costs O(n log n)
-
Too slow for up to 50,000 operations
-
Repeated sorting wastes time
This approach is unacceptable for large data streams.
Approach 2: Keep a Sorted Data Structure
Idea:
-
Insert numbers in sorted order
-
Use binary search to find the correct position
Disadvantages:
-
Insertion costs O(n) due to shifting elements
-
Still too slow overall
This improves sorting but still fails performance constraints.
Best Approach: Two Heaps Strategy
Core Idea
Split the numbers into two halves:
-
Left half → smaller numbers
-
Right half → larger numbers
Maintain the following rule:
-
Left side has equal or one more element than the right side
-
All elements in left ≤ all elements in right
This allows the median to be read instantly.
Data Structures Used
-
Max-heap (
left)
Stores the smaller half, largest element on top -
Min-heap (
right)
Stores the larger half, smallest element on top
Insertion Logic
When a new number arrives:
-
If it is smaller than or equal to the top of the left heap → push to left
-
Otherwise → push to right
-
Rebalance:
-
Left heap can have at most one extra element
-
If imbalance happens, move elements between heaps
-
Median Calculation
-
If both heaps have the same size
→ median is the average of both top elements -
If left heap has one extra element
→ median is the top of left heap
This works because left always contains the middle element.
Why This Works
-
Insertion takes O(log n)
-
Median retrieval takes O(1)
-
No sorting required
-
Works perfectly for streaming data
This is the optimal solution.
Time and Space Complexity
-
addNum → O(log n)
-
findMedian → O(1)
-
Space → O(n)
Final Code (Two Heaps)
class MedianFinder {
priority_queue<int> left;
priority_queue<int, vector<int>, greater<int>> right;
public:
MedianFinder() {}
void addNum(int num) {
if (left.empty() || num <= left.top())
left.push(num);
else
right.push(num);
if (left.size() > right.size() + 1) {
right.push(left.top());
left.pop();
} else if (right.size() > left.size()) {
left.push(right.top());
right.pop();
}
}
double findMedian() {
if (left.size() == right.size())
return (left.top() + right.top()) / 2.0;
return left.top();
}
};
