Reservoir Sampling Stream
0:00:00
Imagine you’re monitoring a live data stream of user actions on Yelp, such as page views or reviews, where each action is represented by an integer. The stream is potentially infinite, and you don’t know how many actions will arrive in total. Your task is to write an algorithm that, at any point, can select one action from all those seen so far, ensuring that every action has an equal chance of being chosen—no matter how many have arrived.
This is a classic scenario in data engineering and real-time analytics, where storing the entire stream is impractical due to memory constraints. Your algorithm must work efficiently with just a constant amount of extra space, updating your selection as each new action arrives.
Example:
Suppose the stream is 5, 8, 12, 7: - After seeing 5: selection must be 5. - After seeing 8: selection is 5 or 8, each with probability 0.5. - After seeing 12: selection is 5, 8, or 12, each with probability 1⁄3. - After seeing 7: selection is 5, 8, 12, or 7, each with probability 1⁄4.
Write the function below to implement this logic:
def select_random(x, current_choice, count):
"""
x: int
The new value arriving from the stream.
current_choice: int
The value currently selected.
count: int
Number of elements seen so far (including x).
Returns:
int -> Updated selected value.
"""
# Your code here
Note: You may only use O(1) additional space and cannot store the entire stream. The function is called once per incoming element.
.
.
.
.
Comments