Reservoir Sampling Stream

Start Timer

0:00:00

Upvote
1
Downvote
Save question
Mark as completed
View comments (13)

Imagine you’re monitoring a live data stream of user actions on Yelp, such as page views or reviews, where each action is represented by an integer. The stream is potentially infinite, and you don’t know how many actions will arrive in total. Your task is to write an algorithm that, at any point, can select one action from all those seen so far, ensuring that every action has an equal chance of being chosen—no matter how many have arrived.

This is a classic scenario in data engineering and real-time analytics, where storing the entire stream is impractical due to memory constraints. Your algorithm must work efficiently with just a constant amount of extra space, updating your selection as each new action arrives.

Example:

Suppose the stream is 5, 8, 12, 7: - After seeing 5: selection must be 5. - After seeing 8: selection is 5 or 8, each with probability 0.5. - After seeing 12: selection is 5, 8, or 12, each with probability 13. - After seeing 7: selection is 5, 8, 12, or 7, each with probability 14.

Write the function below to implement this logic:

def select_random(x, current_choice, count):
    """
    x: int
        The new value arriving from the stream.
    current_choice: int
        The value currently selected.
    count: int
        Number of elements seen so far (including x).

    Returns:
        int -> Updated selected value.
    """
    # Your code here

Note: You may only use O(1) additional space and cannot store the entire stream. The function is called once per incoming element.

.
.
.
.
.


Comments

Loading comments