How can I quickly compute the median of a DataFrame over a rolling window?



  • I generate a random dataset and compute the median over a rolling window of 1,000 values:

    %%time
    import numpy as np
    import pandas as pd
    sr = pd.Series(np.random.randint(0, 100, size=20000))
    for i in range(10):
        sr.rolling(1000).apply(lambda x: np.median(x))
    

    Result:

    Wall time: 28.8 s

    That is about three seconds per pass. I need to run many such calculations, and the real data has 0.5M rows, not 20k.

    How can I compute a moving median faster?



  • Use pandas' built-in rolling methods:

    In [265]: %timeit sr.rolling(1000).median()
    14.4 ms ± 513 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    In [266]: %timeit sr.rolling(1000).apply(lambda x: np.median(x))
    1.72 s ± 88.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    On my laptop, the built-in method is about 119 times faster.

    P.S. A question in return: why do this many times? If you need the medians of several columns over the same sliding window, that can also be done with vectorized pandas methods, without loops.
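
    As a rough sketch of the multi-column case mentioned above: the column names, sizes, and random data below are made up for illustration and are not from the question. Calling `rolling(...).median()` on a whole DataFrame computes the rolling median of every column in one call, and the last lines check one column against the slower `apply`-based version.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: three columns of random integers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(5000, 3)), columns=["a", "b", "c"])

window = 1000  # same window size as in the question

# One vectorized call computes the rolling median of every column.
rolling_median = df.rolling(window).median()

# Sanity check: one column should match the slower apply-based version.
slow = df["a"].rolling(window).apply(np.median, raw=True)
assert np.allclose(rolling_median["a"], slow, equal_nan=True)
```

    The first `window - 1` rows of the result are NaN, because a full window is not yet available there.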


