Ranking a pandas DataFrame with a variable step



  • There is a DataFrame:

    import pandas as pd
    d = {'stores': ['AG21', 'AG41', 'AG85', 'AG45', 
    'AG31', 'AS25', 'AR81', 'AA43',
    'AG21', 'AD83', 'AA36', 'AG55',
    'AT58', 'AD11', 'AH32', 'AE17'], 
    'linear': [430, 145 , 120, 180,
    250, 250, 250, 320,
    376, 390, 420, 580,
    350, 190, 125, 390]}
    df = pd.DataFrame(data=d)
    df = df.sort_values(by='linear')
    df
    

    The linear column holds values computed by other code, sorted in ascending order.

    Ratings from 1 to 6 are then assigned to the rows manually. For example, for the DataFrame above, ratings assigned by eye from the linear column would look roughly like this:

    import pandas as pd
    d = {'stores': ['AG21', 'AG41', 'AG85', 'AG45', 
    'AG31', 'AS25', 'AR81', 'AA43',
    'AG21', 'AD83', 'AA36', 'AG55',
    'AT58', 'AD11', 'AH32', 'AE17'], 
    'linear': [430, 145 , 120, 180,
    250, 250, 250, 320,
    376, 390, 420, 580,
    350, 190, 125, 390]}
    df = pd.DataFrame(data=d)
    df = df.sort_values(by='linear')
    df['ratings'] = 1,1,1,2,2,3,3,3,4,4,4,5,5,5,5,6
    df
    

    The ratings are assigned roughly by how close each value is to the rows above it (after sorting): while the step between neighbouring values is small, the rating stays the same, and when the step is noticeably larger, the rating increases.

    There is not always a sixth or fifth rating. Example below:

    import pandas as pd
    d = {'stores': ['AG21', 'AG41', 'AG85', 'AG45', 
    'AG31', 'AS25', 'AR81', 'AA43'],
    'linear': [330, 145 , 120, 180,
    250, 150, 185, 320]}
    df = pd.DataFrame(data=d)
    df = df.sort_values(by='linear')
    df['ratings'] = 1,2,2,3,3,4,5,5
    df
    

    How can these ratings be assigned automatically?



  • If you already have the values and only need to split them into quantile bins, you can do this:

    df["cat"] = pd.qcut(df["linear"], 6, labels=False).values+1
    

    df:

       stores  linear  cat
    2    AG85     120    1
    14   AH32     125    1
    1    AG41     145    1
    3    AG45     180    2
    13   AD11     190    2
    4    AG31     250    2
    5    AS25     250    2
    6    AR81     250    2
    7    AA43     320    4
    12   AT58     350    4
    8    AG21     376    4
    9    AD83     390    5
    15   AE17     390    5
    10   AA36     420    6
    0    AG21     430    6
    11   AG55     580    6
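
    Category 3 does not appear in the output above. A minimal sketch for checking why: pd.qcut can return the quantile edges it computed (retbins=True), and counting the values per bin shows which bins stayed empty. The variable names here are only illustrative.

```python
import pandas as pd

linear = pd.Series([430, 145, 120, 180, 250, 250, 250, 320,
                    376, 390, 420, 580, 350, 190, 125, 390]).sort_values()

# retbins=True additionally returns the quantile edges that qcut computed
codes, edges = pd.qcut(linear, 6, labels=False, retbins=True)
print(edges)

# Count how many values landed in each of the six bins (1-based labels);
# reindex makes bins with no values show up with a count of 0
counts = (codes + 1).value_counts().reindex(range(1, 7), fill_value=0)
print(counts)
```

    Because the value 250 is repeated three times, two neighbouring quantile edges end up with no data point strictly between them, so that bin receives no rows.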
    

    If a category is missing (such as 3 in this example), it means that no values fall into the third bin under these conditions. If you need six evenly spaced categories, I suggest determining the intervals first and then passing them to pd.cut:

    import numpy as np

    # Seven evenly spaced edges give six equal-width intervals
    intervals = np.linspace(df["linear"].min(), df["linear"].max(), endpoint=True, num=7)
    print(intervals) # [120. 196.66666667 273.33333333 350. 426.66666667 503.33333333 580. ]
    df["cat"] = pd.cut(df["linear"], intervals, labels=False, include_lowest=True)+1
    

    df:

       stores  linear  cat
    2    AG85     120    1
    14   AH32     125    1
    1    AG41     145    1
    3    AG45     180    1
    13   AD11     190    1
    4    AG31     250    2
    5    AS25     250    2
    6    AR81     250    2
    7    AA43     320    3
    12   AT58     350    3
    8    AG21     376    4
    9    AD83     390    4
    15   AE17     390    4
    10   AA36     420    4
    0    AG21     430    5
    11   AG55     580    6
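
    As a possible shortcut, pd.cut also accepts the number of bins as an integer and then builds the equal-width edges itself. The sketch below compares that form with explicitly computed edges on the same data; the names explicit and shortcut are only illustrative.

```python
import numpy as np
import pandas as pd

linear = pd.Series([430, 145, 120, 180, 250, 250, 250, 320,
                    376, 390, 420, 580, 350, 190, 125, 390]).sort_values()

# Explicit edges: seven evenly spaced points between min and max
edges = np.linspace(linear.min(), linear.max(), num=7)
explicit = pd.cut(linear, edges, labels=False, include_lowest=True) + 1

# Integer form: pd.cut builds six equal-width bins over the data range itself
shortcut = pd.cut(linear, 6, labels=False) + 1

# True when both variants assign the same category to every row
print((explicit == shortcut).all())
```

    The explicit form is still useful when the edges have to be reused for another DataFrame.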
    

    UPDATE

    If you just need to split the data into approximately equal parts, at the cost of statistical significance, a simple grouping will do:

    import numpy as np
    import pandas as pd

    d = {'stores': ['AG21', 'AG41', 'AG85', 'AG45', 'AG31', 'AS25', 'AR81', 'AA43'], 'linear': [330, 145 , 120, 180, 250, 150, 185, 320]}
    df = pd.DataFrame(data=d)
    df = df.sort_values(by='linear')
    chunks = 6
    # Map each row position to one of `chunks` groups of roughly equal size
    df["cat"] = df.groupby(np.arange(len(df))//(len(df)/chunks)).ngroup()+1
    print(df)
    

    df:

      stores  linear  cat
    2   AG85     120    1
    1   AG41     145    1
    5   AS25     150    2
    3   AG45     180    3
    6   AR81     185    4
    4   AG31     250    4
    7   AA43     320    5
    0   AG21     330    6
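
    If the rating should instead follow the rule described in the question (stay the same while the step is small, increase when the step is large), one possible sketch is to flag the large gaps between neighbouring sorted values and accumulate the flags. The threshold max_step = 20 is an assumption picked for illustration, not something given in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'stores': ['AG21', 'AG41', 'AG85', 'AG45', 'AG31', 'AS25', 'AR81', 'AA43'],
    'linear': [330, 145, 120, 180, 250, 150, 185, 320],
}).sort_values(by='linear')

# Hypothetical threshold: a gap larger than this starts a new rating
max_step = 20

# Difference to the previous (smaller) value; the first row has no predecessor
gaps = df['linear'].diff().fillna(0)

# Every large gap bumps the rating by one; cumsum turns the flags into labels
df['rating'] = (gaps > max_step).cumsum() + 1
print(df)
```

    With this threshold the result matches the manual ratings 1,2,2,3,3,4,5,5 from the second example in the question. The same threshold does not reproduce the manual ratings of the 16-row example, and nothing here caps the number of ratings at 6, so the threshold would need tuning for each dataset.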
    

