J
ATUALIZED TO PARTNERSI hope I understand the problem.Let's go straight to the point:Creating A Dataframe>>> import pandas as pd
>>> dfA = pd.DataFrame({"sinal": ["SP1", "BH", "NAC", "SP1", "SP1", "BH", "BH", "BH", "BH", "BH"],
"cod_programa": ["XXXXX", "XXXXX", "XXXXX", "XXXXX", "XXXXX", "ZZZZZ", "ZZZZZ", "ZZZZZ", "ZZZZZ", "ZZZZZ"],
"data": ["23/05/2020", "01/05/2020", "02/05/2020", "01/05/2020", "02/05/2020", "02/05/2020", "03/05/2020", "01/05/2020", "02/05/2020", "03/05/2020"],
"tempo_disponivel": [10, 300, 118, 106, 6, 218, 66, 44, 190, 202]})
>>> dfA
sinal cod_programa data tempo_disponivel
0 SP1 XXXXX 23/05/2020 10
1 BH XXXXX 01/05/2020 300
2 NAC XXXXX 02/05/2020 118
3 SP1 XXXXX 01/05/2020 106
4 SP1 XXXXX 02/05/2020 6
5 BH ZZZZZ 02/05/2020 218
6 BH ZZZZZ 03/05/2020 66
7 BH ZZZZZ 01/05/2020 44
8 BH ZZZZZ 02/05/2020 190
9 BH ZZZZZ 03/05/2020 202
Transforming date (string) for date>>> dfA["data"] = pd.to_datetime(dfA['data'], format="%d/%m/%Y")
>>> dfA
sinal cod_programa data tempo_disponivel
0 SP1 XXXXX 2020-05-23 10
1 BH XXXXX 2020-05-01 300
2 NAC XXXXX 2020-05-02 118
3 SP1 XXXXX 2020-05-01 106
4 SP1 XXXXX 2020-05-02 6
5 BH ZZZZZ 2020-05-02 218
6 BH ZZZZZ 2020-05-03 66
7 BH ZZZZZ 2020-05-01 44
8 BH ZZZZZ 2020-05-02 190
9 BH ZZZZZ 2020-05-03 202
Creating DataFrame B>>> dfB = pd.DataFrame({"sinal": ["SP1", "SP1", "BH", "SP1", "NAC", "SP1", "SP1", "BH", "NAC", "SP1", "DF", "RJ"],
... "cod_programa": ["XXXX", "XXXX", "XXXX", "XXXX", "XXXX", "XXXX", "XXXX", "ZZZZ", "ZZZZ", "ZZZZ", "ZZZZ", "ZZZZ"],
... "data_exibicao": ["23/05/2020", "09/05/2020", "02/05/2020", "16/05/2020", "16/05/2020", "30/05/2020", "02/05/2020", "01/05/2020", "01/05/2020", "01/05/2020", "01/05/2020", "01/05/2020"],
... "data_inicio_programa": ["02/05/2020 19:01:00", "02/05/2020 19:01:00", "02/05/2020 19:01:00", "02/05/2020 19:05:00", "02/05/2020 19:05:00", "02/05/2020 19:05:00", "02/05/2020 19:05:00", "01/05/2020 11:30:00", "01/05/2020 11:30:00", "01/05/2020 11:30:00", "01/05/2020 11:30:00", "01/05/2020 11:30:00"],
... "media_audiencia": [325104.42, 255104.42, 455104.42, 118375.44, 1053999.80, 1299650.50, 14091402.82, 150998.68, 7846891.62, 841810.51, 123071.38, 581696.70]})
Transforming Dates>>> dfB["data_exibicao"] = pd.to_datetime(dfB['data_exibicao'], format="%d/%m/%Y")
>>> dfB["data_inicio_programa"] = pd.to_datetime(dfB['data_inicio_programa'], format="%d/%m/%Y %H:%M:%S")
>>> dfB
sinal cod_programa data_exibicao data_inicio_programa media_audiencia
0 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
1 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
2 BH XXXX 02/05/2020 02/05/2020 19:01:00 455104.42
3 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
4 NAC XXXX 16/05/2020 02/05/2020 19:05:00 1053999.80
5 SP1 XXXX 30/05/2020 02/05/2020 19:05:00 1299650.50
6 SP1 XXXX 02/05/2020 02/05/2020 19:05:00 14091402.82
7 BH ZZZZ 01/05/2020 01/05/2020 11:30:00 150998.68
8 NAC ZZZZ 01/05/2020 01/05/2020 11:30:00 7846891.62
9 SP1 ZZZZ 01/05/2020 01/05/2020 11:30:00 841810.51
10 DF ZZZZ 01/05/2020 01/05/2020 11:30:00 123071.38
11 RJ ZZZZ 01/05/2020 01/05/2020 11:30:00 581696.70
Catching the Existing Signals>>> sinais = list(set(dfB["sinal"]))
>>> sinais
['NAC', 'BH', 'DF', 'RJ', 'SP1']
Catching the biggest dates by sign>>> datas = []
>>> for sinal in sinais:
... dts = dfA[(dfA["sinal"] == sinal)].nlargest(4, "data")
... tmp = [row["data"] for _, row in dts.iterrows()]
... datas.append((sinal, tmp))
...
>>> datas
[('NAC', [Timestamp('2020-05-02 00:00:00')]), ('BH', [Timestamp('2020-05-03 00:00:00'), Timestamp('2020-05-03 00:00:00'), Timestamp('2020-05-02 00:00:00'), Timestamp('2020-05-02 00:00:00')]), ('DF', []), ('RJ', []), ('SP1', [Timestamp('2020-05-23 00:00:00'), Timestamp('2020-05-02 00:00:00'), Timestamp('2020-05-01 00:00:00')])]
Note: O numeric_only=True is already predicting future versions of pandas. The default of this parameter is None what in the future will bring the median of dates (datetime64 and datetime64tz).Catching medians by signal and date>>> for item in datas:
... sinal = item[0]
... dts = item[1]
... for dt in dts:
... d = f"{dt.year}-{dt.month}-{dt.day}"
... mediana = dfB[(dfB["sinal"] == sinal) & (dfB["data_exibicao"] == d)].median(numeric_only=True)
... print(sinal, d, mediana["media_audiencia"])
BH 2020-5-3 nan
BH 2020-5-3 nan
BH 2020-5-2 455104.42
BH 2020-5-2 455104.42
NAC 2020-5-2 nan
SP1 2020-5-23 325104.42
SP1 2020-5-2 14091402.82
SP1 2020-5-1 841810.51
Note: As I created the dataframes in hand, some dates did not hit, but reference server.I hope I helped