How do I select a single row by time among rows with the same name?
-
There are many rows that have the same value in the 'name' column but different dates.
From each group I only need to keep the row with the most recent date. I've been trying to do this for six hours without success. Any ideas?
-
Use the Pandas module (https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html):
- Read the Excel file into a Pandas DataFrame.
- Sort the rows by date.
- Group the sorted rows by name.
- Select only the last row from each group.
- Save the result to a new Excel file.
Example:
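import pandas as pd
# parse "date" as datetime so that sorting is chronological
df = pd.read_excel("filename.xlsx", parse_dates=["date"])
# sort oldest to newest, group by name, keep the last (newest) entry of each group
res = df.sort_values("date").groupby("name", as_index=False).last()
res.to_excel("result.xlsx", index=False)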
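Note that `GroupBy.last()` works column by column: for each column it takes the last non-null value in the group, so if some cells are empty the resulting row can combine values from different source rows. A minimal sketch, assuming an in-memory DataFrame with a hypothetical extra column `price`, comparing it with `GroupBy.tail(1)`, which returns the actual last row of each group:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the newest "aaa" row has no price
df = pd.DataFrame({
    "name": ["aaa", "aaa", "bbb"],
    "date": pd.to_datetime(["2021-09-09", "2021-10-10", "2020-01-01"]),
    "price": [1.0, np.nan, 2.0],
})

ordered = df.sort_values("date")

# last(): last non-null value per column, so "aaa" gets price 1.0 from the older row
mixed = ordered.groupby("name", as_index=False).last()

# tail(1): the whole last row of each group, NaN included, original index kept
whole_rows = ordered.groupby("name").tail(1)

print(mixed)
print(whole_rows)
```

If every row is complete, both give the same values; `tail(1)` is the safer choice when gaps are possible.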
Sample data:
  name        date
0  aaa  2021-10-10
1  bbb  2020-01-01
2  aaa  2021-09-09
3  bbb  2020-02-02
4  ccc  2000-12-31
Result:
In [223]: df.sort_values("date").groupby("name", as_index=False).last()
Out[223]:
  name        date
0  aaa  2021-10-10
1  bbb  2020-02-02
2  ccc  2000-12-31
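The same selection can also be done without `groupby`: after sorting by date, `DataFrame.drop_duplicates` with `keep="last"` keeps the newest row for each name. A sketch on the sample data above, built in memory instead of read from Excel:

```python
import pandas as pd

# The sample data from above, with "date" as datetime
df = pd.DataFrame({
    "name": ["aaa", "bbb", "aaa", "bbb", "ccc"],
    "date": pd.to_datetime(["2021-10-10", "2020-01-01", "2021-09-09",
                            "2020-02-02", "2000-12-31"]),
})

res = (
    df.sort_values("date")                         # oldest to newest
      .drop_duplicates(subset="name", keep="last")  # newest row per name
      .sort_values("name")                         # optional: order output by name
      .reset_index(drop=True)
)
print(res)
```

Unlike `last()`, this keeps whole rows, so values from different rows are never mixed.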
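Solution for the file you posted:
import pandas as pd
# "Время договора" = contract time, "Наименование инструмента" = instrument name
df = pd.read_excel("test2.xlsx", parse_dates=["Время договора"])
# for each instrument keep the single row with the latest contract time
res = df.groupby("Наименование инструмента", as_index=False).apply(lambda x: x.nlargest(1, "Время договора"))
res.to_excel("result.xlsx", index=False)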
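Another option, not part of the original answer, is to find the row label of the latest time in each group with `idxmax` and select those rows with `.loc`; this avoids `apply` and returns whole rows. A sketch with the same column names, assuming `Время договора` holds datetimes; the instruments and times below are made up for illustration:

```python
import pandas as pd

# Made-up rows using the column names from the posted file
df = pd.DataFrame({
    "Наименование инструмента": ["A", "A", "B"],
    "Время договора": pd.to_datetime(["2021-01-01 10:00", "2021-01-01 12:00",
                                      "2021-01-02 09:00"]),
})

# Row label of the latest contract time within each instrument
latest_idx = df.groupby("Наименование инструмента")["Время договора"].idxmax()

# Select those rows; the original index labels are kept
res = df.loc[latest_idx]
print(res)
```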