How do I keep only the most recent row for each repeated name?



  • Many rows share the same value in the 'name' column but have different dates.

    I need to keep only the row with the most recent date from each group. I've been trying to do this for six hours without success. Any ideas?



  • Use the Pandas module (see https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html):

    • Read the Excel file into a Pandas DataFrame.
    • Sort the rows by date.
    • Group the sorted rows by name.
    • Select only the last row from each group.
    • Save the result to a new Excel file.

    Example:

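    import pandas as pd

    # Parse the "date" column as datetime so sorting is chronological, not alphabetical
    df = pd.read_excel("filename.xlsx", parse_dates=["date"])
    # Sort by date, then take the last (most recent) values of each name group
    res = df.sort_values("date").groupby("name", as_index=False).last()
    res.to_excel("result.xlsx", index=False)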
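    One caveat: `GroupBy.last` takes the last non-null value of each column separately, so if some cells are empty the result can combine values from different rows. A minimal sketch of an alternative that always keeps whole rows, using `drop_duplicates(subset="name", keep="last")` after sorting. It assumes an in-memory DataFrame (built from the sample data in this answer) in place of `pd.read_excel`, so it runs without an Excel file:

```python
import pandas as pd

# In-memory stand-in for pd.read_excel("filename.xlsx"), same columns as the example
df = pd.DataFrame({
    "name": ["aaa", "bbb", "aaa", "bbb", "ccc"],
    "date": pd.to_datetime(["2021-10-10", "2020-01-01", "2021-09-09",
                            "2020-02-02", "2000-12-31"]),
})

# After sorting by date, the newest row of each name is its last occurrence;
# drop_duplicates(keep="last") keeps exactly that whole row per name
res = (
    df.sort_values("date")
      .drop_duplicates(subset="name", keep="last")
      .sort_values("name")
      .reset_index(drop=True)
)
print(res)
```

    Because whole rows are kept, any other columns in the sheet stay consistent with the selected date.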

    Sample data:

      name       date
    0  aaa 2021-10-10
    1  bbb 2020-01-01
    2  aaa 2021-09-09
    3  bbb 2020-02-02
    4  ccc 2000-12-31

    Result:

    In [223]: df.sort_values("date").groupby("name", as_index=False).last()
    Out[223]:
      name       date
    0  aaa 2021-10-10
    1  bbb 2020-02-02
    2  ccc 2000-12-31
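    An alternative that needs no sorting is `idxmax`: for each name it returns the index label of the newest date, and `.loc` then selects those complete rows. A minimal sketch on the sample data, assuming the `date` column holds real datetimes (not text):

```python
import pandas as pd

# Sample data from this answer
df = pd.DataFrame({
    "name": ["aaa", "bbb", "aaa", "bbb", "ccc"],
    "date": pd.to_datetime(["2021-10-10", "2020-01-01", "2021-09-09",
                            "2020-02-02", "2000-12-31"]),
})

# Index label of the most recent date within each name group
newest = df.groupby("name")["date"].idxmax()

# Select those rows and renumber the index 0..n-1
res = df.loc[newest].reset_index(drop=True)
print(res)
```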

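    Solution for the file you posted:

    import pandas as pd

    # Column names from the posted file: "Время договора" = contract time,
    # "Наименование инструмента" = instrument name
    df = pd.read_excel("test2.xlsx", parse_dates=["Время договора"])
    # For each instrument keep the single row with the latest contract time
    res = df.groupby("Наименование инструмента", as_index=False).apply(lambda x: x.nlargest(1, "Время договора"))
    res.to_excel("result.xlsx", index=False)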
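    The same result can be obtained without `apply` (which can be slow when there are many instruments): sort by "Время договора" and keep the last row of each instrument with `groupby(...).tail(1)`. A minimal sketch; the rows below are hypothetical and only the two column names come from the posted code. In practice `df` would come from `pd.read_excel` as shown:

```python
import pandas as pd

name_col = "Наименование инструмента"  # instrument name
time_col = "Время договора"            # contract time

# Hypothetical rows standing in for test2.xlsx
df = pd.DataFrame({
    name_col: ["X", "Y", "X", "Y"],
    time_col: pd.to_datetime([
        "2021-03-01 10:15", "2021-03-01 11:00",
        "2021-03-02 09:30", "2021-02-28 16:45",
    ]),
})

res = (
    df.sort_values(time_col)   # oldest first, so the latest row of each group is last
      .groupby(name_col)
      .tail(1)                 # keep the last (latest) row of each instrument
      .sort_values(name_col)
      .reset_index(drop=True)
)
print(res)
```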


