Find biggest files in AWS EFS?



  • I have an EFS with a crazily high number of files, many being small. The directory tree is large as well.

    Listing directories can take an immensely long time.

    The EFS clearly has some very large files that need to be pruned (I can infer this from the app's architecture and file-size estimates for its main use case), but I can't locate the big files given how long directory listings take.

    I could enable intelligent tiering to save money, but I'd like to identify and fix the root issue. Are there any options to speed up a search?

    It would be nice if I could go into each directory and get the size of just the first 5 files, to see whether any particular directories tend to hold large files, but every listing seems to be exhaustive and time-consuming.
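A sketch of the spot check I have in mind (assumption: GNU find is available on the client that mounts the EFS; `sample_dir` is a name I made up). Because head exits after 5 lines, find stops walking the directory early instead of producing an exhaustive listing:

```shell
# Print "size-in-bytes path" for at most 5 regular files directly
# inside the given directory, without recursing into subdirectories.
sample_dir() {
  find "$1" -maxdepth 1 -type f -printf '%s %p\n' | head -n 5
}
```

Something like `sample_dir /path/to/efs/some/subdir` would then give a cheap per-directory sample.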



  • The only options I think you have are the (admittedly time-consuming) find and du commands.

    You can use the du command to report the size of everything under a path, sort the output numerically, and keep the top 5. To do this for individual directories, update the path.

    du -sk /path/to/efs/* | sort -nr | head -n 5
    

    With find you can search for files greater than a certain size and dump the list to a file. You can probably parse it down to the top 5 with a little bit of piping.

    find /path/to/efs -type f -size +200M >> /file_to_store_output.txt &
    
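For instance, one pipeline can print every file's size and path and keep only the 5 largest (assumption: GNU find, which supports -printf; the demo directory below stands in for your real mount point):

```shell
# Demo tree standing in for the EFS mount -- substitute your own path.
EFS_ROOT=$(mktemp -d)
printf '%102400s' '' > "$EFS_ROOT/large.log"   # 100 KiB dummy file
printf 'x' > "$EFS_ROOT/tiny.txt"              # 1-byte dummy file

# %s = size in bytes, %p = path; sort numerically, largest first.
find "$EFS_ROOT" -type f -printf '%s %p\n' | sort -nr | head -n 5
```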

    The size suffix for your usage will probably be M (mebibytes) or G (gibibytes).

    I'd recommend running these as background processes and writing the output to a file.
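A minimal sketch of that, assuming the scan may outlive your SSH session (the EFS path is the same placeholder as above; the /tmp filenames are my own choice):

```shell
# nohup detaches the scan from the terminal so it survives logout;
# results go to one file, errors to a separate log.
nohup find /path/to/efs -type f -size +200M \
    > /tmp/large_files.txt 2> /tmp/large_files.err &
scan_pid=$!
echo "scan running as PID $scan_pid"
```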

    References
    https://man7.org/linux/man-pages/man1/find.1.html
    https://linuxcommand.org/lc3_man_pages/du1.html


