Find the biggest files in AWS EFS?
-
I have an EFS with a crazily high number of files, many of them small. The directory tree is large as well.
Listing directories can take an immensely long time.
The EFS clearly has some very large files that need to be pruned by size (I can tell this from the app architecture and file-size estimates for its main use case), but I can't locate the big files given how long directory listings take.
I could enable Intelligent-Tiering to save money, but I'd like to identify and fix the root issue. Are there any options to speed up the search?
It would be nice if I could go into every directory and just get the sizes of the first 5 files to see whether specific directories tend to hold large files, but any listing seems to be exhaustive and time-consuming.
-
The only options I think you have are the time-consuming find and du commands.

You can use du to get the total size of each file and directory directly under a path (each directory's total includes everything beneath it), sort the output, and show the top 5. To do the same for an individual directory, change the path:

    du -sk /path/to/efs/* | sort -nr | head -n 5
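If the goal is to spot directories that hold large files themselves (rather than large subtrees), a variation on the du command above may help. This is a sketch assuming GNU coreutils du, whose -S (--separate-dirs) option reports each directory's size without adding in its subdirectories, so a single pass ranks directories by the files they contain directly. It still walks the whole tree, so expect it to take roughly as long as the other scans. The function name top_dirs_by_own_size is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank directories by the disk usage of the files they
# contain directly (subdirectories are NOT added in), in a single traversal.
# Assumes GNU du:  -S = --separate-dirs,  -k = sizes in KiB,
#                  -x = do not cross into other mounted filesystems.
top_dirs_by_own_size() {
  local root="$1" n="${2:-5}"
  # du prints "<KiB><TAB><directory>" for every directory under $root;
  # permission errors are discarded so they don't clutter the ranking.
  du -Skx -- "$root" 2>/dev/null | sort -nr | head -n "$n"
}

# Example (replace /path/to/efs with your mount point):
# top_dirs_by_own_size /path/to/efs 5
```

The directories at the top of that list are the ones worth opening with ls -lS, which sorts a single directory's files by size.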
With find you can search for files larger than a certain size and dump the list to a file. You can probably narrow it down to the top 5 with a little piping.

    find /path/to/efs -type f -size +200M >> /file_to_store_output.txt &
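One way to do that piping, assuming GNU findutils (the -printf action is not available in BSD/macOS find), is to have find print each file's size in bytes next to its path so that sort and head can rank them. The function name largest_files and its optional size-threshold argument are hypothetical; the threshold is passed straight to find's -size test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N largest regular files under a directory as
# "<bytes><TAB><path>", biggest first.
# Assumes GNU find: %s = file size in bytes, %p = path, -xdev = stay on one fs.
largest_files() {
  local root="$1" n="${2:-5}" min_size="${3:-+0c}"
  # The -size filter (default: any non-empty file) keeps the list that sort
  # has to handle small when a threshold such as +200M is given.
  find "$root" -xdev -type f -size "$min_size" -printf '%s\t%p\n' 2>/dev/null \
    | sort -nr \
    | head -n "$n"
}

# Example: top 5 files larger than 200 MiB.
# largest_files /path/to/efs 5 +200M
```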
The size units for your usage will probably be M (mebibytes) or G (gibibytes).

I'd recommend running these as background processes and writing the output to a file.
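A plain & job can be killed when the terminal or SSH session that started it hangs up, which matters for scans that run for hours. A minimal sketch using nohup, with stdout and stderr kept in separate files; the function name run_in_background, the BG_PID variable, and the /tmp output paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command immune to hangups (nohup), with stdout in
# $out and stderr in $out.err, and remember its PID in the global BG_PID.
run_in_background() {
  local out="$1"; shift
  nohup "$@" > "$out" 2> "$out.err" < /dev/null &
  BG_PID=$!   # global on purpose: lets the caller `wait "$BG_PID"` later
  echo "started PID $BG_PID, writing to $out"
}

# Examples, using the commands from this answer:
# run_in_background /tmp/efs_du.txt du -sk /path/to/efs/*
# run_in_background /tmp/efs_big_files.txt find /path/to/efs -type f -size +200M
# Once finished: sort -nr /tmp/efs_du.txt | head -n 5
```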
References
https://man7.org/linux/man-pages/man1/find.1.html
https://linuxcommand.org/lc3_man_pages/du1.html