Backward Progress

Today I found myself needing to use the comm utility to compare two 20GB files. This ended up taking about 10 minutes, and while it was running I got curious about how much time was left. Knowing that comm must read through both files before it finishes, I decided to see if I could build a simple progress indicator based on the read offset of one of its file descriptors.

I jumped straight into the /proc filesystem and discovered the fdinfo directory. This directory contains a file for each of the process’ file descriptors, reporting the current offset (pos), the open flags (flags) and the mount ID (mnt_id). With that information I threw together a small, somewhat dense, progress script:
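To make the fdinfo layout concrete, here is a minimal sketch of pulling the offset out of such a file. The helper name fdinfo_pos and the sample numbers are invented for illustration, and the PID and descriptor in the trailing comment are placeholders:

```shell
#!/bin/bash
# Extract the "pos" field (current byte offset) from fdinfo-formatted text on stdin.
fdinfo_pos() {
    awk '$1 == "pos:" {print $2}'
}

# Illustrative fdinfo contents; the numbers are invented for this example.
sample=$'pos:\t1073741824\nflags:\t0100000\nmnt_id:\t29'

printf '%s\n' "$sample" | fdinfo_pos

# Against a live process (PID and descriptor number are placeholders):
#   fdinfo_pos < /proc/12345/fdinfo/3
```

Matching on the first field being exactly "pos:" keeps the helper from picking up any other line that happens to contain those letters.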

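#!/bin/bash
# Print how far comm has read through its first input file, as a percentage.
PID=$(pidof comm)
# First lsof row whose FD column ends in "r" (opened for reading), plus its SIZE/OFF column.
read FD SZ <<< $(awk '$4~/r$/ {print $4 " " $7; exit}' <(lsof -p $PID))
# Multiply before dividing so scale=2 does not truncate the ratio to whole percents.
echo "scale=2; $(awk '/pos/ {print $2}' /proc/$PID/fdinfo/${FD%r}) * 100 / $SZ" | bc -l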

This script takes the first file descriptor opened for reading, along with the size of the underlying file, from lsof output, then uses the bc language to calculate the percentage of bytes already read. Save it as progress.sh, add a little watch magic, and you get real-time updates every second:

watch -n1 ./progress.sh
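The percentage step can be checked on its own, away from a running comm. This is a rough sketch that swaps bc for awk's printf purely to keep the example self-contained. The function name percent_read and the byte counts (5 GiB read out of 20 GiB) are made up for illustration:

```shell
#!/bin/bash
# Print pos as a percentage of size, with two decimal places.
# Multiplying by 100 before dividing keeps the fractional digits.
percent_read() {
    awk -v pos="$1" -v size="$2" 'BEGIN { printf "%.2f\n", pos * 100 / size }'
}

# Hypothetical values: 5 GiB read so far out of a 20 GiB file.
percent_read 5368709120 21474836480
```

With those inputs the function prints 25.00.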

This post originally appeared on Medium.
