Counting duplicates in a sorted sequence using command-line tools


Question

I have a command (cmd1) that greps through a log file to extract a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse-sorted list of numbers. There may be duplicates within this sorted list. I need to find the count for each unique number in that list.

For example, if the output of cmd1 is:

100 
100 
100 
99 
99 
26 
25 
24 
24

I need another command that I can pipe the above output to, so that I get:

100     3
99      2
26      1
25      1
24      2
5/16/2016 1:46:57 AM

Accepted Answer

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}'

The result is:

100 3
99  2
26  1
25  1
24  2
1/20/2019 11:02:43 AM

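uniq -c on its own does most of the work here, provided the input is already sorted (tested with GNU uniq 8.23). Note that it prints the count before the value, so the columns come out in the opposite order from the one requested.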
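
Since the output of cmd1 | sort -gr is already sorted, the tr and sort stages from the accepted answer may not be needed. Here is a minimal sketch along those lines, assuming one number per line on standard input. The function name count_sorted is made up for illustration, and the printf line only stands in for cmd1 | sort -gr.

```shell
# Hypothetical helper: expects already-sorted input, one value per line.
count_sorted() {
    # uniq -c emits "<count> <value>"; awk swaps the two columns
    # and separates them with a tab.
    uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Stand-in for: cmd1 | sort -gr
printf '%s\n' 100 100 100 99 99 26 25 24 24 | count_sorted
```

Because uniq only collapses adjacent identical lines, the order produced by sort -gr is preserved, so no second sort should be required. It compares lines as text, though, so values such as 24 and 24.0 would be counted separately.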


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow