I wanted a quick way to count character frequency in a text file and throw the result out as a bar chart.
Since I enjoy Python, it made sense to write it in Python. The bar chart uses pyplot from matplotlib.
It was also important to import collections: plain dictionaries are unordered (at least in Python versions before 3.7), so without an OrderedDict the bar chart would not display in alphabetical order.
```
C:\Users\phillipsme\Desktop\python>python.exe charfreqency.py cipher.txt
OrderedDict([('a', 135), ('b', 16), ('c', 60), ('d', 45), ('e', 174), ('f', 37),
('g', 21), ('h', 32), ('i', 122), ('j', 0), ('k', 15), ('l', 61), ('m', 30),
('n', 125), ('o', 110), ('p', 42), ('q', 2), ('r', 103), ('s', 116), ('t', 168),
('u', 54), ('v', 21), ('w', 16), ('x', 4), ('y', 27), ('z', 1)])
```
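The output above comes out alphabetically because the items are sorted before being wrapped in an OrderedDict. Here is a minimal sketch of just that step, using three counts borrowed from the run above and deliberately inserted out of order (the variable names are mine, for illustration only):

```python
import collections

# Three counts taken from the sample run, inserted out of alphabetical order.
counts = {}
counts['t'] = 168
counts['a'] = 135
counts['e'] = 174

# sorted() orders the (letter, count) pairs by letter; OrderedDict then
# remembers that order, so keys() and values() line up alphabetically.
ordered = collections.OrderedDict(sorted(counts.items()))

print(list(ordered.keys()))    # ['a', 'e', 't']
print(list(ordered.values()))  # [135, 174, 168]
```

Because keys() and values() come back in the same order, they can be handed straight to the tick labels and bar heights of a chart.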
The script uses a small function, charanal(), which returns an ordered dictionary holding the frequency of each letter.
```python
#!/usr/bin/env python
import sys
import collections
from matplotlib import pyplot

# Read the file named on the command line and lowercase its contents.
try:
    filename = sys.argv[1]
    with open(filename, 'r') as infile:
        filteredstring = infile.read().lower()
except (IndexError, IOError):
    print("Usage: %s filename.txt" % sys.argv[0])
    sys.exit(1)


def charanal(string):
    """Return an OrderedDict mapping each letter a-z to its count in string."""
    # Start every letter at zero so that absent letters still get a bar.
    results = {}
    for letter in 'abcdefghijklmnopqrstuvwxyz':
        results[letter] = 0
    # Count only a-z; digits, punctuation and whitespace are skipped.
    for char in string:
        if char in results:
            results[char] += 1
    return collections.OrderedDict(sorted(results.items()))


orderedfrequency = charanal(filteredstring)
print(orderedfrequency)
# One bar per letter, labelled with the letter itself.
pyplot.bar(range(len(orderedfrequency)), list(orderedfrequency.values()))
pyplot.xticks(range(len(orderedfrequency)), list(orderedfrequency.keys()), ha='left')
pyplot.show()
```
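As an aside, the counting step could probably also be written with collections.Counter from the standard library. This is only a sketch of an alternative, not what the script above does; count_letters and the sample sentence are hypothetical names and data of my own:

```python
import collections
import string


def count_letters(text):
    """Hypothetical alternative to charanal(): count a-z using Counter."""
    # Keep only lowercase ASCII letters, then let Counter tally them.
    counts = collections.Counter(
        c for c in text.lower() if c in string.ascii_lowercase
    )
    # Counter returns 0 for missing keys, so every letter gets an entry,
    # and iterating over ascii_lowercase fixes the alphabetical order.
    return collections.OrderedDict(
        (letter, counts[letter]) for letter in string.ascii_lowercase
    )


freq = count_letters("Hello, World!")
print(freq['l'], freq['o'], freq['z'])  # 3 2 0
```

The result has the same shape as the ordered dictionary returned by charanal(), so it should drop into the same plotting calls.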