{"id":1057,"date":"2013-08-07T13:32:16","date_gmt":"2013-08-07T12:32:16","guid":{"rendered":"http:\/\/www.phillips321.co.uk\/?p=1057"},"modified":"2013-08-07T13:56:51","modified_gmt":"2013-08-07T12:56:51","slug":"python-character-frequency-analysis","status":"publish","type":"post","link":"https:\/\/www.phillips321.co.uk\/2013\/08\/07\/python-character-frequency-analysis\/","title":{"rendered":"Python Character Frequency Analysis"},"content":{"rendered":"<p>I wanted a quick way to count <a href=\"http:\/\/en.wikipedia.org\/wiki\/Letter_frequency\" target=\"_blank\">character frequency<\/a> in a text file and plot the result as a bar chart.<\/p>\n<p>As I enjoy Python, it made sense to write it in Python. The bar chart uses the <a href=\"http:\/\/matplotlib.org\/api\/pyplot_api.html\" target=\"_blank\">pyplot bits from matplotlib<\/a>.<\/p>\n<p>It was also important to <a href=\"http:\/\/docs.python.org\/2\/library\/collections.html\" target=\"_blank\">import collections<\/a>: dictionaries are unordered, so without an OrderedDict the bar chart would not display in alphabetical order.<\/p>\n<div class=\"codecolorer-container text vibrant\" style=\"overflow:auto;white-space:nowrap;width:100%;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/><\/div><\/td><td><div class=\"text codecolorer\">C:\\Users\\phillipsme\\Desktop\\python&gt;python.exe charfreqency.py cipher.txt<br \/>\nOrderedDict([('a', 135), ('b', 16), ('c', 60), ('d', 45), ('e', 174), ('f', 37), ('g', 21),<br \/>\n('h', 32), ('i', 122), ('j', 0), ('k', 15), ('l', 61), ('m', 30), ('n', 125), ('o', 110),<br \/>\n('p', 42), ('q', 2), ('r', 103), ('s', 116), ('t', 168), ('u', 54), ('v', 21), ('w', 16),<br \/>\n('x', 4), ('y', 27), ('z', 1)])<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>A small function, &#8216;charanal()&#8217;, returns an ordered dictionary with the frequency of each letter.<\/p>\n<div 
class=\"codecolorer-container python vibrant\" style=\"overflow:auto;white-space:nowrap;width:100%;height:300px;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/>9<br \/>10<br \/>11<br \/>12<br \/>13<br \/>14<br \/>15<br \/>16<br \/>17<br \/>18<br \/>19<br \/>20<br \/>21<br \/>22<br \/>23<br \/>24<br \/><\/div><\/td><td><div class=\"python codecolorer\"><span class=\"co1\">#!\/usr\/bin\/env python<\/span><br \/>\n<span class=\"kw1\">import<\/span> <span class=\"kw3\">sys<\/span><span class=\"sy0\">,<\/span><span class=\"kw3\">collections<\/span><br \/>\n<span class=\"kw1\">from<\/span> matplotlib <span class=\"kw1\">import<\/span> pyplot<br \/>\n<span class=\"kw1\">try<\/span>:<br \/>\n&nbsp; &nbsp; filename<span class=\"sy0\">=<\/span><span class=\"kw3\">sys<\/span>.<span class=\"me1\">argv<\/span><span class=\"br0\">&#91;<\/span><span class=\"nu0\">1<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; rawstring<span class=\"sy0\">=<\/span><span class=\"kw2\">open<\/span><span class=\"br0\">&#40;<\/span>filename<span class=\"sy0\">,<\/span> <span class=\"st0\">'r'<\/span><span class=\"br0\">&#41;<\/span>.<span class=\"me1\">read<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; filteredstring<span class=\"sy0\">=<\/span>rawstring.<span class=\"me1\">lower<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span>.<span class=\"me1\">replace<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">'<span class=\"es0\">\\n<\/span>'<\/span><span class=\"sy0\">,<\/span><span class=\"st0\">''<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"kw1\">except<\/span>: <span class=\"kw1\">print<\/span> <span class=\"st0\">&quot;Usage: %s filename.txt&quot;<\/span> % <span class=\"kw3\">sys<\/span>.<span class=\"me1\">argv<\/span><span class=\"br0\">&#91;<\/span><span 
class=\"nu0\">0<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">;<\/span> <span class=\"kw3\">sys<\/span>.<span class=\"me1\">exit<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n<span class=\"kw1\">def<\/span> charanal<span class=\"br0\">&#40;<\/span><span class=\"kw3\">string<\/span><span class=\"br0\">&#41;<\/span>:<br \/>\n&nbsp; &nbsp; specials<span class=\"sy0\">=<\/span><span class=\"st0\">&quot;&quot;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">for<\/span> bad <span class=\"kw1\">in<\/span> <span class=\"kw2\">range<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">256<\/span><span class=\"br0\">&#41;<\/span>: <span class=\"kw1\">if<\/span> bad<span class=\"sy0\">&lt;<\/span><span class=\"nu0\">97<\/span> <span class=\"kw1\">or<\/span> bad<span class=\"sy0\">&gt;<\/span><span class=\"nu0\">122<\/span>: specials+<span class=\"sy0\">=<\/span><span class=\"kw2\">chr<\/span><span class=\"br0\">&#40;<\/span>bad<span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">for<\/span> char <span class=\"kw1\">in<\/span> specials: <span class=\"kw3\">string<\/span><span class=\"sy0\">=<\/span><span class=\"kw3\">string<\/span>.<span class=\"me1\">replace<\/span><span class=\"br0\">&#40;<\/span>char<span class=\"sy0\">,<\/span><span class=\"st0\">''<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; results<span class=\"sy0\">=<\/span><span class=\"br0\">&#123;<\/span><span class=\"br0\">&#125;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">for<\/span> letter <span class=\"kw1\">in<\/span> <span class=\"st0\">'abcdefghijklmnopqrstuvwxyz'<\/span>: results<span class=\"br0\">&#91;<\/span>letter<span class=\"br0\">&#93;<\/span><span class=\"sy0\">=<\/span><span class=\"nu0\">0<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">for<\/span> char <span class=\"kw1\">in<\/span> <span class=\"kw3\">string<\/span>: results<span class=\"br0\">&#91;<\/span>char<span 
class=\"br0\">&#93;<\/span>+<span class=\"sy0\">=<\/span><span class=\"nu0\">1<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">return<\/span> <span class=\"kw3\">collections<\/span>.<span class=\"me1\">OrderedDict<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">sorted<\/span><span class=\"br0\">&#40;<\/span>results.<span class=\"me1\">items<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\norderedfrequency<span class=\"sy0\">=<\/span>charanal<span class=\"br0\">&#40;<\/span>filteredstring<span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"kw1\">print<\/span> orderedfrequency<br \/>\n<br \/>\npyplot.<span class=\"me1\">bar<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">range<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">len<\/span><span class=\"br0\">&#40;<\/span>orderedfrequency<span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span> orderedfrequency.<span class=\"me1\">values<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><br \/>\npyplot.<span class=\"me1\">xticks<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">range<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">len<\/span><span class=\"br0\">&#40;<\/span>orderedfrequency<span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span> orderedfrequency.<span class=\"me1\">keys<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span>ha<span class=\"sy0\">=<\/span><span class=\"st0\">'left'<\/span><span class=\"br0\">&#41;<\/span><br \/>\npyplot.<span class=\"me1\">show<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><a 
href=\"https:\/\/www.phillips321.co.uk\/wp-content\/uploads\/2013\/08\/barchart.png\"><img loading=\"lazy\" src=\"https:\/\/www.phillips321.co.uk\/wp-content\/uploads\/2013\/08\/barchart-300x226.png\" alt=\"barchart\" width=\"300\" height=\"226\" class=\"aligncenter size-medium wp-image-1059\" srcset=\"https:\/\/www.phillips321.co.uk\/wp-content\/uploads\/2013\/08\/barchart-300x226.png 300w, https:\/\/www.phillips321.co.uk\/wp-content\/uploads\/2013\/08\/barchart.png 815w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>So I wanted to identify quickly character frequency in a text file and quickly throw this out as a bar chart. As I enjoy python it made sense to code it in python. The bar chart uses the pyplot bits from matplotlib. It was also important to import collections because dictionaries are unordered and the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1059,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[385,290,387,36,386,384,111],"_links":{"self":[{"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/posts\/1057"}],"collection":[{"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/comments?post=1057"}],"version-history":[{"count":12,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/posts\/1057\/revisions"}],"predecessor-version":[{"id":1070,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/posts\/1057\/revisions\/1070"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/media\/1059"}],"wp:attachment":[{"href":"https:\/\/www.phillips321.co.uk\/wp-json\/w
p\/v2\/media?parent=1057"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/categories?post=1057"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.phillips321.co.uk\/wp-json\/wp\/v2\/tags?post=1057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}