{"id":211,"date":"2008-07-27T00:31:55","date_gmt":"2008-07-26T23:31:55","guid":{"rendered":"http:\/\/www.jurecuhalev.com\/blog\/?p=211"},"modified":"2008-07-27T00:34:32","modified_gmt":"2008-07-26T23:34:32","slug":"visualizing-books-using-zemanta-and-wordle","status":"publish","type":"post","link":"https:\/\/www.jurecuhalev.com\/blog\/visualizing-books-using-zemanta-and-wordle\/","title":{"rendered":"Visualizing books using Zemanta and Wordle"},"content":{"rendered":"<p>Most of my readers are probably already aware of <a href=\"http:\/\/wordle.net\/\">Wordle<\/a>, a Java applet that allows neat visualization of tags. Given that <a class=\"zem_slink\" title=\"Zemanta ltd.\" rel=\"homepage\" href=\"http:\/\/www.zemanta.com\">Zemanta<\/a> recently released an <a href=\"http:\/\/developer.zemanta.com\">early alpha API preview<\/a>, I was looking for a fun project to showcase some of it.<\/p>\n<p>So for this experiment I&#8217;m going to try to visualize some of the popular classic books, using text files from <a class=\"zem_slink\" title=\"Project Gutenberg\" rel=\"homepage\" href=\"http:\/\/www.gutenberg.org\/wiki\/Main_Page\">Project Gutenberg<\/a>. Technical details are at the bottom of the post.<\/p>\n<p><strong>Jane Austen &#8211; Pride and Prejudice<\/strong><\/p>\n<figure id=\"attachment_214\" aria-describedby=\"caption-attachment-214\" style=\"width: 300px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-combined.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-214\" title=\"Pride and Prejudice as through words and tags\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-combined-300x115.png\" alt=\"Pride and Prejudice as through words and tags\" width=\"300\" height=\"115\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-combined-300x115.png 300w, 
https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-combined.png 856w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-214\" class=\"wp-caption-text\">Pride and Prejudice as through words and tags<\/figcaption><\/figure>\n<figure id=\"attachment_212\" aria-describedby=\"caption-attachment-212\" style=\"width: 300px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-tags.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-212\" title=\"Pride and Prejudice through words\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-tags-300x201.png\" alt=\"\" width=\"300\" height=\"201\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-tags-300x201.png 300w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-tags.png 856w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-212\" class=\"wp-caption-text\">Pride and Prejudice through words<\/figcaption><\/figure>\n<figure id=\"attachment_213\" aria-describedby=\"caption-attachment-213\" style=\"width: 300px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-links.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-213\" title=\"Pride and Prejudice as tags\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-links-300x204.png\" alt=\"Pride and Prejudice through words\" width=\"300\" height=\"204\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-links-300x204.png 300w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/pride-worlde-links.png 856w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption 
id=\"caption-attachment-213\" class=\"wp-caption-text\">Pride and Prejudice as tags<\/figcaption><\/figure>\n<p><strong>Herman Melville &#8211; Moby-Dick<\/strong><\/p>\n<figure id=\"attachment_216\" aria-describedby=\"caption-attachment-216\" style=\"width: 262px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-wordle-words.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-216\" title=\"Moby Dick through words\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-wordle-words-262x300.png\" alt=\"Moby Dick through words\" width=\"262\" height=\"300\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-wordle-words-262x300.png 262w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-wordle-words.png 581w\" sizes=\"auto, (max-width: 262px) 100vw, 262px\" \/><\/a><figcaption id=\"caption-attachment-216\" class=\"wp-caption-text\">Moby Dick through words<\/figcaption><\/figure>\n<figure id=\"attachment_217\" aria-describedby=\"caption-attachment-217\" style=\"width: 300px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-worlde-tags.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-217\" title=\"Moby Dick through tags\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-worlde-tags-300x124.png\" alt=\"Moby Dick through tags\" width=\"300\" height=\"124\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-worlde-tags-300x124.png 300w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/moby-worlde-tags.png 833w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-217\" class=\"wp-caption-text\">Moby Dick through tags<\/figcaption><\/figure>\n<p><strong>George Orwell &#8211; 
1984<\/strong><\/p>\n<figure id=\"attachment_218\" aria-describedby=\"caption-attachment-218\" style=\"width: 255px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-words.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-218\" title=\"1984 through words\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-words-255x300.png\" alt=\"1984 through words\" width=\"255\" height=\"300\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-words-255x300.png 255w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-words.png 569w\" sizes=\"auto, (max-width: 255px) 100vw, 255px\" \/><\/a><figcaption id=\"caption-attachment-218\" class=\"wp-caption-text\">1984 through words<\/figcaption><\/figure>\n<figure id=\"attachment_219\" aria-describedby=\"caption-attachment-219\" style=\"width: 300px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-tags.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-219\" title=\"1984 through tags\" src=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-tags-300x196.png\" alt=\"1984 through tags\" width=\"300\" height=\"196\" srcset=\"https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-tags-300x196.png 300w, https:\/\/www.jurecuhalev.com\/blog\/wp-content\/uploads\/2008\/07\/1984-worlde-tags.png 833w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-219\" class=\"wp-caption-text\">1984 through tags<\/figcaption><\/figure>\n<p><strong>Technical details<\/strong><\/p>\n<p>The whole process is done using a <a href=\"https:\/\/www.jurecuhalev.com\/blog-files\/zemantatags.pys\">simple Python script<\/a>. 
The script reads in the text file, breaks it into chunks of 360 words (roughly one A5 page each), and sends each chunk to the Zemanta API. It repeats this process for the first 30 thousand words of the book. The limit is arbitrary; I just didn&#8217;t want to run the script for too long.<\/p>\n<p>Afterward, I manually pasted the resulting text into Wordle and played with the random function and the details until I started to like the image. You can also take a look at <a href=\"http:\/\/wordle.net\/gallery?username=Gandalfar\">my full Wordle gallery<\/a>.<\/p>\n<p><strong>Lessons learned<\/strong><\/p>\n<p>I&#8217;m especially happy with how 1984 turned out. For this kind of visualization it&#8217;s important to choose the source carefully, because a well-chosen text gives more powerful results. I&#8217;ll probably continue experimenting with the 1984 text.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most of my readers are probably already aware of Wordle, a Java applet that allows neat visualization of tags. Given that Zemanta recently released an early alpha API preview, I was looking for a fun project to showcase some of it. 
So for this experiment I&#8217;m going to try to visualize some of the popular classic [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[10,16,14,17],"tags":[357,360,356,355,354,352,353,932,940,358,934],"class_list":["post-211","post","type-post","status-publish","format-standard","hentry","category-happy","category-ideas","category-tech","category-zemanta","tag-357","tag-almost-mashup","tag-george-orwell","tag-herman-melville","tag-jane-austen","tag-moby-dick","tag-prejudice-pride","tag-python","tag-visualization","tag-wordle","tag-zemanta"],"acf":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/posts\/211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/comments?post=211"}],"version-history":[{"count":3,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/posts\/211\/revisions"}],"predecessor-version":[{"id":221,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/posts\/211\/revisions\/221"}],"wp:attachment":[{"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/media?parent=211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/categories?post=211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jurecuhalev.com\/blog\/wp-json\/wp\/v2\/tags?post=211"}],"curies":[{"name":"wp","href":"https:\/\/ap
i.w.org\/{rel}","templated":true}]}}