<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>x + 3 &#187; Linux</title>
	<atom:link href="http://xplus3.net/tag/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://xplus3.net</link>
	<description></description>
	<lastBuildDate>Fri, 19 Aug 2011 01:05:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Find Files by Size</title>
		<link>http://xplus3.net/2009/04/01/find-files-by-size/</link>
		<comments>http://xplus3.net/2009/04/01/find-files-by-size/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 20:28:03 +0000</pubDate>
		<dc:creator>Jonathan Brinley</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[find]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[workflow]]></category>

		<guid isPermaLink="false">http://xplus3.net/?p=202</guid>
		<description><![CDATA[Find all TIFFs in a directory smaller than 90 MB: $ find /dir/to/search -name '*.tif' -size -90M -exec ls -lh {} \; Get just the size and path and write to a file: $ find /dir/to/search -name '*.tif' -size -90M -exec ls -lh {} \; &#124; awk '{print $5 , $8}' &#62; output.txt Useful for finding images that might have &#8230; <a href="http://xplus3.net/2009/04/01/find-files-by-size/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Find all TIFFs in a directory smaller than 90 MB:</p>
<pre>$ find /dir/to/search -name '*.tif' -size -90M -exec ls -lh {} \;</pre>
<p>Get just the size and path and write to a file:</p>
<pre>$ find /dir/to/search -name '*.tif' -size -90M -exec ls -lh {} \; | awk '{print $5 , $8}' &gt; output.txt</pre>
<p>Useful for finding images that might have been scanned at the wrong resolution/bit depth/etc.</p>
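<p>If GNU <code>find</code> is available, its <code>-printf</code> action can print the size and path directly, which avoids depending on the column layout of <code>ls</code>. A minimal sketch, assuming GNU findutils; the function name <code>list_small_tiffs</code> is made up for illustration:</p>

```shell
# List TIFFs under a directory that are smaller than a size limit.
# Assumes GNU find (for -printf); list_small_tiffs is a hypothetical helper name.
list_small_tiffs() {
    dir=$1
    limit=${2:-90M}
    # %s is the size in bytes, %p is the path. Quoting the pattern keeps the
    # shell from expanding *.tif before find sees it.
    find "$dir" -name '*.tif' -size "-$limit" -printf '%s\t%p\n'
}

# Example: list_small_tiffs /dir/to/search 90M > output.txt
```

<p>Sizes come out in bytes rather than the human-readable form <code>ls -lh</code> gives, which makes the list easier to sort.</p>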
]]></content:encoded>
			<wfw:commentRss>http://xplus3.net/2009/04/01/find-files-by-size/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>OCR with OCRopus and Tesseract</title>
		<link>http://xplus3.net/2009/03/31/ocr-with-ocropus-and-tesseract/</link>
		<comments>http://xplus3.net/2009/03/31/ocr-with-ocropus-and-tesseract/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 18:20:38 +0000</pubDate>
		<dc:creator>Jonathan Brinley</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[hOCR]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[OCRopus]]></category>
		<category><![CDATA[Tesseract]]></category>

		<guid isPermaLink="false">http://xplus3.net/?p=193</guid>
		<description><![CDATA[While OCRing a batch of images through OmniPage the other day, I was silently cursing my computer. I had about 1,500 pages, and OmniPage was crashing after every second or third image. I&#8217;ve used versions 13-16 of the software, and this problem seems to just get worse with each new release. Fed up, I decided to look for an alternative. &#8230; <a href="http://xplus3.net/2009/03/31/ocr-with-ocropus-and-tesseract/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While OCRing a batch of images through OmniPage the other day, I was silently cursing my computer. I had about 1,500 pages, and OmniPage was crashing after every second or third image. I&#8217;ve used versions 13-16 of the software, and this problem seems to just get worse with each new release. Fed up, I decided to look for an alternative.</p>
<p>I remembered seeing a few years ago that HP had open-sourced their OCR engine, <a href="http://code.google.com/p/tesseract-ocr/">Tesseract</a>, development of which has now been taken over by Google. Tesseract is supposedly very good at what it does, namely, recognizing characters in images.</p>
<p>Tesseract does not, however, have many essential features found in modern OCR software, including document layout analysis and output formatting. That&#8217;s where <a href="http://sites.google.com/site/ocropus/">OCRopus</a> comes in. I think of it as a wrapper around Tesseract, capable of doing the layout analysis and providing formatted output. In truth, it can do much more than that, and different OCR engines and other components can be plugged into OCRopus, but the preceding simplification works for my purposes.</p>
<h3>Usage</h3>
<p>Use OCRopus with a simple call from the command line:</p>
<pre>$ ocroscript recognize /path/to/file.png &gt; /path/to/output.html</pre>
<p>OCRopus will work its magic on file.png and give you an hOCR file. hOCR uses <code>class</code> and <code>title</code> attributes in an otherwise simple HTML file to embed layout information into the recognized text. I hope soon to create a script to transform the hOCR into a PDF; I&#8217;ll post more when it&#8217;s ready.</p>
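<p>For a large batch, like the 1,500 pages mentioned above, the same call can be wrapped in a loop. A rough sketch, assuming the pages are PNG files in a single directory and that <code>ocroscript</code> is on the <code>PATH</code>; the function name <code>ocr_batch</code> and the output file naming are placeholders for illustration:</p>

```shell
# Run OCRopus on every PNG in a directory, writing one hOCR file per page.
# ocr_batch is a hypothetical helper; it assumes ocroscript is installed.
ocr_batch() {
    dir=$1
    for png in "$dir"/*.png; do
        # Skip the unexpanded pattern when the directory has no PNGs.
        [ -e "$png" ] || continue
        out="${png%.png}.html"
        # Report a failed page and keep going with the next one.
        ocroscript recognize "$png" > "$out" || echo "failed: $png"
    done
}

# Example: ocr_batch /path/to/pages
```

<p>Each <code>page.png</code> ends up with a <code>page.html</code> next to it, so a failed page can be rerun on its own.</p>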
<h3>Installation</h3>
<p>The trickiest part of using OCRopus is the installation. There are quite a few dependencies and some inaccurate documentation, so I made a few wrong turns along the way. Fortunately, I remembered to document what I was doing as I went. The instructions below represent the necessary steps to have an operable installation of OCRopus on Linux Mint as of 2009-03-27. For the record, I&#8217;m starting in <code>/var/tmp</code>.<br />
<span id="more-193"></span></p>
<h4>Install Tesseract</h4>
<p>As mentioned above, <a href="http://code.google.com/p/tesseract-ocr/">Tesseract</a> is the OCR engine that powers OCRopus.</p>
<pre>$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only
$ cd tesseract-ocr-read-only
$ ./configure
$ make
$ sudo make install
$ cd ..</pre>
<h4>Install iulib</h4>
<p><a href="http://code.google.com/p/iulib/">iulib</a> provides some basic image processing libraries used by OCRopus.</p>
<pre>$ svn checkout http://iulib.googlecode.com/svn/trunk/ iulib
$ cd iulib
$ sudo apt-get install scons
$ sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev libavcodec-dev libavformat-dev libsdl-gfx1.2-dev libsdl-image1.2-dev
$ sudo apt-get install imagemagick
$ scons
$ sudo scons install
$ cd ..</pre>
<h4>Install Leptonica</h4>
<p><a href="http://code.google.com/p/leptonica/">Leptonica</a> provides more image processing and layout analysis capabilities.</p>
<pre>$ wget http://leptonica.googlecode.com/files/leptonlib-1.60.tar.gz
$ tar xvzf leptonlib-1.60.tar.gz
$ cd leptonlib-1.60
$ ./configure
$ make
$ sudo make install
$ cd ..</pre>
<h4>Install OpenFST</h4>
<p><a href="http://www.openfst.org/">OpenFST</a> provides language modeling code to OCRopus. Note that this takes a while (a couple of hours for me) to compile.</p>
<pre>$ wget http://mohri-lt.cs.nyu.edu/twiki/pub/FST/FstDownload/openfst-1.1.tar.gz
$ tar xvzf openfst-1.1.tar.gz
$ cd openfst-1.1
$ ./configure
$ make
$ sudo make install
$ cd ..</pre>
<h4>Install OCRopus</h4>
<p>We now have all our dependencies installed, so it&#8217;s time to install <a href="http://code.google.com/p/ocropus/">OCRopus</a>.</p>
<pre>$ sudo apt-get install libeditline-dev
$ svn checkout http://ocropus.googlecode.com/svn/trunk/ ocropus
$ cd ocropus
<del>$ ./configure
$ make
$ sudo make install</del></pre>
<p><strong>Update (2009-04-01):</strong> OCRopus is still young and has many bugs. One particularly annoying bug is also quite easy to fix: the doctype declaration in the hOCR file was missing some quotes, rendering the XHTML invalid. I&#8217;ve submitted a patch. Here are some slightly revised installation instructions, picking up in the <code>ocropus</code> directory:</p>
<pre>$ wget http://xplus3.net/downloads/fix_ocropus_doctype.diff
$ patch -p0 -i fix_ocropus_doctype.diff
$ ./configure
$ make
$ sudo make install</pre>
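<p>Once everything is built, a quick check that the binaries landed on the <code>PATH</code> can save some head-scratching. A small sketch; <code>check_tools</code> is a hypothetical helper, and the command names in the example (<code>tesseract</code>, <code>ocroscript</code>) assume the install put them somewhere on the <code>PATH</code>:</p>

```shell
# Report which of the named commands can be found on the PATH.
# check_tools is a hypothetical helper; it returns non-zero if any is missing.
check_tools() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" > /dev/null 2> /dev/null; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_tools tesseract ocroscript
```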
]]></content:encoded>
			<wfw:commentRss>http://xplus3.net/2009/03/31/ocr-with-ocropus-and-tesseract/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Running Linux in Windows with VirtualBox</title>
		<link>http://xplus3.net/2009/03/09/running-linux-in-windows-with-virtualbox/</link>
		<comments>http://xplus3.net/2009/03/09/running-linux-in-windows-with-virtualbox/#comments</comments>
		<pubDate>Mon, 09 Mar 2009 16:51:17 +0000</pubDate>
		<dc:creator>Jonathan Brinley</dc:creator>
				<category><![CDATA[Etcetera]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[VirtualBox]]></category>

		<guid isPermaLink="false">http://xplus3.net/?p=185</guid>
		<description><![CDATA[I recently decided to halfheartedly stick my toes back into the Linux waters. It&#8217;s been about six or seven years since I last played with it. At the time, I was a music student with an interest in computers, and it ended up being a little over my head. So now, a little wiser (I hope), a little more knowledgeable, &#8230; <a href="http://xplus3.net/2009/03/09/running-linux-in-windows-with-virtualbox/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently decided to halfheartedly stick my toes back into the Linux waters. It&#8217;s been about six or seven years since I last played with it. At the time, I was a music student with an interest in computers, and it ended up being a little over my head. So now, a little wiser (I hope), a little more knowledgeable, I wade back in.</p>
<p>Rather than wiping a hard disk, or even dual-booting, I opted to go with a less committed approach. I&#8217;ve set up a virtual computer to run within Windows Vista. I tried Microsoft Virtual PC first, but I never got past the boot stage (on several Linux distributions) before it collapsed into a whimpering heap of self-contradiction. Not dissuaded, I gave Sun&#8217;s <a href="http://www.virtualbox.org/">VirtualBox</a> a try, and it worked admirably.</p>
<p>So I now have <a href="http://www.linuxmint.com/">Linux Mint</a> 6 installed within VirtualBox within Vista. From what I&#8217;ve read about the distribution, it&#8217;s a lot like <a href="http://www.ubuntu.com/">Ubuntu</a>, with a few additional tools to make it easier for the uninitiated to use. I have two monitors, so I&#8217;ve pretty much just dedicated one to displaying Windows and one to displaying Linux.</p>
<p>So far, my experience has been pretty good. For basic usage (<em>i.e.</em>, Internet browsing, word processing, etc.), it seems as easy to use as Windows. mintInstall, the software installation program that comes bundled with Linux Mint, makes installation of the thousands of common software packages <em>very</em> easy, and apt-get fills in where mintInstall leaves off. I had <a href="http://www.xchat.org/">XChat</a> running happily within moments, something I still haven&#8217;t figured out how to do on Vista. I still have a bit of learning to do inside the terminal, but I&#8217;m making progress.</p>
<p>As I run across obstacles, I&#8217;ll try to post them here (with their solutions, I hope) so I can remember how to do things again later.</p>
]]></content:encoded>
			<wfw:commentRss>http://xplus3.net/2009/03/09/running-linux-in-windows-with-virtualbox/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

