The Wikimedia data dump files are released in compressed form, as either <a href="http://en.wikipedia.org/bzip2" rel="nofollow" class="external text">bzip2</a> or <a href="http://en.wikipedia.org/gzip" rel="nofollow" class="external text">gzip</a>. Prior to v0.5.2, XOWA required the files to be uncompressed before it could read them. As of v0.5.2, the user can choose to read directly from either the compressed file or the uncompressed file.
</p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>
Contents
</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1">
<a href="#bzip2:_disk_space_vs_speed"><span class="tocnumber">1</span> <span class="toctext">bzip2: disk space vs speed</span></a>
<span class="mw-headline" id="bzip2:_disk_space_vs_speed">bzip2: disk space vs speed</span>
</h2>
<p>
Currently, reading directly from a bzip2 file is much slower than unzipping it first and then reading from the resulting .xml file.<sup id="cite_ref-0" class="reference"><a href="#cite_note-0">[1]</a></sup>
</p>
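<p>
The difference between the two read paths can be sketched in a few lines. This is only an illustration in Python using the standard <code>bz2</code> module, not XOWA's own (Java) implementation; the function names and the tiny sample file are hypothetical stand-ins for a real dump.
</p>

```python
import bz2
import os
import tempfile


def open_dump(path):
    """Open a dump as text; .bz2 files are decompressed on the fly."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def count_pages(path):
    """Count <page> start tags by streaming the file line by line."""
    with open_dump(path) as handle:
        return sum(1 for line in handle if "<page>" in line)


def write_sample_dumps(directory, pages=3):
    """Write a tiny stand-in dump as both .xml and .xml.bz2 (hypothetical names)."""
    xml = "<mediawiki>\n" + "".join(
        "<page>\n<title>Page %d</title>\n</page>\n" % i for i in range(pages)
    ) + "</mediawiki>\n"
    xml_path = os.path.join(directory, "sample.xml")
    bz2_path = xml_path + ".bz2"
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(xml)
    with bz2.open(bz2_path, "wt", encoding="utf-8") as handle:
        handle.write(xml)
    return xml_path, bz2_path


def demo():
    """Read the same content through both paths and return both page counts."""
    with tempfile.TemporaryDirectory() as directory:
        xml_path, bz2_path = write_sample_dumps(directory)
        return count_pages(xml_path), count_pages(bz2_path)
```

<p>
Both paths yield the same content; the compressed path simply pays the decompression cost during the read instead of in a separate unzip step, and needs no extra disk space.
</p>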
<p>
For example, using a 10 GB English Wikipedia dump file:
</p>
<ul>
<li>
<b>unzip</b> takes 120 minutes and 40 GB of extra disk space. This route consists of unzipping to .xml with 7-zip (40 min, 40 GB) and then importing the wiki (80 min).
</li>
<li>
<b>bzip2</b> takes 330 minutes and no extra disk space. This route consists of reading directly from the .bz2 file (250 min, 0 GB) and importing the wiki (80 min).
</li>
</ul>
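<p>
The totals above are just the sums of the individual steps. As a rough sketch, the arithmetic and the resulting choice can be written out as follows. The figures are the ones quoted for the 10 GB English Wikipedia example; the function names and the 40 GB default threshold are illustrative assumptions and will differ for other dumps and machines.
</p>

```python
# Each step is (description, minutes, extra disk space in GB),
# taken from the 10 GB English Wikipedia example.
UNZIP_ROUTE = [
    ("unzip to .xml with 7-zip", 40, 40),
    ("import the wiki", 80, 0),
]
BZIP2_ROUTE = [
    ("read directly from the .bz2 file", 250, 0),
    ("import the wiki", 80, 0),
]


def route_cost(steps):
    """Return (total minutes, total extra GB) for a list of steps."""
    total_minutes = sum(minutes for _, minutes, _ in steps)
    extra_gb = sum(gb for _, _, gb in steps)
    return total_minutes, extra_gb


def pick_route(free_gb, required_gb=40):
    """Pick the faster unzip route only when enough disk space is free.

    required_gb defaults to the 40 GB of the example above; it is an
    assumption, not a fixed requirement.
    """
    return "unzip" if free_gb >= required_gb else "bzip2"
```

<p>
With these figures, <code>route_cost(UNZIP_ROUTE)</code> gives 120 minutes and 40 GB, and <code>route_cost(BZIP2_ROUTE)</code> gives 330 minutes and 0 GB.
</p>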
<p>
If you have the extra disk space, use the <b>unzip</b> route, which is faster. If you are low on disk space, use the <b>bzip2</b> route instead.
</p>
<span class="mw-cite-backlink"><a href="#cite_ref-0">^</a></span> <span class="reference-text">This seems to be a result of Java's lack of support for an unsigned byte data type, as well as the other performance advantages of a native C/C++ application (7-zip on Windows; bzip2 on Linux).</span>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->