<!DOCTYPE html>
<html dir="ltr">
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
<title>Blog/2016-05 - XOWA</title>
<link rel="shortcut icon" href="https://gnosygnu.github.io/xowa/xowa_logo.png" />
<link rel="stylesheet" href="https://gnosygnu.github.io/xowa/xowa_common.css" type="text/css">
</head>
<body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
<div id="mw-page-base" class="noprint"></div>
<div id="mw-head-base" class="noprint"></div>
<div id="content" class="mw-body">
<h1 id="firstHeading" class="firstHeading"><span>Blog/2016-05</span></h1>
<div id="bodyContent" class="mw-body-content">
<div id="siteSub">From XOWA: the free, open-source, offline wiki application</div>
<div id="contentSub"></div>
<div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
<div class='infobox plainlinks' style='width:70px;'>
<p>
<span style='font-size:15px;font-weight:bold;'>Blog links</span><br>
<br>
2017<br>
</p>
<ul>
<li>
<a href="http://xowa.org/home/wiki/Blog/2017-03.html" id="xolnki_2" title="Blog/2017-03" class="xowa-visited">2017-03</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2017-02.html" id="xolnki_3" title="Blog/2017-02">2017-02</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2017-01.html" id="xolnki_4" title="Blog/2017-01">2017-01</a>
</li>
</ul>
<p>
2016<br>
</p>
<ul>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-12.html" id="xolnki_5" title="Blog/2016-12">2016-12</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-11.html" id="xolnki_6" title="Blog/2016-11">2016-11</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-10.html" id="xolnki_7" title="Blog/2016-10">2016-10</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-09.html" id="xolnki_8" title="Blog/2016-09">2016-09</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-08.html" id="xolnki_9" title="Blog/2016-08">2016-08</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-07.html" id="xolnki_10" title="Blog/2016-07">2016-07</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-06.html" id="xolnki_11" title="Blog/2016-06">2016-06</a>
</li>
<li>
<b>2016-05</b>
</li>
<li>
<a href="http://xowa.org/home/wiki/Blog/2016-04.html" id="xolnki_13" title="Blog/2016-04">2016-04</a>
</li>
</ul>
<p>
<a href="http://xowa.org/home/wiki/Blog/Archives.html" id="xolnki_14" title="Blog/Archives">Archives</a>
</p>
</div>
<div id="toc" class="toc">
<div id="toctitle">
<h2>
Contents
</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1">
<a href="#Dev:_Handling_the_1_MB_limit_for_SQLite_on_Android_.282016-05-30_21:15_Mon.29"><span class="tocnumber">1</span> <span class="toctext">Dev: Handling the 1 MB limit for SQLite on Android (2016-05-30 21:15 Mon)</span></a>
</li>
<li class="toclevel-1 tocsection-2">
<a href="#Release:_NONE_.282016-05-29_20:00_Sun.29"><span class="tocnumber">2</span> <span class="toctext">Release: NONE (2016-05-29 20:00 Sun)</span></a>
</li>
<li class="toclevel-1 tocsection-3">
<a href="#Release:_NONE_.282016-05-24_11:00_Tue.29"><span class="tocnumber">3</span> <span class="toctext">Release: NONE (2016-05-24 11:00 Tue)</span></a>
</li>
<li class="toclevel-1 tocsection-4">
<a href="#Release:_NONE_.282016-05-22_20:00_Sun.29"><span class="tocnumber">4</span> <span class="toctext">Release: NONE (2016-05-22 20:00 Sun)</span></a>
</li>
<li class="toclevel-1 tocsection-5">
<a href="#Release:_NONE_.282016-05-15_19:45_Sun.29"><span class="tocnumber">5</span> <span class="toctext">Release: NONE (2016-05-15 19:45 Sun)</span></a>
</li>
<li class="toclevel-1 tocsection-6">
<a href="#Release:_NONE_.282016-05-08_19:45_Sun.29"><span class="tocnumber">6</span> <span class="toctext">Release: NONE (2016-05-08 19:45 Sun)</span></a>
</li>
<li class="toclevel-1 tocsection-7">
<a href="#Release:_v3.5.1.1_.282016-05-01_20:25_Sun.29"><span class="tocnumber">7</span> <span class="toctext">Release: v3.5.1.1 (2016-05-01 20:25 Sun)</span></a>
<ul>
<li class="toclevel-2 tocsection-8">
<a href="#.28Desktop.29_Minor_parser_fixes_for_English_Wiktionary"><span class="tocnumber">7.1</span> <span class="toctext">(Desktop) Minor parser fixes for English Wiktionary</span></a>
</li>
<li class="toclevel-2 tocsection-9">
<a href="#Next_release:_v3.5.2"><span class="tocnumber">7.2</span> <span class="toctext">Next release: v3.5.2</span></a>
</li>
</ul>
</li>
</ul>
</div>
<h2>
<span class="mw-headline" id="Dev:_Handling_the_1_MB_limit_for_SQLite_on_Android_.282016-05-30_21:15_Mon.29">Dev: Handling the 1 MB limit for SQLite on Android (2016-05-30 21:15 Mon)</span>
</h2>
<p>
I rediscovered an unintuitive bug today, and decided to take some time to document it.
</p>
<p>
XOWA stores all its data in SQLite databases, including images and HTML. Image sizes can range from a few KB to hundreds of MB. Note that storing these large images in the database works with SQLite on all three major desktop platforms: Linux, Windows, and Mac OS X.
</p>
<p>
However, SQLite on Android has a 1 MB limit for row data in a SELECT. If a row has more than 1 MB, then reading from the cursor will fail with an error like <code>IllegalStateException: Couldn't read row 0, col 0 from CursorWindow. Make sure the Cursor is initialized correctly before accessing data from it.</code>
</p>
<p>
For example, consider the following setup:
</p>
<ul>
<li>
You have a database with a table defined like this:
</li>
</ul>
<div class="mw-highlight">
<pre style="overflow:auto">
CREATE TABLE blob_table
( blob_id INTEGER
, blob_column BLOB
)
</pre>
</div>
<ul>
<li>
You insert two rows: the first with 2 bytes of data and the second with 2 MB of data.
</li>
</ul>
<div class="mw-highlight">
<pre style="overflow:auto">
INSERT INTO blob_table (blob_id, blob_column) VALUES (0, '01');
INSERT INTO blob_table (blob_id, blob_column) VALUES (1, '0123456789...'); -- ... represents remaining 2 MB of data
</pre>
</div>
<ul>
<li>
You then run code to retrieve both rows
</li>
</ul>
<div class="mw-highlight">
<pre style="overflow:auto">
SQLiteDatabase db = SQLiteDatabase.openDatabase("/Android/data/org.xowa/files/blob_database.sqlite", null, SQLiteDatabase.OPEN_READONLY | SQLiteDatabase.NO_LOCALIZED_COLLATORS);
Cursor cursor = db.query("blob_table", new String[] {"blob_id", "blob_column"}, null, null, null, null, null, null);
cursor.moveToFirst();
// this line succeeds as blob_id + blob_column has a length of 6 bytes
int row_0_id = cursor.getInt(0);
cursor.moveToNext();
// this line fails because blob_id + blob_column has a total length &gt; 1 MB; note that it fails even though only the INTEGER is being requested, not the BLOB.
int row_1_id = cursor.getInt(0);
</pre>
</div>
<p>
I believe this is because the transaction buffer is limited to 1 MB.<sup id="cite_ref-blob__android_docs_0-0" class="reference"><a href="#cite_note-blob__android_docs-0">[1]</a></sup> The official recommendation is not to store BLOBs of 1 MB or more in the database. Instead, the BLOB should be stored on the filesystem, and only the BLOB's URL should be stored in the database. This appears to be the recommendation of SQLite<sup id="cite_ref-blob__sqlite_guidelines_1-0" class="reference"><a href="#cite_note-blob__sqlite_guidelines-1">[2]</a></sup> as well as the general sentiment of most commenters on Stack Overflow<sup id="cite_ref-blob__stackoverflow_sentiment_2-0" class="reference"><a href="#cite_note-blob__stackoverflow_sentiment-2">[3]</a></sup>. Note that SQLite puts the threshold for a large BLOB at 100 KB.
</p>
<p>
To a certain extent, I understand why this is the recommendation:
</p>
<ul>
<li>
<b>Resource constraints</b>: An Android device may have limited memory, and there is a risk in devouring untold MB.
</li>
<li>
<b>BLOBs are not relational data</b>: Databases are for storing and querying relational data. You can't run any meaningful queries against BLOBs other than plain SELECTs.
</li>
<li>
<b>Efficiency</b>: Filesystem storage will be more efficient for storing these large BLOBs. Database pages just add another layer on top of filesystem pages.
</li>
</ul>
<p>
Personally though, I disagree with this recommendation:
</p>
<ul>
<li>
<b>Unnecessary restriction</b>: There is no restriction on allocating a byte array of 2 MB (<code>byte[] array = new byte[2000000]</code>), or on adding 2 million objects to a list. Why should there be one for retrieving SQLite data?
</li>
<li>
<b>BLOBs are data</b>: A good deal of database code is just "get me the data for this ID". It shouldn't matter if the data is less than 1 MB or greater than 1 MB, especially in the SQLite world, where there isn't even a concept of field lengths for strings (<code>varchar(1)</code> is the same as <code>varchar(10000)</code>).
</li>
<li>
<b>Distribution complication</b>: XOWA distributes English Wikipedia as 80 database files. Storing these images as separate files would balloon the distribution to several thousand files.
</li>
</ul>
<p>
At this point, there seem to be three options to work around the 1 MB limit:
</p>
<ol>
<li>
<b>Access SQLite directly via C language</b>: This is complicated and not really something I want to try now.
</li>
<li>
<b>Use <a href="https://bitbucket.org/almworks/sqlite4java" rel="nofollow" class="external text">sqlite4java</a> which supports large BLOBs</b>: This is the same as number (1), but the work has already been done by someone else. Although this is promising, I'm not prepared to replace the existing SQLite bridge in XOWA: <a href="https://github.com/xerial/sqlite-jdbc" rel="nofollow" class="external text">Xerial SQLite JDBC</a>
</li>
<li>
<b>Get the length of the blob data and read the BLOB in 1 MB increments</b>: This basically requires 3+ SELECTs.
<ol>
<li>
<b>SELECT length of the blob</b>: <code>SELECT blob_id, length(blob_column) FROM blob_table WHERE blob_id = 0</code>
</li>
<li>
<b>SELECT the 1st MB of the blob</b>: <code>SELECT substr(blob_column, 1, 1000000) FROM blob_table WHERE blob_id = 0</code>
</li>
<li>
<b>SELECT the 2nd MB of the blob</b>: <code>SELECT substr(blob_column, 1000001, 1000000) FROM blob_table WHERE blob_id = 0</code>
</li>
<li>
<b>Keep SELECTing until all data is read</b>: Note that the last SELECT needs to <code>substr</code> the exact remainder of the blob. For example, a BLOB of 2.3 MB should have a final select of 300 KB: <code>SELECT substr(blob_column, 2000001, 300000) FROM blob_table WHERE blob_id = 0</code>
</li>
</ol>
</li>
</ol>
<p>
For now, I went with option #3, as it is the easiest to implement. It is slower, but fortunately BLOBs &gt; 1 MB are few and far between.
</p>
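<p>
The chunked read in option #3 can be sketched in plain Java as follows. This is a minimal illustration, not XOWA's actual code: <code>BlobChunkReader</code>, <code>ChunkQuery</code>, and <code>CHUNK_SIZE</code> are hypothetical names, and the database call is hidden behind an interface so that the offset arithmetic can be read on its own. It assumes the total length has already been fetched with the <code>length(blob_column)</code> SELECT.
</p>
<div class="mw-highlight">
<pre style="overflow:auto">
```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper (not XOWA code): reads one large BLOB in fixed-size pieces via substr(). */
final class BlobChunkReader {
    /** Maximum bytes requested per SELECT; mirrors the 1000000-byte increments in the list above. */
    static final int CHUNK_SIZE = 1000000;

    /** Assumed stand-in for the database layer: runs one SELECT and returns column 0 as bytes. */
    interface ChunkQuery {
        byte[] run(String sql);
    }

    /** Builds the SELECT for one piece; SQLite substr() offsets start at 1, not 0. */
    static String chunkSql(int blobId, int offset, int length) {
        // Only int values are concatenated here, so no quoting or escaping is needed.
        return "SELECT substr(blob_column, " + offset + ", " + length
            + ") FROM blob_table WHERE blob_id = " + blobId;
    }

    /** Issues one SELECT per piece and concatenates the results into a single array. */
    static byte[] readBlob(ChunkQuery query, int blobId, int totalLength) {
        int remaining = Math.max(totalLength, 0);   // treat a negative length as empty
        ByteArrayOutputStream out = new ByteArrayOutputStream(remaining);
        int offset = 1;
        while (remaining != 0) {
            // The final piece is shorter: exactly the bytes that are left.
            int length = Math.min(CHUNK_SIZE, remaining);
            byte[] piece = query.run(chunkSql(blobId, offset, length));
            out.write(piece, 0, piece.length);
            offset += length;
            remaining -= length;
        }
        return out.toByteArray();
    }
}
```
</pre>
</div>
<p>
For a BLOB of 2,300,000 bytes, <code>readBlob</code> issues three SELECTs with lengths of 1,000,000, 1,000,000, and 300,000 bytes, matching the sequence listed above. On Android, a <code>ChunkQuery</code> implementation would presumably wrap a <code>Cursor</code> and call <code>getBlob(0)</code>; that wiring is left out of this sketch.
</p>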
<hr>
<p>
<b>References</b>
</p>
<ol class="references">
<li id="cite_note-blob__android_docs-0">
<span class="mw-cite-backlink"><a href="#cite_ref-blob__android_docs_0-0">^</a></span> <span class="reference-text"><a href="https://developer.android.com/reference/android/os/TransactionTooLargeException.html" rel="nofollow" class="external free">https://developer.android.com/reference/android/os/TransactionTooLargeException.html</a></span>
<pre>
<span class="reference-text">The Binder transaction buffer has a limited fixed size, currently 1Mb, which is shared by all transactions in progress for the process.</span>
</pre>
</li>
<li id="cite_note-blob__sqlite_guidelines-1">
<span class="mw-cite-backlink"><a href="#cite_ref-blob__sqlite_guidelines_1-0">^</a></span> <span class="reference-text"><a href="http://www.sqlite.org/intern-v-extern-blob.html" rel="nofollow" class="external free">http://www.sqlite.org/intern-v-extern-blob.html</a></span>
<pre>
<span class="reference-text">For BLOBs smaller than 100KB, reads are faster when the BLOBs are stored directly in the database file. For BLOBs larger than 100KB, reads from a separate file are faster. </span>
</pre>
</li>
<li id="cite_note-blob__stackoverflow_sentiment-2">
<span class="mw-cite-backlink"><a href="#cite_ref-blob__stackoverflow_sentiment_2-0">^</a></span>
<ul>
<li>
<span class="reference-text"><a href="http://stackoverflow.com/questions/5406429/cursor-size-limit-in-android-sqlitedatabase" rel="nofollow" class="external free">http://stackoverflow.com/questions/5406429/cursor-size-limit-in-android-sqlitedatabase</a></span>
</li>
<li>
<span class="reference-text"><a href="http://stackoverflow.com/questions/12716859/retrieve-large-blob-from-android-sqlite-database" rel="nofollow" class="external free">http://stackoverflow.com/questions/12716859/retrieve-large-blob-from-android-sqlite-database</a></span>
</li>
<li>
<span class="reference-text"><a href="http://stackoverflow.com/questions/17300407/access-large-blob-in-android-sqlite-without-cursor" rel="nofollow" class="external free">http://stackoverflow.com/questions/17300407/access-large-blob-in-android-sqlite-without-cursor</a></span>
</li>
</ul>
</li>
</ol>
<h2>
<span class="mw-headline" id="Release:_NONE_.282016-05-29_20:00_Sun.29">Release: NONE (2016-05-29 20:00 Sun)</span>
</h2>
<p>
This week is also a no show for releases. I'm planning to make a release next week to handle a few parser issues. I'm also going to try to have the desktop app start reading the HTML databases.
</p>
<p>
As I'm still working on the wiki downloader, image dumps outside of English Wikipedia will be on hold for a while. If you want a copy, please drop me an email, or post an issue.
</p>
<p>
Thanks.
</p>
<h2>
<span class="mw-headline" id="Release:_NONE_.282016-05-24_11:00_Tue.29">Release: NONE (2016-05-24 11:00 Tue)</span>
</h2>
<p>
The 2016-05 English Wikipedia HTML dumps for Android are up. See <a href="https://archive.org/details/Xowa_enwiki_latest" rel="nofollow" class="external free">https://archive.org/details/Xowa_enwiki_latest</a>
</p>
<h2>
<span class="mw-headline" id="Release:_NONE_.282016-05-22_20:00_Sun.29">Release: NONE (2016-05-22 20:00 Sun)</span>
</h2>
<p>
I'm skipping the release again. I had to overhaul some of the internals of the downloader which cost me a couple of days. I'm not sure if I can get it ready for next week, so it may be a June release.
</p>
<p>
I did upload the images for English Wikipedia: <a href="https://archive.org/download/Xowa_enwiki_latest/Xowa_enwiki_2016-05-01_file_v2b.7z" rel="nofollow" class="external free">https://archive.org/download/Xowa_enwiki_latest/Xowa_enwiki_2016-05-01_file_v2b.7z</a> . I started uploading the Android HTML version, but ran into an issue with archive.org. You can start downloading the 2016-05 HTML files now: <a href="https://archive.org/download/Xowa_enwiki_latest" rel="nofollow" class="external free">https://archive.org/download/Xowa_enwiki_latest</a> . I'll upload the rest during the week.
</p>
<h2>
<span class="mw-headline" id="Release:_NONE_.282016-05-15_19:45_Sun.29">Release: NONE (2016-05-15 19:45 Sun)</span>
</h2>
<p>
This week is another skipped release, as work continues on the wiki download page. In the meantime, I'm generating the 2016-05 English Wikipedia databases and will have them uploaded this week.
</p>
<h2>
<span class="mw-headline" id="Release:_NONE_.282016-05-08_19:45_Sun.29">Release: NONE (2016-05-08 19:45 Sun)</span>
</h2>
<p>
I'm skipping the release this week as I'm still working on the wiki download tool. Hopefully it will make it into the beta next weekend.
</p>
<h2>
<span class="mw-headline" id="Release:_v3.5.1.1_.282016-05-01_20:25_Sun.29">Release: v3.5.1.1 (2016-05-01 20:25 Sun)</span>
</h2>
<p>
The desktop app is a trivial release. It has a few minor parser fixes, primarily for English Wiktionary.
</p>
<p>
The Android app has no release.
</p>
<h3>
<span class="mw-headline" id=".28Desktop.29_Minor_parser_fixes_for_English_Wiktionary">(Desktop) Minor parser fixes for English Wiktionary</span>
</h3>
<p>
There are several minor parser fixes for English Wiktionary. The most significant item is a simple implementation of {{categorytree}}, which shows a link to the Category page but does not do the JavaScript tree expansion.
</p>
<h3>
<span class="mw-headline" id="Next_release:_v3.5.2">Next release: v3.5.2</span>
</h3>
<p>
I'm still working on the wiki update tool. My goal is to have it for Android v3.5.2. Once that is done, I'll upload the next batch of wikis after it (English / French).
</p>
<p>
Also, on a related note, the English Wikipedia dumps for 2016-04-07 are done. Many thanks to Ariel Glenn for fixing the issue: <a href="https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-April/001301.html" rel="nofollow" class="external free">https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-April/001301.html</a> . I'm processing them now, and will try to post them this week.
</p>
</div>
</div>
</div>
<div id="mw-head" class="noprint">
<div id="left-navigation">
<div id="p-namespaces" class="vectorTabs">
<h3>Namespaces</h3>
<ul>
<li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
</ul>
</div>
</div>
</div>
<div id='mw-panel' class='noprint'>
<div id='p-logo'>
<a style="background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href="http://xowa.org/" title="Visit the main page"></a>
</div>
<div class="portal" id='xowa-portal-home'>
<h3>XOWA</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/index.html" title='Visit the main page'>Main page</a></li>
<li><a href="http://xowa.org/screenshots.html" title='See screenshots of XOWA'>Screenshots</a></li>
<li><a href="https://www.youtube.com/watch?v=q0qbXYXEH6M" title="See a video of XOWA Desktop in action">Video</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Download_XOWA.html" title='Download the XOWA application'>Download XOWA</a></li>
<li><a href="http://xowa.org/home/wiki/Dashboard/Image_databases.html" title='Download offline wikis and image databases'>Download wikis</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-started'>
<h3>Getting started</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/App/Setup/System_requirements.html" title='Get XOWA&apos;s system requirements'>Requirements</a></li>
<li><a href="http://xowa.org/home/wiki/App/Setup/Installation.html" title='Get instructions for installing XOWA'>Installation</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Other_wikis.html" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-android'>
<h3>Android</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Android/Setup.html" title='Setup XOWA on your Android device'>Setup</a></li>
<li><a href="https://www.youtube.com/watch?v=jsMTBxGweUw" title="See a video of XOWA Android in action">Video</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-help'>
<h3>Help</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Help/About.html" title='Get more information about XOWA'>About</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Contents.html" title='View a list of help topics'>Contents</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Media.html" title='Read what others have written about XOWA'>Media</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Feedback.html" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-blog'>
<h3>Blog</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Blog.html" title="Follow XOWA's development process">Current</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-links'>
<h3>Links</h3>
<div class="body">
<ul>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
<li><a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title="Support Wikipedia!">Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Donate.html" title="Support XOWA!">XOWA</a></li>
</ul>
</div>
</div>
</div>
</body>
</html>