Blog/2016-06
Contents
- 1 Release: v3.6.3.1 (2016-06-19 19:30 Sun)
- 1.1 (PC & Android) New Download Central to download HTML dumps and images
- 1.2 (PC & Server) Android HTML dumps are now usable
- 1.3 (PC & Android & Server) Search now reduces importance of short pages
- 1.4 (Server) Fix for broken Search
- 1.5 (PC) Minor fixes (Database, HTML Dump, Search, Special, Parser)
- 1.6 (Android) Minor changes (more Special pages: redesigned UI for special pages, HTML Page Title, flat icon)
- 1.7 (Wikis) English Wikipedia (2016-06) and Simple Wikipedia (2016-06)
- 1.8 Next release: v3.6.4
- 2 Release: Android Beta with wiki downloader (2016-06-12 23:00 Sun)
- 3 Release: NONE (2016-06-05 22:00 Sun)
Release: v3.6.3.1 (2016-06-19 19:30 Sun)
The PC app is a major release. It has a new Download page, can now read Android HTML databases by default, and includes many minor fixes.
The Android app is a major release. It also has a new Download page, as well as some minor changes.
The Server app is a major release. It can read Android HTML databases by default, and has a fix for the broken Search.
(PC & Android) New Download Central to download HTML dumps and images
Download Central is the major feature of the release. This is an in-app downloader that can download wikis for both Android and desktop. It can download images as well.
For v3.6.3, there are only two wikis: Simple Wikipedia and English Wikipedia. More wikis will be added every week throughout 2016. Next week, v3.6.4 should have other English wikis, like English Wiktionary and Wikisource.
Going forward, Download Central will be the primary vehicle to get XOWA wikis. It provides a simple way to import wikis. It eliminates any confusion about which files to download and where to copy them. Monthly updates for English Wikipedia will be published here as well.
To try it out, go to Download Central. For more info, see the Download Central help page.
(PC & Server) Android HTML dumps are now usable
This is the other major feature of the release. Previously, the app could only read wikitext dumps. Now, it can read the XOWA Android HTML dumps. This has a few key benefits:
- One micro-SD card, many platforms: The same micro-SD card can be used to read XOWA on Windows, Linux, Mac OS X, Raspberry Pi or Android
- Fast page loads: The app no longer needs to generate HTML from the wikitext. It can just load the HTML directly. This will be significantly faster. For example, a page like Earth might have taken 5 seconds to load. Now it loads in 1 second (or less).
- Wikidata no longer required: The HTML dumps no longer need Wikidata, which took up an additional 32 GB of space. Previously, Wikidata (www.wikidata.org) needed to be imported, or else some data would be missing.
However, there are drawbacks as well:
- Missing features: The HTML dumps currently do not support these features:
- Table of Contents: Android generates Table of Contents in a different manner
- Redlinks: This feature still needs to be implemented.
- Links in other languages: Wikidata is still needed for this information
- Categories (partially): Categories are dynamically rendered. The HTML dumps include the first 200 items in a category, but if there are more, they won't be available.
- Other omissions: Image Map and Gallery were found to be broken for 2016-06 English Wikipedia. They are fixed for future releases. However, there will probably be other items that will be discovered as well.
- Plans: Each of these features will be implemented over the course of 2016. The end-goal is to have the HTML dumps produce the same output as the wikitext dumps.
- Larger size: The HTML dumps are larger than the wikitext (approximately 30 GB vs 20 GB).
- Plans: This will be whittled down over releases, though it's unlikely that the HTML dumps will ever be smaller than 25 GB.
- Requires separate post-processing generation step: The wikitext dumps were automatically generated by downloading an XML dump. The HTML dumps require another post-processing step that is not simple to run (See: Dev/Command-line/Dumps)
- Plans: This may be simplified with a "Generate HTML dump" button in the future, but generating an HTML dump for English Wikipedia will probably be a resource-intensive task
Finally, a few notes on wikitext dumps versus HTML dumps.
- More work on HTML dumps in the future: Going forward, a lot of development work will go into the HTML dumps. This is necessary as the wikitext dumps are too slow for Android.
- HTML dumps will be uploaded to archive.org: Previously, Wikitext dumps were uploaded to archive.org. Now it will be HTML dumps instead.
- Wikitext dump support is not going away: Wikitext dumps will still be supported and used. Note that they are still a critical precursor to HTML dumps. (HTML dumps can't be produced without them).
(PC & Android & Server) Search now reduces importance of short pages
The new XOWA Search Engine uses PageRank to rate pages by importance. Although this works well for Wikipedia, it sometimes overrates pages which exist for encyclopedic book-keeping.
For example, a lot of Wikipedia pages will have a small box called "Authority Control" at the bottom of the page. This box will have links to other pages like https://en.wikipedia.org/wiki/Integrated_Authority_Control . If a million pages have this Integrated Authority Control link, then PageRank rates this page highly. ("1 million pages link to it!") However, the page itself is fairly short, and is not really one of the most important articles in Wikipedia, even though on link count alone it would score higher than India, Insect, Italy, etc.
v3.6.3 tries to reduce the importance of these pages if these articles are "short". This heuristic was already present in the previous versions of the search engine, but has been further tweaked.
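The idea of discounting short pages can be illustrated with a minimal sketch. This is not XOWA's actual code: the threshold `SHORT_PAGE_BYTES`, the linear penalty, and the sample scores and page lengths are all assumptions made up for illustration.

```python
# Hypothetical threshold: pages shorter than this many bytes count as "short".
SHORT_PAGE_BYTES = 2000


def adjusted_score(page_rank: float, page_len: int,
                   short_len: int = SHORT_PAGE_BYTES) -> float:
    """Scale a PageRank-style score down in proportion to how short the page is."""
    if page_len >= short_len:
        return page_rank  # long enough: keep the link-based score unchanged
    # Assumed linear penalty: a page a quarter of the threshold keeps a quarter of its score.
    return page_rank * (max(page_len, 0) / short_len)


# Made-up (title, link-based score, page length in bytes) tuples.
pages = [
    ("Integrated_Authority_Control", 0.9, 500),
    ("India", 0.6, 150_000),
    ("Insect", 0.5, 90_000),
]

# Sort by the adjusted score, highest first.
ranked = sorted(pages, key=lambda p: adjusted_score(p[1], p[2]), reverse=True)
for title, rank, length in ranked:
    print(title, round(adjusted_score(rank, length), 3))
```

With these made-up numbers, the heavily linked but short book-keeping page drops below India and Insect, which is the effect the heuristic is after.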
(Server) Fix for broken Search
This was kindly reported by thombles here: https://github.com/gnosygnu/xowa/issues/59 . The new Search Engine in 2016-03 broke the HTTP Server search. This was fixed in this release.
(PC) Minor fixes (Database, HTML Dump, Search, Special, Parser)
These can be described briefly as follows:
- Database: Small wikis now generate a "core.xowa" file in addition to a "text.xowa" file. Previously, they only generated a "text.xowa" file.
- HTML dump: Image Map and Gallery weren't working correctly. Note that these are still broken in the current 2016-06 dump, but will be fixed in the 2016-07 one.
- Search: A few searches would be blank if the page also existed in a different namespace (for example, dokuwiki)
- Special: Special page names can now be case-insensitive (Special:RANDOM) or use native-language terms (Spezial:Zufällige_Seite)
- Parser: A handful of script errors around redirect links and country flags
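As a rough illustration of the Special fix above, case-insensitive and native-language lookup can be sketched as an alias table keyed on a normalized name. The tables and the `resolve_special` helper are hypothetical and hold only the two examples mentioned; they are not taken from XOWA's code.

```python
import unicodedata


def _key(text: str) -> str:
    """Normalize a name for lookup: NFC form, underscores for spaces, case-folded."""
    return unicodedata.normalize("NFC", text.strip().replace(" ", "_")).casefold()


# Hypothetical alias tables: normalized name -> canonical name.
NAMESPACE_ALIASES = {_key(k): v for k, v in {
    "Special": "Special",
    "Spezial": "Special",           # German namespace name
}.items()}
PAGE_ALIASES = {_key(k): v for k, v in {
    "Random": "Random",
    "Zufällige_Seite": "Random",    # German name for the Random page
}.items()}


def resolve_special(title: str):
    """Return the canonical 'Special:Name' for a title, or None if it is not recognized."""
    ns, sep, name = title.partition(":")
    if not sep:
        return None  # no namespace prefix at all
    if NAMESPACE_ALIASES.get(_key(ns)) is None:
        return None  # not a Special-namespace alias
    canonical = PAGE_ALIASES.get(_key(name))
    return None if canonical is None else f"Special:{canonical}"


print(resolve_special("Special:RANDOM"))
print(resolve_special("Spezial:Zufällige_Seite"))
```

Both calls resolve to the same canonical page, while a title in another namespace (for example `Talk:Random`) returns `None`.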
(Android) Minor changes (more Special pages: redesigned UI for special pages, HTML Page Title, flat icon)
These can also be described briefly as follows:
- More Special pages: The following two pages have been added:
- Wiki Info: Shows information about the wiki, including size and location. Also allows deletion of the wiki. Available by doing: Menu -> XOWA -> Info icon
- Log: Shows log information for troubleshooting. Available by doing: Menu -> Settings -> Logs
- Redesigned UI for special pages: Special pages now use a common look and feel. This involves shared CSS, more icons, and a simple "help" panel. Also, they no longer show the footer (view page in browser; last modified, CC Content)
- HTML Page Title: The Page Title used to be an Android TextView widget. Now it's part of the HTML document. This makes resizing the page much nicer (the widget used to jump around when the page was resized)
- Flat icon: XOWA now has a flat icon
(Wikis) English Wikipedia (2016-06) and Simple Wikipedia (2016-06)
These are available through Download Central.
For users who want the 2016-06 English Wikipedia image update, see App/Import/Download Central
Next release: v3.6.4
I'm focusing on HTML dump related issues for the next few weeks, particularly table of contents and redlinks
For wikis, I'm going to generate the other English wikis for Download Central.
Release: Android Beta with wiki downloader (2016-06-12 23:00 Sun)
I released a beta version of the wiki downloader: See https://play.google.com/store/apps/details?id=org.xowa.beta and https://github.com/gnosygnu/xowa/releases . I'm planning on doing some minor tweaks this week, before uploading it to the main XOWA Android app next week. I'll release the desktop app as well.
In addition, I generated 2016-06 English Wikipedia, and will be uploading a new version to be used with the wiki downloader. Once I get the wiki downloader stabilized, I'll start updating the other wikis.
Release: NONE (2016-06-05 22:00 Sun)
There's no release again this week. I've been bogged down in Android SQLite optimizations for the wiki downloader. I'm going to release something next week as I'd like to get back to regular releases.
English Wikipedia is building now, so hopefully I'll have that ready for next week.