<spanclass="mbox-text-span">Please note that this script is for power users. It is not meant for casual users.</span>
</p>
<p>
<spanclass="mbox-text-span">Please read through these instructions carefully. If you fail to follow these instructions, you may end up downloading millions of images by accident, and have your IP address banned by Wikimedia.</span>
</p>
<p>
<spanclass="mbox-text-span">Also, the script will change in the future, and without any warning. There is no backward compatibility. Although the XOWA databases have a fixed format, the scripts do not. If you discover that your script breaks, please refer to this page, contact me for assistance, or go through the code.</span>
<ahref="#Pre-existing_image_databases_for_your_wiki_.28optional.29"><spanclass="tocnumber">2.5</span><spanclass="toctext">Pre-existing image databases for your wiki (optional)</span></a>
You will need the latest version of commons.wikimedia.org. If you have an older version, you will have missing images or incorrect size information.
</p>
<p>
For example, if you have a commons.wikimedia.org dump from 2015-04-22 and are trying to import a 2015-05-17 English Wikipedia, then any images added after 2015-04-22 will not be picked up.
</p>
<p>
You also need the latest version of www.wikidata.org. Note that English Wikipedia and other wikis use Wikidata through the {{#property}} call or Module code. If you have an earlier version, data will be missing or out of date.
</p>
<p>
You should have a recent-generation machine with relatively high-performance hardware, especially if you plan to generate images for English Wikipedia.
</p>
<p>
For context, here is my current machine setup for generating the image dumps:
</p>
<p>
You should have a broadband internet connection. The script needs to download dump files from Wikimedia, and some dump files (like English Wikipedia) are in the tens of GB.
</p>
<p>
You can opt to download these files separately and place them in the appropriate locations beforehand. However, the script below assumes that the machine is always online. If you are offline, you will need to comment out the "util.download" lines yourself.
</p>
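<p>
For example, an offline run might disable a download step as sketched below. The URL and argument list here are hypothetical; the actual <code>util.download</code> lines appear in the script further down this page.
</p>
<pre class='code'>
/* offline run: the download step is commented out;
   the dump file must already be at the expected location
   (the arguments below are illustrative, not the real ones) */
// util.download('http://dumps.wikimedia.org/enwiki/...', '/xowa/wiki/en.wikipedia.org/...');
</pre>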
<h3>
<spanclass="mw-headline"id="Pre-existing_image_databases_for_your_wiki_.28optional.29">Pre-existing image databases for your wiki (optional)</span>
</h3>
<p>
XOWA automatically re-uses the images from existing image databases so that you do not have to redownload them. This is particularly useful for large wikis, where redownloading millions of images would be undesirable.
</p>
<p>
It is strongly advised that you download the image database for your wiki. You can find a full list here: <a href="http://xowa.sourceforge.net/image_dbs.html" rel="nofollow" class="external free">http://xowa.sourceforge.net/image_dbs.html</a>. Note that if an image database does not exist for your wiki, you can still proceed to use the script.
</p>
<ul>
<li>
If you have v1 image databases, they should be placed in <code>/xowa/file/wiki_domain-prv</code>. For example, English Wikipedia should have <code>/xowa/file/en.wikipedia.org-prv/fsdb.main/fsdb.bin.0000.sqlite3</code>
</li>
<li>
If you have v2 image databases, they should be placed in <code>/xowa/wiki/wiki_domain/prv</code>. For example, English Wikipedia should have <code>/xowa/wiki/en.wikipedia.org/prv/en.wikipedia.org-file-ns.000-db.001.xowa</code>
</li>
</ul>
<h2>
<spanclass="mw-headline"id="gfs">gfs</span>
</h2>
<p>
The script is written in the <code>gfs</code> format, a custom scripting format specific to XOWA. It is similar to JSON, but also supports comments.
</p>
<p>
Unfortunately, the error-handling for gfs is quite minimal. When making changes, make them in small steps and be prepared to fall back to backups.
</p>
<p>
The following is a brief list of rules; a combined example follows the list:
</p>
<ul>
<li>
Comments are either single-line (from <code>//</code> to the end of the line) or multi-line (between <code>/*</code> and <code>*/</code>). For example: <code>// single-line comment</code> or <code>/* multi-line comment */</code>
</li>
<li>
Booleans are "y" and "n" (yes / no or true / false). For example: <code>enabled = 'y';</code>
</li>
<li>
Numbers are 32-bit integers and are not enclosed in quotes. For example: <code>count = 10000;</code>
</li>
<li>
Strings are surrounded by apostrophes (') or quotes ("). For example: <code>key = 'val';</code>
</li>
<li>
Statements are terminated by a semi-colon (;). For example: <code>procedure1;</code>
</li>
<li>
Statements can take arguments in parentheses. For example: <code>procedure1('argument1', 'argument2', 'argument3');</code>
</li>
<li>
Statements are grouped with curly braces ({}). For example: <code>group {procedure1; procedure2; procedure3;}</code>
</li>
</ul>
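<p>
Putting these rules together, a valid <code>gfs</code> fragment might look like the sketch below. The names are illustrative only, not actual XOWA settings.
</p>
<pre class='code'>
/* a hypothetical group;
   none of these keys are real XOWA settings */
group_1 {
  enabled = 'y';             // boolean: y or n
  count = 10000;             // 32-bit integer, no quotes
  key = 'val';               // string in apostrophes or quotes
  procedure1;                // bare statement, terminated by a semi-colon
  procedure2('argument1');   // statement with arguments in parentheses
}
</pre>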
<h2>
<spanclass="mw-headline"id="Terms">Terms</span>
</h2>
<h3>
<spanclass="mw-headline"id="lnki">lnki</span>
</h3>
<p>
A <code>lnki</code> is short for "<b>l</b>i<b>nk</b> <b>i</b>nternal". It refers to all wikitext with the double bracket syntax: [[A]]. A more elaborate example for files would be [[File:A.png|thumb|200x300px|upright=.80]]. Note that the abbreviation was chosen to differentiate it from <code>lnke</code>, which is short for "<b>l</b>i<b>nk</b> <b>e</b>xternal". For the purposes of the script, all lnki data comes from the current wiki's data dump.
</p>
<h3>
<spanclass="mw-headline"id="orig">orig</span>
</h3>
<ul>
<li>
An <code>orig</code> is short for "<b>orig</b>inal file". It refers to the original file metadata. For the purposes of this script, all orig data comes from commons.wikimedia.org.
</li>
</ul>
<h3>
<spanclass="mw-headline"id="xfer">xfer</span>
</h3>
<ul>
<li>
An <code>xfer</code> is short for "transfer file". It refers to the actual file to be downloaded.
</li>
</ul>
<h3>
<spanclass="mw-headline"id="fsdb">fsdb</span>
</h3>
<ul>
<li>
The <code>fsdb</code> is short for "<b>f</b>ile <b>s</b>ystem <b>d</b>ata<b>b</b>ase". It refers to the internal table format of the XOWA image databases.
</li>
</ul>
<h2>
<spanclass="mw-headline"id="Script">Script</span>
</h2>
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.bldr.cmds {
// build commons database; this only needs to be done once, whenever commons is updated
<li><ahref="http://dumps.wikimedia.org/backup-index.html"title="Get wiki datababase dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><ahref="https://archive.org/search.php?query=xowa"title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><ahref="http://en.wikipedia.org"title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<divclass="portal"id='xowa-portal-donate'>
<h3>Donate</h3>
<divclass="body">
<ul>
<li><ahref="https://archive.org/donate/index.php"title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->