<a href="https://github.com/Mattze96/mwad" rel="nofollow" class="external text">mwad</a> is a Python script / executable by <a href="https://github.com/Mattze96" rel="nofollow" class="external text">Mattze96</a> to generate XML dumps using the MediaWiki API.
</p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>
Contents
</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1">
<a href="#Overview:_XML_Dumps"><span class="tocnumber">1</span><span class="toctext">Overview: XML Dumps</span></a>
</li>
</ul>
</div>
<p>
XML dumps can be found in the following locations:
</p>
<ul>
<li>
Wikia wikis: on the Special:Statistics page for a given wiki. For example, for the freespeech Wikia wiki, one can go to <a href="http://freespeech.wikia.com/wiki/Special:Statistics" rel="nofollow" class="external free">http://freespeech.wikia.com/wiki/Special:Statistics</a>
</li>
<li>
Other wikis: varies depending on the wiki's setup.
</li>
</ul>
<p>
For non-Wikimedia wikis (Wikia wikis and other wikis), the dumps may be unavailable or out of date. For example, the freespeech Wikia wiki has a dump date of 2013-12-26, which as of this writing (2016-07) is over two and a half years old.
</p>
<p>
For Wikia wikis, one can request an XML dump by doing the following:
</p>
<ul>
<li>
Logging in with a user account
</li>
<li>
Requesting a dump through the Special:Statistics page
</li>
<li>
Waiting for the dump to be generated
</li>
</ul>
<p>
Other wikis may require emailing the wiki's admins to request a dump.
</p>
<p>
An alternative to this process is to use Mattze96's mwad (MediaWiki API dump).
</p>
<h2>
<span class="mw-headline" id="Usage">Usage</span>
</h2>
<p>
mwad is currently available both as a command-line executable and as a Python script.
</p>
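<p>
For background, the MediaWiki API can return pages in the same export XML format that dumps use, via the <code>export</code> and <code>exportnowrap</code> flags on <code>action=query</code>. The sketch below builds such a request by hand using only the Python standard library; the helper name is illustrative and is not mwad's actual code.
</p>

```python
# Illustrative sketch of the API mechanism behind XML dumps
# (not mwad's actual code): action=query with the 'export' and
# 'exportnowrap' flags returns pages as <mediawiki> export XML.
import urllib.parse


def build_export_url(api_url, titles):
    """Build an API request URL that exports the given pages as XML."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # multiple titles are joined by '|'
        "export": "1",
        "exportnowrap": "1",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


# Example: an export request for two pages of the freespeech Wikia wiki.
url = build_export_url("http://freespeech.wikia.com/api.php",
                       ["Main_Page", "Free_speech"])
```

Fetching that URL (e.g. with <code>urllib.request</code>) would return export XML for the listed pages; a full dump repeats this over every page title on the wiki.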
<ul>
<li>
For up-to-date info, see <a href="https://github.com/Mattze96/mwad" rel="nofollow" class="external free">https://github.com/Mattze96/mwad</a>
</li>
<li>
For info as of 2016-07-10, see <a href="#mwad_usage_notes" id="xolnki_2">mwad usage notes below</a>
</li>
<li>
For a walk-through synopsis, see the following:
</li>
</ul>
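<p>
As background for the walk-through: dumping a whole wiki first requires the complete list of page titles, which the MediaWiki API exposes through <code>list=allpages</code>. Results come in batches, and the API hands back an <code>apcontinue</code> token until the list is exhausted. The helpers below sketch the two pure pieces of that loop; the names are illustrative, not mwad's internals.
</p>

```python
# Illustrative sketch of enumerating all page titles via the
# MediaWiki API's list=allpages query (not mwad's actual code).

def allpages_params(apcontinue=None, limit=500):
    """Parameters for one list=allpages request."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": str(limit),
        "format": "json",
    }
    if apcontinue is not None:
        # Resume where the previous batch left off.
        params["apcontinue"] = apcontinue
    return params


def next_continue(response):
    """Extract the continuation token from an API response, or None."""
    return response.get("continue", {}).get("apcontinue")


# A full enumeration loop would repeat: request allpages_params(token),
# collect response["query"]["allpages"], token = next_continue(response),
# until token is None.
```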
<h3>
<span class="mw-headline" id="Generating_the_dump">Generating the dump</span>
</h3>
<ul>
<li>
<b>Do not run this on Wikimedia wikis</b>. Wikimedia has strict web-crawling policies. If you run this on a Wikimedia wiki, such as en.wikipedia.org, your IP address will probably be banned and you will be unable to access Wikipedia.
</li>
<li>
<b>Pay attention to the wiki's licenses</b>. All Wikia wikis use a Creative Commons license for article text<sup id="cite_ref-wikia_licensing_0-0" class="reference"><a href="#cite_note-wikia_licensing-0">[1]</a></sup>. Other wikis may follow similarly permissive licensing, but it is your responsibility to check. If a wiki has a strict copyright license, please do not run mwad on it.
</li>
<li>
<b>Web-scraping policies may get your IP banned</b>. Different wikis may impose different limits on the number of articles that can be downloaded, even through their API. If you're downloading a large wiki, you should consult the wiki's admins first. Otherwise, your IP address may be flagged as an unauthorized web-crawler and banned.
</li>
</ul>
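<p>
One simple way to stay within a wiki's crawling limits is to enforce a minimum delay between consecutive API requests. The throttle below is an illustrative sketch, not part of mwad, and the one-second default is an assumption; check the target wiki's own policy before downloading at scale.
</p>

```python
# Illustrative request throttle for polite API crawling (not part of
# mwad). The one-second default interval is an assumption.
import time


class Throttle:
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        """Sleep just long enough to honor the minimum interval."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage: call throttle.wait() immediately before each API request.
throttle = Throttle(min_interval=1.0)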
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->