mirror of https://github.com/gnosygnu/xowa.git synced 2025-05-30 14:04:56 +00:00

update Command-Line/dumps

gnosygnu 2017-02-02 11:48:15 -05:00
parent b4a7e4742e
commit f26baacee6
21 changed files with 165 additions and 131 deletions

@ -138,7 +138,7 @@
<span class="mw-headline" id="Detailed_start">Detailed start</span>
</h2>
<p>
See <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_3" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
See <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_3" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</p>
</div>

@ -170,7 +170,12 @@
</h3>
<ul>
<li>
Set up Python 2: <a href="https://www.python.org/downloads/" rel="nofollow" class="external free">https://www.python.org/downloads/</a>. For the rest of this walkthrough, we'll assume it's installed at <code>C:\Python2.7</code>
Set up Python 2: <a href="https://www.python.org/downloads/" rel="nofollow" class="external free">https://www.python.org/downloads/</a>. For the rest of this walkthrough, we'll assume it's installed at <code>C:\Python2.7</code>
<ul>
<li>
<b>Note that the dumpgenerator.py script will not work with Python 3</b>
</li>
</ul>
</li>
<li>
Download the wikiteam project from <a href="https://github.com/WikiTeam/wikiteam" rel="nofollow" class="external free">https://github.com/WikiTeam/wikiteam</a> For the rest of this walkthrough, we'll assume it's downloaded to <code>C:\WikiTeam</code>

@ -416,7 +416,7 @@
The Android app is a major release. It has a Random feature, shows more images, and adds CSS-tweaks.
</p>
<h3>
<span class="mw-headline" id="Documentation_for_html-dump_script._See_Dev.2FCommand-line.2FDumps">Documentation for html-dump script. See <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_15" title="Dev/Command-line/Dumps">Dev/Command-line/Dumps</a></span>
<span class="mw-headline" id="Documentation_for_html-dump_script._See_Dev.2FCommand-line.2FDumps">Documentation for html-dump script. See <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_15" title="Dev/Command-line/Dumps" class="xowa-visited">Dev/Command-line/Dumps</a></span>
</h3>
<p>
This item is self-explanatory. The XOWA Android app is getting more stable, so I felt it was time to document the generation of the HTML databases.

@ -437,7 +437,7 @@
</dl>
<ul>
<li>
<b>Requires separate post-processing generation step</b>: The wikitext dumps were automatically generated by downloading an XML dump. The HTML dumps require another post-processing step that is not simple to run (See: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_20" title="Dev/Command-line/Dumps">Dev/Command-line/Dumps</a>)
<b>Requires separate post-processing generation step</b>: The wikitext dumps were automatically generated by downloading an XML dump. The HTML dumps require another post-processing step that is not simple to run (See: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_20" title="Dev/Command-line/Dumps" class="xowa-visited">Dev/Command-line/Dumps</a>)
</li>
</ul>
<dl>

@ -157,7 +157,7 @@
</p>
<ul>
<li>
<a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_15" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_15" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_16" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>
@ -238,7 +238,7 @@
</p>
<ul>
<li>
<a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_18" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_18" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_19" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>

@ -818,7 +818,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_13" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_13" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<h3>
@ -1122,7 +1122,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_19" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_19" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<ul>
@ -1426,7 +1426,7 @@
<span style='font-variant:small-caps'>Resolved by</span>: Include latest download central database.
</dd>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_22" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a> <a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_23" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_22" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a> <a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_23" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>
</dd>
</dl>
<h2>
@ -1442,7 +1442,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_24" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_24" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<ul>
@ -2249,7 +2249,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_35" title="Dev/Command-line/Dumps">Dev/Command-line/Dumps</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_35" title="Dev/Command-line/Dumps" class="xowa-visited">Dev/Command-line/Dumps</a>
</dd>
</dl>
<ul>

@ -2582,7 +2582,7 @@
</p>
<ul>
<li>
Command-line: Expand instructions for generating HTML dumps. See: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_30" title="Dev/Command-line/Dumps">Dev/Command-line/Dumps</a>
Command-line: Expand instructions for generating HTML dumps. See: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_30" title="Dev/Command-line/Dumps" class="xowa-visited">Dev/Command-line/Dumps</a>
</li>
</ul>
<p>

@ -64,7 +64,7 @@
<span style='font-variant:small-caps'>Resolved by</span>: Include latest download central database.
</dd>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_2" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a> <a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_3" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_2" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a> <a href="http://xowa.org/home/wiki/Wiki_setup/German_wikis.html" id="xolnki_3" title="Wiki setup/German wikis">Wiki_setup/German_wikis</a>
</dd>
</dl>
<h2>
@ -80,7 +80,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_4" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_4" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<ul>

@ -56,7 +56,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_2" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_2" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<ul>

@ -70,7 +70,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_6" title="Wiki setup/English wikis">Wiki_setup/English_wikis</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Wiki_setup/English_wikis.html" id="xolnki_6" title="Wiki setup/English wikis" class="xowa-visited">Wiki_setup/English_wikis</a>
</dd>
</dl>
<h3>

@ -279,7 +279,7 @@
</ul>
<dl>
<dd>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_5" title="Dev/Command-line/Dumps">Dev/Command-line/Dumps</a>
<span style='font-variant:small-caps'>Links</span>: <a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_5" title="Dev/Command-line/Dumps" class="xowa-visited">Dev/Command-line/Dumps</a>
</dd>
</dl>
<ul>

@ -68,7 +68,7 @@
<a href="#Requirements"><span class="tocnumber">2</span> <span class="toctext">Requirements</span></a>
<ul>
<li class="toclevel-2 tocsection-3">
<a href="#commons.wikimedia.org_.28thum"><span class="tocnumber">2.1</span> <span class="toctext">commons.wikimedia.org (thum</span></a>
<a href="#commons.wikimedia.org"><span class="tocnumber">2.1</span> <span class="toctext">commons.wikimedia.org</span></a>
</li>
<li class="toclevel-2 tocsection-4">
<a href="#www.wikidata.org"><span class="tocnumber">2.2</span> <span class="toctext">www.wikidata.org</span></a>
@ -158,7 +158,7 @@
<span class="mw-headline" id="Requirements">Requirements</span>
</h2>
<h3>
<span class="mw-headline" id="commons.wikimedia.org_.28thum">commons.wikimedia.org (thum</span>
<span class="mw-headline" id="commons.wikimedia.org">commons.wikimedia.org</span>
</h3>
<p>
You will need the latest version of commons.wikimedia.org. Note that if you have an older version, you will have missing images or wrong size information.
@ -304,6 +304,10 @@
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
// build commons database; this only needs to be done once, whenever commons is updated
add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
@ -391,9 +395,11 @@ app.bldr.cmds {
// cleanup all downloaded files as well as temporary files
add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
// OBSOLETE: use v2
// v1 html generator
// parse every page in the listed namespace and gather data on their lnkis.
// this step will take the longest amount of time.
/*
add ('simple.wikipedia.org' , 'file.lnki_temp') {
// save data every # of pages
commit_interval = 10000;
@ -426,13 +432,14 @@ app.bldr.cmds {
hzip_diff = 'y';
}
}
*/
// v2 html generator; allows for multi-threaded / multi-machine builds
/*
add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14';}}
add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14|8';}}
add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
cfg {
num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-08-01 01:02:03';
load_all_imglinks = 'y';
// uncomment the following 3 lines if using the build script as a "worker" helping a "server"
// num_pages_in_pool = 32000;
@ -443,8 +450,7 @@ app.bldr.cmds {
// note that if multi-machine mode is enabled, all worker directories must be manually copied to the server directory (a build command will be added later)
add ('simple.wikipedia.org' , 'wiki.mass_parse.make');
*/
// aggregate the lnkis
add ('simple.wikipedia.org' , 'file.lnki_regy');
@ -492,7 +498,10 @@ app.bldr.run;
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfgs.get('app.user.cfg.security.web_access_enabled', 'app').val = 'y';
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
/*
add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
@ -502,56 +511,47 @@ app.bldr.cmds {
add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
add ('www.wikidata.org' , 'text.init');
add ('www.wikidata.org' , 'text.page');
add ('www.wikidata.org' , 'text.cat.core');
add ('www.wikidata.org' , 'text.cat.link');
add ('www.wikidata.org' , 'text.cat.hidden');
add ('www.wikidata.org' , 'text.term');
add ('www.wikidata.org' , 'text.css');
add ('www.wikidata.org' , 'wiki.image');
add ('www.wikidata.org' , 'file.page_regy') {build_commons = 'y'}
add ('www.wikidata.org' , 'wiki.page_dump.make');
add ('www.wikidata.org' , 'wiki.page_props');
add ('www.wikidata.org' , 'wiki.categorylinks');
add ('www.wikidata.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
// add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'pages-articles';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'categorylinks';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'page_props';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
add ('commons.wikimedia.org' , 'text.init');
add ('commons.wikimedia.org' , 'text.page');
add ('commons.wikimedia.org' , 'text.cat.core');
add ('commons.wikimedia.org' , 'text.cat.link');
add ('commons.wikimedia.org' , 'text.cat.hidden');
add ('commons.wikimedia.org' , 'text.term');
add ('commons.wikimedia.org' , 'text.css');
add ('commons.wikimedia.org' , 'wiki.image');
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('commons.wikimedia.org' , 'wiki.page_dump.make');
add ('commons.wikimedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
*/
/*
// en.wikipedia.org
// add ('commons.wikimedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'image';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'imagelinks';}
*/
/*
// en.wikipedia.org
add ('en.wikipedia.org' , 'text.init');
add ('en.wikipedia.org' , 'text.page') {redirect_id_enabled = 'y';}
add ('en.wikipedia.org' , 'text.search');
add ('en.wikipedia.org' , 'text.css');
add ('en.wikipedia.org' , 'text.cat.core');
add ('en.wikipedia.org' , 'text.cat.link');
add ('en.wikipedia.org' , 'text.cat.hidden');
add ('en.wikipedia.org' , 'text.term');
// add ('en.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'wiki.page_dump.make');
add ('en.wikipedia.org' , 'wiki.page_link');
add ('en.wikipedia.org' , 'wiki.imagelinks');
add ('en.wikipedia.org' , 'wiki.page_dump.make');
add ('en.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('en.wikipedia.org' , 'wiki.page_link');
add ('en.wikipedia.org' , 'search.page__page_score') {iteration_max = 100;}
add ('en.wikipedia.org' , 'search.link__link_score') {page_rank_enabled = 'y';
score_adjustment_mgr {
@ -566,32 +566,43 @@ app.bldr.cmds {
}
}
add ('en.wikipedia.org' , 'search.word__link_count')
/*
// SELECT * FROM xowa_cfg WHERE cfg_key = 'props.modified_latest';
add ('en.wikipedia.org' , 'file.lnki_temp') {
commit_interval = 10000; progress_interval = 50; cleanup_interval = 50; select_size = 25;
ns_ids = '0|4|14|100';
hdump_bldr {enabled = 'y'; hzip_enabled = 'y'; hzip_diff = 'y';}
}
add ('en.wikipedia.org' , 'file.lnki_regy');
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'file.orig_regy');
// SELECT * FROM orig_regy WHERE lnki_ttl = 'BSicon_CONTr.svg';
// SELECT * FROM page_regy WHERE src_ttl = 'BSicon_CONTr.svg';
add ('en.wikipedia.org' , 'file.xfer_temp.thumb');
// SELECT Count(*) FROM xfer_regy WHERE xfer_status = 0;
// SELECT * FROM xfer_regy WHERE xfer_status = 0 AND lnki_page_id = 372692; --en.w:Featured_picture_candidates
add ('en.wikipedia.org' , 'file.xfer_regy');
add ('en.wikipedia.org' , 'wiki.page_props');
add ('en.wikipedia.org' , 'wiki.categorylinks');
*/
/*
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n'}
add ('en.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|100|14|8';}}
// add ('en.wikipedia.org' , 'wiki.mass_parse.resume');
add ('en.wikipedia.org' , 'wiki.mass_parse.exec') {cfg {
num_wkrs = 8; load_all_templates = 'y'; load_ifexists_ns = '*'; cleanup_interval = 25; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2017-01-01 01:02:03';}
// num_wkrs = 1; load_all_templates = 'n'; load_all_imglnks = 'n'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-07-28 01:02:03';}
}
add ('en.wikipedia.org' , 'wiki.mass_parse.make');
*/
/*
add ('en.wikipedia.org' , 'file.lnki_temp') {
commit_interval = 10000; progress_interval = 50; cleanup_interval = 50; select_size = 25;
ns_ids = '0|4|14|100|12|8|6|10|828|108|118|446|710|2300|2302|2600';
hdump_bldr {enabled = 'y'; hzip_enabled = 'y'; hzip_diff = 'y';}
}
*/
/*
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
add ('en.wikipedia.org' , 'file.lnki_regy');
// add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'file.orig_regy');
add ('en.wikipedia.org' , 'file.xfer_temp.thumb');
add ('en.wikipedia.org' , 'file.xfer_regy');
add ('en.wikipedia.org' , 'file.xfer_regy_update');
*/
/*
add ('en.wikipedia.org' , 'file.fsdb_make') {
commit_interval = 1000; progress_interval = 200; select_interval = 10000;
ns_ids = '0|4|14|100';
ns_ids = '0|4|100|14|8';
// // specify whether original wiki databases are v1 (.sqlite3) or v2 (.xowa)
// // src_bin_mgr__fsdb_version = 'v2';
// src_bin_mgr__fsdb_version = 'v2';
// trg_bin_mgr__fsdb_version = 'v1';
@ -603,6 +614,7 @@ app.bldr.cmds {
}
add ('en.wikipedia.org' , 'file.orig_reg');
add ('en.wikipedia.org' , 'wiki.page_dump.drop');
add ('en.wikipedia.org' , 'file.page_file_map.create');
*/
}
app.bldr.run;
@ -614,6 +626,9 @@ app.bldr.run;
<li>
2016-10-12: explicitly set web_access_enabled to y
</li>
<li>
2017-02-02: updated script for multi-threaded version and new options
</li>
</ul>
</div>

@ -68,7 +68,7 @@
<a href="#Requirements"><span class="tocnumber">2</span> <span class="toctext">Requirements</span></a>
<ul>
<li class="toclevel-2 tocsection-3">
<a href="#commons.wikimedia.org_.28thum"><span class="tocnumber">2.1</span> <span class="toctext">commons.wikimedia.org (thum</span></a>
<a href="#commons.wikimedia.org"><span class="tocnumber">2.1</span> <span class="toctext">commons.wikimedia.org</span></a>
</li>
<li class="toclevel-2 tocsection-4">
<a href="#www.wikidata.org"><span class="tocnumber">2.2</span> <span class="toctext">www.wikidata.org</span></a>
@ -158,7 +158,7 @@
<span class="mw-headline" id="Requirements">Requirements</span>
</h2>
<h3>
<span class="mw-headline" id="commons.wikimedia.org_.28thum">commons.wikimedia.org (thum</span>
<span class="mw-headline" id="commons.wikimedia.org">commons.wikimedia.org</span>
</h3>
<p>
You will need the latest version of commons.wikimedia.org. Note that if you have an older version, you will have missing images or wrong size information.
@ -304,6 +304,10 @@
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
// build commons database; this only needs to be done once, whenever commons is updated
add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
@ -391,9 +395,11 @@ app.bldr.cmds {
// cleanup all downloaded files as well as temporary files
add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
// OBSOLETE: use v2
// v1 html generator
// parse every page in the listed namespace and gather data on their lnkis.
// this step will take the longest amount of time.
/*
add ('simple.wikipedia.org' , 'file.lnki_temp') {
// save data every # of pages
commit_interval = 10000;
@ -426,13 +432,14 @@ app.bldr.cmds {
hzip_diff = 'y';
}
}
*/
// v2 html generator; allows for multi-threaded / multi-machine builds
/*
add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14';}}
add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14|8';}}
add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
cfg {
num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-08-01 01:02:03';
load_all_imglinks = 'y';
// uncomment the following 3 lines if using the build script as a "worker" helping a "server"
// num_pages_in_pool = 32000;
@ -443,8 +450,7 @@ app.bldr.cmds {
// note that if multi-machine mode is enabled, all worker directories must be manually copied to the server directory (a build command will be added later)
add ('simple.wikipedia.org' , 'wiki.mass_parse.make');
*/
// aggregate the lnkis
add ('simple.wikipedia.org' , 'file.lnki_regy');
@ -492,7 +498,10 @@ app.bldr.run;
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfgs.get('app.user.cfg.security.web_access_enabled', 'app').val = 'y';
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
/*
add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
@ -502,56 +511,47 @@ app.bldr.cmds {
add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
add ('www.wikidata.org' , 'text.init');
add ('www.wikidata.org' , 'text.page');
add ('www.wikidata.org' , 'text.cat.core');
add ('www.wikidata.org' , 'text.cat.link');
add ('www.wikidata.org' , 'text.cat.hidden');
add ('www.wikidata.org' , 'text.term');
add ('www.wikidata.org' , 'text.css');
add ('www.wikidata.org' , 'wiki.image');
add ('www.wikidata.org' , 'file.page_regy') {build_commons = 'y'}
add ('www.wikidata.org' , 'wiki.page_dump.make');
add ('www.wikidata.org' , 'wiki.page_props');
add ('www.wikidata.org' , 'wiki.categorylinks');
add ('www.wikidata.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
// add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'pages-articles';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'categorylinks';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'page_props';}
add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
add ('commons.wikimedia.org' , 'text.init');
add ('commons.wikimedia.org' , 'text.page');
add ('commons.wikimedia.org' , 'text.cat.core');
add ('commons.wikimedia.org' , 'text.cat.link');
add ('commons.wikimedia.org' , 'text.cat.hidden');
add ('commons.wikimedia.org' , 'text.term');
add ('commons.wikimedia.org' , 'text.css');
add ('commons.wikimedia.org' , 'wiki.image');
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('commons.wikimedia.org' , 'wiki.page_dump.make');
add ('commons.wikimedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
*/
/*
// en.wikipedia.org
// add ('commons.wikimedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'image';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';}
add ('en.wikipedia.org' , 'util.download') {dump_type = 'imagelinks';}
*/
/*
// en.wikipedia.org
add ('en.wikipedia.org' , 'text.init');
add ('en.wikipedia.org' , 'text.page') {redirect_id_enabled = 'y';}
add ('en.wikipedia.org' , 'text.search');
add ('en.wikipedia.org' , 'text.css');
add ('en.wikipedia.org' , 'text.cat.core');
add ('en.wikipedia.org' , 'text.cat.link');
add ('en.wikipedia.org' , 'text.cat.hidden');
add ('en.wikipedia.org' , 'text.term');
// add ('en.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'wiki.page_dump.make');
add ('en.wikipedia.org' , 'wiki.page_link');
add ('en.wikipedia.org' , 'wiki.imagelinks');
add ('en.wikipedia.org' , 'wiki.page_dump.make');
add ('en.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
add ('en.wikipedia.org' , 'wiki.page_link');
add ('en.wikipedia.org' , 'search.page__page_score') {iteration_max = 100;}
add ('en.wikipedia.org' , 'search.link__link_score') {page_rank_enabled = 'y';
score_adjustment_mgr {
@ -566,32 +566,43 @@ app.bldr.cmds {
}
}
add ('en.wikipedia.org' , 'search.word__link_count')
/*
// SELECT * FROM xowa_cfg WHERE cfg_key = 'props.modified_latest';
add ('en.wikipedia.org' , 'file.lnki_temp') {
commit_interval = 10000; progress_interval = 50; cleanup_interval = 50; select_size = 25;
ns_ids = '0|4|14|100';
hdump_bldr {enabled = 'y'; hzip_enabled = 'y'; hzip_diff = 'y';}
}
add ('en.wikipedia.org' , 'file.lnki_regy');
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'file.orig_regy');
// SELECT * FROM orig_regy WHERE lnki_ttl = 'BSicon_CONTr.svg';
// SELECT * FROM page_regy WHERE src_ttl = 'BSicon_CONTr.svg';
add ('en.wikipedia.org' , 'file.xfer_temp.thumb');
// SELECT Count(*) FROM xfer_regy WHERE xfer_status = 0;
// SELECT * FROM xfer_regy WHERE xfer_status = 0 AND lnki_page_id = 372692; --en.w:Featured_picture_candidates
add ('en.wikipedia.org' , 'file.xfer_regy');
add ('en.wikipedia.org' , 'wiki.page_props');
add ('en.wikipedia.org' , 'wiki.categorylinks');
*/
/*
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n'}
add ('en.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|100|14|8';}}
// add ('en.wikipedia.org' , 'wiki.mass_parse.resume');
add ('en.wikipedia.org' , 'wiki.mass_parse.exec') {cfg {
num_wkrs = 8; load_all_templates = 'y'; load_ifexists_ns = '*'; cleanup_interval = 25; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2017-01-01 01:02:03';}
// num_wkrs = 1; load_all_templates = 'n'; load_all_imglnks = 'n'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-07-28 01:02:03';}
}
add ('en.wikipedia.org' , 'wiki.mass_parse.make');
*/
/*
add ('en.wikipedia.org' , 'file.lnki_temp') {
commit_interval = 10000; progress_interval = 50; cleanup_interval = 50; select_size = 25;
ns_ids = '0|4|14|100|12|8|6|10|828|108|118|446|710|2300|2302|2600';
hdump_bldr {enabled = 'y'; hzip_enabled = 'y'; hzip_diff = 'y';}
}
*/
/*
add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
add ('en.wikipedia.org' , 'file.lnki_regy');
// add ('en.wikipedia.org' , 'wiki.image');
add ('en.wikipedia.org' , 'file.orig_regy');
add ('en.wikipedia.org' , 'file.xfer_temp.thumb');
add ('en.wikipedia.org' , 'file.xfer_regy');
add ('en.wikipedia.org' , 'file.xfer_regy_update');
*/
/*
add ('en.wikipedia.org' , 'file.fsdb_make') {
commit_interval = 1000; progress_interval = 200; select_interval = 10000;
ns_ids = '0|4|14|100';
ns_ids = '0|4|100|14|8';
// // specify whether original wiki databases are v1 (.sqlite3) or v2 (.xowa)
// // src_bin_mgr__fsdb_version = 'v2';
// src_bin_mgr__fsdb_version = 'v2';
// trg_bin_mgr__fsdb_version = 'v1';
@ -603,6 +614,7 @@ app.bldr.cmds {
}
add ('en.wikipedia.org' , 'file.orig_reg');
add ('en.wikipedia.org' , 'wiki.page_dump.drop');
add ('en.wikipedia.org' , 'file.page_file_map.create');
*/
}
app.bldr.run;
@ -614,6 +626,9 @@ app.bldr.run;
<li>
2016-10-12: explicitly set web_access_enabled to y
</li>
<li>
2017-02-02: updated script for multi-threaded version and new options
</li>
</ul>
</div>

@ -181,7 +181,7 @@
<a href="http://xowa.org/home/wiki/Dev/Command-line.html" id="xolnki_24" title="Dev/Command-line" class="xowa-visited">Overview</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_25" title="Dev/Command-line/Dumps">Image dumps</a>
<a href="http://xowa.org/home/wiki/Dev/Command-line/Dumps.html" id="xolnki_25" title="Dev/Command-line/Dumps" class="xowa-visited">Image dumps</a>
</li>
<li>
<a href="http://xowa.org/home/wiki/Dev/Command-line/Wikidata.html" id="xolnki_26" title="Dev/Command-line/Wikidata">Wikidata</a>

@ -278,17 +278,17 @@
</div>
<div id='xowa.files.cache.info__undo' data-xocfg-type='memo' class='xocfg_itm_delete xocfg_itm_hide'>
<span class='xoimg_btn_x16 xoimg_list_undo' onclick='return xo.cfg_edit.delete__send("xowa.files.cache.info")' title="reset to &quot;cache folder: C:\xowa\file
space used: 0.000 B
file count: 0
oldest file:
space used: 69.812 KB
file count: 1
oldest file: 2017-02-02 11:03:45
&quot;">&nbsp;</span>
</div>
<div class='xocfg_itm_data'>
<textarea id="xowa.files.cache.info" data-xocfg-key="xowa.files.cache.info" data-xocfg-type="memo" accesskey="d" class="xocfg_data__memo xocfg_data__readonly" readonly="true">
cache folder: C:\xowa\file
space used: 0.000 B
file count: 0
oldest file:
space used: 69.812 KB
file count: 1
oldest file: 2017-02-02 11:03:45
</textarea>
</div><input type='hidden' id='xowa.files.cache.info__key_box' value='xowa.files.cache.info'> <input type='hidden' id='xowa.files.cache.info__ctx_box' value='app'>
</div>

@ -264,7 +264,7 @@ li.active a, li.active a:hover
</div>
</div>
<p>
gplx.Gfo_invk__noop@7c584ba
gplx.Gfo_invk__noop@70b02e04
</p>
</div>
</div>

@ -312,8 +312,7 @@
<div class='xocfg_itm_data'>
<textarea id="xowa.app.startup.tabs.previous_list" data-xocfg-key="xowa.app.startup.tabs.previous_list" data-xocfg-type="memo" accesskey="d" class="xocfg_data__memo xocfg_data__readonly" readonly="true">
home/wiki/Special:XowaCfg
home/wiki/App/Wiki_types/Wikia.com
home/wiki/Dev/File/Setup/Windows
home/wiki/Dev/Command-line/Dumps
</textarea>
</div><input type='hidden' id='xowa.app.startup.tabs.previous_list__key_box' value='xowa.app.startup.tabs.previous_list'> <input type='hidden' id='xowa.app.startup.tabs.previous_list__ctx_box' value='app'>
</div>

@ -278,17 +278,17 @@
</div>
<div id='xowa.files.cache.info__undo' data-xocfg-type='memo' class='xocfg_itm_delete xocfg_itm_hide'>
<span class='xoimg_btn_x16 xoimg_list_undo' onclick='return xo.cfg_edit.delete__send("xowa.files.cache.info")' title="reset to &quot;cache folder: C:\xowa\file
space used: 0.000 B
file count: 0
oldest file:
space used: 69.812 KB
file count: 1
oldest file: 2017-02-02 11:03:45
&quot;">&nbsp;</span>
</div>
<div class='xocfg_itm_data'>
<textarea id="xowa.files.cache.info" data-xocfg-key="xowa.files.cache.info" data-xocfg-type="memo" accesskey="d" class="xocfg_data__memo xocfg_data__readonly" readonly="true">
cache folder: C:\xowa\file
space used: 0.000 B
file count: 0
oldest file:
space used: 69.812 KB
file count: 1
oldest file: 2017-02-02 11:03:45
</textarea>
</div><input type='hidden' id='xowa.files.cache.info__key_box' value='xowa.files.cache.info'> <input type='hidden' id='xowa.files.cache.info__ctx_box' value='app'>
</div>

@ -276,7 +276,7 @@ li.active a, li.active a:hover
</div>
</div>
<p>
gplx.Gfo_invk__noop@7c584ba
gplx.Gfo_invk__noop@70b02e04
</p>
</div>
</div>

@ -126,11 +126,11 @@
<div id='xowa.wiki.hdumps.read_preferred__name' class='xocfg_itm_name'>
Prefer HTML Databases for Read tab
</div>
<div id='xowa.wiki.hdumps.read_preferred__undo' data-xocfg-type='bool' class='xocfg_itm_delete'>
<div id='xowa.wiki.hdumps.read_preferred__undo' data-xocfg-type='bool' class='xocfg_itm_delete xocfg_itm_hide'>
<span class='xoimg_btn_x16 xoimg_list_undo' onclick='return xo.cfg_edit.delete__send("xowa.wiki.hdumps.read_preferred")' title="reset to &quot;y&quot;">&nbsp;</span>
</div>
<div class='xocfg_itm_data'>
<input id="xowa.wiki.hdumps.read_preferred" data-xocfg-key="xowa.wiki.hdumps.read_preferred" data-xocfg-type="bool" accesskey="d" class="xocfg_data__bool" type="checkbox">
<input id="xowa.wiki.hdumps.read_preferred" data-xocfg-key="xowa.wiki.hdumps.read_preferred" data-xocfg-type="bool" accesskey="d" class="xocfg_data__bool" type="checkbox" checked="checked">
</div><input type='hidden' id='xowa.wiki.hdumps.read_preferred__key_box' value='xowa.wiki.hdumps.read_preferred'> <input type='hidden' id='xowa.wiki.hdumps.read_preferred__ctx_box' value='app'>
</div>
<div id='xowa.wiki.hdumps.read_preferred_help_div' class='xohelp_div'>

@ -276,7 +276,7 @@ li.active a, li.active a:hover
</div>
</div>
<p>
gplx.Gfo_invk__noop@7c584ba
gplx.Gfo_invk__noop@70b02e04
</p>
</div>
</div>