<!DOCTYPE html>
< html dir = "ltr" >
< head >
< meta http-equiv = "content-type" content = "text/html;charset=UTF-8" / >
< title > Dev/Command-line/Site meta - XOWA< / title >
< link rel = "shortcut icon" href = "https://gnosygnu.github.io/xowa/xowa_logo.png" / >
< link rel = "stylesheet" href = "https://gnosygnu.github.io/xowa/xowa_common.css" type = "text/css" >
< style data-id = "xowa_html.b394ccc80246a04a003f29c1c3b2c21c" >
.console {font-family: monospace; color: #EEEEEE ; background-color: black ; border: medium solid black;}
.code
,.path
,.url {font-family: monospace; color: black ; background-color: #f9f9f9 ; border: medium solid #f9f9f9;}
.bold {font-weight: 900;}
< / style >
< / head >
< body class = "mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck = "false" >
< div id = "mw-page-base" class = "noprint" > < / div >
< div id = "mw-head-base" class = "noprint" > < / div >
< div id = "content" class = "mw-body" >
< h1 id = "firstHeading" class = "firstHeading" > < span > Dev/Command-line/Site meta< / span > < / h1 >
< div id = "bodyContent" class = "mw-body-content" >
< div id = "siteSub" > From XOWA: the free, open-source, offline wiki application< / div >
< div id = "contentSub" > < / div >
< div id = "mw-content-text" lang = "en" dir = "ltr" class = "mw-content-ltr" >
< div id = "toc" class = "toc" >
< div id = "toctitle" >
< h2 >
Contents
< / h2 >
< / div >
< ul >
< li class = "toclevel-1 tocsection-1" >
< a href = "#Background" > < span class = "tocnumber" > 1< / span > < span class = "toctext" > Background< / span > < / a >
< / li >
< li class = "toclevel-1 tocsection-2" >
< a href = "#Process" > < span class = "tocnumber" > 2< / span > < span class = "toctext" > Process< / span > < / a >
< / li >
< / ul >
< / div >
< p >
XOWA can download site metadata for Wikimedia wikis from the command line.
< / p >
< h2 >
< span class = "mw-headline" id = "Background" > Background< / span >
< / h2 >
< p >
Wikimedia exposes an API for accessing the metadata of a given wiki. For example, for English Wikipedia, the following URL returns most of the metadata about the wiki installation.
< / p >
< pre class = 'code' >
https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general|namespaces|statistics|interwikimap|namespacealiases|specialpagealiases|libraries|extensions|skins|magicwords|functionhooks|showhooks|extensiontags|protocols|defaultoptions|languages
< / pre >
< p >
XOWA can call this API to download the metadata for each wiki and save it in a database for data processing. XOWA currently uses this info to resolve namespaces, and future releases will incorporate other metadata from this API.
< / p >
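< p >
To illustrate how the same query looks for other wikis, here is a minimal Python sketch that assembles the URL shown above for any wiki domain. The helper name < code > siteinfo_url< / code > is hypothetical and is not part of XOWA; the property names are copied from the example URL, and the sketch only builds the URL without calling the API.
< / p >

```python
from urllib.parse import urlencode

# Property names copied from the example URL above.
SIPROPS = [
    "general", "namespaces", "statistics", "interwikimap",
    "namespacealiases", "specialpagealiases", "libraries", "extensions",
    "skins", "magicwords", "functionhooks", "showhooks", "extensiontags",
    "protocols", "defaultoptions", "languages",
]


def siteinfo_url(domain, props=SIPROPS):
    """Build the siteinfo query URL for one wiki domain (hypothetical helper)."""
    query = urlencode(
        {"action": "query", "meta": "siteinfo", "siprop": "|".join(props)},
        safe="|",  # keep the pipe separators readable instead of %7C
    )
    return "https://" + domain + "/w/api.php?" + query


if __name__ == "__main__":
    # Prints the same URL as the English Wikipedia example above.
    print(siteinfo_url("en.wikipedia.org"))
```

< p >
Passing a shorter list, for example < code > siteinfo_url("en.wiktionary.org", ["general", "namespaces"])< / code > , limits the request to the properties of interest.
< / p >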
< h2 >
< span class = "mw-headline" id = "Process" > Process< / span >
< / h2 >
< p >
The following steps assume a Windows system with XOWA installed at < code > C:\xowa< / code > :
< / p >
< ul >
< li >
Create a plain-text file called "C:\xowa\build_site_meta.gfs"
< / li >
< li >
Save the following text to the file:
< / li >
< / ul >
< pre class = 'code' >
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.bldr.cmds {
  // NOTE: the wiki doesn't matter; just use any wiki name that is on your system
  add('simple.wikipedia.org', 'util.site_meta') {
    // path of the database to generate; default is C:\xowa\bin\any\xowa\cfg\wiki\site_meta.sqlite3
    db_url = 'C:\xowa\site_meta__enwiki.sqlite3';
    // skip any wikis which have been downloaded after this time; default is now() - 1 day
    // the purpose of this argument is to avoid calling the API again if it has already been called recently.
    // for example, if the script runs for 800 wikis and fails for 3 wikis,
    // you can rerun the script and it will only download the 3 failed ones, not all 800
    cutoff_time = '2015-07-01';
    // list of wikis to download; note that each wiki must be separated by a new-line. default is all wikis listed in [[Dashboard/Import/Online]]
    wikis =
'en.wikipedia.org
en.wiktionary.org';
  }
}
app.bldr.run;
< / pre >
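< p >
When the list of wikis is long, the script text could be generated instead of typed by hand. The following Python sketch renders a script using only the syntax shown above; the helper name < code > render_site_meta_script< / code > and the template are assumptions for illustration and are not part of XOWA. It joins the wiki names with new-lines, as the < code > wikis< / code > argument requires.
< / p >

```python
# Template mirrors the script above; a raw string keeps Windows backslashes literal.
GFS_TEMPLATE = r"""app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.bldr.cmds {
  add('simple.wikipedia.org', 'util.site_meta') {
    db_url = '%(db_url)s';
    cutoff_time = '%(cutoff_time)s';
    wikis =
'%(wikis)s';
  }
}
app.bldr.run;
"""


def render_site_meta_script(db_url, cutoff_time, wikis):
    """Return the script text for the given database path, cutoff and wiki list."""
    return GFS_TEMPLATE % {
        "db_url": db_url,
        "cutoff_time": cutoff_time,
        "wikis": "\n".join(wikis),  # one wiki per line
    }


if __name__ == "__main__":
    print(render_site_meta_script(
        r"C:\xowa\site_meta__enwiki.sqlite3",
        "2015-07-01",
        ["en.wikipedia.org", "en.wiktionary.org"],
    ))
```

< p >
The printed text can then be saved as the .gfs file named in the first step.
< / p >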
< ul >
< li >
Run the file with the following:
< / li >
< / ul >
< pre class = 'code' >
java -jar xowa_windows.jar --app_mode cmd --cmd_file C:\xowa\build_site_meta.gfs
< / pre >
< ul >
< li >
Open C:\xowa\site_meta__enwiki.sqlite3 in a SQLite shell and run the following:
< / li >
< / ul >
< pre class = 'code' >
SELECT * FROM site_statistic;
< / pre >
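< p >
The same query can also be run from a program instead of a SQLite shell. This is a minimal Python sketch using the standard < code > sqlite3< / code > module; the function name < code > read_site_statistics< / code > is hypothetical, and the sketch makes no assumption about the table's columns, reading their names from the query result.
< / p >

```python
import sqlite3


def read_site_statistics(db_path):
    """Return every row of site_statistic as a dict keyed by column name."""
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row  # rows expose their column names
        cursor = conn.execute("SELECT * FROM site_statistic")
        return [dict(row) for row in cursor]
    finally:
        conn.close()
```

< p >
For the database generated above, the call would be < code > read_site_statistics(r"C:\xowa\site_meta__enwiki.sqlite3")< / code > . Note that < code > sqlite3.connect< / code > creates an empty file if the path does not exist, in which case the query fails because the table is missing.
< / p >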
< / div >
< / div >
< / div >
< div id = "mw-head" class = "noprint" >
< div id = "left-navigation" >
< div id = "p-namespaces" class = "vectorTabs" >
< h3 > Namespaces< / h3 >
< ul >
< li id = "ca-nstab-main" class = "selected" > < span > < a id = "ca-nstab-main-href" href = "index.html" > Page< / a > < / span > < / li >
< / ul >
< / div >
< / div >
< / div >
< div id = 'mw-panel' class = 'noprint' >
< div id = 'p-logo' >
< a style = "background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href = "http://xowa.org/" title = "Visit the main page" > < / a >
< / div >
< div class = "portal" id = 'xowa-portal-home' >
< h3 > XOWA< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/index.html" title = 'Visit the main page' > Main page< / a > < / li >
< li > < a href = "http://xowa.org/screenshots.html" title = 'See screenshots of XOWA' > Screenshots< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Download_XOWA.html" title = 'Download the XOWA application' > Download XOWA< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Dashboard/Image_databases.html" title = 'Download offline wikis and image databases' > Download wikis< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-started' >
< h3 > Getting started< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/App/Setup/System_requirements.html" title = "Get XOWA's system requirements" > Requirements< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Setup/Installation.html" title = 'Get instructions for installing XOWA' > Installation< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title = 'Learn how to set up Simple Wikipedia' > Simple Wikipedia< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title = 'Learn how to set up English Wikipedia' > English Wikipedia< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/Other_wikis.html" title = 'Learn how to set up other Wikipedias' > Other Wikipedias< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-android' >
< h3 > Android< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Android/Setup.html" title = 'Setup XOWA on your Android device' > Setup< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-help' >
< h3 > Help< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Help/About.html" title = 'Get more information about XOWA' > About< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Contents.html" title = 'View a list of help topics' > Contents< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Media.html" title = 'Read what others have written about XOWA' > Media< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Feedback.html" title = 'Questions? Comments? Leave feedback for XOWA' > Feedback< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-blog' >
< h3 > Blog< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Blog.html" title = "Follow XOWA's development process" > Current< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-links' >
< h3 > Links< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://dumps.wikimedia.org/backup-index.html" title = "Get wiki database dumps directly from Wikimedia" > Wikimedia dumps< / a > < / li >
< li > < a href = "https://archive.org/search.php?query=xowa" title = "Search archive.org for XOWA files" > XOWA @ archive.org< / a > < / li >
< li > < a href = "http://en.wikipedia.org" title = "Visit Wikipedia (and compare to XOWA!)" > English Wikipedia< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-donate' >
< h3 > Donate< / h3 >
< div class = "body" >
< ul >
< li > < a href = "https://archive.org/donate/index.php" title = "Support archive.org!" > archive.org< / a > < / li > <!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
< li > < a href = "https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title = "Support Wikipedia!" > Wikipedia< / a > < / li >
<!-- <li><a href="" title="Support XOWA! (but only after you've supported archive.org and Wikipedia)">XOWA</a></li> -->
< / ul >
< / div >
< / div >
< / div >
< / body >
< / html >