<!DOCTYPE html>
<html dir="ltr">
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
<title>App/Full-text search/Lucene/Search indexes/Building - XOWA</title>
<link rel="shortcut icon" href="https://gnosygnu.github.io/xowa/xowa_logo.png" />
<link rel="stylesheet" href="https://gnosygnu.github.io/xowa/xowa_common.css" type="text/css">
</head>
<body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
<div id="mw-page-base" class="noprint"></div>
<div id="mw-head-base" class="noprint"></div>
<div id="content" class="mw-body">
<h1 id="firstHeading" class="firstHeading"><span>App/Full-text search/Lucene/Search indexes/Building</span></h1>
<div id="bodyContent" class="mw-body-content">
<div id="siteSub">From XOWA: the free, open-source, offline wiki application</div>
<div id="contentSub"></div>
<div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
<p>
XOWA can generate full-text search indexes for existing Download Central wikis.
</p>
<h2>
<span class="mw-headline" id="Purpose">Purpose</span>
</h2>
<p>
There are two reasons to build your own search index:
</p>
<ul>
<li>
<b>Old Download Central wiki</b>: Wikis built before 2017-04 do not have search indexes. Rather than waiting for a new version of the wiki or downloading it again, you can build an index for the existing wiki.
</li>
<li>
<b>Building a custom index</b>: Download Central imposes two restrictions to keep disk usage low for search indexes. You may want to build a custom index to work around them:
<ul>
<li>
<b>Main namespace only</b>: The Project, Portal, Category and other namespaces are not indexed. For example, some Wikisources have an Author and an Index namespace.
</li>
<li>
<b>Proximity queries are not supported</b>: Lucene supports proximity queries such as <code>"word1 word2"~8</code>, which finds pages where word1 and word2 are within 8 words of each other. This support can be added, but it uses significantly more space. For English Wikipedia, the index size can grow from 9 GB to 40 GB.
</li>
</ul>
</li>
</ul>
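<p>
To illustrate why proximity queries need word positions in the index, here is a minimal sketch in Python. It is not XOWA's or Lucene's code: the function names are hypothetical, and the distance check is a simplification of how Lucene evaluates <code>"word1 word2"~8</code>.
</p>
<pre>
```python
import re


def word_positions(text):
    """Map each lowercased word to the list of positions where it occurs."""
    positions = {}
    for index, word in enumerate(re.findall(r"\w+", text.lower())):
        positions.setdefault(word, []).append(index)
    return positions


def within_distance(text, word1, word2, max_distance):
    """Return True if word1 and word2 occur at most max_distance words apart.

    Simplified stand-in for a proximity query; hypothetical helper.
    """
    positions = word_positions(text)
    first = positions.get(word1.lower(), [])
    second = positions.get(word2.lower(), [])
    return any(max_distance >= abs(a - b) for a in first for b in second)


text = "The earth orbits the sun once every year"
print(within_distance(text, "earth", "sun", 8))   # True: 3 words apart
print(within_distance(text, "earth", "year", 3))  # False: 6 words apart
```
</pre>
<p>
An index that records only which pages contain a word cannot answer this kind of question, which is why storing positions increases the index size.
</p>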
<h2>
<span class="mw-headline" id="Requirements">Requirements</span>
</h2>
<p>
Indexes can only be built for wikis downloaded from Download Central.
</p>
<p>
If your wiki isn't yet on Download Central, please contact me through <a href="/wiki/Help/Feedback" id="xolnki_2" title="Help/Feedback">Help/Feedback</a> and I'll add your wiki to the queue.
</p>
<h2>
<span class="mw-headline" id="Process">Process</span>
</h2>
<ul>
<li>
Go to <a href="/wiki/Special:XowaSearchBuilder" id="xolnki_3" title="Special:XowaSearchBuilder">Special:XowaSearchBuilder</a>
</li>
<li>
Choose the domain for your wiki. For example, <code>en.wiktionary.org</code>
</li>
<li>
Choose namespaces for your wiki. For example, <code>0,4,14</code>. For more info, see <a href="https://en.wikipedia.org/wiki/Wikipedia:Namespace" rel="nofollow" class="external free">https://en.wikipedia.org/wiki/Wikipedia:Namespace</a>
</li>
<li>
Choose index options.
<ul>
<li>
<b>Documents</b>: This option uses the least space. However, its results are ranked less accurately than with "Documents / Frequencies".
</li>
<li>
<b>Documents / Frequencies</b>: This is the default index option used for all XOWA wikis. It is slightly more accurate, as it tracks how many times each word occurs on a page. For example, if you're searching for "earth" and Page1 has "earth" 1 time, Page2 has "earth" 10 times, and Page3 has "earth" 20 times, then "Documents / Frequencies" returns the pages in the following order: Page3, Page2, Page1. "Documents" would list them in no particular order.
</li>
<li>
<b>Documents / Frequencies / Positions</b>: This index option allows proximity queries such as <code>"word1 word2"~8</code>. However, it can use 4 to 5 times as much space.
</li>
<li>
<b>Documents / Frequencies / Positions / Offsets</b>: This index option is primarily used for Lucene highlighting. XOWA uses its own highlighter in order to save space. At the moment, there's no reason to choose this option, but it may be useful to some power Lucene users.
</li>
</ul>
</li>
</ul>
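<p>
The "earth" example for "Documents / Frequencies" can be sketched in Python as follows. This is only an illustration with hypothetical function names: it orders pages by raw word count, whereas Lucene's actual scoring takes more than raw counts into account.
</p>
<pre>
```python
import re
from collections import Counter


def term_count(text, term):
    """Count how many times term occurs in text, ignoring case."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)[term.lower()]


def rank_by_frequency(pages, term):
    """Return titles of pages containing term, most occurrences first."""
    counts = {title: term_count(text, term) for title, text in pages.items()}
    matching = [title for title in counts if counts[title] > 0]
    return sorted(matching, key=lambda title: counts[title], reverse=True)


# Page1 has "earth" 1 time, Page2 has it 10 times, Page3 has it 20 times.
pages = {
    "Page1": "earth",
    "Page2": "earth " * 10,
    "Page3": "earth " * 20,
}
print(rank_by_frequency(pages, "earth"))  # ['Page3', 'Page2', 'Page1']
```
</pre>
<p>
With the "Documents" option, the per-page counts are not stored, so there is nothing comparable to sort on.
</p>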
</div>
</div>
</div>
<div id="mw-head" class="noprint">
<div id="left-navigation">
<div id="p-namespaces" class="vectorTabs">
<h3>Namespaces</h3>
<ul>
<li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
</ul>
</div>
</div>
</div>
<div id='mw-panel' class='noprint'>
<div id='p-logo'>
<a style="background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href="http://xowa.org/" title="Visit the main page"></a>
</div>
<div class="portal" id='xowa-portal-home'>
<h3>XOWA</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/index.html" title='Visit the main page'>Main page</a></li>
<li><a href="http://xowa.org/screenshots.html" title='See screenshots of XOWA'>Screenshots</a></li>
<li><a href="https://www.youtube.com/watch?v=q0qbXYXEH6M" title="See a video of XOWA Desktop in action">Video</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Download_XOWA.html" title='Download the XOWA application'>Download XOWA</a></li>
<li><a href="http://xowa.org/home/wiki/Dashboard/Image_databases.html" title='Download offline wikis and image databases'>Download wikis</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-started'>
<h3>Getting started</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/App/Setup/System_requirements.html" title='Get XOWA&apos;s system requirements'>Requirements</a></li>
<li><a href="http://xowa.org/home/wiki/App/Setup/Installation.html" title='Get instructions for installing XOWA'>Installation</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Other_wikis.html" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-android'>
<h3>Android</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Android/Setup.html" title='Set up XOWA on your Android device'>Setup</a></li>
<li><a href="https://www.youtube.com/watch?v=jsMTBxGweUw" title="See a video of XOWA Android in action">Video</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-help'>
<h3>Help</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Help/About.html" title='Get more information about XOWA'>About</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Contents.html" title='View a list of help topics'>Contents</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Media.html" title='Read what others have written about XOWA'>Media</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Feedback.html" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-blog'>
<h3>Blog</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Blog.html" title='Follow XOWA&apos;s development process'>Current</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-links'>
<h3>Links</h3>
<div class="body">
<ul>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
<li><a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title="Support Wikipedia!">Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Donate.html" title="Support XOWA!">XOWA</a></li>
</ul>
</div>
</div>
</div>
</body>
</html>