<!DOCTYPE html>
<html dir="ltr">
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
<title>App/Search/Score - XOWA</title>
<link rel="shortcut icon" href="" />
<link rel="stylesheet" href="" type="text/css">
<body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
<div id="mw-page-base" class="noprint"></div>
<div id="mw-head-base" class="noprint"></div>
<div id="content" class="mw-body">
<h1 id="firstHeading" class="firstHeading"><span>App/Search/Score</span></h1>
<div id="bodyContent" class="mw-body-content">
<div id="siteSub">From XOWA: the free, open-source, offline wiki application</div>
<div id="contentSub"></div>
<div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
XOWA calculates a score for every page in order to rank search results.
<div id="toc" class="toc">
<div id="toctitle" class="toctitle">
<li class="toclevel-1 tocsection-1">
<a href="#Overview"><span class="tocnumber">1</span> <span class="toctext">Overview</span></a>
<li class="toclevel-1 tocsection-2">
<a href="#Scaling_/_Ranking"><span class="tocnumber">2</span> <span class="toctext">Scaling / Ranking</span></a>
<li class="toclevel-2 tocsection-3">
<a href="#Scaling"><span class="tocnumber">2.1</span> <span class="toctext">Scaling</span></a>
<li class="toclevel-2 tocsection-4">
<a href="#Ranking"><span class="tocnumber">2.2</span> <span class="toctext">Ranking</span></a>
<li class="toclevel-1 tocsection-5">
<a href="#Calculation"><span class="tocnumber">3</span> <span class="toctext">Calculation</span></a>
<li class="toclevel-2 tocsection-6">
<a href="#PageRank"><span class="tocnumber">3.1</span> <span class="toctext">PageRank</span></a>
<li class="toclevel-2 tocsection-7">
<a href="#Short_pages_are_penalized"><span class="tocnumber">3.2</span> <span class="toctext">Short pages are penalized</span></a>
<li class="toclevel-2 tocsection-8">
<a href="#Scores_are_re-scaled"><span class="tocnumber">3.3</span> <span class="toctext">Scores are re-scaled</span></a>
<span class="mw-headline" id="Overview">Overview</span>
At a high level, the score is calculated as follows:
A <a href="" rel="nofollow" class="external text">PageRank</a> score is calculated for the page. This score is <a href="" rel="nofollow" class="external text">scaled</a> to a range of 0 to 1,000,000.
A page-length score is then calculated for the page. This score is <a href="" rel="nofollow" class="external text">ranked</a> from 0 to 1,000,000.
If the page has a low page-length score, its PageRank score is multiplied by a ratio.
The resulting score is then ranked from 0 to 1,000,000.
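Putting these four steps together, the score of a page p can be summarized as below. This is a sketch in our own notation, not taken from XOWA's source: PR(p), the page length ℓ(p), S_1, S_2, and the functions scale and rank are names introduced here for illustration, the ratio is the page-length percentile described under "Short pages are penalized", and treating a page exactly at the 60% cutoff as penalized is an assumption.

```latex
\begin{aligned}
S_1(p) &= \text{scale}\big(\mathrm{PR}(p)\big) \in [0, 10^6] \\
L(p)   &= \text{rank}\big(\ell(p)\big) \in [0, 10^6] \\
S_2(p) &= \begin{cases}
            S_1(p) \cdot \dfrac{L(p)}{10^6} & \text{if } L(p) \le 0.6 \cdot 10^6 \\
            S_1(p) & \text{otherwise}
          \end{cases} \\
\text{score}(p) &= \text{rank}\big(S_2(p)\big) \in \{0, 1, \dots, 10^6\}
\end{aligned}
```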
<span class="mw-headline" id="Scaling_/_Ranking">Scaling / Ranking</span>
XOWA uses "scaling" and "ranking" at various stages to calculate the score.
<span class="mw-headline" id="Scaling">Scaling</span>
A simplified definition of scaling is converting a number from one range to another range based on proportion. For a more thorough definition, see <a href="" rel="nofollow" class="external text">the Wikipedia page on feature scaling</a>.
For example, suppose you have a score of 100 in a range of 0 to 400 and want to scale it to a range of 0 to 1000. The steps are:
Divide 100 by 400. This yields 0.25.
Multiply 0.25 by 1000. This yields 250.
The following formula is the basis for scaling:
<span id='xowa_math_txt_0'>
newScore = \frac{oldScore - \text{min}(oldRange)} {\text{max}(oldRange)-\text{min}(oldRange)} \cdot (\text{max}(newRange)-\text{min}(newRange)) + \text{min}(newRange)
Applied to the example above:
<span id='xowa_math_txt_1'>
250 = \frac{100 - 0} {400-0} \cdot (1000-0) + 0
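The scaling formula translates directly into a small function. This is a minimal Python sketch for illustration, not XOWA's implementation; the function name `scale` and the choice to map a zero-width old range to the bottom of the new range are assumptions.

```python
def scale(old_score: float, old_min: float, old_max: float,
          new_min: float, new_max: float) -> float:
    """Min-max scaling: map old_score from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        # Degenerate old range (assumption): map everything to the new minimum.
        return new_min
    proportion = (old_score - old_min) / (old_max - old_min)
    return proportion * (new_max - new_min) + new_min


# The worked example: 100 in the range 0..400, scaled to 0..1000
print(scale(100, 0, 400, 0, 1000))  # 250.0
```

Since every range XOWA uses starts at 0, the trailing `+ new_min` term is 0 in practice, but keeping it makes the function correct for any target range.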
<span class="mw-headline" id="Ranking">Ranking</span>
A simplified definition of ranking is assigning a number based on its order in a population of numbers. For those familiar with a school setting, this is "grading on a curve". For a more thorough definition, see <a href="" rel="nofollow" class="external text">the Wikipedia page on percentile ranks</a>.
For example, suppose you have the following:
A minimum score of 0
A maximum score of 100
Five entities with the following scores:
A : 99
B : 10
C : 42
D : 71
E : 56
Ranking would do the following:
Sort the scores
A : 99
D : 71
E : 56
C : 42
B : 10
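Calculate the "interval" between ranks by dividing the maximum score by the number of scores.
In this case, the interval is 20: 100 / 5
Assign each entity a new score based on its position in the sorted order: the lowest score (B) gets one interval (20), the next lowest gets two intervals (40), and so on up to the highest (A), which gets five intervals (100).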
A : 100
D : 80
E : 60
C : 40
B : 20
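The ranking steps above can be sketched in Python as follows. The function name `rank` is hypothetical, and the text does not say how ties are handled, so here tied scores simply keep the order that `sorted` gives them.

```python
def rank(scores: dict[str, float], max_score: float) -> dict[str, float]:
    """Rank-based rescoring, following the worked example.

    The k-th lowest entity (1-based) gets k * interval,
    where interval = max_score / number of entities.
    """
    interval = max_score / len(scores)
    # Sort entity names from lowest to highest original score.
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: (position + 1) * interval
            for position, name in enumerate(ordered)}


example = {"A": 99, "B": 10, "C": 42, "D": 71, "E": 56}
print(rank(example, 100))
# {'B': 20.0, 'C': 40.0, 'E': 60.0, 'D': 80.0, 'A': 100.0}
```

Note that the new scores depend only on order: A would still receive 100 if its original score were 60 instead of 99.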
<span class="mw-headline" id="Calculation">Calculation</span>
<span class="mw-headline" id="PageRank">PageRank</span>
The basis of XOWA's page score is <a href="" rel="nofollow" class="external text">PageRank</a>.
In brief, PageRank gives high scores to pages that are:
linked to by many pages
linked to by pages that themselves have a high score.
Note that the second point is recursive: a page's score depends on the scores of the pages that link to it, which in turn depend on the pages that link to them. For more info, a good starting point is <a href="" rel="nofollow" class="external text">the Wikipedia page on PageRank</a>.
After calculating PageRank, XOWA scales the score to a range of 0 to 1,000,000.
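For illustration, here is a textbook PageRank computed by power iteration, followed by scaling into 0 to 1,000,000. This is not XOWA's code: the damping factor of 0.85, the 50 iterations, the even redistribution of rank from pages without outgoing links, the example link graph, and the names `pagerank` and `scale_to_million` are all assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank via power iteration; links maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def scale_to_million(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale all scores into the range 0..1,000,000."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {p: 0.0 if span == 0 else (s - lo) / span * 1_000_000
            for p, s in scores.items()}


# Hypothetical four-page wiki: A, B and C all link to "Hub".
links = {"A": ["Hub"], "B": ["Hub", "A"], "C": ["Hub"], "Hub": ["A"]}
scaled = scale_to_million(pagerank(links))
print(max(scaled, key=scaled.get))  # Hub
```

In this hypothetical graph, `Hub` collects links from every other page, so it gets the highest raw PageRank and is scaled to 1,000,000, while B and C, which no page links to, are scaled to 0.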
<span class="mw-headline" id="Short_pages_are_penalized">Short pages are penalized</span>
XOWA penalizes short pages. This reduces the influence of small stub pages that are linked to by many articles, mostly through boilerplate navigation boxes.
To do this, XOWA ranks every page by page length, producing a page-length score in the range 0 to 1,000,000.
The current method is:
If the page is in the bottom 60% of page lengths...
...then multiply its page score by its page-length percentile.
For example, a page whose length is at the 10th percentile and whose score is 1,000 will end up with a score of 100 (1,000 * 10%). In contrast, a page whose length is at the 65th percentile, above the 60% cutoff, keeps its score of 9,000.
This calculation is ad hoc and will probably change in the future.
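A minimal sketch of the short-page penalty, assuming the page-length score is the 0 to 1,000,000 rank described above, so dividing it by 1,000,000 gives the page's percentile as a fraction. The names `penalize_short_page` and `PENALTY_CUTOFF` are hypothetical, and penalizing a page sitting exactly at the 60% cutoff is an assumption.

```python
PENALTY_CUTOFF = 0.60  # "bottom 60% of page lengths"


def penalize_short_page(page_score: float, length_rank: int) -> float:
    """Scale down the score of a short page.

    length_rank is the page-length score in 0..1,000,000 (assumed to be a
    percentile rank). Pages at or below the cutoff are multiplied by their
    percentile; longer pages keep their score unchanged.
    """
    percentile = length_rank / 1_000_000
    if percentile <= PENALTY_CUTOFF:
        return page_score * percentile
    return page_score


print(penalize_short_page(1000, 100_000))  # 100.0
print(penalize_short_page(9000, 650_000))  # 9000
```

The two calls reproduce the examples in the text: a page at the 10th percentile loses 90% of its score, while a page at the 65th percentile is untouched.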
<span class="mw-headline" id="Scores_are_re-scaled">Scores are re-scaled</span>
The final step is to take the modified score and rank it from 0 to 1,000,000. This final score is an integer (not a decimal or float).
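The final ranking can be sketched like the interval-based ranking example, except that the result must be an integer. This Python sketch uses integer division so every score lands in 0 to 1,000,000 without floating-point error; rounding down is an assumption, since the text only says the final score is an integer, and `final_scores` and the sample page names are hypothetical.

```python
def final_scores(modified: dict[str, float],
                 max_score: int = 1_000_000) -> dict[str, int]:
    """Rank the penalized scores into integers in 0..max_score.

    The k-th lowest page (1-based) gets k * max_score // count,
    i.e. k intervals, rounded down to an integer.
    """
    ordered = sorted(modified, key=lambda page: modified[page])
    count = len(ordered)
    return {page: (position + 1) * max_score // count
            for position, page in enumerate(ordered)}


# Hypothetical penalized scores for three pages
print(final_scores({"Stub": 100.0, "Article": 9000.0, "Portal": 4500.0}))
# {'Stub': 333333, 'Portal': 666666, 'Article': 1000000}
```

Multiplying before dividing keeps the arithmetic in integers, so the top page always receives exactly 1,000,000.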
<div id="mw-head" class="noprint">
<div id="left-navigation">
<div id="p-namespaces" class="vectorTabs">
<li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
<div id='mw-panel' class='noprint'>
<div id='p-logo'>
<a style="background-image: url(;" href="" title="Visit the main page"></a>
<div class="portal" id='xowa-portal-home'>
<div class="body">
<li><a href="" title='Visit the main page'>Main page</a></li>
<li><a href="" title='See screenshots of XOWA'>Screenshots</a></li>
<li><a href="" title="See a video of XOWA Desktop in action">Video</a></li>
<li><a href="" title='Download the XOWA application'>Download XOWA</a></li>
<li><a href="" title='Download offline wikis and image databases'>Download wikis</a></li>
<div class="portal" id='xowa-portal-started'>
<h3>Getting started</h3>
<div class="body">
<li><a href="" title='Get XOWA&apos;s system requirements'>Requirements</a></li>
<li><a href="" title='Get instructions for installing XOWA'>Installation</a></li>
<li><a href="" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
<li><a href="" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
<li><a href="" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
<div class="portal" id='xowa-portal-android'>
<div class="body">
<li><a href="" title='Setup XOWA on your Android device'>Setup</a></li>
<li><a href="" title="See a video of XOWA Android in action">Video</a></li>
<div class="portal" id='xowa-portal-help'>
<div class="body">
<li><a href="" title='Get more information about XOWA'>About</a></li>
<li><a href="" title='View a list of help topics'>Contents</a></li>
<li><a href="" title='Read what others have written about XOWA'>Media</a></li>
<li><a href="" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
<div class="portal" id='xowa-portal-blog'>
<div class="body">
<li><a href="" title='Follow XOWA&apos;s development process'>Current</a></li>
<div class="portal" id='xowa-portal-links'>
<div class="body">
<li><a href="" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="" title="Search for XOWA files">XOWA @</a></li>
<li><a href="" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
<div class="portal" id='xowa-portal-donate'>
<div class="body">
<li><a href="" title="Support!"></a></li><!-- listed first due to recent fire damages: -->
<li><a href="" title="Support Wikipedia!">Wikipedia</a></li>
<li><a href="" title="Support XOWA!">XOWA</a></li>