A <a href="https://en.wikipedia.org/wiki/PageRank" rel="nofollow" class="external text">PageRank</a> score is calculated for a page. This score is <a href="https://en.wikipedia.org/wiki/Feature_scaling" rel="nofollow" class="external text">scaled</a> from 0 to 1,000,000.
</li>
<li>
A page-length score is then calculated for the page. This score is <a href="https://en.wikipedia.org/wiki/Percentile_rank" rel="nofollow" class="external text">ranked</a> from 0 to 1,000,000.
</li>
<li>
If the page has a low page-length score, its PageRank score is then multiplied by a ratio.
</li>
<li>
The resulting PageRank score is then ranked from 0 to 1,000,000.
<p>
A simplified definition of scaling is converting a number from one range to another range based on proportion. For a more thorough definition, see <a href="https://en.wikipedia.org/wiki/Feature_scaling" rel="nofollow" class="external text">the Wikipedia page on feature scaling</a>.
</p>
<p>
For example, let's say you have a score of 100 in a range of 0 to 400 and want to scale it to 0 to 1000. The following steps would be involved:
</p>
<ul>
<li>
Take 100 and divide it by 400. This yields 0.25.
</li>
<li>
Take 0.25 and multiply it by 1000. This yields 250.
</li>
</ul>
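<p>
The two steps above can be sketched in a few lines of Python. This is only an illustration of proportional scaling, not XOWA's own code; the function name <code>scale</code> and its parameters are assumptions made for this example.
</p>

```python
def scale(value, old_max, new_max, old_min=0.0, new_min=0.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("the source range must not be empty")
    # Step 1: express the value as a fraction of the old range.
    fraction = (value - old_min) / (old_max - old_min)
    # Step 2: apply that fraction to the new range.
    return new_min + fraction * (new_max - new_min)


# The example from the text: 100 in a range of 0 to 400, scaled to 0 to 1000.
print(scale(100, 400, 1000))  # prints 250.0
```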
<p>
A simplified definition of ranking is assigning a number based on its order in a population of numbers. For those familiar with a school setting, this is "grading on a curve". For a more thorough definition, see <a href="https://en.wikipedia.org/wiki/Percentile_rank" rel="nofollow" class="external text">the Wikipedia page on percentile ranks</a>.
</p>
<p>
For example, let's say you have the following:
</p>
<ul>
<li>
A minimum score of 0
</li>
<li>
A maximum score of 100
</li>
<li>
5 entities with the following scores
<ul>
<li>
A : 99
</li>
<li>
B : 10
</li>
<li>
C : 42
</li>
<li>
D : 71
</li>
<li>
E : 56
</li>
</ul>
</li>
</ul>
<p>
Ranking would do the following:
</p>
<ul>
<li>
Sort the scores
<ul>
<li>
A : 99
</li>
<li>
D : 71
</li>
<li>
E : 56
</li>
<li>
C : 42
</li>
<li>
B : 10
</li>
</ul>
</li>
<li>
Calculate the "interval" for each score by taking the maximum and dividing by the number of scores.
<ul>
<li>
In this case, the interval is 20 (100 / 5).
</li>
</ul>
</li>
<li>
Assign each score a new score based on its proportionate position in the sort.
</li>
</ul>
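<p>
The ranking steps above can be sketched as follows. This is a minimal illustration rather than XOWA's implementation. The function name <code>rank_scores</code> is hypothetical, and the sketch assumes that the highest raw score receives the maximum and each following position drops by one interval; other conventions are possible.
</p>

```python
def rank_scores(scores, max_score=100):
    """Replace each raw score with one based on its position in the sort."""
    if not scores:
        return {}
    # Step 1: sort the entities by raw score, highest first.
    ordered = sorted(scores, key=scores.get, reverse=True)
    # Step 2: the interval is the maximum divided by the number of scores.
    interval = max_score / len(ordered)
    # Step 3 (assumed convention): the top position gets max_score,
    # and each later position is one interval lower.
    return {name: max_score - position * interval
            for position, name in enumerate(ordered)}


scores = {"A": 99, "B": 10, "C": 42, "D": 71, "E": 56}
ranked = rank_scores(scores)
for name, new_score in ranked.items():
    print(name, new_score)
```

<p>
Under that assumption, the raw scores 99, 71, 56, 42 and 10 become 100, 80, 60, 40 and 20. Only the order of the raw scores matters, not the gaps between them.
</p>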
<p>
The basis of XOWA's page score is <a href="https://en.wikipedia.org/wiki/PageRank" rel="nofollow" class="external text">PageRank</a>.
</p>
<p>
In brief, PageRank will give high scores to pages which are:
</p>
<ol>
<li>
linked to by many pages
</li>
<li>
linked to by pages which have a high score.
</li>
</ol>
<p>
Note that #2 is recursive (a linking page will itself have a high score only if it is linked to by many pages or by other high-scoring pages). For more info, a good starting point is <a href="https://en.wikipedia.org/wiki/PageRank" rel="nofollow" class="external text">the Wikipedia page on PageRank</a>.
</p>
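<p>
To make the recursion concrete, here is a small sketch of the textbook iterative form of PageRank. It is not XOWA's implementation; the function name <code>pagerank</code>, the damping factor of 0.85, the iteration count and the tiny link graph are all assumptions chosen for illustration.
</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook iterative PageRank over a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical graph: three pages link to "Hub", and "Hub" links only to "A".
links = {"A": ["Hub"], "B": ["Hub"], "C": ["Hub"], "Hub": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

<p>
In this graph "Hub" scores highest because many pages link to it, and "A" scores well above "B" and "C" because its single incoming link comes from the high-scoring "Hub".
</p>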
<p>
After XOWA calculates the PageRank, XOWA then scales this score in a range of 0 to 1,000,000.
</p>
<p>
XOWA penalizes short pages. This reduces the effect of small stub pages which are linked to by many articles, but mostly from boilerplate navigation boxes.
</p>
<p>
XOWA ranks pages based on page-length. The generated score is in a range from 0 to 1,000,000.
</p>
<p>
Currently the method is:
</p>
<ul>
<li>
If the page is in the bottom 60% of page lengths...
</li>
<li>
Then multiply the page score by the page's page-length percentage.
</li>
</ul>
<p>
For example, a page whose length is at the 10% mark and which has a score of 1,000 will end up with a score of 100 (1,000 * 10%). In contrast, a page whose length is at the 65% mark with a score of 9,000 is above the 60% cutoff and will still have a score of 9,000.
</p>
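<p>
The penalty described above can be sketched as a single function. This is an illustration only: the name <code>apply_length_penalty</code> is hypothetical, the page-length position is expressed as a percentage from 0 to 100, and treating a page at exactly 60% as not penalized is an assumption, since the text does not say which side of the cutoff it falls on.
</p>

```python
def apply_length_penalty(page_score, length_percent, cutoff_percent=60):
    """Reduce the score of pages whose length falls below the cutoff."""
    if length_percent < cutoff_percent:
        # Short page: multiply the score by its page-length percentage.
        return page_score * length_percent / 100
    # Page is long enough: the score is left unchanged.
    return page_score


print(apply_length_penalty(1000, 10))  # prints 100.0
print(apply_length_penalty(9000, 65))  # prints 9000
```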
<p>
Note that this calculation is an ad-hoc creation and will probably change in the future.
</p>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->