<!DOCTYPE html>
< html dir = "ltr" >
< head >
< meta http-equiv = "content-type" content = "text/html;charset=UTF-8" / >
< title > App/Search/Score - XOWA< / title >
< link rel = "shortcut icon" href = "https://gnosygnu.github.io/xowa/xowa_logo.png" / >
< link rel = "stylesheet" href = "https://gnosygnu.github.io/xowa/xowa_common.css" type = "text/css" >
< / head >
< body class = "mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck = "false" >
< div id = "mw-page-base" class = "noprint" > < / div >
< div id = "mw-head-base" class = "noprint" > < / div >
< div id = "content" class = "mw-body" >
< h1 id = "firstHeading" class = "firstHeading" > < span > App/Search/Score< / span > < / h1 >
< div id = "bodyContent" class = "mw-body-content" >
< div id = "siteSub" > From XOWA: the free, open-source, offline wiki application< / div >
< div id = "contentSub" > < / div >
< div id = "mw-content-text" lang = "en" dir = "ltr" class = "mw-content-ltr" >
< p >
XOWA calculates a score for every page in order to rank search results.
< / p >
< div id = "toc" class = "toc" >
< div id = "toctitle" class = "toctitle" >
< h2 >
Contents
< / h2 >
< / div >
< ul >
< li class = "toclevel-1 tocsection-1" >
< a href = "#Overview" > < span class = "tocnumber" > 1< / span > < span class = "toctext" > Overview< / span > < / a >
< / li >
< li class = "toclevel-1 tocsection-2" >
< a href = "#Scaling_/_Ranking" > < span class = "tocnumber" > 2< / span > < span class = "toctext" > Scaling / Ranking< / span > < / a >
< ul >
< li class = "toclevel-2 tocsection-3" >
< a href = "#Scaling" > < span class = "tocnumber" > 2.1< / span > < span class = "toctext" > Scaling< / span > < / a >
< / li >
< li class = "toclevel-2 tocsection-4" >
< a href = "#Ranking" > < span class = "tocnumber" > 2.2< / span > < span class = "toctext" > Ranking< / span > < / a >
< / li >
< / ul >
< / li >
< li class = "toclevel-1 tocsection-5" >
< a href = "#Calculation" > < span class = "tocnumber" > 3< / span > < span class = "toctext" > Calculation< / span > < / a >
< ul >
< li class = "toclevel-2 tocsection-6" >
< a href = "#PageRank" > < span class = "tocnumber" > 3.1< / span > < span class = "toctext" > PageRank< / span > < / a >
< / li >
< li class = "toclevel-2 tocsection-7" >
< a href = "#Short_pages_are_penalized" > < span class = "tocnumber" > 3.2< / span > < span class = "toctext" > Short pages are penalized< / span > < / a >
< / li >
< li class = "toclevel-2 tocsection-8" >
< a href = "#Scores_are_re-scaled" > < span class = "tocnumber" > 3.3< / span > < span class = "toctext" > Scores are re-scaled< / span > < / a >
< / li >
< / ul >
< / li >
< / ul >
< / div >
< h2 >
< span class = "mw-headline" id = "Overview" > Overview< / span >
< / h2 >
< p >
From a broad perspective, the following happens:
< / p >
< ul >
< li >
A < a href = "https://en.wikipedia.org/wiki/PageRank" rel = "nofollow" class = "external text" > PageRank< / a > score is calculated for a page. This score is < a href = "https://en.wikipedia.org/wiki/Feature_scaling" rel = "nofollow" class = "external text" > scaled< / a > from 0 to 1,000,000.
< / li >
< li >
A page-length score is then calculated for the page. This score is < a href = "https://en.wikipedia.org/wiki/Percentile_rank" rel = "nofollow" class = "external text" > ranked< / a > from 0 to 1,000,000.
< / li >
< li >
If the page has a low page-length score, its PageRank score is then multiplied by a ratio.
< / li >
< li >
The resulting PageRank score is then ranked from 0 to 1,000,000.
< / li >
< / ul >
< p >
< br >
< / p >
< h2 >
< span class = "mw-headline" id = "Scaling_/_Ranking" > Scaling / Ranking< / span >
< / h2 >
< p >
XOWA uses "scaling" and "ranking" at various stages to calculate the score.
< / p >
< h3 >
< span class = "mw-headline" id = "Scaling" > Scaling< / span >
< / h3 >
< p >
A simplified definition of scaling is converting a number from one range to another range in proportion. For a more thorough definition, see < a href = "https://en.wikipedia.org/wiki/Feature_scaling" rel = "nofollow" class = "external text" > the Wikipedia page on feature scaling< / a > .
< / p >
< p >
For example, suppose you have a score of 100 in a range of 0 to 400 and want to scale it to a range of 0 to 1,000. The steps are:
< / p >
< ul >
< li >
Divide 100 by 400. This yields 0.25.
< / li >
< li >
Multiply 0.25 by 1,000. This yields 250.
< / li >
< / ul >
< p >
The following formula is the basis for scaling:
< / p >
< p >
< span id = 'xowa_math_txt_0' >
newScore = \frac{oldScore - \text{min}(oldRange)} {\text{max}(oldRange)-\text{min}(oldRange)} \cdot (\text{max}(newRange)-\text{min}(newRange)) + \text{min}(newRange)
< / span >
< / p >
< p >
Or, to use the example from above:
< / p >
< p >
< span id = 'xowa_math_txt_1' >
250 = \frac{100 - 0} {400-0} \cdot (1000-0) + 0
< / span >
< / p >
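< p >
The scaling steps above can be sketched in Python as follows. This is a minimal illustration, not XOWA's own code; the function name scale and its parameter names are hypothetical. It adds new_min back at the end, which makes no difference when the new range starts at 0, as every range on this page does.
< / p >

```python
def scale(old_score, old_min, old_max, new_min, new_max):
    """Min-max scaling: map old_score from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old range must not be empty")
    # Fraction of the way through the old range (100 / 400 = 0.25 in the example).
    proportion = (old_score - old_min) / (old_max - old_min)
    # Stretch that fraction over the new range, then shift to the new minimum.
    return proportion * (new_max - new_min) + new_min


print(scale(100, 0, 400, 0, 1000))  # 250.0
```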
< h3 >
< span class = "mw-headline" id = "Ranking" > Ranking< / span >
< / h3 >
< p >
A simplified definition of ranking is assigning each number a new value based on its position within a population of numbers. In a school setting, this is similar to "grading on a curve". For a more thorough definition, see < a href = "https://en.wikipedia.org/wiki/Percentile_rank" rel = "nofollow" class = "external text" > the Wikipedia page on percentile ranks< / a > .
< / p >
< p >
For example, let's say you have the following:
< / p >
< ul >
< li >
A minimum score of 0
< / li >
< li >
A maximum score of 100
< / li >
< li >
Five entities with the following scores:
< ul >
< li >
A : 99
< / li >
< li >
B : 10
< / li >
< li >
C : 42
< / li >
< li >
D : 71
< / li >
< li >
E : 56
< / li >
< / ul >
< / li >
< / ul >
< p >
Ranking would do the following:
< / p >
< ul >
< li >
Sort the scores from highest to lowest:
< ul >
< li >
A : 99
< / li >
< li >
D : 71
< / li >
< li >
E : 56
< / li >
< li >
C : 42
< / li >
< li >
B : 10
< / li >
< / ul >
< / li >
< li >
Calculate the "interval" between new scores by dividing the maximum score by the number of scores.
< ul >
< li >
In this case, the interval is 20 (100 / 5).
< / li >
< / ul >
< / li >
< li >
Assign each entity a new score based on its position in the sorted list: the lowest score gets one interval, and each higher position gets one more interval.
< ul >
< li >
A : 100
< / li >
< li >
D : 80
< / li >
< li >
E : 60
< / li >
< li >
C : 40
< / li >
< li >
B : 20
< / li >
< / ul >
< / li >
< / ul >
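< p >
The ranking steps above can be sketched as follows, assuming the scores are held in a Python dict. The function name rank_by_interval is hypothetical, and this page does not say how ties are handled; the sketch breaks ties by entity name so the result is deterministic.
< / p >

```python
def rank_by_interval(scores, max_score):
    """Rank entities by score, spacing new scores evenly up to max_score.

    scores: dict mapping entity name -> raw score.
    The lowest entity receives one interval; the highest receives max_score.
    """
    # Sort ascending by score; ties fall back to the entity name (an assumption).
    ordered = sorted(scores, key=lambda name: (scores[name], name))
    interval = max_score / len(ordered)  # 100 / 5 = 20 in the example
    return {name: (position + 1) * interval for position, name in enumerate(ordered)}


scores = {"A": 99, "B": 10, "C": 42, "D": 71, "E": 56}
print(rank_by_interval(scores, 100))
# {'B': 20.0, 'C': 40.0, 'E': 60.0, 'D': 80.0, 'A': 100.0}
```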
< p >
< br >
< / p >
< h2 >
< span class = "mw-headline" id = "Calculation" > Calculation< / span >
< / h2 >
< h3 >
< span class = "mw-headline" id = "PageRank" > PageRank< / span >
< / h3 >
< p >
The basis of XOWA's page score is < a href = "https://en.wikipedia.org/wiki/PageRank" rel = "nofollow" class = "external text" > PageRank< / a > .
< / p >
< p >
In brief, PageRank gives high scores to pages that are:
< / p >
< ol >
< li >
linked to by many pages
< / li >
< li >
linked to by pages that themselves have high scores
< / li >
< / ol >
< p >
Note that #2 is recursive: a page's score depends on the scores of the pages that link to it, and their scores in turn depend on the pages that link to them. For more info, a good starting point is < a href = "https://en.wikipedia.org/wiki/PageRank" rel = "nofollow" class = "external text" > the Wikipedia page on PageRank< / a > .
< / p >
< p >
After calculating the PageRank, XOWA scales the score to a range of 0 to 1,000,000.
< / p >
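< p >
A textbook power-iteration PageRank, followed by scaling to 0 to 1,000,000, might look like the sketch below. This page does not say which PageRank variant or parameters XOWA uses, so the damping factor of 0.85, the fixed 50 iterations, the even redistribution of rank from pages with no outgoing links, and the use of the observed minimum and maximum as the old range are all assumptions. The function names and the link graph are hypothetical.
< / p >

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (assumed parameters, not XOWA's).

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping + damping * dangling) / n for page in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def scale_to_million(ranks):
    """Min-max scale raw PageRank values onto 0..1,000,000 (observed min/max assumed)."""
    low, high = min(ranks.values()), max(ranks.values())
    if high == low:
        return {page: 0.0 for page in ranks}  # arbitrary choice for a flat graph
    return {page: (r - low) / (high - low) * 1_000_000 for page, r in ranks.items()}


# Hypothetical link graph: three pages link to "Hub"; nothing links to "Stub".
links = {"Stub": ["Hub"], "A": ["Hub", "B"], "B": ["Hub"], "Hub": ["A"]}
scaled = scale_to_million(pagerank(links))
print(max(scaled, key=scaled.get))  # Hub
```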
< h3 >
< span class = "mw-headline" id = "Short_pages_are_penalized" > Short pages are penalized< / span >
< / h3 >
< p >
XOWA penalizes short pages. This reduces the effect of small stub pages that are linked to by many articles, mostly through boilerplate navigation boxes.
< / p >
< p >
XOWA ranks pages by page length. The resulting page-length score ranges from 0 to 1,000,000.
< / p >
< p >
Currently the method is:
< / p >
< ul >
< li >
If the page's length is in the bottom 60% of all page lengths...
< / li >
< li >
...then multiply the page score by its length percentile.
< / li >
< / ul >
< p >
For example, a page whose length is at the 10th percentile and whose score is 1,000 will end up with a score of 100 (1,000 * 10%). In contrast, a page whose length is at the 65th percentile (outside the bottom 60%) and whose score is 9,000 keeps its score of 9,000.
< / p >
< p >
Note that this calculation is ad hoc and will probably change in the future.
< / p >
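< p >
The current penalty can be sketched as below, assuming the page-length score is the 0 to 1,000,000 rank described above, so that the bottom 60% corresponds to a rank below 600,000. Whether a page exactly at the cutoff is penalized is not stated; this sketch does not penalize it. The names SHORT_PAGE_CUTOFF and penalize_short_page are hypothetical.
< / p >

```python
# Hypothetical constant: "bottom 60%" expressed on the 0..1,000,000 length-rank scale.
SHORT_PAGE_CUTOFF = 600_000


def penalize_short_page(page_score, length_rank):
    """Reduce page_score for pages whose length rank falls in the bottom 60%.

    length_rank: the page-length score, ranked from 0 to 1,000,000.
    A page at the 10th percentile (length_rank 100,000) keeps 10% of its score.
    """
    if length_rank < SHORT_PAGE_CUTOFF:
        return page_score * length_rank / 1_000_000
    return page_score


print(penalize_short_page(1000, 100_000))  # 100.0
print(penalize_short_page(9000, 650_000))  # 9000
```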
< h3 >
< span class = "mw-headline" id = "Scores_are_re-scaled" > Scores are re-scaled< / span >
< / h3 >
< p >
The final step takes the modified score and ranks it from 0 to 1,000,000. This final score is an integer, not a decimal or floating-point number.
< / p >
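< p >
The final ranking can be sketched by applying the interval method from the Ranking section with integer division, so every result is an integer. The exact rounding XOWA uses is not described on this page; the rank_to_int name, the tie-breaking by page name, and the sample pages are hypothetical.
< / p >

```python
def rank_to_int(scores, max_rank=1_000_000):
    """Rank page scores into integers up to max_rank (interval method, integer math)."""
    # Lowest score first; ties are broken by page name (an assumption).
    ordered = sorted(scores, key=lambda page: (scores[page], page))
    count = len(ordered)
    # Integer arithmetic keeps every result an int; the top page always gets max_rank.
    return {page: (position + 1) * max_rank // count for position, page in enumerate(ordered)}


print(rank_to_int({"Earth": 950.5, "Moon": 412.0, "Stub": 3.2}))
# {'Stub': 333333, 'Moon': 666666, 'Earth': 1000000}
```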
< / div >
< / div >
< / div >
< div id = "mw-head" class = "noprint" >
< div id = "left-navigation" >
< div id = "p-namespaces" class = "vectorTabs" >
< h3 > Namespaces< / h3 >
< ul >
< li id = "ca-nstab-main" class = "selected" > < span > < a id = "ca-nstab-main-href" href = "index.html" > Page< / a > < / span > < / li >
< / ul >
< / div >
< / div >
< / div >
< div id = 'mw-panel' class = 'noprint' >
< div id = 'p-logo' >
< a style = "background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href = "http://xowa.org/" title = "Visit the main page" > < / a >
< / div >
< div class = "portal" id = 'xowa-portal-home' >
< h3 > XOWA< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/index.html" title = 'Visit the main page' > Main page< / a > < / li >
< li > < a href = "http://xowa.org/screenshots.html" title = 'See screenshots of XOWA' > Screenshots< / a > < / li >
< li > < a href = "https://www.youtube.com/watch?v=q0qbXYXEH6M" title = "See a video of XOWA Desktop in action" > Video< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Download_XOWA.html" title = 'Download the XOWA application' > Download XOWA< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Dashboard/Image_databases.html" title = 'Download offline wikis and image databases' > Download wikis< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-started' >
< h3 > Getting started< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/App/Setup/System_requirements.html" title = "Get XOWA's system requirements" > Requirements< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Setup/Installation.html" title = 'Get instructions for installing XOWA' > Installation< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title = 'Learn how to set up Simple Wikipedia' > Simple Wikipedia< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title = 'Learn how to set up English Wikipedia' > English Wikipedia< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/App/Import/Other_wikis.html" title = 'Learn how to set up other Wikipedias' > Other Wikipedias< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-android' >
< h3 > Android< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Android/Setup.html" title = 'Set up XOWA on your Android device' > Setup< / a > < / li >
< li > < a href = "https://www.youtube.com/watch?v=jsMTBxGweUw" title = "See a video of XOWA Android in action" > Video< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-help' >
< h3 > Help< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Help/About.html" title = 'Get more information about XOWA' > About< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Contents.html" title = 'View a list of help topics' > Contents< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Media.html" title = 'Read what others have written about XOWA' > Media< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Feedback.html" title = 'Questions? Comments? Leave feedback for XOWA' > Feedback< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-blog' >
< h3 > Blog< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://xowa.org/home/wiki/Blog.html" title = "Follow XOWA's development process" > Current< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-links' >
< h3 > Links< / h3 >
< div class = "body" >
< ul >
< li > < a href = "http://dumps.wikimedia.org/backup-index.html" title = "Get wiki database dumps directly from Wikimedia" > Wikimedia dumps< / a > < / li >
< li > < a href = "https://archive.org/search.php?query=xowa" title = "Search archive.org for XOWA files" > XOWA @ archive.org< / a > < / li >
< li > < a href = "http://en.wikipedia.org" title = "Visit Wikipedia (and compare to XOWA!)" > English Wikipedia< / a > < / li >
< / ul >
< / div >
< / div >
< div class = "portal" id = 'xowa-portal-donate' >
< h3 > Donate< / h3 >
< div class = "body" >
< ul >
< li > < a href = "https://archive.org/donate/index.php" title = "Support archive.org!" > archive.org< / a > < / li > <!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
< li > < a href = "https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title = "Support Wikipedia!" > Wikipedia< / a > < / li >
< li > < a href = "http://xowa.org/home/wiki/Help/Donate.html" title = "Support XOWA!" > XOWA< / a > < / li >
< / ul >
< / div >
< / div >
< / div >
< / body >
< / html >