App/Xtn/Mediawiki/Tidy/JTidy
From XOWA: the free, open-source, offline wiki application
Source
jtidy_xowa.jar was built from the JTidy source at https://sourceforge.net/projects/jtidy/files/JTidy/r938/.
Its source is not currently included with XOWA, but is available at https://sourceforge.net/projects/xowa/files/support/jtidy/
Modifications
The jtidy_xowa.jar was created for the following reasons:
- JTidy is not completely in sync with tidy:
  - JTidy appears to have been built from an earlier version of tidy. Since then, tidy has received a number of bug fixes that JTidy lacks.
- JTidy's translation of tidy has notable differences:
  - JTidy is a close translation of tidy overall, but deviates from it in a number of places.
jtidy_xowa changes
The following is only a partial list of JTidy changes. Multiple changes were made in XOWA v1.6.2.1 to make JTidy behave more like tidy, and more will likely follow to close the remaining gap between the tidy and JTidy source code.
ParseBlock should honor the exiled flag during element reparenting
- purpose: a <div> between <table> and <tr> is not reparented correctly
- example: fa.wikinews.org/wiki/Main_Page -> invalid table layout
- file: /jtidy-r938/src/main/java/org/w3c/tidy/ParserImpl.java
- proc: ParseBlock.Parse
- add:
else if ((node.tag.model & Dict.CM_TABLE) != 0 || (node.tag.model & Dict.CM_ROW) != 0)
{
    // XOWA: DATE:2014-05-31
    /* http://tidy.sf.net/issue/1316307 */
    /* In exiled mode, return so table processing can continue. */
    if (lexer.exiled)
        return;
    // ... (rest of the existing branch)
}
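The decision added to ParseBlock can be modeled in isolation. The following is a minimal sketch, not JTidy code: the class ExiledSketch, its Action enum, and the CM_* bit values are hypothetical stand-ins for JTidy's parser state and the Dict content-model flags. It shows only the branch logic: a table or row element met in exiled mode sends control back to the caller, while the same element outside exiled mode falls through to JTidy's existing handling.

```java
// Toy model of the XOWA change to ParseBlock.Parse; not JTidy code.
// The CM_* bit values below are hypothetical stand-ins for Dict's flags.
final class ExiledSketch {
    static final int CM_BLOCK = 1;      // hypothetical value
    static final int CM_TABLE = 1 << 1; // hypothetical value
    static final int CM_ROW = 1 << 2;   // hypothetical value

    enum Action { RETURN_TO_CALLER, EXISTING_HANDLING, NOT_TABLE_CONTENT }

    // tagModel: content-model bits of the node just read.
    // exiled: stands in for lexer.exiled.
    static Action onNode(int tagModel, boolean exiled) {
        // Same bitmask test as the patched else-if branch
        if ((tagModel & CM_TABLE) != 0 || (tagModel & CM_ROW) != 0) {
            if (exiled) {
                // XOWA addition: hand control back so table processing can continue
                return Action.RETURN_TO_CALLER;
            }
            // Otherwise JTidy's pre-existing code for this branch runs
            return Action.EXISTING_HANDLING;
        }
        return Action.NOT_TABLE_CONTENT;
    }

    public static void main(String[] args) {
        System.out.println(onNode(CM_ROW, true));   // RETURN_TO_CALLER
        System.out.println(onNode(CM_ROW, false));  // EXISTING_HANDLING
        System.out.println(onNode(CM_BLOCK, true)); // NOT_TABLE_CONTENT
    }
}
```

Testing each flag with its own "& ... != 0" check, as the patch does, is equivalent to a single test against (CM_TABLE | CM_ROW).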
Do not trim empty block element if it has attributes
- purpose: empty block elements should not be trimmed if they have attributes
- example: ko.wikisource.org/wiki/Main_Page -> invalid table layout
- file: /jtidy-r938/src/main/java/org/w3c/tidy/Lexer.java
- proc: canPrune
- add:
// XOWA: added to match tidy; DATE:2014-05-31
if (((element.tag.model & Dict.CM_BLOCK) != 0) && element.attributes != null)
    return false;
if (element.tag == this.configuration.tt.tagA && element.attributes != null)
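The effect of the extra check can be seen on a simplified node. This is a minimal sketch assuming a hypothetical Node class (tag name, content-model bits, attribute map, content flag) in place of JTidy's Node; the CM_* values are made up, and the further checks performed by the real canPrune are omitted. It mirrors the order of the two checks shown above: the new block-element test runs before the existing <a> test.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the pruning checks; not JTidy's Lexer.canPrune.
final class CanPruneSketch {
    static final int CM_INLINE = 1;      // hypothetical value
    static final int CM_BLOCK = 1 << 1;  // hypothetical value

    // Hypothetical simplified node
    static final class Node {
        final String tag;
        final int model;
        final Map<String, String> attributes; // null when the element has no attributes
        final boolean hasContent;

        Node(String tag, int model, Map<String, String> attributes, boolean hasContent) {
            this.tag = tag;
            this.model = model;
            this.attributes = attributes;
            this.hasContent = hasContent;
        }
    }

    static boolean canPrune(Node element) {
        if (element.hasContent) {
            return false; // non-empty elements are never pruned
        }
        // XOWA addition: keep an empty block element that carries attributes
        if ((element.model & CM_BLOCK) != 0 && element.attributes != null) {
            return false;
        }
        // Existing check: keep an empty <a> that carries attributes
        if ("a".equals(element.tag) && element.attributes != null) {
            return false;
        }
        return true; // further checks of the real method omitted
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("style", "clear:both");
        // <div style="clear:both"></div> is kept
        System.out.println(canPrune(new Node("div", CM_BLOCK, attrs, false))); // false
        // <div></div> may still be pruned
        System.out.println(canPrune(new Node("div", CM_BLOCK, null, false)));  // true
    }
}
```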
Do not convert empty <p> to <br>
- purpose: comment out the code that converts an empty <p> to <br>, because tidy does not do this
- example: none
- file: /jtidy-r938/src/main/java/org/w3c/tidy/Node.java
- proc: trimEmptyElement
- code:
// XOWA: DELETED: not in tidy, and don't really agree with intent; DATE:2014-05-31
// else if (element.tag == tt.tagP && element.content == null)
// {
//     // replace <p></p> by <br><br> to preserve formatting
//     Node node = lexer.inferredTag("br");
//     Node.coerceNode(lexer, element, tt.tagBr);
//     element.insertNodeAfterElement(node);
// }
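The behavioral difference can be summarized with a small model. This is a minimal sketch, not JTidy code: trimEmpty and its parameters are hypothetical, prunable stands in for the result of lexer.canPrune(element), and the configuration checks of the real trimEmptyElement are omitted. Because the deleted code is an else-if branch, the sketch assumes it ran only for an empty <p> that an earlier prune check had not already discarded.

```java
// Toy model of Node.trimEmptyElement before and after the XOWA change; not JTidy code.
final class TrimEmptySketch {
    // Returns the markup an empty element is left as.
    // prunable: stands in for lexer.canPrune(element) (assumed to be checked first).
    // xowaPatched: true once the <p> -> <br><br> branch is commented out.
    static String trimEmpty(String tag, boolean prunable, boolean xowaPatched) {
        if (prunable) {
            return ""; // element discarded
        }
        if (!xowaPatched && tag.equals("p")) {
            // Original JTidy: <p> coerced to <br>, plus an inferred <br> after it
            return "<br><br>";
        }
        return "<" + tag + "></" + tag + ">"; // element left as-is
    }

    public static void main(String[] args) {
        System.out.println(trimEmpty("p", false, false)); // <br><br>
        System.out.println(trimEmpty("p", false, true));  // <p></p>
        System.out.println(trimEmpty("p", true, true));   // prints an empty line
    }
}
```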
Do not add \n after <span> in <pre>
- purpose: JTidy was incorrectly adding \n after elements inside <pre>, including inline elements such as <span>
- example: none
- file: /jtidy-r938/src/main/java/org/w3c/tidy/PPrint.java
- proc: printTag
- code:
if (indent + linelen < this.configuration.wraplen)
{
    // wrap after start tag if is <br/> or if it's not inline
    // fix for [514348]
    if (!TidyUtils.toBoolean(mode & NOWRAP)
        && (!TidyUtils.toBoolean(node.tag.model & Dict.CM_INLINE) || (node.tag == tt.tagBr))
        && afterSpace(node))
    {
        wraphere = linelen;
    }
}
// XOWA: DATE:2014-06-01
/* flush the current buffer only if it is known to be safe,
   i.e. it will not introduce some spurious white spaces.
   See bug #996484 */
else if (TidyUtils.toBoolean(mode & NOWRAP)
    || node.tag == tt.tagBr
    || afterSpace(node))
{
    condFlushLine(fout, indent);
}
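The patched condition can be restated as a pure function, which makes the three possible outcomes explicit. This is a minimal sketch, not JTidy code: afterStartTag, its Decision enum, and the boolean parameters are hypothetical stand-ins for mode & NOWRAP, the tag's CM_INLINE bit, node.tag == tt.tagBr, and afterSpace(node). It reproduces only the branch structure shown above; the unpatched JTidy branch is not shown on this page, so it is not modeled.

```java
// Toy restatement of the patched wrap/flush decision in PPrint.printTag; not JTidy code.
final class FlushSketch {
    enum Decision { SET_WRAP_POINT, FLUSH_LINE, NONE }

    static Decision afterStartTag(int indent, int linelen, int wraplen,
                                  boolean noWrap, boolean inline,
                                  boolean isBr, boolean afterSpace) {
        if (indent + linelen < wraplen) {
            // Line still fits: record a wrap point after <br/> or non-inline tags
            if (!noWrap && (!inline || isBr) && afterSpace) {
                return Decision.SET_WRAP_POINT;
            }
            return Decision.NONE;
        } else if (noWrap || isBr || afterSpace) {
            // Line too long: flush only where no spurious white space is introduced
            return Decision.FLUSH_LINE;
        }
        return Decision.NONE;
    }

    public static void main(String[] args) {
        // Overlong line, inline tag not preceded by white space: no flush
        System.out.println(afterStartTag(0, 80, 68, false, true, false, false)); // NONE
        // Overlong line, <br>: flush
        System.out.println(afterStartTag(0, 80, 68, false, true, true, false));  // FLUSH_LINE
        // Line fits, non-inline tag after white space: wrap point recorded
        System.out.println(afterStartTag(0, 10, 68, false, false, false, true)); // SET_WRAP_POINT
    }
}
```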