App/Xtn/Mediawiki/Tidy/JTidy

From XOWA: the free, open-source, offline wiki application

Source

The jtidy_xowa.jar was built using the source at https://sourceforge.net/projects/jtidy/files/JTidy/r938/.

Its source is not currently included with XOWA. It is available at the following location: https://sourceforge.net/projects/xowa/files/support/jtidy/


Modifications

The jtidy_xowa.jar was created for the following reasons:

  • JTidy is not completely in sync with tidy:
JTidy appears to have been built from an earlier version of tidy. tidy has since received a number of bug fixes that are not in JTidy.
  • JTidy deviates from tidy in places:
JTidy is a very close translation of tidy, but it differs from tidy in a number of places.


jtidy_xowa changes

The following is only a partial list of JTidy changes. Multiple changes were made for v1.6.2.1 of XOWA to make JTidy more "tidy-like". More changes will probably follow to close the remaining source-code gap between tidy and JTidy.

ParseBlock should handle exiled variable during element reparenting

  • purpose: <div> between <table> and <tr> was not reparented correctly
  • example: fa.wikinews.org/wiki/Main_Page -> invalid table layout
  • file: /jtidy-r938/src/main/java/org/w3c/tidy/ParserImpl.java
  • proc: ParseBlock.Parse
  • add:
                        else if ((node.tag.model & Dict.CM_TABLE) != 0 || (node.tag.model & Dict.CM_ROW) != 0)
                        {
                            // XOWA: DATE:2014-05-31
                            /* http://tidy.sf.net/issue/1316307 */
                            /* In exiled mode, return so table processing can 
                               continue. */
                            if (lexer.exiled)
                                return;


Do not trim empty block element if it has attributes

  • purpose: empty block elements should not be trimmed if they have attributes
  • example: ko.wikisource.org/wiki/Main_Page -> invalid table layout
  • file: /jtidy-r938/src/main/java/org/w3c/tidy/Lexer.java
  • proc: canPrune
  • add:
        // XOWA: added to match tidy; DATE:2014-05-31
        if ( ((element.tag.model & Dict.CM_BLOCK) != 0) && element.attributes != null)
            return false;

        if (element.tag == this.configuration.tt.tagA && element.attributes != null)
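The rule added to canPrune can be illustrated in isolation. The following is a minimal sketch, not JTidy's code: PruneSketch, Element, the hasContent field, and the flag values are hypothetical stand-ins for JTidy's Lexer, Node, and Dict, and the real canPrune performs further checks that are omitted here.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the pruning rule; PruneSketch and Element are hypothetical
// stand-ins, not JTidy's real Lexer and Node classes.
final class PruneSketch {
    // Illustrative content-model flags; the numeric values are assumptions.
    static final int CM_BLOCK = 1;
    static final int CM_INLINE = 2;

    static final class Element {
        final String name;
        final int model;               // bit set of CM_* flags
        final List<String> attributes; // null means "no attributes", as in the excerpt
        final boolean hasContent;      // hypothetical: true if the element has children

        Element(String name, int model, List<String> attributes, boolean hasContent) {
            this.name = name;
            this.model = model;
            this.attributes = attributes;
            this.hasContent = hasContent;
        }
    }

    // Returns true when an element may be dropped from the tree.
    static boolean canPrune(Element element) {
        if (element.hasContent)
            return false; // only empty elements are candidates for pruning
        // Rule from the patch: an empty block element that has attributes is kept.
        if ((element.model & CM_BLOCK) != 0 && element.attributes != null)
            return false;
        return true;
    }

    public static void main(String[] args) {
        Element bare = new Element("div", CM_BLOCK, null, false);
        Element styled = new Element("div", CM_BLOCK, Arrays.asList("style"), false);
        System.out.println(canPrune(bare));   // prints "true"
        System.out.println(canPrune(styled)); // prints "false"
    }
}
```

In this sketch an empty <div> without attributes is still pruned, while an empty <div> that has a style attribute survives. That is the kind of element a table layout may depend on.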


Do not convert empty <p> to <br>

  • purpose: commented out the code that converts an empty <p> to <br>, because tidy does not do this
  • example: none
  • file: /jtidy-r938/src/main/java/org/w3c/tidy/Node.java
  • proc: trimEmptyElement
  • code:
            // XOWA: DELETED: not in tidy, and don't really agree with intent; DATE:2014-05-31
//            else if (element.tag == tt.tagP && element.content == null)
//            {
//                // replace <p></p> by <br><br> to preserve formatting
//                Node node = lexer.inferredTag("br");
//                Node.coerceNode(lexer, element, tt.tagBr);
//                element.insertNodeAfterElement(node);
//            }


Do not add \n after <span> in <pre>

  • purpose: JTidy was incorrectly adding \n to all block elements inside <pre>
  • example: none
  • file: /jtidy-r938/src/main/java/org/w3c/tidy/PPrint.java
  • proc: printTag
  • code:
            if (indent + linelen < this.configuration.wraplen)
            {

                // wrap after start tag if is <br/> or if it's not inline
                // fix for [514348]
                if (!TidyUtils.toBoolean(mode & NOWRAP)
                    && (!TidyUtils.toBoolean(node.tag.model & Dict.CM_INLINE) || (node.tag == tt.tagBr))
                    && afterSpace(node))
                {
                    wraphere = linelen;
                }

            }
            // XOWA: DATE:2014-06-01
            /* flush the current buffer only if it is known to be safe,
            i.e. it will not introduce some spurious white spaces.
            See bug #996484 */
            else if (TidyUtils.toBoolean(mode & NOWRAP)
                || node.tag == tt.tagBr
                || afterSpace(node))
            {
                condFlushLine(fout, indent);
            }
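
The branch structure of the excerpt above can be restated as a small pure function. This is only a sketch under stated assumptions: FlushSketch, Action, and the boolean parameters are hypothetical names that stand in for PPrint's internal state (mode & NOWRAP, node.tag.model & Dict.CM_INLINE, node.tag == tt.tagBr, afterSpace(node)), and it models nothing beyond the two branches shown.

```java
// Sketch of the branch structure in the printTag excerpt; FlushSketch and its
// parameters are hypothetical and stand in for JTidy's PPrint state.
final class FlushSketch {
    enum Action { MARK_WRAP_POINT, FLUSH_LINE, NONE }

    // Decides what happens after a start tag has been written to the line buffer.
    static Action afterStartTag(int indent, int linelen, int wraplen,
                                boolean noWrap, boolean inlineTag,
                                boolean isBr, boolean afterSpace) {
        if (indent + linelen < wraplen) {
            // Line still fits: optionally remember a wrap point, but do not flush.
            if (!noWrap && (!inlineTag || isBr) && afterSpace)
                return Action.MARK_WRAP_POINT;
            return Action.NONE;
        }
        // Line is too long: flush only when that cannot introduce spurious whitespace.
        if (noWrap || isBr || afterSpace)
            return Action.FLUSH_LINE;
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Over-long line, inline tag not preceded by whitespace: nothing is flushed.
        System.out.println(afterStartTag(0, 100, 68, false, true, false, false)); // prints "NONE"
        // Same line, but the tag follows whitespace: flushing is treated as safe.
        System.out.println(afterStartTag(0, 100, 68, false, true, false, true));  // prints "FLUSH_LINE"
    }
}
```

The point of the guarded else branch is visible in the first call: when none of the three conditions holds, the line is left alone and no \n is emitted after the tag.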
