A Javadoc comment is a multiline comment whose content starts with the * character (that is, the comment opens with /**) and that is placed above a class, interface, enum, method, or field definition.
For example, here is a Java file:
/**
 * My <b>class</b>.
 * @see AbstractClass
 */
public class MyClass {

}
The content of its Javadoc comment is:
 * My <b>class</b>.
 * @see AbstractClass
By specification, a Javadoc comment may contain any HTML tags, so that users can generate whatever content they need. Checkstyle cannot parse arbitrary HTML-like markup, which leads to a limitation: the comment must be written in XHTML to be processed correctly. This means that every HTML tag must either have a matching closing tag or be a self-closed (singleton) tag. The only exceptions are <p>, <li>, <tr>, <td>, <th>, <body>, <colgroup>, <dd>, <dt>, <head>, <html>, <option>, <tbody>, <thead> and <tfoot>: Checkstyle won't report an error about a missing closing tag for them. However, leaving them unclosed still breaks the XHTML structure and therefore produces an incorrect Abstract Syntax Tree of the Javadoc comment. See the examples in the "HTML Code In Javadoc Comments" chapter.
The Javadoc parser requires XHTML in Javadoc comments, i.e. if there is an open tag (for example <div>), then there has to be a matching close tag </div>. If a Javadoc comment has an incorrect XHTML structure, the Javadoc parser will fail to process the comment, so your new Check can't get its parse tree or process anything from this Javadoc comment. For more details and examples, go to the "HTML code in Javadoc comments" section.
The Javadoc grammar requires XHTML, but it can also parse some HTML constructs (such as certain unclosed tags). However, the resulting tree will be unpredictable. This is done only to avoid failing on every Javadoc comment, because unclosed tags and similar constructs are very common in existing code.
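The XHTML rule described above can be illustrated with a small stand-alone sketch. It is not how Checkstyle's parser works (that is an ANTLR grammar); it is only a stack-based approximation in plain Java. The names XhtmlBalanceSketch and firstUnbalancedTag are made up for this illustration, and the sketch ignores inline Javadoc tags such as {@code ...}.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a rough balance check, not Checkstyle's Javadoc parser.
class XhtmlBalanceSketch {

    // Tags whose closing tag may be omitted, copied from the list of exceptions above.
    private static final Set<String> OPTIONAL_CLOSE = Set.of(
        "p", "li", "tr", "td", "th", "body", "colgroup", "dd", "dt",
        "head", "html", "option", "tbody", "thead", "tfoot");

    // group 1 = "/" for a closing tag, group 2 = tag name, group 3 = "/" for a self-closed tag.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>");

    // Returns the name of a tag that breaks the XHTML structure, or null if there is none.
    static String firstUnbalancedTag(String javadocText) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(javadocText);
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosed = !m.group(3).isEmpty();
            if (selfClosed) {
                continue;
            }
            if (!closing) {
                open.push(name);
                continue;
            }
            // A closing tag implicitly ends unclosed optional tags nested inside it.
            while (!open.isEmpty() && !open.peek().equals(name)
                    && OPTIONAL_CLOSE.contains(open.peek())) {
                open.pop();
            }
            if (open.isEmpty() || !open.pop().equals(name)) {
                return name;
            }
        }
        // Whatever is still open must be one of the optional-close tags.
        for (String name : open) {
            if (!OPTIONAL_CLOSE.contains(name)) {
                return name;
            }
        }
        return null;
    }
}
```

With this sketch, firstUnbalancedTag("<p> First\n<p> Second") returns null because <p> is in the exception list, while firstUnbalancedTag("<div> text") returns "div". Passing this kind of check says nothing about how the real parser nests the content, which is exactly the caveat about unclosed <p> tags.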
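To start implementing a new Check, create a new class that extends AbstractJavadocCheck. It has two abstract methods you should implement: getDefaultJavadocTokens(), which returns the Javadoc token types the Check wants to visit, and visitJavadocToken(DetailNode), which is called for each node of those types.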
In a Javadoc comment every whitespace matters, so the parse tree contains whitespace nodes (WS Javadoc token type). The same goes for the CHAR Javadoc token, which represents a single character. The only redundancy this introduces is that a TEXT node consists of CHAR and WS child nodes, which is useless, but it is an implementation nuance. (We will try to resolve this in the future.)
The Java grammar parses a Java file according to the Java language specification, so it knows about single-line comments and multiline (block) comments. The Java compiler doesn't know about Javadoc, because to it a Javadoc comment is just a multiline comment. To parse a multiline comment as a Javadoc comment, Checkstyle has a special parser based on an ANTLR Javadoc grammar. It processes block comments that start with the Javadoc identifier and parses them into an Abstract Syntax Tree (AST).
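Deciding whether a block comment is a Javadoc comment comes down to looking at its first character. Here is a minimal sketch in plain Java, assuming the comment content is passed without the surrounding comment delimiters. The class and method names are hypothetical, and this is not Checkstyle's own code.

```java
// Illustration only: hypothetical helper, not part of Checkstyle.
class JavadocIdentifierSketch {

    // commentContent is the text between the opening "/*" and the closing delimiter.
    // A block comment is treated as Javadoc when that text starts with '*'.
    static boolean isJavadocComment(String commentContent) {
        return !commentContent.isEmpty() && commentContent.charAt(0) == '*';
    }
}
```

For the MyClass example, the content of the block comment starts with *, so the sketch reports it as a Javadoc comment. An ordinary block comment whose content starts with a space is not.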
The problem is that the Java grammar is an old one and uses ANTLR v2, while the Javadoc grammar uses ANTLR v4. Because of that, the two grammars and their trees are not compatible: the Java AST consists of DetailAST objects, while the Javadoc AST consists of DetailNode objects.
Checkstyle can print the Abstract Syntax Tree for both Java and Javadoc trees. To do so, run the Checkstyle jar file with the -J argument and provide a Java file.
For example, here is the MyClass.java file:
/**
 * My <b>class</b>.
 * @see AbstractClass
 */
public class MyClass {

}
Command:
java -jar checkstyle-6.18-all.jar -J MyClass.java
Output:
CLASS_DEF -> CLASS_DEF [5:0]
|--MODIFIERS -> MODIFIERS [5:0]
|   |--JAVADOC -> \r\n * My <b>class</b>.\r\n * @see AbstractClass\r\n <EOF> [1:0]
|   |   |--NEWLINE -> \r\n [1:0]
|   |   |--LEADING_ASTERISK ->  * [2:0]
|   |   |--TEXT ->  My  [2:2]
|   |   |   |--WS ->   [2:2]
|   |   |   |--CHAR -> M [2:3]
|   |   |   |--CHAR -> y [2:4]
|   |   |   `--WS ->   [2:5]
|   |   |--HTML_ELEMENT -> <b>class</b> [2:6]
|   |   |   `--HTML_TAG -> <b>class</b> [2:6]
|   |   |       |--HTML_ELEMENT_OPEN -> <b> [2:6]
|   |   |       |   |--OPEN -> < [2:6]
|   |   |       |   |--HTML_TAG_NAME -> b [2:7]
|   |   |       |   `--CLOSE -> > [2:8]
|   |   |       |--TEXT -> class [2:9]
|   |   |       |   |--CHAR -> c [2:9]
|   |   |       |   |--CHAR -> l [2:10]
|   |   |       |   |--CHAR -> a [2:11]
|   |   |       |   |--CHAR -> s [2:12]
|   |   |       |   `--CHAR -> s [2:13]
|   |   |       `--HTML_ELEMENT_CLOSE -> </b> [2:14]
|   |   |           |--OPEN -> < [2:14]
|   |   |           |--SLASH -> / [2:15]
|   |   |           |--HTML_TAG_NAME -> b [2:16]
|   |   |           `--CLOSE -> > [2:17]
|   |   |--TEXT -> . [2:18]
|   |   |   `--CHAR -> . [2:18]
|   |   |--NEWLINE -> \r\n [2:19]
|   |   |--LEADING_ASTERISK ->  * [3:0]
|   |   |--WS ->   [3:2]
|   |   |--JAVADOC_TAG -> @see AbstractClass\r\n  [3:3]
|   |   |   |--SEE_LITERAL -> @see [3:3]
|   |   |   |--WS ->   [3:7]
|   |   |   |--REFERENCE -> AbstractClass [3:8]
|   |   |   |   `--CLASS -> AbstractClass [3:8]
|   |   |   |--NEWLINE -> \r\n [3:21]
|   |   |   `--WS ->   [4:0]
|   |   `--EOF -> <EOF> [4:1]
|   `--LITERAL_PUBLIC -> public [5:0]
|--LITERAL_CLASS -> class [5:7]
|--IDENT -> MyClass [5:13]
`--OBJBLOCK -> OBJBLOCK [5:21]
    |--LCURLY -> { [5:21]
    `--RCURLY -> } [7:0]
As you can see, a very small Java file is transformed into a huge Abstract Syntax Tree, because it is the most detailed tree and includes all components of the Java file: classes, methods, comments, etc. In most cases, however, while developing a Javadoc Check you need only the parse tree of one particular Javadoc comment. To get it, copy the Javadoc comment to a separate file and remove /** at the beginning and */ at the end. After that, run Checkstyle with the -j argument.
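Removing the comment delimiters by hand is easy, but the step can also be scripted. The following is a minimal sketch in plain Java. The class and method names are hypothetical, and it does exactly what the paragraph above describes and nothing more, so any newline right after /** and the indentation before */ stay in the result.

```java
// Illustration only: hypothetical helper, not part of Checkstyle.
class JavadocBodySketch {

    // Returns the text between the opening "/**" and the closing "*/".
    static String stripDelimiters(String javadocComment) {
        String trimmed = javadocComment.trim();
        boolean delimited = trimmed.startsWith("/**") && trimmed.endsWith("*/");
        // The length check rejects "/**/", where the two delimiters overlap.
        if (!delimited || trimmed.length() < 5) {
            throw new IllegalArgumentException("Not a Javadoc comment");
        }
        return trimmed.substring(3, trimmed.length() - 2);
    }
}
```

The returned string can then be saved to a file such as MyJavadocComment.javadoc and passed to Checkstyle with the -j argument.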
MyJavadocComment.javadoc file:
 * My <b>class</b>.
 * @see AbstractClass
Command:
java -jar checkstyle-6.18-SNAPSHOT-all.jar -j MyJavadocComment.javadoc
Output:
JAVADOC ->  * My <b>class</b>.\r\n * @see AbstractClass<EOF> [0:0]
|--LEADING_ASTERISK ->  * [0:0]
|--TEXT ->  My  [0:2]
|   |--WS ->   [0:2]
|   |--CHAR -> M [0:3]
|   |--CHAR -> y [0:4]
|   `--WS ->   [0:5]
|--HTML_ELEMENT -> <b>class</b> [0:6]
|   `--HTML_TAG -> <b>class</b> [0:6]
|       |--HTML_ELEMENT_OPEN -> <b> [0:6]
|       |   |--OPEN -> < [0:6]
|       |   |--HTML_TAG_NAME -> b [0:7]
|       |   `--CLOSE -> > [0:8]
|       |--TEXT -> class [0:9]
|       |   |--CHAR -> c [0:9]
|       |   |--CHAR -> l [0:10]
|       |   |--CHAR -> a [0:11]
|       |   |--CHAR -> s [0:12]
|       |   `--CHAR -> s [0:13]
|       `--HTML_ELEMENT_CLOSE -> </b> [0:14]
|           |--OPEN -> < [0:14]
|           |--SLASH -> / [0:15]
|           |--HTML_TAG_NAME -> b [0:16]
|           `--CLOSE -> > [0:17]
|--TEXT -> . [0:18]
|   `--CHAR -> . [0:18]
|--NEWLINE -> \r\n [0:19]
|--LEADING_ASTERISK ->  * [1:0]
|--WS ->   [1:2]
|--JAVADOC_TAG -> @see AbstractClass [1:3]
|   |--SEE_LITERAL -> @see [1:3]
|   |--WS ->   [1:7]
|   `--REFERENCE -> AbstractClass [1:8]
|       `--CLASS -> AbstractClass [1:8]
`--EOF -> <EOF> [1:21]
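To make the printed layout easier to read, here is a short recursive printer that produces the same kind of indented listing. It is only an illustration: Node is a hypothetical stand-in for a parse-tree node (it is not Checkstyle's DetailNode), and the exact spacing of Checkstyle's real output is assumed rather than taken from its source.

```java
import java.util.List;

// Illustration only: mimics the printed tree layout with a stand-in node type.
class TreePrintSketch {

    // Hypothetical stand-in for a parse-tree node; not Checkstyle's DetailNode.
    static final class Node {
        final String type;
        final String text;
        final int line;
        final int column;
        final List<Node> children;

        Node(String type, String text, int line, int column, Node... children) {
            this.type = type;
            this.text = text;
            this.line = line;
            this.column = column;
            this.children = List.of(children);
        }
    }

    static String print(Node root) {
        StringBuilder out = new StringBuilder();
        append(out, root, "", "");
        return out.toString();
    }

    // linePrefix is printed before this node; childIndent is inherited by its children.
    private static void append(StringBuilder out, Node node, String linePrefix, String childIndent) {
        out.append(linePrefix).append(node.type).append(" -> ").append(node.text)
            .append(" [").append(node.line).append(':').append(node.column).append("]\n");
        for (int i = 0; i < node.children.size(); i++) {
            boolean last = i == node.children.size() - 1;
            // The last child gets "`--", and nothing is drawn below it in that column.
            append(out, node.children.get(i),
                childIndent + (last ? "`--" : "|--"),
                childIndent + (last ? "    " : "|   "));
        }
    }

    public static void main(String[] args) {
        Node tag = new Node("JAVADOC_TAG", "@see AbstractClass", 1, 3,
            new Node("SEE_LITERAL", "@see", 1, 3),
            new Node("REFERENCE", "AbstractClass", 1, 8,
                new Node("CLASS", "AbstractClass", 1, 8)));
        System.out.print(print(tag));
    }
}
```

Each line reads as TOKEN_TYPE -> text [line:column]. A `-- prefix marks the last child of a node, which is why no vertical bar continues below it.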
For example, to write a Javadoc Check that verifies @param tags in the Javadoc comment of a method definition, you also need all of the method's parameter names. To get the method definition AST, you have to access the main DetailAST tree through the block comment AST. For this purpose, use the getBlockCommentAst() method, which returns a DetailAST node.
Example:
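import com.puppycrawl.tools.checkstyle.api.DetailAST;
import com.puppycrawl.tools.checkstyle.api.DetailNode;
import com.puppycrawl.tools.checkstyle.api.JavadocTokenTypes;
import com.puppycrawl.tools.checkstyle.checks.javadoc.AbstractJavadocCheck;
import com.puppycrawl.tools.checkstyle.utils.BlockCommentPosition;

class MyCheck extends AbstractJavadocCheck {

    @Override
    public int[] getDefaultJavadocTokens() {
        // Visit only the parameter names of @param tags.
        return new int[]{JavadocTokenTypes.PARAMETER_NAME};
    }

    @Override
    public void visitJavadocToken(DetailNode paramNameNode) {
        String javadocParamName = paramNameNode.getText();
        // Step from the Javadoc tree into the main Java tree.
        DetailAST blockCommentAst = getBlockCommentAst();

        if (BlockCommentPosition.isOnMethod(blockCommentAst)) {
            DetailAST methodDef = blockCommentAst.getParent();
            // findMethodParameter is a helper of this Check that is not shown here.
            DetailAST methodParam = findMethodParameter(methodDef);
            String methodParamName = methodParam.getText();

            if (!javadocParamName.equals(methodParamName)) {
                log(methodParam, "params.dont.match");
            }
        }
    }
}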
Examples:
1) Unclosed paragraph HTML tag. As the tree shows, the content of each paragraph is not nested inside its <p> node. That is because the <p> tags are not closed by a matching </p>, and Checkstyle requires XHTML to parse Javadoc comments predictably.
<p> First
<p> Second
JAVADOC -> <p> First\r\n<p> Second<EOF> [0:0]
|--HTML_ELEMENT -> <p> [0:0]
|   `--P_TAG_OPEN -> <p> [0:0]
|       |--OPEN -> < [0:0]
|       |--P_HTML_TAG_NAME -> p [0:1]
|       `--CLOSE -> > [0:2]
|--TEXT ->  First [0:3]
|   |--WS ->   [0:3]
|   |--CHAR -> F [0:4]
|   |--CHAR -> i [0:5]
|   |--CHAR -> r [0:6]
|   |--CHAR -> s [0:7]
|   `--CHAR -> t [0:8]
|--NEWLINE -> \r\n [0:9]
|--HTML_ELEMENT -> <p> [1:0]
|   `--P_TAG_OPEN -> <p> [1:0]
|       |--OPEN -> < [1:0]
|       |--P_HTML_TAG_NAME -> p [1:1]
|       `--CLOSE -> > [1:2]
|--TEXT ->  Second [1:3]
|   |--WS ->   [1:3]
|   |--CHAR -> S [1:4]
|   |--CHAR -> e [1:5]
|   |--CHAR -> c [1:6]
|   |--CHAR -> o [1:7]
|   |--CHAR -> n [1:8]
|   `--CHAR -> d [1:9]
`--EOF -> <EOF> [1:10]
2) Here is the correct version, with both opening and closing HTML tags.
<p> First </p>
<p> Second </p>
JAVADOC -> <p> First </p>\r\n<p> Second </p><EOF> [0:0]
|--HTML_ELEMENT -> <p> First </p> [0:0]
|   `--PARAGRAPH -> <p> First </p> [0:0]
|       |--P_TAG_OPEN -> <p> [0:0]
|       |   |--OPEN -> < [0:0]
|       |   |--P_HTML_TAG_NAME -> p [0:1]
|       |   `--CLOSE -> > [0:2]
|       |--TEXT ->  First  [0:3]
|       |   |--WS ->   [0:3]
|       |   |--CHAR -> F [0:4]
|       |   |--CHAR -> i [0:5]
|       |   |--CHAR -> r [0:6]
|       |   |--CHAR -> s [0:7]
|       |   |--CHAR -> t [0:8]
|       |   `--WS ->   [0:9]
|       `--P_TAG_CLOSE -> </p> [0:10]
|           |--OPEN -> < [0:10]
|           |--SLASH -> / [0:11]
|           |--P_HTML_TAG_NAME -> p [0:12]
|           `--CLOSE -> > [0:13]
|--NEWLINE -> \r\n [0:14]
|--HTML_ELEMENT -> <p> Second </p> [1:0]
|   `--PARAGRAPH -> <p> Second </p> [1:0]
|       |--P_TAG_OPEN -> <p> [1:0]
|       |   |--OPEN -> < [1:0]
|       |   |--P_HTML_TAG_NAME -> p [1:1]
|       |   `--CLOSE -> > [1:2]
|       |--TEXT ->  Second  [1:3]
|       |   |--WS ->   [1:3]
|       |   |--CHAR -> S [1:4]
|       |   |--CHAR -> e [1:5]
|       |   |--CHAR -> c [1:6]
|       |   |--CHAR -> o [1:7]
|       |   |--CHAR -> n [1:8]
|       |   |--CHAR -> d [1:9]
|       |   `--WS ->   [1:10]
|       `--P_TAG_CLOSE -> </p> [1:11]
|           |--OPEN -> < [1:11]
|           |--SLASH -> / [1:12]
|           |--P_HTML_TAG_NAME -> p [1:13]
|           `--CLOSE -> > [1:14]
`--EOF -> <EOF> [1:15]