Using JTidy to parse HTML



  • I am using JTidy. I pass it an InputStream and then try to check whether the XHTML in that InputStream is valid.

    InputStream s = new ByteArrayInputStream(my_string.getBytes(StandardCharsets.UTF_8));
    
    Tidy tidy = new Tidy();
    tidy.setXHTML(true);
    
    tidy.parse(s, System.out);
    

    I would like my test to get a true/false result instead of having this printed to the console: 5 warnings, no errors were found!

    If I get 1 error, how do I make my JUnit test fail?

    I also noticed that no matter what my string looks like, it always gets parsed as valid. I tried this:

    String my_string = "TABLE width = 400 &&&& table height = ";
    

    That gets parsed as valid without any errors.



  • I had the same question/problem. This is how I solved it:

    String validHtml = "some valid html";
    InputStream inputStream = new ByteArrayInputStream(validHtml.getBytes(StandardCharsets.UTF_8));
    
    Tidy tidy = new Tidy();
    tidy.setXmlTags(true);
    
    tidy.parse(inputStream, null);
    
    System.out.println("errors "+tidy.getParseErrors());
    System.out.println("warnings "+tidy.getParseWarnings());
    assertTrue("HTML has errors and it should not.",tidy.getParseErrors()==0);
    

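    The trick is setXmlTags(true), which tells JTidy to treat the input as XML. After parsing, getParseErrors() and getParseWarnings() give you counts you can assert on.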
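
    If you want a second opinion that does not depend on JTidy at all, one option is to run the same string through the JDK's built-in XML parser. To be clear, this is a different technique from the one above: it only checks XML well-formedness, not XHTML validity against a DTD. The names WellFormedCheck and isWellFormed are made up for this sketch, and it assumes the input has no DOCTYPE pointing at an external DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JTidy: a well-formedness check
// built only on the JDK's own XML parser.
final class WellFormedCheck {

    private WellFormedCheck() {
    }

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // so nothing is printed to the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any parse failure means the input is not well-formed XML.
            return false;
        }
    }
}
```

    In a JUnit test this could be used as assertTrue(WellFormedCheck.isWellFormed(my_string)). The string from the question, "TABLE width = 400 &&&& table height = ", has no root element, so this check returns false for it.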

