ManticMoo.COM All Articles Jeff's Articles
Jeffrey P. Bigham

Getting all text below a given node with StringBean



StringBean is a helpful little class in htmlparser that works as a NodeVisitor and lets you extract text from a web page. The StringBean docs show how to extract text from a whole web page, either by having the StringBean load the page itself or by using an already initialized Parser object, but they neglect to show how to extract only a portion of the text on a page.

To extract text from only a portion of a web page, specifically from all nodes that have a given node as an ancestor, pass the StringBean to that node's accept method. The following code demonstrates this usage:


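StringBean sb = new StringBean();
CompositeTag ct = getHtmlTag();  // stand-in for however you obtain the node
ct.accept(sb);                   // walks ct and everything below it
String text = sb.getStrings();   // the text collected during the walk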

That's all there is to it. Note: getHtmlTag() isn't a predefined method and is just used here for brevity. You can do this with any CompositeTag.
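
If it isn't obvious why handing the visitor to a node limits the output to that node's subtree, the following library-free sketch may help. None of these types come from htmlparser: Node, Tag, Text, TextCollector and textBelow are hypothetical stand-ins I made up to mirror the roles of a node, a CompositeTag, a text node and StringBean. The point is only that accept recurses downward into children, so the visitor never sees siblings or ancestors of the node you start from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the visitor idea; these are NOT htmlparser classes.
class SubtreeTextDemo {

    /** Stand-in for a visitor: it is handed every text node it is walked over. */
    interface Visitor {
        void visitText(String text);
    }

    /** Stand-in for a node: it knows how to walk a visitor over itself. */
    interface Node {
        void accept(Visitor visitor);
    }

    /** Leaf node that holds a piece of text. */
    static final class Text implements Node {
        private final String text;
        Text(String text) { this.text = text; }
        public void accept(Visitor visitor) { visitor.visitText(text); }
    }

    /** Composite node: accept() only descends into its own children. */
    static final class Tag implements Node {
        private final List<Node> children = new ArrayList<>();
        Tag add(Node child) { children.add(child); return this; }
        public void accept(Visitor visitor) {
            for (Node child : children) {
                child.accept(visitor);
            }
        }
    }

    /** Stand-in for StringBean: accumulates the text it visits, separated by spaces. */
    static final class TextCollector implements Visitor {
        private final StringBuilder buffer = new StringBuilder();
        public void visitText(String text) {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append(text);
        }
        String getText() { return buffer.toString(); }
    }

    /** Collects the text of the start node and everything below it. */
    static String textBelow(Node start) {
        TextCollector collector = new TextCollector();
        start.accept(collector);
        return collector.getText();
    }

    public static void main(String[] args) {
        Tag sidebar = new Tag().add(new Text("Related links"));
        Tag article = new Tag().add(new Text("Hello")).add(new Tag().add(new Text("world")));
        Tag body = new Tag().add(sidebar).add(article);

        System.out.println(textBelow(article)); // Hello world
        System.out.println(textBelow(body));    // Related links Hello world
    }
}
```

Starting the walk at article skips the sidebar text entirely, which is the same effect you get by calling accept on a single CompositeTag instead of visiting the whole page.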
