Getting all text below a given node with StringBean
Jeffrey P. Bigham
StringBean is a helpful little class that extends htmlparser's NodeVisitor class and lets you extract text from a web page. The StringBean docs show how to extract text from a whole web page, either by using the StringBean itself to load the page or by passing it to an already initialized Parser object, but they neglect to show how to extract only a portion of the text on a page. To extract the text from just part of a page, specifically from all nodes that have a given node as an ancestor, you can supply the StringBean to that node's accept method. The following code demonstrates this usage:
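What follows is a minimal sketch of that usage, not the original listing. It assumes htmlparser 1.6 is on the classpath. The class name `SubtreeText` and the helpers `findFirstTag` and `textBelow` are hypothetical names introduced here for illustration; `findFirstTag` plays the role that `getHtmlTag()` plays in this article, namely locating the tag whose text you want.

```java
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class SubtreeText {

    // Hypothetical helper: returns the first tag with the given name, or null if none.
    static Node findFirstTag(String html, String tagName) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList matches = parser.extractAllNodesThatMatch(new TagNameFilter(tagName));
        return matches.size() == 0 ? null : matches.elementAt(0);
    }

    // Collects the text of every node that has the given tag as an ancestor.
    static String textBelow(String html, String tagName) throws ParserException {
        Node root = findFirstTag(html, tagName);
        if (root == null) {
            return "";
        }
        StringBean sb = new StringBean();
        root.accept(sb);               // the bean visits root and its descendants only
        String text = sb.getStrings(); // may be null when no text was collected
        return text == null ? "" : text;
    }

    public static void main(String[] args) throws ParserException {
        String html = "<html><body>"
                + "<div><p>Hello</p><p>World</p></div>"
                + "<p>Footer</p>"
                + "</body></html>";
        // Only the text inside the first <div> is collected; "Footer" lies outside it.
        System.out.println(textBelow(html, "div"));
    }
}
```

Because the bean is handed to a single node rather than to the Parser, it only sees that node's subtree. The StringBean properties (for example, whether links are included or whitespace is collapsed) should still apply, so set them on the bean before calling `accept`.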
That's all there is to it. Note: getHtmlTag() isn't a predefined method; it is used here only for brevity and stands in for whatever code locates the tag you want. You can do this with any CompositeTag.