Class StringExtractor
java.lang.Object
org.htmlparser.parserapplications.StringExtractor
Extract plaintext strings from a web page.
Illustrative program to gather the textual contents of a web page.
Uses a StringBean to accumulate the user-visible text (what a browser would display) into a single string.
Constructor Summary
Constructor                         Description
StringExtractor(String resource)    Construct a StringExtractor to read from the given resource.
Method Summary
Modifier and Type    Method                           Description
String               extractStrings(boolean links)    Extract the text from a page.
static void          main(String[] args)              Mainline.
Constructor Details
StringExtractor
Construct a StringExtractor to read from the given resource.
Parameters:
resource - Either a URL or a file name.
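Because the resource argument may be either a URL or a file name, a caller may want to know which one it is holding before constructing the extractor, for example to report errors differently. The following is a minimal JDK-only sketch of such a check. The class ResourceKind and the method looksLikeUrl are hypothetical names for this illustration; this is not how the library itself decides, and the heuristic (a URI scheme longer than one letter) is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of org.htmlparser.
final class ResourceKind {

    /** Returns true if the resource string parses as a URI with a multi-letter scheme. */
    static boolean looksLikeUrl(String resource) {
        try {
            String scheme = new URI(resource).getScheme();
            // A one-letter "scheme" is most likely a Windows drive letter (C:/...).
            return scheme != null && scheme.length() > 1;
        } catch (URISyntaxException e) {
            // Backslashes and spaces are not legal in a URI: treat as a file name.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String resource : args) {
            String kind = looksLikeUrl(resource) ? "URL" : "file name";
            System.out.println(resource + " looks like a " + kind);
        }
    }
}
```

Either kind of string is then passed unchanged to the StringExtractor(String resource) constructor.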
Method Details
extractStrings
Extract the text from a page.
Parameters:
links - if true, include hyperlinks in output.
Returns:
The textual contents of the page.
Throws:
ParserException - If a parse error occurs.
main
Mainline.
Parameters:
args - The command line arguments.
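This page does not specify the form of the command line arguments. As a rough illustration, a front end for this class could accept an optional flag followed by the resource. The sketch below assumes the form "[-links] resource". The flag name -links, the class ExtractorCli, and its Options type are assumptions made for this example. It uses only the JDK, so the extraction call itself appears as a comment showing the documented constructor and method.

```java
import java.util.Optional;

// Hypothetical command-line front end; names and flag syntax are assumptions.
final class ExtractorCli {

    /** Parsed command line: the resource to read and whether to include links. */
    static final class Options {
        final boolean links;
        final String resource;

        Options(boolean links, String resource) {
            this.links = links;
            this.resource = resource;
        }
    }

    /** Parses "[-links] resource"; returns empty if no resource was given. */
    static Optional<Options> parse(String[] args) {
        boolean links = false;
        String resource = null;
        for (String arg : args) {
            if (arg.equalsIgnoreCase("-links")) {
                links = true;
            } else {
                resource = arg; // the last non-flag argument wins
            }
        }
        if (resource == null) {
            return Optional.empty();
        }
        return Optional.of(new Options(links, resource));
    }

    public static void main(String[] args) {
        Optional<Options> parsed = parse(args);
        if (!parsed.isPresent()) {
            System.err.println("usage: ExtractorCli [-links] resource");
            return;
        }
        Options options = parsed.get();
        // With htmlparser on the classpath, the extraction step would be:
        //   String text = new StringExtractor(options.resource).extractStrings(options.links);
        // wrapped in a try/catch for ParserException.
        System.out.println("resource=" + options.resource + ", links=" + options.links);
    }
}
```

Keeping the argument parsing in its own method makes it possible to test without network or file access.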