Package org.htmlparser.beans
Class FilterBean
java.lang.Object
org.htmlparser.beans.FilterBean
- All Implemented Interfaces:
Serializable
Extract nodes from a URL using a filter.
FilterBean fb = new FilterBean ();
fb.setFilters (new NodeFilter[] { new TagNameFilter ("META") });
fb.setURL ("http://cbc.ca");
System.out.println (fb.getNodes ().toHtml ());
-
Field Summary
Fields
Modifier and Type | Field | Description
protected NodeFilter[] | mFilters | The filter set.
protected NodeList | mNodes | The nodes extracted from the URL.
protected Parser | mParser | The parser used to filter.
protected PropertyChangeSupport | mPropertySupport | Bound property support.
protected boolean | mRecursive | The recursion behaviour for elements of the filter array.
static final String | PROP_CONNECTION_PROPERTY | Property name in event where the connection changes.
static final String | PROP_NODES_PROPERTY | Property name in event where the URL contents change.
static final String | PROP_TEXT_PROPERTY | Property name in event where the URL contents change.
static final String | PROP_URL_PROPERTY | Property name in event where the URL changes.
-
Constructor Summary
Constructors
Constructor | Description
FilterBean() | Create a FilterBean object.
-
Method Summary
Modifier and Type | Method | Description
void | addPropertyChangeListener(PropertyChangeListener listener) | Add a PropertyChangeListener to the listener list.
protected NodeList | applyFilters() | Apply each of the filters.
URLConnection | getConnection() | Get the current connection.
NodeFilter[] | getFilters() | Get the current filter set.
NodeList | getNodes() | Return the nodes of the URL matching the filter.
Parser | getParser() | Get the parser used to fetch nodes.
boolean | getRecursive() | Get the current recursion behaviour.
String | getText() | Convenience method to apply a StringBean to the filter results.
String | getURL() | Get the current URL.
static void | main(String[] args) | Unit test.
void | removePropertyChangeListener(PropertyChangeListener listener) | Remove a PropertyChangeListener from the listener list.
void | setConnection(URLConnection connection) | Set the parser's connection.
void | setFilters(NodeFilter[] filters) | Set the filters for the bean.
protected void | setNodes() | Fetch the URL contents and filter it.
void | setParser(Parser parser) | Set the parser for the bean.
void | setRecursive(boolean recursive) | Set the recursion behaviour.
void | setURL(String url) | Set the URL to extract strings from.
protected void | updateNodes(NodeList nodes) | Assign the Nodes property, firing the property change.
-
Field Details
-
PROP_NODES_PROPERTY
Property name in event where the URL contents change.
-
PROP_TEXT_PROPERTY
Property name in event where the URL contents change.
-
PROP_URL_PROPERTY
Property name in event where the URL changes.
-
PROP_CONNECTION_PROPERTY
Property name in event where the connection changes.
-
mPropertySupport
Bound property support.
-
mParser
The parser used to filter.
-
mFilters
The filter set.
-
mNodes
The nodes extracted from the URL.
-
mRecursive
protected boolean mRecursive
The recursion behaviour for elements of the filter array. If true, the filters are applied recursively.
-
-
Constructor Details
-
FilterBean
public FilterBean()
Create a FilterBean object.
-
-
Method Details
-
updateNodes
Assign the Nodes property, firing the property change.
- Parameters:
nodes - The new value of the Nodes property.
-
applyFilters
Apply each of the filters. The first filter is applied to the output of the parser. Subsequent filters are applied to the output of the prior filter.
- Returns:
- A list of nodes passed through all filters. If there are no filters, returns the entire page.
- Throws:
ParserException - If an encoding change occurs or there is some other problem.
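The chaining described above can be sketched without the library. This is a minimal illustration, not FilterBean's actual implementation: plain strings stand in for nodes, java.util.function.Predicate stands in for NodeFilter, and the names FilterChainSketch and applyFilters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterChainSketch {

    // Apply each filter to the output of the previous one.
    // With no filters the parsed input is returned unchanged ("the entire page").
    static List<String> applyFilters(List<String> parsed, List<Predicate<String>> filters) {
        List<String> current = new ArrayList<>(parsed);
        for (Predicate<String> filter : filters) {
            List<String> next = new ArrayList<>();
            for (String node : current) {
                if (filter.test(node)) {
                    next.add(node);
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed stand-in for parser output; not real page content.
        List<String> page = Arrays.asList("META name", "META http-equiv", "TITLE", "BODY");

        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(node -> node.startsWith("META")); // first filter sees the parser output
        filters.add(node -> node.contains("name"));   // second filter sees the first filter's output

        System.out.println(applyFilters(page, filters));           // prints [META name]
        System.out.println(applyFilters(page, new ArrayList<>())); // prints the whole list
    }
}
```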
-
setNodes
protected void setNodes()
Fetch the URL contents and filter it. Work is done only if there is a valid parser with its URL set.
-
addPropertyChangeListener
Add a PropertyChangeListener to the listener list. The listener is registered for all properties.
- Parameters:
listener - The PropertyChangeListener to be added.
-
removePropertyChangeListener
Remove a PropertyChangeListener from the listener list. This removes a registered PropertyChangeListener.
- Parameters:
listener - The PropertyChangeListener to be removed.
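The listener registration pattern can be sketched with the JDK's java.beans.PropertyChangeSupport, which the mPropertySupport field suggests the bean delegates to. The class BoundPropertySketch and the property name "URL" are assumptions for illustration; the actual values of the PROP_* constants are not shown on this page.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

final class BoundPropertySketch {

    // Hypothetical property name, chosen only for this sketch.
    static final String PROP_URL = "URL";

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String url;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener); // registered for all properties
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }

    void setURL(String newUrl) {
        String old = url;
        url = newUrl;
        // PropertyChangeSupport delivers no event when old and new are equal and non-null.
        support.firePropertyChange(PROP_URL, old, newUrl);
    }

    public static void main(String[] args) {
        BoundPropertySketch bean = new BoundPropertySketch();
        List<String> seen = new ArrayList<>();
        PropertyChangeListener listener =
                event -> seen.add(event.getPropertyName() + "=" + event.getNewValue());

        bean.addPropertyChangeListener(listener);
        bean.setURL("http://cbc.ca");       // listener is notified
        bean.removePropertyChangeListener(listener);
        bean.setURL("http://example.com");  // listener is no longer registered

        System.out.println(seen);           // prints [URL=http://cbc.ca]
    }
}
```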
-
getNodes
Return the nodes of the URL matching the filter. This is the primary output of the bean.
- Returns:
- The nodes from the URL matching the current filter.
-
getURL
Get the current URL.
- Returns:
- The URL from which text has been extracted, or null if this property has not been set yet.
-
setURL
Set the URL to extract strings from. The text from the URL will be fetched, which may be expensive, so this property should be set last.
- Parameters:
url - The URL that text should be fetched from.
-
getConnection
Get the current connection.
- Returns:
- The connection that the parser has, or null if it hasn't been set or the parser hasn't been constructed yet.
-
setConnection
Set the parser's connection. The text from the URL will be fetched, which may be expensive, so this property should be set last.
- Parameters:
connection - New value of property Connection.
-
getFilters
Get the current filter set.
- Returns:
- The current filters.
-
setFilters
Set the filters for the bean. If the parser has been set, it is reset and the nodes are refetched with the new filters.
- Parameters:
filters - The filter set to use.
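The advice to set the URL last follows from the refetch behaviour described for setFilters. The sketch below models only that ordering effect with a fetch counter. SetterOrderSketch is a hypothetical stand-in, not the bean's real code, and "URL is non-null" is an assumed simplification of "the parser has been set".

```java
final class SetterOrderSketch {

    private String url;                      // stands in for the parser having a URL
    private String[] filters = new String[0];
    private int fetchCount;                  // how many (simulated) fetches were triggered

    void setFilters(String[] newFilters) {
        filters = newFilters;
        if (url != null) {
            fetch();                         // source already set: refetch with the new filters
        }
    }

    void setURL(String newUrl) {
        url = newUrl;
        fetch();                             // setting the URL always triggers a fetch
    }

    private void fetch() {
        fetchCount++;                        // a real bean would hit the network here
    }

    int getFetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        SetterOrderSketch filtersFirst = new SetterOrderSketch();
        filtersFirst.setFilters(new String[] { "META" });
        filtersFirst.setURL("http://cbc.ca");
        System.out.println(filtersFirst.getFetchCount()); // prints 1

        SetterOrderSketch urlFirst = new SetterOrderSketch();
        urlFirst.setURL("http://cbc.ca");
        urlFirst.setFilters(new String[] { "META" });
        System.out.println(urlFirst.getFetchCount());     // prints 2
    }
}
```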
-
getParser
Get the parser used to fetch nodes.
- Returns:
- The parser used by the bean.
-
setParser
Set the parser for the bean. The parser is used immediately to fetch the nodes, which for a null filter means all the nodes.
- Parameters:
parser - The parser to use.
-
getText
Convenience method to apply a StringBean to the filter results. This may yield duplicate or multiple text elements if the node list contains nodes from two or more levels in the same nested tag hierarchy, but if the node list contains only one tag, it provides access to the text within the node.
- Returns:
- The textual contents of the nodes that pass through the filter set, as collected by the StringBean.
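The duplication caveat can be illustrated with a small stand-in tree. The sketch assumes that the text of a node includes the text of its descendants; the Node class and textOf method are hypothetical and do not reproduce StringBean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NestedTextSketch {

    // Hypothetical stand-in for a parsed tag: its own text plus child tags.
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Collect the text of a node and all of its descendants, depth first.
    private static void collect(Node node, List<String> out) {
        out.add(node.text);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    // Text of every node in the list, in order.
    static List<String> textOf(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node node : nodes) {
            collect(node, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Node inner = new Node("inner");
        Node outer = new Node("outer", inner);

        // One tag in the list: each piece of text appears once.
        System.out.println(textOf(Arrays.asList(outer)));        // prints [outer, inner]

        // Two levels of the same hierarchy in the list: the nested text is repeated.
        System.out.println(textOf(Arrays.asList(outer, inner))); // prints [outer, inner, inner]
    }
}
```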
-
getRecursive
public boolean getRecursive()
Get the current recursion behaviour.
- Returns:
- The recursion behaviour currently in use (whether filters also apply to children, children's children, etc.).
-
setRecursive
public void setRecursive(boolean recursive)
Set the recursion behaviour.
- Parameters:
recursive - If true, the extractAllNodesThatMatch() call is performed recursively.
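The effect of the recursive flag can be sketched on a small tag tree. This is an illustration under assumptions, not the library's extractAllNodesThatMatch(): the Tag class and extractMatching method are hypothetical, and matching is by tag name only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecursiveMatchSketch {

    // Hypothetical stand-in for a tag node with nested children.
    static final class Tag {
        final String name;
        final List<Tag> children;

        Tag(String name, Tag... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Return the tags whose name matches. When recursive is false only the
    // given list is examined; when true, children are searched as well.
    static List<Tag> extractMatching(List<Tag> tags, String name, boolean recursive) {
        List<Tag> matches = new ArrayList<>();
        for (Tag tag : tags) {
            if (tag.name.equals(name)) {
                matches.add(tag);
            }
            if (recursive) {
                matches.addAll(extractMatching(tag.children, name, true));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Tag html = new Tag("HTML",
                new Tag("HEAD", new Tag("META"), new Tag("META")),
                new Tag("BODY"));
        List<Tag> page = Arrays.asList(html);

        System.out.println(extractMatching(page, "META", false).size()); // prints 0
        System.out.println(extractMatching(page, "META", true).size());  // prints 2
    }
}
```

With the flag off, a filter such as the META example only matches tags at the top level of the list, so nested tags are missed.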
-
main
Unit test.
- Parameters:
args - Pass args[0] as the URL to process, and optionally a node name for filtering.
-