Class URI
- java.lang.Object
  - com.jfrog.bintray.client.impl.util.URI
public class URI extends Object
A representation of URIs (Uniform Resource Identifiers) as defined by RFC 2396. This class supports parsing a URI reference so that it can be extended for specific protocols, taking into account the character encoding of the protocol being transported and the charset of the document.
A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.
Implementers should be careful not to escape or unescape the same string more than once. Unescaping an already unescaped string might lead to misinterpreting a percent data character as the start of another escaped character; conversely, escaping an already escaped string escapes the percent characters of the existing escapes a second time.
To avoid these problems, the following data types are used:
- URI character sequence: char
- octet sequence: byte
- original character sequence: String
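The double-escaping hazard described above can be reproduced with a short sketch. It uses the JDK's java.net.URLEncoder and java.net.URLDecoder as stand-ins rather than this class; they implement form encoding (a space becomes "+"), which differs from RFC 2396 escaping but treats "%" the same way. The class name DoubleEscapeDemo and its helpers are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it relies on JDK helpers, not on this URI class.
class DoubleEscapeDemo {

    // Escapes a string once, using UTF-8 for the character-to-octet step.
    static String escape(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Unescapes a string once, using UTF-8 for the octet-to-character step.
    static String unescape(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String original = "%41";           // three literal data characters
        String escaped = escape(original); // the percent sign becomes %25

        System.out.println(escaped);                     // escaped once
        System.out.println(unescape(escaped));           // back to the original
        System.out.println(unescape(unescape(escaped))); // unescaped twice: data is misread
        System.out.println(escape(escaped));             // escaped twice: percent escaped again
    }
}
```

Unescaping twice turns the literal data "%41" into "A", and escaping twice produces "%252541", which no longer unescapes to the original in one step.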
So, a URI is a sequence of characters held as an array of char, which is not always represented as a sequence of octets held as an array of byte.
URI Syntactic Components
- In general, written as follows:
  Absolute URI = <scheme>:<scheme-specific-part>
  Generic URI = <scheme>://<authority><path>?<query>
- Syntax
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  hier_part = ( net_path | abs_path ) [ "?" query ]
  net_path = "//" authority [ abs_path ]
  abs_path = "/" path_segments
The following examples illustrate URIs that are in common use.
ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
http://www.math.uio.no/faq/compression-faq/part1.html -- http scheme for Hypertext Transfer Protocol services
mailto:mduerst@ifi.unizh.ch -- mailto scheme for electronic mail addresses
news:comp.infosystems.www.servers.unix -- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol
Please note that there are many modifications from URL (RFC 1738) and relative URL (RFC 1808).
The expressions for a URI
For escaped URI forms
- URI(char[]) // constructor
- char[] getRawXxx() // method
- String getEscapedXxx() // method
- String toString() // method
For unescaped URI forms
- URI(String) // constructor
- String getXXX() // method
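The difference between escaped and unescaped forms, and between hierarchical and opaque URIs, can be illustrated with the JDK's own java.net.URI, which also follows RFC 2396. This is an analogy only: java.net.URI is a different class from the one documented here, and the class name UriFormsDemo is hypothetical.

```java
// Hypothetical demo class; java.net.URI is the JDK class, not the class documented here.
class UriFormsDemo {

    // A hierarchical URI with an escaped space in its path.
    static final java.net.URI GOPHER = java.net.URI.create(
            "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles");

    // An opaque URI: scheme ":" opaque_part, with no "//" authority.
    static final java.net.URI MAILTO = java.net.URI.create("mailto:mduerst@ifi.unizh.ch");

    public static void main(String[] args) {
        System.out.println(GOPHER.getScheme());  // scheme component
        System.out.println(GOPHER.getHost());    // host inside the authority
        System.out.println(GOPHER.getRawPath()); // escaped form, keeps %20
        System.out.println(GOPHER.getPath());    // unescaped form, %20 becomes a space

        System.out.println(MAILTO.isOpaque());              // true for mailto
        System.out.println(MAILTO.getSchemeSpecificPart()); // everything after "mailto:"
    }
}
```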
-
-
Field Summary
Fields (modifier and type, field, description)
protected static BitSet abs_path - URI absolute path.
protected static BitSet absoluteURI - BitSet for absoluteURI.
static BitSet allowed_abs_path - Those characters that are allowed for the abs_path.
static BitSet allowed_authority - Those characters that are allowed for the authority component.
static BitSet allowed_fragment - Those characters that are allowed for the fragment component.
static BitSet allowed_host - Those characters that are allowed for the host component.
static BitSet allowed_IPv6reference - Those characters that are allowed for the IPv6reference component.
static BitSet allowed_opaque_part - Those characters that are allowed for the opaque_part.
static BitSet allowed_query - Those characters that are allowed for the query component.
static BitSet allowed_reg_name - Those characters that are allowed for the reg_name.
static BitSet allowed_rel_path - Those characters that are allowed for the rel_path.
static BitSet allowed_userinfo - Those characters that are allowed for the userinfo component.
static BitSet allowed_within_authority - Those characters that are allowed for the authority component.
static BitSet allowed_within_path - Those characters that are allowed within the path.
static BitSet allowed_within_query - Those characters that are allowed within the query component.
static BitSet allowed_within_userinfo - Those characters that are allowed within the userinfo component.
protected static BitSet alpha - BitSet for alpha.
protected static BitSet alphanum - BitSet for alphanum (join of alpha & digit).
protected static BitSet authority - BitSet for authority.
static BitSet control - BitSet for control.
static BitSet delims - BitSet for delims.
protected static BitSet digit - BitSet for digit.
static BitSet disallowed_opaque_part - Disallowed opaque_part before escaping.
static BitSet disallowed_rel_path - Disallowed rel_path before escaping.
protected static BitSet escaped - BitSet for escaped.
protected static BitSet fragment - BitSet for fragment (alias for uric).
protected static BitSet hex - BitSet for hex.
protected static BitSet hier_part - BitSet for hier_part.
protected static BitSet host - BitSet for host.
protected static BitSet hostname - BitSet for hostname.
protected static BitSet hostport - BitSet for hostport.
protected static BitSet IPv4address - BitSet that combines digit and dot for IPv4address.
protected static BitSet IPv6address - RFC 2373.
protected static BitSet IPv6reference - RFC 2732, 2373.
protected static BitSet mark - BitSet for mark.
protected static BitSet net_path - BitSet for net_path.
protected static BitSet opaque_part - URI bitset that combines uric_no_slash and uric.
protected static BitSet param - BitSet for param (alias for pchar).
protected static BitSet path - URI bitset that combines absolute path and opaque part.
protected static BitSet path_segments - BitSet for path segments.
protected static BitSet pchar - BitSet for pchar.
protected static BitSet percent - The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
protected static BitSet port - Port, a logical alias for digit.
protected static BitSet query - BitSet for query (alias for uric).
protected static BitSet reg_name - BitSet for reg_name.
protected static BitSet rel_path - BitSet for rel_path.
protected static BitSet rel_segment - BitSet for rel_segment.
protected static BitSet relativeURI - BitSet for relativeURI.
protected static BitSet reserved - BitSet for reserved.
protected static BitSet scheme - BitSet for scheme.
protected static BitSet segment - BitSet for segment.
protected static BitSet server - BitSet for server.
static BitSet space - BitSet for space.
protected static BitSet toplabel - BitSet for toplabel.
protected static BitSet unreserved - Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
static BitSet unwise - BitSet for unwise.
protected static BitSet URI_reference - BitSet for URI-reference.
protected static BitSet uric - BitSet for uric.
protected static BitSet uric_no_slash - URI bitset for encoding typical non-slash characters.
protected static BitSet userinfo - BitSet for userinfo.
static BitSet within_userinfo - BitSet for within the userinfo component like user and password.
-
Constructor Summary
Constructors
URI()
-
Method Summary
Methods (modifier and type, method, description)
protected static String decode(char[] component, String charset) - Decodes URI encoded string.
protected static String decode(String component, String charset) - Decodes URI encoded string.
protected static char[] encode(String original, BitSet allowed, String charset) - Encodes URI string.
-
-
-
Field Detail
-
within_userinfo
public static final BitSet within_userinfo
BitSet for characters within the userinfo component, such as user and password.
-
control
public static final BitSet control
BitSet for control.
-
space
public static final BitSet space
BitSet for space.
-
delims
public static final BitSet delims
BitSet for delims.
-
unwise
public static final BitSet unwise
BitSet for unwise.
-
disallowed_rel_path
public static final BitSet disallowed_rel_path
Disallowed rel_path before escaping.
-
disallowed_opaque_part
public static final BitSet disallowed_opaque_part
Disallowed opaque_part before escaping.
-
allowed_authority
public static final BitSet allowed_authority
Those characters that are allowed for the authority component.
-
allowed_opaque_part
public static final BitSet allowed_opaque_part
Those characters that are allowed for the opaque_part.
-
allowed_reg_name
public static final BitSet allowed_reg_name
Those characters that are allowed for the reg_name.
-
allowed_userinfo
public static final BitSet allowed_userinfo
Those characters that are allowed for the userinfo component.
-
allowed_within_userinfo
public static final BitSet allowed_within_userinfo
Those characters that are allowed within the userinfo component.
-
allowed_IPv6reference
public static final BitSet allowed_IPv6reference
Those characters that are allowed for the IPv6reference component. The characters '[', ']' in IPv6reference should be excluded.
-
allowed_host
public static final BitSet allowed_host
Those characters that are allowed for the host component. The characters '[', ']' in IPv6reference should be excluded.
-
allowed_within_authority
public static final BitSet allowed_within_authority
Those characters that are allowed for the authority component.
-
allowed_abs_path
public static final BitSet allowed_abs_path
Those characters that are allowed for the abs_path.
-
allowed_rel_path
public static final BitSet allowed_rel_path
Those characters that are allowed for the rel_path.
-
allowed_within_path
public static final BitSet allowed_within_path
Those characters that are allowed within the path.
-
allowed_query
public static final BitSet allowed_query
Those characters that are allowed for the query component.
-
allowed_within_query
public static final BitSet allowed_within_query
Those characters that are allowed within the query component.
-
allowed_fragment
public static final BitSet allowed_fragment
Those characters that are allowed for the fragment component.
-
percent
protected static final BitSet percent
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
-
digit
protected static final BitSet digit
BitSet for digit.
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
-
alpha
protected static final BitSet alpha
BitSet for alpha.
alpha = lowalpha | upalpha
-
alphanum
protected static final BitSet alphanum
BitSet for alphanum (join of alpha & digit).
alphanum = alpha | digit
-
hex
protected static final BitSet hex
BitSet for hex.
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
-
escaped
protected static final BitSet escaped
BitSet for escaped.
escaped = "%" hex hex
-
mark
protected static final BitSet mark
BitSet for mark.
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
-
unreserved
protected static final BitSet unreserved
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
unreserved = alphanum | mark
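The character classes defined so far compose by set union, which maps directly onto java.util.BitSet. The following is a minimal sketch of how such sets could be built from the grammar rules above; the class name CharClasses is hypothetical and the code is not taken from this class's source.

```java
import java.util.BitSet;

// Hypothetical sketch: builds digit, alpha, alphanum, mark and unreserved
// from the grammar rules quoted in the field descriptions.
class CharClasses {
    static final BitSet digit = new BitSet(256);
    static final BitSet alpha = new BitSet(256);
    static final BitSet alphanum = new BitSet(256);
    static final BitSet mark = new BitSet(256);
    static final BitSet unreserved = new BitSet(256);

    static {
        // digit = "0" | ... | "9"
        for (int c = '0'; c <= '9'; c++) digit.set(c);

        // alpha = lowalpha | upalpha
        for (int c = 'a'; c <= 'z'; c++) alpha.set(c);
        for (int c = 'A'; c <= 'Z'; c++) alpha.set(c);

        // alphanum = alpha | digit
        alphanum.or(alpha);
        alphanum.or(digit);

        // mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
        for (char c : "-_.!~*'()".toCharArray()) mark.set(c);

        // unreserved = alphanum | mark
        unreserved.or(alphanum);
        unreserved.or(mark);
    }
}
```

Membership is then a single lookup, for example CharClasses.unreserved.get('~').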
-
reserved
protected static final BitSet reserved
BitSet for reserved.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
uric
protected static final BitSet uric
BitSet for uric.
uric = reserved | unreserved | escaped
-
fragment
protected static final BitSet fragment
BitSet for fragment (alias for uric).
fragment = *uric
-
query
protected static final BitSet query
BitSet for query (alias for uric).
query = *uric
-
pchar
protected static final BitSet pchar
BitSet for pchar.
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
param
protected static final BitSet param
BitSet for param (alias for pchar).
param = *pchar
-
segment
protected static final BitSet segment
BitSet for segment.
segment = *pchar *( ";" param )
-
path_segments
protected static final BitSet path_segments
BitSet for path segments.
path_segments = segment *( "/" segment )
-
abs_path
protected static final BitSet abs_path
URI absolute path.
abs_path = "/" path_segments
-
uric_no_slash
protected static final BitSet uric_no_slash
URI bitset for encoding typical non-slash characters.
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
-
opaque_part
protected static final BitSet opaque_part
URI bitset that combines uric_no_slash and uric.
opaque_part = uric_no_slash *uric
-
path
protected static final BitSet path
URI bitset that combines absolute path and opaque part.
path = [ abs_path | opaque_part ]
-
port
protected static final BitSet port
Port, a logical alias for digit.
-
IPv4address
protected static final BitSet IPv4address
BitSet that combines digit and dot for IPv4address.
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
-
IPv6address
protected static final BitSet IPv6address
RFC 2373.
IPv6address = hexpart [ ":" IPv4address ]
-
IPv6reference
protected static final BitSet IPv6reference
RFC 2732, 2373.
IPv6reference = "[" IPv6address "]"
-
toplabel
protected static final BitSet toplabel
BitSet for toplabel.
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
-
hostname
protected static final BitSet hostname
BitSet for hostname.
hostname = *( domainlabel "." ) toplabel [ "." ]
-
host
protected static final BitSet host
BitSet for host.
host = hostname | IPv4address | IPv6reference
-
hostport
protected static final BitSet hostport
BitSet for hostport.
hostport = host [ ":" port ]
-
userinfo
protected static final BitSet userinfo
BitSet for userinfo.
userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
-
server
protected static final BitSet server
BitSet for server.
server = [ [ userinfo "@" ] hostport ]
-
reg_name
protected static final BitSet reg_name
BitSet for reg_name.
reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
-
authority
protected static final BitSet authority
BitSet for authority.
authority = server | reg_name
-
scheme
protected static final BitSet scheme
BitSet for scheme.
scheme = alpha *( alpha | digit | "+" | "-" | "." )
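A BitSet records only which characters may appear, while the grammar rule also fixes their order: a scheme must start with a letter. The sketch below shows one way a caller might check a candidate scheme against the rule using two character sets; the class SchemeCheck and the method isScheme are hypothetical and are not part of this class.

```java
import java.util.BitSet;

// Hypothetical sketch: validates a string against
// scheme = alpha *( alpha | digit | "+" | "-" | "." )
class SchemeCheck {
    static final BitSet ALPHA = new BitSet(256);
    static final BitSet SCHEME = new BitSet(256);

    static {
        for (int c = 'a'; c <= 'z'; c++) ALPHA.set(c);
        for (int c = 'A'; c <= 'Z'; c++) ALPHA.set(c);

        // Characters allowed after the first position.
        SCHEME.or(ALPHA);
        for (int c = '0'; c <= '9'; c++) SCHEME.set(c);
        SCHEME.set('+');
        SCHEME.set('-');
        SCHEME.set('.');
    }

    static boolean isScheme(String s) {
        // The first character must be a letter.
        if (s.isEmpty() || !ALPHA.get(s.charAt(0))) {
            return false;
        }
        // Every remaining character must be in the scheme set.
        for (int i = 1; i < s.length(); i++) {
            if (!SCHEME.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```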
-
rel_segment
protected static final BitSet rel_segment
BitSet for rel_segment.
rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
-
rel_path
protected static final BitSet rel_path
BitSet for rel_path.
rel_path = rel_segment [ abs_path ]
-
net_path
protected static final BitSet net_path
BitSet for net_path.
net_path = "//" authority [ abs_path ]
-
hier_part
protected static final BitSet hier_part
BitSet for hier_part.
hier_part = ( net_path | abs_path ) [ "?" query ]
-
relativeURI
protected static final BitSet relativeURI
BitSet for relativeURI.
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
-
absoluteURI
protected static final BitSet absoluteURI
BitSet for absoluteURI.
absoluteURI = scheme ":" ( hier_part | opaque_part )
-
URI_reference
protected static final BitSet URI_reference
BitSet for URI-reference.
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
-
-
Method Detail
-
encode
protected static char[] encode(String original, BitSet allowed, String charset) throws org.apache.http.HttpException
Encodes URI string. This is a two-step mapping, one from original characters to octets, and subsequently a second from octets to URI characters:
original character sequence -> octet sequence -> URI character sequence
An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.
Conversion from the local filesystem character set to UTF-8 will normally involve a two-step process. First convert the local character set to the UCS; then convert the UCS to UTF-8. The first step can be performed by maintaining a mapping table that includes the local character set code and the corresponding UCS code. The next step is to convert the UCS character code to the UTF-8 encoding.
Mapping between vendor codepages can be done in a very similar manner as described above.
The only time escape encodings can safely be made is when a URI is being created from its component parts.
The escape and validate methods are internally performed within this method.
- Parameters:
original- the original character sequence
allowed- those characters that are allowed within a component
charset- the protocol charset
- Returns:
- URI character sequence
- Throws:
org.apache.http.HttpException- null component or unsupported character encoding
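The two-step mapping described for encode can be sketched as follows. This is an illustration under stated assumptions, not this class's implementation: the class name EncodeSketch is hypothetical, the allowed set is assumed to contain only US-ASCII characters, and unchecked JDK exceptions stand in for org.apache.http.HttpException.

```java
import java.nio.charset.Charset;
import java.util.BitSet;

// Hypothetical sketch of: original characters -> octets -> URI characters.
class EncodeSketch {

    static char[] encode(String original, BitSet allowed, String charset) {
        if (original == null) {
            throw new IllegalArgumentException("null component");
        }
        // Step 1: original character sequence -> octet sequence.
        // Charset.forName throws an unchecked exception for an unsupported charset.
        byte[] octets = original.getBytes(Charset.forName(charset));

        // Step 2: octet sequence -> URI character sequence.
        StringBuilder uri = new StringBuilder(octets.length);
        for (byte b : octets) {
            int octet = b & 0xFF; // treat the byte as an unsigned value
            if (allowed.get(octet)) {
                // Assumption: allowed holds only US-ASCII characters.
                uri.append((char) octet);
            } else {
                // Escape as "%" followed by two hexadecimal digits.
                uri.append('%');
                uri.append(Character.toUpperCase(Character.forDigit(octet >> 4, 16)));
                uri.append(Character.toUpperCase(Character.forDigit(octet & 0xF, 16)));
            }
        }
        return uri.toString().toCharArray();
    }
}
```

With an allowed set containing only letters, "Los Angeles" becomes "Los%20Angeles". The charset matters for non-ASCII input: the same character yields different escapes under UTF-8 and ISO-8859-1.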
-
decode
protected static String decode(char[] component, String charset) throws org.apache.http.HttpException
Decodes URI encoded string. This is a two-step mapping, one from URI characters to octets, and subsequently a second from octets to original characters:
URI character sequence -> octet sequence -> original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Notice that there is a chance that URI characters that are not UTF-8 encoded may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 had a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
The unescape method is internally performed within this method.
- Parameters:
component- the URI character sequence
charset- the protocol charset
- Returns:
- original character sequence
- Throws:
org.apache.http.HttpException- incomplete trailing escape pattern or unsupported character encoding
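The reverse mapping, including the "incomplete trailing escape pattern" failure, can be sketched in the same way. Again this is an assumption-laden illustration rather than this class's code: DecodeSketch is a hypothetical name, unescaped characters are assumed to be US-ASCII, and IllegalArgumentException stands in for org.apache.http.HttpException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical sketch of: URI characters -> octets -> original characters.
class DecodeSketch {

    static String decode(char[] component, String charset) {
        if (component == null) {
            throw new IllegalArgumentException("null component");
        }
        // Step 1: URI character sequence -> octet sequence.
        ByteArrayOutputStream octets = new ByteArrayOutputStream(component.length);
        for (int i = 0; i < component.length; i++) {
            char c = component[i];
            if (c == '%') {
                // A "%" must be followed by exactly two hexadecimal digits.
                if (i + 2 >= component.length) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int hi = Character.digit(component[i + 1], 16);
                int lo = Character.digit(component[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape sequence");
                }
                octets.write((hi << 4) | lo);
                i += 2; // skip the two digits just consumed
            } else {
                // Assumption: unescaped URI characters are US-ASCII.
                octets.write(c);
            }
        }
        // Step 2: octet sequence -> original character sequence.
        return new String(octets.toByteArray(), Charset.forName(charset));
    }
}
```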
-
decode
protected static String decode(String component, String charset) throws org.apache.http.HttpException
Decodes URI encoded string. This is a two-step mapping, one from URI characters to octets, and subsequently a second from octets to original characters:
URI character sequence -> octet sequence -> original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Notice that there is a chance that URI characters that are not UTF-8 encoded may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 had a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
The unescape method is internally performed within this method.
- Parameters:
component- the URI character sequence
charset- the protocol charset
- Returns:
- original character sequence
- Throws:
org.apache.http.HttpException- incomplete trailing escape pattern or unsupported character encoding
- Since:
- 3.0
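The "false reading" caveat above means that octets produced under one charset can happen to form valid UTF-8. A strict decoder can reject octets that are not valid UTF-8, but it cannot detect octets that are valid by coincidence. The sketch below illustrates both cases with JDK classes only; Utf8Check and isValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: strict UTF-8 validation of an unescaped octet sequence.
class Utf8Check {

    static boolean isValidUtf8(byte[] octets) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "é" in ISO-8859-1 is the single octet 0xE9, which is malformed UTF-8.
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(latin1));

        // "Ã©" in ISO-8859-1 is 0xC3 0xA9, which is also valid UTF-8 for "é":
        // the check passes even though the octets were not produced as UTF-8.
        byte[] ambiguous = "\u00c3\u00a9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(ambiguous));
    }
}
```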
-
-