Class MatchStarTables
- java.lang.Object
-
- uk.ac.starlink.table.join.MatchStarTables
-
public class MatchStarTables extends java.lang.Object
Provides factory methods for producing tables which represent the result of row matching.
- Author:
- Mark Taylor (Starlink)
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static ValueInfo GRP_ID_INFO: Defines the characteristics of a table column which represents the ID of a group of matched row objects.
- static ValueInfo GRP_SIZE_INFO: Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
-
Constructor Summary
Constructors (Constructor, Description):
- MatchStarTables()
-
Method Summary
Methods (Modifier and Type, Method, Description):
- static java.util.Map<RowLink,LinkGroup> findGroups(LinkSet links)
- static StarTable makeInternalMatchTable(int iTable, LinkSet rowLinks, long rowCount): Analyses a set of RowLinks to mark as linked rows of a given table.
- static StarTable makeJoinTable(StarTable[] tables, LinkSet rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo): Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches.
- static StarTable makeJoinTable(StarTable table1, StarTable table2, LinkSet pairs, JoinType joinType, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
- static StarTable makeParallelMatchTable(StarTable table, int iTable, LinkSet links, int width, int minSize, int maxSize, JoinFixAction[] fixActs): Constructs a new wide table from a single given base table and a set of RowLinks.
- static StarTable makeSequentialJoinTable(StarTable[] tables, LinkSet rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo): Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a LinkSet.
-
-
-
Field Detail
-
GRP_ID_INFO
public static final ValueInfo GRP_ID_INFO
Defines the characteristics of a table column which represents the ID of a group of matched row objects.
-
GRP_SIZE_INFO
public static final ValueInfo GRP_SIZE_INFO
Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
-
-
Method Detail
-
makeJoinTable
public static StarTable makeJoinTable(StarTable table1, StarTable table2, LinkSet pairs, JoinType joinType, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a table made out of two constituent tables joined together according to a LinkSet describing row matches and a flag determining what conditions on a RowLink give you an output row. The columns of the resulting table are made by appending the columns of the constituent tables side by side.
The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.
The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from any RowLink2s in pairs. The content class of matchScoreInfo should be Number or one of its subclasses.
This is a convenience method which calls the other makeJoinTable method.
- Parameters:
table1 - first input table
table2 - second input table
pairs - set of links each representing a matched pair of rows between table1 and table2. Contents of this set may be modified by this routine
joinType - describes how the input list of matched pairs is used to generate an output sequence of rows
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (array of the same length as tables)
matchScoreInfo - may supply information about the meaning of the match scores
- Returns:
- table representing the join
-
makeJoinTable
public static StarTable makeJoinTable(StarTable[] tables, LinkSet rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in the set rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.
The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.
The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.
- Parameters:
tables - array of constituent tables
rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (array of the same length as tables)
matchScoreInfo - may supply information about the meaning of the link scores
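The side-by-side layout described above can be illustrated with a minimal, library-free sketch. It is not the STIL implementation: the class JoinSketch, the method join, and the modelling of a row reference as an int pair {tableIndex, rowIndex} are all hypothetical stand-ins. Where a link holds several rows from one table, the sketch simply keeps the first one it meets, which is one possible reading of "an arbitrary one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these names belong to STIL. */
final class JoinSketch {

    /**
     * Builds one output row per link by placing the cells of each
     * constituent table side by side.
     * tables[t][r] holds the cells of row r of table t; a null tables[t]
     * contributes no columns. Each link is an array of {tableIndex, rowIndex}.
     */
    static List<String[]> join(String[][][] tables, int[] columnCounts,
                               List<int[][]> links) {
        // Work out where each table's columns start in the output row.
        int[] offsets = new int[tables.length];
        int width = 0;
        for (int t = 0; t < tables.length; t++) {
            offsets[t] = width;
            if (tables[t] != null) {
                width += columnCounts[t];
            }
        }
        List<String[]> out = new ArrayList<>();
        for (int[][] link : links) {
            String[] row = new String[width];          // unfilled cells stay null
            boolean[] filled = new boolean[tables.length];
            for (int[] ref : link) {
                int t = ref[0];
                if (tables[t] == null || filled[t]) {
                    continue;                          // omitted table, or already filled
                }
                System.arraycopy(tables[t][ref[1]], 0, row, offsets[t], columnCounts[t]);
                filled[t] = true;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] stars = { {"s0", "10.1"}, {"s1", "10.7"} };
        String[][] galaxies = { {"g0"}, {"g1"} };
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 1}, {1, 0} });     // row 1 of table 0 with row 0 of table 1
        links.add(new int[][] { {0, 0} });             // unmatched row 0 of table 0
        for (String[] row : join(new String[][][] { stars, galaxies }, new int[] {2, 1}, links)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Passing null in place of one of the tables drops that table's columns from every output row while the links themselves stay unchanged.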
-
makeSequentialJoinTable
public static StarTable makeSequentialJoinTable(StarTable[] tables, LinkSet rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a LinkSet. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links in the LinkSet. In practice, this is only likely to be the case if all the input tables are random access except for (at most) one, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
- Parameters:
tables - array of constituent tables
rowLinks - link set defining the match
fixActs - actions to take for deduplicating column names (array of the same size as tables)
matchScoreInfo - may supply information about the meaning of the match scores, if present
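The ordering requirement above can be checked ahead of time with a sketch like the following. It is an illustration under stated assumptions, not part of STIL: SequentialOrderSketch and neverMovesBackwards are hypothetical names, and row references are modelled as int pairs {tableIndex, rowIndex}. The text does not say whether a repeated row index is acceptable, so the sketch assumes it is and only rejects indices that go backwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names belong to STIL. */
final class SequentialOrderSketch {

    /**
     * Returns true if, walking the links in order, the row indices that
     * refer to table iTable never decrease. Each link is an array of
     * {tableIndex, rowIndex} pairs.
     */
    static boolean neverMovesBackwards(int iTable, List<int[][]> links) {
        int last = -1;
        for (int[][] link : links) {
            for (int[] ref : link) {
                if (ref[0] != iTable) {
                    continue;                 // reference to some other table
                }
                if (ref[1] < last) {
                    return false;             // a sequential reader could not go back
                }
                last = ref[1];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 0}, {1, 4} });
        links.add(new int[][] { {0, 2}, {1, 1} });
        System.out.println("table 0 ordered: " + neverMovesBackwards(0, links));
        System.out.println("table 1 ordered: " + neverMovesBackwards(1, links));
    }
}
```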
-
makeInternalMatchTable
public static StarTable makeInternalMatchTable(int iTable, LinkSet rowLinks, long rowCount)
Analyses a set of RowLinks to mark as linked rows of a given table. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table referenced in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns. Any rows linked in rowLinks which do not refer to table have null entries in these columns.
- Parameters:
iTable - the index of the table in which internal matches are to be sought
rowLinks - a collection of RowLink objects linking groups of rows together
rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
- Returns:
- a new two-column table describing internal row matches, with rows in one-to-one correspondence with those of the table
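A minimal sketch of how the two columns described above might be filled in follows. It is not the STIL implementation, and several details are assumptions: InternalMatchSketch and groupColumns are hypothetical names, group IDs are numbered from 1, a link with fewer than two rows in the table is not treated as a group, and when a row appears in several links the last one processed wins. Row references are modelled as int pairs {tableIndex, rowIndex}, and the row count is an int rather than a long to keep the arrays simple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these names belong to STIL. */
final class InternalMatchSketch {

    /**
     * Returns rowCount rows of {groupId, groupSize}. Both cells are null
     * for rows that are not part of any recorded group.
     */
    static Integer[][] groupColumns(int iTable, List<int[][]> links, int rowCount) {
        Integer[][] cols = new Integer[rowCount][2];
        int groupId = 0;
        for (int[][] link : links) {
            // Count how many references in this link point at table iTable.
            int size = 0;
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    size++;
                }
            }
            if (size < 2) {
                continue;                       // assumption: not an internal match
            }
            groupId++;
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    cols[ref[1]][0] = groupId;  // later links overwrite earlier ones
                    cols[ref[1]][1] = size;
                }
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 0}, {0, 2} });
        links.add(new int[][] { {0, 3}, {0, 4}, {0, 1} });
        System.out.println(Arrays.deepToString(groupColumns(0, links, 6)));
    }
}
```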
-
makeParallelMatchTable
public static StarTable makeParallelMatchTable(StarTable table, int iTable, LinkSet links, int width, int minSize, int maxSize, JoinFixAction[] fixActs)
Constructs a new wide table from a single given base table and a set of RowLinks. The resulting table consists of a number of sections of the original table placed side by side, so it has width times the number of columns that table does. Each row is constructed from one or more rows of the original table; each output row corresponds to a single RowLink. Only row links which have at least minSize entries and no more than maxSize entries are converted into output rows; if there are more entries than the width of the table the extras are just discarded. Any row references in a RowLink not corresponding to table index iTable are ignored.
- Parameters:
table - input table
iTable - index corresponding to this table in the links set
links - collection of RowLink objects describing the matches. This collection is modified on exit
width - width of the output table as a multiple of the width of the input table
minSize - minimum number of entries in a RowLink to count as an output row
maxSize - maximum number of entries in a RowLink to count as an output row; also the width of the output table (as a multiple of the width of the input table)
fixActs - actions to take for deduplicating column names (width-element array, or null)
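The way width, minSize and maxSize interact can be sketched without the library as below. This is an illustration, not the STIL implementation: ParallelMatchSketch and widen are hypothetical names, row references are modelled as int pairs {tableIndex, rowIndex}, and the sketch assumes that the size test counts only the references that point at iTable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these names belong to STIL. */
final class ParallelMatchSketch {

    /**
     * Builds one wide row per qualifying link. Each wide row has
     * width * nCol cells: up to width rows of the base table laid
     * side by side, with unused sections left null.
     */
    static List<String[]> widen(String[][] table, int nCol, int iTable,
                                List<int[][]> links, int width,
                                int minSize, int maxSize) {
        List<String[]> out = new ArrayList<>();
        for (int[][] link : links) {
            // Keep only the references that point at the base table.
            List<Integer> rows = new ArrayList<>();
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    rows.add(ref[1]);
                }
            }
            if (rows.size() < minSize || rows.size() > maxSize) {
                continue;                              // link does not become a row
            }
            String[] wide = new String[width * nCol];
            int used = Math.min(rows.size(), width);   // extras are discarded
            for (int k = 0; k < used; k++) {
                System.arraycopy(table[rows.get(k)], 0, wide, k * nCol, nCol);
            }
            out.add(wide);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] table = { {"r0", "a"}, {"r1", "b"}, {"r2", "c"}, {"r3", "d"} };
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 0}, {0, 1} });
        links.add(new int[][] { {0, 1}, {0, 2}, {0, 3} });
        for (String[] row : widen(table, 2, 0, links, 2, 2, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```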
-
findGroups
public static java.util.Map<RowLink,LinkGroup> findGroups(LinkSet links)
Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input LinkSet. A related group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is one in which one or more of its RowRefs is contained in more than one RowLink in the original LinkSet.
The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.
- Parameters:
links - link set representing a set of matches
- Returns:
- RowLink -> LinkGroup mapping describing connected groups in links
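The connected-group idea described above can be sketched with a small union-find over link indices. This is one way to compute such groups, not necessarily how STIL does it: FindGroupsSketch is a hypothetical class, links are identified by their position in the input list, a group is represented as the list of its member link indices rather than a LinkGroup, and row references are modelled as int pairs {tableIndex, rowIndex}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; none of these names belong to STIL. */
final class FindGroupsSketch {

    /**
     * Maps each link index to the list of link indices in its connected
     * group. Links that share no row reference with any other link
     * (trivial groups) get no entry.
     */
    static Map<Integer, List<Integer>> findGroups(List<int[][]> links) {
        int n = links.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // Two links that contain the same row reference are connected.
        Map<String, Integer> firstLinkForRef = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (int[] ref : links.get(i)) {
                String key = ref[0] + ":" + ref[1];
                Integer previous = firstLinkForRef.putIfAbsent(key, i);
                if (previous != null) {
                    union(parent, previous, i);
                }
            }
        }
        // Collect the members of each connected component.
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        // Report only groups containing more than one link.
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (List<Integer> group : byRoot.values()) {
            if (group.size() > 1) {
                for (int i : group) {
                    result.put(i, group);
                }
            }
        }
        return result;
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];   // path halving
            i = parent[i];
        }
        return i;
    }

    private static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    public static void main(String[] args) {
        List<int[][]> links = new ArrayList<>();
        links.add(new int[][] { {0, 0}, {1, 0} });
        links.add(new int[][] { {0, 0}, {1, 1} });   // shares row 0 of table 0 with link 0
        links.add(new int[][] { {0, 5}, {1, 5} });   // unambiguous, so not reported
        System.out.println(findGroups(links));
    }
}
```

Every link in a reported group maps to the same list instance, mirroring the way several RowLinks would map to one shared group object.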
-
-