Class SequenceDatabase
- All Implemented Interfaces:
Serializable
Sequence database for the PrefixSpan algorithm.
An instance of this class is the input to PrefixSpan, which performs sequential pattern mining. The output of PrefixSpan.getFrequentSequences(double) is also an instance of SequenceDatabase; it contains the frequent subsequences.
Constructor Summary
Constructors:
- SequenceDatabase()
Method Summary
- void createFromTransactions(double[][] trxDatabase, int[] columnIndex): Creates a sequence database from a transaction database.
- int getDataLength(): Returns the length of the sequence data.
- int getNumberOfCustomers(): Returns the number of customers.
- int getNumberOfItems(): Returns the number of items.
- int getNumberOfOriginalSequences(): Returns the number of original sequences for this SequenceDatabase.
- int getNumberOfSequences(): Returns the number of sequences contained in the sequence data.
- int[] getSequenceData(): Returns the sequence data.
- int[] getSupportVector(): Returns the support vector for the sequences in this SequenceDatabase.
- void print(): Prints the sequence database to the standard output.
- void setNumberOfCustomers(int numCustIds): Sets the number of customers represented in this SequenceDatabase.
- void setNumberOfItems(int numberOfItems): Sets the number of items expected to occur in the sequence data.
- void setNumberOfOriginalSequences(int numOrigSequences): Sets the number of original sequences.
- void setSequenceData(int[] seqData): Sets the sequence data.
- void setSupportVector(int[] support): Sets the support vector for the sequences in this SequenceDatabase.
- void write(String fileName): Serializes the sequence database to a file.
-
Constructor Details
-
SequenceDatabase
public SequenceDatabase()
-
-
Method Details
-
createFromTransactions
public void createFromTransactions(double[][] trxDatabase, int[] columnIndex)
Creates a sequence database from a transaction database.
- Parameters:
trxDatabase - a two-dimensional double array containing transactions. trxDatabase must contain unique customer ids in column columnIndex[0], unique transaction ids in column columnIndex[1], and unique product (item) ids in column columnIndex[2]. Customer, transaction, and item ids must be non-negative integers (≥ 0).
columnIndex - an int array of length 3 containing the column indices of the customer ids, transaction ids, and item ids, in that order
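This page does not describe how createFromTransactions groups or orders the rows. The sketch below is a hypothetical illustration, not the IMSL implementation, of one plausible mapping into the sequence-data layout documented under setSequenceData. It assumes that transaction ids reflect time order within each customer, groups rows by customer id, sorts and de-duplicates the items of each transaction, writes -1 after every transaction, and writes -2 after every customer's sequence. The names TransactionEncoder and encode are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical encoder (not the IMSL code): transaction rows -> flat -1/-2 sequence data.
class TransactionEncoder {

    // columnIndex[0] = customer id column, [1] = transaction id column, [2] = item id column.
    // Assumption: ascending transaction id == chronological order within a customer.
    static int[] encode(double[][] trx, int[] columnIndex) {
        // customer id -> (transaction id -> sorted, distinct item ids)
        Map<Integer, TreeMap<Integer, TreeSet<Integer>>> byCustomer = new TreeMap<>();
        for (double[] row : trx) {
            int cust = (int) row[columnIndex[0]];
            int tid = (int) row[columnIndex[1]];
            int item = (int) row[columnIndex[2]];
            byCustomer
                .computeIfAbsent(cust, k -> new TreeMap<>())
                .computeIfAbsent(tid, k -> new TreeSet<>())
                .add(item);
        }
        List<Integer> out = new ArrayList<>();
        for (TreeMap<Integer, TreeSet<Integer>> transactions : byCustomer.values()) {
            for (TreeSet<Integer> items : transactions.values()) {
                out.addAll(items); // ascending order, each item at most once
                out.add(-1);       // end of transaction
            }
            out.add(-2);           // end of this customer's sequence
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Columns: customer id, transaction id, item id
        double[][] trx = {
            {0, 10, 3}, {0, 10, 2}, {0, 11, 1},
            {1, 20, 1}, {1, 21, 2},
        };
        System.out.println(Arrays.toString(encode(trx, new int[] {0, 1, 2})));
    }
}
```

With the sample rows in main, the printed array is [2, 3, -1, 1, -1, -2, 1, -1, 2, -1, -2]: one sequence per customer, in line with the statement that the number of customers equals the number of original sequences.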
-
print
public void print()
Prints the sequence database to the standard output.
-
write
public void write(String fileName) throws IOException
Serializes the sequence database to a file.
- Parameters:
fileName - a String, the name of the file
- Throws:
IOException
-
getNumberOfOriginalSequences
public int getNumberOfOriginalSequences()
Returns the number of original sequences for this SequenceDatabase.
- Returns:
an int, the number of original sequences
-
getNumberOfItems
public int getNumberOfItems()
Returns the number of items.
- Returns:
an int, the number of items in the sequence data
-
setNumberOfItems
public void setNumberOfItems(int numberOfItems)
Sets the number of items expected to occur in the sequence data. Typically this value is set within the PrefixSpan algorithm.
- Parameters:
numberOfItems - an int, the number of items expected in the sequence data
-
getNumberOfCustomers
public int getNumberOfCustomers()
Returns the number of customers. The number of customers is equal to the number of original sequences.
- Returns:
an int, the number of customers
-
setNumberOfCustomers
public void setNumberOfCustomers(int numCustIds)
Sets the number of customers represented in this SequenceDatabase. The number of customers is equal to the number of original sequences.
- Parameters:
numCustIds - an int, the number of customers
-
getNumberOfSequences
public int getNumberOfSequences()
Returns the number of sequences contained in the sequence data.
- Returns:
an int, the number of sequences
-
getSupportVector
public int[] getSupportVector()
Returns the support vector for the sequences in this SequenceDatabase.
- Returns:
an int array containing the support (count of occurrences) for each sequence
-
setNumberOfOriginalSequences
public void setNumberOfOriginalSequences(int numOrigSequences)
Sets the number of original sequences, that is, the number of sequences within which frequent sequences are sought. It is equivalent to the number of customers.
- Parameters:
numOrigSequences - an int, the number of original sequences; it should be positive.
-
setSupportVector
public void setSupportVector(int[] support)
Sets the support vector for the sequences in this SequenceDatabase. Typically the support vector is populated by the method PrefixSpan.getFrequentSequences(double), but it may need to be initialized outside the sequence-mining operation.
- Parameters:
support - an int array containing the support (count of occurrences) for each sequence
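The support vector holds raw occurrence counts, while the number of original sequences is the population those counts are taken from. A common convention in sequential pattern mining is relative support = count / number of original sequences. The hypothetical helper below (SupportFilter and frequentIndices are made-up names) applies that convention to keep sequences at or above a minimum relative support. Whether PrefixSpan.getFrequentSequences(double) interprets its argument this way is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: relative support = support count / number of original sequences.
class SupportFilter {

    // Returns the indices i for which support[i] / numOrigSequences >= minSupport.
    static List<Integer> frequentIndices(int[] support, int numOrigSequences, double minSupport) {
        if (numOrigSequences <= 0) {
            // Mirrors the documented expectation that the number of original sequences is positive
            throw new IllegalArgumentException("numOrigSequences must be positive");
        }
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < support.length; i++) {
            double relative = (double) support[i] / numOrigSequences;
            if (relative >= minSupport) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        int[] support = {4, 1, 3, 2}; // counts for four candidate sequences
        // Relative supports over 5 original sequences: 0.8, 0.2, 0.6, 0.4
        System.out.println(frequentIndices(support, 5, 0.5));
    }
}
```

With a threshold of 0.5, main prints [0, 2], the indices of the two sequences that occur in at least half of the five original sequences.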
-
getDataLength
public int getDataLength()
Returns the length of the sequence data.
- Returns:
an int, the length of the sequence data
-
getSequenceData
public int[] getSequenceData()
Returns the sequence data.
- Returns:
an int array containing the sequence data
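The array returned by getSequenceData is flat, following the layout documented under setSequenceData (item ids ≥ 0, -1 ending each transaction, -2 ending each sequence). The hypothetical decoder below (SequenceDecoder and decode are made-up names, not part of the library) turns such an array into nested lists so that sequences and transactions can be inspected directly. It assumes well-formed input, where every transaction is closed by -1 before the -2 that ends its sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the flat sequence-data layout (assumes well-formed input).
class SequenceDecoder {

    // Returns one list of transactions per sequence; each transaction is a list of item ids.
    static List<List<List<Integer>>> decode(int[] data) {
        List<List<List<Integer>>> sequences = new ArrayList<>();
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> transaction = new ArrayList<>();
        for (int v : data) {
            if (v == -1) {          // -1 closes the current transaction
                sequence.add(transaction);
                transaction = new ArrayList<>();
            } else if (v == -2) {   // -2 closes the current sequence
                sequences.add(sequence);
                sequence = new ArrayList<>();
            } else {
                transaction.add(v); // a non-negative item id
            }
        }
        return sequences; // any trailing, unterminated fragment is ignored
    }

    public static void main(String[] args) {
        // The two complete sequences from the setSequenceData example
        int[] data = {2, 3, -1, 1, -1, 1, -1, 2, -1, 1, 2, 3, -1, -2,
                      1, -1, 2, -1, 3, -1, 2, -1, -2};
        List<List<List<Integer>>> seqs = decode(data);
        System.out.println(seqs.size());        // number of sequences
        System.out.println(seqs.get(0).size()); // transactions in the first sequence
        System.out.println(seqs.get(1));
    }
}
```

For this data, main prints 2, then 5, then [[1], [2], [3], [2]], matching the description of two sequences with 5 and 4 transactions.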
-
setSequenceData
public void setSequenceData(int[] seqData)
Sets the sequence data. The sequence data is an integer array of non-negative item ids in which -1 marks the end of a transaction and -2 marks the end of a sequence. Within a transaction (between -1’s), an item id can occur at most once, and the item ids must be in ascending numerical order. For example, the following segment contains two sequences involving item ids 1, 2, and 3; the first has 5 transactions and the second has 4:
{2,3,-1,1,-1,1,-1,2,-1,1,2,3,-1,-2,1,-1,2,-1,3,-1,2,-1,-2,…}
This method checks the data for this format and throws an exception if the format is incorrect.
- Parameters:
seqData - an int array containing the sequence data
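The format rules for setSequenceData can be checked in a single pass over the array. The sketch below is a hypothetical validator, not the library's own check: SequenceFormatCheck and validate are made-up names, it throws IllegalArgumentException because the exception type the library uses is not stated here, and two of its rules are assumptions drawn from the example rather than from the text (every -2 must be preceded by a -1 closing the last transaction, and the array must end with -2).

```java
// Hypothetical one-pass validator for the -1/-2 sequence-data format.
class SequenceFormatCheck {

    static void validate(int[] data) {
        // Last item id in the currently open transaction; -1 means no open items.
        int prevItem = -1;
        for (int i = 0; i < data.length; i++) {
            int v = data[i];
            if (v >= 0) {
                // Strictly ascending within a transaction also rules out repeated items
                if (v <= prevItem) {
                    throw new IllegalArgumentException(
                        "item " + v + " at index " + i + " is out of order or repeated");
                }
                prevItem = v;
            } else if (v == -1) {
                prevItem = -1; // transaction closed
            } else if (v == -2) {
                // Assumption: the last transaction is closed by -1 before -2
                if (prevItem >= 0) {
                    throw new IllegalArgumentException("missing -1 before -2 at index " + i);
                }
            } else {
                throw new IllegalArgumentException("invalid value " + v + " at index " + i);
            }
        }
        // Assumption: non-empty data ends with the -2 of its last sequence
        if (data.length > 0 && data[data.length - 1] != -2) {
            throw new IllegalArgumentException("sequence data must end with -2");
        }
    }

    public static void main(String[] args) {
        validate(new int[] {2, 3, -1, 1, -1, -2}); // well-formed: returns silently
        try {
            validate(new int[] {3, 2, -1, -2});    // items within a transaction not ascending
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Tracking only the previous item id is enough here because item ids are non-negative, so resetting it to -1 at each transaction boundary lets any first item pass the ordering test.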
-