Class SequenceDatabase

java.lang.Object
com.imsl.datamining.SequenceDatabase
All Implemented Interfaces:
Serializable

public class SequenceDatabase extends Object implements Serializable
Defines a sequence database for use with the PrefixSpan algorithm.

An instance of this class is the input to PrefixSpan, which performs sequential pattern mining. The output of PrefixSpan.getFrequentSequences(double) is also an instance of SequenceDatabase, one that contains the frequent subsequences.

  • Constructor Details

    • SequenceDatabase

      public SequenceDatabase()
  • Method Details

    • createFromTransactions

      public void createFromTransactions(double[][] trxDatabase, int[] columnIndex)
      Creates a sequence database from a transaction database.
      Parameters:
      trxDatabase - a double array containing transactions.

      trxDatabase must contain unique customer ids in column columnIndex[0], unique transaction ids in column columnIndex[1], and unique product ids in column columnIndex[2]. Customer, transaction, and product (item) ids must be integers \(\ge 0\).

      columnIndex - an int array of length 3 containing the column indices of the customer ids, transaction ids, and product ids, in that order
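
To illustrate how the three columns of a transaction table relate to the flat sequence encoding described under setSequenceData, here is a minimal sketch in plain Java. It does not call the library. The class TransactionTableSketch and its methods are hypothetical names, and the ordering rules (customers and transactions sorted by id) are assumptions made for illustration, not documented behavior of createFromTransactions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of the library: groups rows of a transaction
// table by customer and transaction, then emits the -1 / -2 delimited encoding.
final class TransactionTableSketch {

    static int[] toSequenceData(double[][] trxDatabase, int[] columnIndex) {
        if (columnIndex.length != 3) {
            throw new IllegalArgumentException("columnIndex must have length 3");
        }
        // customer id -> transaction id -> sorted, de-duplicated item ids.
        // Assumption: customers and transactions are ordered by ascending id.
        TreeMap<Integer, TreeMap<Integer, TreeSet<Integer>>> grouped = new TreeMap<>();
        for (double[] row : trxDatabase) {
            int customer = toId(row[columnIndex[0]]);
            int transaction = toId(row[columnIndex[1]]);
            int item = toId(row[columnIndex[2]]);
            grouped.computeIfAbsent(customer, k -> new TreeMap<>())
                   .computeIfAbsent(transaction, k -> new TreeSet<>())
                   .add(item);
        }
        List<Integer> out = new ArrayList<>();
        for (TreeMap<Integer, TreeSet<Integer>> transactions : grouped.values()) {
            for (TreeSet<Integer> items : transactions.values()) {
                out.addAll(items); // item ids in ascending order
                out.add(-1);       // end of transaction
            }
            out.add(-2);           // end of this customer's sequence
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    // Ids are stored as doubles but must be non-negative integers.
    private static int toId(double value) {
        if (value < 0 || value > Integer.MAX_VALUE || value != Math.rint(value)) {
            throw new IllegalArgumentException("ids must be non-negative integers: " + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        // Columns: customer id, transaction id, item id.
        double[][] trx = {
            {0, 0, 2}, {0, 0, 3}, {0, 1, 1},
            {1, 0, 1}, {1, 1, 2}
        };
        int[] columnIndex = {0, 1, 2};
        System.out.println(Arrays.toString(toSequenceData(trx, columnIndex)));
        // prints [2, 3, -1, 1, -1, -2, 1, -1, 2, -1, -2]
    }
}
```

Each customer becomes one sequence, which is consistent with the statement elsewhere on this page that the number of customers equals the number of original sequences.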
    • print

      public void print()
      Prints the sequence database to the standard output.
    • write

      public void write(String fileName) throws IOException
      Serializes the sequence database to a file.
      Parameters:
      fileName - a String, the name of the file
      Throws:
      IOException
    • getNumberOfOriginalSequences

      public int getNumberOfOriginalSequences()
      Returns the number of original sequences for this SequenceDatabase.
      Returns:
      an int, the number of original sequences
    • getNumberOfItems

      public int getNumberOfItems()
      Returns the number of items.
      Returns:
      an int, the number of items in the sequence data
    • setNumberOfItems

      public void setNumberOfItems(int numberOfItems)
      Sets the number of items expected to occur in the sequence data.

      Typically this value is set within the PrefixSpan algorithm.

      Parameters:
      numberOfItems - an int, the number of items expected in the sequence data
    • getNumberOfCustomers

      public int getNumberOfCustomers()
      Returns the number of customers.

      The number of customers is equal to the number of original sequences.

      Returns:
      an int, the number of customers
    • setNumberOfCustomers

      public void setNumberOfCustomers(int numCustIds)
      Sets the number of customers represented in this SequenceDatabase.

      The number of customers is equal to the number of original sequences.

      Parameters:
      numCustIds - an int, the number of customers
    • getNumberOfSequences

      public int getNumberOfSequences()
      Returns the number of sequences contained in the sequence data.
      Returns:
      an int, the number of sequences.
    • getSupportVector

      public int[] getSupportVector()
      Returns the support vector for the sequences in this SequenceDatabase.
      Returns:
      an int array containing support (count of occurrences) for each sequence
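
As an illustration of what a support count means, the sketch below counts how many original sequences contain a given pattern. It assumes the usual sequential-pattern-mining definition (a pattern is contained in a sequence if its itemsets are subsets of distinct transactions, in order). That definition and the class SupportCountSketch are assumptions for illustration. The library's own computation is not shown on this page.

```java
// Hypothetical helper, not part of the library.
// A sequence is an int[][]: one int[] of item ids per transaction.
final class SupportCountSketch {

    // True if every item of itemset occurs in transaction.
    static boolean containsAll(int[] transaction, int[] itemset) {
        for (int item : itemset) {
            boolean found = false;
            for (int t : transaction) {
                if (t == item) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // True if the pattern's itemsets match distinct transactions in order.
    // Matching each itemset at the earliest possible transaction is sufficient.
    static boolean contains(int[][] sequence, int[][] pattern) {
        int next = 0;
        for (int[] transaction : sequence) {
            if (next < pattern.length && containsAll(transaction, pattern[next])) {
                next++;
            }
        }
        return next == pattern.length;
    }

    // Support: the number of sequences that contain the pattern.
    static int support(int[][][] sequences, int[][] pattern) {
        int count = 0;
        for (int[][] sequence : sequences) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][][] sequences = {
            {{2, 3}, {1}, {1}, {2}, {1, 2, 3}},
            {{1}, {2}, {3}, {2}}
        };
        System.out.println(support(sequences, new int[][]{{1}, {2}})); // prints 2
        System.out.println(support(sequences, new int[][]{{2, 3}}));   // prints 1
        System.out.println(support(sequences, new int[][]{{3}, {2}})); // prints 2
    }
}
```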
    • setNumberOfOriginalSequences

      public void setNumberOfOriginalSequences(int numOrigSequences)
      Sets the number of original sequences.

      This is the number of original sequences within which frequent sequences are sought. It is equal to the number of customers.

      Parameters:
      numOrigSequences - an int, the number of original sequences. It should be positive.
    • setSupportVector

      public void setSupportVector(int[] support)
      Sets the support vector for the sequences in this SequenceDatabase.

      Typically the support vector is populated in the method PrefixSpan.getFrequentSequences(double), but may need to be initialized outside the sequence mining operation.

      Parameters:
      support - an int array containing support (count of occurrences) for each sequence
    • getDataLength

      public int getDataLength()
      Returns the length of the sequence data.
      Returns:
      an int, the length of the sequence data
    • getSequenceData

      public int[] getSequenceData()
      Returns the sequence data.
      Returns:
      an int array containing the sequence data
    • setSequenceData

      public void setSequenceData(int[] seqData)
      Sets the sequence data.

      The sequence data is an int array of non-negative integer item ids in which -1 marks the end of each transaction and -2 marks the end of each sequence. Within a transaction (between -1's), an item id can occur at most once, and the item ids must be in ascending numerical order. For example, the following segment contains two sequences, the first with 5 transactions and the second with 4 transactions, involving item ids 1, 2, and 3:

      {2,3,-1,1,-1,1,-1,2,-1,1,2,3,-1,-2,1,-1,2,-1,3,-1,2,-1,-2,…}

      This method checks that the data follow this format and throws an exception if they do not.
      Parameters:
      seqData - an int array containing the sequence data
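
The encoding described for setSequenceData can be made concrete with a small decoder. The sketch below is plain Java and does not call the library. The class SequenceDataDecoder is a hypothetical name, and its checks reflect one reading of the format rules above (plus the assumption that empty transactions and empty sequences are invalid), not the library's actual validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library: decodes the flat encoding
// into sequences -> transactions -> item ids, validating as it goes.
final class SequenceDataDecoder {

    static List<List<List<Integer>>> decode(int[] seqData) {
        List<List<List<Integer>>> sequences = new ArrayList<>();
        List<List<Integer>> transactions = new ArrayList<>();
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < seqData.length; i++) {
            int v = seqData[i];
            if (v >= 0) {
                // Within a transaction, ids must be strictly increasing,
                // which also rules out duplicates.
                if (!items.isEmpty() && items.get(items.size() - 1) >= v) {
                    throw new IllegalArgumentException(
                        "item ids must be strictly increasing, index " + i);
                }
                items.add(v);
            } else if (v == -1) {
                if (items.isEmpty()) { // assumption: empty transactions are invalid
                    throw new IllegalArgumentException("empty transaction, index " + i);
                }
                transactions.add(items);
                items = new ArrayList<>();
            } else if (v == -2) {
                if (!items.isEmpty()) {
                    throw new IllegalArgumentException("missing -1 before -2, index " + i);
                }
                if (transactions.isEmpty()) { // assumption: empty sequences are invalid
                    throw new IllegalArgumentException("empty sequence, index " + i);
                }
                sequences.add(transactions);
                transactions = new ArrayList<>();
            } else {
                throw new IllegalArgumentException("invalid value " + v + ", index " + i);
            }
        }
        if (!items.isEmpty() || !transactions.isEmpty()) {
            throw new IllegalArgumentException("data must end with -1 followed by -2");
        }
        return sequences;
    }

    public static void main(String[] args) {
        // The two complete sequences from the example above.
        int[] seqData = {2, 3, -1, 1, -1, 1, -1, 2, -1, 1, 2, 3, -1, -2,
                         1, -1, 2, -1, 3, -1, 2, -1, -2};
        for (List<List<Integer>> sequence : decode(seqData)) {
            System.out.println(sequence.size() + " transactions: " + sequence);
        }
        // prints:
        // 5 transactions: [[2, 3], [1], [1], [2], [1, 2, 3]]
        // 4 transactions: [[1], [2], [3], [2]]
    }
}
```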