Class OpticalDuplicateFinder

java.lang.Object
picard.sam.util.ReadNameParser
picard.sam.markduplicates.util.OpticalDuplicateFinder
All Implemented Interfaces:
Serializable

public class OpticalDuplicateFinder extends ReadNameParser implements Serializable
Contains methods for finding optical/co-localized/sequencing duplicates.
  • Field Details

    • opticalDuplicatePixelDistance

      public int opticalDuplicatePixelDistance
    • DEFAULT_OPTICAL_DUPLICATE_DISTANCE

      public static final int DEFAULT_OPTICAL_DUPLICATE_DISTANCE
    • DEFAULT_BIG_DUPLICATE_SET_SIZE

      public static final int DEFAULT_BIG_DUPLICATE_SET_SIZE
    • DEFAULT_MAX_DUPLICATE_SET_SIZE

      public static final int DEFAULT_MAX_DUPLICATE_SET_SIZE
  • Constructor Details

    • OpticalDuplicateFinder

      public OpticalDuplicateFinder()
      Uses the default duplicate distance DEFAULT_OPTICAL_DUPLICATE_DISTANCE (100) and the default read name regex ReadNameParser.DEFAULT_READ_NAME_REGEX.
    • OpticalDuplicateFinder

      public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, htsjdk.samtools.util.Log log)
      Parameters:
      readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
      opticalDuplicatePixelDistance - the optical duplicate pixel distance
      log - the log to which to write messages.
    • OpticalDuplicateFinder

      public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, long maxDuplicateSetSize, htsjdk.samtools.util.Log log)
      Parameters:
      readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
      opticalDuplicatePixelDistance - the optical duplicate pixel distance
      maxDuplicateSetSize - the size of a set that is too big to process
      log - the log to which to write messages.
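
      The readNameRegex parameter controls how location information is pulled out of a read name. The following is a minimal sketch using only java.util.regex, assuming a colon-delimited read name whose last three numeric fields are tile, x and y. The class name ReadNameRegexSketch and the pattern itself are hypothetical and chosen for illustration; they are not necessarily the value of ReadNameParser.DEFAULT_READ_NAME_REGEX, and this is not how ReadNameParser is implemented.

      ```java
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      /** Illustrative only: a stand-in for regex-based read name parsing, not Picard code. */
      final class ReadNameRegexSketch {

          // Hypothetical pattern (an assumption): three capture groups for the
          // trailing colon-separated numeric fields, read as tile, x, y.
          static final Pattern TILE_X_Y =
                  Pattern.compile("(?:.*:)?([0-9]+):([0-9]+):([0-9]+)$");

          /** Returns {tile, x, y}, or null when the read name does not match. */
          static int[] parse(String readName) {
              Matcher m = TILE_X_Y.matcher(readName);
              if (!m.find()) {
                  return null;
              }
              return new int[] {
                  Integer.parseInt(m.group(1)),
                  Integer.parseInt(m.group(2)),
                  Integer.parseInt(m.group(3))
              };
          }
      }
      ```

      With this pattern, parse("RUNID:7:1203:2886:82292") yields tile 1203, x 2886 and y 82292, while a name with no trailing numeric fields yields null.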
  • Method Details

    • setBigDuplicateSetSize

      public void setBigDuplicateSetSize(int bigDuplicateSetSize)
      Sets the size of a set that is big enough to log progress about. Defaults to 1000.
      Parameters:
      bigDuplicateSetSize - the size of a set that is big enough to log progress about
    • setMaxDuplicateSetSize

      public void setMaxDuplicateSetSize(long maxDuplicateSetSize)
      Sets the size of a set that is too big to process. Defaults to 300000.
      Parameters:
      maxDuplicateSetSize - the size of a set that is too big to process
    • findOpticalDuplicates

      public boolean[] findOpticalDuplicates(List<? extends PhysicalLocation> list, PhysicalLocation keeper)
      Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another. Within each cluster of optical duplicates that is found, one read remains unflagged for optical duplication and the rest are flagged as optical duplicates. Reads considered optical duplicates are indicated by true at the same index in the returned boolean[] as the read occupied in the input list of physical locations.
      Parameters:
      list - a list of reads that are determined to be duplicates of one another
      keeper - a single PhysicalLocation that is the one being kept as non-duplicate, and thus should never be annotated as an optical duplicate. May in some cases be null, or a PhysicalLocation not contained within the list!
      Returns:
      a boolean[] of the same length as the incoming list marking which reads are optical duplicates
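
      The return contract described above (one flag per input position, one unflagged read per cluster, keeper never flagged) can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: the Loc class stands in for PhysicalLocation, "close" is taken to mean same tile and within maxDistance pixels on both axes, clusters are found by a naive quadratic search, and a keeper that is null or absent from the list is simply ignored. It is not Picard's implementation and may differ from it in all of these respects.

      ```java
      import java.util.ArrayDeque;
      import java.util.ArrayList;
      import java.util.Deque;
      import java.util.List;

      /** Illustrative only: mimics the index-aligned boolean[] contract, not Picard code. */
      final class OpticalDuplicateSketch {

          /** Hypothetical stand-in for a physical location: tile plus x/y pixel coordinates. */
          static final class Loc {
              final int tile, x, y;
              Loc(int tile, int x, int y) { this.tile = tile; this.x = x; this.y = y; }
          }

          /** Assumed closeness rule: same tile and within maxDistance on both axes. */
          static boolean closeEnough(Loc a, Loc b, int maxDistance) {
              return a.tile == b.tile
                      && Math.abs(a.x - b.x) <= maxDistance
                      && Math.abs(a.y - b.y) <= maxDistance;
          }

          /** Returns flags aligned with list: true marks a read flagged as an optical duplicate. */
          static boolean[] flagOpticalDuplicates(List<Loc> list, Loc keeper, int maxDistance) {
              int n = list.size();
              boolean[] flags = new boolean[n];
              boolean[] visited = new boolean[n];
              for (int start = 0; start < n; start++) {
                  if (visited[start]) {
                      continue;
                  }
                  // Collect one cluster: every read reachable from 'start' via close pairs.
                  List<Integer> cluster = new ArrayList<>();
                  Deque<Integer> pending = new ArrayDeque<>();
                  int representative = start; // first read seen, unless the keeper is in the cluster
                  pending.push(start);
                  visited[start] = true;
                  while (!pending.isEmpty()) {
                      int i = pending.pop();
                      cluster.add(i);
                      if (list.get(i) == keeper) { // identity check; never true for a null keeper
                          representative = i;
                      }
                      for (int j = 0; j < n; j++) {
                          if (!visited[j] && closeEnough(list.get(i), list.get(j), maxDistance)) {
                              visited[j] = true;
                              pending.push(j);
                          }
                      }
                  }
                  // Exactly one read per cluster stays unflagged.
                  for (int i : cluster) {
                      flags[i] = (i != representative);
                  }
              }
              return flags;
          }
      }
      ```

      For four reads at (tile 1, 100, 100), (tile 1, 150, 120), (tile 1, 5000, 5000) and (tile 2, 100, 100) with a distance of 100, the first two form one cluster. Passing the second read as keeper flags only index 0; passing null flags only index 1, because the sketch then keeps the first read it encounters in each cluster.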