Package picard.sam

Class AbstractAlignmentMerger

java.lang.Object
picard.sam.AbstractAlignmentMerger
Direct Known Subclasses:
SamAlignmentMerger

public abstract class AbstractAlignmentMerger extends Object
Abstract class that coordinates the general task of taking in a set of alignment information, possibly in SAM format, possibly in other formats, and merging that with the set of all reads for which alignment was attempted, stored in an unmapped SAM file.

The order of processing is as follows:

1. Get records from the unmapped SAM/BAM/CRAM and the alignment data
2. Merge the alignment information and public tags ONLY from the aligned SAMRecords
3. Do additional modifications -- handle clipping, trimming, etc.
4. Fix up mate information on paired reads
5. Do a final calculation of the NM and UQ tags (coordinate sorted only)
6. Write the records to the output file.

Concrete subclasses which extend AbstractAlignmentMerger should implement getQuerynameSortedAlignedRecords. If these records are not in queryname order, mergeAlignment will throw an IllegalStateException.
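The queryname-order requirement can be illustrated with a minimal sketch. QuerynameOrderChecker is a hypothetical helper, not part of Picard, and it assumes that queryname order means plain lexicographic String order on read names. It only shows the fail-fast behaviour described above, not how mergeAlignment performs the check internally.

```java
import java.util.Iterator;

/** Hypothetical helper (not part of Picard): passes read names through and fails fast when they are out of order. */
final class QuerynameOrderChecker implements Iterator<String> {
    private final Iterator<String> delegate;
    private String previous; // last name returned, or null before the first call to next()

    QuerynameOrderChecker(Iterator<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public String next() {
        String current = delegate.next();
        // Assumption: queryname order is plain lexicographic String order.
        if (previous != null && previous.compareTo(current) > 0) {
            throw new IllegalStateException(
                "Records are not in queryname order: " + previous + " came before " + current);
        }
        previous = current;
        return current;
    }
}
```

Equal consecutive names are accepted, since both ends of a pair and any supplementary alignments share one read name.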

Subclasses may optionally implement ignoreAlignment(), which can be used to skip over certain alignments.

  • Field Details

    • MAX_RECORDS_IN_RAM

      public static final int MAX_RECORDS_IN_RAM
    • referenceFasta

      protected final File referenceFasta
  • Constructor Details

    • AbstractAlignmentMerger

      public AbstractAlignmentMerger(File unmappedBamFile, File targetBamFile, File referenceFasta, boolean clipAdapters, boolean bisulfiteSequence, boolean alignedReadsOnly, htsjdk.samtools.SAMProgramRecord programRecord, List<String> attributesToRetain, List<String> attributesToRemove, Integer read1BasesTrimmed, Integer read2BasesTrimmed, List<htsjdk.samtools.SamPairUtil.PairOrientation> expectedOrientations, htsjdk.samtools.SAMFileHeader.SortOrder sortOrder, PrimaryAlignmentSelectionStrategy primaryAlignmentSelectionStrategy, boolean addMateCigar, boolean unmapContaminantReads)
      Constructor with a default setting for unmappingReadsStrategy. See the full constructor for parameters.
    • AbstractAlignmentMerger

      public AbstractAlignmentMerger(File unmappedBamFile, File targetBamFile, File referenceFasta, boolean clipAdapters, boolean bisulfiteSequence, boolean alignedReadsOnly, htsjdk.samtools.SAMProgramRecord programRecord, List<String> attributesToRetain, List<String> attributesToRemove, Integer read1BasesTrimmed, Integer read2BasesTrimmed, List<htsjdk.samtools.SamPairUtil.PairOrientation> expectedOrientations, htsjdk.samtools.SAMFileHeader.SortOrder sortOrder, PrimaryAlignmentSelectionStrategy primaryAlignmentSelectionStrategy, boolean addMateCigar, boolean unmapContaminantReads, AbstractAlignmentMerger.UnmappingReadStrategy unmappingReadsStrategy)
      Constructor
      Parameters:
      unmappedBamFile - The BAM file that was used as the input to the aligner, which will include info on all the reads that did not map. Required.
      targetBamFile - The file to which to write the merged SAM records. Required.
      referenceFasta - The reference sequence for the map files. Required.
      clipAdapters - Whether adapters marked in the unmapped BAM file should be marked as soft clipped in the merged BAM. Required.
      bisulfiteSequence - Whether the reads are bisulfite sequence (used when calculating the NM and UQ tags). Required.
      alignedReadsOnly - Whether to output only those reads that have alignment data
      programRecord - Program record for target file SAMRecords created.
      attributesToRetain - private attributes from the alignment record that should be included when merging. This overrides the exclusion of attributes whose tags start with the reserved characters of X, Y, and Z
      attributesToRemove - attributes from the alignment record that should be removed when merging. This overrides attributesToRetain if they share common tags.
      read1BasesTrimmed - The number of bases trimmed from start of read 1 prior to alignment. Optional.
      read2BasesTrimmed - The number of bases trimmed from start of read 2 prior to alignment. Optional.
      expectedOrientations - A List of SamPairUtil.PairOrientations that are expected for aligned pairs. Used to determine the properPair flag.
      sortOrder - The order in which the merged records should be output. If null, output will be coordinate-sorted
      primaryAlignmentSelectionStrategy - What to do when there are multiple primary alignments, or multiple alignments but none primary, for a read or read pair.
      addMateCigar - True if we are to add or maintain the mate CIGAR (MC) tag, false if we are to remove or not include.
      unmapContaminantReads - If true, identify reads having the signature of cross-species contamination (i.e. mostly clipped bases), and mark them as unmapped.
      unmappingReadsStrategy - An enum describing how to deal with reads whose mapping information is being removed (currently this happens due to cross-species contamination). Ignored unless unmapContaminantReads is true.
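The interaction between attributesToRetain, attributesToRemove and the reserved X/Y/Z tags can be sketched as a single predicate. TagFilterSketch and its method names are hypothetical and only restate the precedence rules given in the parameter descriptions. The actual merge logic may apply further conditions.

```java
import java.util.Set;

/** Hypothetical sketch (not Picard code) of which alignment-record tags get copied during the merge. */
final class TagFilterSketch {
    /** Tags starting with X, Y or Z are treated as reserved (private to the aligner). */
    static boolean isReserved(String tag) {
        char first = tag.charAt(0);
        return first == 'X' || first == 'Y' || first == 'Z';
    }

    static boolean shouldCopy(String tag, Set<String> retain, Set<String> remove) {
        if (remove.contains(tag)) {
            return false; // attributesToRemove overrides attributesToRetain
        }
        if (retain.contains(tag)) {
            return true;  // explicit retention overrides the X/Y/Z exclusion
        }
        return !isReserved(tag); // otherwise only public tags are copied
    }
}
```

With this rule a tag such as XS is dropped by default, kept when listed in the retain set, and dropped again when it appears in both sets.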
  • Method Details

    • getDictionaryForMergedBam

      protected abstract htsjdk.samtools.SAMSequenceDictionary getDictionaryForMergedBam()
    • getQuerynameSortedAlignedRecords

      protected abstract htsjdk.samtools.util.CloseableIterator<htsjdk.samtools.SAMRecord> getQuerynameSortedAlignedRecords()
    • ignoreAlignment

      protected boolean ignoreAlignment(htsjdk.samtools.SAMRecord sam)
    • isContaminant

      protected boolean isContaminant(picard.sam.HitsForInsert hits)
    • getAttributesToReverse

      public Set<String> getAttributesToReverse()
      Gets the set of attributes to be reversed on reads marked as negative strand.
    • setAttributesToReverse

      public void setAttributesToReverse(Set<String> attributesToReverse)
      Sets the set of attributes to be reversed on reads marked as negative strand.
    • getAttributesToReverseComplement

      public Set<String> getAttributesToReverseComplement()
      Gets the set of attributes to be reverse complemented on reads marked as negative strand.
    • setAttributesToReverseComplement

      public void setAttributesToReverseComplement(Set<String> attributesToReverseComplement)
      Sets the set of attributes to be reverse complemented on reads marked as negative strand.
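As an illustration of the two transformations these attribute sets select, the sketch below reverses a per-base string and reverse complements a base string. StrandAttributeSketch is a hypothetical class operating on plain strings, and it assumes upper-case A/C/G/T with every other character left unchanged.

```java
/** Hypothetical sketch (not Picard code) of the two per-base transformations for negative-strand reads. */
final class StrandAttributeSketch {
    /** Reverses a per-base value such as a quality string. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /** Reverses a base string and complements each base. */
    static String reverseComplement(String bases) {
        StringBuilder out = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            out.append(complement(bases.charAt(i)));
        }
        return out.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return base; // N and any unrecognised character are left unchanged
        }
    }
}
```

Attributes holding qualities only need reversing, while attributes holding bases need the reverse complement so that they stay consistent with the read as stored for the negative strand.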
    • setMaxRecordsInRam

      public void setMaxRecordsInRam(int maxRecordsInRam)
      Allows the caller to override the maximum records in RAM.
    • setAddPGTagToReads

      public void setAddPGTagToReads(boolean addPGTagToReads)
      Sets addPGTagToReads. If true, the PG tag will be added to reads when applicable. If false, the PG tag will not be added. Default is true.
    • mergeAlignment

      public void mergeAlignment(File referenceFasta)
      Merges the alignment data with the non-aligned records from the source BAM file.
    • fixNmMdAndUq

      public static void fixNmMdAndUq(htsjdk.samtools.SAMRecord record, htsjdk.samtools.reference.ReferenceSequenceFileWalker refSeqWalker, boolean isBisulfiteSequence)
      Calculates and sets the NM, MD, and UQ tags from the record and the reference.
      Parameters:
      record - the record to be fixed
      refSeqWalker - a ReferenceSequenceFileWalker that will be used to traverse the reference
      isBisulfiteSequence - a flag indicating whether the sequence came from bisulfite-sequencing which would imply a different calculation of the NM tag. No return value, modifies the provided record.
    • fixUq

      public static void fixUq(htsjdk.samtools.SAMRecord record, htsjdk.samtools.reference.ReferenceSequenceFileWalker refSeqWalker, boolean isBisulfiteSequence)
      Calculates and sets the UQ tag from the record and the reference.
      Parameters:
      record - the record to be fixed
      refSeqWalker - a ReferenceSequenceFileWalker that will be used to traverse the reference
      isBisulfiteSequence - a flag indicating whether the sequence came from bisulfite-sequencing. No return value, modifies the provided record.
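To make the role of the bisulfite flag concrete, here is a simplified sketch for a gap-free alignment only. It rests on several assumptions that this page does not confirm: NM is taken as the number of mismatching bases, UQ as the sum of base qualities at those mismatches, and under bisulfite treatment a reference C read as T (forward strand) or a reference G read as A (reverse strand) is not counted as a mismatch. MismatchTagSketch and the explicit negativeStrand parameter are hypothetical. The real methods read strand, CIGAR and qualities from the SAMRecord.

```java
/** Hypothetical sketch (not Picard code): NM and UQ for an alignment without insertions or deletions. */
final class MismatchTagSketch {
    /** Returns {nm, uq} for a read aligned base-for-base against a reference slice of equal length. */
    static int[] nmAndUq(String read, String ref, int[] quals, boolean bisulfite, boolean negativeStrand) {
        if (read.length() != ref.length() || read.length() != quals.length) {
            throw new IllegalArgumentException("read, reference and qualities must have equal length");
        }
        int nm = 0;
        int uq = 0;
        for (int i = 0; i < read.length(); i++) {
            if (!basesMatch(read.charAt(i), ref.charAt(i), bisulfite, negativeStrand)) {
                nm++;           // assumption: every mismatch adds one to NM
                uq += quals[i]; // assumption: UQ sums the qualities of mismatching bases
            }
        }
        return new int[] {nm, uq};
    }

    static boolean basesMatch(char readBase, char refBase, boolean bisulfite, boolean negativeStrand) {
        if (readBase == refBase) {
            return true;
        }
        if (!bisulfite) {
            return false;
        }
        // Assumption: bisulfite conversion makes C->T (forward) and G->A (reverse) look like matches.
        return negativeStrand ? (refBase == 'G' && readBase == 'A')
                              : (refBase == 'C' && readBase == 'T');
    }
}
```

For a forward-strand read ATGT against reference ACGT, this sketch reports one mismatch without the bisulfite flag and none with it.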
    • encodeMappingInformation

      public static String encodeMappingInformation(htsjdk.samtools.SAMRecord rec)
      Encodes mapping information from a record into a string according to the format specified in the SAM specification under the SA tag. No protection against missing values (for CIGAR and the NM tag). (Might make sense to move this to htsjdk.)
      Parameters:
      rec - SAMRecord whose alignment information will be encoded
      Returns:
      String encoding rec's alignment information according to SA tag in the SAM spec
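The SAM specification describes each SA entry as the comma-separated fields rname, pos, strand, CIGAR, mapQ and NM. The sketch below builds one such entry from plain values. SaTagSketch is hypothetical and takes the fields as arguments instead of a SAMRecord. Whether the real method appends the trailing semicolon that separates entries is not stated here, so the sketch leaves it off.

```java
/** Hypothetical sketch (not Picard code): one SA-tag entry in rname,pos,strand,CIGAR,mapQ,NM order. */
final class SaTagSketch {
    static String encode(String referenceName, int start, boolean negativeStrand,
                         String cigar, int mappingQuality, int nm) {
        return String.join(",",
                referenceName,
                Integer.toString(start),           // 1-based leftmost position
                negativeStrand ? "-" : "+",
                cigar,
                Integer.toString(mappingQuality),
                Integer.toString(nm));
    }
}
```

Like the documented method, the sketch does nothing to guard against missing values. A null CIGAR would be written out as the literal text "null".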
    • clipForOverlappingReads

      protected static void clipForOverlappingReads(htsjdk.samtools.SAMRecord read1, htsjdk.samtools.SAMRecord read2, boolean useHardClipping)
      Checks to see whether the ends of the reads overlap and clips reads if necessary. For inward facing read pairs, this method will soft clip the 5' end of each read so that the 5' aligned end of each read does not extend past the 3' aligned end of its mate. If useHardClipping is true, this method will additionally hard clip the 5' end of each read if necessary so that the 5' end of each read (including soft clipped bases) does not extend past the 3' end of its mate (including soft clipped bases).

      Some examples are illustrative:
      <-MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM->
      will be soft-clipped to
      <-SSSMMMMMMMMMMMMMM MMMMMMMMMMMMMMSSS->
      and with useHardClipping true, this would then be hard-clipped to
      <-HHHMMMMMMMMMMMMMM MMMMMMMMMMMMMMHHH->

      A more complicated example:
      <-MMMMMMMMMMMMMMMSS MMMMMMMMMMMMMMMMM->
      will be soft-clipped to
      <-SSSMMMMMMMMMMMMSS MMMMMMMMMMMMSSSSS->
      and with useHardClipping true, this would then be hard-clipped to
      <-HHHMMMMMMMMMMMMSS MMMMMMMMMMMMSSHHH->

      Note that the soft-clipping is done such that the clipped starts and ends of each read are the same, and hard-clipping is done such that the unclipped starts and ends of each read are the same.
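The soft-clipping rule, that the aligned starts and ends of both reads coincide afterwards, can be sketched in reference coordinates. OverlapClipSketch is hypothetical and makes simplifying assumptions: an inward facing pair, 1-based inclusive aligned positions, and no insertions or deletions, so that reference positions and read bases correspond one to one.

```java
/** Hypothetical sketch (not Picard code): how many aligned bases each read of an inward facing pair loses. */
final class OverlapClipSketch {
    /**
     * Returns {basesToClipFromPositiveRead, basesToClipFromNegativeRead}.
     * Arguments are the aligned start and end of the positive-strand read and of the negative-strand read.
     */
    static int[] basesToClip(int posStart, int posEnd, int negStart, int negEnd) {
        // The positive-strand read is clipped where it runs past its mate's aligned end.
        int clipPositive = Math.max(0, posEnd - negEnd);
        // The negative-strand read is clipped where it begins before its mate's aligned start.
        int clipNegative = Math.max(0, posStart - negStart);
        return new int[] {clipPositive, clipNegative};
    }
}
```

For two 17-base reads aligned at 103-119 (positive strand) and 100-116 (negative strand), both values come out as 3, which corresponds to the three S bases on each read in the first example.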
    • setValuesFromAlignment

      protected void setValuesFromAlignment(htsjdk.samtools.SAMRecord rec, htsjdk.samtools.SAMRecord alignment, boolean needsSafeReverseComplement)
      Sets the values from the alignment record on the unaligned BAM record. This preserves all data from the unaligned record (ReadGroup, NoiseRead status, etc.) and adds all the alignment info.
      Parameters:
      rec - The unaligned read record
      alignment - The alignment record
    • createNewCigarsIfMapsOffEndOfReference

      public static void createNewCigarsIfMapsOffEndOfReference(htsjdk.samtools.SAMRecord rec)
      Soft-clip an alignment that hangs off the end of its reference sequence. Checks both the read and its mate, if available.
      Parameters:
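      rec - The record whose alignment (and whose mate's alignment, if available) is checked and soft-clipped if necessary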
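A minimal sketch of the underlying arithmetic, assuming a 1-based inclusive alignment end and a CIGAR consisting of a single M operator. ReferenceEndClipSketch and both of its methods are hypothetical. The real method works on the record's full CIGAR and also looks at the mate.

```java
/** Hypothetical sketch (not Picard code): soft-clipping bases that hang off the end of the reference. */
final class ReferenceEndClipSketch {
    /** Number of aligned bases lying beyond the last position of the reference sequence. */
    static int basesPastEnd(int alignmentEnd, int referenceLength) {
        return Math.max(0, alignmentEnd - referenceLength);
    }

    /** Rewrites a simple "<n>M" alignment so that the overhanging bases become soft clips. */
    static String clipSimpleCigar(int matchedBases, int overhang) {
        if (overhang <= 0) {
            return matchedBases + "M"; // nothing hangs off the end, CIGAR unchanged
        }
        if (overhang >= matchedBases) {
            throw new IllegalArgumentException("overhang must be smaller than the aligned length");
        }
        return (matchedBases - overhang) + "M" + overhang + "S";
    }
}
```

A 50-base alignment ending at position 1003 on a 1000-base reference has three bases past the end, so the sketch turns 50M into 47M3S.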
    • updateCigarForTrimmedOrClippedBases

      protected void updateCigarForTrimmedOrClippedBases(htsjdk.samtools.SAMRecord rec, htsjdk.samtools.SAMRecord alignment)
    • getProgramRecord

      protected htsjdk.samtools.SAMProgramRecord getProgramRecord()
    • setProgramRecord

      protected void setProgramRecord(htsjdk.samtools.SAMProgramRecord pg)
    • isReservedTag

      protected boolean isReservedTag(String tag)
    • getHeader

      protected htsjdk.samtools.SAMFileHeader getHeader()
    • resetRefSeqFileWalker

      protected void resetRefSeqFileWalker()
    • isClipOverlappingReads

      public boolean isClipOverlappingReads()
    • setClipOverlappingReads

      public void setClipOverlappingReads(boolean clipOverlappingReads)
    • setHardClipOverlappingReads

      public void setHardClipOverlappingReads(boolean hardClipOverlappingReads)
    • isKeepAlignerProperPairFlags

      public boolean isKeepAlignerProperPairFlags()
    • setKeepAlignerProperPairFlags

      public void setKeepAlignerProperPairFlags(boolean keepAlignerProperPairFlags)
      If true, keep the aligner's idea of proper pairs rather than letting alignment merger decide.
    • setIncludeSecondaryAlignments

      public void setIncludeSecondaryAlignments(boolean includeSecondaryAlignments)
    • close

      public void close()