Class ClusterCrosscheckMetrics

java.lang.Object
picard.cmdline.CommandLineProgram
picard.fingerprint.ClusterCrosscheckMetrics

@DocumentedFeature public class ClusterCrosscheckMetrics extends CommandLineProgram

Summary

Clusters the results from a CrosscheckFingerprints run according to the LOD score. The resulting metric file can be used to assist in diagnosing results from CrosscheckFingerprints. It clusters the connectivity graph between the different groups: two groups are connected if they have a LOD score greater than LOD_THRESHOLD.

Details

The results of running CrosscheckFingerprints can be difficult to analyze, especially when many groups are related (meaning LOD greater than LOD_THRESHOLD) in a non-transitive manner (A is related to B, B is related to C, but A doesn't seem to be related to C). ClusterCrosscheckMetrics clusters the metrics from CrosscheckFingerprints so that all the groups in a cluster are related to each other either directly or indirectly (thus A, B and C would end up in one cluster). Two samples can be in two different clusters only if no sample in one cluster gets a high LOD score when compared to any sample in the other.
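The clustering described above amounts to finding connected components in a graph whose edges are the comparisons with LOD greater than LOD_THRESHOLD. The following is a minimal sketch of that idea using a union-find structure. It is not Picard's implementation: the class LodClusteringSketch, its nested Comparison type, and the cluster method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; all names here are hypothetical, not Picard API. */
public class LodClusteringSketch {

    /** Hypothetical stand-in for one row of crosscheck output. */
    public static final class Comparison {
        public final String left;
        public final String right;
        public final double lod;

        public Comparison(String left, String right, double lod) {
            this.left = left;
            this.right = right;
            this.lod = lod;
        }
    }

    // Union-find parent pointers, keyed by group name.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String group) {
        parent.putIfAbsent(group, group);
        String root = group;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path directly at the root.
        String current = group;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    /**
     * Maps each group to a representative group of its cluster. Groups with no
     * comparison above the threshold (not even with themselves) are absent.
     */
    public static Map<String, String> cluster(List<Comparison> comparisons, double lodThreshold) {
        LodClusteringSketch uf = new LodClusteringSketch();
        for (Comparison c : comparisons) {
            if (c.lod > lodThreshold) {
                uf.union(c.left, c.right);
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (String group : new ArrayList<>(uf.parent.keySet())) {
            result.put(group, uf.find(group));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Comparison> comparisons = Arrays.asList(
                new Comparison("A", "B", 10.0),  // A related to B
                new Comparison("B", "C", 8.0),   // B related to C
                new Comparison("A", "C", -2.0),  // A not directly related to C
                new Comparison("D", "D", -1.0)); // D unrelated even to itself
        // A, B and C share one representative; D does not appear at all.
        System.out.println(cluster(comparisons, 3.0));
    }
}
```

In this sketch A and C land in the same cluster through B even though their own LOD is below the threshold, which mirrors the non-transitive case described above.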

Example

     java -jar picard.jar ClusterCrosscheckMetrics \
              INPUT=sample.crosscheck_metrics \
              LOD_THRESHOLD=3 \
              OUTPUT=sample.clustered.crosscheck_metrics
 
The resulting file consists of ClusteredCrosscheckMetric records and contains the original crosscheck metric values for groups that end up in the same cluster (regardless of the LOD score of each comparison). In addition, it notes the cluster identifier (in ClusteredCrosscheckMetric.CLUSTER) and the size of the cluster (in ClusteredCrosscheckMetric.CLUSTER_SIZE). A group that does not have a high LOD score with any group (including itself!) is not included in the metric file. Note that comparisons between groups in different clusters are not included in the metric file.
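To make the selection of output rows concrete, here is a small sketch of how comparisons could be filtered and annotated once each group has been assigned a cluster. It assumes, following the description above, that a comparison is kept only when both of its groups belong to the same cluster, whatever its own LOD score. The class ClusteredOutputSketch and its nested types are hypothetical and do not reproduce Picard's ClusteredCrosscheckMetric class; only the ideas of a cluster identifier and a cluster size are taken from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Picard API. */
public class ClusteredOutputSketch {

    /** Hypothetical stand-in for one row of crosscheck output. */
    public static final class Comparison {
        public final String left;
        public final String right;
        public final double lod;

        public Comparison(String left, String right, double lod) {
            this.left = left;
            this.right = right;
            this.lod = lod;
        }
    }

    /** A comparison annotated with its cluster identifier and cluster size. */
    public static final class ClusteredRow {
        public final Comparison comparison;
        public final int cluster;      // plays the role of CLUSTER
        public final int clusterSize;  // plays the role of CLUSTER_SIZE

        public ClusteredRow(Comparison comparison, int cluster, int clusterSize) {
            this.comparison = comparison;
            this.cluster = cluster;
            this.clusterSize = clusterSize;
        }
    }

    /** Keeps only comparisons whose two groups share a cluster, then annotates them. */
    public static List<ClusteredRow> annotate(List<Comparison> comparisons,
                                              Map<String, Integer> clusterOf) {
        // Count how many groups belong to each cluster.
        Map<Integer, Integer> sizes = new HashMap<>();
        for (Integer id : clusterOf.values()) {
            sizes.merge(id, 1, Integer::sum);
        }
        List<ClusteredRow> rows = new ArrayList<>();
        for (Comparison c : comparisons) {
            Integer leftCluster = clusterOf.get(c.left);
            Integer rightCluster = clusterOf.get(c.right);
            // Unclustered groups and cross-cluster comparisons are skipped.
            if (leftCluster != null && leftCluster.equals(rightCluster)) {
                rows.add(new ClusteredRow(c, leftCluster, sizes.get(leftCluster)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Integer> clusterOf = new HashMap<>();
        clusterOf.put("A", 1);
        clusterOf.put("B", 1);
        clusterOf.put("C", 1);
        clusterOf.put("E", 2);
        List<Comparison> comparisons = Arrays.asList(
                new Comparison("A", "B", 10.0),
                new Comparison("B", "C", 8.0),
                new Comparison("A", "C", -2.0),  // kept: same cluster despite low LOD
                new Comparison("A", "E", -5.0),  // dropped: different clusters
                new Comparison("D", "D", -1.0),  // dropped: D is in no cluster
                new Comparison("E", "E", 12.0));
        for (ClusteredRow row : annotate(comparisons, clusterOf)) {
            System.out.println(row.comparison.left + "\t" + row.comparison.right + "\t"
                    + row.comparison.lod + "\t" + row.cluster + "\t" + row.clusterSize);
        }
    }
}
```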
  • Field Details

    • INPUT

      @Argument(shortName="I", doc="The cross-check metrics file to be clustered.") public File INPUT
    • OUTPUT

      @Argument(shortName="O", optional=true, doc="Output file to write metrics to. Will write to stdout if null.") public File OUTPUT
    • LOD_THRESHOLD

      @Argument(shortName="LOD", doc="LOD score to be used as the threshold for clustering.") public double LOD_THRESHOLD
  • Constructor Details

    • ClusterCrosscheckMetrics

      public ClusterCrosscheckMetrics()
  • Method Details

    • doWork

      protected int doWork()
      Description copied from class: CommandLineProgram
      Do the work after the command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
      Specified by:
      doWork in class CommandLineProgram
      Returns:
      program exit status.