mir_eval.hierarchy

Evaluation criteria for hierarchical structure analysis.

Hierarchical structure analysis seeks to annotate a track with a nested decomposition of the temporal elements of the piece, effectively providing a kind of “parse tree” of the composition. Unlike the flat segmentations evaluated by the metrics in mir_eval.segment, which encode only one level of analysis, hierarchical annotations expose the relationships between short segments and the larger compositional elements to which they belong.

Conventions

Annotations are assumed to take the form of an ordered list of segmentations. As in the mir_eval.segment metrics, each segmentation itself consists of an n-by-2 array of interval times, so that the i-th segment spans the time from intervals[i, 0] to intervals[i, 1].

Hierarchical annotations are ordered by increasing specificity, so that the first segmentation contains the fewest segments and the last segmentation contains the most.
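As a minimal sketch of these conventions, a two-layer annotation of a 60-second track can be written as nested lists and checked for the expected ordering. The names `hierarchy` and `counts` are illustrative only and are not part of mir_eval.

```python
# Hypothetical two-layer annotation of a 60-second track.
# Each layer is an n-by-2 list of [start, end] times in seconds.
hierarchy = [
    [[0, 30], [30, 60]],                      # coarse layer: 2 segments
    [[0, 15], [15, 30], [30, 45], [45, 60]],  # fine layer: 4 segments
]

# Layers are ordered by increasing specificity, so the number of
# segments should not decrease from one layer to the next.
counts = [len(layer) for layer in hierarchy]
assert counts == sorted(counts)

# The i-th segment of a layer spans layer[i][0] to layer[i][1].
start, end = hierarchy[1][2]
print(counts, start, end)  # prints: [2, 4] 30 45
```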

Metrics

References

mir_eval.hierarchy.validate_hier_intervals(intervals_hier)

Validate a hierarchical segment annotation.

Parameters:
intervals_hier : ordered list of segmentations
Raises:
ValueError

If any segmentation does not span the full duration of the top-level segmentation.

If any segmentation does not start at 0.
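The two conditions above can be restated as a short check. This is a sketch only: `check_hier_intervals` is a hypothetical helper, not the library's implementation, and it assumes exact comparison of times where a real check would likely allow a small tolerance.

```python
def check_hier_intervals(intervals_hier):
    """Illustrative restatement of the two documented conditions.

    NOT mir_eval's implementation; it only mirrors the conditions
    listed under "Raises" for validate_hier_intervals.
    """
    # End time of the top-level (first) segmentation.
    top_end = max(end for _, end in intervals_hier[0])
    for level, layer in enumerate(intervals_hier):
        layer_start = min(start for start, _ in layer)
        layer_end = max(end for _, end in layer)
        # Assumption: exact equality; a tolerance may be preferable.
        if layer_start != 0:
            raise ValueError(f"layer {level} does not start at 0")
        if layer_end != top_end:
            raise ValueError(
                f"layer {level} does not span the full duration"
            )


# A valid hierarchy passes silently.
check_hier_intervals([[[0, 60]], [[0, 30], [30, 60]]])
```

A hierarchy whose second layer starts at 5 seconds, or ends before the top-level layer does, would raise ValueError under this sketch.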

mir_eval.hierarchy.tmeasure(reference_intervals_hier, estimated_intervals_hier, transitive=False, window=15.0, frame_size=0.1, beta=1.0)

Compute the tree measures for hierarchical segment annotations.

Parameters:
reference_intervals_hier : list of ndarray

reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.

estimated_intervals_hier : list of ndarray

Like reference_intervals_hier but for the estimated annotation

transitive : bool

whether to compute the t-measures using transitivity or not.

window : float > 0

size of the window (in seconds). For each query frame q, result frames are only counted within q ± window.

frame_size : float > 0

length (in seconds) of frames. The frame size cannot be longer than the window.

beta : float > 0

beta parameter for the F-measure.

Returns:
t_precision : number [0, 1]

T-measure Precision

t_recall : number [0, 1]

T-measure Recall

t_measure : number [0, 1]

F-beta measure for (t_precision, t_recall)

Raises:
ValueError

If either of the input hierarchies is inconsistent

If the input hierarchies have different time durations

If frame_size > window or frame_size <= 0
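To make the relationship between window and frame_size concrete, the sketch below converts both to frame counts and mirrors the documented parameter checks. The helper `frame_parameters` is hypothetical, and the rounding it uses is an assumption rather than a description of mir_eval's internals.

```python
def frame_parameters(duration, window=15.0, frame_size=0.1):
    """Hypothetical helper: express tmeasure-style parameters in frames."""
    # Mirror the documented error conditions on frame_size and window.
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    if frame_size > window:
        raise ValueError("frame_size cannot be longer than window")
    # Assumption: round to the nearest whole number of frames.
    n_frames = int(round(duration / frame_size))
    window_frames = int(round(window / frame_size))
    return n_frames, window_frames


# A 60-second track with the default window and frame size.
print(frame_parameters(60.0))  # prints: (600, 150)
```

Under these assumptions, the defaults sample a 60-second track into 600 frames, and each query frame is compared only against frames at most 150 positions away on either side.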

mir_eval.hierarchy.lmeasure(reference_intervals_hier, reference_labels_hier, estimated_intervals_hier, estimated_labels_hier, frame_size=0.1, beta=1.0)

Compute the L-measures for labeled hierarchical segment annotations.

Parameters:
reference_intervals_hier : list of ndarray

reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.

reference_labels_hier : list of list of str

reference_labels_hier[i] contains the segment labels for the i-th layer of the annotations

estimated_intervals_hier : list of ndarray
estimated_labels_hier : list of list of str

Like reference_intervals_hier and reference_labels_hier but for the estimated annotation

frame_size : float > 0

length (in seconds) of frames.

beta : float > 0

beta parameter for the F-measure.

Returns:
l_precision : number [0, 1]

L-measure Precision

l_recall : number [0, 1]

L-measure Recall

l_measure : number [0, 1]

F-beta measure for (l_precision, l_recall)

Raises:
ValueError

If either of the input hierarchies is inconsistent

If the input hierarchies have different time durations

If frame_size <= 0
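Each layer pairs one label with each interval, and frame_size sets how finely the annotation is sampled in time. The sketch below looks up the label active in every layer at a given time. `labels_at` is a hypothetical helper, and treating segments as half-open [start, end) ranges is an assumption made for illustration.

```python
def labels_at(intervals_hier, labels_hier, t):
    """Return the label active at time t in each layer (hypothetical)."""
    active = []
    for intervals, labels in zip(intervals_hier, labels_hier):
        if len(intervals) != len(labels):
            raise ValueError("each layer needs one label per segment")
        for (start, end), label in zip(intervals, labels):
            # Assumption: segments are half-open, [start, end).
            if start <= t < end:
                active.append(label)
                break
        else:
            active.append(None)  # t falls outside this layer
    return active


ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
ref_l = [['A', 'B'], ['a', 'b', 'a', 'c']]

frame_size = 10.0  # deliberately coarse so the output stays short
times = [k * frame_size for k in range(6)]
for t in times:
    print(t, labels_at(ref_i, ref_l, t))
```

Two frames that share a label at a deeper layer agree at a more specific level of the hierarchy, which is the kind of relationship a label-based measure can compare between reference and estimate.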

mir_eval.hierarchy.evaluate(ref_intervals_hier, ref_labels_hier, est_intervals_hier, est_labels_hier, **kwargs)

Compute all hierarchical structure metrics for the given reference and estimated annotations.

Parameters:
ref_intervals_hier : list of list-like
ref_labels_hier : list of list of str
est_intervals_hier : list of list-like
est_labels_hier : list of list of str

Hierarchical annotations are encoded as an ordered list of segmentations. Each segmentation is given by a list (or list-like) of intervals, *_intervals_hier[i], together with a list of labels, *_labels_hier[i].

**kwargs

additional keyword arguments to the evaluation metrics.

Returns:
scores : OrderedDict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

T-measures are computed in both the “full” (transitive=True) and “reduced” (transitive=False) modes.

Raises:
ValueError

Thrown when the provided annotations are not valid.

Examples

A toy example with two two-layer annotations

>>> import mir_eval
>>> ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> est_i = [[[0, 45], [45, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> ref_l = [ ['A', 'B'], ['a', 'b', 'a', 'c'] ]
>>> est_l = [ ['A', 'B'], ['a', 'a', 'b', 'b'] ]
>>> scores = mir_eval.hierarchy.evaluate(ref_i, ref_l, est_i, est_l)
>>> dict(scores)
{'T-Measure full': 0.94822745804853459,
 'T-Measure reduced': 0.8732458222764804,
 'T-Precision full': 0.96569179094693058,
 'T-Precision reduced': 0.89939075137018787,
 'T-Recall full': 0.93138358189386117,
 'T-Recall reduced': 0.84857799953694923}
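Each reported T-Measure is the F-beta combination of the corresponding precision and recall. The sketch below uses the standard F-beta definition, which is assumed here to match the combination used by the library, and recomputes the “full” score from the toy example with the default beta=1.0. The function name `f_beta` is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta combination of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return ((1 + beta**2) * precision * recall
            / (beta**2 * precision + recall))


# 'T-Precision full' and 'T-Recall full' from the toy example.
t_precision = 0.96569179094693058
t_recall = 0.93138358189386117
print(round(f_beta(t_precision, t_recall), 6))  # prints: 0.948227
```

With beta=1.0 this reduces to the harmonic mean of precision and recall; larger values of beta weight recall more heavily.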

A more realistic example, using SALAMI pre-parsed annotations

>>> def load_salami(filename):
...     "load SALAMI event format as labeled intervals"
...     events, labels = mir_eval.io.load_labeled_events(filename)
...     intervals = mir_eval.util.boundaries_to_intervals(events)[0]
...     return intervals, labels[:len(intervals)]
>>> ref_files = ['data/10/parsed/textfile1_uppercase.txt',
...              'data/10/parsed/textfile1_lowercase.txt']
>>> est_files = ['data/10/parsed/textfile2_uppercase.txt',
...              'data/10/parsed/textfile2_lowercase.txt']
>>> ref = [load_salami(fname) for fname in ref_files]
>>> ref_int = [seg[0] for seg in ref]
>>> ref_lab = [seg[1] for seg in ref]
>>> est = [load_salami(fname) for fname in est_files]
>>> est_int = [seg[0] for seg in est]
>>> est_lab = [seg[1] for seg in est]
>>> scores = mir_eval.hierarchy.evaluate(ref_int, ref_lab,
...                                      est_int, est_lab)
>>> dict(scores)
{'T-Measure full': 0.66029225561405358,
 'T-Measure reduced': 0.62001868041578034,
 'T-Precision full': 0.66844764668949885,
 'T-Precision reduced': 0.63252297209957919,
 'T-Recall full': 0.6523334654992341,
 'T-Recall reduced': 0.60799919710921635}