Added a TODO to start implementation of HED support in annotations by VisLab · Pull Request #13059 · mne-tools/mne-python

VisLab · 2025-01-13T13:33:18Z

In response to #11519

Added a TODO to annotations.py as agreed to get started on implementation of basic-level HED (Hierarchical Event Descriptors) support in mne annotations to allow HED tags to be used in epoching.

welcome · 2025-01-13T13:33:21Z

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴


drammock · 2025-01-13T22:43:08Z

mne/annotations.py

+                f"Number of HED tags ({len(hed_tags)}) must match the number of "
+                f"annotations ({len(self)})."
+            )
+        # TODO insert validation of HED tags here


I might need clarification here. My understanding is that this is at the stage of annotations for the continuous data before epoching. The message should read "Number of HED strings..." since HED tags refer to the individual tags rather than the comma-separated list. Yes, this is where validation would occur. The validator would take strings in. If there are validation errors, how would they be reported?

The variable hed_tags is a list of strings. I'm assuming you would pass that to a validation func in your hedtools library, and it would raise an error if the strings represent tags that aren't in the hierarchy? Is that not how things work?

The validator returns a list of dicts with the issues. There is a function to get printable strings out of this. For your use case, we would probably want to validate the entire list. The validators take an ErrorHandler which manages the context so that it can identify which element had the error (and which file if this is applicable).

We could wrap this to raise an error. For the current situation are you validating each sublist individually or are you doing it by event?

(Answering more thoroughly now that I'm at my desk instead of my phone):

HED tags refer to the individual tags rather than the comma-separated list

OK, we should change the variable name from hed_tags to hed_strings then

My understanding is that this is at the stage of annotations for the continuous data before epoching.

yes. Here I'm assuming that each annotated segment will have a single string (containing comma-separated tags) associated with it (i.e., it's neither necessary nor allowed to associate a list of HED strings with a single annotation). Correct me if I'm wrong about that please.

The validator would take strings in [...] The validator returns a list of dicts with the issues.

OK, so maybe something like:

hed_results = func_that_validates_list_of_HED_strings(hed_strings)
# or if the validator takes in single strings instead of a list, then maybe
# hed_results = list(map(func_that_validates_one_HED_string, hed_strings))
if any(map(len, hed_results)):
    err_strings = list(map(func_that_gets_printable_strings, hed_results))
    # build the header separately so that join() only uses "\n - " as separator
    raise ValueError(
        "Some HED strings in your annotations failed to validate:\n - "
        + "\n - ".join(err_strings)
    )

For the current situation are you validating each sublist individually or are you doing it by event?

not sure what "sublist" is here, do you mean "list of HED strings, of same length as the number of annotated events"? We can structure it so that each hed_string is passed individually to a validator function, or so that the list of HED strings (one per annotation span) is passed to the validator all at once. On our end it doesn't matter, it's determined by what your API is designed to take in. If your API can handle either case, then I'd decide based on which option yields cleaner, clearer, simpler code for us. Can you point me to the validator function(s) that we're talking about here, so I can look at their params/returns?
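A minimal, library-free sketch of the "validate each string, then raise one combined error" option discussed above. Everything here is hypothetical: `validate_hed_strings` and `toy_validator` are illustration names, and the toy validator only checks parenthesis balance as a stand-in for whatever the hedtools validator would actually report.

```python
def validate_hed_strings(hed_strings, validate_one):
    """Raise ValueError listing every HED string that failed validation.

    ``validate_one`` is a hypothetical callable that returns a printable
    issue string for one HED string ("" when the string is valid).
    """
    problems = [
        f"annotation {idx}: {msg}"
        for idx, msg in enumerate(map(validate_one, hed_strings))
        if msg
    ]
    if problems:
        header = "Some HED strings in your annotations failed to validate:"
        raise ValueError("\n - ".join([header, *problems]))


def toy_validator(hed_string):
    # Stand-in for the real validator: flags unbalanced parentheses only.
    if hed_string.count("(") != hed_string.count(")"):
        return "unbalanced parentheses"
    return ""


# One good string, one bad string: the error names the offending index.
try:
    validate_hed_strings(["a, (b, c)", "a, (b"], toy_validator)
except ValueError as err:
    message = str(err)
```

Reporting the annotation index alongside each issue is one way to cover the "which element had the error" context that the ErrorHandler is described as managing.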

yes. Here I'm assuming that each annotated segment will have a single string (containing comma-separated tags) associated with it (i.e., it's neither necessary nor allowed to associate a list of HED strings with a single annotation). Correct me if I'm wrong about that please.

Yes there will be a single string but I need clarification on what an annotated segment is. A HED annotation would ordinarily be associated with a single time marker. Assuming that you are not using the Onset, Inset, Offset mechanism, but rather sticking to Duration this would be fine.

NOTE: If you have a table of events (with onset and HED annotations), you can also compute the (Event-context, (tags representing ongoing event processes)). I assume we would not be using that -- but that MNE would take care of that.

Can you point me to the validator function(s) that we're talking about here, so I can look at their params/returns?

For individual strings it is the HedValidator class. The input string would be converted to a HedString before validation. Here is a rough example (you only need one HedValidator for all of the strings).

error_handler = ErrorHandler(check_for_warnings=False)
validator = HedValidator(schema)
hed_obj = HedString(mystring, schema)
issues = validator.validate(hed_obj, False, error_handler=error_handler)
issue_str = get_printable_issue_string(issues)

If you want to validate an entire column (with header "HED"), you would create a TabularInput and call its validate method.

I need clarification on what an annotated segment is. A HED annotation would ordinarily be associated with a single time marker. Assuming that you are not using the Onset, Inset, Offset mechanism, but rather sticking to Duration this would be fine.

in MNE Annotations (and by extension, HEDAnnotations), each "annotation span" has attributes onset, duration, description (and for HEDAnnotations, also hed_string). Duration may be zero.

If you have a table of events (with onset and HED annotations), you can also compute the (Event-context, (tags representing ongoing event processes)). I assume we would not be using that -- but that MNE would take care of that.

I think that is something we can handle later in a separate PR, once the HEDAnnotation class is in place. IIUC, that would be useful when creating epochs (i.e., create epochs around button press events, but only within an event context of "response window is open")

drammock · 2025-01-13T22:46:46Z

mne/annotations.py

+    def append(self, onset, duration, description, ch_names=None):
+        """TODO."""
+        pass
+
+    def count(self):
+        """TODO. Unlike Annotations.count, keys should be HED tags not descriptions."""
+        pass
+
+    def crop(
+        self, tmin=None, tmax=None, emit_warning=False, use_orig_time=True, verbose=None
+    ):
+        """TODO."""
+        pass
+
+    def delete(self, idx):
+        """TODO."""
+        pass
+
+    def to_data_frame(self, time_format="datetime"):
+        """TODO."""
+        pass


@VisLab these TODOs are for me. So as you can see some things aren't going to work yet, but we're already at least able to do:

$ ipython
In [1]: import mne
In [2]: foo = mne.HEDAnnotations([0, 1], [0.5, 1.2], ['foo', 'bar'], ['hed/foo', 'hed/bar'])
In [3]: foo
Out[3]: <HEDAnnotations | 2 segments: hed/bar (1), hed/foo (1)>

Not completely sure what these do, but would be willing to help as needed. Would the get_annotations_per_epoch then have an additional list for HED annotations in the list of lists?

Thanks @drammock
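For readers unsure what the `count()` TODO is asking for: a rough sketch, assuming that "keys should be HED tags" means counting annotations per HED string, as the repr in the session above suggests. `count_hed_strings` and `summarize` are hypothetical helper names, not part of MNE-Python.

```python
from collections import Counter


def count_hed_strings(hed_strings):
    """Return a dict mapping each HED string to how many annotations use it."""
    return dict(Counter(hed_strings))


def summarize(hed_strings):
    """Build a repr-like summary with the keys sorted alphabetically."""
    counts = Counter(hed_strings)
    parts = [f"{key} ({counts[key]})" for key in sorted(counts)]
    return f"<HEDAnnotations | {len(hed_strings)} segments: {', '.join(parts)}>"


summary = summarize(["hed/foo", "hed/bar"])
```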

drammock

updated MWE:

$ ipython
In [1]: import mne
In [2]: foo = mne.HEDAnnotations([0, 1], [0.5, 1.2], ['foo', 'bar'], ['sensory-event, visual-presentation, (blue, square)', 'agent-action, (push, (left, mouse-button))'])
In [3]: foo
Out[3]: <HEDAnnotations | 2 segments: agent-action, (push, (left, mouse-button)) ...>

@VisLab the HED strings are a bit longer than I expected, so in the HEDAnnotations object repr (last line of output above) we only see one before the ellipsis kicks in. Does your library have a built-in way to show a "compact" representation of a HED String? If so I'd like to use it in the repr

drammock · 2025-01-31T20:51:00Z

mne/annotations.py

+        duration,
+        description,
+        hed_strings,
+        hed_version="8.3.0",  # TODO @VisLab what is a sensible default here?


@VisLab what is a sensible default for schema version?

8.3.0 is good. We will be releasing 8.4.0 soon, which has better support for linking to other ontologies, but it is backwards compatible with 8.3.0 as far as actual annotations go.

drammock · 2025-01-31T20:55:01Z

mne/annotations.py

+        return (
+            super().__eq__(self, other)
+            and np.array_equal(self.hed_strings, other.hed_strings)
+            and self.hed_version == other.hed_version


@VisLab if we want to compare equality of two HEDAnnotations objects, and we know already that their HED Strings are equivalent, should we care that they were validated with different HED schema versions?

The tags are the same, but it should be re-validated using the latest version of the schema.

RE: Although once a tag is in the schema, it is always there (unless there is a major version change which we don't anticipate and even then -- every effort would be made to keep tags). This being said, the schema tags have attributes which may affect how they are validated -- they might also have a different path in the hierarchy as upper level tags are added. (That is why the annotations should use the short form as much as possible and use tools to expand if needed.)

In other words -- if you have two datasets and they have different versions of the schema then I think it should work if you revalidate using the latest of the two versions of the schema. (Am I correctly understanding that within a given dataset the files would use a single version of HED?)

Maybe rephrasing my question will help: if I have 2 HED Strings, and as strings they are identical (i.e., they both say "sensory-event, visual-presentation, (blue, square)"), does it even make sense to say "these aren't equal" simply because one was validated against schema version X and another was validated against schema version X+1? When I phrase the question that way, the answer seems obvious to me: schema version doesn't matter when comparing equality of the strings (and thus there is no point to doing the extra computation of re-validating the strings against the newer schema version when testing their equality). But I'm still not clear on whether you'd agree with that.

(this might also help: MNE-Python does not deal with datasets. That is the job of MNE-BIDS. Within MNE-BIDS, I think it is safe to assert/require that only one schema version is used to do all validation of annotations within that dataset. So the question I'm asking is really about the collection of HED strings attached to a single recording and what counts as "the same" when talking about those HED strings)

All of the tools require a single non-conflicting schema version specification. So we are talking about whether two HedString are the same. In looking at the HedString code, it looks like it assumes that the tags have been "sorted" within the string. This is done by the sort method in the parent class HedGroup, which updates the order internally to put it in "canonical form". On this sorted version the __eq__ method detects whether two HedString objects are the same.

we are talking about whether two HedString are the same.

not in this context! I think that is the root of our misunderstanding.

The code (in MNE-Python) that this question is attached to checks equality of HEDAnnotations objects. As part of that, I'm proposing that it should look at the hed_strings entries, and maybe also at the schema versions. In that context, the HED Strings are just plain python strings (type str), they are not hed.HedString instances, and the schema is stored only as a version string (e.g., "8.3.0").

So the question is, how should we assess "equality" of two HEDAnnotations instances? In particular, do we:

1. check equality of the strings as strings, and call it good?

2. also check equality of the schema version strings?

3. convert all entries in hed_strings from str type to HedString type using provided schema version(s), and use the hed library to then assess equality of the HedString instances?

My original question of "should we care about schema version when testing equality of HEDAnnotations objects?" could be rephrased as "should we just do (1) or should we also do (2)?" but I'm now adding option (3) for clarity, since you've explained how equality is tested in your library.

I'll note that it's not (yet) obvious to me that there's added value from the extra computations involved in (3) in the context of testing equality of HEDAnnotations objects, so if you think (3) is the best choice, could you explain why you think so (perhaps by giving an example where 2 identical strings would be parsed as meaningfully different under different schema versions)?

1) and 2) will definitely not work. HED strings are unordered, so (A, (B, C)) is the same as ((C,B), A). There is only one option. At some point you have to convert to HedString objects and apply sort. It is possible to dump out the sorted form as a "canonical" str. I would not recommend doing this until after it validates, since users like to see where in the string an error occurs. You have to convert to HedString to validate, so there might be a convenient time there to store the canonical form in the annotation.

The only option is to compare as HedString

(A, (B, C)) is the same as ((C,B), A)

OK! that's pretty clear motivation for (3).
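To make the ordering point concrete, here is a toy sketch of order-insensitive comparison. It is not the hedtools implementation: `parse_group`, `canonical`, and `same_hed` are hypothetical names, and the parser only understands commas and parentheses, with no schema awareness and no validation beyond balanced parentheses.

```python
def parse_group(text):
    """Parse 'A, (B, C)' into a nested list such as ['A', ['B', 'C']]."""
    stack = [[]]
    token = ""
    for char in text:
        if char in "(),":
            # A delimiter ends the current tag, if there is one.
            if token.strip():
                stack[-1].append(token.strip())
            token = ""
            if char == "(":
                stack.append([])
            elif char == ")":
                if len(stack) == 1:
                    raise ValueError("unbalanced ')'")
                group = stack.pop()
                stack[-1].append(group)
        else:
            token += char
    if token.strip():
        stack[-1].append(token.strip())
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def canonical(node):
    """Render a parsed group with the members of every level sorted."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(sorted(canonical(child) for child in node)) + ")"


def same_hed(first, second):
    """Compare two strings after sorting each group (toy canonical form)."""
    return canonical(parse_group(first)) == canonical(parse_group(second))
```

With this sketch, `same_hed("(A, (B, C))", "((C,B), A)")` is true even though the two inputs differ as plain strings, which is why option (1) cannot work.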

One more question: by the time we're checking equality of HEDAnnotations objects, we'll already have validated the hed_strings variable (validation happens upon object creation, or whenever the strings are changed). As you say, at validation time we have the option of converting to or storing (in a separate attribute) the canonical strings and/or the HedString objects as part of the HEDAnnotations object. I lean away from storing the HedString objects inside the HEDAnnotations object because it will make saving to disk in .fif format much more complicated. But I like the idea of storing the canonical-form strings in a separate (private) attribute. Then, during equality checking, could we just compare the canonical strings and not need to re-convert to HedString (even if the schema versions used to validate the two objects were different)?
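The idea in the last paragraph could look roughly like the sketch below, assuming the answer to the question turns out to be yes. `HEDAnnotationsSketch` is a hypothetical stand-in, not the real class, and `_toy_canonical` only handles flat comma-separated tags; in practice the canonical form would come from converting to HedString and sorting.

```python
def _toy_canonical(hed_string):
    # Stand-in for "convert to HedString, sort, dump as str" (flat tags only).
    return ", ".join(sorted(tag.strip() for tag in hed_string.split(",")))


class HEDAnnotationsSketch:
    """Toy container that caches canonical strings at validation time."""

    def __init__(self, hed_strings, hed_version="8.3.0"):
        self.hed_strings = list(hed_strings)
        self.hed_version = hed_version
        # Private cache: plain str objects, so saving to disk stays simple.
        self._canonical = [_toy_canonical(s) for s in self.hed_strings]

    def __eq__(self, other):
        if not isinstance(other, HEDAnnotationsSketch):
            return NotImplemented
        # Only canonical strings are compared; hed_version is ignored here.
        return self._canonical == other._canonical


a = HEDAnnotationsSketch(["blue, square"], hed_version="8.3.0")
b = HEDAnnotationsSketch(["square, blue"], hed_version="8.4.0")
```

Here `a == b` holds despite the different tag order and schema versions, while the user-facing `hed_strings` keep their original order so that error positions remain meaningful.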

VisLab · 2025-04-29T11:22:37Z

mne/annotations.py

+            for hs in self._hed_strings
        ]
-        if any(map(len, error_strs)):
+        error_strings = [self.hed.get_printable_issue_string(issue) for issue in issues]


You could just do:
error_string = self.hed.get_printable_issue_string(issues)

Also, do you want to add links to the error descriptions in the HED specification? If so, you could do:

error_string = self.hed.get_printable_issue_string(issues, add_link=True)

Is there a setting in MNE for reporting only errors and not warnings?

VisLab · 2025-04-29T12:48:11Z

Once this is merged, I would be happy to contribute to tests, documentation, and examples. (I'll need some guidance on where it should go and style.)

bruAristimunha · 2026-02-16T21:23:55Z

Hey @VisLab and @drammock, can I help somehow?

VisLab · 2026-02-17T11:52:53Z

@drammock - Could you provide an update on where we are with this? HED is in the process of being added in the core NWB and is already in BIDS. I understand that there might have been some recent updates to event handling in mne-python. This might be a good time to revisit. ( @neuromechanist )

drammock · 2026-02-17T15:57:53Z

@bruAristimunha it's been a long time since I touched this (it ended up being more work than I expected). I think all that's left to implement is crop() and to_data_frame() so if you're up for it, feel free to PR-into-my-branch to add those (and update the test to test them). @VisLab I'm not sure what updates to event handling you're referring to, though perhaps @PierreGtch's recent MNE-BIDS changes re: metadata on annotation spans might need to be taken account of here.

bruAristimunha · 2026-02-17T22:02:17Z

I can work on this! @VisLab can you invite me to your mne fork? This would allow me to commit here.

VisLab · 2026-02-17T22:59:41Z

I can work on this! @VisLab can you invite me to your mne fork? This would allow me to commit here.

I have invited you as a collaborator. Thanks!

# Conflicts: # mne/annotations.py # mne/tests/test_annotations.py

The try/except block was redundant because _check_o_d_s_c_e() validates all inputs before super().append() is called with the same pre-validated data. The rollback was also incomplete, only reverting hed_string while leaving base class arrays inconsistent.
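The reasoning in that commit message follows a general "validate everything, then mutate" pattern. A minimal sketch under stated assumptions: `SpanList` is a hypothetical toy container with two parallel lists, not the MNE class, and its checks stand in for what `_check_o_d_s_c_e()` does.

```python
class SpanList:
    """Toy container whose parallel lists must always stay the same length."""

    def __init__(self):
        self.onsets = []
        self.hed_strings = []

    def append(self, onsets, hed_strings):
        onsets, hed_strings = list(onsets), list(hed_strings)
        # Step 1: validate all inputs before touching any state.
        if len(onsets) != len(hed_strings):
            raise ValueError(
                f"Number of HED strings ({len(hed_strings)}) must match the "
                f"number of onsets ({len(onsets)})."
            )
        if any(onset < 0 for onset in onsets):
            raise ValueError("Onsets must be non-negative.")
        # Step 2: mutate. Nothing below can fail, so no rollback is needed.
        self.onsets.extend(onsets)
        self.hed_strings.extend(hed_strings)
        return self


spans = SpanList().append([0, 1], ["a", "b"])
try:
    spans.append([2], ["c", "d"])  # mismatched lengths are rejected
except ValueError:
    pass
```

Because the failed call raises before either list is modified, both lists keep their earlier contents and stay aligned.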

bruAristimunha · 2026-02-20T22:57:48Z

ping @drammock and @VisLab, should be fine now :)

drammock

thanks @bruAristimunha
I had a quick look through the diff, leaving a few comments. Will try to do a more thorough test-drive / review next week.

drammock · 2026-02-20T23:06:27Z

mne/tests/test_annotations.py

+    # slice/list indexing should preserve HED string alignment
+    subset = ann[:2]
+    assert list(subset.hed_string) == [good_values["word"], good_values["tone"]]
+    picked = ann[[0, 2]]
+    assert list(picked.hed_string) == [good_values["word"], good_values["square"]]


cut this? it should be already guaranteed by the getitem test above, right? I mean: we don't do anything different/custom with how slices or lists of indices are handled, we just triage those to the superclass.

drammock · 2026-02-20T23:21:32Z

mne/annotations.py

+            hed_version=self._hed_version,
+            orig_time=self.orig_time,
+            ch_names=result.ch_names,
+            extras=result.extras,


will this fail if with_extras=False? (because result.extras won't exist?)

drammock · 2026-02-20T23:25:31Z

mne/annotations.py

+
+            .. versionadded:: 1.7


delete this; the versionadded is inaccurate for this class

drammock · 2026-02-20T23:26:46Z

mne/tests/test_annotations.py

+    # test to_data_frame()
+    ann_df = HEDAnnotations(
+        onset=[1, 3, 5],
+        duration=[0.5, 0.5, 0.5],
+        description=["a", "b", "c"],
+        hed_string=[good_values["press"], good_values["tone"], good_values["square"]],
+    )
+    pytest.importorskip("pandas")
+    df = ann_df.to_data_frame()
+    assert "hed_string" in df.columns
+    assert list(df["hed_string"]) == [
+        good_values["press"],
+        good_values["tone"],
+        good_values["square"],
+    ]
+    assert list(df["description"]) == ["a", "b", "c"]
+    assert_allclose(df["duration"], [0.5, 0.5, 0.5])


probably good to make the DataFrame stuff as a separate test, since it involves an importorskip

bruAristimunha · 2026-02-21T04:29:59Z

mne/annotations.py

+
+            .. versionadded:: 1.7


Suggested change

.. versionadded:: 1.7

Added a TODO to start implementation of HED support in annotations

8f0c018

pre-commit-ci bot and others added 2 commits January 13, 2025 13:33

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f6ccdc

for more information, see https://pre-commit.ci

add sketch of HEDAnnotations [ci skip]

c31e837

drammock force-pushed the hed-annotations branch from 1d5fa0b to c31e837 Compare January 13, 2025 22:45

drammock reviewed Jan 13, 2025


VisLab and others added 6 commits January 31, 2025 06:25

Merge branch 'main' into hed-annotations

a593841

rename hed_tags -> hed_strings

b3183d3

remove unnecessary TODOs

4065433

add basic validation

5240041

fix err message indentation

40311f0

don't call it foo

1485356

drammock reviewed Jan 31, 2025


drammock mentioned this pull request Feb 25, 2025

Consider reverting lazy imports #13121

Open

drammock added 13 commits April 9, 2025 17:41

store HedString objects; use short form for repr

75a4d4f

get validation working on __setitem__

fd7933d

better repr, better test, use singular form

456830a

Merge branch 'main' into hed-annotations

6d08c67

simplify

233f515

make it xrefable

5248e9e

importorskip

7c55d59

docstring

f99ac08

load schema only once

1565956

clean up test

aed0b69

fill doc

5959c64

codecomment

30aaade

serialization

0c79618

drammock requested review from agramfort, dengemann and larsoner as code owners April 16, 2025 22:32

use internal version comparator

a2c01fc

VisLab commented Apr 29, 2025


drammock added 3 commits April 29, 2025 10:51

get append and sort working

ef9d563

get count working

4a5364e

get delete() working

308b933

bruAristimunha added 4 commits February 20, 2026 23:13

Move HED crop logic to subclass and fix test lint

bd77fc9

Merge remote-tracking branch 'upstream/main' into hed-annotations

b2f870a

# Conflicts: # mne/annotations.py # mne/tests/test_annotations.py

Align HEDAnnotations extras behavior with base annotations

a3d1436

drammock reviewed Feb 20, 2026


bruAristimunha reviewed Feb 21, 2026


