@ElliottKasoar ElliottKasoar commented Dec 5, 2025

As discussed in aiidateam/aiida-workgraph#735, this adds a deserializer for TrajectoryData, allowing it to be passed to a task.

Also adds a Trajectory data class, which allows a list of Atoms to be returned. For example:

from aiida import load_profile
from aiida.orm import StructureData, TrajectoryData
from aiida_pythonjob.data.atoms import Trajectory
from aiida_workgraph import WorkGraph, task
from ase.io import read

load_profile()

struct = read("../structures/NaCl-traj.xyz")
traj = [struct, struct]
struct_data = StructureData(ase=struct)
traj_data = TrajectoryData([struct_data, struct_data])
trajectory = Trajectory(traj)

@task
def test_func(x):
    print(x)
    if isinstance(x, list):
        x = Trajectory(x)

    return x

wg = WorkGraph("test_wg")

# wg.inputs.x = struct
# wg.inputs.x = struct_data
# wg.inputs.x = traj
# wg.inputs.x = traj_data
wg.inputs.x = trajectory

wg.add_task(test_func, "test", x=wg.inputs.x)
wg.outputs.result = wg.tasks.test.outputs.result

# Run the WorkGraph
wg.run()

Of the candidate values for wg.inputs.x above, all except the raw list of ASE Atoms (traj) can be passed into and returned from the task successfully.

I've tried to add tests for these, but couldn't find a way to test the serialisation of the StructureData and TrajectoryData, as these seem to shortcut to just returning the same data: https://github.com/aiidateam/aiida-pythonjob/blob/main/src/aiida_pythonjob/data/serializer.py#L115
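
To illustrate why such a test is hard to write, here is a minimal sketch of the pass-through behaviour, assuming the serializer returns any value that is already a Data node unchanged (as the linked line suggests). The classes and the `serialize` function are hypothetical stand-ins, not the aiida-pythonjob implementation:

```python
class Data:
    """Stand-in for an AiiDA Data node (hypothetical, not aiida.orm.Data)."""


class TrajectoryNode(Data):
    """Stand-in for a node type such as TrajectoryData."""

    def __init__(self, frames):
        self.frames = list(frames)


def serialize(value, serializers):
    """Hypothetical serializer: nodes pass through, raw values are converted."""
    if isinstance(value, Data):
        # Short-circuit: already a node, so no registered serializer is called.
        return value
    converter = serializers[type(value)]
    return converter(value)


calls = []


def list_to_node(frames):
    # Record the call so we can see which branch was exercised.
    calls.append("list_to_node")
    return TrajectoryNode(frames)


serializers = {list: list_to_node}

node = TrajectoryNode(["frame0", "frame1"])
same = serialize(node, serializers)
converted = serialize(["frame0", "frame1"], serializers)

print(same is node)  # the node came back untouched
print(calls)         # only the raw list triggered a converter
```

Under that assumption, a test that passes a node in only exercises the pass-through branch, so it cannot say anything about the registered serializers.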

codecov-commenter commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.26%. Comparing base (99d5cd6) to head (c8893e7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aiida_pythonjob/data/atoms.py | 97.05% | 1 Missing ⚠️ |
| src/aiida_pythonjob/data/deserializer.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
+ Coverage   90.14%   90.26%   +0.11%     
==========================================
  Files          22       22              
  Lines        1228     1263      +35     
==========================================
+ Hits         1107     1140      +33     
- Misses        121      123       +2     

@superstar54 superstar54 left a comment

Hi @ElliottKasoar , thanks for the work. This is indeed important for the community.

I have one suggestion. In pythonjob, we keep a one-to-one mapping: one raw Python class <-> one Data node. In this PR, I only see a Trajectory class that inherits from the AiiDA Data class, so I think you still need to add a raw Python class to represent a list of Atoms, e.g.,

from typing import Iterable
from ase import Atoms

class AtomsTrajectory(list):
    """List of ASE Atoms representing a trajectory."""

    def __init__(self, frames: Iterable[Atoms] = ()):
        super().__init__(frames)

    def append(self, item: Atoms) -> None:
        if not isinstance(item, Atoms):
            raise TypeError(f'AtomsTrajectory only accepts ase.Atoms, got {type(item)}')
        super().append(item)

    def extend(self, items: Iterable[Atoms]) -> None:
        for item in items:
            self.append(item)
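
Note that list.__init__ does not call an overridden append, so frames passed to the constructor above would go unchecked. A minimal sketch of one way to close that gap follows. It uses a stand-in Atoms class so that it runs without ASE installed; that stand-in is an assumption for illustration only, and real code would import ase.Atoms instead:

```python
from typing import Iterable


class Atoms:
    """Stand-in for ase.Atoms so the sketch runs without ASE installed."""

    def __init__(self, symbols: str = ""):
        self.symbols = symbols


class AtomsTrajectory(list):
    """List of Atoms frames, validated on construction and on append."""

    def __init__(self, frames: Iterable[Atoms] = ()):
        super().__init__()
        # Route every frame through append() so the type check applies here too.
        self.extend(frames)

    def append(self, item: Atoms) -> None:
        if not isinstance(item, Atoms):
            raise TypeError(f"AtomsTrajectory only accepts Atoms, got {type(item)}")
        super().append(item)

    def extend(self, items: Iterable[Atoms]) -> None:
        for item in items:
            self.append(item)


traj = AtomsTrajectory([Atoms("C"), Atoms("C")])
traj.append(Atoms("O"))
print(len(traj))

try:
    AtomsTrajectory([Atoms("C"), "not atoms"])
except TypeError as err:
    print(f"rejected: {err}")
```

Other mutating methods (insert, item assignment, +=) still bypass the check in this sketch; whether they need covering depends on how strict the class should be.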

The name Trajectory is too broad and may be confused with TrajectoryData from aiida-core, so I suggest using AtomsTrajectoryData for the Data class. In the entry point, we would then write:

[project.entry-points."aiida.data"]
"pythonjob.ase.trajectory.AtomsTrajectory": "aiida_pythonjob.data.trajectory:AtomsTrajectoryData"

Here is an example of using it.

@task
def make_supercell(trajectory: AtomsTrajectory):
    return AtomsTrajectory([atoms*[2, 2, 2] for atoms in trajectory])

Comment on lines +80 to +82
data = Trajectory([Atoms("C"), Atoms("C")])
serialized_data = general_serializer(data, serializers=all_serializers)
assert isinstance(serialized_data, Trajectory)

I am confused by this part. The input data is a Trajectory, and it is then serialized to a Trajectory again?
