
WIP: Numerical sorting of all types #1

Open
jotelha wants to merge 5 commits into NanoCIPHER-Lab:master from jotelha:2026-02-05-numerical-sorting-of-all-types

jotelha commented Feb 5, 2026

Dear topotools maintainers,

I am addressing the following issue:

When processing LAMMPS data files that contain more than 9 types of an entity (e.g. atom types, bond types, angle types, dihedral types, improper types), topotools does not preserve the types, but reassigns them in alphabetical order. As an example, I use the following representation of a single DSPC molecule, solute.data:

LAMMPS data file via write_data, version 22 Jul 2025, timestep = 105000, units = real

142 atoms
13 atom types
141 bonds
9 bond types
274 angles
16 angle types
385 dihedrals
22 dihedral types
2 impropers
1 improper types

0 39.4932 xlo xhi
0 39.4932 ylo yhi
12.424915 88.029929 zlo zhi

Masses

1 15.9994
2 12.011
3 12.011
4 12.011
5 15.9994
6 12.011
7 12.011
8 15.9994
9 30.97376
10 14.0067
11 15.9994
12 1.008
13 1.008

Pair Coeffs # lj/cut/coul/long

1 0.21 2.96
2 0.105 3.75
3 0.066 3.5
4 0.066 3.5
5 0.17 3
6 0.066 3.5
7 0.066 3.5
8 0.14 2.9
9 0.2 3.74
10 0.17 3.25
11 0.2 3.15
12 0.03 2.5
13 0.03 2.5

Bond Coeffs # harmonic

1 570 1.229
2 268 1.529
3 340 1.09
4 214 1.327
5 317 1.522
6 320 1.41
7 230 1.61
8 525 1.48
9 367 1.471

Angle Coeffs # harmonic

1 83 123.4
2 80 120.4
3 58.35 112.7
4 37.5 110.7
5 33 107.8
6 83 116.9
7 63 111.1
8 35 109.5
9 50 109.5
10 81 111.4
11 100 120.5
12 45 102.6
13 100 108.23
14 80 111.2
15 50 113
16 140 119.9

Dihedral Coeffs # opls

1 0 5.124 0 0
2 -0.276999 1.228 -0.694 0
3 0 0 0 0
4 1.3 -0.05 0.2 0
5 0 0 0.3 0
6 2.39006e-06 0 0.357 0
7 -3.58509e-06 0 0.384 0
8 2.39006e-06 0 0.302 0
9 -1.22 -0.126001 0.422 0
10 -2.39006e-06 0 0.198 0
11 -1.697 -0.455999 0.585 0
12 4.78011e-06 0 -0.075999 0
13 2.39006e-06 0 -0.553 0
14 -2.39006e-06 0 0.132 0
15 -0.55 0 0 0
16 1.711 -0.5 0.663001 0
17 2.39006e-06 0 0.468 0
18 4.669 5.124 0 0
19 0 2.99 0 0
20 0 -2.4 0.5 0
21 -1.19503e-06 0 0.561999 0
22 1.438 -0.123999 0.264 0

Improper Coeffs # cvff

1 10.5 -1 2

...

When processed with

package require topotools

topo readlammpsdata solute.data

topo writelammpsdata solute_sorted.data

in VMD, the processed result in solute_sorted.data is

LAMMPS data file. CGCMM style. atom_style full generated by VMD/TopoTools v1.10 on Thu Feb 05 13:41:43 JST 2026
 142 atoms
 141 bonds
 274 angles
 385 dihedrals
 2 impropers
 13 atom types
 9 bond types
 16 angle types
 22 dihedral types
 1 improper types
 -0.176319 39.316879  xlo xhi
 3.519344 43.012542  ylo yhi
 12.307331 87.912342  zlo zhi

# Pair Coeffs
#
# 1  1
# 2  10
# 3  11
# 4  12
# 5  13
# 6  2
# 7  3
# 8  4
# 9  5
# 10  6
# 11  7
# 12  8
# 13  9

# Bond Coeffs
#
# 1  1
# 2  2
# 3  3
# 4  4
# 5  5
# 6  6
# 7  7
# 8  8
# 9  9

# Angle Coeffs
#
# 1  1
# 2  10
# 3  11
# 4  12
# 5  13
# 6  14
# 7  15
# 8  16
# 9  2
# 10  3
# 11  4
# 12  5
# 13  6
# 14  7
# 15  8
# 16  9

# Dihedral Coeffs
#
# 1  1
# 2  10
# 3  11
# 4  12
# 5  13
# 6  14
# 7  15
# 8  16
# 9  17
# 10  18
# 11  19
# 12  2
# 13  20
# 14  21
# 15  22
# 16  3
# 17  4
# 18  5
# 19  6
# 20  7
# 21  8
# 22  9

# Improper Coeffs
#
# 1  1

...

i.e., types that were originally numbered 10, 11, ... are now numbered 2, 3, ... in the Masses section and in every Coeffs section with more than 9 types, and the subsequent types are offset accordingly.
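The renumbering shown above is what a plain string sort of the type labels produces. The following is a minimal Python sketch, not the topotools code itself (topotools is written in Tcl); it assumes that the labels are compared as strings and that new type IDs are handed out in sorted order. All variable names are illustrative.

```python
# Type labels "1" .. "13", held as strings as in the Masses section above.
labels = [str(i) for i in range(1, 14)]

# String (lexicographic) sort: "10" sorts before "2" because "1" < "2".
ascii_sorted = sorted(labels)

# New type ID -> original label, mirroring the "# Pair Coeffs" comment block.
renumbering = {new_id: old for new_id, old in enumerate(ascii_sorted, start=1)}

# Integer sort keeps the original numbering.
int_sorted = sorted(labels, key=int)

for new_id, old in renumbering.items():
    print(new_id, old)
```

The printed mapping starts with `1 1`, `2 10`, `3 11` and ends with `13 9`, which matches the comment block in solute_sorted.data. Sections with at most 9 types, such as Bond Coeffs here, are unaffected because single-digit strings sort the same way as their integer values.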

Thus, it becomes very cumbersome to reassign non-bonded and bonded parameters correctly.

Ideally, I would like the above topotools read/write snippet to act idempotently, preserving types.

With the help of Claude, see https://claude.ai/share/a54d8e69-f336-4e1e-9bbf-3152fbfefa96, I have applied the changes in this commit to preserve the original numbering of types.

I attach the files in question for reference.

sample_data_files.zip

I suspect that there has been a specific intention in sorting types alphabetically in topotools initially, and thus these changes might break desired behavior elsewhere. I have hence marked this PR as work in progress and hope it can serve to start a discussion and document the somewhat counter-intuitive sorting behavior.

Best,

Johannes

jotelha commented Feb 5, 2026

The last commit, 156faf9, addresses another issue that I came across. Missing $ signs led to all masses being set to 1 in the output.

@jvermaas

Hi @jotelha, can I ask why you are using numbers instead of alphanumerics here? I do most of my work with biomolecular force fields that benefit from non-numeric atom types, where sorting by ASCII rather than by integer value is actually desirable, since we absolutely cannot assume that atom types are integers for the general user. If you try to sort a non-numeric list with -integer, it will fail immediately, so while this may work for purely integer types in LAMMPS, it will break for other users.
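The failure mode described here can be reproduced outside Tcl. The sketch below is a Python analogue, offered under the assumption that it behaves like Tcl's `lsort -integer`, which rejects elements that are not integers; the labels are hypothetical examples of non-numeric atom types and are not taken from the attached files.

```python
# Hypothetical mix of non-numeric and numeric type labels.
labels = ["CT1", "HA", "OT", "10", "2"]


def sort_as_integers(items):
    """Sort labels by integer value; raises ValueError on non-numeric labels."""
    return sorted(items, key=int)


try:
    sort_as_integers(labels)
    failed = False
except ValueError as err:
    failed = True
    print("integer sort rejected the list:", err)

# A string sort accepts any label, which is why it is the safer default.
string_sorted = sorted(labels)
print(string_sorted)
```

The string sort succeeds on every input, at the cost of placing "10" before "2".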

@jvermaas

I also see now that LAMMPS allows alphanumeric types, so topotools looks to be doing things the right way right now. I don't think I'd be opposed to adding a flag to topo writelammpsdata that changes the sorting behavior, but we cannot change the defaults, as it would break most other workflows.
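One way to picture such an opt-in switch is sketched below in Python. This is only an illustration of the idea under discussion: `order_type_labels` and its `numeric` parameter are hypothetical names, not an existing or planned topotools interface, and the real change would have to be made in Tcl.

```python
def order_type_labels(labels, numeric=False):
    """Return type labels in the order in which they would be numbered 1..N.

    numeric=False (default): plain string sort, i.e. the current behavior.
    numeric=True: opt-in integer sort; fails clearly on non-integer labels.
    """
    if not numeric:
        return sorted(labels)
    try:
        return sorted(labels, key=int)
    except ValueError as err:
        raise ValueError(
            "numeric ordering requested, but at least one label is not an integer"
        ) from err


print(order_type_labels(["1", "10", "2"]))                # ['1', '10', '2']
print(order_type_labels(["1", "10", "2"], numeric=True))  # ['1', '2', '10']
```

Keeping the string sort as the default leaves existing workflows with alphanumeric types untouched, while users with purely integer types can request the numeric ordering explicitly.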

jotelha commented Feb 19, 2026

Hi @jvermaas, I see. To be honest, I have never encountered a LAMMPS data file that uses type labels other than integers, and I was not aware that this is even possible with LAMMPS. It appears that support for alphanumeric type labels is comparatively new (2022, https://docs.lammps.org/Howto_type_labels.html). I work in the niche of computational nanotribology, where we usually deal with mixed systems that model solid-liquid-solid interfaces, e.g. a probe in solvent sliding across adsorbed solute on a substrate. Here, it is common to have all atom types (and bonded interaction types) numbered from 1 to N in the LAMMPS data files, and as soon as N > 9, topotools scrambles the ordering with its current alphanumeric sorting behavior.

I think introducing a flag to topo writelammpsdata would be the best way. Let's keep this PR open as it is and I will come back to implementing a behavior switch flag when I find some time, if you agree.

Best,

Johannes
