Add support for parsing Gregorian dates in standard text formats #160

rlskoeser · 2026-02-12T19:45:26Z

resolves #127

Uses a code-generation script to create the grammar for month names (full and abbreviated) based on a list of languages. The approach for the hatch codegen script is based on a conversation with claude.ai
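Here is a minimal sketch of the codegen idea. It assumes the generated grammar has one rule per month, and that each rule's body is a case-insensitive regex alternation of that month's names. `build_month_rules` and the `month_N: /.../i` output shape are hypothetical. The actual script reads names from Babel, and its output may be formatted differently.

```python
import re


def build_month_rules(names_by_month: dict[int, list[str]]) -> str:
    """Return one hypothetical Lark rule per month, e.g. 'month_1: /january|jan/i'."""
    lines = []
    for number in sorted(names_by_month):
        # de-duplicate, then order longest-first (ties alphabetically) so the
        # regex alternation prefers "january" over its prefix "jan"
        names = sorted(set(names_by_month[number]), key=lambda s: (-len(s), s))
        pattern = "|".join(re.escape(name) for name in names)
        # trailing "i" flag makes the Lark regex case-insensitive
        lines.append(f"month_{number}: /{pattern}/i")
    return "\n".join(lines)


print(build_month_rules({1: ["january", "jan", "janvier", "janv"], 2: ["february", "feb"]}))
# month_1: /january|janvier|janv|jan/i
# month_2: /february|feb/i
```

Longer names go first because a regex alternation commits to the first branch that matches at a given position.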

Summary by CodeRabbit

New Features
- Gregorian date parsing added to the combined parser with flexible day/month/year orders; recognizes full and abbreviated month names (English, French, German, Spanish, Kinyarwanda, Ganda, Tigrinya) and ignores periods/commas.
Tests
- New tests covering parsing, parsing errors, transformation, and converter behavior across precisions and locales.
Documentation
- Added instructions for regenerating the multilingual month-name grammar.
Chores
- Added a generation script and build configuration to produce the multilingual month grammar.

coderabbitai · 2026-02-12T19:45:48Z

Warning

Rate limit exceeded

@rlskoeser has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 58 seconds before requesting another review.


Walkthrough

Adds Gregorian date parsing and integration: a new Lark grammar and autogenerated multilingual month tokens, a module-level parser, a transformer, a converter with parse(), combined-parser registration, tests, a codegen script, and a hatch env entry to regenerate the month grammar.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Gregorian parser & transformer | `src/undate/converters/calendars/gregorian/parser.py`, `src/undate/converters/calendars/gregorian/transformer.py`, `src/undate/converters/calendars/gregorian/converter.py`, `src/undate/converters/calendars/gregorian/__init__.py` | Adds a module-level Lark parser, a GregorianDateTransformer (tree→Undate), a GregorianDateConverter with `parse()` and transformer initialization, and exports the converter. |
| Gregorian grammars | `src/undate/converters/grammars/gregorian.lark`, `src/undate/converters/grammars/gregorian_multilang.lark` | Adds a new gregorian grammar supporting flexible token orders and day/year rules; replaces month tokens with multiline regex-based multilingual month rules (month_1…month_12). |
| Combined parser integration | `src/undate/converters/combined.py`, `src/undate/converters/grammars/combined.lark` | Registers the Gregorian transformer in the combined transformer, imports the gregorian_date rule into the omnibus grammar, adds a PUNCTUATION token/ignore, and re-enables Hebrew/Islamic overrides. |
| Codegen & build config | `scripts/generate_gregorian_grammar.py`, `pyproject.toml` | Adds a script to generate `gregorian_multilang.lark` using Babel month names and a Hatch env `[tool.hatch.envs.codegen]` to run it. |
| Tests | `tests/test_converters/test_calendars/test_gregorian/*`, `tests/test_converters/test_combined_parser.py` | Adds parser/transformer/converter unit tests for Gregorian inputs (multilingual months, precisions, label preservation) and appends Gregorian cases to combined parser tests. |
| Changelog & docs | `CHANGELOG.md`, `DEVELOPER_NOTES.md` | Adds a 0.7 changelog entry describing multilingual month-name support and dev notes describing how to regenerate the multilingual month grammar. |
| New file (generated) | `src/undate/converters/grammars/gregorian_multilang.lark` | Autogenerated month token regex rules covering English, French, German, Spanish, Kinyarwanda, Ganda, Tigrinya (month_1…month_12). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Converter as GregorianDateConverter
    participant Parser as gregorian_parser
    participant Transformer as GregorianDateTransformer
    participant UndateObj as Undate

    User->>Converter: parse("26 Ugushyingo 2022")
    Converter->>Parser: parse(value)
    Parser->>Parser: tokenize & match gregorian_date
    Parser-->>Converter: parse_tree
    Converter->>Transformer: transform(parse_tree)
    Transformer->>Transformer: extract year, month, day
    Transformer-->>Converter: UndateObj
    Converter-->>User: return Undate (label set)
```
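The diagram's flow has three steps: parse, transform, and set the label. The sketch below is a toy, pure-Python model of those steps. Everything in it is hypothetical: `MONTHS`, `parse_tokens`, `to_date_parts`, and the dict return value stand in for the Lark parser, `GregorianDateTransformer`, and `Undate`. The "three or more digits means a year" rule is a simplification, not the project's grammar.

```python
import re

# Hypothetical, tiny month table (Kinyarwanda "ugushyingo" = November).
MONTHS = {"january": 1, "jan": 1, "ugushyingo": 11, "november": 11, "nov": 11}


def parse_tokens(value: str) -> list[str]:
    # stand-in for the Lark parse: treat periods/commas as spaces, lowercase, split
    return re.sub(r"[.,]", " ", value).lower().split()


def to_date_parts(tokens: list[str]) -> dict:
    # stand-in for the transformer: pick out year, month, and day by token shape,
    # in whatever order they appear
    parts = {}
    for tok in tokens:
        if tok in MONTHS:
            parts["month"] = MONTHS[tok]
        elif tok.isdigit() and len(tok) >= 3:
            parts["year"] = int(tok)
        elif tok.isdigit():
            parts["day"] = int(tok)
        else:
            raise ValueError(f"Could not parse token {tok!r}")
    return parts


def parse(value: str) -> dict:
    parts = to_date_parts(parse_tokens(value))
    parts["label"] = value  # like the converter, keep the original string as the label
    return parts


print(parse("26 Ugushyingo 2022"))
# {'day': 26, 'month': 11, 'year': 2022, 'label': '26 Ugushyingo 2022'}
```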

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

- Release v0.6 #150: Adds combined parser/transformer framework components that this Gregorian integration builds upon.
- Experimental combined / omnibus date parser #112: Related omnibus/combined parser and transformer infrastructure extended by these changes.
- Rename formatters submodule and classes to converters #101: Introduces converters package patterns and registrations that the new Gregorian converter follows.

Suggested reviewers

robcast
ColeDCrawford
jdamerow

Poem

🐰
I hopped through months in tongues and scripts,
From Jan to ጥሪ with tiny skips,
I stitched a grammar, neat and light—
Now dates from many lands take flight! 🥕📅

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (8 files): ⚔️ `CHANGELOG.md` (content) ⚔️ `DEVELOPER_NOTES.md` (content) ⚔️ `pyproject.toml` (content) ⚔️ `src/undate/converters/calendars/hebrew/transformer.py` (content) ⚔️ `src/undate/converters/calendars/islamic/transformer.py` (content) ⚔️ `src/undate/converters/combined.py` (content) ⚔️ `src/undate/converters/grammars/combined.lark` (content) ⚔️ `tests/test_converters/test_combined_parser.py` (content). These conflicts must be resolved before merging into `develop`. | Resolve conflicts locally and push changes to this branch. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature: adding Gregorian date parsing in standard text formats with multilingual month name support. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from issue `#127`: multilingual month name parsing for Kinyarwanda, Ganda, and Tigrinya alongside English, French, German, and Spanish. |
| Out of Scope Changes check | ✅ Passed | All changes directly support Gregorian date parsing with multilingual month names; no unrelated modifications detected outside the scope of issue `#127`. |


codecov · 2026-02-12T19:46:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.00%. Comparing base (0691b12) to head (3efce6d).
⚠️ Report is 7 commits behind head on develop.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop     #160      +/-   ##
===========================================
+ Coverage    98.96%   99.00%   +0.04%     
===========================================
  Files           40       45       +5     
  Lines         2121     2216      +95     
===========================================
+ Hits          2099     2194      +95     
  Misses          22       22
```


coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/undate/converters/combined.py (1)
1-5: ⚠️ Potential issue | 🟡 Minor

Module docstring not updated to include Gregorian.

The module-level docstring on line 2 still says "Supports EDTF, Hebrew, and Hijri" while the class docstring on line 50 was correctly updated to include Gregorian.
Proposed fix

```diff
 """
-**Experimental** combined parser. Supports EDTF, Hebrew, and Hijri
+**Experimental** combined parser. Supports EDTF, Gregorian, Hebrew, and Hijri
 where dates are unambiguous. (Year-only dates are parsed as EDTF in
 Gregorian calendar.)
 """
```

🤖 Fix all issues with AI agents

In `@scripts/generate_gregorian_grammar.py`:
- Around line 37-46: The grammar currently preserves Babel's exact month casing
(from get_month_names) causing case-sensitive parsing; normalize month names to
a consistent case before adding to all_month_names and emitting the grammar:
when iterating months in the nested loops, apply month_name =
month_name.strip(".").lower() (or .upper() if you prefer) and deduplicate using
that normalized value so the generated terminals are lowercase and the parser
accepts case-insensitive inputs; update any downstream places that consume
all_month_names to expect the normalized form.
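Here is a minimal sketch of the suggested normalization. It assumes the input is a flat list of names, such as the values Babel's `get_month_names` returns for each locale and width. `normalize_month_names` is a hypothetical helper, not the script's actual function.

```python
def normalize_month_names(names: list[str]) -> list[str]:
    """Strip periods, lowercase, and drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for name in names:
        normalized = name.strip(".").lower()
        # skip empties and anything already emitted in another casing/spelling
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


print(normalize_month_names(["January", "Jan.", "jan", "janvier", "janv."]))
# ['january', 'jan', 'janvier', 'janv']
```

`str.casefold()` is a more aggressive alternative to `.lower()`; for example, it maps German "ß" to "ss". Whichever you choose must also be applied to user input, or the grammar and the input will not line up.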

In `@src/undate/converters/calendars/gregorian/converter.py`:
- Line 103: The inline comment in
src/undate/converters/calendars/gregorian/converter.py incorrectly mentions
"Hebrew" (copy-paste error); update the comment near the Gregorian parser
invocation (the parsing block in the converter module where the string is
parsed) to correctly say "Gregorian" (e.g., replace "# parse the string with our
Hebrew date parser" with a comment referencing the Gregorian date parser) so it
accurately reflects the behavior of the parse/convert routine in this converter.
- Around line 110-111: The except clauses in the calendar converters currently
catch only UnexpectedCharacters; update them to catch
lark.exceptions.UnexpectedInput instead (or import UnexpectedInput from
lark.exceptions) so UnexpectedToken and UnexpectedEOF are also handled and
re-raised as ValueError; apply this change in the Gregorian converter
(converter.py) and the same pattern in the hebrew, islamic, edtf, and combined
converter modules, ensuring the corresponding except blocks (the ones that
currently read "except UnexpectedCharacters as err") are replaced and that the
module imports include UnexpectedInput where needed.
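Catching the base class is enough because, in Lark, `UnexpectedCharacters`, `UnexpectedToken`, and `UnexpectedEOF` all subclass `UnexpectedInput`. So that the sketch runs without Lark installed, it uses local stand-in classes with the same hierarchy; `fake_parse` is hypothetical. In the converters, you would replace the stand-ins with `from lark.exceptions import UnexpectedInput`.

```python
class UnexpectedInput(Exception):
    """Stand-in for lark.exceptions.UnexpectedInput, the shared base class."""


class UnexpectedCharacters(UnexpectedInput):
    """Stand-in: the lexer hit a character no terminal matches."""


class UnexpectedToken(UnexpectedInput):
    """Stand-in: the parser got a valid token in the wrong place."""


class UnexpectedEOF(UnexpectedInput):
    """Stand-in: the input ended before a complete date was read."""


def fake_parse(value: str):
    # Hypothetical parser that fails differently depending on the input.
    if not value:
        raise UnexpectedEOF("input ended early")
    if "?" in value:
        raise UnexpectedCharacters("unexpected character '?'")
    raise UnexpectedToken("token out of place")


def parse(value: str):
    try:
        return fake_parse(value)
    except UnexpectedInput as err:
        # One clause covers every subclass; callers only ever see ValueError,
        # with the original Lark-style error kept as __cause__.
        raise ValueError(f"Could not parse '{value}' as a Gregorian date") from err
```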

In `@src/undate/converters/grammars/combined.lark`:
- Around line 4-7: The global `%ignore PUNCTUATION` directive in combined.lark
removes the comma token needed by the Hebrew and Islamic grammars (they define
`comma: ","` and use it in their date rules), so restore comma token visibility
by removing or narrowing that ignore: either delete `PUNCTUATION`/`%ignore
PUNCTUATION` from combined.lark, or change the approach so commas are not
ignored (e.g., apply punctuation-ignore only inside the Gregorian subgrammar or
adjust the PUNCTUATION pattern to exclude ","), ensuring the `comma` rule in the
Hebrew and Islamic grammar can be lexed and matched.

In `@src/undate/converters/grammars/gregorian_multilang.lark`:
- Around line 4-15: The grammar is case-sensitive so lowercase user input like
"january" won't match; update the generator or parsing flow: modify
scripts/generate_gregorian_grammar.py to emit month literals in a single case
(e.g., all lowercase) OR (simpler) normalize text before parsing by lowercasing
inputs in the converter that invokes the Lark parser for
gregorian_multilang.lark (i.e., call .lower() on the raw date string before
passing it to Lark), and ensure the generated grammar's month alternatives are
also lowercased (regenerate the .lark file) so names line up.
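Instead of lowercasing the input, you could make the grammar itself case-insensitive, since Lark terminals accept an `i` flag, as in `/.../i`. Either way, alternation order still matters. This sketch uses Python's `re` directly, with a hypothetical list of January names, to show the effect.

```python
import re
from typing import Optional

# Hypothetical January alternatives (English, French, German, Spanish).
JANUARY = ["jan", "january", "janvier", "januar", "enero"]

# Longest first: re alternation takes the first branch that matches here,
# so listing "jan" first would stop after the first three letters of "january".
PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(JANUARY, key=len, reverse=True)),
    re.IGNORECASE,
)


def month_prefix(text: str) -> Optional[str]:
    """Return the month name at the start of text, matched case-insensitively."""
    match = PATTERN.match(text)
    return match.group(0) if match else None


print(month_prefix("JANUARY 5"))  # JANUARY
print(month_prefix("Januar 1900"))  # Januar
```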

🧹 Nitpick comments (2)

tests/test_converters/test_calendars/test_gregorian/test_gregorian_transformer.py (1)

21-31: Consider adding a test case for year+month order (e.g., "1960 Jan").

The transformer tests don't cover the year-before-month ordering that the parser tests include (e.g., "1900 Feb" in the parser tests). Adding one here would validate the transformer handles all orderings correctly end-to-end.

src/undate/converters/grammars/combined.lark (1)

42-43: Consider uncommenting the Gregorian override to avoid year-only ambiguity.

The comment acknowledges that year-only parsing should be covered by EDTF, yet the gregorian__gregorian_date import on line 32 brings in the full grammar including the year-only alternative. A standalone year like "932" would be ambiguous between edtf__start and gregorian__gregorian_date in the combined parser. The Hebrew and Islamic grammars have their %override directives active (lines 37, 40) for the same reason.

Is there a specific reason this override is commented out? If not, activating it would be consistent with the approach for the other calendars.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@scripts/generate_gregorian_grammar.py`:
- Around line 1-11: The docstring in the script generate_gregorian_grammar.py
contains a typo "regeneate" on the line describing how to run the script; update
that word to "regenerate" in the module docstring so the usage text reads "Run
this script with hatch to regenerate the file::".

🧹 Nitpick comments (1)

DEVELOPER_NOTES.md (1)
94-103: Use a fenced code block for consistency with the rest of the file.

The indented code block on Line 101 and the trailing :: on Line 99 (an RST convention) are inconsistent with the fenced (```) style used elsewhere in this Markdown file. Also flagged by markdownlint (MD046).
Suggested fix

````diff
-library.  To regenerate, run the script with hatch (which should
-be installed globally)::
+library.  To regenerate, run the script with hatch (which should
+be installed globally):
 
-    hatch run codegen:generate
-    
+```sh
+hatch run codegen:generate
+```
+
 When the `.lark` file is modified by the script, it must be committed to git.
````


rlskoeser added 10 commits February 12, 2026 12:30

Preliminary gregorian grammer, parser, and tests

c37a1fe

Fully implement script to generate month names for Gregorian parser

3c58c2e

Grammar with month names in multiple languages

01ebe5e

Import and use all month names

fb29061

Don't repeat month names / abbreviations

6d6259b

Add more test cases in multiple languages

294e573

Test gregorian parser transformer; refine parsing logic

f654e83

Connect parsing to gregorian converter class and test

5b5d89f

Add Gregorian to omnibus parser

6600f58

Document Gregorian parser & languages in change log

2bd8c23

coderabbitai bot reviewed Feb 12, 2026


Add dev notes for codegen script; drop uvx from hatch run command

9ca8424

coderabbitai bot reviewed Feb 12, 2026


scripts/generate_gregorian_grammar.py (resolved)

rlskoeser added 6 commits February 12, 2026 15:17

Make Gregorian parser case-insensitive

e4c468d

Test error handling in gregorian converter parse method

a29a5a4

Catch more generic Lark exception per @coderabbitai

b9c2bf6

Ignore commas and periods across all grammars

bb1d724

Use markdown formatting instead of rst for hatch run command

e16f4d2

Add new undate_common lark grammar to version control

3efce6d
