@rlskoeser (Member) commented Feb 12, 2026

resolves #127

Uses a code-generation script to create the grammar for month names (full and abbreviated) from a list of languages. The approach for the hatch codegen script is based on a conversation with claude.ai.

Summary by CodeRabbit

  • New Features
    • Gregorian date parsing added to the combined parser with flexible day/month/year orders; recognizes full and abbreviated month names (English, French, German, Spanish, Kinyarwanda, Ganda, Tigrinya) and ignores periods/commas.
  • Tests
    • New tests covering parsing, parsing errors, transformation, and converter behavior across precisions and locales.
  • Documentation
    • Added instructions for regenerating the multilingual month-name grammar.
  • Chores
    • Added a generation script and build configuration to produce the multilingual month grammar.

@coderabbitai bot (Contributor) commented Feb 12, 2026

Warning

Rate limit exceeded

@rlskoeser has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds Gregorian date parsing and integration: new Lark grammar and autogenerated multilingual month tokens, a module-level parser, transformer, and converter with parse(), combined-parser registration, tests, a codegen script, and a hatch env entry to regenerate month grammar.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gregorian parser & transformer: `src/undate/converters/calendars/gregorian/parser.py`, `src/undate/converters/calendars/gregorian/transformer.py`, `src/undate/converters/calendars/gregorian/converter.py`, `src/undate/converters/calendars/gregorian/__init__.py` | Adds a module-level Lark parser, a GregorianDateTransformer (tree→Undate), a GregorianDateConverter with parse() and transformer initialization, and exports the converter. |
| Gregorian grammars: `src/undate/converters/grammars/gregorian.lark`, `src/undate/converters/grammars/gregorian_multilang.lark` | Adds a new gregorian grammar supporting flexible token orders and day/year rules; replaces month tokens with multiline regex-based multilingual month rules (month_1…month_12). |
| Combined parser integration: `src/undate/converters/combined.py`, `src/undate/converters/grammars/combined.lark` | Registers Gregorian transformer in the combined transformer, imports the gregorian_date rule into the omnibus grammar, adds a PUNCTUATION token/ignore, and re-enables Hebrew/Islamic overrides. |
| Codegen & build config: `scripts/generate_gregorian_grammar.py`, `pyproject.toml` | Adds a script to generate gregorian_multilang.lark using Babel month names and a Hatch env [tool.hatch.envs.codegen] to run it. |
| Tests: `tests/test_converters/test_calendars/test_gregorian/*`, `tests/test_converters/test_combined_parser.py` | Adds parser/transformer/converter unit tests for Gregorian inputs (multilingual months, precisions, label preservation) and appends Gregorian cases to combined parser tests. |
| Changelog & docs: `CHANGELOG.md`, `DEVELOPER_NOTES.md` | Adds a 0.7 changelog entry describing month-name multilingual support and dev notes describing how to regenerate the multilingual month grammar. |
| New file (generated): `src/undate/converters/grammars/gregorian_multilang.lark` | Autogenerated month token regex rules covering English, French, German, Spanish, Kinyarwanda, Ganda, Tigrinya (month_1…month_12). |

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Converter as GregorianDateConverter
    participant Parser as gregorian_parser
    participant Transformer as GregorianDateTransformer
    participant UndateObj as Undate

    User->>Converter: parse("26 Ugushyingo 2022")
    Converter->>Parser: parse(value)
    Parser->>Parser: tokenize & match gregorian_date
    Parser-->>Converter: parse_tree
    Converter->>Transformer: transform(parse_tree)
    Transformer->>Transformer: extract year, month, day
    Transformer-->>Converter: UndateObj
    Converter-->>User: return Undate (label set)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • robcast
  • ColeDCrawford
  • jdamerow

Poem

🐰
I hopped through months in tongues and scripts,
From Jan to ጥሪ with tiny skips,
I stitched a grammar, neat and light—
Now dates from many lands take flight! 🥕📅

🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (8 files, all content conflicts): `CHANGELOG.md`, `DEVELOPER_NOTES.md`, `pyproject.toml`, `src/undate/converters/calendars/hebrew/transformer.py`, `src/undate/converters/calendars/islamic/transformer.py`, `src/undate/converters/combined.py`, `src/undate/converters/grammars/combined.lark`, `tests/test_converters/test_combined_parser.py`. These conflicts must be resolved before merging into develop. | Resolve conflicts locally and push changes to this branch. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature: adding Gregorian date parsing in standard text formats with multilingual month name support. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from issue #127: multilingual month name parsing for Kinyarwanda, Ganda, and Tigrinya languages alongside English, French, German, and Spanish. |
| Out of Scope Changes check | ✅ Passed | All changes directly support Gregorian date parsing with multilingual month names; no unrelated modifications detected outside the scope of issue #127. |


@codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.00%. Comparing base (0691b12) to head (3efce6d).
⚠️ Report is 7 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #160      +/-   ##
===========================================
+ Coverage    98.96%   99.00%   +0.04%     
===========================================
  Files           40       45       +5     
  Lines         2121     2216      +95     
===========================================
+ Hits          2099     2194      +95     
  Misses          22       22              


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/undate/converters/combined.py (1)

1-5: ⚠️ Potential issue | 🟡 Minor

Module docstring not updated to include Gregorian.

The module-level docstring on line 2 still says "Supports EDTF, Hebrew, and Hijri" while the class docstring on line 50 was correctly updated to include Gregorian.

Proposed fix
 """
-**Experimental** combined parser. Supports EDTF, Hebrew, and Hijri
+**Experimental** combined parser. Supports EDTF, Gregorian, Hebrew, and Hijri
 where dates are unambiguous. (Year-only dates are parsed as EDTF in
 Gregorian calendar.)
 """
🤖 Fix all issues with AI agents
In `@scripts/generate_gregorian_grammar.py`:
- Around line 37-46: The grammar currently preserves Babel's exact month casing
(from get_month_names) causing case-sensitive parsing; normalize month names to
a consistent case before adding to all_month_names and emitting the grammar:
when iterating months in the nested loops, apply month_name =
month_name.strip(".").lower() (or .upper() if you prefer) and deduplicate using
that normalized value so the generated terminals are lowercase and the parser
accepts case-insensitive inputs; update any downstream places that consume
all_month_names to expect the normalized form.
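
One way to picture the generator-side normalization described above is the sketch below. It is not the PR's script: the `SAMPLE_JANUARY` data is a hypothetical stand-in for whatever Babel's `get_month_names` returns, and the helper names `normalized_names` and `month_pattern` are invented for illustration. It strips periods, lowercases, deduplicates, and builds a single case-insensitive alternation.

```python
import re

# Hypothetical sample data standing in for Babel output; only January is
# shown, and the real script's values and locale list may differ.
SAMPLE_JANUARY = {
    "en": ["January", "Jan"],
    "fr": ["janvier", "janv."],
    "de": ["Januar", "Jan"],
    "es": ["enero", "ene"],
}


def normalized_names(names_by_locale):
    """Strip periods, lowercase, and deduplicate month names."""
    seen = set()
    for names in names_by_locale.values():
        for name in names:
            seen.add(name.strip(".").lower())
    # Longest first, so "january" is tried before its prefix "jan".
    return sorted(seen, key=lambda n: (-len(n), n))


def month_pattern(names):
    """Build one case-insensitive alternation for a month terminal."""
    return re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)


names = normalized_names(SAMPLE_JANUARY)
pattern = month_pattern(names)
print(names)
print(bool(pattern.fullmatch("JANUARY")), bool(pattern.fullmatch("Janv")))
```

Lark terminals also accept an `i` flag on string and regex literals, so emitting the generated alternation with that flag may be an alternative to lowercasing every name; which option fits better depends on how the generated file is consumed.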

In `@src/undate/converters/calendars/gregorian/converter.py`:
- Line 103: The inline comment in
src/undate/converters/calendars/gregorian/converter.py incorrectly mentions
"Hebrew" (copy-paste error); update the comment near the Gregorian parser
invocation (the parsing block in the converter module where the string is
parsed) to correctly say "Gregorian" (e.g., replace "# parse the string with our
Hebrew date parser" with a comment referencing the Gregorian date parser) so it
accurately reflects the behavior of the parse/convert routine in this converter.
- Around line 110-111: The except clauses in the calendar converters currently
catch only UnexpectedCharacters; update them to catch
lark.exceptions.UnexpectedInput instead (or import UnexpectedInput from
lark.exceptions) so UnexpectedToken and UnexpectedEOF are also handled and
re-raised as ValueError; apply this change in the Gregorian converter
(converter.py) and the same pattern in the hebrew, islamic, edtf, and combined
converter modules, ensuring the corresponding except blocks (the ones that
currently read "except UnexpectedCharacters as err") are replaced and that the
module imports include UnexpectedInput where needed.

In `@src/undate/converters/grammars/combined.lark`:
- Around line 4-7: The global `%ignore PUNCTUATION` directive in combined.lark
removes the comma token needed by the Hebrew and Islamic grammars (they define
`comma: ","` and use it in their date rules), so restore comma token visibility
by removing or narrowing that ignore: either delete `PUNCTUATION`/`%ignore
PUNCTUATION` from combined.lark, or change the approach so commas are not
ignored (e.g., apply punctuation-ignore only inside the Gregorian subgrammar or
adjust the PUNCTUATION pattern to exclude ","), ensuring the `comma` rule in the
Hebrew and Islamic grammar can be lexed and matched.

In `@src/undate/converters/grammars/gregorian_multilang.lark`:
- Around line 4-15: The grammar is case-sensitive so lowercase user input like
"january" won't match; update the generator or parsing flow: modify
scripts/generate_gregorian_grammar.py to emit month literals in a single case
(e.g., all lowercase) OR (simpler) normalize text before parsing by lowercasing
inputs in the converter that invokes the Lark parser for
gregorian_multilang.lark (i.e., call .lower() on the raw date string before
passing it to Lark), and ensure the generated grammar's month alternatives are
also lowercased (regenerate the .lark file) so names line up.
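
The input-side option could look roughly like the sketch below. The helper name `split_label_and_parse_text` is hypothetical, and whether the converter keeps the original string as the label is an assumption based on the "label preservation" tests mentioned in the walkthrough.

```python
def split_label_and_parse_text(value: str) -> tuple[str, str]:
    """Return (label, text_to_parse): the input as typed and a lowercased copy."""
    label = value.strip()
    # Parse the lowercased copy; keep the original casing for display.
    return label, label.lower()


label, text = split_label_and_parse_text("  26 NOVEMBER 2022 ")
print(label)  # 26 NOVEMBER 2022
print(text)   # 26 november 2022
```

Scripts without letter case, such as the Ethiopic script used for Tigrinya, pass through `lower()` unchanged. One caveat if this were applied in the combined parser rather than only the Gregorian converter: EDTF writes unspecified digits with an uppercase `X` (for example `201X`), so blanket lowercasing there might change what the EDTF grammar sees.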
🧹 Nitpick comments (2)
tests/test_converters/test_calendars/test_gregorian/test_gregorian_transformer.py (1)

21-31: Consider adding a test case for year+month order (e.g., "1960 Jan").

The transformer tests don't cover the year-before-month ordering that the parser tests include (e.g., "1900 Feb" in the parser tests). Adding one here would validate the transformer handles all orderings correctly end-to-end.

src/undate/converters/grammars/combined.lark (1)

42-43: Consider uncommenting the Gregorian override to avoid year-only ambiguity.

The comment acknowledges that year-only parsing should be covered by EDTF, yet the gregorian__gregorian_date import on line 32 brings in the full grammar including the year-only alternative. A standalone year like "932" would be ambiguous between edtf__start and gregorian__gregorian_date in the combined parser. The Hebrew and Islamic grammars have their %override directives active (lines 37, 40) for the same reason.

Is there a specific reason this override is commented out? If not, activating it would be consistent with the approach for the other calendars.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/generate_gregorian_grammar.py`:
- Around line 1-11: The docstring in the script generate_gregorian_grammar.py
contains a typo "regeneate" on the line describing how to run the script; update
that word to "regenerate" in the module docstring so the usage text reads "Run
this script with hatch to regenerate the file::".
🧹 Nitpick comments (1)
DEVELOPER_NOTES.md (1)

94-103: Use a fenced code block for consistency with the rest of the file.

The indented code block on Line 101 and the trailing :: on Line 99 (an RST convention) are inconsistent with the fenced (```) style used elsewhere in this Markdown file. Also flagged by markdownlint (MD046).

Suggested fix
-library.  To regenerate, run the script with hatch (which should
-be installed globally)::
+library.  To regenerate, run the script with hatch (which should
+be installed globally):
 
-    hatch run codegen:generate
-    
+```sh
+hatch run codegen:generate
+```
+
 When the `.lark` file is modified by the script, it must be committed to git.
