MDEV-17677: Fix keyword parsing that treated as identifier when immediately followed by dot #4713

Open
Mahmoud-kh1 wants to merge 1 commit into MariaDB:12.3 from Mahmoud-kh1:kw-dot

@Mahmoud-kh1 Mahmoud-kh1 commented Feb 28, 2026

Problem:
Keywords immediately followed by a dot ('.') were incorrectly parsed as identifiers instead of keywords. This caused a syntax error for "SELECT.1", which should be parsed as the keyword SELECT followed by the decimal .1.

How we fix it:
We check whether a token is a keyword before skipping the keyword lookup for qualified identifiers. This allows keywords to still be treated as keywords when they are followed by a dot. With this change, "SELECT.1" parses as expected.

Test:
Added some cases to the parser test.

Bug:
MDEV-17677

@Mahmoud-kh1 Mahmoud-kh1 marked this pull request as draft February 28, 2026 20:56
@Mahmoud-kh1 Mahmoud-kh1 force-pushed the kw-dot branch 3 times, most recently from ee0329f to 362831f Compare February 28, 2026 23:26
@Mahmoud-kh1 Mahmoud-kh1 marked this pull request as ready for review March 1, 2026 00:17
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 2, 2026

@gkodinov gkodinov left a comment

Thank you for your contribution! This is a preliminary review.

Please give your commit a message that complies with CODING_STANDARDS.md.

yylineno++;
}
}
// here we check if the current token is a keyword followed by a dot and then digit

I'm not an expert, so please consider this optional until the final review:
I believe this is the right place to fix it, but I do not think the fix is the right one.

Before your change this is what it looked like:

 if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()])
    next_state= MY_LEX_IDENT_SEP;

This is basically saying the following:
if (the symbol after the current identifier is not preceded by whitespace, is a dot (.), and the character after it is a valid identifier character) then expect an identifier.

Therein lies the issue.
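
The condition quoted above can be modeled in isolation. The following is a minimal sketch, not the server's lexer: `is_ident_char` is a hypothetical ASCII-only stand-in for `ident_map[]` (assumed here to accept letters, digits, '_' and '$'), and `wants_ident_sep` is a made-up helper that receives the input right after the scanned token instead of using `get_ptr()` and `yyPeek()`.

```cpp
#include <cctype>
#include <string>

// Hypothetical ASCII-only stand-in for ident_map[]: true for characters
// assumed to be allowed anywhere in an identifier.
static bool is_ident_char(unsigned char c)
{
  return std::isalnum(c) != 0 || c == '_' || c == '$';
}

// Models the pre-change check. `rest` is the input immediately after the
// token just scanned, with no whitespace skipped.
static bool wants_ident_sep(const std::string &rest)
{
  return rest.size() >= 2 && rest[0] == '.' &&
         is_ident_char(static_cast<unsigned char>(rest[1]));
}
```

In this model both ".abc" and ".1" select the identifier-separator state, so the text after SELECT in "SELECT.1" is handled like the tail of a qualified name.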

According to the SQL standard:

An identifier is defined as:

<regular identifier> ::=
  <identifier body>
<identifier body> ::=
  <identifier start> [ <identifier part>... ]
<identifier part> ::=
  <identifier start>
  | <identifier extend>

An <identifier start> is any character in the Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”,
“Lo”, or “Nl”.
NOTE 94 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, and “Nl” are assigned to Unicode characters
that are, respectively, upper-case letters, lower-case letters, title-case letters, modifier letters, other letters, and letter numbers.

This means that a valid identifier cannot start with a digit!

However, ident_map[] is true for every character in the standard's <identifier part> production (see above).

The difference between <identifier start> and <identifier part>, i.e. <identifier extend>, is defined by the standard as follows:

An <identifier extend> is U+00B7, “Middle Dot”, or any character in the Unicode General Category classes
“Mn”, “Mc”, “Nd”, “Pc”, or “Cf”.
NOTE 95 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, and “Cf” are assigned to Unicode characters that are, respectively, nonspacing marks, spacing combining marks, decimal numbers, connector punctuations, and formatting
codes.

Here, in this if() the last condition should be extended to exclude <identifier extend>.

This has nothing to do with the leading identifier being a keyword or not. This will be handled by the grammar, according to the context.
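
One possible shape of such a stricter condition, sketched for ASCII input only. Both helpers are hypothetical and do not exist in the server. In ASCII, <identifier extend> covers the decimal digits (Nd) and '_' (Pc), so this sketch accepts letters only after the dot; a real change would also have to deal with multi-byte characters and with whatever non-standard identifier forms the server intentionally allows.

```cpp
#include <cctype>
#include <string>

// Hypothetical ASCII approximation of <identifier start>: letters only.
// Digits (Nd) and '_' (Pc) belong to <identifier extend> and are rejected.
static bool is_ident_start(unsigned char c)
{
  return std::isalpha(c) != 0;
}

// Stricter variant of the check: the character after the dot must be able
// to start an identifier. `rest` is the input right after the scanned
// token, with no whitespace skipped.
static bool wants_ident_sep_strict(const std::string &rest)
{
  return rest.size() >= 2 && rest[0] == '.' &&
         is_ident_start(static_cast<unsigned char>(rest[1]));
}
```

Under this sketch ".abc" still selects the identifier-separator state, while ".1" does not, and the leading token is never inspected for being a keyword.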

#
# End of 11.7 tests
#
#

Please re-base to 10.11. This is a bug in MDEV-16020 that was pushed to 10.3.7.

--error ER_PARSE_ERROR
SELECT.abc;

--echo : expressions

Please also consider testing the other classes of symbols in <identifier extend>, in addition to the dot. One for each class should suffice, I believe.
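
As a rough aid for picking such cases, the sketch below lists one representative character per <identifier extend> class and builds a candidate statement for each. The struct, the table, and the helper names are all hypothetical. The chosen characters are only examples, and the expected result of each statement is deliberately left out because it depends on the final fix.

```cpp
#include <string>
#include <vector>

struct ExtendSample
{
  const char *category;  // Unicode General Category
  char32_t code_point;   // one representative character
};

static const std::vector<ExtendSample> extend_samples = {
  {"Mn", 0x0301},  // COMBINING ACUTE ACCENT
  {"Mc", 0x0903},  // DEVANAGARI SIGN VISARGA
  {"Nd", 0x0031},  // DIGIT ONE
  {"Pc", 0x005F},  // LOW LINE
  {"Cf", 0x200D},  // ZERO WIDTH JOINER
};

// Encode a code point below U+10000 as UTF-8.
static std::string to_utf8(char32_t cp)
{
  std::string out;
  if (cp < 0x80)
    out += static_cast<char>(cp);
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Build a candidate test statement: keyword, dot, then the sample character.
static std::string candidate_statement(const ExtendSample &s)
{
  return "SELECT." + to_utf8(s.code_point) + ";";
}
```

The standard also names U+00B7 (Middle Dot) explicitly, so it could be added as a sixth entry.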
