MDEV-17677: Fix keyword parsing that treated as identifier when immediately followed by dot #4713
Mahmoud-kh1 wants to merge 1 commit into MariaDB:12.3
gkodinov left a comment
Thank you for your contribution! This is a preliminary review.
Please give your commit a message that complies with CODING_STANDARDS.md.
yylineno++;
}
}
// here we check if the current token is a keyword followed by a dot and then digit
I'm not an expert, so please consider this optional until the final review:
I believe this is the right place to fix it, but I do not think the fix is the right one.
Before your change this is what it looked like:
if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()])
next_state= MY_LEX_IDENT_SEP;
This is basically saying the following:
If the character right after the current identifier (with no whitespace in between) is a dot (.), and the character after the dot is a valid identifier character, then expect an identifier.
Therein lies the issue.
According to the SQL standard:
An identifier is defined as:
<regular identifier> ::=
<identifier body>
<identifier body> ::=
<identifier start> [ <identifier part>... ]
<identifier part> ::=
<identifier start>
| <identifier extend>
An <identifier start> is any character in the Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”,
“Lo”, or “Nl”.
NOTE 94 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, and “Nl” are assigned to Unicode characters
that are, respectively, upper-case letters, lower-case letters, title-case letters, modifier letters, other letters, and letter numbers.
This means that a valid identifier cannot start with a digit!
However, ident_map[] is true for everything in the standard's <identifier part> production (see above), not only for <identifier start>.
This is how the standard defines the difference between <identifier start> and <identifier part> (i.e. <identifier extend>):
An <identifier extend> is U+00B7, “Middle Dot”, or any character in the Unicode General Category classes
“Mn”, “Mc”, “Nd”, “Pc”, or “Cf”.
NOTE 95 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, and “Cf” are assigned to Unicode characters that are, respectively, nonspacing marks, spacing combining marks, decimal numbers, connector punctuations, and formatting
codes.
Here, in this if(), the last condition should be extended to exclude <identifier extend>, so that only an <identifier start> character after the dot leads to MY_LEX_IDENT_SEP.
This has nothing to do with the leading identifier being a keyword or not. This will be handled by the grammar, according to the context.
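The suggestion above could be sketched roughly as follows. This is only an illustration, not the actual lexer code: is_ident_part() is a hypothetical ASCII-only stand-in for ident_map[], is_ident_start() and expect_ident_after_dot() are made-up helper names, and the sketch excludes only ASCII digits (class "Nd"), which is the case that matters for SELECT.1. Whether the other <identifier extend> classes, such as the underscore (class "Pc"), should be excluded as well is left open here.

```cpp
#include <cctype>

// Hypothetical stand-in for ident_map[]: true for any character that may
// appear inside an identifier (roughly the standard's <identifier part>).
static bool is_ident_part(unsigned char c)
{
  return std::isalnum(c) || c == '_' || c == '$' || c >= 0x80;
}

// Hypothetical helper: true only for characters that may start an
// identifier. Assumption: only ASCII digits are excluded in this sketch.
static bool is_ident_start(unsigned char c)
{
  return is_ident_part(c) && !std::isdigit(c);
}

// Mirrors the shape of the original condition:
//   start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()]
// but with the last test narrowed to <identifier start>.
static bool expect_ident_after_dot(bool no_space_before_dot, char c,
                                   unsigned char next)
{
  return no_space_before_dot && c == '.' && is_ident_start(next);
}
```

With a check of this shape, `t1.abc` would still go through the qualified-identifier path, while the `.1` in `SELECT.1` would be left for the lexer to scan as a number, regardless of whether the leading token is a keyword.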
#
# End of 11.7 tests
#
#
Please re-base to 10.11. This is a bug in MDEV-16020 that was pushed to 10.3.7.
--error ER_PARSE_ERROR
SELECT.abc;

--echo : expressions
Please also consider testing the other classes of symbols in <identifier extend>, in addition to the dot. One test per class should suffice, I believe.
Problem:

Keywords immediately followed by a dot ('.') were incorrectly parsed as identifiers instead of keywords. This caused a syntax error for "SELECT.1", which should be parsed as the keyword SELECT followed by the decimal .1.
How we fix it:

We check whether a token is a keyword before skipping the keyword lookup for qualified identifiers. This allows keywords to still be treated as keywords when followed by a dot, so "SELECT.1" now parses.
Test:
Added some cases to the parser test.
Bug:
MDEV-17677