mysql-test/main/parser.result (29 additions, 0 deletions)
@@ -2315,3 +2315,32 @@ nocopy
 #
 # End of 11.7 tests
 #
+#

Please rebase onto 10.11. This is a bug in MDEV-16020, which was pushed to 10.3.7.

+# MDEV-17677 : Keywords are parsed as identifiers when followed by a dot
+#
+: dot after SELECT
+SELECT.1;
+.1
+0.1
+SELECT .1;
+.1
+0.1
+SELECT-.1;
+-.1
+-0.1
+SELECT.abc;
+ERROR 42000: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'SELECT.abc' at line 1
+: expressions
+SELECT.123+0;
+.123+0
+0.123
+SELECT.5 * 2;
+.5 * 2
+1.0
+: still work as identifier
+CREATE TABLE `SELECT` (a INT);
+INSERT INTO `SELECT` VALUES (5);
+SELECT `SELECT`.a FROM `SELECT`;
+a
+5
+DROP TABLE `SELECT`;
mysql-test/main/parser.test (21 additions, 0 deletions)
@@ -2114,3 +2114,24 @@ DELIMITER ;$$
 --echo #
 --echo # End of 11.7 tests
 --echo #
+
+--echo #
+--echo # MDEV-17677 : Keywords are parsed as identifiers when followed by a dot
+--echo #
+
+--echo : dot after SELECT
+SELECT.1;
+SELECT .1;
+SELECT-.1;
+--error ER_PARSE_ERROR
+SELECT.abc;
+
+--echo : expressions

Please also consider testing the other classes of symbols in <identifier extend>, in addition to the .dot cases. One test for each class should suffice, I believe.
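One possible way to produce those inputs is sketched below: a hypothetical helper that picks one sample code point per <identifier extend> class, encodes it as UTF-8 and appends it to `SELECT.`. The chosen code points, the helper names and the statement shape are all assumptions, not part of this PR. The generated lines would presumably need a UTF-8 connection character set in the test file (for example `SET NAMES utf8mb4`), and their expected results are not predicted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encode one Unicode code point as UTF-8 (surrogates are not rejected).
std::string utf8(char32_t cp)
{
  assert(cp <= 0x10FFFF);  // outside the Unicode code space
  std::string out;
  if (cp < 0x80)
  {
    out += static_cast<char>(cp);
  }
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical table: one illustrative member of each <identifier extend> class.
struct ExtendSample
{
  const char *unicode_class;  // class as named in the standard's definition
  char32_t code_point;        // sample character from that class
};

const std::vector<ExtendSample> kExtendSamples = {
  {"Middle Dot", 0x00B7},  // U+00B7 MIDDLE DOT, listed explicitly
  {"Mn", 0x0301},          // COMBINING ACUTE ACCENT
  {"Mc", 0x0903},          // DEVANAGARI SIGN VISARGA
  {"Nd", 0x0661},          // ARABIC-INDIC DIGIT ONE
  {"Pc", 0x005F},          // LOW LINE '_'
  {"Cf", 0x00AD},          // SOFT HYPHEN
};

// Build one candidate statement per class, e.g. "SELECT." + U+0301 + ";".
std::vector<std::string> candidate_statements()
{
  std::vector<std::string> stmts;
  for (const ExtendSample &s : kExtendSamples)
    stmts.push_back("SELECT." + utf8(s.code_point) + ";");
  return stmts;
}
```

Each generated line would still need its expected output recorded in the .result file once the lexer behaviour is settled.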

+SELECT.123+0;
+SELECT.5 * 2;
+
+--echo : still work as identifier
+CREATE TABLE `SELECT` (a INT);
+INSERT INTO `SELECT` VALUES (5);
+SELECT `SELECT`.a FROM `SELECT`;
+DROP TABLE `SELECT`;
sql/sql_lex.cc (13 additions, 3 deletions)
@@ -2899,16 +2899,26 @@ int Lex_input_stream::scan_ident_middle(THD *thd, Lex_ident_cli_st *str,
         yylineno++;
     }
   }
+  // here we check if the current token is a keyword followed by a dot and then digit

I'm not an expert, so please consider this optional until the final review:
I believe this is the right place to fix it, but I do not think the fix is the right one.

Before your change, the code looked like this:

  if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()])
    next_state= MY_LEX_IDENT_SEP;

This is basically saying the following:
if (the symbol after the current identifier is not preceded by whitespace, is a dot (.), and the character after it is a valid identifier character) then expect an identifier.
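To make that paraphrase concrete, the condition can be modelled outside the lexer. The sketch below is not MariaDB code: `ascii_ident_char()` is a hypothetical ASCII-only stand-in for `ident_map[]` (assumed to accept letters, digits, `_` and `$`), and `old_expects_ident()` is a hypothetical name for the pre-patch test. It shows that for `SELECT.1` the digit after the dot passes the identifier-character check, so the lexer takes the MY_LEX_IDENT_SEP path.

```cpp
// Assumption: ASCII-only stand-in for ident_map[], accepting letters,
// digits, '_' and '$'.
constexpr bool ascii_ident_char(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_' || c == '$';
}

// Models: start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()]
//   no_space_skipped: true when no whitespace was skipped after the word
//   c:                the character right after the word
//   next:             the character after c (what yyPeek() returns)
constexpr bool old_expects_ident(bool no_space_skipped, char c, char next)
{
  return no_space_skipped && c == '.' && ascii_ident_char(next);
}

// "SELECT.1": the digit counts as an identifier character, so the word
// before the dot is handled like a qualifier (MY_LEX_IDENT_SEP).
static_assert(old_expects_ident(true, '.', '1'), "digit passes the check");
// "t1.a": the intended qualified-name case.
static_assert(old_expects_ident(true, '.', 'a'), "letter passes the check");
// "SELECT.-1": '-' is not an identifier character.
static_assert(!old_expects_ident(true, '.', '-'), "'-' fails the check");
```

Whether the word before the dot is a keyword never enters this condition; only the characters around the dot do.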

Therein lies the issue.

According to the SQL standard:

An identifier is defined as:

<regular identifier> ::=
  <identifier body>
<identifier body> ::=
  <identifier start> [ <identifier part>... ]
<identifier part> ::=
  <identifier start>
  | <identifier extend>

An <identifier start> is any character in the Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, or “Nl”.
NOTE 94 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, and “Nl” are assigned to Unicode characters
that are, respectively, upper-case letters, lower-case letters, title-case letters, modifier letters, other letters, and letter numbers.

This means that a valid identifier cannot start with a digit!

However, ident_map[] is true for every character of the standard's <identifier part> production (see above), and that includes digits.

The difference between <identifier start> and <identifier part>, i.e. <identifier extend>, is defined by the standard as follows:

An <identifier extend> is U+00B7, “Middle Dot”, or any character in the Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, or “Cf”.
NOTE 95 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, and “Cf” are assigned to Unicode characters that are, respectively, nonspacing marks, spacing combining marks, decimal numbers, connector punctuations, and formatting codes.

Here, in this if(), the last condition should be extended to exclude <identifier extend> characters.
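A minimal sketch of that suggestion, under simplifying assumptions: it looks at single ASCII bytes only, where <identifier extend> reduces to the decimal digits (Nd) and `_` (U+005F LOW LINE, category Pc). The Mn, Mc and Cf classes and U+00B7 lie outside ASCII and would need charset-aware handling. `ascii_ident_char()`, `ascii_ident_extend()` and `new_expects_ident()` are hypothetical helpers, not MariaDB functions.

```cpp
// Assumption: ASCII-only stand-in for ident_map[], accepting letters,
// digits, '_' and '$'.
constexpr bool ascii_ident_char(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_' || c == '$';
}

// Within ASCII, <identifier extend> is only the decimal digits (Nd)
// and '_' (Pc); the other classes and U+00B7 are non-ASCII.
constexpr bool ascii_ident_extend(char c)
{
  return (c >= '0' && c <= '9') || c == '_';
}

// Pre-patch condition with the extra exclusion: the character after the
// dot must be an identifier character that is not <identifier extend>.
constexpr bool new_expects_ident(bool no_space_skipped, char c, char next)
{
  return no_space_skipped && c == '.' &&
         ascii_ident_char(next) && !ascii_ident_extend(next);
}

// "SELECT.1" no longer takes the MY_LEX_IDENT_SEP path ...
static_assert(!new_expects_ident(true, '.', '1'), "digit is identifier extend");
// ... while "t1.a" still does.
static_assert(new_expects_ident(true, '.', 'a'), "letter still qualifies");
```

The check never asks whether the preceding word is a keyword. Because `_` is excluded, this sketch also stops taking the MY_LEX_IDENT_SEP path for input like `t1._c`; whether that is acceptable for the real lexer is not checked here.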

This has nothing to do with the leading identifier being a keyword or not. This will be handled by the grammar, according to the context.

+  // if it's the case, we want to return it as keyword not identifier
+  int tokval= find_keyword(str, length, false);
+  if (tokval && c == '.' && my_isdigit(cs, yyPeek()))
+  {
+    yyUnget();
+    return tokval;
+  }
   if (start == get_ptr() && c == '.' && ident_map[(uchar) yyPeek()])
+  {
     next_state= MY_LEX_IDENT_SEP;
+  }
   else
   {                                    // '(' must follow directly if function
-    int tokval;
+    int tokval2;
     yyUnget();
-    if ((tokval= find_keyword(str, length, c == '(')))
+    if ((tokval2= find_keyword(str, length, c == '(')))
     {
       next_state= MY_LEX_START;        // Allow signed numbers
-      return(tokval);                  // Was keyword
+      return(tokval2);                 // Was keyword
     }
     yySkip();                  // next state does a unget
   }