[Lunr] Gravity: adding canonical query

Review Request #785 — Created July 23, 2021 and submitted

d.mendes
Lunr
gravity/database/mysql/canonicalquery
cde92c3...
lunr

Gravity: adding canonical query

Thought process,
First approach, i wrote one main loop that it would check each index of the string and choose which path to take depending on character found, my problem with it was that i had to keep verifying previous and next characters with diferent conditions and child loops, so i felt that the code became a bit complex and hard to read.

Second approach, functions like strpos can search for more than a single character and this way i can clean up a lot of the conditions, but the problem is that i we'll have multiples loops instead of one main loop to the entire string, and i'm not sure if the impact is noticible or not and in case it is, if it is better to assume it or not, also for numeric values i still used the first approach (with some cases jumping a few ranges) because there was no clear pattern to search for.

Removes Single line comments "#" or "--"
Removes Block comments "/*" and "*/"
Jumps (Ignores) characters in between:
- Executable comments ("/*M","*/") or ("/*!,*/")
- Backquotes "`"
- Escaped caracters "\"

Replaces in betweeen characters:
- Double quotes
- Single quotes

Replaces finds and replaces numeric characters:
- Numberic "123"
- Negative "-123"
- Decimals "12.34"
- Exponentials "12.34E11" or "12.34E-11"
- Hexadecimals "0x43"

Removes:
- End of Line characters
- Excessive blank spaces



  • 0
  • 0
  • 36
  • 1
  • 37
Description From Last Updated
smillernl
  1. 
      
  2. Did you check this with phpcs? For example the uncommented private class variables should have been disallowed by it.

    1. Yes, i used Lunr standard

  3. 
      
d.mendes
d.mendes
d.mendes
smillernl
  1. 
      
  2. This should indicate the type of variable (as you did below), not the name.

    1. fixed it, also line 20 with the code " * @var $query"

  3. Please be consistent in usage of return types

    1. Changed this, can you confirm?

    2. This is fixed, but I made some new comments for types.

  4. There should be no spaces between the name and the ()

    1. fixed this, i also fixed when the function was being called

  5. You're checking if $end_pos is false above, it'll never be equal to strlen($string ) - 1 - $start_pos and even if it is, you're not using the result of the comparison.

    1. fixed it, and re-tested it

  6. null isn't allowed in the parameter type

  7. I don't think this needs to be defined twice?

    1. no need, removed it

    1. No reason, changed it to null and re-tested

  8. How does this find any digits? It's just checking string length right?

    1. Artefact from development process old code.
      Comment Removed.

  9. It doesn't seem to do this

  10. 
      
d.mendes
pprkut
  1. 
      
  2. src/Lunr/Gravity/Database/MySQL/MySQLCanonicalQuery.php (Diff revision 3)
     
     
     
     
     
     
     
     
     
     

    Don't think this is needed. We can already get the query from other places.

  3. src/Lunr/Gravity/Database/MySQL/MySQLCanonicalQuery.php (Diff revision 3)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    You can combine these two, by checking whether the query canonicalization was already done ($this->canonical_query !== NULL).

    I would also rename the method to get and make a __toString() magic method that calls it.

    That way we can get the canonical query using

    $canonical_query = (string) $result->canonical_query();
    
    1. There is a call to canonical_query() in the __contructor, i also removed it
      this way only executes if __toString() or ->canonical_query() are called.

  4. src/Lunr/Gravity/Database/MySQL/MySQLCanonicalQuery.php (Diff revision 3)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    I don't think we need those wrappers.

  5. src/Lunr/Gravity/Database/MySQL/MySQLQueryResult.php (Diff revision 3)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    I'm afraid, doing it like this isn't good.

    Dependency Injection demands that we can somehow mock the canonical_query() method of the MySQLCanonicalQuery class. But we can't do that if class instantitation and method call are in the same location. So we have to split this.

    The best scenario I could think of is if the just return the instantiated class here (still considering that we should only instantiate it on the first call). Combined with the __toString() comment I made above this should still make for a nice API.

    1. So if i followed right, just remove the line $this->canonical_query = $canonical_query->canonical_query(); (assuming __toString() is implemented) should fix this right?
      also to execute only the first time that canonical_query() from MySQLQueryResult is executed

    2. Yes, that's good :)

  6. 
      
d.mendes
pprkut
  1. 
      
  2. Some things aren't in the codestyle apparently. We should make a list :D

    There should be an empty line between the open tag and the file comment.

  3. src/Lunr/Gravity/Database/MySQL/MySQLCanonicalQuery.php (Diff revision 4)
     
     
     
     
     
     
     
     

    empty line between @param and @return

    1. changed, also did it in other locations

  4. leftover test code

  5. src/Lunr/Gravity/Database/MySQL/MySQLQueryResult.php (Diff revision 4)
     
     
     
     
     

    common practice for us is that we initialize the property in the constructor. So a simple

    $this->canonical_query = NULL;
    

    should be fine here.

  6. I prefer a !== NULL check

  7. Put the expected value in the data provider, even if it's always the same. Makes it easier to extend in the future in case we do find a situation where it isn't.

    1. yes, i also added ReplaceBetweenDataProvider for the same reason

  8. This shouldn't be needed. $this->class is supposed to be the class we're testing.

  9. src/Lunr/Gravity/Database/MySQL/Tests/MySQLCanonicalQueryTest.php (Diff revision 4)
     
     
     
     
     
     
     
     

    you can name the test scenarios by setting a string key, like

    $data_provider['first argument not string'] = [['013456789013456789','23','56'], []];
    
  10. src/Lunr/Gravity/Database/MySQL/Tests/MySQLCanonicalQueryTest.php (Diff revision 4)
     
     
     
     
     
     
     
     
     
     
     
     
     

    Please change this to read input and expected output from test files, and add more possible scenarios.
    There's two possible structure we typically use (in tests/statics/<namespace>/):

    • foo_input.sql and foo_output.sql
    • input/foo.sql and output/foo.sql

    Whichever you prefer.

    Additional cases here that'd be worth considering are INSERT, UPDATE, REPLACE queries. SHOW would also be nice. Perhaps upserts, CTEs, etc.

  11. src/Lunr/Gravity/Database/MySQL/Tests/MySQLQueryResultBaseTest.php (Diff revision 4)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    one test case per test :)

  12. 
      
smillernl
  1. 
      
  2. Try to avoid empty() since it'll also match things like 0, which you probably don't want here. is_null is a better check I think

    1. yes, also need to check if $to === ''

  3. missing param types

  4. this is equal to parent::tearDown() after you implement heinz' comments

  5. 
      
d.mendes
pprkut
  1. 
      
  2. Perhaps add an example for a query using maxscale query hints: https://mariadb.com/kb/en/mariadb-maxscale-14/maxscale-hint-syntax/

    1. I added a new file with a simple example,
      The code right now removes comments "-- ", but this seems to be something that may have some interest to distinguish
      So i'm not sure if i should change the code to keep this hints in the canonical query, or add it as an option

    2. No that's fine. For all intents and purposes it's a comment, since it doesn't control the query, it controls maxscale.

  3. tests/statics/Gravity/Database/MySQL/input_backup.sql (Diff revision 5)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    I'm not sure about this. On the one hand, it's nice to test that the code would support his.

    On the other hand, we'd never get to a query like this. We don't support either running multiple queries at once nor do we support fetching results for multiple queries issued at once.

    1. i think this probably should be removed, because the code will assume that it is 1 query, and it would probably need some new development and tests to fully support it

    2. Then let's remove it :)

  4. This is where things get tricky I suppose.

    Are these two queries canonically different?

    INSERT INTO `database`.`table` (`param1`, `param2`, `param3`) VALUES (?, '?', '?'), (?, '?', '?'), (?, '?', '?'), (?, '?', '?'), (?, '?', '?');
    

    INSERT INTO `database`.`table` (`param1`, `param2`, `param3`) VALUES (?, '?', '?'), (?, '?', '?'), (?, '?', '?'), (?, '?', '?');
    
    1. Yes they are different, it will be a different canonical for the same insert, case the number of rows to insert change

    2. https://github.com/mariadb-corporation/MaxScale/commit/75fadd711a7bf8895f1a7c407d010b89601afc69

      Checked with maxscale devs :)

  5. 
      
d.mendes
d.mendes
pprkut
  1. 
      
  2. src/Lunr/Gravity/Database/MySQL/MySQLCanonicalQuery.php (Diff revisions 6 - 7)
     
     
     
     

    You need to handle the case where $offset is an empty array (if I understood the code in find_positions correctly).

    This can be the case in INSERT/REPLACE statements that don't use VALUES, like

    INSERT INTO table SELECT FROM table2

  3. 
      
d.mendes
pprkut
  1. 
      
  2. I think it would be a better test if we had an actual value here in the WHERE clause :-)

    1. Please check the other input test files as well

    2. i will add new tests files to the test function of canonical_query,
      because this test is for the collapse function, and the rows need to be in canonical formart already

    3. gotcha

  3. 
      
d.mendes
d.mendes
  1. 
      
  2. this is the same row, but will return failed

    1. I'm not sure what you mean

    2. i was wandering about the case sensitive, for example null versus NULL, if i should treat it as different rows or the same

    3. the same, in this case

  3. 
      
pprkut
  1. Ship It!
  2. 
      
pprkut
  1. 
      
  2. Branch not pushed

    1. Pushed it to my fork: https://github.com/dmendesm2mobi/lunr

  3. 
      
pprkut
  1. 
      
  2. Unit tests are failing:

    There was 1 error:
    
    1) Lunr\Gravity\Database\MySQL\Tests\MySQLQueryResultBaseTest::testCanonicalQuery
    Undefined property: Lunr\Gravity\Database\MySQL\Tests\MySQLQueryResultBaseTest::$result_reflection
    
    /mnt/progs/projects/m2mobi/backend/lunr/src/Lunr/Gravity/Database/MySQL/Tests/MySQLQueryResultBaseTest.php:182
    
    --
    
    There was 1 warning:
    
    1) Warning
    No tests found in class "Lunr\Gravity\Database\MySQL\Tests\MySQLCanonicalQueryTest".
    
    --
    
    There were 2 failures:
    
    1) Lunr\Gravity\Filesystem\Tests\PhysicalFilesystemAccessObjectListDirectoriesTest::testGetListOfDirectoriesInAccessibleDirectory
    Failed asserting that two arrays are equal.
    --- Expected
    +++ Actual
    @@ @@
     Array (
    -    0 => 'folder1'
    -    1 => 'folder2'
    +    0 => 'Database'
    +    1 => 'folder1'
    +    2 => 'folder2'
     )
    
    /mnt/progs/projects/m2mobi/backend/lunr/src/Lunr/Gravity/Filesystem/Tests/PhysicalFilesystemAccessObjectListDirectoriesTest.php:40
    
    2) Lunr\Gravity\Filesystem\Tests\PhysicalFilesystemAccessObjectDirectoryTest::testGetListingOfAccessibleDirectory
    Failed asserting that two arrays are equal.
    --- Expected
    +++ Actual
    @@ @@
     Array (
    -    0 => 'file1'
    -    1 => 'file2'
    -    2 => 'file3'
    -    3 => 'folder1'
    -    4 => 'folder2'
    +    0 => 'Database'
    +    1 => 'file1'
    +    2 => 'file2'
    +    3 => 'file3'
    +    4 => 'folder1'
    +    5 => 'folder2'
     )
    
    /mnt/progs/projects/m2mobi/backend/lunr/src/Lunr/Gravity/Filesystem/Tests/PhysicalFilesystemAccessObjectDirectoryTest.php:41
    
  3. 
      
d.mendes
d.mendes
pprkut
  1. 
      
  2. When I rebase the latest code in the branch against master I still get:

    There was 1 error:
    
    1) Lunr\Gravity\Database\MySQL\Tests\MySQLQueryResultBaseTest::testCanonicalQuery
    Undefined property: Lunr\Gravity\Database\MySQL\Tests\MySQLQueryResultBaseTest::$result_reflection
    
  3. 
      
d.mendes
pprkut
Review request changed

Status: Closed (submitted)

Change Summary:

Merged into master

Loading...