How to query MySQL for exact length and exact UTF-8 characters
- by oskarae
I have table with words dictionary in my language (latvian).
CREATE TABLE words (
value varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
And let's say it has 3 words inside:
INSERT INTO words (value) VALUES ('teja');
INSERT INTO words (value) VALUES ('vejš');
INSERT INTO words (value) VALUES ('feja');
What I want to do is I want to find all words that is exactly 4 characters long and where second character is 'e' and third character is 'j'
For me it feels that correct query would be:
SELECT * FROM words WHERE value LIKE '_ej_';
But problem with this query is that it returs not 2 entries ('teja','vejš') but all three.
As I understand it is because internally MySQL converts strings to some ASCII representation?
Then there is BINARY addition possible for LIKE
SELECT * FROM words WHERE value LIKE BINARY '_ej_';
But this also does not return 2 entries ('teja','vejš') but only one ('teja'). I believe this has something to do with UTF-8 2 bytes for non ASCII chars?
So question:
What MySQL query would return my exact two words ('teja','vejš')?
Thank you in advance