PITS/00682
Summary: Search results when using UTF-8 and non-Latin characters
Created: 2006-03-03 01:29
Status: Closed: added in PmWiki 2.2.x beta versions.
Category: Bug
From: Athan
Assigned:
Priority: 54
Version: 2.1 b33
OS: Any
Description: When using UTF-8, searching for a non-Latin string returns only case-sensitive results. This is a known PHP limitation, but a utf8tolower conversion using xlpage-utf-8.php would be a solution.

Just for completeness, here are my comments from the mailing list:

Unfortunately, at the moment there's not really a good way for us to do this; it's a limitation of PHP. The basic PHP functions for case-insensitive substring searches aren't aware of uppercase and lowercase distinctions in UTF-8 encoded strings. One approach would be to convert all terms to lowercase before doing the string search, but even here PHP's support is limited. To convert UTF-8 to lowercase we'd have to use something like PHP's mb_strtolower function, but many PHP installations don't have the mb_* functions available by default. We also have to be careful not to perform UTF-8 lowercase conversions on sites that use iso-8859-1 or other character encodings. On the other hand, the xlpage-utf-8.php script already defines a table of case conversions, so maybe I can get the search script to use that. I've put this on my ToDo list, so maybe I can come up with a fix reasonably soon.

Here is a little patch for UTF-8 case-insensitive search:
It works successfully at www.pmwiki.ru. Tested on pmwiki-2.1.5.

The patch works, but the current $CaseConversions table is l=>u (lowercase to uppercase), so some uppercase characters map to the wrong lowercase character. A complete u=>l table can easily be added, though.
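The lowercase-both-sides approach, and one way an inverted l=>u table can go wrong, can be sketched as follows. This is a minimal Python illustration, not the patch itself (PmWiki is PHP). The tables are hypothetical miniatures, not the real contents of $CaseConversions, and the sigma collision is only an assumed example of "uppercase characters mapping to the wrong lowercase".

```python
# Sketch only: PmWiki is PHP; these Python tables are hypothetical miniatures,
# not the real contents of $CaseConversions in xlpage-utf-8.php.

# An l=>u (lowercase to uppercase) table, the direction the text describes.
LOWER_TO_UPPER = {
    "а": "А", "в": "В", "к": "К", "м": "М", "о": "О", "с": "С",  # Cyrillic
    "σ": "Σ", "ς": "Σ",  # Greek medial and final sigma share one capital
}

# A complete u=>l table written in the other direction, so each capital
# gets exactly one intended lowercase form.
UPPER_TO_LOWER = {
    "А": "а", "В": "в", "К": "к", "М": "м", "О": "о", "С": "с", "Σ": "σ",
}


def invert_naively(l2u):
    """Flip an l=>u table; when two lowercase letters share a capital,
    the later entry silently overwrites the earlier one."""
    return {upper: lower for lower, upper in l2u.items()}


def utf8_lower(text, u2l):
    """Lowercase text one character at a time using a u=>l table;
    characters missing from the table are left unchanged."""
    return "".join(u2l.get(ch, ch) for ch in text)


def search(pages, term, u2l):
    """Return names of pages whose text contains term, comparing the
    lowercased term against the lowercased page text."""
    needle = utf8_lower(term, u2l)
    return [name for name, text in pages.items() if needle in utf8_lower(text, u2l)]


if __name__ == "__main__":
    # Byte-oriented lowercasing only touches ASCII A-Z, so the
    # UTF-8 bytes of a Cyrillic word come back unchanged.
    raw = "МОСКВА".encode("utf-8")
    print(raw.lower() == raw)  # True

    pages = {"Main": "Город Москва", "Greek": "Σ", "Other": "Paris"}
    print(search(pages, "МОСКВА", UPPER_TO_LOWER))  # ['Main']

    naive = invert_naively(LOWER_TO_UPPER)
    print(naive["Σ"])                          # ς (final sigma), not σ
    print(search(pages, "σ", naive))           # []
    print(search(pages, "σ", UPPER_TO_LOWER))  # ['Greek']
```

Building the u=>l table directly, rather than flipping an l=>u table, avoids the collision: each capital is given the lowercase form the search should use.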
I'm sure Patrick will implement a complete UTF-8 solution in the PmWiki core soon. Even better, PmWiki should switch to full UTF-8 by default.

New version of the patch: pmwiki-utf8-search.zip. Tested on PmWiki 2.2.0-beta16. By the way, when the mbstring extension is installed and a modern version of PHP is used, the $CaseConversions array is not needed at all. satrap December 21, 2006, at 04:49 PM
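The choice described in this discussion has three branches. A UTF-8 site with a native Unicode lowercase function can use that function. A UTF-8 site without one can fall back to a conversion table. Any other encoding must not receive UTF-8 conversions at all. The sketch below shows this choice as a hypothetical Python helper; the names `make_lowercaser` and `have_mbstring` are illustrative assumptions. Python's `str.lower()` stands in for PHP's mb_strtolower, and ASCII-only lowercasing stands in for whatever a non-UTF-8 site already does.

```python
# Sketch only: a hypothetical helper that picks a lowercasing strategy the way
# the discussion suggests. In PHP the "native" branch would be mb_strtolower;
# here Python's str.lower() stands in for it.

def make_lowercaser(charset, have_mbstring, u2l):
    """Return a function that lowercases search text for the given site.

    charset        -- the site's character encoding, e.g. "UTF-8"
    have_mbstring  -- whether a native Unicode-aware lowercase is available
    u2l            -- fallback u=>l table for UTF-8 sites without one
    """
    if charset.upper() == "UTF-8":
        if have_mbstring:
            # Native Unicode case mapping: no conversion table needed.
            return str.lower
        # No native support: fall back to the table, character by character.
        return lambda text: "".join(u2l.get(ch, ch) for ch in text)
    # Other encodings (e.g. ISO-8859-1): never apply UTF-8 conversions.
    # ASCII-only lowercasing stands in for the site's existing behavior.
    return lambda text: "".join(
        ch.lower() if "A" <= ch <= "Z" else ch for ch in text
    )


if __name__ == "__main__":
    print(make_lowercaser("UTF-8", True, {})("МОСКВА"))           # москва
    print(make_lowercaser("utf-8", False, {"М": "м"})("МОСКВА"))  # мОСКВА
    print(make_lowercaser("ISO-8859-1", True, {})("ABC Ä"))       # abc Ä
```

The second call shows why an incomplete table matters: only the characters it lists are lowercased.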
It works, thanks satrap. However, patching every new PmWiki build isn't the best practice, IMHO.