Merging two Regular Expressions to Truncate Words in Strings
Posted
by Alix Axel
on Stack Overflow
Published on 2010-04-21T12:35:27Z
I'm trying to write a function that truncates a string to whole words (if possible; otherwise it should truncate to characters):
function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));
    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);
        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }
        $string .= $more;
    }
    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}
Here are some tests:
// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');
// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_... (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');
Both work as is. However, if I drop the second preg_replace(), I get the following:
Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....
I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to merge the second regex into the first one, but without success.
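To make the byte-level problem concrete, here is a small sketch comparing byte and character handling on the first word of the test strings. The variable names are illustrative only, and it assumes the source file is saved as UTF-8 with precomposed accented letters.

```php
<?php
// Illustrative only: byte-level vs. character-level handling of a UTF-8 string.
$word = 'Iñtërnâtiônàlizætiøn';

// strlen() counts bytes, so every multi-byte letter is counted more than once.
$bytes = strlen($word);

// With the /u modifier, "." matches one whole UTF-8 character,
// so the number of matches is the character count.
$chars = preg_match_all('~.~su', $word);

// substr() cuts at a byte offset and can split a multi-byte character in half.
$byteCut = substr($word, 0, 2);

// A /u regex cuts at a character boundary instead.
preg_match('~^.{2}~su', $word, $m);
$charCut = $m[0];

var_dump($bytes, $chars, $charCut);

// preg_match() with /u returns false when the subject is not valid UTF-8,
// which is what happens to the byte-level cut here.
var_dump(preg_match('~~u', $byteCut));
```

This is why the function above relies on regular expressions with the u modifier for both counting and cutting.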
Please help S.M.S., I've been struggling with this for almost an hour.
EDIT: I'm sorry, I've been awake for 40 hours and I shamelessly missed this:
$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);
Still, if someone has a more optimized regex (or one that ignores the trailing space) please share:
"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"
EDIT 2: I still can't get rid of the trailing whitespace, can someone help me out?
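One way the two patterns might be folded into a single expression, while also dropping the trailing whitespace, is sketched below. This is only a sketch under a few assumptions: the function name truncate_words_sketch is hypothetical, the entity decoding and encoding of the original function are left out to keep the focus on the regex, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): a one-pass variant of
// the two preg_replace() calls, without the HTML entity handling.
function truncate_words_sketch($string, $limit, $more = '...')
{
    $string = trim($string);
    $limit  = max(1, (int) $limit); // .{1,0} would be an invalid quantifier

    // (?| ... ) is a branch-reset group: both alternatives capture into $1.
    //   1st branch: up to $limit characters, ending on a non-whitespace
    //               character that is followed by whitespace or the end.
    //   2nd branch: fallback, exactly $limit characters (cut mid-word).
    // The trailing .* consumes the rest of the string so it is dropped.
    $pattern = '~^(?|(.{1,' . $limit . '})(?<!\s)(?=\s|$)|(.{' . $limit . '})).*~su';

    $result = preg_replace($pattern, '$1', $string);
    if ($result === null) {
        // preg_replace() failed (for example, invalid UTF-8): leave input as is.
        return $string;
    }

    // Append $more only when something was actually cut off.
    return ($result === $string) ? $result : $result . $more;
}

echo truncate_words_sketch('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog', 30), "\n";
echo truncate_words_sketch('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day', 50), "\n";
```

The negative lookbehind (?<!\s) is what prevents the captured text from ending in whitespace: the greedy quantifier backtracks until the last captured character is not a space. If no word boundary exists within the limit, the first branch fails entirely and the second branch performs the character-level cut.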
© Stack Overflow or respective owner