I am using the following code to calculate the text-to-code ratio of a page. I think it is crazy that no one can agree on how to properly calculate the result. I am looking for any suggestions or ideas that may make this code more accurate.
```php
<?php
// Returns the size of the content in bytes, after replacing each line break,
// tab, NUL and vertical tab with the two-character string "12".
function findKb($content) {
    $count = 0;
    // "\r" is chr(13), the carriage return
    $order = array("\r\n", "\n", "\r", "\t", "\0", "\x0B");
    $content = str_replace($order, "12", $content);
    $length = strlen($content);
    for ($index = 0; $index < $length; $index++) {
        $byte = ord($content[$index]);
        // Count each UTF-8 character once, using the sequence length announced
        // by its lead byte; continuation bytes (128-191) are skipped.
        if ($byte <= 127) { $count++; }
        else if ($byte >= 194 && $byte <= 223) { $count += 2; }
        else if ($byte >= 224 && $byte <= 239) { $count += 3; }
        else if ($byte >= 240 && $byte <= 244) { $count += 4; }
    }
    return $count;
}
// Collect size of entire page ($content holds the page's HTML)
$filesize = findKb($content);
// Remove anything within script tags (lazy ".*?", and "s" lets "." match newlines)
$code = preg_replace("@<script[^>]*>.*?</script[^>]*>@is", "", $content);
// Remove anything within style tags, continuing from the script-free $code
$code = preg_replace("@<style[^>]*>.*?</style[^>]*>@is", "", $code);
// Remove all remaining tags
$code = strip_tags($code);
// Collapse runs of whitespace into a single space
$code = preg_replace('/\s+/', ' ', $code);
// Find the size of the remaining text
$codesize = findKb($code);
// Calculate the percentage, guarding against an empty page
$percentage = $filesize > 0 ? ($codesize / $filesize) * 100 : 0;
echo $percentage;
?>
```
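The tag-stripping regular expressions are worth checking in isolation. By default PCRE's `.` does not match a newline, so `.+` without the `s` modifier misses any script that spans several lines; and a greedy `.+` with `s` runs to the last closing tag on the page, deleting the visible text between two scripts. This small, self-contained check (the sample strings are made up for illustration) compares the greedy and lazy patterns:

```php
<?php
// A script that spans several lines.
$html = "<p>Hello</p>\n<script>\nvar a = 1;\n</script>\n<p>World</p>";

// Without "s", "." cannot match "\n", so this pattern finds nothing here.
$greedy = preg_replace("@<script[^>]*>.+</script[^>]*>@i", "", $html);
// With "s" and a lazy ".*?", the whole script element is removed.
$lazy = preg_replace("@<script[^>]*>.*?</script[^>]*>@is", "", $html);

// Two scripts on one line with visible text between them.
$two = "<script>a()</script><p>keep me</p><script>b()</script>";
// Greedy ".+" runs to the LAST closing tag and deletes the text in between.
$greedyS = preg_replace("@<script[^>]*>.+</script[^>]*>@is", "", $two);
// Lazy ".*?" stops at the first closing tag, so each script is removed separately.
$lazyS = preg_replace("@<script[^>]*>.*?</script[^>]*>@is", "", $two);

var_dump($greedy === $html); // bool(true): nothing was removed
var_dump($greedyS);          // string(0) ""
var_dump($lazyS);            // string(14) "<p>keep me</p>"
```

The same reasoning applies to the `<style>` pattern.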
I don't know the exact calculation that is used, so this function is just my guess. Does anyone know what the proper calculation is, or whether my function is close enough for a good judgement?
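Since there is no single agreed formula, a ratio is only comparable to others computed the same way. If the regex pass feels fragile, one alternative is to let PHP's DOM extension parse the page and read the remaining text nodes. The sketch below uses that swapped-in technique; it is not a standard definition. `textToCodeRatio` is a hypothetical name, it divides visible-text bytes by total HTML bytes like the code above, and it assumes ASCII input or a page that declares its charset in a `<meta>` tag (otherwise `DOMDocument::loadHTML` may misread multi-byte characters). It uses `strlen()` for byte counts: for valid UTF-8, the lead-byte loop in `findKb` counts the same number of bytes, apart from its whitespace substitution.

```php
<?php
// Hypothetical helper: percentage of visible-text bytes in a page's HTML,
// extracting the text with the DOM extension instead of regular expressions.
// Assumes ASCII input or a page that declares its charset in a <meta> tag.
function textToCodeRatio(string $html): float
{
    $total = strlen($html); // total size in bytes
    if ($total === 0) {
        return 0.0; // avoid dividing by zero (loadHTML also rejects empty input)
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements so their contents are not counted.
    foreach (['script', 'style'] as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // Walk backwards so removals cannot shift indices still to be visited.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    // textContent joins every remaining text node, with entities decoded.
    $root = $doc->documentElement;
    $text = $root !== null ? $root->textContent : '';
    // Collapse whitespace runs and trim the ends before measuring.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    return 100 * strlen($text) / $total;
}

echo textToCodeRatio('<p>Hello</p><script>var x = 1;</script>');
```

Because `textContent` decodes entities such as `&amp;`, this counts the characters a reader sees rather than their escaped source form, which is one more place where tools can legitimately disagree.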