Google indexed the same page under two URLs (despite rel-canonical)
- by unor
The Super User question "Playing mp3 in quodlibet displays “GStreamer output pipeline could not be initialized” error" is indexed under two URLs in Google:
http://superuser.com/questions/651591/playing-mp3-in-quodlibet-displays-gstreamer-output-pipeline-could-not-be-initia
http://superuser.com/questions/651591/playing-mp3-in-quodlibet-displays-gstreamer-output-pipeline-could-not-be-initia/652058
The first one is the canonical one; the corresponding rel-canonical is included in both pages:
<link rel="canonical" href="http://superuser.com/questions/651591/playing-mp3-in-quodlibet-displays-gstreamer-output-pipeline-could-not-be-initia" />
Google also indexed http://superuser.com/a/652058, which redirects to the answer:
http://superuser.com/questions/651591/playing-mp3-in-quodlibet-displays-gstreamer-output-pipeline-could-not-be-initia/652058#652058
Now, the second URL from above is the same as this one minus the fragment #652058.
So Google seems to strip the fragment, which results in exactly the same page under another URL (= containing the answer ID /652058 as suffix), and indexes it, too -- despite rel-canonical and duplicate content.
Shouldn’t Google recognize this and only index the canonical variant?
And what could be the reason why Stack Exchange includes the answer ID in the URL path, and not only in the fragment (resulting in various URL variants for the same page)?