I've got a lot of text, similar to the following paragraph, which I'd like to split into words with the punctuation removed (', ", comma, full stop, newline, etc.), with a few exceptions.
Initially considered endemic to the Chalakudy River system in Kerala state, southern India, but now recognised to have a wider distribution in surrounding drainages including the Periyar, Manimala, and Pamba river though the Manimala data may be questionable given it seems to be the type locality of P. denisonii.
In the Achankovil River basin it occurs sympatrically, and sometimes syntopically, with P. denisonii.
Wild stocks may have dwindled by as much as 50% in the last 15 years or so with collection for the aquarium trade largely held responsible although habitats are also being degraded by pollution from agricultural and domestic sources, plus destructive fishing methods involving explosives or organic toxins.
The text refers to P. denisonii, which is a species of fish. It's an abbreviation of the form Genus species. I would like each such reference to be kept as one word.
So, for instance, this is the kind of array I'd like to see:
Array
(
...
[44] => given
[45] => it
[46] => seems
[47] => to
[48] => be
[49] => the
[50] => type
[51] => locality
[52] => of
[53] => P. denisonii
[54] => In
[55] => the
...
)
The only things that distinguish a species reference such as P. denisonii from a sentence boundary such as end. New are:
The P (for Puntius, as in the P. in the aforementioned example) is only ever one letter, and always a capital.
The d (as in . denisonii) is always either a lower-case letter or an apostrophe (').
What regexp can I use with preg_split to give me such an array? I've tried a simple explode( " ", $array ), but it leaves the punctuation attached to the words and splits P. denisonii in two.
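For what it's worth, here is a rough sketch of how the two rules above might be written as a preg_split pattern. It is a guess rather than a confirmed solution: the function name splitWords is made up for illustration, and it assumes the PCRE backtracking verbs (*SKIP)(*F) are acceptable and that only ASCII punctuation needs stripping.

```php
<?php
// Hypothetical helper: the name and the pattern are illustrative assumptions.
function splitWords(string $text): array
{
    // First alternative: a lone capital letter, ". ", then a lower-case letter
    // or an apostrophe. (*SKIP)(*F) throws that match away and resumes scanning
    // after it, so the ". " inside "P. denisonii" is never used as a delimiter.
    // Second alternative: the real delimiters, i.e. any run of whitespace
    // and ASCII punctuation.
    $pattern = "/\\b[A-Z]\\. [a-z'](*SKIP)(*F)|[\\s[:punct:]]+/";

    // PREG_SPLIT_NO_EMPTY drops empty strings at the start or end of the text.
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitWords("type locality of P. denisonii.\nIn the Achankovil River basin"));
```

On that sample I would expect P. denisonii to come back as a single element between "of" and "In". A sentence that genuinely ends in a single capital letter and is followed by a lower-case word would be kept together by mistake, and non-ASCII punctuation such as curly quotes would need the pattern extended.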
Thanks in advance,