Split a string by comma, quote and full-stop.. with a few exceptions
Posted by dunc on Stack Overflow
Published on 2012-06-16T14:55:58Z
I've got a lot of text, similar to the following paragraph, which I'd like to split into words without punctuation (apostrophe, double quote, comma, full stop, newline, etc.), with a few exceptions.
Initially considered endemic to the Chalakudy River system in Kerala state, southern India, but now recognised to have a wider distribution in surrounding drainages including the Periyar, Manimala, and Pamba river though the Manimala data may be questionable given it seems to be the type locality of P. denisonii.
In the Achankovil River basin it occurs sympatrically, and sometimes syntopically, with P. denisonii.
Wild stocks may have dwindled by as much as 50% in the last 15 years or so with collection for the aquarium trade largely held responsible although habitats are also being degraded by pollution from agricultural and domestic sources, plus destructive fishing methods involving explosives or organic toxins.
The text refers to P. denisonii, which is a species of fish. It's an abbreviation of Genus species. I would like this reference to be one word.
So, for instance, this is the kind of array I'd like to see:
Array
(
...
[44] given
[45] it
[46] seems
[47] to
[48] be
[49] the
[50] type
[51] locality
[52] of
[53] P. denisonii
[54] In
[55] the
...
)
The only things that distinguish species references such as P. denisonii from a new sentence like end. New are:
- The P (for Puntius, as in the P. in the aforementioned example) is only ever one letter, always a capital
- The d (as in . denisonii) is always either a lower case letter or an apostrophe (')
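As a rough illustration, the two rules above can be written as a single pattern: one capital letter standing on its own, a full stop, a space, then a lower case letter or an apostrophe. This is only a sketch; the helper name isSpeciesReference is made up for the example, and it assumes exactly one space follows the full stop.

```php
<?php
// Hypothetical helper: true if $s contains something shaped like "P. denisonii".
// \b[A-Z]  a single capital letter that starts a word (so "WHO." does not count)
// \.       the full stop of the abbreviation, followed by one space (an assumption)
// [a-z\']  the next word starts with a lower case letter or an apostrophe
function isSpeciesReference(string $s): bool
{
    return preg_match('/\b[A-Z]\. [a-z\']/', $s) === 1;
}

var_dump(isSpeciesReference('P. denisonii')); // bool(true)
var_dump(isSpeciesReference('end. New'));     // bool(false)
```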
What regexp can I use with preg_split to give me such an array? I've tried a simple explode( " ", $array ) but it doesn't do the job at all.
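One possible approach, offered as a sketch rather than a definitive answer, is to let the pattern match a whole species reference first and then discard that match with the PCRE verbs (*SKIP)(*F), so that preg_split never sees a delimiter inside it. The function name splitWords is invented for the example, and the delimiter set is assumed to be only whitespace plus the four punctuation characters listed in the question.

```php
<?php
// Hypothetical helper: split $text into words, keeping "P. denisonii" together.
function splitWords(string $text): array
{
    // First alternative: a species reference (single capital, ". ", then a
    // lower case letter or apostrophe and the rest of the word). (*SKIP)(*F)
    // rejects the match and resumes scanning after it, so nothing inside
    // the reference is treated as a delimiter.
    // Second alternative: the actual delimiters (whitespace , . " ').
    $pattern = '/\b[A-Z]\. [a-z\'][a-z]*(*SKIP)(*F)|[\s,."\']+/';

    // PREG_SPLIT_NO_EMPTY drops the empty strings produced at the text edges.
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$words = splitWords("type locality of P. denisonii.\nIn the Achankovil River");
print_r($words);
```

With this input the words around the reference come out as of, P. denisonii, In, the. Because ' is treated as punctuation, a contraction such as It's is split into It and s, and characters outside the listed set, such as %, stay attached to their word.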
Thanks in advance,