overriding ctype<wchar_t>
Posted by Potatoswatter on Stack Overflow
Published on 2010-02-26T05:00:28Z
Indexed on 2010/03/25 0:23 UTC
I'm writing a lambda calculus interpreter for fun and practice. I got iostreams to properly tokenize identifiers by adding a ctype facet which defines punctuation as whitespace:
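    struct token_ctype : ctype<char> {
        mask t[ table_size ];  // lookup table handed to the ctype<char> base

        token_ctype()
        : ctype<char>( t ) {
            // Everything that is not alphanumeric is classified as whitespace.
            for ( size_t tx = 0; tx < table_size; ++ tx ) {
                t[tx] = isalnum( tx )? alnum : space;
            }
        }
    };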
(classic_table() would probably be cleaner, but that doesn't work on OS X!)
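A minimal sketch of what the classic_table() variant could look like, assuming a standard library where ctype<char>::classic_table() is usable (the question reports it was not on OS X at the time). The names punct_space_ctype and is_separator are made up for illustration. Unlike the facet above, this one only adds punctuation to the existing whitespace set.

```cpp
#include <cstddef>
#include <locale>

// Hypothetical variant: copy the "C" locale's table, then mark every
// punctuation character as whitespace in addition to its existing class.
struct punct_space_ctype : std::ctype<char> {
    mask t[table_size];

    punct_space_ctype() : std::ctype<char>(t) {
        const mask* classic = classic_table();
        for (std::size_t i = 0; i < table_size; ++i) {
            t[i] = classic[i];
            if (t[i] & punct) t[i] |= space;
        }
    }
};

// Hypothetical helper: ask the facet whether c would end a token.
bool is_separator(char c) {
    static const std::locale loc(std::locale::classic(), new punct_space_ctype);
    return std::use_facet<std::ctype<char> >(loc).is(std::ctype_base::space, c);
}
```

With this table, is_separator('(') and is_separator(' ') are both true, while letters and digits keep their classic classification.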
And then swap the facet in when I hit an identifier:
    locale token_loc( in.getloc(), new token_ctype );
    …
    locale const &oldloc = in.imbue( token_loc );
    in.unget() >> token;
    in.imbue( oldloc );
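The swap can also be wrapped in a small guard so the old locale comes back even if the extraction throws. This is only a sketch: scoped_imbue and read_identifier are hypothetical names, the facet is the one from above written with std:: qualification, and the locale is rebuilt on every call instead of being cached in token_loc.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// The table-based facet from the question: non-alphanumerics are whitespace.
struct token_ctype : std::ctype<char> {
    mask t[table_size];

    token_ctype() : std::ctype<char>(t) {
        for (std::size_t tx = 0; tx < table_size; ++tx)
            t[tx] = std::isalnum(static_cast<int>(tx)) ? alnum : space;
    }
};

// Hypothetical guard: imbue a locale now, restore the previous one on exit.
class scoped_imbue {
public:
    scoped_imbue(std::istream& in, const std::locale& loc)
        : in_(in), old_(in.imbue(loc)) {}
    ~scoped_imbue() { in_.imbue(old_); }
    scoped_imbue(const scoped_imbue&) = delete;
    scoped_imbue& operator=(const scoped_imbue&) = delete;

private:
    std::istream& in_;
    std::locale old_;
};

// Hypothetical helper: read one identifier using the token facet.
std::string read_identifier(std::istream& in) {
    scoped_imbue guard(in, std::locale(in.getloc(), new token_ctype));
    std::string token;
    in >> token;
    return token;
}
```

Reading from "foo(bar) baz" this way should yield "foo" and leave the stream positioned at the '(' with its original locale back in place.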
There seems to be surprisingly little lambda calculus code on the Web. Most of what I've found so far is full of Unicode λ characters, so I thought I'd try adding Unicode support.
But ctype<wchar_t> works completely differently from ctype<char>. There is no master table; instead there are four classification methods to override: two overloads of do_is, plus do_scan_is and do_scan_not. So I did this:
    struct token_ctype : ctype< wchar_t > {
        typedef ctype<wchar_t> base;

        // Single character: punctuation and λ also count as whitespace.
        bool do_is( mask m, char_type c ) const {
            return base::do_is(m,c)
                || ( (m&space) && ( base::do_is(punct,c) || c == L'λ' ) );
        }

        // Range classification: add the space bit to punctuation and λ.
        const char_type* do_is
            (const char_type* lo, const char_type* hi, mask* vec) const {
            base::do_is(lo,hi,vec);
            for ( mask *vp = vec; lo != hi; ++ vp, ++ lo ) {
                if ( (*vp & punct) || *lo == L'λ' ) *vp |= space;
            }
            return hi;
        }

        // First match: delegate to base::do_scan_is, then look for an earlier λ.
        const char_type *do_scan_is
            (mask m, const char_type* lo, const char_type* hi) const {
            if ( m & space ) m |= punct;
            hi = base::do_scan_is(m,lo,hi);
            if ( m & space ) hi = find( lo, hi, L'λ' );
            return hi;
        }

        // First non-match: lo != hi is checked before *lo so hi is never read.
        const char_type *do_scan_not
            (mask m, const char_type* lo, const char_type* hi) const {
            if ( m & space ) {
                m |= punct;
                while ( ( lo = base::do_scan_not(m,lo,hi) ) != hi && *lo == L'λ' )
                    ++ lo;
                return lo;
            }
            return base::do_scan_not(m,lo,hi);
        }
    };
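For comparison, here is a hedged sketch of one way to keep the "extra whitespace" rule in a single place and write all four overrides in terms of it. The names wtoken_ctype, extra_space and split_wtokens are made up for illustration, and the scan functions are plain loops, not calls into the base class, so this trades speed for less duplicated logic. λ is spelled L'\u03bb' to avoid depending on the source file encoding.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

struct wtoken_ctype : std::ctype<wchar_t> {
    typedef std::ctype<wchar_t> base;

    // The one rule: λ and punctuation are treated as additional whitespace.
    bool extra_space(wchar_t c) const {
        return c == L'\u03bb' || base::do_is(punct, c);
    }

protected:
    bool do_is(mask m, char_type c) const override {
        return base::do_is(m, c) || ((m & space) && extra_space(c));
    }
    const char_type* do_is(const char_type* lo, const char_type* hi,
                           mask* vec) const override {
        base::do_is(lo, hi, vec);
        for (; lo != hi; ++lo, ++vec)
            if (extra_space(*lo)) *vec |= space;
        return hi;
    }
    // Both scans reuse the single-character do_is above.
    const char_type* do_scan_is(mask m, const char_type* lo,
                                const char_type* hi) const override {
        while (lo != hi && !do_is(m, *lo)) ++lo;
        return lo;
    }
    const char_type* do_scan_not(mask m, const char_type* lo,
                                 const char_type* hi) const override {
        while (lo != hi && do_is(m, *lo)) ++lo;
        return lo;
    }
};

// Hypothetical helper: split a wide string into identifier tokens.
std::vector<std::wstring> split_wtokens(const std::wstring& text) {
    std::wistringstream in(text);
    in.imbue(std::locale(in.getloc(), new wtoken_ctype));
    std::vector<std::wstring> out;
    for (std::wstring tok; in >> tok; ) out.push_back(tok);
    return out;
}
```

Splitting L"\u03bbx.(x y)" this way should give the three tokens x, x and y. Which of the four virtuals a given standard library actually calls during >> is implementation-specific, so overriding all of them consistently is the cautious choice.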
The code is WAY less elegant. It does better express the notion that only punctuation is additional whitespace, but that would've been fine in the original had I had classic_table.
Is there a simpler way to do this? Do I really need all those overloads? (Testing showed do_scan_not is extraneous here, but I'm thinking more broadly.) Am I abusing facets in the first place? Is the above even correct? Would it be better style to implement less logic?