Struct bstr::WordIndices
An iterator over words in a byte string and their byte index positions.
This iterator is typically constructed by ByteSlice::word_indices.
This is similar to the WordsWithBreakIndices iterator, except it only returns elements that contain a “word” character. A word character is defined by UTS #18 (Annex C) to be the combination of the Alphabetic and Join_Control properties, along with the Decimal_Number, Mark and Connector_Punctuation general categories.
Since words are made up of one or more codepoints, this iterator yields &str elements (along with their start and end byte offsets).
When invalid UTF-8 is encountered, replacement codepoints are substituted. Because of this, the indices yielded by this iterator may not correspond to the length of the word yielded with those indices. For example, when this iterator encounters \xFF in the byte string, then it will yield a pair of indices ranging over a single byte, but will provide an &str equivalent to "\u{FFFD}", which is three bytes in length. However, when given only valid UTF-8, then all indices are in exact correspondence with their paired word.
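The length mismatch described above can be seen with the standard library alone. This sketch uses String::from_utf8_lossy as a stand-in for the substitution step. It is not bstr's own code path, and lossy_len is a hypothetical helper.

```rust
/// Byte length of the text produced by lossily decoding `bytes`,
/// where each invalid sequence is replaced by U+FFFD.
/// (Hypothetical helper; uses std, not bstr.)
fn lossy_len(bytes: &[u8]) -> usize {
    String::from_utf8_lossy(bytes).len()
}

fn main() {
    let invalid: &[u8] = b"\xFF";
    let decoded = String::from_utf8_lossy(invalid);

    // One source byte becomes the replacement codepoint...
    assert_eq!(decoded, "\u{FFFD}");
    assert_eq!(invalid.len(), 1);
    // ...which occupies three bytes once encoded as UTF-8.
    assert_eq!(decoded.len(), 3);

    // Valid input decodes to the same number of bytes it started with.
    assert_eq!(lossy_len(b"foo"), 3);
}
```

In practice this means a yielded string's length should not be used to compute offsets into the original byte string; use the yielded indices instead.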
This iterator yields words in accordance with the default word boundary rules specified in UAX #29. In particular, this may not be suitable for Japanese and Chinese scripts that do not use spaces between words.
Implementations
impl<'a> WordIndices<'a>
pub fn as_bytes(&self) -> &'a [u8]
View the underlying data as a subslice of the original data.
The slice returned has the same lifetime as the original slice, and so the iterator can continue to be used while this exists.
Examples
    use bstr::ByteSlice;

    let mut it = b"foo bar baz".word_indices();

    assert_eq!(b"foo bar baz", it.as_bytes());
    it.next();
    it.next();
    assert_eq!(b" baz", it.as_bytes());
    it.next();
    it.next();
    assert_eq!(b"", it.as_bytes());
Trait Implementations
impl<'a> Clone for WordIndices<'a>
impl<'a> Debug for WordIndices<'a>
impl<'a> Iterator for WordIndices<'a>
Auto Trait Implementations
impl<'a> RefUnwindSafe for WordIndices<'a>
impl<'a> Send for WordIndices<'a>
impl<'a> Sync for WordIndices<'a>
impl<'a> Unpin for WordIndices<'a>
impl<'a> UnwindSafe for WordIndices<'a>
Blanket Implementations
impl<T> Any for T where T: 'static + ?Sized
impl<T> Borrow<T> for T where T: ?Sized
impl<T> BorrowMut<T> for T where T: ?Sized
impl<T> From<T> for T
impl<T, U> Into<U> for T where U: From<T>
impl<I> IntoIterator for I where I: Iterator
impl<T> ToOwned for T where T: Clone
impl<T, U> TryFrom<U> for T where U: Into<T>
impl<T, U> TryInto<U> for T where U: TryFrom<T>