Class CodePointTrie

java.lang.Object
com.ibm.icu.util.CodePointMap
com.ibm.icu.util.CodePointTrie
All Implemented Interfaces:
Iterable<CodePointMap.Range>
Direct Known Subclasses:
CodePointTrie.Fast, CodePointTrie.Small

public abstract class CodePointTrie extends CodePointMap
Immutable Unicode code point trie: a fast, reasonably compact map from Unicode code points (U+0000..U+10FFFF) to integer values. For details, see https://icu.unicode.org/design/struct/utrie

This class is not intended for public subclassing.
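The idea behind a fast, compact code point map can be shown with a toy two-stage table. Code points are grouped into fixed-size blocks, identical blocks are stored only once, and an index array maps each block number to the offset of its data. This is a minimal sketch for illustration, not ICU's implementation. The class name ToyTrie, the single block size of 64 and the error value -1 are assumptions. The real CodePointTrie uses a multi-level index and its own serialized layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy two-stage table; NOT the real CodePointTrie layout.
final class ToyTrie {
    static final int SHIFT = 6;                  // assumed block size: 64 code points
    static final int BLOCK = 1 << SHIFT;
    static final int MASK = BLOCK - 1;
    static final int LIMIT = 0x110000;           // one past U+10FFFF
    static final int ERROR_VALUE = -1;           // assumed error value

    private final int[] index;                   // block number -> offset into data
    private final int[] data;                    // deduplicated data blocks

    /** Builds the table from one value per code point; values.length must be LIMIT. */
    ToyTrie(int[] values) {
        if (values.length != LIMIT) throw new IllegalArgumentException("need 0x110000 values");
        index = new int[LIMIT >> SHIFT];
        Map<List<Integer>, Integer> offsets = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (int b = 0; b < index.length; b++) {
            List<Integer> block = new ArrayList<>(BLOCK);
            for (int i = 0; i < BLOCK; i++) block.add(values[(b << SHIFT) + i]);
            Integer off = offsets.get(block);
            if (off == null) {                   // first time this block content is seen: append it
                off = out.size();
                offsets.put(block, off);
                out.addAll(block);
            }
            index[b] = off;                      // identical blocks share one stored copy
        }
        data = out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Range-checked lookup: index stage, then data stage. */
    int get(int c) {
        if (c < 0 || c >= LIMIT) return ERROR_VALUE;
        return data[index[c >> SHIFT] + (c & MASK)];
    }

    /** Number of stored data values after deduplication. */
    int dataLength() { return data.length; }
}
```

When only A..Z are set to 1, all 17408 blocks collapse onto two stored blocks (128 ints) instead of 0x110000 values. A lookup is still just two array reads plus a shift and a mask.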

  • Field Details

    • MAX_UNICODE

      private static final int MAX_UNICODE
    • ASCII_LIMIT

      private static final int ASCII_LIMIT
    • FAST_SHIFT

      static final int FAST_SHIFT
    • FAST_DATA_BLOCK_LENGTH

      static final int FAST_DATA_BLOCK_LENGTH
      Number of entries in a data block for code points below the fast limit. 64=0x40
    • FAST_DATA_MASK

      private static final int FAST_DATA_MASK
      Mask for getting the lower bits for the in-fast-data-block offset.
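Here is a minimal sketch of how a BMP code point splits into a fast data block number and an in-block offset. It assumes FAST_SHIFT = 6, so that FAST_DATA_BLOCK_LENGTH = 1 << 6 = 64. In the real class the block number presumably selects an index-table entry holding the start of the data block. FastSplit and its method names are hypothetical.

```java
// Hypothetical sketch of the fast-path split, assuming FAST_SHIFT = 6
// (derived from FAST_DATA_BLOCK_LENGTH = 64 = 1 << 6).
final class FastSplit {
    static final int FAST_SHIFT = 6;
    static final int FAST_DATA_BLOCK_LENGTH = 1 << FAST_SHIFT;       // 64
    static final int FAST_DATA_MASK = FAST_DATA_BLOCK_LENGTH - 1;    // 0x3f
    static final int BMP_INDEX_LENGTH = 0x10000 >> FAST_SHIFT;       // 1024 blocks cover the BMP

    /** Which fast data block (index-table entry) a BMP code point falls into. */
    static int blockNumber(int c) { return c >> FAST_SHIFT; }

    /** Offset of the code point inside its 64-entry data block. */
    static int offsetInBlock(int c) { return c & FAST_DATA_MASK; }
}
```

For example, U+4E2D falls into block 312 at offset 45, and 312 × 64 + 45 = 20013 = 0x4E2D.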
    • SMALL_MAX

      private static final int SMALL_MAX
    • ERROR_VALUE_NEG_DATA_OFFSET

      private static final int ERROR_VALUE_NEG_DATA_OFFSET
      Offset from dataLength (to be subtracted) for fetching the value returned for out-of-range code points and ill-formed UTF-8/16.
    • HIGH_VALUE_NEG_DATA_OFFSET

      private static final int HIGH_VALUE_NEG_DATA_OFFSET
      Offset from dataLength (to be subtracted) for fetching the value returned for code points highStart..U+10FFFF.
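The two offsets above count backwards from the end of the data array. The sketch below assumes ERROR_VALUE_NEG_DATA_OFFSET = 1 and HIGH_VALUE_NEG_DATA_OFFSET = 2, since the values are not listed here. It shows which stored value a lookup would return for out-of-range code points and for code points from highStart through U+10FFFF. SpecialValues and lookupSpecial are hypothetical names.

```java
// Hypothetical sketch: where the error value and the high value are stored.
final class SpecialValues {
    // Assumed values; the field list does not state them.
    static final int ERROR_VALUE_NEG_DATA_OFFSET = 1;
    static final int HIGH_VALUE_NEG_DATA_OFFSET = 2;
    static final int MAX_UNICODE = 0x10ffff;

    /**
     * Returns the stored value for code points that bypass the index tables:
     * out-of-range code points get the error value, and code points in
     * highStart..U+10FFFF get the single shared high value.
     */
    static int lookupSpecial(int[] data, int dataLength, int highStart, int c) {
        if (c < 0 || c > MAX_UNICODE) {
            return data[dataLength - ERROR_VALUE_NEG_DATA_OFFSET];
        }
        if (c >= highStart) {
            return data[dataLength - HIGH_VALUE_NEG_DATA_OFFSET];
        }
        throw new IllegalArgumentException("below highStart: use the index tables");
    }
}
```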
    • BMP_INDEX_LENGTH

      private static final int BMP_INDEX_LENGTH
      The length of the BMP index table. 1024=0x400
    • SMALL_LIMIT

      static final int SMALL_LIMIT
    • SMALL_INDEX_LENGTH

      private static final int SMALL_INDEX_LENGTH
    • SHIFT_3

      static final int SHIFT_3
      Shift size for getting the index-3 table offset.
    • SHIFT_2

      private static final int SHIFT_2
      Shift size for getting the index-2 table offset.
    • SHIFT_1

      private static final int SHIFT_1
      Shift size for getting the index-1 table offset.
    • SHIFT_2_3

      static final int SHIFT_2_3
      Difference between two shift sizes, for getting an index-2 offset from an index-3 offset. 5=9-4 (SHIFT_2 - SHIFT_3)
    • SHIFT_1_2

      static final int SHIFT_1_2
      Difference between two shift sizes, for getting an index-1 offset from an index-2 offset. 5=14-9 (SHIFT_1 - SHIFT_2)
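For code points that do not use the fast index, the three shift sizes split a code point into four parts: an index-1 slot, an index-2 slot, an index-3 slot and a data offset. The sketch assumes SHIFT_3 = 4, SHIFT_2 = 9 and SHIFT_1 = 14, the values implied by "5=9-4" and "5=14-9" above. ShiftSplit is a hypothetical helper, not part of the API.

```java
// Hypothetical sketch: splitting a code point with the three shift sizes,
// assuming SHIFT_3 = 4, SHIFT_2 = 9, SHIFT_1 = 14.
final class ShiftSplit {
    static final int SHIFT_3 = 4;
    static final int SHIFT_2 = 9;
    static final int SHIFT_1 = 14;
    static final int SHIFT_2_3 = SHIFT_2 - SHIFT_3;   // 5
    static final int SHIFT_1_2 = SHIFT_1 - SHIFT_2;   // 5

    /** Returns {index-1 slot, index-2 slot in its block, index-3 slot in its block, data offset}. */
    static int[] split(int c) {
        int i1 = c >> SHIFT_1;
        int i2 = (c >> SHIFT_2) & ((1 << SHIFT_1_2) - 1);
        int i3 = (c >> SHIFT_3) & ((1 << SHIFT_2_3) - 1);
        int d = c & ((1 << SHIFT_3) - 1);
        return new int[] {i1, i2, i3, d};
    }

    /** Inverse of split(): reassembles the code point from its four parts. */
    static int join(int[] p) {
        return (p[0] << SHIFT_1) | (p[1] << SHIFT_2) | (p[2] << SHIFT_3) | p[3];
    }
}
```

For example, U+1F60A splits into (7, 27, 0, 10). The parts recombine as 7 × 16384 + 27 × 512 + 0 × 16 + 10 = 128522 = 0x1F60A.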
    • OMITTED_BMP_INDEX_1_LENGTH

      private static final int OMITTED_BMP_INDEX_1_LENGTH
      Number of index-1 entries for the BMP: 4. This part of the index-1 table is omitted from the serialized form.
    • INDEX_2_BLOCK_LENGTH

      static final int INDEX_2_BLOCK_LENGTH
      Number of entries in an index-2 block. 32=0x20
    • INDEX_2_MASK

      static final int INDEX_2_MASK
      Mask for getting the lower bits for the in-index-2-block offset.
    • CP_PER_INDEX_2_ENTRY

      static final int CP_PER_INDEX_2_ENTRY
      Number of code points per index-2 table entry. 512=0x200
    • INDEX_3_BLOCK_LENGTH

      static final int INDEX_3_BLOCK_LENGTH
      Number of entries in an index-3 block. 32=0x20
    • INDEX_3_MASK

      private static final int INDEX_3_MASK
      Mask for getting the lower bits for the in-index-3-block offset.
    • SMALL_DATA_BLOCK_LENGTH

      static final int SMALL_DATA_BLOCK_LENGTH
      Number of entries in a small data block. 16=0x10
    • SMALL_DATA_MASK

      static final int SMALL_DATA_MASK
      Mask for getting the lower bits for the in-small-data-block offset.
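The block lengths and masks above all follow from the shift sizes. The sketch derives them from FAST_SHIFT = 6, SHIFT_3 = 4, SHIFT_2 = 9 and SHIFT_1 = 14, which match the numbers quoted in the field descriptions. The formulas are a reconstruction for illustration, and TrieConstants is a hypothetical class.

```java
// Hypothetical reconstruction of the derived constants from the shift sizes.
final class TrieConstants {
    static final int FAST_SHIFT = 6;
    static final int SHIFT_3 = 4, SHIFT_2 = 9, SHIFT_1 = 14;

    static final int BMP_INDEX_LENGTH = 0x10000 >> FAST_SHIFT;            // 1024
    static final int OMITTED_BMP_INDEX_1_LENGTH = 0x10000 >> SHIFT_1;     // 4
    static final int INDEX_2_BLOCK_LENGTH = 1 << (SHIFT_1 - SHIFT_2);     // 32
    static final int INDEX_2_MASK = INDEX_2_BLOCK_LENGTH - 1;             // 0x1f
    static final int CP_PER_INDEX_2_ENTRY = 1 << SHIFT_2;                 // 512
    static final int INDEX_3_BLOCK_LENGTH = 1 << (SHIFT_2 - SHIFT_3);     // 32
    static final int INDEX_3_MASK = INDEX_3_BLOCK_LENGTH - 1;             // 0x1f
    static final int SMALL_DATA_BLOCK_LENGTH = 1 << SHIFT_3;              // 16
    static final int SMALL_DATA_MASK = SMALL_DATA_BLOCK_LENGTH - 1;       // 0xf
}
```

The values fit together. One index-2 entry selects an index-3 block of 32 entries, and each of those entries covers a 16-entry small data block. That gives 32 × 16 = 512 code points, which is CP_PER_INDEX_2_ENTRY.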
    • OPTIONS_DATA_LENGTH_MASK

      private static final int OPTIONS_DATA_LENGTH_MASK
    • OPTIONS_DATA_NULL_OFFSET_MASK

      private static final int OPTIONS_DATA_NULL_OFFSET_MASK
    • OPTIONS_RESERVED_MASK

      private static final int OPTIONS_RESERVED_MASK
    • OPTIONS_VALUE_BITS_MASK

      private static final int OPTIONS_VALUE_BITS_MASK
    • NO_INDEX3_NULL_OFFSET

      static final int NO_INDEX3_NULL_OFFSET
      Value for index3NullOffset which indicates that there is no index-3 null block. Bit 15 is unused for this value because this bit is used if the index-3 table contains 18-bit indexes.
    • NO_DATA_NULL_OFFSET

      static final int NO_DATA_NULL_OFFSET
    • ascii

      private final int[] ascii
    • index

      private final char[] index
    • data

      @Deprecated protected final CodePointTrie.Data data
      Deprecated.
      This API is ICU internal only.
    • dataLength

      @Deprecated protected final int dataLength
      Deprecated.
      This API is ICU internal only.
    • highStart

      @Deprecated protected final int highStart
      Deprecated.
      This API is ICU internal only.
      Start of the last range which ends at U+10FFFF.
    • index3NullOffset

      private final int index3NullOffset
      Internal index-3 null block offset. Set to NO_INDEX3_NULL_OFFSET, an impossibly high value, if there is no dedicated index-3 null block.
    • dataNullOffset

      private final int dataNullOffset
      Internal data null block offset, not shifted. Set to an impossibly high value (e.g., 0xfffff) if there is no dedicated data null block.
    • nullValue

      private final int nullValue
  • Constructor Details

    • CodePointTrie

      private CodePointTrie(char[] index, CodePointTrie.Data data, int highStart, int index3NullOffset, int dataNullOffset)
  • Method Details

    • fromBinary

      public static CodePointTrie fromBinary(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth, ByteBuffer bytes)
      Creates a trie from its binary form, stored in the ByteBuffer starting at the current position. Advances the buffer position to just after the trie data. Inverse of toBinary(OutputStream).

      The data is copied from the buffer; later modification of the buffer will not affect the trie.

      Parameters:
      type - selects the trie type; this method throws an exception if the type does not match the binary data; use null to accept any type
      valueWidth - selects the number of bits in a data value; this method throws an exception if the valueWidth does not match the binary data; use null to accept any data value width
      bytes - a buffer containing the binary data of a CodePointTrie
      Returns:
      the trie
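Because fromBinary reads from the buffer's current position, advances the position past the trie data, and copies that data, several serialized objects can be read back to back from one buffer. The buffer can also be reused afterwards. The sketch shows that contract with a made-up format: an int count followed by that many int values. ToyBlobReader and its format are hypothetical and unrelated to the real trie serialization.

```java
import java.nio.ByteBuffer;

// Hypothetical reader illustrating the "read at position, advance, copy" contract
// with a made-up format. NOT the CodePointTrie binary format.
final class ToyBlobReader {
    /** Reads one blob at the current position, copies it, and leaves the position just after it. */
    static int[] read(ByteBuffer bytes) {
        int count = bytes.getInt();            // relative get: advances the position by 4
        int[] copy = new int[count];
        for (int i = 0; i < count; i++) {
            copy[i] = bytes.getInt();          // copied out, so later buffer writes do not affect it
        }
        return copy;
    }
}
```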
    • getType

      public abstract CodePointTrie.Type getType()
      Returns the trie type.
      Returns:
      the trie type
    • getValueWidth

      public final CodePointTrie.ValueWidth getValueWidth()
      Returns the number of bits in a trie data value.
      Returns:
      the number of bits in a trie data value
    • get

      public int get(int c)
      Returns the value for a code point as stored in the map, with range checking. Returns an implementation-defined error value if c is not in the range 0..U+10FFFF.
      Specified by:
      get in class CodePointMap
      Parameters:
      c - the code point
      Returns:
      the map value, or an implementation-defined error value if the code point is not in the range 0..U+10FFFF
    • asciiGet

      public final int asciiGet(int c)
      Returns a trie value for an ASCII code point, without range checking.
      Parameters:
      c - the input code point; must be U+0000..U+007F
      Returns:
      The ASCII code point's trie value.
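This toy sketch contrasts the two lookup styles. get() checks the code point range and returns an error value. asciiGet() indexes a 128-entry table directly and relies on the caller to pass U+0000..U+007F. LookupStyles, its sample data (digits map to 1) and the rule that non-ASCII code points map to 0 are assumptions for illustration.

```java
// Hypothetical sketch; the ascii table and errorValue stand in for the real trie's data.
final class LookupStyles {
    static final int MAX_UNICODE = 0x10ffff;
    private final int[] ascii = new int[0x80];   // one value per U+0000..U+007F
    private final int errorValue;

    LookupStyles(int errorValue) {
        this.errorValue = errorValue;
        for (int c = '0'; c <= '9'; c++) ascii[c] = 1;   // sample data: ASCII digits -> 1
    }

    /** Like get(): checks the range first and returns the error value for invalid input. */
    int get(int c) {
        if (c < 0 || c > MAX_UNICODE) return errorValue;
        return c < 0x80 ? ascii[c] : 0;          // toy rule: non-ASCII code points map to 0
    }

    /** Like asciiGet(): no range check; the caller guarantees 0 <= c <= 0x7F. */
    int asciiGet(int c) { return ascii[c]; }

    /** Sums values over a string, taking the unchecked path only for ASCII code points. */
    int countDigits(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            n += c < 0x80 ? asciiGet(c) : get(c);
            i += Character.charCount(c);         // step over both halves of a surrogate pair
        }
        return n;
    }
}
```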
    • maybeFilterValue

      private static final int maybeFilterValue(int value, int trieNullValue, int nullValue, CodePointMap.ValueFilter filter)
    • getRange

      public final boolean getRange(int start, CodePointMap.ValueFilter filter, CodePointMap.Range range)
      Sets the range object to a range of code points beginning with the start parameter. The range start is the same as the start input parameter (even if there are preceding code points that have the same value). The range end is the last code point such that all those from start to there have the same value. Returns false if start is not 0..U+10FFFF. Can be used to efficiently iterate over all same-value ranges in a map. (This is normally faster than iterating over code points and calling get() for each one, but may be much slower than a data structure that stores ranges directly.)

      If the CodePointMap.ValueFilter parameter is not null, then each value is passed through that filter before it is delivered, and the range end is the last code point such that all values from start through it are modified to the same actual value. The values are used unchanged if that parameter is null.

      Example:

       int start = 0;
       CodePointMap.Range range = new CodePointMap.Range();
       while (map.getRange(start, null, range)) {
           int end = range.getEnd();
           int value = range.getValue();
           // Work with the range start..end and its value.
           start = end + 1;
       }
       
      Specified by:
      getRange in class CodePointMap
      Parameters:
      start - range start
      filter - an object that may modify the map data value, or null if the values from the map are to be used unmodified
      range - the range object that will be set to the code point range and value
      Returns:
      true if start is 0..U+10FFFF; otherwise no new range is fetched
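The range semantics described above can be modeled over a plain array. In this hypothetical sketch, ToyRanges stands in for a CodePointMap and java.util.function.IntUnaryOperator stands in for CodePointMap.ValueFilter. The map covers only values.length code points instead of U+0000..U+10FFFF.

```java
import java.util.function.IntUnaryOperator;

// Toy model of getRange() over a plain array (hypothetical; not ICU code).
final class ToyRanges {
    static final class Range { int start, end, value; }

    private final int[] values;   // values[c] for c in 0..values.length-1
    ToyRanges(int[] values) { this.values = values; }

    /** Fills range with start..end, the longest same-(filtered)-value run beginning at start. */
    boolean getRange(int start, IntUnaryOperator filter, Range range) {
        if (start < 0 || start >= values.length) return false;   // no new range is fetched
        int value = apply(filter, values[start]);
        int end = start;
        while (end + 1 < values.length && apply(filter, values[end + 1]) == value) {
            end++;
        }
        range.start = start;      // never moved backwards, even if earlier values match
        range.end = end;
        range.value = value;
        return true;
    }

    private static int apply(IntUnaryOperator filter, int v) {
        return filter == null ? v : filter.applyAsInt(v);        // null means "use values unmodified"
    }
}
```

On the sample data {0, 0, 1, 1, 1, 2}, getRange(3, null, r) yields 3..4, not 2..4. With a filter that maps every positive value to 1, getRange(2, ...) extends to 5.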
    • toBinary

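      public final int toBinary(OutputStream os)
      Writes the binary form of this trie to the output stream. Inverse of fromBinary(CodePointTrie.Type, CodePointTrie.ValueWidth, ByteBuffer).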
      Parameters:
      os - the output stream
      Returns:
      the number of bytes written
    • fastIndex

      @Deprecated protected final int fastIndex(int c)
      Deprecated.
      This API is ICU internal only.
    • smallIndex

      @Deprecated protected final int smallIndex(CodePointTrie.Type type, int c)
      Deprecated.
      This API is ICU internal only.
    • internalSmallIndex

      private final int internalSmallIndex(CodePointTrie.Type type, int c)
    • cpIndex

      @Deprecated protected abstract int cpIndex(int c)
      Deprecated.
      This API is ICU internal only.