Determining the actual length of a DBCS string

Question:
How can I get the length in characters of a multibyte-character string? The Length function returns the length in bytes, but in East Asian languages some characters take more than one byte.

Answer:

Introduction
------------
The Length function returns the length of a string, but what it counts depends on the type of the string. For the old short strings (ShortString) and for long strings (AnsiString), Length returns the number of bytes they occupy, while for wide (Unicode) strings (WideString) it returns the number of wide characters (WideChar), that is, the number of bytes divided by two.

In short and long strings, one character takes one byte in Western languages, while in Asian languages, for example, some characters take one byte and others take two. For this reason there are two versions of almost all string functions: a fast one that only works with single-byte character strings (SBCS), and a slower one that also works with strings in which a character can take one or two bytes (DBCS), as used in internationally distributed applications. So we have Pos, LowerCase and UpperCase on one side, and AnsiPos, AnsiLowerCase and AnsiUpperCase on the other. Curiously, there is no AnsiLength function that returns the number of characters in a DBCS string.

AnsiLength (Draft)
------------------
Here is a function that returns the number of characters in a double-byte character string:

    // Assumes string = AnsiString (Delphi versions before 2009).
    // LeadBytes is declared in SysUtils.
    function AnsiLength(const s: string): integer;
    var
      i, n: integer;
    begin
      Result := 0;
      n := Length(s);
      i := 1;
      while i <= n do begin
        inc(Result);
        // A lead byte is followed by a trail byte: skip both
        if s[i] in LeadBytes then inc(i);
        inc(i);
      end;
    end;

AnsiLength (Final)
------------------
Naturally, this function is not optimized.
We are not going to mess with assembler, but we can at least use pointers:

    function AnsiLength(const s: string): integer;
    var
      p, q: pchar;
    begin
      Result := 0;
      p := PChar(s);
      q := p + Length(s);  // one past the last byte
      while p < q do begin
        inc(Result);
        if p^ in LeadBytes then
          inc(p, 2)  // lead byte plus its trail byte
        else
          inc(p);
      end;
    end;
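
LeadBytes is filled in from the system code page and is normally empty on a Western system, so the function above is hard to try out on such a machine. The following is a minimal sketch, not part of the original tip, that takes the lead-byte set as a parameter instead. The names DbcsLength, TLeadByteSet and SjisLeadBytes are hypothetical, and the ranges $81..$9F and $E0..$FC are assumed to be the lead bytes of Shift-JIS (code page 932).

```pascal
program DbcsLengthDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TLeadByteSet = set of AnsiChar;

const
  // Assumption: lead-byte ranges of Shift-JIS (code page 932)
  SjisLeadBytes: TLeadByteSet =
    [AnsiChar($81)..AnsiChar($9F), AnsiChar($E0)..AnsiChar($FC)];

// Hypothetical variant of AnsiLength with an explicit lead-byte set
function DbcsLength(const s: AnsiString; const Leads: TLeadByteSet): Integer;
var
  i, n: Integer;
begin
  Result := 0;
  n := Length(s);
  i := 1;
  while i <= n do
  begin
    Inc(Result);
    if s[i] in Leads then
      Inc(i);  // skip the trail byte of a two-byte character
    Inc(i);
  end;
end;

var
  s: AnsiString;
begin
  // Build the bytes 'A', $82 $A0 (one two-byte Shift-JIS character), 'B'
  SetLength(s, 4);
  s[1] := 'A';
  s[2] := AnsiChar($82);
  s[3] := AnsiChar($A0);
  s[4] := 'B';

  WriteLn(Length(s));                      // 4 bytes
  WriteLn(DbcsLength(s, SjisLeadBytes));   // 3 characters
end.
```

The string is assembled byte by byte rather than written as a literal so that no code page conversion can alter the bytes before they are counted.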