Validating email addresses in Delphi

Is an email address valid?

Nowadays it is very common for programs to store email addresses in databases as part of the data of personnel, customers, providers, etc. When prompting the user for an email address, how do we know whether the entered value is formally correct? In this article I'll show you how to validate email addresses using a variation of RFC 822.

RFC 822 is the "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES". According to it, the following are valid email addresses:

    John Doe johndoe@server.com
    John Doe <johndoe@server.com>
    "John Doe" johndoe@server.com
    "John Doe" <johndoe@server.com>

The purpose of my code is not to validate such forms, but strictly what is necessary to reach a single recipient (like "johndoe@server.com"). The specification calls this an "addr-spec", which has the form:

    local-part@domain

    local-part = one or more "words", separated by periods
    domain     = one or more "sub-domains", separated by periods

A "word" can be an "atom" or a "quoted-string":

    atom          = one or more characters in the range #33..#126
                    except ()<>@,;:\/".[]
    quoted-string = text enclosed in double quotes that can contain zero
                    or more characters (#0..#127) except '"' and #13.
                    A backslash ('\') quotes the next character.

Note that the slash ('/') is not one of the "specials" listed by RFC 822; excluding it from atoms is part of the variation used here.

A "sub-domain" can be a "domain-ref" (an "atom") or a "domain-literal":

    domain-literal = text enclosed in brackets that can contain zero or
                     more characters (#0..#127) except '[', ']' and #13.
                     A backslash ('\') quotes the next character.

According to RFC 822, extended characters (#128..#255) cannot be part of an email address. However, many mail servers accept them and people use them, so I'm going to take them into account.

RFC 822 is very open about domain names. For a real Internet email address we should probably restrict the domain part. You can read more about domain names in RFC 1034 and RFC 1035.
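The "word" rules for the local part can be sketched as two small helpers. This is only an illustrative sketch: `IsAtom`, `IsQuotedString` and `AtomChars` are hypothetical names that do not appear in the article's function, the character set is copied from the description above (extended characters included), and `AnsiString` is assumed so that the `in` set test works on single-byte characters. It assumes Delphi, or Free Pascal in Delphi mode.

```pascal
// Hypothetical helpers, not part of the article's ValidEmail function.
const
  // Characters allowed in an "atom": #33..#255 minus the excluded ones.
  AtomChars = [#33..#255] - ['(', ')', '<', '>', '@', ',', ';', ':',
                             '\', '/', '"', '.', '[', ']', #127];

// True if s is a non-empty run of atom characters.
function IsAtom(const s: AnsiString): Boolean;
var
  i: Integer;
begin
  Result := Length(s) > 0;
  for i := 1 to Length(s) do
    if not (s[i] in AtomChars) then
    begin
      Result := False;
      Exit;
    end;
end;

// True if s is a quoted-string: text in double quotes in which a
// backslash quotes the next character, and '"' and #13 are otherwise
// not allowed.
function IsQuotedString(const s: AnsiString): Boolean;
var
  i, n: Integer;
begin
  Result := False;
  n := Length(s);
  if (n < 2) or (s[1] <> '"') or (s[n] <> '"') then
    Exit;
  i := 2;
  while i < n do
  begin
    if s[i] = '\' then
    begin
      Inc(i);               // skip the quoted character
      if i >= n then
        Exit;               // the backslash quoted the closing quote
    end
    else if (s[i] = '"') or (s[i] = #13) then
      Exit;
    Inc(i);
  end;
  Result := True;
end;
```

A local part is then a period-separated sequence of strings for which either `IsAtom` or `IsQuotedString` returns True; for example, `IsAtom('johndoe')` holds, while `IsAtom('john.doe')` does not because the period separates two words.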
According to RFC 1034 and RFC 1035, a domain name is formed by "sub-domains" separated by periods. Each sub-domain starts with a letter ('a'..'z', 'A'..'Z'), is followed by zero or more letters, digits and hyphens, and cannot end with a hyphen. We will also require a valid domain to have at least two "sub-domains" (like "host.com"). As a consequence, domain-literals are not accepted by the code that follows.

Now that the rules are clear, let's get to work. The algorithm for the function resembles a state-transition machine. The characters of the string are processed in a loop. For each character we first determine which state the machine is in, and then process the character accordingly to decide whether the machine should stay in that state, switch to a different state, or produce an error (breaking out of the loop). This kind of algorithm is treated extensively in algorithms textbooks, so let's get right to the code:

    function ValidEmail(email: string): boolean;
    // Returns True if the email address is valid
    // Author: Ernesto D'Spirito
    const
      // Valid characters in an "atom"
      atom_chars = [#33..#255] - ['(', ')', '<', '>', '@', ',', ';', ':',
                                  '\', '/', '"', '.', '[', ']', #127];
      // Valid characters in a "quoted-string"
      quoted_string_chars = [#0..#255] - ['"', #13, '\'];
      // Valid characters in a subdomain
      letters = ['A'..'Z', 'a'..'z'];
      letters_digits = ['0'..'9', 'A'..'Z', 'a'..'z'];
      subdomain_chars = ['-', '0'..'9', 'A'..'Z', 'a'..'z'];
    type
      States = (STATE_BEGIN, STATE_ATOM, STATE_QTEXT, STATE_QCHAR,
        STATE_QUOTE, STATE_LOCAL_PERIOD, STATE_EXPECTING_SUBDOMAIN,
        STATE_SUBDOMAIN, STATE_HYPHEN);
    var
      State: States;
      i, n, subdomains: integer;
      c: char;
    begin
      State := STATE_BEGIN;
      n := Length(email);
      i := 1;
      subdomains := 1;
      while (i <= n) do begin
        c := email[i];
        case State of
          // Start of the local part: expect an atom or a quoted-string
          STATE_BEGIN:
            if c in atom_chars then
              State := STATE_ATOM
            else if c = '"' then
              State := STATE_QTEXT
            else
              break;
          // Inside an atom of the local part
          STATE_ATOM:
            if c = '@' then
              State := STATE_EXPECTING_SUBDOMAIN
            else if c = '.' then
              State := STATE_LOCAL_PERIOD
            else if not (c in atom_chars) then
              break;
          // Inside a quoted-string
          STATE_QTEXT:
            if c = '\' then
              State := STATE_QCHAR
            else if c = '"' then
              State := STATE_QUOTE
            else if not (c in quoted_string_chars) then
              break;
          // Character quoted by a backslash: accept anything
          STATE_QCHAR:
            State := STATE_QTEXT;
          // Just after the closing quote of a quoted-string
          STATE_QUOTE:
            if c = '@' then
              State := STATE_EXPECTING_SUBDOMAIN
            else if c = '.' then
              State := STATE_LOCAL_PERIOD
            else
              break;
          // After a period in the local part: expect another word
          STATE_LOCAL_PERIOD:
            if c in atom_chars then
              State := STATE_ATOM
            else if c = '"' then
              State := STATE_QTEXT
            else
              break;
          // Start of a subdomain: it must begin with a letter
          STATE_EXPECTING_SUBDOMAIN:
            if c in letters then
              State := STATE_SUBDOMAIN
            else
              break;
          // Inside a subdomain, last character was a letter or digit
          STATE_SUBDOMAIN:
            if c = '.' then begin
              inc(subdomains);
              State := STATE_EXPECTING_SUBDOMAIN
            end else if c = '-' then
              State := STATE_HYPHEN
            else if not (c in letters_digits) then
              break;
          // Inside a subdomain, last character was a hyphen
          STATE_HYPHEN:
            if c in letters_digits then
              State := STATE_SUBDOMAIN
            else if c <> '-' then
              break;
        end;
        inc(i);
      end;
      if i <= n then
        // The loop was broken by an invalid character
        Result := False
      else
        // The address must end inside a subdomain (not after a period or
        // a hyphen) and the domain must have at least two subdomains
        Result := (State = STATE_SUBDOMAIN) and (subdomains >= 2);
    end;
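The domain rules enforced by the last three states (each sub-domain starts with a letter, contains only letters, digits and hyphens, does not end with a hyphen, and there are at least two sub-domains) can also be checked in isolation, which may be convenient when testing that part of the logic on its own. The following is a minimal sketch and not the author's code: `IsValidDomain` is a hypothetical name, it takes an `AnsiString` so the set tests work on single-byte characters, and it assumes Delphi, or Free Pascal in Delphi mode.

```pascal
// Hypothetical helper, not part of the article's ValidEmail function.
// Checks only the domain part, using the same rules as the state machine.
function IsValidDomain(const domain: AnsiString): Boolean;
const
  Letters = ['A'..'Z', 'a'..'z'];
  LettersDigits = ['0'..'9', 'A'..'Z', 'a'..'z'];
var
  i, n, labelStart, labels: Integer;
  c: AnsiChar;
begin
  Result := False;
  n := Length(domain);
  labels := 0;
  labelStart := 1;
  // Run one step past the end so the last sub-domain is closed like
  // the others, as if the string ended with a period.
  for i := 1 to n + 1 do
  begin
    if (i > n) or (domain[i] = '.') then
    begin
      // Reject an empty sub-domain or one that ends with a hyphen.
      if (i = labelStart) or not (domain[i - 1] in LettersDigits) then
        Exit;
      Inc(labels);
      labelStart := i + 1;
    end
    else
    begin
      c := domain[i];
      if i = labelStart then
      begin
        // The first character of a sub-domain must be a letter.
        if not (c in Letters) then
          Exit;
      end
      else if not ((c in LettersDigits) or (c = '-')) then
        Exit;
    end;
  end;
  // At least two sub-domains are required (like "host.com").
  Result := labels >= 2;
end;
```

With this sketch, `IsValidDomain('server.com')` and `IsValidDomain('my-host.com')` return True, while `IsValidDomain('localhost')`, `IsValidDomain('host-.com')` and `IsValidDomain('1host.com')` return False.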