Extended E-mail Address Verification and Correction

Question: Have you ever needed to verify that an e-mail address is correct, or have you had to work with a list of e-mail addresses and realized that some had simple problems that you could easily correct by hand?

Answer:

Article Updated 9/20/2000

Please note: I have corrected a bug in the e-mail validation function that affected very short e-mail addresses such as a@a.com (not a very common address). If you would like to be notified of any future enhancements or bug fixes, please e-mail me. Thanks to Simon for catching this bug!

Thanks, enjoy!!!

The functions I present here are designed to do just that. In this article I present two functions: one to check that an e-mail address is valid, and another to try to correct an invalid e-mail address.

Just what is a correct e-mail address?

The majority of articles I've seen on e-mail address verification use an over-simplified approach. The most common approach I've seen is to ensure that an @ symbol is present, or that the address has a minimum size (ex. 7 characters), or a combination of both. A better, but less used, method is to verify that only allowed characters (based on the SMTP standard) are in the address.

The problem with these approaches is that they can only tell you, at the highest level, that an address is POSSIBLY correct. For example, the address:

------@--------

can be considered a valid e-mail address, as it does contain an @, is at least 7 characters long, and contains only valid characters.

To ensure an address is truly correct, you must verify that all portions of the e-mail address are valid.
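To make the problem concrete, here is a minimal sketch of the over-simplified check described above. The name NaiveIsValidEmail is hypothetical and is not part of the unit presented in this article; it only tests for an @ symbol and a minimum length of 7 characters.

```pascal
program NaiveCheckDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the article's unit): the
// over-simplified test that only looks for an @ and a minimum length.
function NaiveIsValidEmail(const Email: string): Boolean;
begin
  Result := (Length(Email) >= 7) and (Pos('@', Email) > 0);
end;

begin
  // Accepted by the naive test, although it is clearly not a usable address
  WriteLn(NaiveIsValidEmail('------@--------'));
  // A plausible address is accepted too, so the test cannot tell them apart
  WriteLn(NaiveIsValidEmail('someone@someplace.com'));
end.
```

Both calls report TRUE, which is exactly why a check of this kind says very little about whether an address is usable.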
The function I present performs the following checks:

a) Ensure the address is not blank
b) Ensure an @ is present
c) Ensure that only valid characters are used

It then splits the address into two sections, the username (or mailbox) and the domain, and validates each one.

Validation for the username:

a) Ensure it is not blank
b) Ensure the username is not longer than the current standard (RFC 821) allows
c) Ensure that periods (.) are used properly: there cannot be sequential periods (ex. David..Lederman is not valid), nor can a period be the first or last character of the username

Validation for the domain name:

a) Ensure it is not blank
b) Ensure the domain name is not longer than the current standard allows
c) Ensure that periods (.) are used properly: there cannot be sequential periods (ex. World..net is not valid), nor can a period be the first or last character of the domain
d) Check the domain segments (ex. in someplace.somewhere.com, the segments are someplace, somewhere, and com) to ensure that they do not start or end with a hyphen (-) (ex. somewhere.-someplace.com is not valid)
e) Ensure that at least two domain segments exist (ex. someplace.com is valid, .com is not valid)
f) Ensure that there are no additional @ symbols in the domain portion

With the steps above, most e-mail addresses that look plausible but are not actually correct can be detected and rejected.

The ValidateEmailAddress function:

This function takes 3 parameters:

Email - The e-mail address to check
FailCode - The error code reported by the function if it can't validate an address
FailPosition - The position of the character (if available) where the validation failure occurred

The function returns a Boolean value: True if the address is valid, and False if it is invalid. If a failure does occur, the FailCode can be used to determine the exact error that caused the problem:

flUnknown - An unknown error occurred and was trapped by the exception handler.
flNoSeperator - No @ symbol was found.
flToSmall - The e-mail address was blank.
flUserNameToLong - The user name was longer than the SMTP standard allows.
flDomainNameToLong - The domain name was longer than the SMTP standard allows.
flInvalidChar - An invalid character was found (FailPosition returns the location of the character).
flMissingUser - The username section is not present.
flMissingDomain - The domain name section is not present.
flMissingDomainSeperator - No domain segments were found (the domain contains no period).
flMissingGeneralDomain - No top-level domain was found.
flToManyAtSymbols - More than one @ symbol was found.

For simple validation you can ignore FailCode and FailPosition, but they can be used to display an error: the ValidationErrorString function takes the FailCode as a parameter and returns a text version of the error, which can then be displayed.

E-mail Address Correction

Since the e-mail validation routine returns detailed error information, an automated system to correct common e-mail address mistakes can be easily created. The following common mistakes can all be corrected automatically:

example2.aol.com - The most common error (at least in my experience): when entering an e-mail address, the user doesn't hold Shift properly and enters a 2 instead of an @.
example@.aol.com - This error is just an extra character entered by the user; of course example@aol.com was the intended e-mail address.
example8080 @ aol .com - Another common error: spaces.
A Cool Screen name@AOL.com - In this case the user entered what they thought was their e-mail address, but while AOL allows screen names to contain spaces, the Internet does not.
myaddress@ispcom - In this case the period was not entered between ISP and Com.
The CorrectEmailAddress function:

The function takes three parameters:

Email - The e-mail address to check and correct
Suggestion - This string, passed by reference, receives the function's result
MaxCorrections - The maximum number of corrections to attempt before stopping (defaults to 5)

This function simply loops up to MaxCorrections times, validating the e-mail address, then using the FailCode to decide what kind of correction to make, and repeating this until it finds a valid address, determines the address can't be fixed, or has looped MaxCorrections times.

The following corrections are performed, based on the FailCode (see the descriptions above):

flUnknown - Simply stops corrections, as there is no generic way to correct this problem.
flNoSeperator - When this error is encountered the function performs a simple but powerful correction: it scans the e-mail address for the last 2 and converts it to an @ symbol. This will correct most cases where a 2 was typed in place of an @. If it converts a 2 that was not really meant to be an @, chances are it has completely invalidated the e-mail address.
flToSmall - Simply stops corrections, as there is no generic way to correct this problem.
flUserNameToLong - Simply stops corrections, as there is no generic way to correct this problem.
flDomainNameToLong - Simply stops corrections, as there is no generic way to correct this problem.
flInvalidChar - The offending character is simply deleted. This is what removes spaces and misplaced periods.
flMissingUser - Simply stops corrections, as there is no generic way to correct this problem.
flMissingDomain - Simply stops corrections, as there is no generic way to correct this problem.
flMissingDomainSeperator - A period is inserted before the last three characters of the address (ex. myaddress@ispcom becomes myaddress@isp.com).
flMissingGeneralDomain - Simply stops corrections, as there is no generic way to correct this problem.
flToManyAtSymbols - Simply stops corrections, as there is no generic way to correct this problem.
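As an illustration of the correction loop just described, the following is a minimal usage sketch. It assumes the abSMTPRoutines unit from this article compiles and is on the search path; the program name and the Samples array are made up for the example, and the inputs are taken from the list of common mistakes.

```pascal
program CorrectionDemo;

{$APPTYPE CONSOLE}

uses
  abSMTPRoutines; // the unit listed in this article

const
  // Sample inputs taken from the article's list of common mistakes
  Samples: array[0..2] of string = (
    'example2.aol.com',
    'example8080 @ aol .com',
    'myaddress@ispcom');

var
  I: Integer;
  Suggestion: string;

begin
  for I := Low(Samples) to High(Samples) do
  begin
    // MaxCorrections is left at its default of 5
    if CorrectEmailAddress(Samples[I], Suggestion) then
      WriteLn(Samples[I], ' -> ', Suggestion)
    else
      WriteLn(Samples[I], ' could not be corrected');
  end;
end.
```

Note that each pass through the loop fixes at most one problem, so an address with several mistakes (for example, several spaces) uses up several of the allowed corrections.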
While only a small portion of errors can be corrected, the function can correct the most common errors encountered when working with lists of e-mail addresses, specifically when the data is entered by the actual e-mail account holder.

I hope you found this article and function to be useful; I'd love to hear your comments, suggestions, etc.

-David Lederman
dlederman@InternetToolsCorp.com

The following is the source code for the functions described above. Feel free to use the code in your own programs, but please leave my name and address intact!

// ---------------------------ooo------------------------------ \\
// 2000 David Lederman
// dlederman@internettoolscorp.com
// ---------------------------ooo------------------------------ \\
unit abSMTPRoutines;

interface

uses
  SysUtils, Classes;

// ---------------------------ooo------------------------------ \\
// These constants represent the various (known) validation
// errors that can occur.
// ---------------------------ooo------------------------------ \\
const
  flUnknown = 0;
  flNoSeperator = 1;
  flToSmall = 2;
  flUserNameToLong = 3;
  flDomainNameToLong = 4;
  flInvalidChar = 5;
  flMissingUser = 6;
  flMissingDomain = 7;
  flMissingDomainSeperator = 8;
  flMissingGeneralDomain = 9;
  flToManyAtSymbols = 10;

function ValidateEmailAddress(Email : String; var FailCode, FailPosition : Integer) : Boolean;
function CorrectEmailAddress(Email : String; var Suggestion : String; MaxCorrections : Integer = 5) : Boolean;
function ValidationErrorString(Code : Integer) : String;

implementation

// ---------------------------ooo------------------------------ \\
// This is a list of error descriptions. It's kept in the
// implementation section as it's not needed directly
// from outside this unit, and can be accessed using
// ValidationErrorString, which does range checking.
// ---------------------------ooo------------------------------ \\
const
  ErrorDescriptions : array[0..10] of String = (
    'Unknown error occurred!',
    'Missing @ symbol!',
    'Data too small!',
    'User name too long!',
    'Domain name too long!',
    'Invalid character!',
    'Missing user name!',
    'Missing domain name!',
    'Missing domain portion (.com,.net,etc)',
    'Invalid general domain!',
    'Too many @ symbols!');
  AllowedEmailChars : set of Char = ['A'..'Z', 'a'..'z', '0'..'9',
    '@', '-', '.', '_', '''', '+', '$', '/', '%'];
  MaxUsernamePortion = 64; // Per RFC 821
  MaxDomainPortion = 255;  // RFC 821 allows 64; 255 is the later RFC 5321 limit

function CorrectEmailAddress;
var
  RevITT, ITT, FailCode, FailPosition, LastAt : Integer;
begin
  try
    // Reset the suggestion
    Suggestion := Email;
    // Now loop through to the max depth
    for ITT := 1 to MaxCorrections do
    begin
      // Now try to validate the address
      if ValidateEmailAddress(Suggestion, FailCode, FailPosition) then
      begin
        // The e-mail address validated, so exit
        Result := True;
        Exit;
      end;
      // Otherwise, try to correct it
      case FailCode of
        flNoSeperator:
          begin
            // This error can possibly be fixed by finding
            // the last 2 (which was most likely typed instead of an @)
            LastAt := 0;
            for RevITT := 1 to Length(Suggestion) do
            begin
              // Look for the 2
              if Suggestion[RevITT] = '2' then
                LastAt := RevITT;
            end;
            // Now see if we found a 2
            if LastAt = 0 then
            begin
              // The situation can't get better so exit
              Result := False;
              Exit;
            end;
            // Now convert the 2 to an @ and continue
            Suggestion[LastAt] := '@';
          end;
        flInvalidChar:
          begin
            // Simply delete the offending char
            Delete(Suggestion, FailPosition, 1);
          end;
        flMissingDomainSeperator:
          begin
            // The best correction we can make here is to go back three
            // characters and insert a period. The length of the string is
            // not checked: if the string is too short, the result simply
            // fails the next validation pass.
            Insert('.', Suggestion, Length(Suggestion) - 2);
          end;
        flUnknown, flToSmall, flUserNameToLong, flDomainNameToLong,
        flMissingUser, flMissingDomain, flMissingGeneralDomain,
        flToManyAtSymbols:
          begin
            // These errors can't be fixed, so exit
            Result := False;
            Exit;
          end;
      end; // case
    end; // for
    // If we got here, fail
    Result := False;
  except
    // Just return false
    Result := False;
  end;
end;

// ---------------------------ooo------------------------------ \\
// This function will validate an address, much further than
// simply verifying the syntax as the RFC (821) requires
// ---------------------------ooo------------------------------ \\
function ValidateEmailAddress;
var
  DataLen, SepPos, Itt, DomainStrLen, UserStrLen, LastSep, SepCount, PrevSep : Integer;
  UserStr, DomainStr, SubDomain : String;
begin
  try
    // Get the data length
    DataLen := Length(Email);
    // Make sure that the string is not blank
    if DataLen = 0 then
    begin
      FailCode := flToSmall;
      Result := False;
      Exit;
    end;
    // First real validation: ensure the @ separator is present
    SepPos := Pos('@', Email);
    if SepPos = 0 then
    begin
      FailCode := flNoSeperator;
      Result := False;
      Exit;
    end;
    // Now verify that only the allowed characters are in the address
    for Itt := 1 to DataLen do
    begin
      if not (Email[Itt] in AllowedEmailChars) then
      begin
        // Report an invalid char error and the location
        FailCode := flInvalidChar;
        FailPosition := Itt;
        Result := False;
        Exit;
      end;
    end;
    // Now split the string into the two elements: user and domain
    UserStr := Copy(Email, 1, SepPos - 1);
    DomainStr := Copy(Email, SepPos + 1, DataLen);
    // If either the user or domain is missing then there's an error
    if UserStr = '' then
    begin
      FailCode := flMissingUser;
      Result := False;
      Exit;
    end;
    if DomainStr = '' then
    begin
      FailCode := flMissingDomain;
      Result := False;
      Exit;
    end;
    // Now get the lengths of the two portions
    DomainStrLen := Length(DomainStr);
    UserStrLen := Length(UserStr);
    // Ensure that neither side is too large (per the standard)
    if DomainStrLen > MaxDomainPortion then
    begin
      FailCode := flDomainNameToLong;
      Result := False;
      Exit;
    end;
    if UserStrLen > MaxUsernamePortion then
    begin
      FailCode := flUserNameToLong;
      Result := False;
      Exit;
    end;
    // Now verify the user portion of the e-mail address.
    // Ensure that a period is neither the first nor the last char
    // (nor the only char).
    if UserStr[1] = '.' then
    begin
      FailCode := flInvalidChar;
      Result := False;
      FailPosition := 1;
      Exit;
    end;
    if UserStr[UserStrLen] = '.' then
    begin
      FailCode := flInvalidChar;
      Result := False;
      FailPosition := UserStrLen;
      Exit;
    end;
    // No direct checking for a single char is needed since the previous two
    // checks would have detected it.
    // Ensure there are no consecutive periods. The last char is known not
    // to be a period, so looking at Itt + 1 stays inside the string.
    for Itt := 1 to UserStrLen do
    begin
      if (UserStr[Itt] = '.') and (UserStr[Itt + 1] = '.') then
      begin
        FailCode := flInvalidChar;
        Result := False;
        FailPosition := Itt;
        Exit;
      end;
    end;
    // At this point we've validated the user name, and will now move
    // into the domain.
    // Ensure that a period is neither the first nor the last char
    // (nor the only char).
    if DomainStr[1] = '.' then
    begin
      FailCode := flInvalidChar;
      Result := False;
      // The position is relative to the whole address: skip the user
      // name and the @, then point at the first domain char
      FailPosition := UserStrLen + 2;
      Exit;
    end;
    if DomainStr[DomainStrLen] = '.' then
    begin
      FailCode := flInvalidChar;
      Result := False;
      // User name length, + 1 for the @, + the domain length
      FailPosition := UserStrLen + 1 + DomainStrLen;
      Exit;
    end;
    // Ensure there are no consecutive periods, and while in the loop count
    // the periods, record the last one, and verify that the domain and
    // subdomains don't start or end with a -
    SepCount := 0;
    LastSep := 0;
    PrevSep := 1; // Start of string
    for Itt := 1 to DomainStrLen do
    begin
      if DomainStr[Itt] = '.' then
      begin
        // Check the next char, to make sure it's not a .
        if DomainStr[Itt + 1] = '.' then
        begin
          FailCode := flInvalidChar;
          Result := False;
          FailPosition := UserStrLen + 1 + Itt;
          Exit;
        end;
        // Up the count, record the last sep
        Inc(SepCount);
        LastSep := Itt;
        // Now verify this domain segment
        SubDomain := Copy(DomainStr, PrevSep, LastSep - PrevSep);
        // Make sure it doesn't start with a -
        if SubDomain[1] = '-' then
        begin
          FailCode := flInvalidChar;
          Result := False;
          FailPosition := UserStrLen + 1 + PrevSep;
          Exit;
        end;
        // Make sure it doesn't end with a -
        if SubDomain[Length(SubDomain)] = '-' then
        begin
          FailCode := flInvalidChar;
          Result := False;
          FailPosition := (UserStrLen + 1) + LastSep - 1;
          Exit;
        end;
        // Update the pointer
        PrevSep := LastSep + 1;
      end
      else if DomainStr[Itt] = '@' then
      begin
        // A second @ was found
        FailPosition := UserStrLen + 1 + Itt;
        FailCode := flToManyAtSymbols;
        Result := False;
        Exit;
      end;
    end;
    // Verify that there is at least one .
    if SepCount < 1 then
    begin
      FailCode := flMissingDomainSeperator;
      Result := False;
      Exit;
    end;
    // Now do some extended work on the final, most general domain (.com).
    // Verify that the lowest level is at least 2 chars. SubDomain still
    // includes its leading period here, so that means a length of 3.
    SubDomain := Copy(DomainStr, LastSep, DomainStrLen);
    if Length(SubDomain) < 3 then
    begin
      FailCode := flMissingGeneralDomain;
      Result := False;
      Exit;
    end;
    // Well, after all that checking, we should now have a valid address
    Result := True;
  except
    // Report the trapped exception as an unknown error
    Result := False;
    FailCode := flUnknown;
  end;
end;

// ---------------------------ooo------------------------------ \\
// This function returns the error string from the constant
// array, and makes sure that the error code is valid. If
// not, it returns an invalid error code string.
// ---------------------------ooo------------------------------ \\
function ValidationErrorString(Code : Integer) : String;
begin
  // Make sure a valid error code is passed
  if (Code < Low(ErrorDescriptions)) or (Code > High(ErrorDescriptions)) then
  begin
    Result := 'Invalid error code!';
    Exit;
  end;
  // Get the error description from the constant array
  Result := ErrorDescriptions[Code];
end;

end.
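For completeness, here is a short usage sketch showing how validation and ValidationErrorString might be combined to report errors. It assumes the abSMTPRoutines unit compiles and is on the search path; the program name, the Report procedure, and the sample addresses are hypothetical and only serve the example.

```pascal
program ValidationDemo;

{$APPTYPE CONSOLE}

uses
  abSMTPRoutines; // the unit listed in this article

// Hypothetical helper: validates one address and prints either a
// confirmation or the textual description of the failure.
procedure Report(const Email: string);
var
  FailCode, FailPosition: Integer;
begin
  FailCode := flUnknown;
  FailPosition := 0; // not every failure reports a position
  if ValidateEmailAddress(Email, FailCode, FailPosition) then
    WriteLn(Email, ': valid')
  else
    WriteLn(Email, ': ', ValidationErrorString(FailCode),
      ' (position ', FailPosition, ')');
end;

begin
  Report('someone@someplace.com');  // well-formed address
  Report('someone.someplace.com');  // no @ symbol
  Report('some one@someplace.com'); // contains a space
end.
```

FailPosition is initialized before the call because the function only sets it for failures that have a meaningful character position.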