In code generation, it’s very common to need to create a valid identifier. For instance, you may need to create a class name, field, property, or method based on the name of a table, a file, or a registry key. The problem with this type of generation is that the rules that make a name valid in one place may not apply in the other. For instance, “1st File.txt” is a perfectly valid file name, but it’s not a valid C# identifier.
Valid identifiers in C# are defined in the C# Language Specification, item 2.4.2. The rules are very simple:
- Item 1: an identifier must start with a letter or an underscore
- Item 2: after the first character, it may contain digits, letters, connectors, etc.
- Item 3: if the identifier is a keyword, it must be prefixed with “@”
Applying these rules is pretty straightforward. The following code enforces items 1 and 2:
private string CleanName(string name)
{
    //Compliant with item 2.4.2 of the C# specification
    System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
    //Replace every character that is not allowed in an identifier with "_"
    string ret = regex.Replace(name, "_");
    //The identifier must start with a letter (an empty name is also prefixed)
    if (ret.Length == 0 || !char.IsLetter(ret, 0))
        ret = string.Concat("_", ret);
    return ret;
}
To handle item 3, you can use the C# code provider as follows:
ret = Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").CreateEscapedIdentifier(ret);
This code will generate an underscore for each character that is not allowed in an identifier, not only for spaces. For instance, “c:\1st file.txt” will be generated as “c__1st_file_txt”. If you want to prevent that, replace regex.Replace(name, "_") with regex.Replace(name, ""). You may also consider capitalizing the first letter after each “_” and then eliminating the “_”.
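The capitalization idea can be sketched roughly as follows. This is only an illustration, not part of the original snippet: the class name IdentifierCasing and the method name CollapseUnderscores are hypothetical, and the sketch assumes its input has already been cleaned with “_” as the replacement character.

```csharp
using System;
using System.Text;

public static class IdentifierCasing
{
    // Hypothetical helper: drops every "_" and upper-cases the character that follows it.
    public static string CollapseUnderscores(string cleaned)
    {
        var sb = new StringBuilder(cleaned.Length);
        bool upperNext = false;
        foreach (char c in cleaned)
        {
            if (c == '_')
            {
                // Skip the underscore and remember to capitalize what comes next
                upperNext = true;
                continue;
            }
            sb.Append(upperNext ? char.ToUpperInvariant(c) : c);
            upperNext = false;
        }
        return sb.ToString();
    }
}
```

With this sketch, “c__1st_file_txt” becomes “c1stFileTxt”. Note that removing underscores can leave a digit at the front (“_1st” becomes “1st”), so the first-character check should run after this step, not before it.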
Finally, you may prefer to have the “keyword” identifiers named with a prefix different from “@”. If that’s the case, use IsValidIdentifier in the CodeDomProvider to find out which identifiers are keywords (it returns false for them). The full corrected code snippet is below:
private static string CleanName(string name)
{
    //Compliant with item 2.4.2 of the C# specification
    System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
    //Remove every character that is not allowed in an identifier
    string ret = regex.Replace(name, "");
    //The identifier must not be empty, must start with a letter and must not be a keyword
    if (ret.Length == 0 || !char.IsLetter(ret, 0) || !Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").IsValidIdentifier(ret))
        ret = string.Concat("_", ret);
    return ret;
}
The only problem you may still find after this is duplicated identifiers. “c:x” will generate the same identifier as “c.x”, which may be a problem depending on your particular code generation needs. If you run into this, keep a list of the identifiers you have already used. When you find that an identifier has been used, add a number at the end and check again.
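One possible way to track used identifiers is sketched below. The class name UniqueNamer and the method name MakeUnique are hypothetical, and the sketch uses a HashSet in place of a plain list so that the “already used” check stays cheap. It assumes the identifiers passed in have already been cleaned.

```csharp
using System;
using System.Collections.Generic;

public sealed class UniqueNamer
{
    // Identifiers handed out so far; C# identifiers are case-sensitive, so compare ordinally
    private readonly HashSet<string> used = new HashSet<string>(StringComparer.Ordinal);

    // Hypothetical helper: returns the identifier itself, or the identifier plus a numeric suffix
    public string MakeUnique(string identifier)
    {
        string candidate = identifier;
        int suffix = 1;
        // HashSet.Add returns false when the value is already present
        while (!used.Add(candidate))
        {
            candidate = identifier + suffix.ToString();
            suffix++;
        }
        return candidate;
    }
}
```

Calling MakeUnique("c_x") three times on the same instance returns “c_x”, then “c_x1”, then “c_x2”. For generators that target a case-insensitive language, a different comparer would be needed.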
