Sunday, February 8, 2009

Creating valid C# identifiers

In code generation, it’s very common to need to create a valid identifier. For instance, you may need to create a class name, field, property, or method based on the name of a table, a file, or a registry key. The problem with this type of generation is that the rules that make a name valid in one domain may not apply in the other. For instance, “1st File.txt” is a perfectly valid file name, but it’s not a valid C# identifier.

Valid identifiers in C# are defined in the C# Language Specification, item 2.4.2. The rules are very simple:

- An identifier must start with a letter or an underscore
- After the first character, it may contain letters, decimal digits, connecting characters (such as “_”), combining marks, and formatting characters
- If the identifier is a keyword, it must be prefixed with “@”

Applying these rules is pretty straightforward. The following code enforces the first two rules:

private string CleanName(string name)
{
    //Compliant with item 2.4.2 of the C# specification
    System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
    string ret = regex.Replace(name, "_");
    //The identifier must start with a letter
    if (!char.IsLetter(ret, 0))
        ret = string.Concat("_", ret);
    return ret;
}


To apply the third rule, you can use the C# provider as follows:

ret = Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").CreateEscapedIdentifier(ret); 


This code will generate an underscore for each invalid character in the name, not only for spaces. For instance, “c:\1st file.txt” will be generated as “c__1st_file_txt”. If you want to prevent that, replace regex.Replace(name, "_") with regex.Replace(name, ""). You may also consider capitalizing the first letter after each “_” and then eliminating the “_”.
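The capitalization idea can be sketched as follows. This is only an illustration, not part of the original snippet: the class name IdentifierCasing and the method ToCamelHumps are hypothetical, and the sketch assumes it receives the output of CleanName.

```csharp
using System;
using System.Text;

public static class IdentifierCasing
{
    // Hypothetical helper: removes every "_" and upper-cases the
    // character that immediately follows one.
    public static string ToCamelHumps(string identifier)
    {
        StringBuilder sb = new StringBuilder(identifier.Length);
        bool capitalizeNext = false;
        foreach (char c in identifier)
        {
            if (c == '_')
            {
                // Drop the underscore and remember to capitalize what follows.
                capitalizeNext = true;
                continue;
            }
            // ToUpperInvariant leaves digits and other non-letters unchanged.
            sb.Append(capitalizeNext ? char.ToUpperInvariant(c) : c);
            capitalizeNext = false;
        }
        return sb.ToString();
    }
}
```

With this sketch, “c__1st_file_txt” becomes “c1stFileTxt”. Because removing a leading “_” can expose a digit (“_1st” becomes “1st”), the check on the first character has to run again after this step.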


Finally, you may prefer to have the “keyword” identifiers named with a prefix different from “@”. If that’s the case, use IsValidIdentifier in the CodeDomProvider to detect which identifiers are keywords and add your own prefix. The full snippet, which removes invalid characters and prefixes keywords with “_”, is below:

private static string CleanName(string name)
{
    //Compliant with item 2.4.2 of the C# specification
    System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
    string ret = regex.Replace(name, "");
    //The identifier must start with a letter or a "_"; keywords and empty results also get a "_" prefix
    if (ret.Length == 0 || !char.IsLetter(ret, 0) ||
        !Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").IsValidIdentifier(ret))
        ret = string.Concat("_", ret);
    return ret;
}

The only problem you may find after this is with duplicate identifiers. “c:x” will generate the same identifier as “c.x”, which may be a problem depending on your particular code generation needs. If you run into this, keep a list of the identifiers already used. When you find that an identifier has been used, add a number at the end and check again.
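A minimal sketch of that bookkeeping is shown below. The class IdentifierRegistry and its MakeUnique method are hypothetical names introduced for illustration, and a HashSet is used in place of a list only because it makes the "already used" check cheaper.

```csharp
using System;
using System.Collections.Generic;

public sealed class IdentifierRegistry
{
    // C# identifiers are case-sensitive, so compare ordinally.
    private readonly HashSet<string> used = new HashSet<string>(StringComparer.Ordinal);

    // Hypothetical helper: returns the identifier unchanged the first time
    // it is seen, and appends 1, 2, 3... on later collisions.
    public string MakeUnique(string identifier)
    {
        string candidate = identifier;
        int suffix = 1;
        // HashSet.Add returns false when the value is already present.
        while (!used.Add(candidate))
        {
            candidate = identifier + suffix;
            suffix++;
        }
        return candidate;
    }
}
```

Calling MakeUnique("c_x") three times on the same registry returns “c_x”, “c_x1” and “c_x2”. Appending digits keeps the identifier valid, since digits are allowed anywhere after the first character.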

6 comments:

Anonymous said...

Thanks! Very useful.

Mauricio.

Softlion said...

Mavellous !

sandra said...

Great !!!! Useful to me...


Anonymous said...

thanks...

Gishu said...

Nice and concise. Thanks for the regex! Saved a bunch of time and made me learn the \p{} stuff I didnt know about regex.

Anonymous said...

thanks;;;coz ilearn;;;;;;
