
In 2015, unless you are designing for a specific country, a blacklist is the only way to accommodate the vast number of characters that may be valid. Even in 2009, it seems too many people had a very limited idea of what designing for the WORLDWIDE web involved. The characters to blacklist then need to be chosen according to what is illegal for the purpose for which the data is required.

However, sometimes it pays to break down the requirements and handle each separately, and this is where look-ahead helps. Look-ahead blocks are sections bounded by (?=) for positive and (?!) for negative. They effectively become AND blocks because, when a block has been processed without failing, the regex processor begins again at the start of the text with the next block. Effectively, each look-ahead block will be preceded by the ^ and, if its pattern is greedy, will include everything up to the $. Even the ancient VB6/VBA (Office) 5.5 regex engine supports look-ahead.

So, to build up a full regular expression, start with the look-ahead blocks, then add the blacklisted character block before the final $. For example, to limit the total number of characters to between 3 and 15 inclusive, start with the positive look-ahead block (?=^.{3,15}$). The blacklist block then follows as a negated character class ending in |\\\/^~%# : ,$%?\0-\cZ]+$, where \0-\cZ excludes NUL and the control characters up to Ctrl-Z, and the final + greedily takes in the rest of the text.

You can check the result live in an online regex tester for the pcre (php), javascript and python regex engines. I don't know where the java regex fits in those, but you may need to modify the regex to cater for its idiosyncrasies. If you want to include spaces, but not _, just swap them everywhere in the regex.
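As a rough illustration of the look-ahead plus blacklist approach, here is a minimal Python sketch. The blacklist below is a hypothetical one chosen for the example, not a recommended list, and the "not all digits" rule is an extra assumption added only to show a negative look-ahead acting as another AND condition. Python's re module has no \cZ escape, so the control characters are written as \x00-\x1a, which is one of the engine idiosyncrasies to watch for.

```python
import re

# Positive look-ahead: the whole text must be 3 to 15 characters long.
LENGTH_BLOCK = r"(?=^.{3,15}$)"

# Hypothetical negative look-ahead (an assumption for this example):
# reject text that consists only of digits.
NOT_ALL_DIGITS = r"(?!^\d+$)"

# Hypothetical blacklist shared by both variants (illustrative only).
# \x00-\x1a stands in for \0-\cZ, which Python's re does not support.
COMMON = r"<>{}|\\/^~%#:;,$?\x00-\x1a"

# Look-ahead blocks first, then the blacklist block before the final $.
# Variant 1: space is blacklisted, underscore is allowed.
UNDERSCORE_OK = re.compile(
    LENGTH_BLOCK + NOT_ALL_DIGITS + "[^" + COMMON + " ]+$"
)

# Variant 2: the two are swapped, so space is allowed and _ is not.
SPACE_OK = re.compile(
    LENGTH_BLOCK + NOT_ALL_DIGITS + "[^" + COMMON + "_]+$"
)


def is_valid(text, pattern=UNDERSCORE_OK):
    """Return True when every look-ahead passes and no blacklisted
    character appears anywhere in the text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["Anna_Lee", "Anna Lee", "ab", "12345", "a<b>c"]:
        print(repr(sample), is_valid(sample), is_valid(sample, SPACE_OK))
```

Each look-ahead is tested from the start of the text and consumes nothing, so adding or removing a requirement means adding or removing one block without touching the others.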
