No need for regex

If you want to keep only the consonants (assuming only ASCII letters count, i.e. no characters like "ñ", "ÿ", etc.), just do something like:

nome = "T3ste@_test@and000_ t3ste"
vogais = 'aeiouAEIOU'
print(''.join(c for c in nome if (('a' <= c <= 'z') or ('A' <= c <= 'Z')) and c not in vogais))
I used a generator expression (https://docs.python.org/3/reference/expressions.html#generator-expressions) that goes through the characters of the string and keeps only the consonants. For each character, I check that it is a letter from "A" to "Z" (in either case) and that it is not a vowel. Then I join everything into a single string using join. In this case, the result will be Tsttstndtst (only the consonants of the string).

Alternatively, you can do the opposite. Instead of having a variable with the vowels and checking that the letter is not one of them, create a variable containing all the consonants, and for each character of the string check whether it is one of them:

consoantes = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
nome = "T3ste@_test@and000_ t3ste"
print(''.join(c for c in nome if c in consoantes))
This way I don't even need to check whether the character is a letter. If it is not a consonant, it won't be included in the final result, whether it is a letter or not.

If you prefer, you can use a generator function that yields only the consonants, and then pass it to join:

def get_consoantes(s):
    consoantes = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
    for c in s:
        # if it is a consonant
        if c in consoantes:
            yield c

print(''.join(get_consoantes(nome)))  # Tsttstndtst
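The ASCII-only assumption from the start is easy to check. In the sketch below, the helper names by_range, by_list and by_isalpha are purely illustrative. The first two mirror the two versions above. by_isalpha is a hypothetical variant based on str.isalpha() and is not one of the solutions in this answer. The two ASCII-only versions agree on every character tested, and only the isalpha() variant keeps "ñ":

```python
# Sketch: compare the two ASCII-only filters with a hypothetical str.isalpha() variant.
vogais = 'aeiouAEIOU'
consoantes = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'

def by_range(c):
    # version 1: ASCII letter (a-z or A-Z) that is not a vowel
    return (('a' <= c <= 'z') or ('A' <= c <= 'Z')) and c not in vogais

def by_list(c):
    # version 2: membership in the explicit consonant list
    return c in consoantes

def by_isalpha(c):
    # hypothetical variant: any Unicode letter that is not an ASCII vowel
    return c.isalpha() and c not in vogais

# both ASCII-only versions agree on every code point in this range
assert all(by_range(chr(i)) == by_list(chr(i)) for i in range(0x300))

palavra = "ñandu"
print(''.join(c for c in palavra if by_range(c)))    # nd
print(''.join(c for c in palavra if by_isalpha(c)))  # ñnd
```

You might want accented letters to count as consonants. In that case an isalpha()-style check would be needed, together with a broader vowel list, since letters like "á" and "é" would otherwise slip through as well.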
By doing c in consoantes, we perform a linear search over the string consoantes. If many lookups are made, a small optimization is to use a set:

consoantes = set('bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ')
print(''.join(c for c in nome if c in consoantes))
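Typing all 42 consonants by hand is easy to get wrong. As a small variation (not from the original answer), the same set can be derived from string.ascii_letters with a set difference. frozenset is used here only to signal that the set is not meant to change:

```python
import string

# Sketch: derive the consonant set instead of typing it out.
# string.ascii_letters is 'a' to 'z' followed by 'A' to 'Z' (ASCII only).
vogais = set('aeiouAEIOU')
consoantes = frozenset(string.ascii_letters) - vogais

nome = "T3ste@_test@and000_ t3ste"
print(''.join(c for c in nome if c in consoantes))  # Tsttstndtst
print(len(consoantes))  # 42
```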
Lookup in a set takes constant time on average (see https://wiki.python.org/moin/TimeComplexity), so it is faster than searching a string (see the comparison at the end).

But if you really want to use regex...

I think the solutions above are much simpler. With regex, in my opinion, it gets more complicated:

import re
r = re.compile('[^b-df-hj-np-tv-z]', re.I)
nome = "T3ste@_test@and000_ t3ste"
print(r.sub('', nome)) # Tsttstndtst
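A quick way to check the character class is to run it over the alphabet and over a few symbols. The sketch below also does two other things. First, it shows what a plain \W substitution does with the same string: digits, "_" and vowels all survive. Second, it shows why the ranges are written p-tv-z. An extra "-" between two ranges, as in p-t-v-z, is read by re as a literal hyphen, so hyphens would no longer be removed:

```python
import re
import string

r = re.compile('[^b-df-hj-np-tv-z]', re.I)
nome = "T3ste@_test@and000_ t3ste"

# only the 21 consonants survive the substitution
print(r.sub('', string.ascii_lowercase))  # bcdfghjklmnpqrstvwxyz

# \W only matches non-word characters, so digits, '_' and vowels are kept
print(re.sub(r'\W', '', nome))  # T3ste_testand000_t3ste

# a hyphen is removed like any other non-consonant...
print(r.sub('', 'x-y'))  # xy
# ...but an extra '-' between two ranges becomes a literal member of the class
print(re.sub('[^b-df-hj-np-t-v-z]', '', 'x-y', flags=re.I))  # x-y
```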
I used a negated character class (https://www.regular-expressions.info/charclass.html#negated) to match everything that is not a consonant (whatever is between [^ and ] is negated). Inside it I put the ranges of consonants, i.e. the letters I want to keep: b-d are the letters from "b" to "d", j-n are the letters from "j" to "n", etc. I also used the re.I flag, which ignores the difference between uppercase and lowercase. So this regex matches any character that is not a consonant. Finally, I use sub to replace all of it with '' (the empty string), which amounts to removing it.

Using \W, as in your code, doesn't work well, because it also keeps the digits and the _ character. Since you said you only want the consonants, \W is not a good option. Trying to fix it with yet another regex is even worse; if you are going to use regex at all, use one that goes straight to the point.

Another answer (https://pt.stackoverflow.com/a/527679/112052) suggested using character class intersection (https://www.regular-expressions.info/charclassintersect.html). However, Python's native re module (https://docs.python.org/3/library/re.html) does not support this feature. This is true up to the current version, 3.9; it may change in the future, but for now it cannot be used.

If you want to use this feature, the current option is to install the regex module (https://pypi.org/project/regex/):

# WARNING: external module, must be installed with pip: https://pypi.org/project/regex
import regex
r = regex.compile('[^a-z&&[^aeiou]]', regex.IGNORECASE | regex.VERSION1)
print(r.sub('', nome))
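Installing an external module may not be an option. One possible workaround, sketched here as an alternative rather than taken from the linked answer, is to compute the intersection with Python sets. You can then build an ordinary negated class from the result, which the native re module accepts. Both cases are listed explicitly, so no re.I flag is needed, and letters need no escaping inside a class:

```python
import re
import string

# Compute "ASCII letters minus vowels" with sets, then build a plain negated class.
consoantes = set(string.ascii_letters) - set('aeiouAEIOU')
classe = '[^' + ''.join(sorted(consoantes)) + ']'  # uppercase letters sort first
r = re.compile(classe)

nome = "T3ste@_test@and000_ t3ste"
print(r.sub('', nome))  # Tsttstndtst
```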
Here, a-z&&[^aeiou] means all letters from "a" to "z" except the vowels (i.e. only the consonants). The leading [^ negates this set, so the regex matches everything that is not a consonant.

Even so, I don't think you need regex. The first solutions above are, in my opinion, much simpler and clearer.

Just out of curiosity, here is a comparison of the solutions, using the timeit module (https://docs.python.org/3/library/timeit.html) to measure the times:

from timeit import timeit
import re
consoantes_string = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
consoantes_set = set('bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ')
r = re.compile('[^b-df-hj-np-tv-z]', re.I)
nome = "T3ste@_test@and000_ t3ste"
# run each test 1 million times
params = { 'number' : 1000000, 'globals': globals() }
# print the times in seconds
print(timeit("''.join(c for c in nome if c in consoantes_set)", **params))
print(timeit("''.join(c for c in nome if c in consoantes_string)", **params))
print(timeit("r.sub('', nome)", **params))
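Before trusting a benchmark, it is worth checking that the timed expressions actually produce the same output. The sample strings in amostras below are arbitrary test inputs. The pattern is written '[^b-df-hj-np-tv-z]': with an extra "-" between ranges, re would keep hyphens, and the "x-y_z 123" sample would then fail the check:

```python
import re

# Check that the three benchmarked expressions agree before timing them.
consoantes_string = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
consoantes_set = set(consoantes_string)
r = re.compile('[^b-df-hj-np-tv-z]', re.I)

amostras = ["T3ste@_test@and000_ t3ste", "", "AEIOU aeiou", "x-y_z 123"]
for nome in amostras:
    res_set = ''.join(ch for ch in nome if ch in consoantes_set)
    res_str = ''.join(ch for ch in nome if ch in consoantes_string)
    res_re = r.sub('', nome)
    assert res_set == res_str == res_re, (nome, res_set, res_str, res_re)
print("all three versions agree")
```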
Times are printed in seconds and may vary from one machine to another. My results were:

1.3250721090007573
1.4059548949999225
3.3496847289998186
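To put these numbers in proportion, the ratios can be computed directly from the times above. They come from a single run on one machine, so they are only indicative:

```python
# Times reported above, in seconds, for 1 million runs each.
t_set = 1.3250721090007573
t_string = 1.4059548949999225
t_regex = 3.3496847289998186

print(round(t_string / t_set, 2))    # 1.06
print(round(t_regex / t_set, 2))     # 2.53
print(round(t_regex / t_string, 2))  # 2.38
```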
That is, the set was slightly faster than the string of consonants, while the regex was about 2.5 times slower.

Of course, for small strings processed only a few times the difference will be negligible, but this is one more reason not to prefer regex. With a for and an if, the code is (in my opinion) simpler and clearer to understand and maintain, and it has the bonus of being faster. Regex can be cool, but remember: now you have two problems (https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/).