Match word of specific length
How do I match a word of a specific length, let's say, five?
Given the input file temp with the following text:
1) ci sono quattro mele
2) sentiamoci il 16 ottobre 2018
3) decidiamo il 17 ottabre 2017
4) Manipolo di eroi
5) 17 mele
6) 18 ott 2020
7) una mela e mezza
8) 2 mele
If I run:
awk '/[[:lower:]]{5}/ {print}' temp
I would expect the output to be sentence 7), because it is the only one with a word of length 5 (mezza). Instead, it returns every line that contains a word of length 5 or greater.
This behavior is not consistent with any source of information I consulted: the construct {n} should match exactly n times. At this point, I am afraid I am missing something obvious.
It's probably possible to find a better duplicate; this is definitely a common FAQ.
– tripleee
1 hour ago
You need word boundaries. That's it.
– revo
56 mins ago
It matches because the regex finds a run of 5 lowercase letters inside a longer word: nothing in the pattern says what must come before or after those 5 letters. You need to adapt your regex so that the "word" match is surrounded by whitespace. Don't forget to also treat the start and end of the line as "word" boundaries.
So
(^|[[:space:]])[[:lower:]]{5}([[:space:]]|$)
or possibly also include numbers, punctuation, and/or uppercase letters in the boundary conditions.
– tripleee
1 hour ago
If it's always surrounded by spaces, you can do the following:
[[:lower:]]{5}\s+
or
\s+[[:lower:]]{5}\s+
(depending on what you want to do)
Awk doesn't support \s in any version I'm familiar with.
– tripleee
1 hour ago
@tripleee GNU awk does. That's the only one AFAIK, but of course GNU awk also supports word boundaries, which might be the more appropriate construct in this case.
– Ed Morton
6 mins ago
With GNU awk (for the word boundaries \< and \> and \w for word characters):
$ awk '/\<\w{5}\>/' file
7) una mela e mezza
With any awk:
$ awk '/(^|[^[:alpha:]])[[:alpha:]]{5}([^[:alpha:]]|$)/' file
7) una mela e mezza
Those and any other solution will obviously depend on what you mean by a "word".
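If your awk does not support interval expressions at all (gawk releases before 4.0 needed --re-interval, and the older mawk 1.3.3 lacks them), a sketch that sidesteps {n} entirely is to loop over the fields and count with length(). It assumes a "word" is a whitespace-separated field made only of letters, so mezza. with trailing punctuation would not count; sample and has_word_of_length are hypothetical names.

```shell
#!/bin/sh
# Print the sample input from the question (hypothetical helper).
sample() {
  cat <<'EOF'
1) ci sono quattro mele
2) sentiamoci il 16 ottobre 2018
3) decidiamo il 17 ottabre 2017
4) Manipolo di eroi
5) 17 mele
6) 18 ott 2020
7) una mela e mezza
8) 2 mele
EOF
}

# Assumption: a "word" is a whitespace-separated field made only of letters.
# length() does the counting, so no interval expression is required.
# Prints each input line that has at least one such word of length $1.
has_word_of_length() {
  awk -v n="$1" '
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[[:alpha:]]+$/ && length($i) == n) { print; next }
    }'
}

sample | has_word_of_length 5
```

Passing a different argument, for example has_word_of_length 4, reuses the same logic for any length.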
Possible duplicate of Why does this regex with no special character match a longer string?
– tripleee
1 hour ago