Match word of specific length

How do I match a word of a specific length, let's say, five?

Given the input file temp containing this text:

1) ci sono quattro mele
2) sentiamoci il 16 ottobre 2018
3) decidiamo il 17 ottabre 2017
4) Manipolo di eroi
5) 17 mele
6) 18 ott 2020
7) una mela e mezza
8) 2 mele

If I do:

awk '/[[:lower:]]{5}/ {print}' temp

I would expect the output to be sentence 7), because it is the only one with a word of length 5 (mezza). Instead, it returns every line that contains a word of length 5 or more.

This behavior is not compatible with any source of information I consulted:

The construct {n} should match exactly n times. At this point, I am afraid I am missing something obvious.

Possible duplicate of Why does this regex with no special character match a longer string?
– tripleee
1 hour ago

It's probably possible to find a better duplicate; this is definitely a common FAQ.
– tripleee
1 hour ago

You need word boundaries. That's it.
– revo
56 mins ago

3 Answers

It's matching because it found a string of 5 lowercase letters within the string of longer length. You need to adapt your regex so that the "word" match is surrounded by white space. Don't forget to also address the start/end of the string in the "word" boundary.

So (^|[[:space:]])[[:lower:]]{5}([[:space:]]|$), or possibly also include numbers, punctuation, and/or uppercase in the boundary conditions.
– tripleee
1 hour ago

If it's always surrounded by spaces, you can do the following:
[[:lower:]]{5}\s+ or \s+[[:lower:]]{5}\s+
(depending on what you want to do)

Awk doesn't support \s in any version I'm familiar with.
– tripleee
1 hour ago

@tripleee GNU awk does. That's the only one AFAIK but of course GNU awk also supports word boundaries which might be the more appropriate construct in this case.
– Ed Morton
6 mins ago

With GNU awk for word boundaries \< and \> and \w for word characters:

$ awk '/\<\w{5}\>/' file
7) una mela e mezza

With any awk:

$ awk '/(^|[^[:alpha:]])[[:alpha:]]{5}([^[:alpha:]]|$)/' file
7) una mela e mezza

Those and any other solution will obviously depend on what you mean by a "word".
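To illustrate how much the definition of a "word" matters, here is a minimal sketch of a different approach, not one proposed in the answers above: it treats every whitespace-delimited field as a word and tests its length with length() instead of a regex. The variable names input and result are only for illustration, and the sample text is copied from the question.

```shell
# Sample input from the question, one sentence per line.
input='1) ci sono quattro mele
2) sentiamoci il 16 ottobre 2018
3) decidiamo il 17 ottabre 2017
4) Manipolo di eroi
5) 17 mele
6) 18 ott 2020
7) una mela e mezza
8) 2 mele'

# Assumption: a "word" is any whitespace-delimited field.
# Print a line as soon as one of its fields is exactly 5 characters long.
result=$(printf '%s\n' "$input" | awk '{
    for (i = 1; i <= NF; i++)
        if (length($i) == 5) { print; next }
}')

printf '%s\n' "$result"
```

Under this definition punctuation counts toward the length, so a field such as "mela," would be treated as a five-character word, whereas the [[:alpha:]] based regex above would not match it.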

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Ciugk