Get all pcre group matches for ()* groups

Zanthis
Posts: 23
Joined: Mon Dec 09, 2013 8:35 pm

Post by Zanthis

I have the following pattern for an alias that looks for any number of values separated by | symbols, with at least one |:
([^|]*)(\|[^|]+)+

I entered the following: aaa|bbb|ccc

I expected matches[] to look like this:
Code (Lua):
{ [1] = "aaa|bbb|ccc",  [2] = "aaa", [3] = "|bbb", [4] = "|ccc" }
What I get in matches[] is:
Code (Lua):
{ [1] = "aaa|bbb|ccc",  [2] = "aaa", [3] = "|ccc" }
So the second capture group only ever fills matches[3], and each repetition overwrites it, so it ends up holding just the last match. This brings me to my question: is there a way to get all the matches into matches[] nicely?

I fumbled into this:
([^|]*)(|[^|]+)+

Notice the | in the middle is missing its escape character. This produces a strange result in matches[]:
Code (Lua):
{
[1] = aaa
[2] = aaa
[3] = 
[4] = 
[5] = 
[6] = 
[7] = bbb
[8] = bbb
[9] = 
[10] = 
[11] = 
[12] = 
[13] = ccc
[14] = ccc
[15] = 
[16] = 
[17] = 
[18] = 
}
I can process this with a for i = 1, #matches, 6 do ... end loop.

Debug output for this is amusing too:
Alias: capture group #1 = <aaa>
Alias: capture group #2 = <aaa>
capture group #1 = <bbb>
capture group #2 = <bbb>
capture group #1 = <ccc>
capture group #2 = <ccc>
Notice also that matches[1] is not the full line.
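A sketch of that stride loop, run against the 18-entry table shown above. every_sixth is a hypothetical helper name, and the 6-entry layout is only what this pattern happened to produce for aaa|bbb|ccc, so it is not guaranteed for other input:

```lua
-- Hypothetical helper: take the first entry of each 6-entry stride,
-- mirroring "for i = 1, #matches, 6 do ... end".
local function every_sixth(matches)
  local values = {}
  for i = 1, #matches, 6 do
    values[#values + 1] = matches[i]
  end
  return values
end

-- The alias output for "aaa|bbb|ccc" copied from the table above.
-- Empty strings are not nil, so #sample is 18.
local sample = {
  "aaa", "aaa", "", "", "", "",
  "bbb", "bbb", "", "", "", "",
  "ccc", "ccc", "", "", "", "",
}

print(table.concat(every_sixth(sample), " "))  -- aaa bbb ccc
```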

Vadi
Posts: 5035
Joined: Sat Mar 14, 2009 3:13 pm

Post by Vadi

Zanthis wrote: I expected matches[] to look like this:
Code (Lua):
{ [1] = "aaa|bbb|ccc",  [2] = "aaa", [3] = "|bbb", [4] = "|ccc" }
What I get in matches[] is:
Code (Lua):
{ [1] = "aaa|bbb|ccc",  [2] = "aaa", [3] = "|ccc" }
What you get is correct: your pattern has only two sets of brackets and thus two captures, and a group repeated with + keeps only its last iteration. Here are two independent tools confirming Mudlet's output:

[Image: screenshot from the first regex tool]
[Image: screenshot from the second regex tool]
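If you do want every repetition, one workaround is to stop asking the regex for them and walk the matched text in Lua instead. A minimal sketch, assuming the text is already in a string (for example matches[1] inside the alias); all_repetitions is a hypothetical helper name, not a Mudlet function:

```lua
-- Hypothetical helper: rebuild the table you expected, using Lua patterns.
-- A PCRE group repeated with + reports only its last iteration, so each
-- "|value" piece is collected here with string.gmatch instead.
local function all_repetitions(input)
  -- [1] the whole input, [2] the leading value before the first |
  local result = { input, input:match("^[^|]*") }
  -- In Lua patterns | is an ordinary character, so it needs no escape.
  for piece in input:gmatch("|[^|]+") do
    result[#result + 1] = piece
  end
  return result
end

local t = all_repetitions("aaa|bbb|ccc")
print(table.concat(t, ", "))  -- aaa|bbb|ccc, aaa, |bbb, |ccc
```

Because this uses Lua's own string.gmatch rather than PCRE, it does not depend on how Mudlet reports repeated groups.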
Zanthis wrote: I fumbled into this:
([^|]*)(|[^|]+)+
Yeah, I'm not sure that is valid. Python-style regex refuses it:

[Image: screenshot of the Python-style regex rejecting the pattern]

Mudlet's triggers capture this:
Code (Lua):
{
  [1] = "aaa",
  [2] = "aaa",
  [3] = ""
}
Aliases capture this:
Code (Lua):
{
  [1] = "aaa",
  [2] = "aaa",
  [3] = "",
  [4] = "",
  [5] = "",
  [6] = "",
  [7] = "bbb",
  [8] = "bbb",
  [9] = "",
  [10] = "",
  [11] = "",
  [12] = "",
  [13] = "ccc",
  [14] = "ccc",
  [15] = "",
  [16] = "",
  [17] = "",
  [18] = ""
}
http://regex101.com set to PCRE captures something different from those two:

[Image: regex101 screenshot]

I think we can conclude that the pattern itself isn't healthy.

If I were you, I'd just make \| my pattern and use string.split(command, "|"). That is far simpler, and far easier to maintain than trying to decipher the regex again.
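Standard Lua has no string.split, so outside Mudlet the same split can be sketched in plain Lua. split_on_pipe is a hypothetical stand-in, not Mudlet's implementation; it uses string.find with plain = true so | is matched literally, and it keeps empty segments:

```lua
-- Hypothetical stand-in for string.split(command, "|") in plain Lua.
local function split_on_pipe(command)
  local parts = {}
  local start = 1
  while true do
    -- plain = true: treat "|" as a literal character, not a pattern
    local i = string.find(command, "|", start, true)
    if not i then
      parts[#parts + 1] = command:sub(start)  -- last segment
      break
    end
    parts[#parts + 1] = command:sub(start, i - 1)
    start = i + 1
  end
  return parts
end

print(table.concat(split_on_pipe("aaa|bbb|ccc"), ", "))  -- aaa, bbb, ccc
```

Unlike the repeated capture group, this returns every segment, however many there are.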

Zanthis

Post by Zanthis

Yeah, that's likely what I'll end up doing. The system I'm trying to parse is complicated enough that I think I need to make my aliases much more general and handle the parsing manually. I was just wondering if there was something I was missing. Thanks.