Why are regex capturing groups indexed at one?

Question

Part of me worries that this question will get closed, but I'm genuinely baffled by something. In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero. The only design rationale for 1-indexing I can think of is lowering the barrier to entry for non-technical people; however, regex is already hellish and incomprehensible, so that argument doesn't really seem to hold.

Additionally, since each language seems to have its own small tweaks on regex, it would seem sensible for capturing-group indexing to be consistent with the rest of the language.

Is there some other explanation? It has occurred to me that the 1-indexing might result from something deeper within the belly of regex, such as something inherently occupying the zero slot. That said, I wasn't able to find any documentation on this particular quirk. Are there any regex masters out there who know of something deeper going on here, or is it just a holdover from seriously legacy code?

Recommended answer

In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero.

I guess by "the rest of the language" you mean arrays and other container types. Well, in regex, capture groups do start at 0, but it is not obvious at first.

Capture group 0 contains the complete match, and the groups numbered from 1 onward are the ones you create explicitly with parentheses, ().

So, for the regex below, applied to the string "ab123cd":

ab(\d+)cd

There are really two groups:

  • Group 0 - the complete match - ab123cd
  • Group 1 - the group you captured using () - 123
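As a minimal sketch of this example, assuming Java's java.util.regex package (the class name GroupZeroDemo and the helper describe are hypothetical, for illustration only), group(0) returns the whole match and group(1) the parenthesized part. Note that groupCount() reports 1 rather than 2: the Matcher documentation treats group 0 as the entire pattern by convention and leaves it out of the count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupZeroDemo {
    // Hypothetical helper: returns "group(0)|group(1)|groupCount" for a whole-input match.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        // group(0) is the entire match; group(1) is the first parenthesized group.
        // groupCount() counts only the explicit groups, so group 0 is not included.
        return m.group(0) + "|" + m.group(1) + "|" + m.groupCount();
    }

    public static void main(String[] args) {
        // In a Java string literal, "\\d" produces the regex token \d.
        System.out.println(describe("ab(\\d+)cd", "ab123cd"));
    }
}
```

Running main should print ab123cd|123|1.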

From there on, groups are numbered in the order in which their opening parentheses ( appear in the pattern.

So, for the regex below (whitespace added for readability):

ab(    x   (\d+))cd
  ^        ^
  |        |
 group 1  group 2

When you apply the above regex (minus the added whitespace) to the string "abx123cd", you get the following groups:

  • Group 0 - the complete match - abx123cd
  • Group 1 - the pattern inside the first opening parenthesis - x123
  • Group 2 - the pattern inside the second opening parenthesis - 123
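The nested example can be checked the same way. This sketch (the names NestedGroupsDemo and allGroups are hypothetical) compiles the spaced-out pattern with the Pattern.COMMENTS flag, which makes Java ignore whitespace in the pattern, so the readability spaces can stay. It then copies groups 0 through groupCount() into an array, which is where the zero-based view shows up: index 0 holds the full match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedGroupsDemo {
    // Hypothetical helper: returns groups 0..groupCount() of the first match, or an empty array.
    static String[] allGroups(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // Slot 0 is the whole match, so the array needs groupCount() + 1 slots.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Pattern.COMMENTS tells the engine to ignore the whitespace added for readability.
        Pattern spaced = Pattern.compile("ab(    x   (\\d+))cd", Pattern.COMMENTS);
        String[] groups = allGroups(spaced, "abx123cd");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("Group " + i + " - " + groups[i]);
        }
    }
}
```

Its main method should print one line per group: Group 0 - abx123cd, Group 1 - x123, and Group 2 - 123.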

In Java, you can retrieve all of these groups from a Matcher with the following methods:

  • Matcher.group() to get group 0 (note: no parameter; it is equivalent to Matcher.group(0)), and
  • Matcher.group(int) to get the other groups (note the int parameter, which is the number of the group you want)
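A small sketch of these two accessors, again with hypothetical names (GroupAccessDemo, noArgEqualsGroupZero, outOfRangeThrows): it checks that group() and group(0) return the same text, and that asking for an index past groupCount() throws IndexOutOfBoundsException, which is what the Matcher documentation specifies for a nonexistent group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupAccessDemo {
    // Hypothetical helper: true when group() and group(0) return the same text for the first match.
    static boolean noArgEqualsGroupZero(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() && m.group().equals(m.group(0));
    }

    // Hypothetical helper: true when requesting one group past groupCount() throws.
    static boolean outOfRangeThrows(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return false;
        }
        try {
            m.group(m.groupCount() + 1); // valid indices are 0..groupCount()
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(noArgEqualsGroupZero("ab(\\d+)cd", "ab123cd")); // true
        System.out.println(outOfRangeThrows("ab(\\d+)cd", "ab123cd"));     // true
    }
}
```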