Regex to break a line at the last comma before a certain length / number of characters

Article URL: https://www.itbaoku.cn/post/2351958.html

Question

Someone asked this as a follow-up to an otherwise unrelated question (C# Regex split by multiple closing brackets sets), so I'm adding it as a separate question:

I have a number of lines. If one of those lines is longer than 50 characters, I would like to split it at the last comma (,) before the 50-character mark.

Example input:

AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )

Expected output:

AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
 'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
 'PM03' ) )

Recommended answer

You could also use a pattern without capture groups, with a lookbehind that asserts either the start of the string or a comma.

Then assert that at least 50 characters follow, and match 1-49 characters followed by a comma.

In the replacement, use the full match followed by a newline: $0\n

(?<=^|,)(?=.{50}).{1,49},

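// Runs as top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> strings = new List<string>()
{
    "AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )",
    "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )",
    "AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )",
    "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )",
    "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"
};

// Lookbehind: start of the string or a comma. Lookahead: at least 50 characters follow.
// Then match 1-49 characters followed by a comma (greedy, so the last possible comma).
var regex = new Regex(@"(?<=^|,)(?=.{50}).{1,49},");

foreach (string s in strings)
{
    // $0 is the full match; the appended \n breaks the line after the comma.
    Console.WriteLine(regex.Replace(s, "$0\n"));
}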

Output

AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
 'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
 'PM03' ) )
AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
 'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
 'PM31' ) )
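
The limit of 50 is hard-coded in the pattern above. As a minimal sketch, the same pattern could be built for an arbitrary limit. The class name LineBreaker, the method BreakAtCommas and its maxLength parameter are hypothetical names chosen for illustration and are not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreaker
{
    // Hypothetical helper: builds the pattern shown above for a given limit
    // and inserts a newline after each matched comma.
    public static string BreakAtCommas(string line, int maxLength)
    {
        if (maxLength < 2)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        // For maxLength = 50 this yields (?<=^|,)(?=.{50}).{1,49},
        string pattern = "(?<=^|,)(?=.{" + maxLength + "}).{1," + (maxLength - 1) + "},";
        return Regex.Replace(line, pattern, "$0\n");
    }
}
```

With maxLength set to 50 this builds exactly the pattern from the answer, so LineBreaker.BreakAtCommas(line, 50) should produce the same output as above for the sample lines.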

Other recommended answer

You can start your regex with a positive lookahead assertion that only succeeds if the line is at least 50 characters long, then add a negative lookbehind that makes sure there are fewer than 50 characters before the comma you want to match:

(?=.{50})(.*)(?<!.{50})(,)

This finds the comma at which you want to split, or which you can, for example, replace with a comma and a newline.

Full example:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?=.{50})(.*)(?<!.{50})(,)";
      string replacement = "$1,\n";
      List<string> inputs = new List<string>();
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
      inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50
     
      foreach (string input in inputs)
      {
          string result = Regex.Replace(input, pattern, replacement);
          Console.WriteLine(result);
      }
   }
}

Note that this has some limitations:

  1. If there is a comma within the quotes ('), the regex will match it as well, which may or may not be what you want.
  2. If the first comma appears later than position 50, the regex will not match.
  3. If the line is longer than ~100 characters, the second part will again be longer than 50 characters, which I guess is not what you want.
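
To make the third limitation concrete, here is a small sketch that applies the single-pass replacement from the full example and returns the resulting parts. The class name LimitationDemo and the method BreakOnce are hypothetical names used only for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LimitationDemo
{
    // Same pattern and replacement as in the full example above.
    private const string Pattern = @"(?=.{50})(.*)(?<!.{50})(,)";
    private const string Replacement = "$1,\n";

    // Applies the replacement once and splits the result at the inserted newline.
    public static string[] BreakOnce(string line)
    {
        return Regex.Replace(line, Pattern, Replacement).Split('\n');
    }
}
```

Because the lookbehind counts characters from the start of the line, only a comma within the first 50 characters can match. For a line with many more values, BreakOnce therefore returns just two parts, and the second part can still be far longer than 50 characters.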

The last point can be addressed by capturing the remainder in a second capture group and applying the regex to it again, in a loop, for as long as the remainder still matches:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      List<string> inputs = new List<string>();
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
      inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50
      inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"); // string longer than 100 chars, i.e. the remainder needs to be processed again
      List<string> results = new List<string>();
       
      var regex = new Regex(@"((?=.{50}).*(?<!.{50}),)(.*)");
    
      foreach (string input in inputs)
      {
          string str = input;
          while(!String.IsNullOrEmpty(str)) {
             var match = regex.Match(str);
             if (match.Success) {
                 results.Add(match.Groups[1].Value);         
                 str = match.Groups[2].Value;
             } else {
                 results.Add(str);
                 break;
             }
          }
      }
      Console.WriteLine(String.Join("\n", results));
   }
}

Result:

AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
 'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
 'PM03' ) )
AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
 'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
 'PM31' ) )
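
For comparison, the same loop can be sketched without a regex, using string.LastIndexOf to find the last comma among the first 50 characters. This is a swapped-in alternative technique rather than the method from the answer, and the names CommaSplitter, Split and maxLength are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class CommaSplitter
{
    // Hypothetical non-regex equivalent of the loop above: while at least
    // maxLength characters remain, cut after the last comma found at an
    // index below maxLength.
    public static List<string> Split(string line, int maxLength = 50)
    {
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        var parts = new List<string>();
        string rest = line;
        while (rest.Length >= maxLength)
        {
            // Searches backwards, starting at index maxLength - 1.
            int comma = rest.LastIndexOf(',', maxLength - 1);
            if (comma < 0)
                break; // no comma early enough: leave the remainder unbroken

            parts.Add(rest.Substring(0, comma + 1));
            rest = rest.Substring(comma + 1);
        }
        if (rest.Length > 0)
            parts.Add(rest);
        return parts;
    }
}
```

Like the regex version, this leaves a line untouched when its first comma comes too late, so the second limitation still applies.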

To address the second point, you may consider breaking at commas (,) or whitespace (\s) instead of only at commas (if that is possible in your application scenario). The regex for that would be ((?=.{50}).*(?<!.{50})[\s,])(.*).
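
As a rough sketch of how that variant behaves, the loop from the previous example can be wrapped in a method that uses the comma-or-whitespace pattern. The class name WhitespaceBreaker and the method Split are hypothetical names, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WhitespaceBreaker
{
    // Variant pattern from the paragraph above: break at a comma or at whitespace.
    private static readonly Regex BreakPattern =
        new Regex(@"((?=.{50}).*(?<!.{50})[\s,])(.*)");

    // Same loop as in the previous example, returning the parts of one line.
    public static List<string> Split(string line)
    {
        var parts = new List<string>();
        string rest = line;
        while (!string.IsNullOrEmpty(rest))
        {
            Match match = BreakPattern.Match(rest);
            if (!match.Success)
            {
                parts.Add(rest); // shorter than 50 characters or no break point
                break;
            }
            parts.Add(match.Groups[1].Value); // up to and including the break character
            rest = match.Groups[2].Value;     // remainder, processed again
        }
        return parts;
    }
}
```

Note that a break can now also land on the space after a comma, in which case that space stays at the end of the first part instead of starting the next one.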
