Problem description
Someone asked this as an encore to an otherwise unrelated question (C# Regex split by multiple closing brackets sets), so I'm adding it as a separate question:
I have a number of lines. If one of them is longer than 50 characters, I would like to split it at the last comma (,) before the 50-character mark.
Example input:
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )
    AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )
Expected output:
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
     'PM99', 'PM59' ) )
    AND ( AUART IN ( 'PM01', 'PM01232132132',
     'PM03' ) )
Recommended answer
You might also use a pattern without capture groups and a lookbehind asserting either the start of the string or a comma.
Then assert 50 chars to the right and match 1-49 characters followed by a comma.
In the replacement, use the full match followed by a newline: $0\n
(?<=^|,)(?=.{50}).{1,49},
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    List<string> strings = new List<string>()
    {
        "AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )",
        "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )",
        "AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )",
        "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )",
        "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"
    };

    // Lookbehind: start of the string or a comma. Lookahead: at least 50 characters remain.
    var regex = new Regex(@"(?<=^|,)(?=.{50}).{1,49},");
    foreach (String s in strings)
    {
        // $0 is the whole match; a newline is appended after it.
        Console.WriteLine(regex.Replace(s, "$0\n"));
    }
Output
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
     'PM99', 'PM59' ) )
    AND ( AUART IN ( 'PM01', 'PM01232132132',
     'PM03' ) )
    AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
     'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
     'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
     'PM31' ) )
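If the lines arrive as one multi-line string rather than as a list, the same pattern can probably be reused with RegexOptions.Multiline, so that ^ matches at the start of every line. The following is only a sketch under that assumption: the class name LineWrapper and the method Wrap are made up for illustration, and the input is assumed to use "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineWrapper
{
    // Same pattern as in the answer above. With RegexOptions.Multiline,
    // ^ also matches right after each "\n", and . never crosses a "\n",
    // so every line is examined on its own.
    private static readonly Regex Splitter =
        new Regex(@"(?<=^|,)(?=.{50}).{1,49},", RegexOptions.Multiline);

    // Hypothetical helper (not part of the original answer):
    // wraps every line of a "\n"-separated text.
    public static string Wrap(string text)
    {
        return Splitter.Replace(text, "$0\n");
    }

    public static void Main()
    {
        string text =
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )\n" +
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )";
        Console.WriteLine(Wrap(text));
    }
}
```

In this sketch the first line stays untouched because it is shorter than 50 characters, and the second line is split after 'PM10',.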
Other recommended answer
You can start your regex with a positive lookahead that only succeeds if at least 50 characters follow (so shorter lines are left alone), then add a negative lookbehind that makes sure there are fewer than 50 characters before the comma you want to match:
(?=.{50})(.*)(?<!.{50})(,)
This finds the comma at which you want to split, or which you want to replace with, for example, a comma and a newline.
Full example:
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            string pattern = @"(?=.{50})(.*)(?<!.{50})(,)";
            string replacement = "$1,\n";
            List<string> inputs = new List<string>();
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
            inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50

            foreach (string input in inputs)
            {
                string result = Regex.Replace(input, pattern, replacement);
                Console.WriteLine(result);
            }
        }
    }
Note that this has some limitations:
- if there is a comma within the quotes ('), the regex will match, which may or may not be what you want.
- if the first comma appears later than position 50, the regex will not match
- if the line is longer than ~100 characters, the second part will again be longer than 50 characters, which I guess is not what you want (the lookbehind counts characters from the start of the line, so at most one split happens per line)
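To make the last limitation concrete, here is a small sketch that applies the pattern from this answer to a 147-character line and prints the length of each resulting part. The class name LimitationDemo and the method SplitOnce are hypothetical names chosen for this illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LimitationDemo
{
    // Pattern and replacement copied from the answer above.
    // SplitOnce is an illustrative name, not an existing API.
    public static string SplitOnce(string input)
    {
        return Regex.Replace(input, @"(?=.{50})(.*)(?<!.{50})(,)", "$1,\n");
    }

    public static void Main()
    {
        string longLine = "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )";

        // Only one comma lies before position 50 with the lookbehind satisfied,
        // so the line is split exactly once and the second part stays long.
        foreach (string part in SplitOnce(longLine).Split('\n'))
        {
            Console.WriteLine(part.Length + ": " + part);
        }
    }
}
```

The second part is still well over 50 characters, because every later comma has 50 or more characters in front of it and is therefore rejected by the negative lookbehind.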
The last point can be addressed by capturing the remainder in a second capture group and applying the regex to it again in a loop until it no longer matches:
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            List<string> inputs = new List<string>();
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
            inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50
            inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"); // string longer than 100 chars, i.e. the remainder needs to be processed again

            List<string> results = new List<string>();
            var regex = new Regex(@"((?=.{50}).*(?<!.{50}),)(.*)");

            foreach (string input in inputs)
            {
                string str = input;
                while (!String.IsNullOrEmpty(str))
                {
                    var match = regex.Match(str);
                    if (match.Success)
                    {
                        results.Add(match.Groups[1].Value);
                        str = match.Groups[2].Value;
                    }
                    else
                    {
                        results.Add(str);
                        break;
                    }
                }
            }

            Console.WriteLine(String.Join("\n", results));
        }
    }
Result:
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
     'PM99', 'PM59' ) )
    AND ( AUART IN ( 'PM01', 'PM01232132132',
     'PM03' ) )
    AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
    AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
     'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
     'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
     'PM31' ) )
To address the second point, you may consider breaking at commas , and whitespace \s instead of only commas (if that is possible in your application scenario). The regex for that would be ((?=.{50}).*(?<!.{50})[\s,])(.*).
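As a rough sketch of how that variant behaves, the loop from the example above can be run with the [\s,] pattern on the line whose first comma comes after position 50. The class name WhitespaceWrapper and the method Wrap are hypothetical; only the regex itself is taken from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WhitespaceWrapper
{
    // Variant pattern from the text: break at whitespace or at a comma.
    private static readonly Regex Splitter =
        new Regex(@"((?=.{50}).*(?<!.{50})[\s,])(.*)");

    // Hypothetical helper: returns the pieces of one input line.
    public static List<string> Wrap(string input)
    {
        var parts = new List<string>();
        string rest = input;
        while (!string.IsNullOrEmpty(rest))
        {
            Match match = Splitter.Match(rest);
            if (!match.Success)
            {
                // Fewer than 50 characters left, or no break point found.
                parts.Add(rest);
                break;
            }
            parts.Add(match.Groups[1].Value); // up to and including the break character
            rest = match.Groups[2].Value;     // remainder, handled in the next iteration
        }
        return parts;
    }

    public static void Main()
    {
        string input = "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )";
        foreach (string part in Wrap(input))
        {
            // Brackets make the trailing whitespace of each part visible.
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

With this input the line is broken at the last space before the long quoted value, so both parts stay within 50 characters. Note that a part may now end in a space instead of a comma, which may or may not matter for your use case.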