Find dynamic words through patterns in LINQ

Article URL: https://www.itbaoku.cn/post/1557052.html

Problem description

Here is how the HTML starts:

BUSINESS DOCUMENTATION

<p>Some company</p>
<p>
<p>DEPARTMENT: Legal Process</p>
<p>FUNCTION: Computer Department</p>
<p>PROCESS: Process Server</p>
<p>PROCEDURE: ABC Process Server</p>
<p>OWNER: Some User</p>
<p>REVISION DATE: 06/10/2013</p>
<p>
<p>OBJECTIVE: To ensure that the process server receive their invoices the following day.</p>
<p>
<p>WHEN TO PERFORM: Daily</p>
<p>
<p>WHO WILL PERFORM? Computer Team</p>
<p>
<p>TIME TO COMPLETE: 5 minutes</p>
<p>
<p>TECHNOLOGY REQUIREMENT(S): </p>
<p>
<p>SOURCE DOCUMENT(S): N/A</p>
<p>
<p>CODES AND DEFINITIONS: N/A</p>
<p>
<table border="1">
  <tr>
    <td>
      <p>KPI&rsquo;s: </p>
    </td>
  </tr>
</table>
<p>
<table border="1">
  <tr>
    <td>
      <p>RISKS:  </p>
    </td>
  </tr>
</table>

After this there is a whole bunch of text. From the above, I need to parse out specific data.

I need to parse out the Department, Function, Process, Procedure, Objective, When to Perform, Who Will Perform, Time To Complete, Technology Requirements, Source Documents, Codes and Definitions, and Risks.

I then need to delete this information from the Html column while leaving everything else intact. Is this possible in LINQ?

Here is the LINQ query I am using:

var result = (from d in IPACS_Documents
join dp in IPACS_ProcedureDocs on d.DocumentID equals dp.DocumentID
join p in IPACS_Procedures on dp.ProcedureID equals p.ProcedureID
where d.DocumentID == 4
&& d.DateDeleted == null
select d.Html);

Console.WriteLine(result);

Recommended answer

This regex worked just fine for me on your input data

(DEPARTMENT|FUNCTION|OBJECTIVE):\s*(?<value>.+)\<

The result is multiple matches with two groups each: the first is the key and the second is the value. I have only handled three of the labels, but you can add the rest easily enough.
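As a rough sketch of how those matches could be read in C#, assuming each paragraph sits on its own line as in the sample (the `html` string and the variable names are illustrative, not taken from the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative input: a few lines copied from the sample HTML in the question.
string html = string.Join("\n",
    "<p>DEPARTMENT: Legal Process</p>",
    "<p>FUNCTION: Computer Department</p>",
    "<p>PROCESS: Process Server</p>",
    "<p>OBJECTIVE: To ensure that the process server receive their invoices the following day.</p>");

// The regex from the answer. Group 1 is the key; the named group "value" is the value.
var keyValueRegex = new Regex(@"(DEPARTMENT|FUNCTION|OBJECTIVE):\s*(?<value>.+)\<");

// Collect one entry per match. PROCESS is skipped because it is not in the alternation.
var fields = new Dictionary<string, string>();
foreach (Match m in keyValueRegex.Matches(html))
{
    fields[m.Groups[1].Value] = m.Groups["value"].Value;
}

foreach (var pair in fields)
{
    Console.WriteLine($"{pair.Key} = {pair.Value}");
}
```

Because `.` does not match a newline by default, the greedy `.+` stops at the last `<` on the same line, which is why this sketch relies on one paragraph per line.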

To remove the information thus parsed, you can do a Regex.Replace with this regex

(?<start>\<p\>(DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)

and replacement string as

${start}${end}

leaving out value.
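To see the replacement in isolation, here is a minimal sketch on a few sample lines. It assumes the pattern wraps the opening tag and label in a `start` group and the closing tag in an `end` group, as the replacement string implies; the `html` string and variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: two labelled paragraphs plus one that should be left untouched.
string html = string.Join("\n",
    "<p>Some company</p>",
    "<p>DEPARTMENT: Legal Process</p>",
    "<p>FUNCTION: Computer Department</p>");

// "start" keeps the opening tag and label, "value" is dropped, "end" keeps the closing tag.
var stripRegex = new Regex(
    @"(?<start>\<p\>(DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)");

string stripped = stripRegex.Replace(html, "${start}${end}");
Console.WriteLine(stripped);
// <p>Some company</p>
// <p>DEPARTMENT: </p>
// <p>FUNCTION: </p>
```

This keeps the label and the surrounding tags and drops only the value. If the whole paragraph should disappear instead, replacing with an empty string would remove the entire match.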

In code, this looks kinda like this (quickly typed this out in Notepad++ - may have minor errors).

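// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
private static readonly Regex ParseDocRegex = new Regex(
    @"(?<start>\<p\>(?<name>DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)",
    RegexOptions.ExplicitCapture | RegexOptions.Compiled);

...

// AsEnumerable() moves the rest of the query into memory; a Regex cannot be translated to SQL.
var parsed = from html in result.AsEnumerable()
    let matches = ParseDocRegex.Matches(html)
    where matches.Count > 0
    select new
    {
        // One key/value pair per labelled paragraph found in the document.
        namesAndValues = from m in matches.Cast<Match>()
                         select new KeyValuePair<string, string>(m.Groups["name"].Value, m.Groups["value"].Value),
        // The same HTML with the values removed and the labels and tags kept.
        strippedHtml = ParseDocRegex.Replace(html, "${start}${end}")
    };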

This ought to give you the desired output.
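For the remaining labels, one possible way to extend the pattern is to build the alternation from a list. This is only a sketch under assumptions: the label list is copied from the sample HTML, the separator is assumed to be either `:` or `?` (the sample has "WHO WILL PERFORM?"), and all names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Labels taken from the sample HTML in the question; adjust to match the real documents.
string[] labels =
{
    "DEPARTMENT", "FUNCTION", "PROCESS", "PROCEDURE", "OBJECTIVE",
    "WHEN TO PERFORM", "WHO WILL PERFORM", "TIME TO COMPLETE",
    "TECHNOLOGY REQUIREMENT(S)", "SOURCE DOCUMENT(S)",
    "CODES AND DEFINITIONS", "RISKS"
};

// Regex.Escape protects the parentheses in labels such as "SOURCE DOCUMENT(S)".
string alternation = string.Join("|", labels.Select(Regex.Escape));

// [:?] accepts both "LABEL:" and "WHO WILL PERFORM?"; the lazy .*? allows an empty value.
var allFieldsRegex = new Regex(
    @"<p>(?<name>" + alternation + @")[:?]\s*(?<value>.*?)\s*</p>");

// Illustrative input covering the "?" separator, an empty value, and escaped parentheses.
string html = string.Join("\n",
    "<p>WHO WILL PERFORM? Computer Team</p>",
    "<p>TECHNOLOGY REQUIREMENT(S): </p>",
    "<p>SOURCE DOCUMENT(S): N/A</p>");

// ToDictionary throws on duplicate keys, so this assumes each label appears once per document.
var fields = allFieldsRegex.Matches(html)
    .Cast<Match>()
    .ToDictionary(m => m.Groups["name"].Value, m => m.Groups["value"].Value);

foreach (var pair in fields)
{
    Console.WriteLine($"{pair.Key} = '{pair.Value}'");
}
```

Unlike the `.+` in the answer's pattern, the lazy `.*?` followed by `\s*` yields an empty string for fields such as TECHNOLOGY REQUIREMENT(S) that have no value.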