仅适用于每个唯一ID的第一n行的高效查询[英] Efficient query for only the first N rows for each unique ID

本文是小编为大家收集整理的关于仅适用于每个唯一ID的第一n行的高效查询的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

这是这个问题.

tldr:

问题:

我想过滤一个查询,以仅保留每个唯一ID的第一个n行.

答案:

query = query.GroupBy(q => q.ID).SelectMany(g => g.Take(n));

这个答案的问题是,对于80,000以上的行,评估查询比通过迭代(foreach)过滤的时间要长得多(至少是慢的两倍).查看该答案生成的SQL,使用A CROSS APPLY,很可能是SelectMany().

这个链接描述CROSS APPLY做什么:

应用程序操作员允许您加入两个表格;每次从左表表达式每行处理右表表达式.

简而言之,我正在寻找一个过滤查询,该查询有效地收集了每个唯一ID的顶部N行.

带有SQL的LINQ解决方案是理想的.

推荐答案

我在sql 这里(底部的SQL 2000解决方案),并设法实现了一个查询/linq版本:

query = tableQueryable.Where(a =>
          tableQueryable.Where(b => b.ID == a.ID)
            .OrderByDescending(o => o.Timestamp)
            .Take(N)
            .Select(s => s.PK)
          .Contains(a.PK)
        ).OrderByDescending(d => d.Timestamp);

相当标准的"子查询"模式.在一张大桌子上要快得多.

其他推荐答案

L2S没有行号,因此无法使用Martin的技巧.我也一直在解决这个问题,据我发现,这是最佳的L2S解决方案(不以任何方式使用本机SQL).

您可以尝试将所有结果删除到应用程序中,并在此处完成行号.这可能会伤害或受益.哪一个取决于具体情况.

本文地址:https://www.itbaoku.cn/post/1557047.html

问题描述

This is a follow-up to this question.

TLDR:

The question:

I want to filter a query to only keep the first n rows for each unique ID.

The answer:

query = query.GroupBy(q => q.ID).SelectMany(g => g.Take(n));

The problem with this answer is that for 80,000+ rows, evaluating the query takes much longer than filtering by iteration (foreach) (at least twice as slow). Looking at the SQL generated by this answer, a CROSS APPLY is used, most likely for the SelectMany().

This link describes what CROSS APPLY does:

The APPLY operator allows you to join two table expressions; the right table expression is processed every time for each row from the left table expression.

In short, I'm looking for a filtering query which efficiently gathers the top N rows for each unique ID.

A Linq solution with explained SQL would be ideal.

推荐答案

I found my answer in SQL here (SQL 2000 Solution at the bottom) and managed to implement a Queryable/Linq version:

query = tableQueryable.Where(a =>
          tableQueryable.Where(b => b.ID == a.ID)
            .OrderByDescending(o => o.Timestamp)
            .Take(N)
            .Select(s => s.PK)
          .Contains(a.PK)
        ).OrderByDescending(d => d.Timestamp);

A fairly standard "sub-query" pattern. It's much faster on a large table.

其他推荐答案

L2S does not have row number so Martin's trick cannot be used. I have been through this problem as well and as far as I ever found out this is the optimal L2S solution (that does not use native SQL in any way).

You can try pulling down all results into the application and doing the row number thing there. This can hurt or benefit performance. Which one it is depends on the concrete case.