Optimizing LINQ combining multiple lists into new generic list

Question

Given the following three lists:

    var FirstNames = new List<string>(){ "Bob", "Sondra", "Avery", "Von", "Randle", "Gwen", "Paisley" };
    var LastNames = new List<string>(){ "Anderson", "Carlson", "Vickers", "Black", "Schultz", "Marigold", "Johnson" };
    var Birthdates = new List<DateTime>()
                    { 
                        Convert.ToDateTime("11/12/1980"), 
                        Convert.ToDateTime("09/16/1978"), 
                        Convert.ToDateTime("05/18/1985"), 
                        Convert.ToDateTime("10/29/1980"), 
                        Convert.ToDateTime("01/19/1989"), 
                        Convert.ToDateTime("01/14/1972"), 
                        Convert.ToDateTime("02/20/1981") 
                    };

I'd like to combine them into a new generic type where the relationship the lists share is their position in the collection; i.e. FirstNames[0], LastNames[0], and Birthdates[0] are related.

So I have come up with this LINQ, matching the indices, which seems to work fine for now:

    var students = from fn in FirstNames
                   from ln in LastNames
                   from bd in Birthdates
                   where FirstNames.IndexOf(fn) == LastNames.IndexOf(ln)
                   where FirstNames.IndexOf(fn) == Birthdates.IndexOf(bd)
                   select new { First = fn, Last = ln, Birthdate = bd.Date };

However, I have stress tested this code (each List<string> and List<DateTime> loaded with a few million records) and I run into a System.OutOfMemoryException.
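Whatever the exact cause of the memory error, the query does far more work than it needs to. The three from clauses form a cross join, so for lists of length n it visits n × n × n combinations (10^18 for a million items each). Every where clause also calls IndexOf, which is a linear scan. There is a correctness problem too: IndexOf returns the first match, so the query pairs the wrong items whenever a value repeats. The sketch below runs the same query on two rows that share the first name "Bob". The IndexOfJoinDemo class is a hypothetical name used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: runs the question's IndexOf-based cross join on small inputs.
public static class IndexOfJoinDemo
{
    public static List<(string First, string Last, DateTime Birthdate)> Run(
        List<string> firstNames, List<string> lastNames, List<DateTime> birthdates)
    {
        // Same shape as the question's query: a three-way cross join filtered by IndexOf.
        var query = from fn in firstNames
                    from ln in lastNames
                    from bd in birthdates
                    where firstNames.IndexOf(fn) == lastNames.IndexOf(ln)
                    where firstNames.IndexOf(fn) == birthdates.IndexOf(bd)
                    select (First: fn, Last: ln, Birthdate: bd.Date);
        return query.ToList();
    }

    // Number of (fn, ln, bd) combinations the cross join visits before any filtering.
    public static long CombinationsVisited(int n) => (long)n * n * n;
}
```

With FirstNames = { "Bob", "Bob" } and LastNames = { "Anderson", "Carlson" }, both result rows come back as Bob Anderson. IndexOf("Bob") is 0 for both names, so the Carlson row is never produced.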

Is there any other way of writing out this query to achieve the same results more effectively using Linq?

Recommended answer

That is what Zip is for.

    // Pair up first and last names, then attach the birthdate at the same position.
    var result = FirstNames
      .Zip(LastNames, (f, l) => new { f, l })
      .Zip(Birthdates, (fl, b) => new { First = fl.f, Last = fl.l, BirthDate = b });
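Here is a minimal sketch of the same chained Zip on a few rows of the question's data. It uses a hypothetical named Student record instead of the anonymous type, so the result can be returned from a method. It also shows that Zip stops at the end of the shortest sequence instead of throwing, which means lists of unequal length are silently truncated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named result type; the answer itself uses an anonymous type.
public sealed record Student(string First, string Last, DateTime Birthdate);

public static class ZipDemo
{
    // Pairs the three sequences by position with two chained Zip calls.
    public static List<Student> Combine(
        IEnumerable<string> firstNames,
        IEnumerable<string> lastNames,
        IEnumerable<DateTime> birthdates)
    {
        return firstNames
            .Zip(lastNames, (f, l) => new { f, l })                  // intermediate pair per row
            .Zip(birthdates, (fl, b) => new Student(fl.f, fl.l, b))  // final record per row
            .ToList();
    }
}
```

If only two last names are supplied for three first names and three birthdates, Combine returns two students. It does not report the mismatch.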

Regarding scaling:

    using System;
    using System.Diagnostics;
    using System.Linq;

    int count = 50000000;
    var FirstNames = Enumerable.Range(0, count).Select(x => x.ToString());
    var LastNames = Enumerable.Range(0, count).Select(x => x.ToString());
    var BirthDates = Enumerable.Range(0, count).Select(x => DateTime.Now.AddSeconds(x));

    var sw = new Stopwatch();
    sw.Start();

    // Streams: each result is built as the sources are read; nothing is buffered in a List.
    var result = FirstNames
      .Zip(LastNames, (f, l) => new { f, l })
      .Zip(BirthDates, (fl, b) => new { First = fl.f, Last = fl.l, BirthDate = b });

    foreach (var r in result)
    {
        var x = r; // just forces full enumeration
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds); // Prints 69191 (ms) on my machine.

These index-based versions, on the other hand, blow up with an out-of-memory error:

    using System;
    using System.Diagnostics;
    using System.Linq;

    int count = 50000000;
    var FirstNames = Enumerable.Range(0, count).Select(x => x.ToString());
    var LastNames = Enumerable.Range(0, count).Select(x => x.ToString());
    var BirthDates = Enumerable.Range(0, count).Select(x => DateTime.Now.AddSeconds(x));

    var sw = new Stopwatch();
    sw.Start();

    // Every source is materialized up front; this is where memory runs out.
    var FirstNamesList = FirstNames.ToList(); // Blows up in 32-bit .NET with out of Memory
    var LastNamesList = LastNames.ToList();
    var BirthDatesList = BirthDates.ToList();

    // Version 1: Enumerable.Range over the indices.
    var result = Enumerable.Range(0, FirstNamesList.Count)
        .Select(i => new
                     {
                         First = FirstNamesList[i],
                         Last = LastNamesList[i],
                         BirthDate = BirthDatesList[i]
                     });

    // Version 2: the Select overload that passes the index.
    // Property names must match Version 1 exactly, or the anonymous types differ and this won't compile.
    result = BirthDatesList.Select((bd, i) => new
    {
        First = FirstNamesList[i],
        Last = LastNamesList[i],
        BirthDate = bd
    });

    foreach (var r in result)
    {
        var x = r; // just forces full enumeration
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);

At lower values, converting the enumerables to lists also costs much more than the extra object creation in the Zip version. Zip was approximately 30% faster than the indexed versions. As you add more columns, Zip's advantage would likely shrink, since each additional chained Zip allocates another intermediate object per row.

The performance characteristics are also very different. The Zip version starts producing results almost immediately. The others produce nothing until every source has been read in full and converted to a List. So if you page through the results with .Skip(x).Take(y), or check whether something exists with .Any(...), the Zip version will be orders of magnitude faster, because it doesn't have to convert the entire enumerable.
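You can observe the difference directly by counting how many elements each pipeline pulls from its sources. This sketch compares a Zip pipeline followed by Take(3) against the materialize-then-index approach. The Counted iterator, the PullCounter class and the LazinessDemo methods are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical counter shared by the instrumented sources below.
public sealed class PullCounter
{
    public int Pulled;
}

public static class LazinessDemo
{
    // Hypothetical instrumented source: yields 0..count-1 and counts every element handed out.
    public static IEnumerable<int> Counted(int count, PullCounter counter)
    {
        for (int i = 0; i < count; i++)
        {
            counter.Pulled++;
            yield return i;
        }
    }

    // Zip pipeline: Take(3) stops after three pairs, so each source is read only three times.
    public static int PulledByZipTake(int count)
    {
        var counter = new PullCounter();
        _ = Counted(count, counter)
            .Zip(Counted(count, counter), (a, b) => a + b)
            .Take(3)
            .ToList();
        return counter.Pulled;
    }

    // Materialize-then-index pipeline: both ToList() calls read every element before Take runs.
    public static int PulledByToListTake(int count)
    {
        var counter = new PullCounter();
        var left = Counted(count, counter).ToList();
        var right = Counted(count, counter).ToList();
        _ = left.Select((a, i) => a + right[i]).Take(3).ToList();
        return counter.Pulled;
    }
}
```

With 1,000 elements per source, the Zip version reads three elements from each source (six in total). The ToList version reads all 2,000 before it produces its first result.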

Lastly, if it becomes performance-critical and you need to produce many results, you could consider extending Zip to handle more than two enumerables, for example three (shamelessly stolen from Jon Skeet: https://codeblog.jonskeet.uk/2011/01/14/reimplementing-linq-to-objects-part-35-zip/):

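    using System;
    using System.Collections.Generic;

    // Class name is illustrative; any static, non-nested class works for an extension method.
    public static class ZipExtensions
    {
        // "this" on the first parameter makes it an extension method,
        // so it can be called as FirstNames.Zip(LastNames, Birthdates, ...).
        public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>(
            this IEnumerable<TFirst> first,
            IEnumerable<TSecond> second,
            IEnumerable<TThird> third,
            Func<TFirst, TSecond, TThird, TResult> resultSelector)
        {
            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
            using (IEnumerator<TThird> iterator3 = third.GetEnumerator())
            {
                // Advance all three in lockstep; stop as soon as any sequence runs out.
                while (iterator1.MoveNext() && iterator2.MoveNext() && iterator3.MoveNext())
                {
                    yield return resultSelector(iterator1.Current, iterator2.Current, iterator3.Current);
                }
            }
        }
    }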

Then you can do this:

    var result = FirstNames
      .Zip(LastNames, Birthdates, (f, l, b) => new { First = f, Last = l, BirthDate = b });

This also avoids creating the intermediate pair object for every row, so you get the best of both worlds.
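On .NET 6 and later, the framework itself ships a three-sequence overload, Enumerable.Zip(first, second, third). Instead of taking a result selector, it returns value tuples with First, Second and Third fields. Here is a minimal sketch, assuming a .NET 6+ target (BuiltInZipDemo is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInZipDemo
{
    // .NET 6+: Zip over three sequences yields (First, Second, Third) tuples,
    // so no intermediate pair object is allocated per row.
    public static List<string> Describe(
        IEnumerable<string> firstNames,
        IEnumerable<string> lastNames,
        IEnumerable<DateTime> birthdates)
    {
        return firstNames
            .Zip(lastNames, birthdates)
            .Select(t => $"{t.First} {t.Second}, born {t.Third:yyyy-MM-dd}")
            .ToList();
    }
}
```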

Or, to handle any number of sequences generically, use the implementation from the question "Zip multiple/abitrary number of enumerables in C#".
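That implementation is not reproduced here. As a rough sketch of the same idea, a hypothetical ZipAll extension can advance any number of enumerators in lockstep and yield one array per position. It assumes all sequences share one element type; mixed types such as string and DateTime would need object or the typed three-way version above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipAllExtensions
{
    // Hypothetical helper: zips any number of same-typed sequences by position.
    // The null check runs eagerly here; the lazy work happens in ZipAllIterator.
    public static IEnumerable<T[]> ZipAll<T>(this IEnumerable<IEnumerable<T>> sources)
    {
        if (sources == null) throw new ArgumentNullException(nameof(sources));
        return ZipAllIterator(sources.ToList());
    }

    private static IEnumerable<T[]> ZipAllIterator<T>(List<IEnumerable<T>> sources)
    {
        if (sources.Count == 0) yield break;

        var enumerators = new List<IEnumerator<T>>(sources.Count);
        try
        {
            foreach (var source in sources)
                enumerators.Add(source.GetEnumerator());

            while (true)
            {
                // One fresh array per position: element i comes from source i.
                var row = new T[enumerators.Count];
                for (int i = 0; i < enumerators.Count; i++)
                {
                    if (!enumerators[i].MoveNext())
                        yield break; // the shortest sequence ends the zip, as with Zip
                    row[i] = enumerators[i].Current;
                }
                yield return row;
            }
        }
        finally
        {
            // Dispose every enumerator that was actually created.
            foreach (var e in enumerators)
                e.Dispose();
        }
    }
}
```

Allocating a new array per row keeps each result independent of the others. The cost is one allocation per row, much like the anonymous objects in the answers above.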

Other recommended answer

Another option is to use the overload of Select that also passes each element's index:

    var result = Birthdates.Select((bd, i) => new
    {
        First = FirstNames[i],
        Last = LastNames[i],
        Birthdate = bd
    });

Other recommended answer

Yes, use a range generator (Enumerable.Range) over the indices:

    var result = Enumerable.Range(0, FirstNames.Count)
        .Select(i => new
                     {
                         First = FirstNames[i],
                         Last = LastNames[i],
                         Birthdate = Birthdates[i]
                     });
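Both index-based answers assume the lists have the same length. Suppose one list is shorter than the one that drives the loop (Birthdates in the Select version, FirstNames in the Range version). Then the indexer throws ArgumentOutOfRangeException partway through enumeration, whereas Zip simply stops at the shortest list. Here is a minimal sketch of a guard that checks the lengths up front (PositionalCombine and Combine are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositionalCombine
{
    // Hypothetical helper: validates lengths eagerly, then projects lazily by index.
    public static IEnumerable<TResult> Combine<T1, T2, T3, TResult>(
        IReadOnlyList<T1> first,
        IReadOnlyList<T2> second,
        IReadOnlyList<T3> third,
        Func<T1, T2, T3, TResult> selector)
    {
        if (first.Count != second.Count || first.Count != third.Count)
            throw new ArgumentException(
                $"Lists differ in length: {first.Count}, {second.Count}, {third.Count}.");

        // Same idea as the Select((item, i) => ...) answer: the index lines up the other lists.
        return first.Select((item, i) => selector(item, second[i], third[i]));
    }
}
```

Combine is not an iterator method, so the length check runs as soon as Combine is called, not when the results are first enumerated.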