Problem description
I have an input vector such as:
weights <- seq(0, 1, by = 0.2)
I would like to generate all the combinations of weights (repeats allowed) such that the sum is equal to 1. I came up with
l <- rep(list(weights), 10)
combinations <- expand.grid(l)
combinations[which(apply(combinations, 1, sum) == 1),]
The problem, of course, is that I generate far more combinations than I need. Is there a way to get it done more efficiently?
EDIT: Thanks for the answers. That's the first part of the problem. As @Frank pointed out, now that I have all the "solutions" that add up to 1, the problem is to get all the permutations (not sure if it is the right word) from the solutions in a vector of length 10. For instance:
s1 <- c(0, 0, 0.2, 0, 0, 0, 0.8, 0, 0, 0)
s2 <- c(0.8, 0, 0, 0, 0, 0, 0, 0, 0.2, 0)
# etc.
Recommended answer
Finding any subset of a set of integers that sums to some target t is a form of the subset sum problem, which is NP-complete. As a result, efficiently computing all the combinations (repeats allowed) of your set that sum to a target value is theoretically challenging.
To tractably solve a special case of the subset sum problem, let's recast your problem by assuming the input is positive integers (for your example w <- c(2, 4, 6, 8, 10); I won't consider non-positive integers or non-integers in this answer) and that the target is also a positive integer (in your example 10). Define D(i, j) to be the set of all combinations that sum to i among the first j elements of the set w. If there are n elements in w, then you are interested in D(t, n).
Let's start with a few base cases: D(0, k) = {{}} for all k >= 0 (the only way to sum to 0 is to include none of the elements) and D(k, 0) = {} for any k > 0 (you can't sum to a positive number with zero elements). Now consider the following pseudocode to compute arbitrary D(i, j) values:
for j = 1 ... n
  for i = 1 ... t
    D[(i, j)] = {}
    for rep = 0 ... floor(i/w_j)
      Dnew = D[(i-rep*w_j, j-1)], with w_j added "rep" times
      D[(i, j)] = Union(D[(i, j)], Dnew)
Note that this could still be quite inefficient (D(t, n) can contain an exponentially large number of feasible subsets so there is no avoiding this), but in many cases where there are a relatively small number of feasible combinations that sum to the target this could be quite a bit quicker than simply considering every single subset of the set (there are 2^n such subsets, so that approach always has exponential runtime).
Let's use R to code up your example:
w <- c(2, 4, 6, 8, 10)
n <- length(w)
t <- 10
D <- list()
# Base cases: the empty combination sums to 0; nothing sums to i > 0 with no elements
for (j in 0:n) D[[paste(0, j)]] <- list(c())
for (i in 1:t) D[[paste(i, 0)]] <- list()
# Fill D(i, j) by using w[j] between 0 and floor(i/w[j]) times
for (j in 1:n) {
  for (i in 1:t) {
    D[[paste(i, j)]] <- do.call(c, lapply(0:floor(i/w[j]), function(r) {
      lapply(D[[paste(i-r*w[j], j-1)]], function(x) c(x, rep(w[j], r)))
    }))
  }
}
D[[paste(t, n)]]
# [[1]]
# [1] 2 2 2 2 2
#
# [[2]]
# [1] 2 2 2 4
#
# [[3]]
# [1] 2 4 4
#
# [[4]]
# [1] 2 2 6
#
# [[5]]
# [1] 4 6
#
# [[6]]
# [1] 2 8
#
# [[7]]
# [1] 10
The code correctly identifies all combinations of elements in the set that sum to 10.
To efficiently get all 2002 unique length-10 combinations, we can use the allPerm function from the multicool package:
library(multicool)
out <- do.call(rbind, lapply(D[[paste(t, n)]], function(x) {
  allPerm(initMC(c(x, rep(0, 10-length(x)))))
}))
dim(out)
# [1] 2002   10
head(out)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    2    2    2    2    2    0    0    0    0     0
# [2,]    0    2    2    2    2    2    0    0    0     0
# [3,]    2    0    2    2    2    2    0    0    0     0
# [4,]    2    2    0    2    2    2    0    0    0     0
# [5,]    2    2    2    0    2    2    0    0    0     0
# [6,]    2    2    2    2    0    2    0    0    0     0
For the given input, the whole operation is pretty quick (0.03 seconds on my computer) and doesn't use a huge amount of memory. Meanwhile, the solution in the original post ran in 22 seconds and used 15 GB of memory, even after replacing the last line with the (much) more efficient combinations[rowSums(combinations) == 1,].
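As a cross-check on the figure of 2002, here is a minimal base R sketch. It is not part of the answer above: it swaps in a stars-and-bars construction, and the helper name `compositions` is hypothetical. It assumes the weights are rescaled to integer units of 0.2, so the target 1 becomes 5.

```r
# Work in integer units of 0.2, so the target 1 becomes 5
# (an assumption made here to avoid floating-point sums).
total <- 5    # target, in units of 0.2
k <- 10       # length of each weight vector

# Brute force inspects every row of expand.grid(): 6 choices in each of 10 slots.
n_brute <- 6^k
# Stars and bars: number of nonnegative integer solutions of x1 + ... + xk = total.
n_valid <- choose(total + k - 1, k - 1)

# Hypothetical helper: build all solutions directly instead of filtering.
compositions <- function(total, k) {
  # Each column of `bars` picks k - 1 bar positions among total + k - 1 slots.
  bars <- combn(total + k - 1, k - 1)
  # Add sentinel bars at 0 and total + k; each gap between bars, minus 1, is a count.
  padded <- rbind(0, bars, total + k)
  t(apply(padded, 2, diff) - 1)
}

units <- compositions(total, k)   # one row per solution, in units of 0.2
weights_mat <- units / total      # back to weights in {0, 0.2, ..., 1}
c(brute = n_brute, valid = n_valid, rows = nrow(units))
```

Under these assumptions, only 2002 of the 60,466,176 rows that expand.grid() builds are kept, which is consistent with the large gap in time and memory reported above.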
Other recommended answer
Take a look at the partitions library:
library(partitions)
ps <- parts(10)
res <- ps[,apply(ps, 2, function(x) all(x[x>0] %% 2 == 0))] / 10
Other recommended answer
If you want to use base R, here is a nifty bit of recursive code that I came up with for this problem; it returns results as a list, so isn't a complete answer to the specific question.
combnToSum = function(target, values, collapse = T) {
  if(any(values<=0)) stop("All values must be positive numbers.")
  # Recursively prepend values to `root` until it sums to `target`
  appendValue = function(root) {
    if(sum(root) == target) return(list(root))
    candidates = values + sum(root) <= target
    # When collapsing, only prepend values >= the current first element,
    # so each combination is generated in a single ordering
    if(length(root)>0 & collapse) candidates = candidates & values >= root[1]
    if(!any(candidates)) return(NULL)
    roots = lapply(values[candidates], c, root)
    return(unlist(lapply(roots, appendValue), recursive = F))
  }
  appendValue(integer(0))
}
The code is fairly efficient, solving the test problem in a blink.
combnToSum(1, c(.2,.4,.6,.8,1))
# [[1]]
# [1] 0.2 0.2 0.2 0.2 0.2
#
# [[2]]
# [1] 0.4 0.2 0.2 0.2
#
# [[3]]
# [1] 0.6 0.2 0.2
#
# [[4]]
# [1] 0.4 0.4 0.2
#
# [[5]]
# [1] 0.8 0.2
#
# [[6]]
# [1] 0.6 0.4
#
# [[7]]
# [1] 1
An error can occur when values contains numbers that are small relative to target. For instance, trying to find all of the ways to make change for $10 US:
combnToSum(1000, c(1, 5, 10, 25))
yields the following error:
# Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
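To gauge how large such a problem is before enumerating anything, one option is to count the combinations instead of listing them. The sketch below is a standard iterative counting recurrence, not the recursive code from this answer; the name `count_ways` is hypothetical, and it assumes `values` are distinct positive integers. Because it loops rather than recurses, it does not depend on R's expression nesting limit.

```r
# Count combinations (order ignored, repeats allowed) of `values` summing to `target`.
# Assumes distinct positive integer values; illustrative sketch only.
count_ways <- function(target, values) {
  # ways[a + 1] holds the number of combinations that sum to amount a
  ways <- c(1, numeric(target))
  for (v in values) {
    if (v > target) next          # this value can never fit
    for (a in v:target) {
      # Extend every combination of amount a - v with one more copy of v
      ways[a + 1] <- ways[a + 1] + ways[a - v + 1]
    }
  }
  ways[target + 1]
}

count_ways(10, c(2, 4, 6, 8, 10))     # the integer version of the test problem
count_ways(1000, c(1, 5, 10, 25))     # ways to make change for $10
```

Processing one value at a time in the outer loop is what makes this count combinations rather than ordered sequences.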
I have appendValue as a function nested within the scope of combnToSum so that target and values don't have to be copied and passed for each call (internally, within R). I also like the nice clean signature combnToSum(target, values); the user doesn't need to know about the intermediate value root.
That said, appendValue could be a separate function with the signature appendValue(target, values, root), in which case you could just use appendValue(1, c(0.2, 0.4, 0.6, 0.8, 1), integer(0)) to get the same answer. But you'd either lose the error check for negative values or, if you put the error check into appendValue, the error check would occur for each recursive call to the function, which seems a bit inefficient.
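For illustration, the standalone variant described above might look roughly like this. It is a sketch, not code from the original answer: passing `collapse` as an explicit fourth argument is an assumption, and the example uses integer units (2, 4, ..., 10 with target 10) to sidestep floating-point equality tests.

```r
# Standalone version: target and values are passed explicitly on every call.
# No check for non-positive values is performed here (see the trade-off above).
appendValue <- function(target, values, root, collapse = TRUE) {
  if (sum(root) == target) return(list(root))
  candidates <- values + sum(root) <= target
  # When collapsing, only prepend values >= the current first element
  if (length(root) > 0 && collapse) candidates <- candidates & values >= root[1]
  if (!any(candidates)) return(NULL)
  roots <- lapply(values[candidates], c, root)
  unlist(
    lapply(roots, function(r) appendValue(target, values, r, collapse)),
    recursive = FALSE
  )
}

combos <- appendValue(10, c(2, 4, 6, 8, 10), integer(0))
ordered <- appendValue(10, c(2, 4, 6, 8, 10), integer(0), collapse = FALSE)
c(length(combos), length(ordered))
```

The two lengths correspond to the 7 combinations and the 16 orderings listed in this answer's outputs.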
Setting collapse = F will return all of the permutations that have unique order.
combnToSum(1, c(.2,.4,.6,.8,1), collapse = F)
# [[1]]
# [1] 0.2 0.2 0.2 0.2 0.2
#
# [[2]]
# [1] 0.4 0.2 0.2 0.2
#
# [[3]]
# [1] 0.2 0.4 0.2 0.2
#
# [[4]]
# [1] 0.6 0.2 0.2
#
# [[5]]
# [1] 0.2 0.2 0.4 0.2
#
# [[6]]
# [1] 0.4 0.4 0.2
#
# [[7]]
# [1] 0.2 0.6 0.2
#
# [[8]]
# [1] 0.8 0.2
#
# [[9]]
# [1] 0.2 0.2 0.2 0.4
#
# [[10]]
# [1] 0.4 0.2 0.4
#
# [[11]]
# [1] 0.2 0.4 0.4
#
# [[12]]
# [1] 0.6 0.4
#
# [[13]]
# [1] 0.2 0.2 0.6
#
# [[14]]
# [1] 0.4 0.6
#
# [[15]]
# [1] 0.2 0.8
#
# [[16]]
# [1] 1