How to group latitude/longitude points that are 'close' to each other?

Question

I have a database of user-submitted latitude/longitude points and am trying to group 'close' points together. 'Close' is relative, but for now it seems to be ~500 feet.

At first it seemed I could just group rows that have the same latitude/longitude for the first 3 decimal places (roughly a 300x300 foot box, understanding that it changes as you move away from the equator).

However, that method seems to be quite lacking. 'Closeness' can't be significantly different than the distance each decimal place represents. It doesn't take into account that two locations may have different digits in the 3rd (or any) decimal place, but still be within the distance that place represents (33.1239 and 33.1240).

I've also mulled over the situation where Point A and Point C are both 'close' to Point B (but not to each other): should they be grouped together? If so, what happens when Point D is 'close' to Point C (and no other points): should it be grouped as well? Certainly I have to determine the desired behavior, but how would either be implemented?

Can anyone point me in the right direction as to how this can be done and what different methods/approaches can be used?

I feel a bit like I'm missing something obvious.

Currently the data is in a MySQL database, used by a PHP application; however, I'm open to other storage methods if they're a key part in accomplishing this.

Recommended answer

There are a number of ways of determining the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is

d = sqrt( (x2-x1)^2 + (y2-y1)^2 )
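Degrees of latitude and longitude are not equal-length units, so the formula above only gives a ground distance once the points are on a flat plane. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km and points close enough together that a local flat projection is acceptable (the function names here are illustrative, not from the answer):

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical model)

def to_local_xy(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Project a lat/long point to metres on a flat plane around a reference point."""
    # East-west metres shrink with cos(latitude); north-south metres do not.
    x = (math.radians(lon_deg - ref_lon_deg)
         * math.cos(math.radians(ref_lat_deg)) * EARTH_RADIUS_M)
    y = math.radians(lat_deg - ref_lat_deg) * EARTH_RADIUS_M
    return x, y

def planar_distance_m(p1, p2):
    """Euclidean distance in metres between two (lat, lon) pairs."""
    # Use the midpoint as the projection reference.
    ref_lat = (p1[0] + p2[0]) / 2
    ref_lon = (p1[1] + p2[1]) / 2
    x1, y1 = to_local_xy(p1[0], p1[1], ref_lat, ref_lon)
    x2, y2 = to_local_xy(p2[0], p2[1], ref_lat, ref_lon)
    # d = sqrt((x2-x1)^2 + (y2-y1)^2)
    return math.hypot(x2 - x1, y2 - y1)
```

With this sketch, the two latitudes from the question (33.1239 and 33.1240 at the same longitude) come out roughly 11 metres apart, well inside a 500 foot (about 152 m) threshold.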

Regarding grouping, you may want to use some sort of 2-D mean to determine how "close" things are to each other. For example, if you have three points, (x1, y1), (x2, y2), (x3, y3), you can find the center of these three points by simple averaging:

x(mean) = (x1+x2+x3)/3
y(mean) = (y1+y2+y3)/3

You can then see how close each is to the center to determine whether it should be part of the "cluster".
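The averaging step and the "how close is each point to the center" check could look something like the following. This is a sketch only; it assumes the points are already planar (x, y) pairs in a common unit, and `radius` is a hypothetical cutoff chosen by the caller:

```python
import math

def centroid(points):
    """Simple mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def members_within(points, radius):
    """Keep only the points that lie within `radius` of the centroid."""
    cx, cy = centroid(points)
    return [(x, y) for x, y in points if math.hypot(x - cx, y - cy) <= radius]
```

For the three points (0, 0), (3, 0) and (0, 3), the centroid is (1, 1), and with a radius of 2 only (0, 0) counts as part of the cluster.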


There are a number of ways one can define clusters, all of which use some variant of a clustering algorithm. I'm in a rush now and don't have time to summarize, but check out the link and the algorithms, and hopefully other people will be able to provide more detail. Good luck!
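As one illustration of how the definition changes the result, here is a sketch of chained (single-linkage) grouping, where A and C land in the same group whenever both are close to B, and D joins too if it is close to C. This is only one possible definition, not necessarily the one the answer had in mind; it assumes planar (x, y) points and compares every pair, so it suits small data sets:

```python
import math

def group_by_chaining(points, threshold):
    """Group points so that any chain of 'close' pairs ends up in one group."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())
```

With a threshold of 1, the points (0, 0), (1, 0), (2, 0) form one group even though the first and last are 2 apart, while (10, 0) stays on its own. If that chaining is not wanted, a centroid-based check like the one described earlier gives tighter groups.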

Other recommended answer

Use something similar to the method you outlined in your question to get an approximate set of results, then whittle that approximate set down by doing proper calculations. If you pick your grid size (i.e. how much you round off your co-ordinates) correctly, you can at least hope to reduce the amount of work to be done to an acceptable level, although you have to manage what that grid size is.

For example, the earthdistance extension to PostgreSQL works by converting lat/long pairs to (x,y,z) cartesian co-ordinates, modelling the Earth as a uniform sphere. PostgreSQL has a sophisticated indexing system that allows these co-ordinates, or boxes around them, to be indexed into R-trees, but you can whack something together that is still useful without that.

If you take your (x,y,z) triple and round it off (i.e. multiply by some factor and truncate to integer), you then have three integers that you can concatenate to produce a "box name", which identifies the box in your "grid" that the point is in.
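A rough sketch of the box-name idea, assuming a uniform sphere of radius 6,371 km and a hypothetical `cell_km` grid size. It uses `floor` rather than plain truncation so that cells keep the same size on both sides of zero, which is a small departure from the wording above:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (uniform sphere)

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z) in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
            EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
            EARTH_RADIUS_KM * math.sin(lat))

def box_name(lat_deg, lon_deg, cell_km=1.0):
    """Name of the grid box containing the point, e.g. '6371:0:0'."""
    ix, iy, iz = (math.floor(c / cell_km) for c in to_xyz(lat_deg, lon_deg))
    return f"{ix}:{iy}:{iz}"
```

Storing the box name in an indexed column (or as a hash key) lets "all points in this box" become a plain equality lookup.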

If you want to search for all points within X km of some target point, you generate all the "box names" around that point (once you've converted your target point to an (x,y,z) triple as well, that's easy) and eliminate all the boxes that don't intersect the Earth's surface (trickier, but use of the x^2+y^2+z^2=R^2 formula at each corner will tell you). You end up with a list of boxes that nearby points can be in, so just search for all points matching one of those boxes, which will also return some extra points. So as a final stage you need to calculate the actual distance to your target point and eliminate some (again, this can be sped up by working in Cartesian co-ordinates and converting your target great-circle distance radius to secant distance).
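The search could be sketched roughly as below. It assumes an in-memory dict as the index (a stand-in for a database table keyed by box), a uniform sphere of radius 6,371 km, and illustrative function names. It skips the step of discarding boxes that miss the Earth's surface, since looking up an empty box in a dict is cheap; with a real database that pruning would matter more.

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (uniform sphere)

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z) in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
            EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
            EARTH_RADIUS_KM * math.sin(lat))

def chord_km(arc_km):
    """Straight-line (secant) distance matching a great-circle distance."""
    return 2 * EARTH_RADIUS_KM * math.sin(arc_km / (2 * EARTH_RADIUS_KM))

def build_index(points, cell_km):
    """Map each integer box (ix, iy, iz) to the (lat, lon) points inside it."""
    index = {}
    for lat, lon in points:
        key = tuple(math.floor(c / cell_km) for c in to_xyz(lat, lon))
        index.setdefault(key, []).append((lat, lon))
    return index

def candidate_boxes(center_xyz, half_side_km, cell_km):
    """All boxes overlapping the cube of the given half-side around the centre."""
    lo = [math.floor((c - half_side_km) / cell_km) for c in center_xyz]
    hi = [math.floor((c + half_side_km) / cell_km) for c in center_xyz]
    return set(product(*(range(a, b + 1) for a, b in zip(lo, hi))))

def search(index, lat, lon, radius_km, cell_km):
    """Points within radius_km (great-circle) of the target."""
    target = to_xyz(lat, lon)
    limit = chord_km(radius_km)  # compare straight-line distances in 3-D
    hits = []
    for box in candidate_boxes(target, limit, cell_km):
        for point in index.get(box, []):
            # Final exact check removes the extra points the boxes let in.
            if math.dist(to_xyz(*point), target) <= limit:
                hits.append(point)
    return hits
```

For example, with points at latitudes 33.1239, 33.1240 and 34.0 (all at longitude -117.0) indexed on a 1 km grid, a 0.15 km search around the first point returns the first two and drops the third.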

The fiddling around comes down to making sure you don't have to search too many boxes, but at the same time don't bring in too many extra points. I've found it useful to index each point on several different grids (e.g. resolutions of 1 km, 5 km, 25 km, 125 km etc). Ideally you want to be searching just one box; remember that the search expands to at least 27 boxes (3 x 3 x 3) as soon as your target radius exceeds your grid size.
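One possible way to choose among several grids is to pick the finest one whose cell size is at least the search radius, so the search stays within a handful of boxes. This selection rule is an assumption for illustration and is not spelled out in the answer; the resolutions are the example values given above:

```python
def pick_grid_km(radius_km, resolutions_km=(1, 5, 25, 125)):
    """Return the finest grid resolution that is at least as large as the radius."""
    for res in sorted(resolutions_km):
        if res >= radius_km:
            return res
    # Radius exceeds every grid: fall back to the coarsest one.
    return max(resolutions_km)
```

A 3 km search would then use the 5 km grid, while a 500 km search falls back to the 125 km grid and has to visit many more boxes.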

I've used this technique to construct a spatial index using Lucene rather than doing calculations in a SQL database. It does work, although there is some fiddling to set it up, and the indices take a while to generate and are quite big. Using an R-tree to hold all the co-ordinates is a much nicer approach, but would take more custom coding. This technique basically just requires a fast hash-table lookup (so it would probably work well with all the NoSQL databases that are the rage these days, and should be usable in a SQL database too).

Other recommended answer

Maybe overkill, but it seems to me to be a clustering problem: the distance measure will determine how the similarity of two elements is calculated. If you need a less naive solution, try Data Mining: Practical Machine Learning Tools and Techniques, and use Weka or Orange.