How can I get results from a JPA entity ordered by distance?


Source: https://www.itbaoku.cn/post/997594.html

Question

I am currently writing a mobile application where the user has to pick a location from a list. All the locations are stored in a Postgres database using JPA from a Play app.

What I would like to do is get the user's location in the app, and then make a request to get the first 20 or 50 locations nearest to that user.

If I were using my own data structure for this, I would use a KD-Tree. However, I am very new to JPA/Play/PostgreSQL, so I am unsure how to handle data persistence manually.

The only thing I can think of with my current knowledge would be to look at each location and determine its distance, but that would be incredibly slow on such a huge database.

Is there a query I can run to say "give me the X first results ordered by distance from this latitude and longitude"?

EDIT: I am using Heroku and since the application is in the early stages of development, I would prefer to not have to pay the $200/month Heroku requires if you want to use PostGIS with your app.

Recommended answer

You really don't want to be rolling your own data structure for this, but fortunately you're using PostgreSQL, so you're in luck. Use PostGIS. It's going to be orders of magnitude faster than anything you can build in a reasonable amount of time.

Other recommended answer

This is a largely simplified version of a function I use in an app that I built around 3 years ago, adapted to the question at hand.

  • Finds locations in the perimeter of a point using a box. One could do this with a circle to get more accurate results, but this is only meant to be an approximation to begin with.

  • Ignores the fact that the world is not flat. My application was only meant for a local region, a few hundred kilometers across, and the search perimeter only spans a few kilometers. Making the world flat is good enough for the purpose. (Todo: A better approximation for the ratio lat/lon depending on the geolocation might help.)

  • Operates with geocodes like you get from Google Maps.

  • Works with standard PostgreSQL without extension (no PostGIS required), tested on PostgreSQL 9.1 and 9.2.

Without an index, one would have to calculate the distance for every row in the base table and filter the closest ones. That is extremely expensive with big tables.
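To make the cost concrete, the unindexed approach amounts to something like the sketch below. It is illustrative only: the inline `loc` rows and the `dist_deg` alias are made up for this example, and `<->` on raw coordinates measures distance in unscaled degrees, not meters.

```sql
-- Brute force: compute the distance for every row, sort, keep the closest.
WITH loc(id, geocode) AS (
   VALUES (1, point(48.20117, 16.294))
        , (2, point(48.19834, 16.302))
        , (3, point(48.19755, 16.299))
)
SELECT id
     , geocode <-> point(48.2, 16.3) AS dist_deg   -- distance in raw degrees
FROM   loc
ORDER  BY dist_deg
LIMIT  2;
```

Every row has to be visited and sorted before the LIMIT can apply, which is the work an indexed box search avoids.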

Edit:
I rechecked and the current implementation allows a GiST index on points (Postgres 9.1 or later). Simplified the code accordingly.

The major trick is to use a functional GiST index of boxes, even though the column is just a point. This makes it possible to use the existing GiST implementation.

With such a (very fast) search, we can get all locations inside a box. The remaining problem: we know the number of rows, but we do not know the size of the box they are in. That's like knowing part of the answer, but not the question.

I use a similar reverse-lookup approach to the one described in more detail in this related answer on dba.SE. (Only, I am not using partial indexes here - might actually work, too).

Iterate through an array of pre-defined search steps, from very small up to "just big enough to hold at least enough locations". This means we have to run a couple of (very fast) queries to arrive at the size of the search box.

Then search the base table with this box and calculate actual distance for only the few rows returned from the index. There will usually be some surplus since we found the box holding at least enough locations. By taking the closest ones, we effectively round the corners of the box. You could force this effect by making the box a notch bigger (multiply radius in the function by sqrt(2) to get completely accurate results, but I wouldn't go all out, since this is approximating to begin with).

This would be even faster and simpler with an SP-GiST index, available in the latest version of PostgreSQL. But I don't know if that's possible yet. We'd need an actual implementation for the data type and I didn't have the time to dive into it. If you find a way, promise to report back!

Given this simplified table with some example values (adr .. address):

CREATE TABLE adr(adr_id int, adr text, geocode point);
INSERT INTO adr (adr_id, adr, geocode) VALUES
    (1,  'adr1', '(48.20117,16.294)'),
    (2,  'adr2', '(48.19834,16.302)'),
    (3,  'adr3', '(48.19755,16.299)'),
    (4,  'adr4', '(48.19727,16.303)'),
    (5,  'adr5', '(48.19796,16.304)'),
    (6,  'adr6', '(48.19791,16.302)'),
    (7,  'adr7', '(48.19813,16.304)'),
    (8,  'adr8', '(48.19735,16.299)'),
    (9,  'adr9', '(48.19746,16.297)');

The index looks like this:

CREATE INDEX adr_geocode_gist_idx ON adr USING gist (geocode);
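As a rough sketch of what a single search step does with this index, the query below counts the rows inside one box around a center point. The center (48.2, 16.3), the step 0.0022 and the ratio 1.53 are borrowed from the function and its example call; writing them as literals here is only for illustration.

```sql
-- One search step: count locations inside a box around the center.
-- The longitude half-width is stretched by the lat/lon ratio (1.53).
SELECT count(*)
FROM   adr a
WHERE  a.geocode <@ box(point(48.2 - 0.0022, 16.3 - 0.0022 * 1.53)
                      , point(48.2 + 0.0022, 16.3 + 0.0022 * 1.53));
```

If the count is below the wanted number of rows, the next bigger step is tried.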


You'll have to adjust the home area, the steps and the scaling factor to your needs. As long as you search in boxes of a few kilometers around a point, a flat earth is a good enough approximation.

You need to understand plpgsql well to work with this. I feel I have done quite enough here.

CREATE OR REPLACE FUNCTION f_find_around(_lat double precision, _lon double precision, _limit bigint = 50)
  RETURNS TABLE(adr_id int, adr text, distance int) AS
$func$
DECLARE
   _homearea   CONSTANT box := '(49.05,17.15),(46.35,9.45)'::box;      -- box around legal area
-- 100m = 0.0008892                   250m, 340m, 450m, 700m,1000m,1500m,2000m,3000m,4500m,7000m
   _steps      CONSTANT real[] := '{0.0022,0.003,0.004,0.006,0.009,0.013,0.018,0.027,0.040,0.062}';  -- find optimum _steps by experimenting
   geo2m       CONSTANT integer := 73500;                              -- ratio geocode(lon) to meter (found by trial & error with google maps)
   lat2lon     CONSTANT real := 1.53;                                  -- ratio lon/lat (lat is worth more; found by trial & error with google maps in Vienna)
   _radius     real;                                                   -- final search radius
   _area       box;                                                    -- box to search in
   _count      bigint := 0;                                            -- count rows
   _point      point := point($1,$2);                                  -- center of search
   _scalepoint point := point($1 * lat2lon, $2);                       -- lat scaled to adjust
BEGIN

 -- Optimize _radius
IF (_point <@ _homearea) THEN
   FOREACH _radius IN ARRAY _steps LOOP
      SELECT INTO _count  count(*) FROM adr a
      WHERE  a.geocode <@ box(point($1 - _radius, $2 - _radius * lat2lon)
                            , point($1 + _radius, $2 + _radius * lat2lon));

      EXIT WHEN _count >= _limit;
   END LOOP;
END IF;

IF _count = 0 THEN                                                     -- nothing found or not in legal area
   RETURN;                                                             -- unlabeled EXIT is only valid inside a loop
ELSE
   IF _radius IS NULL THEN
      _radius := _steps[array_upper(_steps,1)];                        --  max. _radius
   END IF;
   _area := box(point($1 - _radius, $2 - _radius * lat2lon)
              , point($1 + _radius, $2 + _radius * lat2lon));
END IF;

RETURN QUERY
SELECT a.adr_id
      ,a.adr
      ,((point (a.geocode[0] * lat2lon, a.geocode[1]) <-> _scalepoint) * geo2m)::int4 AS distance
FROM   adr a
WHERE  a.geocode <@ _area
ORDER  BY distance, a.adr, a.adr_id
LIMIT  _limit;

END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_find_around (48.2, 16.3, 20);

Returns a list of up to $3 (_limit) locations, fewer if the defined maximum search area does not hold that many.
Sorted by actual distance.

Further improvements

Build a function like:

CREATE OR REPLACE FUNCTION f_geo2m(double precision, double precision)
  RETURNS point AS
$BODY$
SELECT point($1 * 111200, $2 * 111400 * cos(radians($1)));
$BODY$
  LANGUAGE sql IMMUTABLE;

COMMENT ON FUNCTION f_geo2m(double precision, double precision)
IS 'Project geocode to approximate metric coordinates.
    SELECT f_geo2m(48.20872, 16.37263)  --';

The (literally) global constants 111200 and 111400 are optimized for my area (Austria) from "Length of a degree of longitude" and "The length of a degree of latitude", but basically just work all over the world.

Use it to add a scaled geocode to the base table, ideally a generated column as outlined in this answer: "How do you do date math that ignores the year?" Refer to "3. Black magic version", where I walk you through the process.

Then you can simplify the function some more: Scale input values once and remove redundant calculations.
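As a sketch of how the distance calculation could look once f_geo2m() is available (assuming the adr table and the function defined above; the `distance_m` alias is made up for this example):

```sql
-- Project both points to approximate metric coordinates, then measure.
-- No per-region constants (geo2m, lat2lon) are needed in the query itself.
SELECT a.adr_id
     , a.adr
     , (f_geo2m(a.geocode[0], a.geocode[1]) <-> f_geo2m(48.2, 16.3))::int AS distance_m
FROM   adr a
ORDER  BY distance_m, a.adr_id
LIMIT  20;
```

As written, this still projects every row at query time. Storing the projected point in the base table, as suggested above, would move that work to write time.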