What does a B-tree index on more than 1 column look like?



Question

So I was reading up on indexes and their implementation, and I stumbled upon this website that has a brief explanation of b-tree indexes:

http://20bits.com/articles/interview-questions-database-indexes/

The b-tree index makes perfect sense for indexes that are only on a single column, but let's say I create an index with multiple columns, how then does the b-tree work? What is the value of each node in the b-tree?

For example, if I have this table:

table customer:
id    number
name   varchar
phone_number   varchar
city   varchar

and I create an index on: (id, name, city)

and then run the following query:

SELECT id, name 
  FROM customer
 WHERE city = 'My City';

how does this query utilize the multiple column index, or does it not utilize it unless the index is created as (city, id, name) or (city, name, id) instead?

Recommended answer

Imagine that the key is represented by a Python tuple (col1, col2, col3). The indexing operation involves comparing tuple_a with tuple_b. If you don't know which values of col1 and col2 you are interested in, but only the value of col3, then the whole index would have to be read (a "full index scan"), which is not as efficient.
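The tuple analogy can be tried out directly. The following is only a sketch: a sorted Python list stands in for the ordered keys of a B-tree, and the rows and the helper names `seek_by_id` and `scan_by_city` are made up for illustration.

```python
import bisect

# A toy "index": composite keys kept in sorted order, as a B-tree keeps them.
# The rows (id, name, city) are invented for this example.
index = sorted([
    (3, "Carol", "Boston"),
    (1, "Alice", "Seattle"),
    (2, "Bob", "Boston"),
    (2, "Bob", "Austin"),
])

def seek_by_id(keys, wanted_id):
    """Binary-search the run of keys whose first column equals wanted_id."""
    # A 1-tuple sorts before every longer tuple that starts with the same value.
    lo = bisect.bisect_left(keys, (wanted_id,))
    hi = bisect.bisect_left(keys, (wanted_id + 1,))
    return keys[lo:hi]

def scan_by_city(keys, wanted_city):
    """The sort order says nothing about the last column alone, so check every key."""
    return [key for key in keys if key[2] == wanted_city]

print(seek_by_id(index, 2))
print(scan_by_city(index, "Boston"))
```

The lookup on the leading column touches only a narrow slice of the list, while the lookup on the last column alone has to walk all of it.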

If you have an index on (col1, col2, col3), you can expect any RDBMS to use the index (in a direct manner) when the WHERE clause references (1) all 3 columns, (2) both col1 and col2, or (3) only col1.

Otherwise (e.g. only col3 in the WHERE clause), the RDBMS will either not use that index at all (e.g. SQLite) or do a full index scan (e.g. Oracle) [if no other index is better].
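What a given engine actually does can be checked by asking for its query plan. Here is a minimal sketch using Python's built-in sqlite3 module with the table from the question. The index name `idx_customer` and the helper `plan` are assumptions made for this example, and the exact plan wording varies between SQLite versions, so read the output rather than rely on any particular text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id NUMBER, name VARCHAR, phone_number VARCHAR, city VARCHAR)"
)
# idx_customer is a name chosen for this sketch.
conn.execute("CREATE INDEX idx_customer ON customer (id, name, city)")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]

# Filter on the last index column only.
city_plan = plan("SELECT id, name FROM customer WHERE city = 'My City'")
# Filter on the leading index column.
id_plan = plan("SELECT id, name FROM customer WHERE id = 2")

print(city_plan)
print(id_plan)
```

In the plan text, a line beginning with SEARCH means the engine seeks to a key range, while a line beginning with SCAN means it walks the whole table or index.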

In your specific example, presuming that id is a unique identifier of a customer, it is pointless to have it appear in an index (other than the index that your DBMS should set up for a primary key or column noted as UNIQUE).

Other answer

With most implementations, the key is simply a longer key that includes all of the key values, with a separator. No magic there ;-)

In your example the key values could look something like

"123499|John Doe|Conway, NH"
"32144|Bill Gates| Seattle, WA"

One of the characteristics of these indexes with composite keys is that the intermediate tree nodes can be used in some cases to "cover" the query.

For example, if the query is to find the Name and City given the ID, the index can be searched efficiently because the ID comes first in the index. Once in the intermediate node, it can "parse" the Name and City from the key, and doesn't need to go to the leaf node to read the same.

If, however, the query also wanted to display the phone number, then the logic would follow down to the leaf, where the full record is found.
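The concatenated key and the "parse it back out" step might look roughly like this. It is a sketch, not how any particular engine encodes keys: the helpers `make_key` and `split_key` are hypothetical, and the zero-padding of the id is an assumption added so that string order matches numeric order.

```python
SEP = "|"

def make_key(customer_id, name, city):
    """Join the column values into one sortable string key."""
    # Zero-padding is an assumption of this sketch: without it, "123499"
    # would sort before "32144" because strings compare character by character.
    return SEP.join([f"{customer_id:010d}", name, city])

def split_key(key):
    """Recover (id, name, city) from a key without touching the table row."""
    id_text, name, city = key.split(SEP)
    return int(id_text), name, city

keys = sorted([
    make_key(123499, "John Doe", "Conway, NH"),
    make_key(32144, "Bill Gates", "Seattle, WA"),
])

for key in keys:
    print(key, "->", split_key(key))
```

Because name and city are already inside the key, a query that needs only those columns can be answered from the key itself. The phone number is not in the key, so it still requires fetching the full record.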

Other answer

Some implementations simply concatenate the values in the order of the columns, with delimiters.

Another solution is to simply have a b-tree within a b-tree. When you hit a leaf on the first column, you get both a list of matching records and a mini b-tree of the next column, and so on. Thus, the order of the columns specified in the index makes a huge difference on whether that index will be useful for particular queries.
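The nested layout can be sketched as below. Plain dicts stand in for the per-level B-trees (an assumption made to keep the example short; only the nesting matters here), and the rows and function names are invented.

```python
from collections import defaultdict

# Invented sample rows: (id, name, city).
rows = [
    (1, "Alice", "Seattle"),
    (2, "Bob", "Boston"),
    (3, "Carol", "Boston"),
]

# Outer level keyed by id; each entry holds an inner level keyed by name,
# which in turn holds the positions of the matching rows.
nested = defaultdict(lambda: defaultdict(list))
for row_number, (cid, name, _city) in enumerate(rows):
    nested[cid][name].append(row_number)

def find_by_id_and_name(cid, name):
    """Descend level by level: first column, then second."""
    return nested.get(cid, {}).get(name, [])

def find_by_name_only(name):
    """Without the first column, every inner structure has to be visited."""
    hits = []
    for inner in nested.values():
        hits.extend(inner.get(name, []))
    return hits

print(find_by_id_and_name(2, "Bob"))
print(find_by_name_only("Carol"))
```

The second function shows why column order matters: a condition that skips the leading column cannot descend directly and ends up visiting every first-level entry.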

Here's a related question I wrote last week:

Does SQL Server jump leaves when using a composite clustered index?