# pandas idxmax: return all rows in case of ties

### Question

I am working with a dataframe in which I have weighted each row by its probability. I want to select the row with the highest probability, and I am using pandas `idxmax()` to do so. However, when there are ties, it returns only the first of the tied rows. In my case, I want to get all the rows that tie.

Furthermore, I am doing this as part of a research project where I am processing millions of dataframes like the one below, so keeping it fast is an issue.

Example:

My data looks like this:

```
data = [['chr1', 100, 200, 0.2],
        ['ch1', 300, 500, 0.3],
        ['chr1', 300, 500, 0.3],
        ['chr1', 600, 800, 0.3]]
```

From this list, I create a pandas dataframe as follows:

```
import pandas as pd

weighted = pd.DataFrame.from_records(data, columns=['chrom', 'start', 'end', 'probability'])
```

Which looks like this:

```
  chrom  start  end  probability
0  chr1    100  200          0.2
1   ch1    300  500          0.3
2  chr1    300  500          0.3
3  chr1    600  800          0.3
```

Then I select the row that fits argmax(probability) using:

```
# .ix was removed in pandas 1.0; .loc performs the same label-based lookup here
selected = weighted.loc[weighted['probability'].idxmax()]
```

Which of course returns:

```
chrom          ch1
start          300
end            500
probability    0.3
Name: 1, dtype: object
```

Is there a (fast) way to get all the values when there are ties?

thanks!

## Recommended answer

Well, this might be the solution you are looking for:

```
weighted.loc[weighted['probability'] == weighted['probability'].max()].T
#               1     2     3
#chrom        ch1  chr1  chr1
#start        300   300   600
#end          500   500   800
#probability  0.3   0.3   0.3
```
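
For reference, a minimal self-contained sketch that puts the question's data and this mask-based selection side by side. The variable names `first_label`, `is_max`, `tied` and `tied_labels` are introduced here for illustration only:

```python
import pandas as pd

data = [
    ["chr1", 100, 200, 0.2],
    ["ch1", 300, 500, 0.3],
    ["chr1", 300, 500, 0.3],
    ["chr1", 600, 800, 0.3],
]
weighted = pd.DataFrame.from_records(
    data, columns=["chrom", "start", "end", "probability"]
)

# idxmax() reports only the first index label at which the maximum occurs.
first_label = weighted["probability"].idxmax()

# Comparing the column against its maximum gives a boolean mask
# that is True for every tied row, not just the first one.
is_max = weighted["probability"] == weighted["probability"].max()
tied = weighted.loc[is_max]
tied_labels = tied.index.tolist()

print(first_label)
print(tied_labels)
```

The trailing `.T` in the answer only transposes the result for display; dropping it keeps one row per tied record.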

## Other recommended answers

The bottleneck lies in calculating the Boolean indexer. You can bypass the overhead associated with `pd.Series` objects by performing the calculations on the underlying NumPy array (here `df` is the dataframe, i.e. `weighted` in the question):

```
df2 = df[df['probability'].values == df['probability'].values.max()]
```
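
A hedged sketch of how this could be wrapped for reuse across many dataframes. The helper name `rows_at_max` and the small sample frame are assumptions made for illustration, not part of pandas or of the original answer:

```python
import pandas as pd


def rows_at_max(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return every row whose value in `column` equals the column maximum."""
    values = frame[column].values          # underlying NumPy array
    return frame[values == values.max()]   # boolean ndarray used as a row mask


# Hypothetical sample frame mirroring the question's data.
df = pd.DataFrame(
    {
        "chrom": ["chr1", "ch1", "chr1", "chr1"],
        "start": [100, 300, 300, 600],
        "end": [200, 500, 500, 800],
        "probability": [0.2, 0.3, 0.3, 0.3],
    }
)
df2 = rows_at_max(df, "probability")
print(df2)
```

One behavioural difference worth keeping in mind: `Series.max()` skips NaN by default, whereas the NumPy array's `.max()` propagates it, so if the column contains missing values the NumPy mask is all False and the result is empty.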

Performance benchmarking with the Pandas equivalent:

```
# tested on Pandas v0.19.2, Python 3.6.0

df = pd.concat([df]*100000, ignore_index=True)

%timeit df['probability'].eq(df['probability'].max())               # 3.78 ms per loop
%timeit df['probability'].values == df['probability'].values.max()  # 416 µs per loop
```
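
`%timeit` is an IPython magic, so the benchmark above only runs in IPython or Jupyter. A rough equivalent for a plain Python script might look like the sketch below, using the standard-library `timeit` module. The frame size (10,000 repeats rather than 100,000), the function names and the repeat counts are arbitrary choices made here to keep the run short, and the timings will vary with pandas version and hardware:

```python
import timeit

import pandas as pd

base = pd.DataFrame({"probability": [0.2, 0.3, 0.3, 0.3]})
# Assumed size: 10_000 copies (40,000 rows); the answer above used 100000 copies.
df = pd.concat([base] * 10_000, ignore_index=True)


def pandas_mask():
    # Boolean Series built entirely with pandas operations.
    return df["probability"].eq(df["probability"].max())


def numpy_mask():
    # Boolean ndarray built on the underlying NumPy array.
    values = df["probability"].values
    return values == values.max()


# The two masks should agree before their timings are worth comparing.
assert (pandas_mask().values == numpy_mask()).all()

calls = 100
pandas_time = min(timeit.repeat(pandas_mask, number=calls, repeat=3)) / calls
numpy_time = min(timeit.repeat(numpy_mask, number=calls, repeat=3)) / calls

print(f"pandas: {pandas_time * 1e6:.1f} µs per call")
print(f"numpy:  {numpy_time * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes when reading `timeit` results.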