Problem description
I have a DataFrame on which I would like to use the 'str.contains()' method. I believed I had found out how to do this when I read "pandas + dataframe - select by partial string". However, I keep getting a ValueError.
My DataFrame is as follows:
ID,ENROLLMENT_DATE,TRAINER_MANAGING,TRAINER_OPERATOR,FIRST_VISIT_DATE
1536D,12-Feb-12,"06DA1B3-Lebanon NH",,15-Feb-12
F15D,18-May-12,"06405B2-Lebanon NH",,25-Jul-12
8096,8-Aug-12,"0643D38-Hanover NH","0643D38-Hanover NH",25-Jun-12
A036,1-Apr-12,"06CB8CF-Hanover NH","06CB8CF-Hanover NH",9-Aug-12
8944,19-Feb-12,"06D26AD-Hanover NH",,4-Feb-12
1004E,8-Jun-12,"06388B2-Lebanon NH",,24-Dec-11
11795,3-Jul-12,"0649597-White River VT","0649597-White River VT",30-Mar-12
30D7,11-Nov-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",30-Nov-11
3AE2,21-Feb-12,"06405B2-Lebanon NH",,26-Oct-12
B0FE,17-Feb-12,"06D1B9D-Hartland VT",,16-Feb-12
127A1,11-Dec-11,"064456E-Hanover NH","064456E-Hanover NH",11-Nov-12
161FF,20-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",3-Jul-12
A036,30-Nov-11,"063B208-Randolph VT","063B208-Randolph VT",
475B,25-Sep-12,"06D26AD-Hanover NH",,5-Nov-12
151A3,7-Mar-12,"06388B2-Lebanon NH",,16-Nov-12
CA62,3-Jan-12,,,
D31B,18-Dec-11,"06405B2-Lebanon NH",,9-Jan-12
20F5,8-Jul-12,"0669C50-Randolph VT",,3-Feb-12
8096,19-Dec-11,"0649597-White River VT","0649597-White River VT",9-Apr-12
14E48,1-Aug-12,"06D3206-Hanover NH",,
177F8,20-Aug-12,"063B208-Randolph VT","063B208-Randolph VT",5-May-12
553E,11-Oct-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",8-Mar-12
12D5F,18-Jul-12,"0649597-White River VT","0649597-White River VT",2-Nov-12
C6DC,13-Apr-12,"06388B2-Lebanon NH",,
11795,27-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",19-Jun-12
17B43,11-Aug-12,,,22-Oct-12
A036,11-Aug-12,"06D3206-Hanover NH",,19-Jun-12
Then I run the following code:
import pandas

test = pandas.read_csv('testcsv.csv')
test[test.TRAINER_MANAGING.str.contains('Han', na=False)]
and I get the following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-e0c4624c9346> in <module>()
----> 1 test[test.TRAINER_MANAGING.str.contains('Han', na=False)]

.virtualenvs/ipython/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1958
   1959         # also raises Exception if object array with NA values
-> 1960         if com._is_bool_indexer(key):
   1961             key = np.asarray(key, dtype=bool)
   1962             return self._getitem_array(key)

.virtualenvs/ipython/lib/python2.7/site-packages/pandas/core/common.pyc in _is_bool_indexer(key)
    685         if not lib.is_bool_array(key):
    686             if isnull(key).any():
--> 687                 raise ValueError('cannot index with vector containing '
    688                                  'NA / NaN values')
    689             return False

ValueError: cannot index with vector containing NA / NaN values
I feel like I am missing something simple. Any help would be appreciated.
Recommended answer
Your string search still returns NaN values, whereas the slicing operation (indexing with a boolean mask) works with booleans only. It appears 'na=False' is not working (in this case?); I can replicate it on my machine with the latest (released) pandas version.
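To see where the NaN in the mask comes from, here is a minimal sketch using a three-row stand-in for the TRAINER_MANAGING column. The names frame, raw_mask and indexing_failed are made up for illustration, and the column is forced to object dtype on the assumption that this matches what read_csv produced in the question; other dtypes or pandas versions may treat missing values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one row has no trainer.
frame = pd.DataFrame({
    "ID": ["1536D", "8096", "CA62"],
    "TRAINER_MANAGING": pd.Series(
        ["06DA1B3-Lebanon NH", "0643D38-Hanover NH", np.nan], dtype=object
    ),
})

# Without an `na` argument, the missing entry stays missing in the result.
raw_mask = frame["TRAINER_MANAGING"].str.contains("Han")
print(raw_mask.tolist())

# Boolean indexing rejects a mask that still holds a missing value.
try:
    frame[raw_mask]
    indexing_failed = False
except ValueError as exc:
    indexing_failed = True
    print("ValueError:", exc)
```

The mask has three states (True, False, missing), and the indexer only accepts two, which is why the missing entries have to be mapped to False before the mask is used.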
You can work around it by first applying the .fillna() function to the result, like this:
test[test.TRAINER_MANAGING.str.contains('Han').fillna(False)]
Which returns:
       ID  ENROLLMENT_DATE    TRAINER_MANAGING    TRAINER_OPERATOR  FIRST_VISIT_DATE
 2   8096         8-Aug-12  0643D38-Hanover NH  0643D38-Hanover NH         25-Jun-12
 3   A036         1-Apr-12  06CB8CF-Hanover NH  06CB8CF-Hanover NH          9-Aug-12
 4   8944        19-Feb-12  06D26AD-Hanover NH                 NaN          4-Feb-12
 7   30D7        11-Nov-12  06D95A3-Hanover NH  06D95A3-Hanover NH         30-Nov-11
10  127A1        11-Dec-11  064456E-Hanover NH  064456E-Hanover NH         11-Nov-12
11  161FF        20-Feb-12  0643D38-Hanover NH  0643D38-Hanover NH          3-Jul-12
13   475B        25-Sep-12  06D26AD-Hanover NH                 NaN          5-Nov-12
19  14E48         1-Aug-12  06D3206-Hanover NH                 NaN               NaN
21   553E        11-Oct-12  06D95A3-Hanover NH  06D95A3-Hanover NH          8-Mar-12
24  11795        27-Feb-12  0643D38-Hanover NH  0643D38-Hanover NH         19-Jun-12
26   A036        11-Aug-12  06D3206-Hanover NH                 NaN         19-Jun-12
I have never used the str.contains function before, so I'm not sure whether it is working correctly here. If it should work as in your example, we should open an issue on GitHub.
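One way to check how an installed pandas version handles 'na=False' is to compare it against the .fillna(False) workaround on a few rows of the question's CSV. This is only a sketch: the names csv_text, mask_fillna and mask_na_false are made up for illustration, the data is read from an in-memory string rather than testcsv.csv, and the trailing .astype(bool) is an extra step added here (not part of the answer) on the assumption that a plain boolean dtype is the safest thing to index with.

```python
import io

import pandas as pd

# A few rows borrowed from the question's CSV, including one with no trainer.
csv_text = '''ID,ENROLLMENT_DATE,TRAINER_MANAGING,TRAINER_OPERATOR,FIRST_VISIT_DATE
1536D,12-Feb-12,"06DA1B3-Lebanon NH",,15-Feb-12
8096,8-Aug-12,"0643D38-Hanover NH","0643D38-Hanover NH",25-Jun-12
CA62,3-Jan-12,,,
A036,11-Aug-12,"06D3206-Hanover NH",,19-Jun-12
'''
test = pd.read_csv(io.StringIO(csv_text))

# Workaround from the answer: replace missing results with False.
# astype(bool) is an added safety step, not part of the original answer.
mask_fillna = test.TRAINER_MANAGING.str.contains("Han").fillna(False).astype(bool)

# The call from the question, left as returned so any NaN stays visible.
mask_na_false = test.TRAINER_MANAGING.str.contains("Han", na=False)

print("pandas", pd.__version__)
print("masks agree:", mask_fillna.tolist() == mask_na_false.tolist())
print(test[mask_fillna].ID.tolist())
```

If the two masks agree, 'na=False' can be used directly in that environment; if they differ, the .fillna(False) form remains the fallback.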