Problem description
I am using the ALL.zip file located here. My goal is to create a pandas DataFrame with it. However, if I run
data = pd.read_csv('foo.csv')
the column names do not match up. The first column has no name, and then the second column is labeled with the first, and the last column is a Series of NaN. So I tried
colnames = [list of colnames]
data = pd.read_csv('foo.csv', names=colnames, header=False)
which gave me the exact same thing, so I ran
data = pd.read_csv('foo.csv', names=colnames)
which lined the colnames up perfectly, but left the column names from the CSV itself (the first line of the file) sitting in the first row of data. So I ran
data=data[1:]
which did the trick.
So I found a workaround without solving the actual problem. I looked at the read_csv documentation and found it a bit overwhelming, and could not figure out a way to fix this using only pd.read_csv.
What was the fundamental problem (I am assuming it is either user error or a problem with the file)? Is there a way to fix it with one of read_csv's parameters?
Here are the first two lines of the CSV file:
cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE","090960009","INFORMATION REQUESTED PER BEST EFFORTS","INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","","SA17A","1015697","SA17.796904","P2016",
Recommended answer
It's not the columns that you're having a problem with, it's the index. Passing index_col=False tells pandas not to use the first column as the index:
import pandas as pd

df = pd.read_csv('P00000001-ALL.csv', index_col=False, low_memory=False)
print(df.head(1))

     cmte_id    cand_id       cand_nm           contbr_nm contbr_city  \
0  C00458844  P60006723  Rubio, Marco  HEFFERNAN, MICHAEL         APO

  contbr_st contbr_zip                         contbr_employer  \
0        AE  090960009  INFORMATION REQUESTED PER BEST EFFORTS

                        contbr_occupation  contb_receipt_amt contb_receipt_dt  \
0  INFORMATION REQUESTED PER BEST EFFORTS                210        27-JUN-15

  receipt_desc memo_cd memo_text form_tp file_num      tran_id election_tp
0          NaN     NaN       NaN   SA17A  1015697  SA17.796904       P2016
The low_memory=False is there because column 6 has mixed data types.
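As a side note, the warning pandas raises for mixed-type columns suggests either setting low_memory=False or specifying dtype on import. The sketch below illustrates the second option on a tiny in-memory sample; the column names name and zip and the data are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical two-column sample; "zip" stands in for a ZIP-code column.
raw = (
    "name,zip\n"
    "Ann,090960009\n"
    "Bob,33101\n"
)

# Default inference parses the ZIP codes as integers and drops the leading zero.
inferred = pd.read_csv(io.StringIO(raw))

# Declaring the column as text keeps every value exactly as written.
as_text = pd.read_csv(io.StringIO(raw), dtype={"zip": str})

print(inferred["zip"].tolist())
print(as_text["zip"].tolist())
```

Declaring the dtype up front also removes the need for pandas to guess the type of that column, which may be preferable to low_memory=False on very large files.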
Other recommended answer
The problem is that every line in the file except the first ends in a comma (the separator character). Each data row therefore has one more field than the header row, and pandas resolves the mismatch by treating the first field of every row as the index column.
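One way to confirm this before touching pandas is to count the fields on each line. This is a minimal sketch using the standard csv module on the two lines quoted in the question; the variable names are illustrative only.

```python
import csv
import io

# The header line and the first data line, copied from the question.
sample = (
    'cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,'
    'contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,'
    'receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp\n'
    'C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE",'
    '"090960009","INFORMATION REQUESTED PER BEST EFFORTS",'
    '"INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","",'
    '"SA17A","1015697","SA17.796904","P2016",\n'
)

header, first_row = list(csv.reader(io.StringIO(sample)))

# The trailing comma adds one empty field to the data row.
print(len(header), len(first_row))  # 18 19
```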
Try
data = pd.read_csv('P00000001-ALL.csv', index_col=False)
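The effect can be reproduced without the real file. The following sketch uses a made-up three-column sample whose data rows end in a trailing comma, and compares the default behaviour with index_col=False; the column names a, b, c and the values are assumptions chosen only to keep the example small.

```python
import io

import pandas as pd

# Hypothetical sample: 3 header names, but 4 fields per data row
# because every data row ends with a trailing comma.
raw = (
    "a,b,c\n"
    "1,2,3,\n"
    "4,5,6,\n"
)

# Default: the first field of each row becomes the index,
# so the values slide one column to the left and "c" is all NaN.
shifted = pd.read_csv(io.StringIO(raw))

# index_col=False: the first field stays a regular column
# and the empty trailing field is dropped.
fixed = pd.read_csv(io.StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```

With the default call, the values 1 and 4 end up in the index, which matches the "first column has no name" symptom described in the question.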