Detecting CSV file truncation when NA is a valid value






If I use the Pandas read_csv() function, elements of short rows are mapped to NaN by default. I would like to suppress that mapping while still interpreting NA as NaN. I'm mostly interested in file truncation as a result of transmission problems, but short rows in the middle of the file should get the same "missing" value. I tried messing around with na_filter=False and keep_default_na=False, and while each seemed to map empty cells to the empty string, neither mapped the string NA to NaN.


Is there a way to have my cake (NA => NaN) and eat it too (missing values not mapped to NaN)? (I'm using Pandas 0.22.0 w/ Python 3.6.)



Example:


col1,col2,col3,col4
1,2,NA,4
4,5
12



Assume the file has been truncated, so the characters "12" are the last ones in the file (no trailing newline). With na_filter and keep_default_na at their default values of True, the resulting values are


"12"


na_filter


keep_default_na


True


1,2,NaN,4
4,5,NaN,NaN
12,NaN,NaN,NaN



If I set either to False, I get


1,2,NA,4
4,5,,
12,,,



I would like to find some way to get a NaN out of the third column of the first row without also mapping the missing values to NaN.
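For reference, the calls behind the two results above look roughly like this (truncated.csv is just a placeholder for the file):

import pandas as pd

# defaults: NA and the padded cells of the short rows all come back as NaN
df_default = pd.read_csv("truncated.csv")

# either of these keeps NA as a literal string, but the padded cells of the
# short rows come back as empty strings rather than NaN
df_no_filter = pd.read_csv("truncated.csv", na_filter=False)
df_no_default = pd.read_csv("truncated.csv", keep_default_na=False)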




You need to give more details, e.g., example data, expected output, attempts at a solution not in paragraph form.
– Rushabh Mehta
39 mins ago




1 Answer



By default, Pandas will interpret the following values as NaN (from the docs):



The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
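
For example, with nothing overridden, a literal NA cell already comes back as NaN (a quick check using an in-memory buffer rather than a file):

import io
import pandas as pd

# default settings: the string NA is one of the recognized null markers
df = pd.read_csv(io.StringIO("col1,col2\n1,NA\n"))
print(df)   #    col1  col2
            # 0     1   NaN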





In order to avoid NA being parsed as a null value, you need to set keep_default_na=False and specify the na_values directly.





To address your second problem: to keep Pandas from throwing errors when reading a file with a variable number of columns, you need to specify the column names. Putting it all together:


import pandas as pd

# new null values, removing NA from the list of recognized markers
new_na_values = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
                 'N/A', 'n/a', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']

# read in the file, supplying the column names explicitly
df = pd.read_csv("<path to file>.csv", keep_default_na=False, na_values=new_na_values,
                 names=["col1", "col2", "col3", "col4"])
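
As a rough check against the sample in the question (simulated with an in-memory buffer here; header=0 is an assumption, added because that sample starts with a header row that names would otherwise turn into a row of data):

import io
import pandas as pd

sample = "col1,col2,col3,col4\n1,2,NA,4\n4,5\n12"  # the truncated example

new_na_values = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
                 'N/A', 'n/a', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']

df = pd.read_csv(io.StringIO(sample), keep_default_na=False, na_values=new_na_values,
                 names=["col1", "col2", "col3", "col4"], header=0)

# NA in the first row should now survive as the literal string "NA", while the
# cells padded onto the short rows should show up as NaN ('' is still in na_values)
print(df)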






