Handling missing/incomplete data in R

Posted by doug on Stack Overflow
Published on 2010-04-10T12:52:47Z Indexed on 2010/04/10 15:13 UTC

As you would expect from a DSL aimed at data analysts, R handles missing/incomplete data very well, for instance:

Many R functions have an 'na.rm' argument that you can set to TRUE to drop the NAs during that call, but if you want to deal with them before the function call, then:

to replace each 'NA' with 0:

vx = ifelse(is.na(vx), 0, vx)

to remove each 'NA':

vx = vx[!is.na(vx)]

to remove each entire row that contains an 'NA' from a data frame:

dfx = dfx[complete.cases(dfx),]

All of these either replace the 'NA' values or remove them (or the rows that contain them) from the data.
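For concreteness, here is a minimal sketch of those idioms on a made-up vector and data frame. The sample values and the names 'vx_zero', 'vx_drop' and 'dfx_complete' are assumptions for illustration only; they are not from my actual data.

```r
# Made-up sample data (hypothetical, for illustration only).
vx <- c(4, NA, 7, NA, 1)

# na.rm = TRUE drops the NAs inside this call only; vx itself is unchanged.
m_with_na    <- mean(vx)                 # NA, because vx contains NAs
m_without_na <- mean(vx, na.rm = TRUE)   # mean of 4, 7, 1

# Replace each NA with 0 (the length of the vector is preserved).
vx_zero <- ifelse(is.na(vx), 0, vx)

# Remove each NA (the vector gets shorter).
vx_drop <- vx[!is.na(vx)]

# Remove each entire row that contains an NA from a data frame.
dfx <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))
dfx_complete <- dfx[complete.cases(dfx), ]
```

Note that only the 'na.rm' version leaves the original object untouched; the other three produce a modified copy.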

Sometimes this isn't quite what you want, though. Making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow, but in subsequent steps you often want those rows back. For example, you may want to calculate a column-wise statistic for a column that has no 'NA' values of its own but lost rows to a prior call to 'complete.cases' because some other column had an 'NA' in those rows.
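A small sketch of the problem, using a made-up data frame (the values and the names 'mean_b_full' and 'mean_b_after' are hypothetical): column 'b' has no 'NA' values, yet its mean changes once 'complete.cases' drops a row because of an 'NA' in column 'a'.

```r
# Hypothetical data: only column a has a missing value (row 3).
dfx <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

# NA-excised copy: row 3 is removed from every column, including b.
dfx_complete <- dfx[complete.cases(dfx), ]

# Column b is complete in the original, so this uses all four values.
mean_b_full <- mean(dfx$b)

# After the excision, b has silently lost the value 30.
mean_b_after <- mean(dfx_complete$b)
```

Here 'mean_b_full' is computed from 10, 20, 30, 40, while 'mean_b_after' only sees 10, 20, 40.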

To be as clear as possible about what I'm looking for: Python/NumPy has a masked-array class ('numpy.ma.MaskedArray') with a 'mask' attribute, which lets you conceal, but not remove, NAs during a function call. Is there an analogous function in R?
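To illustrate the behaviour I mean, here is a hand-rolled sketch that keeps a logical mask next to the data instead of deleting rows. This is not a built-in masked-array equivalent, just an assumption about what "conceal but not remove" could look like; the data and the names 'keep', 'mean_a_masked' and 'mean_b_all' are made up.

```r
# Hypothetical data: only column a has a missing value (row 3).
dfx <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

# Logical mask marking the complete rows; dfx itself is never modified.
keep <- complete.cases(dfx)

# Apply the mask only where the NAs would get in the way...
mean_a_masked <- mean(dfx$a[keep])

# ...and skip it where they would not, so no rows are lost for column b.
mean_b_all <- mean(dfx$b)
```

The downside is that the mask has to be carried around and applied by hand at every call, which is exactly what I'd like a built-in mechanism to do for me.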
