Using nested apply functions instead of nested for loops

My objective was to iterate over each column of a data frame and, within each column, down each row, applying a function to every cell. The specific function here replaces NA values with the corresponding value in the final column, but its details are not relevant to the question. I got the results I needed using two nested for loops like this:

for (j in 1:ncol(df.i)) {
  for (i in 1:nrow(df.i)) {
    df.i[i, j] <- ifelse(is.na(df.i[i, j]), df.i[i, 39], df.i[i, j])
  }
}

However, I believe this should be possible using an apply(df.i, 1, function) nested within an apply(df.i, 2, function). I'm not sure whether that is possible or how to do it. Does anyone know how to achieve the same result with nested calls to apply?

ifelse is a vectorized function, so your inner loop can be replaced with: df.i[,j] <- ifelse(is.na(df.i[,j]), df.i[,39], df.i[,j]). This can now be used in your apply function.
– Dave2e
15 mins ago

Beware when using apply() with data.frames. apply() coerces the data.frame to a matrix, in which all columns share a single data type. This seems not to be an issue in your particular case, but in general it is safer to use lapply().
– Uwe
10 mins ago
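
To illustrate the coercion described in the comment above, here is a minimal sketch. The mixed-type data.frame `dat` and the names `classes_apply` and `classes_lapply` are hypothetical and chosen only for this example; they are not the asker's data.

```r
# Hypothetical mixed-type data.frame, for illustration only
dat <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() first converts the data.frame with as.matrix(); because one column
# is character, every cell of the resulting matrix becomes character
classes_apply <- apply(dat, 2, class)

# lapply() visits each column as it is stored, so the original types survive
classes_lapply <- unlist(lapply(dat, class))

classes_apply   # both columns are reported as "character"
classes_lapply  # id stays "integer", label stays "character"
```

With an all-numeric data.frame such as the one in the question, the coercion is harmless, which is presumably why it causes no trouble there.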

1 Answer

Here are three ways to do what your nested loops do.

First, an example dataset.

set.seed(5345)    # Make the results reproducible
df.i <- matrix(1:400, ncol = 40)
is.na(df.i) <- sample(400, 50)

Now, following the comment by @Dave2e: keep just one for loop and vectorize the innermost one.

df.i1 <- df.i    # Work with a copy
for (j in 1:ncol(df.i1)) {
  df.i1[, j] <- ifelse(is.na(df.i1[, j]), df.i1[, 39], df.i1[, j])
}

Then, fully vectorized, no loops at all.

df.i2 <- ifelse(is.na(df.i), df.i[, 39], df.i)
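
The loop-free form works because a matrix is stored column by column and ifelse() recycles the shorter replacement vector, so a vector of length nrow(df.i) lines up with the rows of every column in turn. A small sketch of that recycling, using a hypothetical 3 x 4 matrix `m` with column 3 as the fill column (neither is from the question):

```r
# Hypothetical toy matrix, used only to illustrate recycling
m <- matrix(1:12, nrow = 3)
m[2, 1] <- NA
m[3, 4] <- NA

# m[, 3] has length nrow(m), so it is recycled down each column in turn
filled <- ifelse(is.na(m), m[, 3], m)

filled[2, 1]  # 8, taken from m[2, 3]
filled[3, 4]  # 9, taken from m[3, 3]
```

This relies on the object being a matrix. For a data.frame it is probably safer to convert with as.matrix() first or to work column by column.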

And your solution, as posted in the question.

for (j in 1:ncol(df.i)) {
  for (i in 1:nrow(df.i)) {
    df.i[i, j] <- ifelse(is.na(df.i[i, j]), df.i[i, 39], df.i[i, j])
  }
}

Compare the results.

identical(df.i, df.i1)
#[1] TRUE
identical(df.i, df.i2)
#[1] TRUE
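
Since the question asks about apply specifically, here is a sketch of how that could look. Nesting is not needed: because ifelse is vectorized over the rows, a single apply() over the columns is enough. The names `df.raw`, `fill`, `df.i3`, `dat` and `ref` are chosen for this sketch only, and the data is rebuilt from the same seed so the block runs on its own. The lapply() variant follows the suggestion in the comments for data.frames.

```r
# Rebuild the example data under a new name (df.raw is a name chosen for
# this sketch), so an already modified df.i is left alone
set.seed(5345)
df.raw <- matrix(1:400, ncol = 40)
is.na(df.raw) <- sample(400, 50)

fill <- df.raw[, 39]  # column that supplies the replacement values

# One apply() over the columns (MARGIN = 2); each column arrives as a vector
df.i3 <- apply(df.raw, 2, function(col) ifelse(is.na(col), fill, col))

# data.frame variant with lapply(), which avoids the coercion to a matrix;
# dat[] <- keeps the data.frame structure while replacing every column
dat <- as.data.frame(df.raw)
dat[] <- lapply(dat, function(col) ifelse(is.na(col), fill, col))

# Both should agree with the fully vectorized result
ref <- ifelse(is.na(df.raw), fill, df.raw)
identical(df.i3, ref)
all.equal(unname(as.matrix(dat)), ref)
```

Note that apply() here is still a loop over columns under the hood, so it is mainly a matter of style compared with the single for loop.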
