Update/Replace Values in Dataframe with Tidyverse Join

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP


Update/Replace Values in Dataframe with Tidyverse Join



What is the most efficient way to update/replace NAs in main dataset with (correct) values in a lookup table? This is such a common operation! Similar questions do not seem to have tidy solutions.



Constraints:
1) Please assume a large number of missing values and bigger lookup table than the example given. So case-wise replacement operations would be impractical (no case_when, if_else, etc.)


case_when


if_else



2)The lookup table does not have all values of main dataframe, only the replacement ones.



Tidyverse solution answer much preferred. Similar questions do not seem to have tidy solutions.


library(tidyverse)

### Main Dataframe ###
df1 <- tibble(
state_abbrev = state.abb[1:10],
state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
value = sample(500:1200, 10, replace=TRUE)
)


#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580


### Lookup Dataframe ###
lookup_df <- tibble(
state_abbrev = state.abb[6:8],
state_name = state.name[6:8]
)

#> # A tibble: 3 x 2
#> state_abbrev state_name
#> <chr> <chr>
#> 1 CO Colorado
#> 2 CT Connecticut
#> 3 DE Delaware



Ideally, a left_join would have a replacement option for missing values. Alas...


left_join(df1, lookup_df)
#> Joining, by = c("state_abbrev", "state_name")
#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580



```



Created on 2018-07-28 by the reprex package (v0.2.0).





You could just create a full look-up table and do a merge (left join)?
– Roman Luštrik
15 hours ago





If the lookup table is more than two columns, it's an update join, which doesn't exist in dplyr (for now). The basic functionality is built defined in coalesce, though, which can be applied programmatically if you don't want to specify lots of column names.
– alistaire
15 hours ago




coalesce





@Ronak: This question marked as duplicate, though link does not go to tidy solution.
– Nettle
13 hours ago





@alistaire: your suggestion, I think? left_join(df1, lookup_df, by = "state_abbrev") %>% mutate(merged.col = coalesce(df.merged$state_name.x, state_name.y) ) %>% select(state_abbrev, state_name= merged.col, value)
– Nettle
11 hours ago


left_join(df1, lookup_df, by = "state_abbrev") %>% mutate(merged.col = coalesce(df.merged$state_name.x, state_name.y) ) %>% select(state_abbrev, state_name= merged.col, value)





@alistaire: get this into dplyr at once!
– Nettle
8 hours ago




2 Answers
2



Picking up Alistaire's and Nettle's suggestions and transforming into a working solution


df1 %>%
left_join(lookup_df, by = "state_abbrev") %>%
mutate(state_name = coalesce(state_name.x, state_name.y)) %>%
select(-state_name.x, -state_name.y)


# A tibble: 10 x 3
state_abbrev value state_name
<chr> <int> <chr>
1 AL 671 Alabama
2 AK 501 Alaska
3 AZ 1030 Arizona
4 AR 694 Arkansas
5 CA 881 California
6 CO 821 Colorado
7 CT 742 Connecticut
8 DE 665 Delaware
9 FL 948 Florida
10 GA 790 Georgia



If the abbreviation column is complete and the lookup table is complete, could you just drop the state_name column and then join?


left_join(df1 %>% select(-state_name), lookup_df, by = 'state_abbrev') %>%
select(state_abbrev, state_name, value)



Another option could be to use match and if_else in a mutate call using the built in state name and abbreviation lists:


match


if_else


mutate


df1 %>%
mutate(state_name = if_else(is.na(state_name), state.name[match(state_abbrev,state.abb)], state_name))



Both give the same output:


# A tibble: 10 x 3
state_abbrev state_name value
<chr> <chr> <int>
1 AL Alabama 525
2 AK Alaska 719
3 AZ Arizona 1186
4 AR Arkansas 1051
5 CA California 888
6 CO Colorado 615
7 CT Connecticut 578
8 DE Delaware 894
9 FL Florida 536
10 GA Georgia 599






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Makefile test if variable is not empty

Will Oldham

Visual Studio Code: How to configure includePath for better IntelliSense results