Update/Replace Values in Dataframe with Tidyverse Join
Update/Replace Values in Dataframe with Tidyverse Join
What is the most efficient way to update/replace NAs in main dataset with (correct) values in a lookup table? This is such a common operation! Similar questions do not seem to have tidy solutions.
Constraints:
1) Please assume a large number of missing values and bigger lookup table than the example given. So case-wise replacement operations would be impractical (no case_when
, if_else
, etc.)
case_when
if_else
2)The lookup table does not have all values of main dataframe, only the replacement ones.
Tidyverse solution answer much preferred. Similar questions do not seem to have tidy solutions.
library(tidyverse)
### Main Dataframe ###
df1 <- tibble(
state_abbrev = state.abb[1:10],
state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
value = sample(500:1200, 10, replace=TRUE)
)
#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580
### Lookup Dataframe ###
lookup_df <- tibble(
state_abbrev = state.abb[6:8],
state_name = state.name[6:8]
)
#> # A tibble: 3 x 2
#> state_abbrev state_name
#> <chr> <chr>
#> 1 CO Colorado
#> 2 CT Connecticut
#> 3 DE Delaware
Ideally, a left_join would have a replacement option for missing values. Alas...
left_join(df1, lookup_df)
#> Joining, by = c("state_abbrev", "state_name")
#> # A tibble: 10 x 3
#> state_abbrev state_name value
#> <chr> <chr> <int>
#> 1 AL Alabama 551
#> 2 AK Alaska 765
#> 3 AZ Arizona 508
#> 4 AR Arkansas 756
#> 5 CA California 741
#> 6 CO <NA> 1100
#> 7 CT <NA> 719
#> 8 DE <NA> 874
#> 9 FL Florida 749
#> 10 GA Georgia 580
```
Created on 2018-07-28 by the reprex package (v0.2.0).
If the lookup table is more than two columns, it's an update join, which doesn't exist in dplyr (for now). The basic functionality is built defined in
coalesce
, though, which can be applied programmatically if you don't want to specify lots of column names.– alistaire
15 hours ago
coalesce
@Ronak: This question marked as duplicate, though link does not go to tidy solution.
– Nettle
13 hours ago
@alistaire: your suggestion, I think?
left_join(df1, lookup_df, by = "state_abbrev") %>% mutate(merged.col = coalesce(df.merged$state_name.x, state_name.y) ) %>% select(state_abbrev, state_name= merged.col, value)
– Nettle
11 hours ago
left_join(df1, lookup_df, by = "state_abbrev") %>% mutate(merged.col = coalesce(df.merged$state_name.x, state_name.y) ) %>% select(state_abbrev, state_name= merged.col, value)
@alistaire: get this into dplyr at once!
– Nettle
8 hours ago
2 Answers
2
Picking up Alistaire's and Nettle's suggestions and transforming into a working solution
df1 %>%
left_join(lookup_df, by = "state_abbrev") %>%
mutate(state_name = coalesce(state_name.x, state_name.y)) %>%
select(-state_name.x, -state_name.y)
# A tibble: 10 x 3
state_abbrev value state_name
<chr> <int> <chr>
1 AL 671 Alabama
2 AK 501 Alaska
3 AZ 1030 Arizona
4 AR 694 Arkansas
5 CA 881 California
6 CO 821 Colorado
7 CT 742 Connecticut
8 DE 665 Delaware
9 FL 948 Florida
10 GA 790 Georgia
If the abbreviation column is complete and the lookup table is complete, could you just drop the state_name column and then join?
left_join(df1 %>% select(-state_name), lookup_df, by = 'state_abbrev') %>%
select(state_abbrev, state_name, value)
Another option could be to use match
and if_else
in a mutate
call using the built in state name and abbreviation lists:
match
if_else
mutate
df1 %>%
mutate(state_name = if_else(is.na(state_name), state.name[match(state_abbrev,state.abb)], state_name))
Both give the same output:
# A tibble: 10 x 3
state_abbrev state_name value
<chr> <chr> <int>
1 AL Alabama 525
2 AK Alaska 719
3 AZ Arizona 1186
4 AR Arkansas 1051
5 CA California 888
6 CO Colorado 615
7 CT Connecticut 578
8 DE Delaware 894
9 FL Florida 536
10 GA Georgia 599
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
You could just create a full look-up table and do a merge (left join)?
– Roman Luštrik
15 hours ago