Loop to identify data location in a Dataframe, and populate to new Dataframe (R)

Multi tool use


Loop to identify data location in a Dataframe, and populate to new Dataframe (R)
So I have a dataframe of user IDs as columns with rows (corresponding to period intervals) with binary variables (where an event happened) like this:
date Id1 Id2 id3 id4 id5
row1 1 0 0 1 0
row2 0 0 1 1 0
row3 0 1 0 0 1
row4 1 1 0 0 0
row5 0 0 1 1 1
...
I am trying to build a loop that runs through each row of each column which identifies any cell with a 1 and populates a new data frame with the row number of each occurrence, e.g:
occ. Id1 Id2 id3 id4 id5
1 1 3 2 1 3
2 4 4 5 2 5
3 5
4
5
I am pretty lost in how to approach this, if anyone is able to help?
Best to store in a
list
, e.g. lapply(df[, -1], function(x) which(x == 1))
– Maurits Evers
12 hours ago
list
lapply(df[, -1], function(x) which(x == 1))
@r2evans - updated!
– user10137196
11 hours ago
3 Answers
3
You can using reshape::melt
with spread
from tidyverse
reshape::melt
spread
tidyverse
reshape::melt(df)%>%filter(.,value==1)%>%group_by(variable)%>%mutate(n=row_number())%>%spread(variable,date)
# A tibble: 3 x 7
value n Id1 Id2 id3 id4 id5
* <int> <int> <fctr> <fctr> <fctr> <fctr> <fctr>
1 1 1 row1 row3 row2 row1 row3
2 1 2 row4 row4 row5 row2 row5
3 1 3 NA NA NA row5 NA
Here is an option using tidyverse
tidyverse
library(tidyverse)
gather(df1, key, val, Id1:id5) %>%
filter(val != 0) %>%
mutate(val = readr::parse_number(date)) %>%
spread(key, val) %>%
mutate_at(vars(Id1:id5), funs(c(.[!is.na(.)], .[is.na(.)]))) %>%
rename(occ = date) %>%
mutate( occ = readr::parse_number(occ))
# occ Id1 Id2 id3 id4 id5
#1 1 1 3 2 1 3
#2 2 4 4 5 2 5
#3 3 NA NA NA 5 NA
#4 4 NA NA NA NA NA
#5 5 NA NA NA NA NA
df1 <- structure(list(date = c("row1", "row2", "row3", "row4", "row5"
), Id1 = c(1L, 0L, 0L, 1L, 0L), Id2 = c(0L, 0L, 1L, 1L, 0L),
id3 = c(0L, 1L, 0L, 0L, 1L), id4 = c(1L, 1L, 0L, 0L, 1L),
id5 = c(0L, 0L, 1L, 0L, 1L)), .Names = c("date", "Id1", "Id2",
"id3", "id4", "id5"), class = "data.frame", row.names = c(NA,
-5L))
Here is another slightly different tidyverse
option:
tidyverse
library(tidyverse)
df %>%
gather(id, value, Id1:id5) %>%
filter(value != 0) %>%
mutate(row = str_replace(date, 'row', '')) %>%
select(id, row) %>%
group_by(id) %>%
mutate(occ = row_number()) %>%
spread(id, row)
# A tibble: 3 x 6
occ Id1 Id2 id3 id4 id5
<int> <chr> <chr> <chr> <chr> <chr>
1 1 1 3 2 1 3
2 2 4 4 5 2 5
3 3 NA NA NA 5 NA
#sample data:
df <- read.table(text='date Id1 Id2 id3 id4 id5
row1 1 0 0 1 0
row2 0 0 1 1 0
row3 0 1 0 0 1
row4 1 1 0 0 0
row5 0 0 1 1 1', header = TRUE)
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Your second frame isn't really a frame since you have different lengths of each column. Additionally, your sample frames don't match, can you edit your question to make them representative?
– r2evans
12 hours ago