Loop to identify data location in a Dataframe, and populate to new Dataframe (R)

Loop to identify data location in a Dataframe, and populate to new Dataframe (R)

So I have a dataframe of user IDs as columns with rows (corresponding to period intervals) with binary variables (where an event happened) like this:

date Id1 Id2 id3 id4 id5 row1 1 0 0 1 0 row2 0 0 1 1 0 row3 0 1 0 0 1 row4 1 1 0 0 0 row5 0 0 1 1 1 ...

I am trying to build a loop that runs through each row of each column which identifies any cell with a 1 and populates a new data frame with the row number of each occurrence, e.g:

occ. Id1 Id2 id3 id4 id5 1 1 3 2 1 3 2 4 4 5 2 5 3 5 4 5

I am pretty lost in how to approach this, if anyone is able to help?

Your second frame isn't really a frame since you have different lengths of each column. Additionally, your sample frames don't match, can you edit your question to make them representative?
– r2evans
12 hours ago

Best to store in a list, e.g. lapply(df[, -1], function(x) which(x == 1))
– Maurits Evers
12 hours ago

list

lapply(df[, -1], function(x) which(x == 1))

@r2evans - updated!
– user10137196
11 hours ago

3 Answers
3

You can using reshape::melt with spread from tidyverse

reshape::melt

spread

tidyverse

reshape::melt(df)%>%filter(.,value==1)%>%group_by(variable)%>%mutate(n=row_number())%>%spread(variable,date) # A tibble: 3 x 7 value n Id1 Id2 id3 id4 id5 * <int> <int> <fctr> <fctr> <fctr> <fctr> <fctr> 1 1 1 row1 row3 row2 row1 row3 2 1 2 row4 row4 row5 row2 row5 3 1 3 NA NA NA row5 NA

Here is an option using tidyverse

tidyverse

library(tidyverse) gather(df1, key, val, Id1:id5) %>% filter(val != 0) %>% mutate(val = readr::parse_number(date)) %>% spread(key, val) %>% mutate_at(vars(Id1:id5), funs(c(.[!is.na(.)], .[is.na(.)]))) %>% rename(occ = date) %>% mutate( occ = readr::parse_number(occ)) # occ Id1 Id2 id3 id4 id5 #1 1 1 3 2 1 3 #2 2 4 4 5 2 5 #3 3 NA NA NA 5 NA #4 4 NA NA NA NA NA #5 5 NA NA NA NA NA

df1 <- structure(list(date = c("row1", "row2", "row3", "row4", "row5" ), Id1 = c(1L, 0L, 0L, 1L, 0L), Id2 = c(0L, 0L, 1L, 1L, 0L), id3 = c(0L, 1L, 0L, 0L, 1L), id4 = c(1L, 1L, 0L, 0L, 1L), id5 = c(0L, 0L, 1L, 0L, 1L)), .Names = c("date", "Id1", "Id2", "id3", "id4", "id5"), class = "data.frame", row.names = c(NA, -5L))

Here is another slightly different tidyverse option:

tidyverse

library(tidyverse) df %>% gather(id, value, Id1:id5) %>% filter(value != 0) %>% mutate(row = str_replace(date, 'row', '')) %>% select(id, row) %>% group_by(id) %>% mutate(occ = row_number()) %>% spread(id, row) # A tibble: 3 x 6 occ Id1 Id2 id3 id4 id5 <int> <chr> <chr> <chr> <chr> <chr> 1 1 1 3 2 1 3 2 2 4 4 5 2 5 3 3 NA NA NA 5 NA #sample data: df <- read.table(text='date Id1 Id2 id3 id4 id5 row1 1 0 0 1 0 row2 0 0 1 1 0 row3 0 1 0 0 1 row4 1 1 0 0 0 row5 0 0 1 1 1', header = TRUE)

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Ciugk