Faster Way to Generate Rolling Calculations on a list of columns within a groupby object

Faster Way to Generate Rolling Calculations on a list of columns within a groupby object

I created this function to calculate the rolling stats for a list of feats in my df. This function works as intended but takes roughly 20min to run on my df which has about 1 million rows. Is there a faster way to do this in python/pandas ?

def add_rolling_vars(df, feats, amounts, group): #creates rolling stats for a list of feats(columns) over a list of amounts[12,48](window sizes) #grouped by a group like $gvkey or $sector orig_feats = feats.copy() new_feats= for amount in amounts: for name in feats: df[group+'_'+name+f'_{amount}_sma'] = df.groupby(group)[name].rolling(amount,1).mean().values df[group+'_'+name+f'_{amount}_std'] = df.groupby(group)[name].rolling(amount,1).std().values df[group+'_'+name+f'_{amount}_min'] = df.groupby(group)[name].rolling(amount,1).min().values df[group+'_'+name+f'_{amount}_max'] = df.groupby(group)[name].rolling(amount,1).max().values df[group+'_'+name+f'_{amount}_med'] = df.groupby(group)[name].rolling(amount,1).median().values df[group+'_'+name+f'_{amount}_25Q'] = df.groupby(group)[name].rolling(amount,1).quantile(.25).values df[group+'_'+name+f'_{amount}_75Q'] = df.groupby(group)[name].rolling(amount,1).quantile(.75).values

As an example, I also created this function that runs on the same dataset in about 1 min. Obviously, it's different because it doesn't have to iterate through windows of rows, but I still can pass a list of feats instead of name in feats and then add the whole transformed output to my dataframe using a list comprehension naming scheme:

def add_cat_stats(df,feats,group): #feats is a list of continuous feats to compute the monthly stats of df[[group+'_'+name+'_avg' for name in feats]] = df.groupby([group,'Date'])[feats].transform('mean') df[[group+'_'+name+'_std' for name in feats]] = df.groupby([group,'Date'])[feats].transform('std') df[[group+'_'+name+'_min' for name in feats]] = df.groupby([group,'Date'])[feats].transform('min') df[[group+'_'+name+'_max' for name in feats]] = df.groupby([group,'Date'])[feats].transform('max') df[[group+'_'+name+'_med' for name in feats]] = df.groupby([group,'Date'])[feats].transform('median')

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Ciugk

Faster Way to Generate Rolling Calculations on a list of columns within a groupby object

Faster Way to Generate Rolling Calculations on a list of columns within a groupby object

Popular posts from this blog

Arduino Mega cannot recieve any sketches, stk500_recv() programmer is not responding

Visual Studio Code: How to configure includePath for better IntelliSense results

C++ virtual function: Base class function is called instead of derived