select subsamples many times in order to calculate an average value

First of all, sorry for my English; I am a beginner in programming.
In general, I am trying to compute an index from an input file that contains some values. The first function randomly selects values for a range of sample sizes.
The second function takes that output and calculates an index for each sample size. However, when I execute my script I get only a single value for each sample size.
But I want to draw the subsamples (200, 400, etc.) multiple times, so that I can calculate the average index for each sample size.
As a first step, I pass an input file as an argument to my script:
import sys
file_name1 = sys.argv[1]
I read the values from the input file and saved them to a list, as follows:
data2 = [7, 7, 7, 5, 3, 1, 2, 8, 6, 5, 1, 1, 9, 7 ......] #sample size 2010
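For illustration, reading the file into such a list might look like the sketch below. The file format is not stated in the question, so this assumes plain text with whitespace-separated integers; the helper name `read_values` is hypothetical.

```python
def read_values(path):
    """Read whitespace-separated integers from a text file into a list.

    Assumption: the input file contains only integers separated by
    spaces, tabs or newlines (the question does not state the format).
    """
    values = []
    with open(path) as handle:
        for line in handle:
            # split() without arguments copes with spaces, tabs and blank lines
            values.extend(int(token) for token in line.split())
    return values
```

Under that assumption, the list above could be built with `data2 = read_values(file_name1)`.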
I wrote a function that randomly picks numbers from the list for sample sizes in a certain range (0, 200, 400, ..., n). But I want my script to pick each sample size (200, 400, 600, ...) many times, so that I can calculate the average index for each sample size.
Example:
The first script execution gives an index of 4.67 for 200 values.
The second script execution gives an index of 4.32 for 200 values.
The third script execution gives an index of 4.52 for 200 values.
...
I need the average index of each sample size.
Below is the function that randomly picks 200, 400, 600, ... values and saves how often each value occurs as key-value pairs in a dictionary:
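import numpy as np

def subsamples(list_object):
    val = np.array(list_object)
    n = len(val)
    pois_group = []  # one {value: count} dictionary per sample size
    count = 0
    while count < n:
        count += 200
        if count > n:
            break
        # draw `count` values without replacement
        subsample = np.random.choice(val, count, replace=False)
        # count how often each distinct value occurs in the subsample
        unique, counts = np.unique(subsample, return_counts=True)
        group_cat = dict(zip(unique, counts))
        pois_group.append(group_cat)
    return pois_group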
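To make the dictionary step concrete, here is a small sketch of what `np.unique(..., return_counts=True)` produces. The toy array is made up for illustration and is not the real data.

```python
import numpy as np

# Hypothetical toy sample, standing in for one subsample of the data
subsample = np.array([7, 7, 7, 5, 3, 1, 1])

# unique holds the sorted distinct values, counts how often each one occurs
unique, counts = np.unique(subsample, return_counts=True)

# Convert NumPy scalars to plain ints so the dictionary prints cleanly
group_cat = {int(value): int(count) for value, count in zip(unique, counts)}
print(group_cat)  # {1: 2, 3: 1, 5: 1, 7: 3}
```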
Additionally, I have a second function that calculates an index for each sample size:
import math

def p(n, N):
    # contribution of one category: (n/N) * ln(n/N), taken as 0 when n == 0
    if n == 0:
        return 0
    return (float(n) / N) * math.log(float(n) / N)

def index(list_object):
    data = subsamples(list_object)
    x, y = [], []  # sample sizes and the corresponding index values
    for i in data:
        N = sum(i.values())
        # calculate index: exp(-sum((n/N) * ln(n/N)))
        sh = -sum(p(n, N) for n in i.values() if n != 0)
        idx = round(math.exp(sh), 2)
        print("Index: %.2f, sample size: %s" % (idx, N))
        y.append(idx)
        x.append(N)
    return x, y

# call the function
x_1, y_1 = index(data2)
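The repeated sampling described above could be sketched roughly as follows. This is only an illustration under assumptions, not a definitive solution: the names `index_from_counts`, `average_index`, `step`, `n_repeats` and `seed` are hypothetical, and it uses `np.random.default_rng` instead of the `np.random.choice` call from the question.

```python
import math
import numpy as np

def index_from_counts(counts):
    """Return exp(-sum(p * ln(p))) for an array of category counts."""
    counts = np.asarray(counts, dtype=float)
    # skip empty categories, then turn counts into proportions
    proportions = counts[counts > 0] / counts.sum()
    return math.exp(-np.sum(proportions * np.log(proportions)))

def average_index(values, step=200, n_repeats=100, seed=None):
    """Map each sample size to the mean index over n_repeats random draws."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    averages = {}
    # sample sizes step, 2*step, ... up to the length of the data
    for size in range(step, len(values) + 1, step):
        results = []
        for _ in range(n_repeats):
            # draw `size` values without replacement, as in subsamples()
            subsample = rng.choice(values, size=size, replace=False)
            _, counts = np.unique(subsample, return_counts=True)
            results.append(index_from_counts(counts))
        averages[size] = float(np.mean(results))
    return averages

# Usage with made-up data (2010 random integers from 1 to 9)
toy = np.random.default_rng(0).integers(1, 10, size=2010)
for size, mean_index in average_index(toy, n_repeats=50, seed=1).items():
    print("Average index: %.2f, sample size: %d" % (mean_index, size))
```

Keeping the repetition inside one function means the script only has to run once; the number of repeats is a free choice here, not something given in the question.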