Differences

This shows you the differences between two versions of the page.

--- admb_synthetic:admb_synthtetic [2010/04/28 13:22] – lennartfr
+++ admb_synthetic:admb_synthtetic [2010/05/04 07:55] (current) – lennartfr
@@ Line 4: / Line 4: @@
 ===== The Model =====
-The ADMB framework is first tested on synthetic data. In the first step, we start with a simple model [1] . We generate synthetic data using parameters in Table 1. Next, we attempt to recover the parameters using the objective function defined in [2]\\
+The ADMB framework is first tested on synthetic data. In the first step, we start with a simple model, which is [[admb_real:admb_real#Simple Model|the same (simple) model]] used for real data:
-^N<sub>t</sub>      ^= ^(N<sub>t-1</sub>-C<sub>t-1</sub>)e<sup>-M</sup>  ^[1] ^
-^f(s,q<sub>t</sub>) ^= ^-(2s<sup>2</sup>)<sup>-1</sup>[ln(I<sub>t</sub>)-ln(q<sub>t</sub>N<sub>t</sub>)]<sup>2</sup>-ln(s) ^[2] ^
+^N<sub>i,t</sub>      ^= ^(N<sub>i,t-1</sub>-C<sub>i,t-1</sub>)e<sup>-M</sup>  ^[1] ^
+^f(N<sub>i,0</sub>,q<sub>k</sub>,s,M) ^= ^ sum<sub>i,t,k</sub> {-(2s<sup>2</sup>)<sup>-1</sup>[ln(I<sub>i,t,k</sub>)-ln(q<sub>k</sub>N<sub>i,t</sub>)]<sup>2</sup>-ln(s)} ^[2] ^
+We generate synthetic data using parameters in Table 1. Next, we attempt to recover the parameters using the objective function defined in [2]
 == Table 1: Data model parameters ==
 ^Parameter       ^ Description        ^ Dimension         ^
-^ N    | Recruitment          | m                   |
+^ N<sub>i,0</sub>    | Recruitment for cohort i  | m-by-1 |
-^ M    | Natural mortality          | Scalar                   |
+^ M    | Natural mortality          | Scalar               |
-^ C    | Catch data          | m-by-maxage                   |
+^ C<sub>i,t</sub>    | Catch of indivduals in cohort i, at age t           | m-by-(maxage+1)           |
-^ I    | Survey indices          | At most m-by-maxage-by-num_surveys                   |
+^ I<sub>i,t,k</sub>    | Survey index for cohort i, at age t, by survey k      | At most m-by-(maxage+1)-by-num_surveys                |
-^ s    | Standard deviation for I          | Scalar                   |
+^ s    | Standard deviation for I         | Scalar                 |
-^ q    | Catchability          | 1-by-num_surveys                   |
+^ q<sub>k</sub>    | Catchability of survey k       | 1-by-num_surveys              |
-^ t    |           |                    |
 ===== Assumptions =====
-===== Implementation and Coding =====
+The number of cohorts m, N<sub>i,0</sub>, the number of surveys, the catchabilities q<sub>k</sub>, s and M are input by the user. The catch is randomly generated (see formula below), and the survey indices are generated to comply with the model's statistical assumption that:
-Data can be generated using the following code:
+ln(I<sub>i,t,k</sub>) %%~%% N(ln(q<sub>k</sub>N<sub>i,t</sub>),s<sup>2</sup>)
-<file txt generate_data.m>
+===== Implementation and Coding =====
-function N = generate_data(N0,q,s,M, filename, cohorts, years, surveys, requested_observations,year_offset,in_cfactor);
-% filegenerator(N0,q,s,M, filename, cohorts, years, surveys, observations, year_offset, cfactor)
-%
-% Generates .dat file for cod problem.
-%
-% VERSION WITH SIMPLE MODEL - no random effects, and just one M!
-%
-% Written by Lennart Frimannslund
-if (exist('year_offset','var')==0 | length(year_offset) == 0),
+Data can be generated using the file generate_data.m, which can be found in the [[mfiles:mfiles|mfile repository.]]
-  year_offset = 1980;
-end;
-% Generate filename if not specified
+===== Example Results =====
-if (exist('filename','var') == 0 | length(filename) == 0),
-  c = clock;
-  filename = sprintf('herring-%s-%02d.%02d.%02d',date,c(4),c(5), ...
-                     floor(c(6)));
-  clear c;
-end;
+:!: ** SUMMARY: ** Below are various experiments conducted in April 2010 to test the performance of the model and the
-% Get number of surveys
+data file generator. The conclusion that can be drawn from them is that unless s is very small, then it is important to have a certain level of catch, otherwise the estimates of the initial population size will have enormous variance.
-if (exist('q','var') ~= 0 & length(q) ~= 0),
-  surveys = length(q);
-end;
-% s - Generate random number between 0 and 1
-if (exist('s','var') == 0 | length(s) == 0),
-  s = rand;
-end;
-% M - Generate random number between 0 and 0.2
-if (exist('M','var') == 0 | length(M) == 0),
-  M = rand(1,1) * 0.2;
-end;
-% If no of cohorts not specified - generate random number between 1 and 6
-if (exist('cohorts','var') == 0 | length(cohorts) == 0),
-  cohorts = ceil((rand*5)) + 1;
-end;
-% years - if not specified, generate random number between 5 and 25
-if (exist('years','var') == 0 | length(years) == 0),
-  years = ceil((rand*20)) + 5;
-end;
-% surveys - if not specified, generate random number between 1 and 6
-if (exist('surveys','var') == 0 | length(surveys) == 0),
-  surveys = ceil((rand*5)) + 1;
-end;
-% if not specified - generate which fraction of maximum possible
-% observations available
-%if (exist('observations','var') == 0 | length(observations) == 0),
-%  observations = ceil(rand * surveys * years * cohorts);
-%end;
-% Define N0 if not supplied
-% exp(N0) between 0 and 3
-%if (exist('N0','var') == 0 | length(N0) == 0),
-%  N0 = 3 * rand(1,1);
-%end;
-% Generate q - between 0.1 and 0.8
-if (exist('q','var') == 0 | length(q) == 0),
-  q = 0.7 * rand(1,surveys) + 0.1;
-end;
-% Input variables now in place - generate data set itself
-%
-% For the most part translated line-by-line from r/s-code.
-n = years;
-m = cohorts;
-%sM = t;
-logs = log(s);
-if (exist('in_cfactor','var')),
-  cfactor = in_cfactor;
-else,
-  cfactor = 0.65;
-end;
-fprintf(1,'Using cfactor = %s. (This number, best left between 0 and 1\n',num2str(cfactor));
-fprintf(1,'determines how much of the stock should be subject to catch. A larger\n');
-fprintf(1,'value means more catch.)\n\n');
-%alpha = 0;
-%Z = (rand(1,cohorts)) - 0.5;
-% TRANSPOSE OF ITS USUAL FORM!
-N = zeros(n,m);
-%N(1,:) = N0;
-% Added "* 100" 01032010
-N(1,:) = N0(:)';
-% SAME HERE, TRANSPOSE!
-CC = zeros(size(N));
-%u = randn(n+m-1,1) * sM;
-%for (i=2:n),
-%
-%  if (i>10),
-%    CC(i-1,:) = rand(1,m) * cfactor/n  .* N(i-1,:);
-%  end;
-%
-%  N(i,:)  = (N(i-1,:) - CC(i-1,:)) * exp(-(M+u(i)));
-%
-%end;
-% NEW loop, replaces the one above
-for (j=1:m),
-  for (i=2:n),
-    if (i>2),
-      CC(i-1,j) = rand(1,1) * cfactor/n * N(i-1,j);
-    end;
-    N(i,j) = (N(i-1,j) - CC(i-1,j)) * exp(-M);
-  end;
-end;
-%fprintf(1,'Population trajectory:\n');
-%N'
-II = cell(surveys,1);
-for (i=1:surveys),
-  % Updated on 26Apr2010 to not include 0 year group
-  II{i} = q(i) * N(2:n,:) .* exp(randn(n-1,m) * exp(logs));
-end;
-surveyraw = [];
-% now create surveyraw matrix, to be put in file
-% Updated on 26Apr2010 to not include 0 year group
-for (l=1:surveys),
-  surveyraw_l = zeros((years-1)*cohorts,4);
-  III = II{l};
-  for (i = 1:cohorts),
-    for (j=1:years-1),
-      surveyraw_l((i-1)*(years-1)+j,:) = [i j l III(j,i)];
-    end;
-  end;
-  surveyraw = [surveyraw; surveyraw_l];
-end;
-% Surveyraw matrix now in place, extract requested  number of rows from
-% it randomly
-max_possible_observations = (years-1)*cohorts*surveys;
-if (requested_observations >= max_possible_observations),
-  observations = max_possible_observations;
-  if (requested_observations > max_possible_observations),
-    fprintf(1,'Requested number of observations %s larger than maximum number of\n',num2str(requested_observations))
-    fprintf(1,'observations possible (%s). Creating %s observations.\n',num2str(max_possible_observations), ...
-            num2str(max_possible_observations));
-    fprintf(1,'(No observations of zero age group in this version.)\n');
-  end;
-  permutation = 1:max_possible_observations;
-else,
-  observations = requested_observations;
-  fprintf(1,'Including %s of %s possible observations, randomly chosen.\n',num2str(requested_observations), ...
-          num2str(max_possible_observations));
-  initial_permutation = randperm(max_possible_observations);
-  permutation = sort(initial_permutation(1:requested_observations));
-end;
-% Write to file
-fid = fopen([filename '.dat'],'w');
-% disp(fopen('all'));
-fprintf(fid,'# Randomly generated herring file\n');
-fprintf(fid,'# True values:\n');
-fprintf(fid,'#\tN0: ');
-fprintf(fid,'%s ',num2str(N0));
-fprintf(fid,'\n');
-fprintf(fid,'#\tq: ');
-fprintf(fid,'%s ',num2str(q));
-fprintf(fid,'\n');
-fprintf(fid,'#\ts: %8.5f \n#\tM: %8.5f\n#\n',s,M);
-fprintf(fid,'# True population trajectory, transpose(N):\n#\n');
-for (i=1:size(N,1)),
-  fprintf(fid,'# ');
-  for (j=1:size(N,2)),
-   fprintf(fid,'%s ',num2str(N(i,j)));
-  end;
-  fprintf(fid,'\n');
-end;
-fprintf(fid,'#\n');
-fprintf(fid,'\n# m (Antall kohorter)\n %3d \n\n',cohorts);
-% Endret 01032010 -- gammel kode hadde et sluttår for mye
-fprintf(fid,'# Startaar for kohorter\n');
-fprintf(fid,'%i ',[0:cohorts-1]+year_offset);
-fprintf(fid,'\n# Sluttaar for kohorter\n');
-fprintf(fid,'%i ',[years-1:years-1+cohorts-1]+year_offset);
-fprintf(fid,'\n# Catches (one cohort per line)\n');
-for (j=1:cohorts),
-  fprintf(fid,'%5.3f ',CC(:,j)');
-  fprintf(fid,'\n');
-end;
-fprintf(fid,'\n# Number of surveys\n %3d \n',surveys);
-fprintf(fid,'# Number of survey indices\n %4d \n',observations);
-fprintf(fid,'# Survey table\n');
-for (i=1:observations),
-  fprintf(fid,'%3d %3d %3d %6.3f\n',surveyraw(permutation(i),:));
-end;
-fclose(fid);
-return;
-</file>
-===== Example Results =====
 ==== Varying s in the data: single experiments ====
@@ Line 268: / Line 44: @@
 </file>
-In other words, the three cohorts should start with initial levels of 200, 400 and 800, respectively, and the catchabilities for the surveys should both be 0.5. We set s to be close to zero, which will make the data close to deterministic in nature. The natural mortality is set to 0.15, and we follow the cohorts until they are age 15, starting in 1980. 3*15*2 specifies the number of survey indices, in other words that we should have all the possible survey indices available. This gives the following dat file:
+In other words, the three cohorts should start with initial levels of 200, 400 and 800, respectively, and the catchabilities for the surveys should both be 0.5. We set s to be close to zero, which will make the data close to deterministic in nature. The natural mortality is set to 0.15, and we follow the cohorts until they are age 15, starting in 1980. 3*15*2 specifies the number of survey indices, in other words that we should have all the possible survey indices available. This gives {{:admb_synthetic:varying_s_rename_to_cod_dat.txt|this .dat file.}}
-<file txt cod.dat>
-# Randomly generated herring file
-# True values:
-#	N0: 200   400   800
-#	q: 0.5        0.5
-#	s:  0.00010
-#	M:  0.15000
-#
-# True population trajectory, transpose(N):
-#
-# 200 400 800
-# 172.14 344.28 688.57
-# 142.73 295.01 591.66
-# 118.02 252.57 498.6
-# 96.874 203.58 424.03
-# 78.606 165.17 359.4
-# 63.656 138.09 289.4
-# 54.084 116.14 249.08
-# 44.391 96.088 209.76
-# 38.007 82.218 175.77
-# 30.675 70.264 143
-# 26.244 57.632 121.86
-# 22.184 47.104 103.69
-# 18.767 38.653 86.048
-# 15.731 33.261 73.114
-#
-# m (Antall kohorter)
-# Startaar for kohorter
-1981 1982
-# Sluttaar for kohorter
-1995 1996
-# Catches (one cohort per line)
-.000 6.315 5.606 5.471 5.547 4.648 0.820 2.509 0.233 2.367 0.184 0.470 0.380 0.490 0.000
-.000 1.531 1.568 16.040 11.677 4.738 3.157 4.497 0.565 0.582 3.305 2.905 2.196 0.009 0.000
-.000 1.152 12.373 5.942 6.473 23.160 0.009 5.380 5.547 9.627 1.420 1.392 3.713 1.101 0.000
-#Number of surveys
-#Number of survey indices
-# Survey table
-  12   2 23.557
-   6   2 69.054
-   9   2 41.106
-   0   2 400.004
-   5   1 39.303
-   7   1 124.524
-   1   2 86.081
-  13   1 43.025
-  12   2 51.844
-  10   1 15.337
-   0   2 200.004
-   2   1 71.366
-  10   2 15.337
-   8   2 104.871
-  10   2 35.126
-  14   2 36.549
-   3   2 249.298
-  11   1 28.815
-   7   1 27.042
-  12   2 11.093
-   5   2 179.693
-  14   2  7.866
-   7   2 124.544
-   9   2 19.005
-   9   1 41.103
-   8   1 104.876
-   3   2 59.011
-   8   2 48.041
-  14   1  7.865
-  11   2 28.816
-   6   1 69.037
-  10   1 35.132
-   8   1 22.198
-   5   1 82.592
-   0   2 100.001
-  12   1 23.549
-  12   1 51.840
-   3   1 126.293
-   5   1 179.659
-   5   2 82.578
-   7   2 27.042
-  11   1 60.935
-   4   1 101.798
-   4   1 48.430
-  14   1 36.555
-   6   2 31.827
-  10   2 71.497
-   8   1 48.051
-   9   1 19.002
-  13   2 19.326
-   1   1 172.127
-  11   2 60.931
-  13   1 19.322
-  13   1  9.384
-   2   1 295.802
-   2   1 147.490
-  14   2 16.631
-   9   2 87.897
-   0   1 199.992
-   2   2 71.371
-   2   2 295.754
-   6   2 144.705
-   1   2 344.264
-   3   1 59.006
-   1   1 86.070
-   6   1 31.832
-   8   2 22.191
-   3   2 126.276
-   0   1 99.993
-  11   2 13.122
-   4   2 212.039
-   7   1 58.060
-   1   1 344.302
-   3   1 249.324
-  14   1 16.633
-  10   1 71.508
-   2   2 147.520
-   6   1 144.691
-  12   1 11.091
-  13   2 43.027
-   1   2 172.127
-   0   1 400.002
-   9   1 87.898
-  11   1 13.121
-   7   2 58.073
-   4   2 48.441
-  13   2  9.383
-   4   2 101.786
-   4   1 212.012
-   5   2 39.302
-</file>
 ...and the following .pin file
@@ Line 505: / Line 145: @@
 with a uniformly distributed random variable. Increasing cfactor will therefore increase the catch.
-FIXME ** NOTE: The above formula should probably be changed to be independent of n.**
+FIXME ** NOTE: The above formula should probably be changed to be independent of n and have less variance. **
 We test for five different values of cfactor (0.3, 0.6, 0.9, 1.2, 1.5). The results are shown in the following plots: