Talk:WikiJournal of Science/Design effect

Plagiarism check
✅ Report from WMF copyvios tool detected only trivially short phrase overlap (e.g. "estimator for the variance of the weighted mean"). T.Shafee(Evo﹠Evo)talk 22:33, 8 February 2023 (UTC)

Editor notes
In the Common types of weights section, the term "reliability weights" probably needs a definition (was mentioned in the WP version). T.Shafee(Evo﹠Evo)talk 22:30, 8 February 2023 (UTC)


 * Thanks for the comment T.Shafee(Evo﹠Evo).
 * I've added a new reference for the definitions of types of weights (this one), and also decided to remove the term "reliability weights" from the article, because I couldn't find a good reference that defines it. For a discussion of this on Cross Validated (the statistics Stack Exchange site), see here. For the changes I've made, see here. Talgalili (discuss • contribs) 18:00, 2 September 2023 (UTC)

The preprint, as well as the current Wikipedia article, launches into technical language quite quickly, which isn’t ideal according to Wikipedia’s “Provide an accessible overview” guideline for what is called the lead section. Please attempt to provide a more accessible lead section. --Aoholcombe (discuss • contribs) 08:20, 4 June 2023 (UTC)


 * Thank you Aoholcombe, I agree with your comment. I've made a new abstract, and updated the introduction accordingly (you can see the changes here) Talgalili (discuss • contribs) 08:03, 31 August 2023 (UTC)
 * @Talgalili Second peer reviewer has submitted a second round of comments. Please view the PDF in this section and respond. Thanks.  OhanaUnited  Talk page  05:44, 5 January 2024 (UTC)

Comment 0


 * Hi, I am the editorial board member who had a readthrough. I am pro acceptance, but there were a few smaller mistakes/unclarities which should be fixed or clarified, see list below:

Response 0

Thanks a LOT for your review. I appreciate all the mistakes you caught, and fixed all of them. Thank you so much! Talgalili (discuss • contribs) 20:09, 26 March 2024 (UTC)

Comment 1
 * “When Deff>1, then the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the data is even more accurate than a simple random sample.”
 * I don’t think “accurate” is correct here, since the collected data in itself can’t be more or less accurate. It’s about the data in relation to the population. So I would suggest writing the following (which is more clunky, but also more precise)
 * “As a result, an analyst cannot estimate a with replacement variance for the numerator even if desired. The standard workaround is to compute a variance estimator as if the PSUs were selected with replacement.“
 * Is this correct, or should one of the “with replacement”s be “without replacement”?

Response 1

I agree that the word "accurate" is wrong here, since indeed it is not the data that is more or less accurate, but rather the inference made with it. I would rather not use your proposed alternative, since one of the requests in the template is that: "the lay summary (if included) capture the key points of the work while being understandable to a reader with only secondary school background?" So I worked to keep the text of the abstract at a relatively basic level.

Instead, I propose to change the text to be:

“When Deff>1, then inference from the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the inference is even more accurate than it would have been if a simple random sample was used.”

What do you think? Talgalili (discuss • contribs) 17:00, 24 March 2024 (UTC)

Comment 2
 * The two paragraphs starting with “When the sampling design is not known upfront” and “When the sampling design isn’t set in advance” seem to be essentially duplicates. I don’t have a clear favourite, but it seems like the top version might be the more recent one, so maybe that one should stay.

Response 2

Great catch, thanks! I've removed the first version and kept the second one.

Comment 3
 * The author gives a formula for Kish’s design effect as
 * Deff = \frac{n \sum_{i=1}^n w_i^2}{(\sum_{i=1}^n w_i)^2}
 * and then another version where both parts of the fraction are divided by n^2
 * These are of course equivalent, but the two proofs that follow would be shorter and kind of nicer if the first version were used, instead of the second one.

Response 3

Fair point. In the definition, I'll keep both versions (as they are used interchangeably in the literature). However, I've now simplified both proofs so that they'll use the faster-to-get-to version of the formula. Thanks.
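
As a quick numerical check of the equivalence discussed in Comment 3, the two forms of Kish's design effect can be compared directly. This is only a minimal sketch: the function names and the example weights are illustrative assumptions, not taken from the preprint.

```python
def kish_deff_sums(weights):
    """Kish's design effect written as n * sum(w_i^2) / (sum(w_i))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def kish_deff_means(weights):
    """Same quantity with numerator and denominator each divided by n^2:
    (mean of squared weights) / (squared mean of weights)."""
    n = len(weights)
    mean_of_squares = sum(w * w for w in weights) / n
    square_of_mean = (sum(weights) / n) ** 2
    return mean_of_squares / square_of_mean


# Illustrative weights (an assumption, not data from the article).
weights = [1.0, 2.0, 3.0, 4.0]
print(kish_deff_sums(weights), kish_deff_means(weights))
```

For these weights both forms give 4 × 30 / 10² = 1.2, and equal weights give a design effect of 1 under either form.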

Comment 4
 * The proof in section “Assumptions and proofs” has little numbers on top of every equal sign; the numbers 6 to 11 are not needed

Response 4

Fair. I've simplified it further.

Comment 5
 * First paragraph of section on Spencer’s Deff says “Each item has a probability of p_k (k from 1 to N) to be drawn in a single draw”. Should that not be “M” (the population size) instead of “N”? If it is indeed N, then it should be defined what N is.

Response 5

Good catch, thanks. I've moved the notation in the whole section to use n and N (instead of mixing it with m and M).

Comment 6
 * Still in the section about Spencer’s Deff, it says that “Only if the variance of y is much larger than the mean then the right-most term is close to 0.” That is true, but the content of the parenthesis that follows is wrong. It’s 1/relvar(y) that will be approximately 0, not relvar(y).

Response 6

Good catch, thanks! Fixed.

Comment 7
 * I did not read for grammar, but noticed a few plural apostrophes.

Response 7

Thanks, fixed a couple of these.

Comment 8
 * In the section on Unequal selection probabilities x Cluster sampling, the brackets in the denominator should go outside the sum.


 * --Mstefan (discuss • contribs) 17:06, 11 March 2024 (UTC)

Response 8

Good catch! Fixed.

Peer review and editing timeline notes
On 8 Dec, in an email the anonymous reviewer explicitly granted a public domain license for their review above. They also said they had skimmed the revision and thought it looked ok.

On 7 Dec, in an email the reviewer Charles DiSogra explicitly granted a public domain license for his review above. He said he hoped to look at the revisions in the next week.

On 6 Dec, I sent some writing comments (below) to the author, who responded by making the appropriate changes.

Minor writing comments

 * Looks like the revision accidentally introduced some redundancy into the preprint, which now says “The term "Design effect" was coined by Leslie Kish in” in both the first and third paragraphs of the Introduction.
 * “it also matters if the design (e.g.: selection probabilities) are correlated with the outcome of interest”. “whether” is preferred over “if” in formal writing for this type of use of “if”. Also, there is a verb agreement problem, where “design” is singular but the verb (“are”) is plural. Also, I don’t fully understand the sentence, because “selection probabilities” hasn’t been introduced at this point in the article, and unfortunately it looks like it’s never explained properly even later, so I’m hoping you can fit a brief explanation of it in earlier.
 * “a researcher might approximate the Deft with calculating the variance” - I think “by” calculating the variance is better.
 * “SRS with replacement (srswr)”. It’s highly unusual to not capitalize acronyms and, while I see that online some people don’t capitalize this acronym, I assume Wikipedia style is to capitalize all acronyms, so I think you should do that - similarly for srswor.
 * “Also, let it be combined with an estimator that rakes to totals for several demographic variables.” Did you really mean “rakes”? Only because I’m not familiar with that use of the word.
 * “some pairs of PSUs implying” - I think you should have a comma before “implying”.
 * “This is, in fact, the default choice in the software packages that will handle survey data—e.g., Stata, R survey package, and the SAS survey procedures”. I suggest you shorten this to “This is the default choice in software packages such as Stata, the R survey package, and the SAS survey procedures.”
 * I think that all of your headings that start with “Design effect” should probably say “The design effect”.
 * “For example, we might decide”. I think you can delete “for example” because the previous sentence already says that, plus the word “might” further implies it.
 * Where you wrote “enough of the bias”, can you reword, because unfortunately there is no indication of the criterion for enough, maybe you can change to “sufficient”?
 * At this point I started making some changes directly to the preprint, because I noticed you were happy when a reviewer did some of that. However, so far I haven’t gotten any further than the “Design effect depends on sampling design and statistical adjustments” section.

After the author responded to the above and we both made some more minor edits to the preprint, a second round of comments was received from Reviewer 2 in the first week of January 2024. On 6 January 2024, I notified the author of them and asked him to respond.

In Feb / March 2024, the author responded to the comments of Reviewer 2, who replied to say he was happy with the author's responses.

In late Mar 2024, an editorial board member, Mstefan, with more mathematical expertise, looked through the preprint and made a number of comments (above) that the author responded to.

Aoholcombe

Author further revision
Today I have finished reviewing the paper again. I fixed a typo and made tens of grammatical improvements. I also added four new summary tables for, hopefully, improved readability.

I'm ready for a final go-over by the relevant editors and (hopefully) an acceptance.

Talgalili (discuss • contribs) 18:02, 2 April 2024 (UTC)

Editor further comments
I made a few more wordsmithing edits after checking some of them with the author.

--Aoholcombe (discuss • contribs) 20:07, 10 April 2024 (UTC)