Scientific Method for Wikimedians/Defining Research Question

This is chapter 6 of the course Scientific Method for Wikimedians. It is a milestone because it is the first chapter of part two, "the research plan".

In part one, we talked about knowledge, scientific facts, and the research method. It was a lot of theoretical information to absorb.

Most importantly, scientific knowledge is created in a specific way called "the scientific method", which includes eight well-defined steps.

In the last chapter of part one, we discussed the first step of the scientific method, the research question. But we only described it from the outside and did not go into detail on how to define a research problem properly.

This was intentional, because this chapter addresses the details of the research question and how to define it.

First, a brief introduction to part two, where this chapter is located. As mentioned previously, this part is called the research plan.

It is dedicated to covering the first two steps of the scientific method. By the end of part two, you will be able to clearly define your research question and design a solid research method to answer it.

Part 2 includes three chapters:
 * Chapter 6, defining the research question, where we continue the work started in chapter 5 and address the first step of the scientific method.
 * Chapter 7, the state of the art, which is dedicated to step two of the scientific method. In this chapter we dive deep into how you can search for reliable sources related to your research problem. You will also learn some tips on how to organize your research well.
 * Chapter 8, designing a research method, where you will learn how to design your own research method based on the problem you have selected and defined.

This article covers chapter 6. Its structure includes six main points:
 * 1) First, we provide a general context in the form of a reminder, focusing on the scientific method and its first step, the research question. We covered both in detail in part one of this course.
 * 2) Second, we describe the research question. In fact, it is not a question in the common sense of the word. Instead, it is an open subject, which is why we used the word "problem" in the previous chapter to describe this concept. Normally, you can describe the problem of your research in one page, but this page needs to be well structured, and this point addresses that structure. From the third point until the end of the chapter, we focus on the parts of the problem description.
 * 3) Third, we cover the title, which is the first part of the page. We provide tips on how to create it and what you should avoid.
 * 4) Fourth, we talk about the motivation of the researcher and the objective of the research itself. These two parts need to be included in the description of the problem.
 * 5) Fifth, we show how to define your research problem from two main perspectives, the context and the syntax. The latter can be done in different ways, and we cover three of them: the problem syntax by question, by description, and by hypothesis.
 * 6) Finally, as usual, we finish the chapter with notes and reflections that provide a general overview of how you can define your research problem.

Reminder: Scientific method & research question
We start with a brief reminder of the scientific method and its first step. The previous chapters explained that the scientific method includes eight steps and has a cyclic nature, which means that it is a continuous process. We also agreed that there is no strict starting point, and that the starting point is most probably somewhere between step 8 and step 2. Step 8 is the retest, and step 2 is the state of the art.

The previous chapter also discussed the research problem, the hardest step of research, as it should be clear, detailed and well defined. The researcher should also be aware of the current achievements in the research domain, which is normally not the case. Thus, in this step, help is needed from people who have more experience. We also agreed that there are several factors that might not be directly related to the research problem, but that you need to be aware of when designing the research question, such as the budget, the available time, and the feasibility of studying the problem in terms of safety or the availability of sources and tools.

From here on, we will pick a general topic and go deeper into it. Our research will focus on this topic, and the goal is to show you how, in practice, you can arrive at a specific definition of the research problem. The topic we pick is the gender gap in English Wikipedia. More specifically, we see that articles on women, especially biographies, are far fewer than the articles or content available online. So here is our problem, and the objective is to create a research question based on it and try to answer it.

Structure of the research question
Think of the research question as a one-page document that you show whenever you are asked about the research itself. Thus your document should be clear, simple, and specific.

Clear means that it includes definitions that describe the problem, even for people who might be outside the domain. For example, if we take the gender gap, you need to explain what you mean by the word "gender" and what you mean by the word "gap". Here you need to be creative, because there is no single right answer, but be aware that the lines you draw here will be the frame around your research later.

Simple means that you should use words that are easy to understand. Think of people completely outside the domain: they need to read and understand your document. So avoid sophisticated words that only experts in the domain can understand, and also avoid ambiguous words that can be understood in different ways.

Specific means that the problem needs to be well defined, so that anyone who reads your paper can tell exactly what you are looking for.

A well-structured paper has the following sections: a title section, a motivations and objectives section, and the problem definition.

Here is a brief description of each section:
 * The title is a line or two that describes the entire problem in a brief, text-based way. Normally, it is located at the top of your document.
 * The motivations of the researcher and the objectives of the research form the second part. This is where the researcher explains why they chose the subject and what exactly they are looking to achieve.
 * The third part is the definition of the problem. This is where the problem is addressed and described in detail. Note that we say the problem, not the method, not the results and not the answers. The problem can be defined in different ways, and we will study some of them, especially the definition by context and the definition by syntax. Regarding the context, you need to create a time frame for your problem as well as a space frame, which means where exactly the problem is taking place. Regarding the syntax, it is how you will describe the problem: whether you will use the style of open questions, use description, and so on.

Title constructing
It is a line or two of text that describes the problem in a simple way. The title is normally located at the top of the research question, and it includes a general description of the topic. A well-defined title needs to be:
 * 1) First, simple and clear. Simple means that it should not contain complex words, while clear means that it should not be ambiguous. For example, a clear and simple title for the gender gap problem we have chosen might be "gender gap knowledge problem in English Wikipedia". Please note that this title includes almost all the major keywords of the topic we have chosen. If you remember from the previous chapter, keywords are essential when you look for scientific works related to your subject, and vice versa. If you want people who are interested in your subject to find your work easily, it is highly recommended that you include as many keywords as possible in the title, because search engines look at the title when they search for specific results. You need to do that without making the title complex or ambiguous.
 * 2) Second, a well-defined title needs to be direct and refer to what the research is all about. Some researchers choose an attractive title that gives the impression that the work addresses a very modern and open question, but when you go into the details, you find that the work is not what you expected. I need to be clear here: research is not marketing, and you are not selling a product. You need to be objective and tell people what the problem is, as it is, without trying to make it look more important than it is. If we go back to the knowledge gap example, this criterion is not respected if you use the title above and then, instead of studying the gender gap, you go somewhere else in English Wikipedia and, for example, study editors. You need to avoid that in every possible way. Simply put, when you choose your title, keep your research as close to the title as possible.
 * 3) Third, a well-defined title needs to be comprehensive, which means that it should cover all the aspects of the research. But you need to be sure that the title stays within a line or two. Never write a title longer than this. The title is not a paragraph, and should not be. Thus, if you find yourself using punctuation such as a comma or a full stop in the title, this is a bad sign, and you need to stop immediately and change direction. On the other hand, using a semicolon in the title is a good practice. A good title is divided into two parts: a major part, where the central idea of the research is directly mentioned, and a minor part that explains the major one. The semicolon sits between the major and the minor part.

For example, in the title "gender gap; knowledge problem in English Wikipedia", the semicolon sits between the major part, which is the gender gap, and the minor part, which is the knowledge problem in English Wikipedia. Clearly, the minor part explains the major one.

It is also recommended to include a word or two in the title that refer to the method you will follow. We have not talked about that yet, but we will in chapter 8 of this course. Thus, for the time being, I will not go into detail about the research method itself, but I will show you how you can include it. For example, the previous title can be changed to include the words "comparative study". Please note that these words are not directly related to the problem; rather, they refer to how the problem is addressed, and this part is completely subjective. Others might address the problem differently, so keep in mind that this part is highly recommended, but not obligatory.

Motivations & objectives
The motivation part is subjective, which means that it can change from one author to another. In this section, you simply need to explain why you chose this topic. Is it because of a problem you are facing in your work, or maybe in your community? You need to show that. It is also nice to refer to or cite your professor's previous work, especially if he or she proposed the subject for your research. Clearly, this is where you provide the general context of why you chose the subject and how you think about it. Write a short paragraph, or one paragraph at most, of three to four lines. You can also highlight your background here, showing what qualifications you have and why you think you are capable of doing this research. Remember that this is not a market and you are not selling yourself. Instead, you are simply describing what you have and what you think you can do.

Regarding the objectives, it is a completely different story. Here you need to be as objective as possible. You need to write a paragraph or two that must include the following elements:
 * 1) First, the objective of the research: simply, why you are doing this research, and what you are looking to prove or achieve by the end of your study. Write using simple words here. Remember that your audience is not scientists or experts in the domain. Instead, you are addressing people who might be interested in what you are doing, so keep it simple and don't use complex terminology.
 * 2) Second, you need to show the applications of your research. This is a little harder: you need to think of the people or communities who might benefit from your research. Remember that you are addressing a problem, and that problem is real and is affecting people somewhere. Thus, in this paragraph, you need to show who these people are and how they will be affected if your solution is implemented. If we talk about the gender gap again, the affected people are the whole community of English Wikipedia: when this problem is solved, you will have a more balanced encyclopedia, and the views presented in it will be more comprehensive. You can also think of women in English Wikipedia: they are underrepresented, and by solving this problem you are creating equality. Clearly, the benefits are not limited to women, but can easily extend to the whole society, and what you need to do here is to show that. Be realistic and clear.
 * 3) Third, you also need to show where your research fits within human knowledge and how it is going to enhance it. This means you need to focus on showing the value your research will bring. If the subject is already under study in other research, or if someone has already addressed it, you need to explain how your research is going to be different. It is important to understand that if you repeat the same study and follow the same research method someone created before, you will arrive at the same results, and this is not research. We have agreed before that research must create new knowledge. In this section, feel free also to explain what you expect from your research. But you need to be realistic, and remember that scientific knowledge grows step by step, very slowly. One study will not change the world entirely, but it will for sure push our understanding of it a little bit.
 * 4) You can also add a timeline to your research question, explaining the progress you expect and showing that your research can be done in the available time. Normally, research is not open-ended in time, and you have a time limit. You need to respect this limit, which is normally decided even before you are given a subject for your research. It is nice to show, when you describe your research question, that you understand this and that you are going to respect it.

Problem definition: Context, Syntax (Questions, description & hypotheses)
At the beginning of this section, it is important to define the problem that the research is addressing. This section continues where the title left off, showing what the problem is. Because the title is limited in length, it is not enough to define the problem, and it might be understood in different ways due to the lack of information. So the objective of the problem definition section is to restrict the problem to only what the research will do, making it clear and well defined.

Please keep in mind that, no matter which of the ways discussed below you choose, you need to read a lot about the problem. The more you read, the more your understanding of the problem improves, and the better you will be able to describe it. Of course, you need to use the card system we showed in the previous chapter; without being very well organized, you will not achieve much. In this step, I will not talk about how to read other people's work, because this is what we will discuss in detail in the next chapter, "the state of the art". If you are interested only in this subject, please move on to the next chapter.

The problem can be well defined in two different ways:
 * 1) First, by context. This means making clear what to consider when addressing the problem and what not to consider. You can do that by describing the background of the problem and creating limits in terms of time and space, for example.
 * 2) The problem can also be defined by syntax. This means that you describe the problem itself, and not its surroundings. There are three different ways to do that:
 * First, using description. This is the simplest way: you describe what the problem is directly, using short sentences and clear words.
 * Second, using questions. You ask questions, and the research will try to answer them all. This is a little harder than the first way, for two reasons: A. the questions need to be clear and well defined; if not, they can easily lead you far away from the problem you are addressing. B. the questions need to be classified in groups, so that they are linked together logically and form units. Please note that the questions you create are not going to be random questions.
 * Third, using hypotheses. This is the hardest way to define the problem. Remember the scientific method we talked about in the last two chapters: if you start from step 8, "the retest", you are starting from other people's work, and it is highly probable that you already have some hypotheses to test.

In the next lines, we will discuss the two ways of defining the problem, and I will give you a comprehensive example that covers them all.

Defining the problem by context
We start with defining the problem by context. I will show you different aspects you can use to frame the problem and separate it from its surroundings. Please note that you need to distinguish between what is included in the problem and what is not. Also, you don't have to use all the aspects together, but the clearer you make your problem, the better you can address it later.

The first aspect you can use is time. You can pick a specific period of time and limit your problem to it. This is useful when the time range is large and includes major changes, in technology for example, so you need to narrow the time range to make the obtained data comparable. Let me give you two examples to make the idea clear:
 * If you are studying how people travel and how that affects their lives, you cannot simply address the entire human history in the same way. For example, in the Middle Ages, it was normal for a person to be born, live and die in the same city or village, as travel was dangerous and hard. You cannot compare that to the last decade, when you can go to the other side of the planet in less than a day.
 * Another example is related to Wikipedia itself. The information sources available prior to 2006, when social media were not yet widespread, are completely different from what we have on the internet after that date. Clearly, if you are studying data on Wikipedia, you need to consider separating the two periods.

Returning to our main example:
 * 1) We have the knowledge gap. You can limit the problem to a specific period, for example the COVID period or the second decade of the century, meaning the 10 years between 2010 and 2019. I will choose to limit the problem to the social media era, and the title can be, for example, "knowledge problem in English Wikipedia in the social media era". Of course, I need to explain why I chose this era. I could equally choose any of the other options mentioned above, but if I do, I need to justify my choice in the same way. Avoid creating strange time frames, or frames whose structure you cannot explain, for example picking the years 2011, 2015 and 2016. This is a bad sign, and it can look as if you are trying to manipulate the results, so avoid it in every possible way.
 * 2) The second aspect is the space, or the place where the problem is happening. For example, you can limit the problem to a specific geographical region or to a specific group of people. When we talked about the gender gap, the problem space was already limited in two different ways. When we say we are studying the gender gap in English Wikipedia, it is limited to people who speak English on Wikimedia projects, and specifically to one of the Wikimedia projects, English Wikipedia. As you can see, the scope is limited, and people or content from other projects, such as Wikidata or Wikimedia Commons, are not included.
 * 3) The third aspect is the subject you are studying itself. The subject can include subclasses or many groups that are related by nature. For example, if you are studying discrimination, this is a very broad subject. It can be based on age, for example against the elderly, on gender, for example against women, or on race, such as racism. Each of these examples can then be further limited in terms of time or space, to make the problem as narrow as possible.
 * 4) The fourth aspect is not used a lot, but it is also a possibility, and a very important tool. If you want to make a comparative study, you can define the problem according to specific sources. For example, you can study colonialism from the native perspective. When you choose that, you are creating a frame and limiting yourself by not looking at other sources. Of course, you might have concerns regarding the neutral point of view and subjectivity. What you need to do is to explain why you have limited your sources like this, and that you are aware of the existence of other sources, but that they are simply out of your scope. You also need to be very careful when showing what the sources say, and not mix it with your own opinions or conclusions. We will talk about that in detail later in this course. For example, if we talk about the gender gap in English Wikipedia again, you can limit your sources to English sources only, to non-English sources, or to any specific language. You just need to be sure of two things: that you are able to justify your choice, and that there is enough data for your choice. For example, if you limit your research to sources in a specific language that is not well represented, or not represented at all, in English Wikipedia, you are handicapping yourself.
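
As a rough illustration only, the four aspects above (time, space, subject and sources) could be written down as a small record, together with a check for the "strange time frame" warning in the first point. The class name `ProblemContext`, its fields, and the helper `is_contiguous` are hypothetical names introduced for this sketch; they are not part of the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemContext:
    """Hypothetical record of the four context aspects of a research problem."""
    start_year: int   # time frame, first year included
    end_year: int     # time frame, last year included
    space: str        # where the problem takes place
    subject: str      # the narrowed subject
    sources: str      # which sources are in scope

    def covers(self, year: int) -> bool:
        """Return True if the given year falls inside the time frame."""
        return self.start_year <= year <= self.end_year


def is_contiguous(years) -> bool:
    """Return True if the years form one unbroken range with no gaps."""
    ordered = sorted(set(years))
    return all(later - earlier == 1 for earlier, later in zip(ordered, ordered[1:]))


# Example values taken from the chapter's running example.
context = ProblemContext(
    start_year=2010,
    end_year=2019,
    space="English Wikipedia",
    subject="gender gap in biographies",
    sources="English-language sources",
)
```

With this sketch, `is_contiguous([2011, 2015, 2016])` is false, which matches the time frame the chapter warns against, while the decade from 2010 to 2019 passes the check. A gap in the years is not forbidden in itself; it only signals a choice that needs an explicit justification.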

Define the problem by syntax
Moving on to defining the problem by syntax, we have said that there are three ways to do it: by questions, by description, and by hypothesis. No matter which way you choose, you need to satisfy three conditions:
 * 1) First, you need to have a major or essential idea that all the text tries to explain. It can be a principal sentence if you use description, a major question if you use questions, or a central hypothesis that you will try to prove.
 * 2) Second, the major or essential idea needs to be based on other people's work, and you need to show that clearly. You cannot start from zero, and you should not try to. It is very important to include one or two references in your problem definition. This shows that you are aware of the general context, and it is a good practice that is always recommended.
 * 3) Third, the major or essential idea needs to continue what has been done in the domain before. This does not mean that you cannot go against something or prove that it is wrong; you can always do that. Instead, you need to understand that each domain has a current direction in which the work is being done. This direction is not fixed, and it changes slowly as the years pass. The idea here is that you cannot change the direction of a research domain in one single work. You can try to make a small change, and that is absolutely fine, because this is how science works. So simply avoid going against scientific consensus, unless you have very solid results approved by experts in the domain. It is not impossible, but it rarely happens.

Next, I will define the problem using description and questions, with the gender gap knowledge problem in English Wikipedia as an example. Please note that I will not use the definition by hypothesis now, because it requires a deep study of previous work and cannot be presented in the limited time of this course. However, I will show you later how to create a solid hypothesis. In fact, defining by description limits you later to descriptive research methods, while defining by hypothesis keeps open the possibility of developing inductive research methods. So, if you are ready, let me start by defining the problem by description. Please note that what I give next is just for the sake of example. It might not be accurate, or it might need to be narrowed to define the problem better, which is what I hope you will do in the practical part of this chapter.

Comparison

 * A definition by description looks like the following. In this study, the researcher will try to discover:
 * First, the relationship between gender and the number of biographies in English Wikipedia, if it exists.
 * Second, the types of gender gap and their relation to the context in English Wikipedia, for example the list of biographies and the gender gap by subject.
 * Third, the differences and similarities between the different types of gender gap.
 * Fourth, the properties of the gender gap, especially the gender gap related to women's biographies.

 * A definition by questions looks like the following. The researcher will try to answer:
 * First, is there a relationship between gender and the number of biographies in English Wikipedia?
 * Second, what are the types of gender gap, and is there a relationship between the type of gender gap and the type of context in English Wikipedia?
 * Third, what are the differences between the different types of gender gap, and what are the similarities?
 * Fourth, does the problem have specific properties, or does it show a clear pattern every time it is detected, especially in the case of women compared to men?

I hope the idea of defining the problem by syntax, using description and questions, is clear. Keep in mind that the examples above lack coherence and are proposed only for the sake of example.

Define the problem by hypothesis
We have said before that defining the problem this way enables you later to develop inductive research methods, while the first two ways limit you to descriptive research methods only. We have also agreed that this is the hardest way to define the problem, because it requires solid knowledge of the research domain. When you define by hypothesis, you need to search for a relationship between two or more variables, and then try to understand how they affect each other. When you understand that, and you think that the relationship is true, you need to write it down as a claim to be tested later. Keep in mind that there are several types of relationships between variables. I will mention the ones I think are the most important and that you will most probably encounter, but keep in mind again that this list is not exhaustive:
 * 1) The first type of relationship is the causal one: when two variables A and B have a causal relationship, A is causing B to exist or causing it not to exist.
 * 2) The second type of relationship is the direct proportion: when A increases, B increases accordingly, and when A decreases, B also decreases.
 * 3) The third type of relationship is also a proportion, called the inverse proportion. It happens when two variables A and B affect each other in opposite directions: when A increases, B decreases. Please note that the relationships I described are not unidirectional; it is also possible that B has an effect on A.
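
To make the difference between the direct and the inverse proportion more concrete, here is a minimal sketch, assuming the two variables are available as two lists of measurements of equal length. It uses the sign of the Pearson correlation as a rough indicator of direction. The function names and the threshold of 0.8 are assumptions chosen for illustration. Note that this kind of check says nothing about the causal type: two variables can move together without one causing the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def describe_relationship(xs, ys, threshold=0.8):
    """Label the direction in which two variables move together.

    The threshold is an arbitrary illustrative cut-off, not a standard value.
    """
    r = pearson(xs, ys)
    if r >= threshold:
        return "direct"    # A increases, B increases
    if r <= -threshold:
        return "inverse"   # A increases, B decreases
    return "unclear"
```

For instance, `describe_relationship([1, 2, 3, 4], [2, 4, 6, 8])` falls in the "direct" case, and reversing the second list gives the "inverse" case.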

Before giving an example of a definition by hypothesis, let me talk about some properties of the hypothesis:
 * 1) First, the hypothesis needs to be a stand-alone claim that is based on simple or complex scientific facts. It also needs to be in the present tense, which is very important when you define by hypothesis. When you define by description, the sentences might be in the past or future tense as well as the present. But when you define by hypothesis, you are limited to the present tense, because the hypothesis is a present claim that needs to be tested in the next step of the research.
 * 2) The hypothesis needs to propose clear options: whether something exists or not, or whether a relationship between two variables exists or not. Keep in mind that you are not limited to two options only, but the number of options needs to be limited, and the options need to cover all the possible results that might be obtained.
 * 3) A hypothesis needs to have a physical implementation that can be tested. We have said before, in chapter 4, that the hypothesis is close to physical reality, while theories tend to be abstract. So if you form a hypothesis that links two or more variables, be sure that these variables can be represented and measured in the real world, so that your hypothesis can be tested.
 * 4) A good hypothesis can predict future results based on current and past sets of data. We have talked about that before, so if you are interested, please refer to chapter 4 of this course.
 * 5) Finally, the hypothesis also needs to be simple and easy to understand. Remember that you are addressing ordinary people, not experts in the domain, so keep your terminology as simple as possible. After you create your hypothesis in text form, add a small paragraph explaining it, and provide some information on the variables included in it.

Keep in mind that, theoretically, you are not limited to a single hypothesis. In fact, your research can include more than one hypothesis, but it is highly recommended that you have one hypothesis only, so that you can focus your effort on it. You can also suggest a central hypothesis on which all your research is based, and then several minor hypotheses that support the central one.

In the next lines, I will create two different hypotheses to define the problem:
 * 1) The first hypothesis uses three variables with a direct proportion.
 * 2) The second one is based on two variables with a causal relationship.

Again, keep in mind that the hypotheses I am proposing here are just for the sake of example. They might not be solid, and they can easily be criticized or improved. When you define a problem by hypothesis, you need to define the variables and the relationship between them.
 * First hypothesis with three variables

For the first hypothesis, I will use the total number of articles in English Wikipedia as my first variable, and the numbers of biographies of men and of women as my second and third variables, respectively. Then I will study the variables as a function of time: I will check the number of articles every month between January 2012 and January 2022. My hypothesis is that the ratio between the number of biographies of women and the number of biographies of men is fixed. In other words, if I divide the number of biographies of women by the number of biographies of men, I will always get the same number, no matter how much English Wikipedia grows. Please note that this is a hypothesis, a claim to be tested later. Even if I find that it is not true, I still need to understand why, and to look for possible patterns or relationships. The hypothesis can be written as follows: in English Wikipedia, between January 2012 and January 2022, the ratio of the number of biographies of women to the number of biographies of men is constant, regardless of the total number of articles.

Of course, after that I need to write a text that explains in detail the variables I used and how I will measure them, as well as why I chose this specific claim and which previous works or studies it is based on.
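
As a sketch of how the first hypothesis could later be checked, the snippet below computes the women-to-men ratio for each monthly measurement and tests whether the ratios stay within a small band around their mean. The function names, the 5% tolerance, and the sample numbers are all assumptions made up for illustration. They are not real Wikipedia data, and a real study would need a proper statistical test.

```python
def ratios(women_counts, men_counts):
    """Ratio of women's to men's biographies for each measurement."""
    if len(women_counts) != len(men_counts):
        raise ValueError("both series must have one value per month")
    return [w / m for w, m in zip(women_counts, men_counts)]


def ratio_is_stable(women_counts, men_counts, tolerance=0.05):
    """Return True if the ratio varies by at most `tolerance` of its mean.

    The tolerance is an illustrative choice, not a standard threshold.
    """
    rs = ratios(women_counts, men_counts)
    mean_ratio = sum(rs) / len(rs)
    spread = max(rs) - min(rs)
    return spread <= tolerance * mean_ratio


# Invented monthly counts, for illustration only.
women = [100, 200, 300]
men = [500, 1000, 1500]
hypothesis_holds = ratio_is_stable(women, men)
```

In this invented sample the ratio is 0.2 in every month, so the hypothesis would hold. If the check fails on real data, the hypothesis is rejected, and as the text says, the next step is to understand why.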


 * Second hypothesis with two variables

The second hypothesis is based on a causal relationship; here I think that the gap exists on an age basis. So for certain ages, there might be no gap at all, or the gap might shrink dramatically compared to other ages. To test that, I will use two variables only: the number of biographies of women and the number of biographies of men, where I will divide the biographies into five classes as follows:
 * The first class is for people who are less than 18 years old.
 * The second class is for people older than 18 but younger than 40.
 * The third class is for people between 40 and 60.
 * The fourth class is for people who are more than 60 years old.
 * The fifth class is for deceased people.
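The five classes above can be sketched as a small Python function, followed by a tally of biographies per class and gender. This is only an illustration under stated assumptions: the function names and class labels are hypothetical, and because the list does not say where the exact ages 18, 40, and 60 belong, the sketch puts 18 in the second class and both 40 and 60 in the third.

```python
from collections import Counter

def age_class(age, is_deceased=False):
    """Map a person to one of the five classes used in the second hypothesis.

    Boundary handling (18, 40, 60) is an assumption of this sketch.
    """
    if is_deceased:
        return "deceased"
    if age < 18:
        return "under 18"
    if age < 40:
        return "18 to 39"
    if age <= 60:
        return "40 to 60"
    return "over 60"

def tally_by_class(people):
    """Count biographies per (age class, gender).

    `people` is an iterable of (gender, age, is_deceased) tuples.
    """
    tally = Counter()
    for gender, age, is_deceased in people:
        tally[(age_class(age, is_deceased), gender)] += 1
    return tally

# Made-up entries, for illustration only
people = [
    ("woman", 25, False),
    ("man", 30, False),
    ("man", 35, False),
    ("woman", 70, True),
]

tally = tally_by_class(people)
```

Comparing, for each class, the count for women with the count for men gives the gap as a function of age.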

Then, I will count how many men's biographies and women's biographies exist in English Wikipedia, and try to study the gap as a function of age. As you can see, the relationship here is causal: either the age of people is responsible for the gap, or it is not. Again, this is a claim to be tested; it might be right or wrong. If it is wrong, this is not the end of the world. I need to understand why my proposal was wrong, and I need to propose other possible solutions to other people, so they can try to solve the problem later in their research. The second hypothesis in text form can be read as follows:

Notes on defining a research question
Let us put everything we have learned together to form a general picture of how to define the research question:
 * 1) First, we agreed that the research question is not a single question; it is not a question in the common sense of the word. Instead, it is a short description of what you will do in your research, why, and how you will do it.
 * 2) We also found that the research question includes three main parts: the title, the motivations and objectives, and the problem definition. The title is a short sentence, maybe two, but not a paragraph; it describes the research using as many keywords as possible. The motivations part is where you show why you chose this problem specifically; it can be a paragraph, or two to three sentences. In this part, you need to be subjective and to provide personal aspects. On the contrary, the objectives part is where you need to be objective; it is also about a paragraph in size, and it is where you show what you expect from your research in terms of results and achievements. Finally, the definition of the problem is two to three paragraphs long, but it is fine if you feel that you need more to make the problem clear and well-defined. The most important thing is that you do not exceed the two-page limit.
 * 3) We have also seen that there are two major ways to define the research problem: either by context, or by syntax.
 * If you want to define the problem by context, you need to make it different from its surroundings. Thus, here you are not addressing the problem itself; rather, you are separating it from its background and context. There are different ways to do that, such as using time, space, and sources, or using a mix of them. Remember that you are not forced to use all of these aspects together; the most important thing is to make the problem well-defined and as narrow as possible. You also need to always be able to justify your choices.
 * The second way to define the problem is to do it by syntax, and there are three ways to do that: by description, by questions, and by hypothesis. If you use the first two methods, you will be limited later to descriptive research methods only, while the third technique will enable you to develop an inductive research method. We have also talked about the three ways, and gave examples to show how you can use each of them. We have seen that defining the problem by hypothesis is the hardest, because you need to understand the problem very well, to have solid knowledge of the domain, and to master the research methodology. The definition by hypothesis creates a claim of a relationship between two or more variables, and this relationship can be causal or proportional. Causal means that A leads to B, or vice versa, while proportional means that A increases when B increases, or that A decreases when B increases, and vice versa.

This is the end of chapter six, defining the research question. Do not forget to check the practical part and to answer the single long question there. See you all soon.