User talk:Egm6936.f09/Kolmogorov scales

= Old discussion page =

= Issues to resolve (reverse chronological order) =

Lecture notes on probability
[[media:Probability.djvu|Probability concepts and notation (djvu)]]

Uncertainty quantification (wiki)

Probability distributions
While working on the probability density function (PDF) and the mean value of a random variable, we were interested in different probability distributions. Here, we study some of the common probability distributions and focus on the formulas, the history, the applications, and the people who discovered them. Some key points and references are:

AI Access: Web site that includes a glossary, definitions, and animations

NIST Dataplot: a free, public-domain, multi-platform software system for plotting, statistical analysis, and non-linear modeling.

Sean Meyn

Jeff Miller: Web page that includes Earliest Known Uses of Some of the Words of Mathematics, among other interesting pages

Richard Tweedie, The Probability Web (Berkeley), Probability distributions

Exponential: When an event recurs after random amounts of time, such as cars crossing an intersection, the lifetime of electronic components, the number of customers at the grocery check-out, or the amount of money customers spend in a grocery store, an exponential distribution is used. The exponential distribution is widely used in the field of reliability, i.e., the lifetime of a product, such as a car battery. The exponential distribution does not exhibit a memory of the event: for example, if the probability that a new battery lasts at least t years is P, then the probability that a battery that has already been in service lasts at least t more years is again P. Some real data related to the faults found in a software system can be found in the book by Cook and Lawless (2007, Appendix D, Table D3, p.368).
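The memoryless property can be illustrated numerically. The following sketch (the rate and the times s and t are arbitrary illustrative choices, not from the text) draws exponential lifetimes and checks that P(T > s + t | T > s) ≈ P(T > t):

```python
import random

# Monte Carlo check of the memoryless property of the exponential
# distribution: P(T > s + t | T > s) should equal P(T > t).
# The rate and the times s, t are arbitrary illustrative choices.

random.seed(0)
rate = 0.5            # events per unit time; mean lifetime = 1/rate
n = 200_000
lifetimes = [random.expovariate(rate) for _ in range(n)]

s, t = 1.0, 2.0
p_t = sum(x > t for x in lifetimes) / n                  # P(T > t)
survivors = [x for x in lifetimes if x > s]              # condition on T > s
p_cond = sum(x > s + t for x in survivors) / len(survivors)

print(p_t, p_cond)    # both approximate exp(-rate * t) ≈ 0.368
```

Both estimates agree to within sampling error, which is the "second battery" statement above in numerical form.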

Cauchy

Gamma

Log-normal

Weibull distribution
Moved to the issues resolved section.

Also as a new article at Weibull distribution.

Cauchy distribution
It is a special case of Student's t-distribution. Prof. Loc Vu-Quoc from the University of Florida explains the Cauchy distribution in detail in his Numerical Methods course (see Lecture 23, p.2, and Lecture 24). The single-slit diffraction application of the Cauchy distribution is analyzed in detail in Lecture 24.

Definition
The PDF of the Cauchy distribution located at $$\displaystyle x_0 $$ with half-width at half-maximum $$\displaystyle \gamma $$ is given as (Pope, 2000, p.53, Eq. 3.79; Prof. Loc Vu-Quoc, Numerical Methods, Spring 2011, Lecture 23, p.2)


$$  \displaystyle f(x; x_0,\gamma) = \frac{1}{\pi} \frac{\gamma} { \left(x - x_0 \right)^2 + \gamma^2} $$     (1)

A simulation of the Cauchy distribution can be performed using AI Access. An application of the simulation to the single-slit diffraction is shown in Lecture 24.
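As a quick numerical check of Eq.(1), the following sketch (the values of $$ \displaystyle x_0 $$ and $$ \displaystyle \gamma $$ are arbitrary) evaluates the PDF and verifies that $$ \displaystyle \gamma $$ is indeed the half-width at half-maximum:

```python
import math

# Quick check of the Cauchy PDF in Eq.(1): the peak value is 1/(pi*gamma),
# and gamma is the half-width at half-maximum, i.e. f(x0 +/- gamma) is
# half the peak. The values of x0 and gamma are arbitrary.

def cauchy_pdf(x, x0, gamma):
    return (1.0 / math.pi) * gamma / ((x - x0) ** 2 + gamma ** 2)

x0, gamma = 2.0, 0.5
peak = cauchy_pdf(x0, x0, gamma)          # = 1/(pi*gamma)
half = cauchy_pdf(x0 + gamma, x0, gamma)  # = peak/2

print(peak, half)
```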

The blindfolded archer’s score
Mandelbrot and Hudson (2004, p.37) explain the Cauchy distribution with a story called the blindfolded archer's score. A blindfolded archer stands in front of a target on an infinitely long wall and shoots in random directions. The points where the arrows hit the wall follow a Cauchy distribution, so the archer's average score never converges: a single wild shot can outweigh all the others.

On Cauchy distribution in finance
Mandelbrot wrote a book on fractals in finance that contains a long discussion of the Cauchy distribution, which has extensive applications in finance. The Cauchy distribution is an example of an "L-stable" distribution, introduced by Paul Lévy, a mathematics professor at the École Polytechnique, France.

 NOTE:  In a quotation, to have the source listed on a separate line and right-justified, see Template:Quotation on mediawiki.

discussion 2011.07.26: turbulence in pipe flow
Kerem Uguz attended the Bifurcations and Instabilities in Fluid Dynamics (BIFD2011) Fourth International Symposium at Barcelona, Spain (July 18-21, 2011). Two of the invited talks were about turbulence in pipe flow. It was interesting to see that there is still scientific curiosity about pressure-driven pipe flow 128 years after the original papers by Reynolds (1883a, 1883b). In fact, there is a very recent paper in Science (2011) by Björn Hof, who was one of the invited speakers.

Why are people still interested in pressure-driven pipe flow?
First, even after Reynolds' papers on pipe flow, there are still uncertainties, both theoretical and experimental, in determining the critical Reynolds number at which the flow becomes turbulent (see Darbyshire and Mullin (1995)), and in establishing efficient theoretical and experimental ways to understand turbulence. The question to answer is: "At what Reynolds number is the flow persistently turbulent (sustained turbulence), and below what value is it ultimately laminar?" It is important to resolve this problem. It was shown that the transition to turbulence depends on the type and magnitude of the perturbations, but a detailed understanding is still missing. Second, understanding turbulence might open doors to new theoretical approaches. Third, understanding turbulence gives the chance to control it. Consequently, a turbulent flow can be relaminarized, which in turn reduces the control costs.

The contents of the talks
The title of Björn Hof’s talk was “The Onset of Turbulence in Pipe Flow”. The key points of Hof were


 * Pipe flow is linearly stable
 * Flow is intermittent
 * Puffs occur and decay in the flow

His approach is to separate the relevant processes: the decay of turbulent puffs and the spreading of turbulence. When the Reynolds number reaches a critical value, the decay rate of the puffs is of similar magnitude to their branching (splitting) rate. So at that critical Reynolds number, even though the puffs that occur in the pipe decay with time, new puffs appear, and sustained turbulence is obtained. A similar argument is also present in the paper by Darbyshire and Mullin (1995; see the quotation below). Isolated regions of turbulence (represented by puffs) are transient. Hof's group experimentally determined the critical Reynolds number as $$ \displaystyle 2040 $$ and showed that it is independent of the disturbance introduced into the system. The details are given in their Science paper (2011).

Bruno Eckhardt was the second invited speaker. He is the author of the introduction to the Theme Issue ‘Turbulence transition in pipe flow: 125th anniversary of the publication of Reynolds' published in the Philosophical Transactions of the Royal Society A. 2009 was the 125th anniversary of the publication of Reynolds' papers.

The title of Bruno Eckhardt’s talk was “Turbulence transition in shear flows: what can we learn from pipe flow?”. Eckhardt groups different critical Reynolds numbers:
 * Reynolds number near zero: flow is laminar
 * $$ \displaystyle Re_E = 81 $$: Energy stability of pipe flow. Joseph and Carmi (1969) determined that up to a Reynolds number of $$ \displaystyle 81 $$ the energy content of perturbations decays monotonically.
 * $$ \displaystyle Re_{ECS} = 773 $$: Turbulence needs three dimensional structures as a scaffold that supports the spatio-temporal turbulent dynamics. These are called Exact Coherent Structures (ECS), which require prominent downstream vortices.
 * $$ \displaystyle Re_{turb} \approx 1600 $$: Darbyshire and Mullin (1995) experimentally determined a Reynolds number around $$ \displaystyle 1600 $$. They found that this number depends on how the disturbance is introduced. One needs to be very careful near the transition region: it is very difficult to reproduce experiments there, even with the same initial conditions.


 * $$ \displaystyle Re_{space}=2040$$: Turbulence is transient. Spatio-temporal coupling matters.

Questions

 * What does “energy content of perturbations decays monotonically” mean?
 * How to trigger turbulence? This is a paper to which both Bruno Eckhardt and Hof contributed.
 * Localized coherent structures (edge state)

discussion 2011.06.28
[[media:vql.random.process.djvu|Random processes]]





See also Category:Egm6936.f10.

discussion on 2011.03.01
The aim is to solve Exercise 3.16 in (Pope, 2000, p.59), which asks for the proof of Eq.(7).


$$  \displaystyle \left(     \frac{u_1}   {\langle u_1^2   \rangle^{1/2}} - \frac{u_2}   {\langle u_2^2   \rangle^{1/2}} \right)^2 = \frac{u_1^2} {\langle u_1^2 \rangle} - \frac{2u_1 u_2} { \langle u_1^2 \rangle^{1/2} \langle u_2^2 \rangle^{1/2} } + \frac{u_2^2} {\langle u_2^2 \rangle} $$.     (1)

As the left-hand side of Eq.(1) is greater than or equal to zero, so is the right-hand side. Therefore,


$$  \displaystyle \frac{2u_1 u_2} { \underbrace{ \langle u_1^2 \rangle^{1/2} }_{{u_1}^\prime } \underbrace{\langle u_2^2 \rangle^{1/2}}_{{u_2}^\prime } } \le \frac{u_1^2} {({u_1}^\prime)^2} + \frac{u_2^2} {({u_2}^\prime)^2 } $$.     (2)

Taking the mean of Eq.(2) yields


$$  \displaystyle \underbrace{ \frac{ \langle u_1 u_2 \rangle }  { {u_1}^\prime {u_2}^\prime } } _ {\rm{correlation \; coefficient}} \le \frac{1}{2} \left [ \frac{ ({u_1}^\prime)^2 } { ({u_1}^\prime)^2 } + \frac{ ({u_2}^\prime)^2 } { ({u_2}^\prime)^2 } \right ] = 1 $$.     (3)

Now, take the plus sign in the squared term, i.e.,


$$  \displaystyle \left(     \frac{u_1}   {\langle u_1^2   \rangle^{1/2}} + \frac{u_2}   {\langle u_2^2   \rangle^{1/2}} \right)^2 = \frac{u_1^2} {\langle u_1^2 \rangle} + \frac{2u_1 u_2} { \langle u_1^2 \rangle^{1/2} \langle u_2^2 \rangle^{1/2} } + \frac{u_2^2} {\langle u_2^2 \rangle} $$.     (4)

A treatment similar to that leading to Eq.(2) yields


$$  \displaystyle \frac{2u_1 u_2} { {u_1}^\prime {u_2}^\prime } \ge - \left [ \frac{u_1^2} {({u_1}^\prime)^2} + \frac{u_2^2} {({u_2}^\prime)^2 } \right ] $$.     (5)

Taking the mean of Eq.(5) yields


$$  \displaystyle \underbrace{ \frac{ \langle u_1 u_2 \rangle }  { {u_1}^\prime {u_2}^\prime } } _ {\rm{correlation \; coefficient}} \ge -\frac{1}{2} \left [ \frac{ ({u_1}^\prime)^2 } { ({u_1}^\prime)^2 } + \frac{ ({u_2}^\prime)^2 } { ({u_2}^\prime)^2 } \right ] = -1 $$.     (6)

Combining Eq.(3) and Eq.(6) gives

<span id="(7)">
$$  \displaystyle -1 \le \underbrace{ \frac{ \langle u_1 u_2 \rangle }  { {u_1}^\prime {u_2}^\prime } } _ {\rm{correlation \; coefficient}} \le 1 $$ (7)

Eq.(7) completes the derivation of Eq.(3.94) in (Pope, 2000, p.57) and the solution to Exercise 3.16 on page 59.
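The bound of Eq.(7) can also be illustrated numerically. The sketch below (the construction of the two signals, $$ \displaystyle u_2 = 0.6\,u_1 + 0.8\,v $$, is an arbitrary example with exact correlation coefficient 0.6) estimates $$ \displaystyle \langle u_1 u_2 \rangle / ({u_1}^\prime {u_2}^\prime) $$ from samples and confirms it lies within $$ \displaystyle [-1, 1] $$:

```python
import math
import random

# Numerical illustration of Eq.(7): the correlation coefficient
# <u1 u2> / (u1' u2') lies in [-1, 1] for any pair of zero-mean signals.
# The construction below (u2 = 0.6*u1 + 0.8*v) is an arbitrary example
# whose exact correlation coefficient is 0.6.

random.seed(1)
n = 100_000
u1 = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [random.gauss(0.0, 1.0) for _ in range(n)]
u2 = [0.6 * a + 0.8 * b for a, b in zip(u1, v)]

mean_u1u2 = sum(a * b for a, b in zip(u1, u2)) / n
u1_rms = math.sqrt(sum(a * a for a in u1) / n)   # u1'
u2_rms = math.sqrt(sum(b * b for b in u2) / n)   # u2'
rho = mean_u1u2 / (u1_rms * u2_rms)

print(rho)   # close to 0.6 by construction, and always within [-1, 1]
```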

For the proof of Eq.(7), Pope suggests using the Cauchy-Schwarz inequality, which is

<span id="(8)">
$$  \displaystyle | \langle x,y \rangle | = \|x\| \, \|y\| \, \underbrace{ | \cos \angle (x,y) |}_{\le 1} $$.     (8)

or

<span id="(9)">
$$  \displaystyle | \langle x,y \rangle | \le \|x\| \, \|y\| $$ (9)

As the derivation of Eq.(7) above does not explicitly use the Cauchy-Schwarz inequality, Pope probably used the name Cauchy-Schwarz loosely (Pope, 2000, p.59).
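Note, for completeness, that the expectation $$ \displaystyle \langle u_1 u_2 \rangle $$ is itself an inner product (the $$ \displaystyle L^2 $$ inner product of random variables), so applying Eq.(9) with $$ \displaystyle x = u_1 $$ and $$ \displaystyle y = u_2 $$ gives Eq.(7) in one step:

$$  \displaystyle | \langle u_1 u_2 \rangle | \le \langle u_1^2 \rangle^{1/2} \, \langle u_2^2 \rangle^{1/2} = {u_1}^\prime \, {u_2}^\prime \quad \Longrightarrow \quad -1 \le \frac{ \langle u_1 u_2 \rangle }{ {u_1}^\prime {u_2}^\prime } \le 1 $$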

= Issues resolved (chronological order) =

Weibull distribution
Also as a new article at Weibull distribution.

Introduction
Here, we present some real experimental data that fit the Weibull distribution, which is a more general form of the exponential distribution. The data can be found in a spreadsheet format, here. The cumulative distribution function (CDF) of the Weibull distribution is given as

<span id="(A1)">
$$  \displaystyle F_X (x) = 1 - \exp \left[ - \left( \frac{x}{p_2} \right)^{p_1} \right] $$.     (A1)

To fit the experimental data, a generalization of the Weibull distribution is applied by introducing a third coefficient:

<span id="(A2)">
$$  \displaystyle F_X (x) = p_3 \left[ 1 - \exp \left( - \left( \frac{x}{p_2} \right)^{p_1} \right) \right] $$.     (A2)

When a nonlinear fit is applied, the parameters are found as

<span id="(A3)">
$$  \displaystyle p = [1.6407,\ 740.2473,\ 933.7323] $$.     (A3)

With the above parameters, the distribution looks like



FIG 1: The distribution and the data points

When we take $$ \displaystyle p(3) = 1 $$ as in the usual Weibull distribution, we obtain the CDF as



FIG 2. The Weibull distribution for $$ \displaystyle p(3) = 1 $$

The story about the experimental data
The data were first given in the paper by Dalal and McIntosh (1994) and reproduced in the book by Cook and Lawless (2007, Appendix D, Table D3, p.368). The example comes from the debugging process of a software system with around seven million noncommentary source lines. Cumulative staff days, $$ \displaystyle t $$, the cumulative number of faults detected, $$ \displaystyle N(t) $$, and the cumulative number of lines of code added, $$ \displaystyle C(t) $$, were recorded. Here is a table of selected data (every 10th data point).

As seen from the table, over $$ \displaystyle 1,300 $$ total staff days were spent on the debugging process, $$ \displaystyle 870 $$ faults were detected, and more than $$ \displaystyle 342,000 $$ lines were added. The reason to fit a model to the data is to decide when to stop testing the code. The selected data and the full set are given in the following figure.



FIG 3 Selected and the full set data

At the beginning, the total number of faults found increases very slowly; then it rises quickly, and finally it reaches a plateau. As the data represent an event that recurs over time, and given the shape of the curve, one may think of fitting an exponential distribution, or a more general distribution such as the Weibull distribution, which would also fit the early times.

Curve fitting with actual data
The general form of the Weibull distribution with 3 parameters is given in Eq.(A2). To fit this curve to the existing data, one may use a nonlinear least-squares analysis. First, an initial guess for the parameters is required. Then, the coefficients are estimated with a nonlinear fit. If the initial guess is very far from the actual values, it is not possible to obtain converged values for the parameters. For example, an initial guess of

<span id="(A4)">
$$  \displaystyle p = [100,\ 10,\ 1] $$.     (A4)

does not yield converged estimates of the parameters. The output of Matlab is:

Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.

Parameter estimation
So, it is important to understand the role of each parameter. Here, we present two graphs where $$ \displaystyle p_3 = 1 $$ for both cases.



FIG 4. Effect of parameters when $$ \displaystyle p_3 = 1 $$, a) Study of $$ \displaystyle p_1 $$, b) Study of $$ \displaystyle p_2 $$.

The first figure aims to study the effect of the first parameter, $$ \displaystyle p_1 $$. Note that the exponential distribution is recovered when $$ \displaystyle p_1 = 1 $$. For $$ \displaystyle p_1>1 $$, the curve increases slowly at the beginning, which is the case in our problem (Figs 1-3); here $$ \displaystyle p_1 $$ turned out to be $$ \displaystyle 1.64 $$. The parameter $$ \displaystyle p_2 $$ sets how fast the curve increases after it passes through this slow-increase region. Finally, a good initial guess for the parameter $$ \displaystyle p_3 $$ is the maximum of the data $$ \displaystyle N(t) $$, i.e., $$ \displaystyle 870 $$.
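The role of $$ \displaystyle p_1 $$ described above can be checked with a short sketch (the evaluation point and parameter values are illustrative assumptions, with $$ \displaystyle p_2 = p_3 = 1 $$):

```python
import math

# Role of the shape parameter p1 in Eq.(A2), with p3 = 1 and p2 = 1:
# p1 = 1 recovers the exponential CDF 1 - exp(-x), while p1 > 1 gives
# the slow initial rise seen in the fault data. Values are illustrative.

def weibull_cdf(x, p1, p2, p3=1.0):
    return p3 * (1.0 - math.exp(-((x / p2) ** p1)))

x = 0.1                                  # early point relative to p2 = 1
expo = weibull_cdf(x, 1.0, 1.0)          # exponential case
slow = weibull_cdf(x, 1.64, 1.0)         # shape close to the fitted p1

print(expo, slow)   # the p1 > 1 curve starts noticeably lower
```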

An alternative method for parameter estimation
Another way to estimate the parameters $$ \displaystyle p_1 $$, $$ \displaystyle p_2 $$, and $$ \displaystyle p_3 $$ is to take the logarithm of the generalized Weibull distribution twice, which gives

<span id="(A5)">
$$  \displaystyle \ln \left( - \ln \left( 1-\frac{F_X }{p_3} \right) \right) = p_1 \ln(x) - p_1 \ln(p_2) $$.     (A5)

As explained in the previous section, a good guess for $$ \displaystyle p_3 $$ would be around the maximum value of $$ \displaystyle N(t) $$, which is $$ \displaystyle 870 $$. A plot of the left-hand side versus $$ \displaystyle \ln(x) $$ gives $$ \displaystyle p_1 $$ as the slope and $$ \displaystyle - p_1 \ln(p_2) $$ as the intercept. Using a spreadsheet, if we insert $$ \displaystyle p_3=880 $$ so that the natural logarithm of the last few data points exists, the slope becomes $$ \displaystyle 1.7177 $$ and the intercept $$ \displaystyle -11.195 $$. Hence, $$ \displaystyle p_1 $$ is $$ \displaystyle 1.728 $$ and $$ \displaystyle p_2 $$ is $$ \displaystyle 678 $$. Recall that the result of the nonlinear fit was $$ \displaystyle p = [1.6407, 740.2473, 933.7323] $$. So an initial guess of $$ \displaystyle p = [1.728, 678, 880] $$ is a good starting point. If there were only two parameters, this analysis would have given the end result without the need for a nonlinear fit.
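The linearization of Eq.(A5) can be verified on synthetic, noise-free data (a sketch; the parameter values mimic the fitted ones and are assumptions for illustration):

```python
import math

# Check of the linearization Eq.(A5) on synthetic, noise-free data:
# generate the generalized Weibull CDF with known parameters, then
# recover p1 (slope) and p2 (from the intercept) with an ordinary
# least-squares line fit. Parameter values mimic the fitted ones.

p1_true, p2_true, p3 = 1.64, 740.0, 934.0
xs = [float(x) for x in range(50, 1400, 50)]
F = [p3 * (1.0 - math.exp(-((x / p2_true) ** p1_true))) for x in xs]

X = [math.log(x) for x in xs]                          # ln(x)
Y = [math.log(-math.log(1.0 - f / p3)) for f in F]     # LHS of Eq.(A5)

# Ordinary least-squares slope and intercept
n = len(X)
xb, yb = sum(X) / n, sum(Y) / n
slope = sum((a - xb) * (b - yb) for a, b in zip(X, Y)) / \
        sum((a - xb) ** 2 for a in X)
intercept = yb - slope * xb

p1_est = slope                           # p1 is the slope
p2_est = math.exp(-intercept / p1_est)   # intercept = -p1 ln(p2)
print(p1_est, p2_est)                    # recovers p1 = 1.64, p2 = 740
```

With exact data the line fit recovers the parameters to machine precision; with real, noisy data it gives the good starting guess described above.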

Detailed Octave Procedure
Copy the first two columns of the Exponential Data into soffice (or Excel), then save as a .csv file, e.g., junk.csv. Use sed to replace each blank space with a comma, and save the result into the file exponential.cdf.csv:
 cat junk.csv | sed 's/ /,/g' >! exponential.cdf.csv

Load the 1st column into the vector x and the 2nd column into the vector y with the Octave commands:
 x = csvread ('exponential.cdf.csv','A1:A158')
 y = csvread ('exponential.cdf.csv','b1:b158')

Define the inline function for the curve fit:
 F = inline ("p(3) * ( 1 - exp ( - ( x/p(2) ).^p(1) ) ) ","x","p")

Define the initial parameters in the vector pin:
 pin = [2 ; 600 ; 870]

Then give the nonlinear least-squares curve-fitting command:
 [f,p,kvg,iter,corp,covp,covr,stdresid,Z,r2]=leasqr(x,y,pin,F);

Now plot the data (x,y) and the fitted function f:
 plot (x,y, "+;y;", x, f , "-;f;")

Save the figure in png format for uploading to google doc:
 print -dpng weibull.fit.png

Detailed Matlab Procedure
First, create the model:
 modelFun = @(p,x)p(3) * ( 1 - exp ( - ( x/p(2) ).^p(1) ) )

Then, assign an initial guess for the parameters p(i):
 startingVals = [2 600 870];

The command nlinfit returns the coefficients based on the model and the starting values:
 coefEsts = nlinfit(x, y, modelFun, startingVals)

The parameters are found as
 coefEsts = [1.6404 740.2474  933.7324]
which is very close to the result of the Octave leasqr command:
 p = [1.6407, 740.2473, 933.7323]

= Reading =

== The Lady Tasting Tea: How statistics revolutionized science in the 20th century ==