Wednesday, November 7, 2012

When truths don't commute. Inconsistent histories.

A short introduction to Consistent Histories after some trivial appetizer

When the uncertainty principle is being presented, people usually – if not always – talk about the position and the momentum or analogous dimensionful quantities. That leads most people to either ignore the principle completely or think that it describes just some technicality about the accuracy of apparatuses.

However, most people don't update their idea of what information is and how it behaves. They believe that some sharp objective information exists, after all. Nevertheless, these ideas are incompatible with the uncertainty principle. Let me explain why the uncertainty principle applies to truth values, too.

Every proposition we can make about objects in Nature and their properties may be determined by a measurement and mathematically summarized as a Hermitian projection operator \(P\),\[

P = P^\dagger, \quad P^2=P.

\] The first condition is the hermiticity condition; the second one is the "idempotence" condition (from the Latin for "the same [as its] powers") that defines projection operators. The second condition implies that the eigenvalues have to obey the same identity, \(p^2=p\), which means that the eigenvalues have to be \(0\) or \(1\). We will identify \(1\) with "truth" and "true" while \(0\) will be identified with "lie" and "false".

In some sense, you could say that \(P^2=P\) is more fundamental and \(p\in\{0,1\}\) is derived. The very claim that there are two truth values, "true" and "false", may be viewed as a derived fact in quantum mechanics, a result of a calculation. This is a toy model of the fact that many seemingly trivial facts result from calculations in quantum mechanics; some of these facts are only approximately true under everyday circumstances and they are untrue at the fundamental level.
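The claim that \(P^2=P\) forces the eigenvalues into \(\{0,1\}\) is easy to check numerically. Here is a minimal sketch (my own illustration, not from the text) in Python/NumPy, using a randomly chosen rank-1 projector:

```python
import numpy as np

# Any Hermitian idempotent matrix has eigenvalues in {0, 1}.
# Build a random rank-1 projector P = |v><v| on C^3.
rng = np.random.default_rng(0)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
v /= np.linalg.norm(v)
P = np.outer(v, v.conj())

assert np.allclose(P, P.conj().T)    # hermiticity: P = P^dagger
assert np.allclose(P @ P, P)         # idempotence: P^2 = P

eigvals = np.linalg.eigvalsh(P)
# every eigenvalue solves p^2 = p, i.e. p is 0 or 1
assert np.allclose(eigvals * (eigvals - 1), 0, atol=1e-12)
```

The same check works for a projector of any rank; only the number of unit eigenvalues changes.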

The first condition, hermiticity, implies that eigenstates of \(P\) associated with the eigenvalue "false" (\(0\)) are orthogonal to those with the eigenvalue "true" (\(1\)). This is what allows us to say that the probability that the state disobeys a condition, given that it obeys the condition, is 0 percent, and vice versa. The two outcomes are mutually exclusive. The proof of orthogonality is\[

\braket{\text{yes-state}}{\text{no-state}} &= \bra{\text{yes-state}} P^\dagger \ket{\text{no-state}} =\\
&= \bra{\text{yes-state}} P \ket{\text{no-state}} = 0.
\] In the first step, I used the freedom to insert \(P^\dagger\) in between the states because when it acts on the eigenstate bra yes-state, it yields \(1\) times this state – this state is an eigenvalue-one eigenstate (I used the Hermitian conjugate of the usual eigenvalue equation). In the second step, I erased the dagger, which is OK because of the hermiticity. In the final step, I acted with \(P\) on the no-state ket vector to get zero – because the no-state is an eigenvalue-zero eigenstate. So I got zero. The opposite-order inner product is also zero because it's the complex conjugate number (or you may prove it by a proof that mirrors the one above).
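The orthogonality argument can also be verified numerically. A sketch under my own conventions (not from the text): a projector onto a random two-dimensional subspace of \(\mathbb{C}^3\), whose eigenvalue-1 ("yes") eigenvectors must be orthogonal to its eigenvalue-0 ("no") eigenvectors:

```python
import numpy as np

# P projects onto a random 2-dimensional subspace of C^3.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
Q, _ = np.linalg.qr(A)          # orthonormal basis of the subspace
P = Q @ Q.conj().T              # Hermitian projector onto it

vals, vecs = np.linalg.eigh(P)
no_states = vecs[:, np.isclose(vals, 0)]   # eigenvalue-0 eigenvectors
yes_states = vecs[:, np.isclose(vals, 1)]  # eigenvalue-1 eigenvectors

# <yes|no> = <yes|P^dagger|no> = <yes|P|no> = 0 for every pair
overlaps = yes_states.conj().T @ no_states
assert np.allclose(overlaps, 0)
```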

Let me just give you examples of projection operators corresponding to different propositions. For example, the statement "\(x\) of a particle belongs to the interval \((a,b)\)" is represented by the projection operator\[

P_{a\lt x\lt b} = \int_a^b \dd x\,\ket x\bra x.

\] It keeps the position-eigenstate components of any vector \(\ket\psi\) that belong to the interval and erases all others. Now, the proposition that the "electron's spin relative to the \(z\)-axis is equal to \(\hbar/2\), i.e. up" is represented by the projection operator\[

P_{z,\,\rm up} = \frac 12+ \frac{J_z}{\hbar}.

\] It's a simple linear function that maps the values \(J_z=\mp \hbar/2\) to \(0\) and \(1\), respectively. I hope you are able to write down the projection operator for a similar "up" (or "right") statement relative to the \(x\)-axis:\[

P_{x,\,\rm up} = \frac 12+ \frac{J_x}{\hbar}.

\] Now, is the electron's spin "up" relative to the \(z\)-axis? Is it "up" relative to the \(x\)-axis? Those are perfectly meaningful questions that may be answered by a measurement. Because the truth value is either "false" or "true", we may obtain classical bits of information by a measurement.
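For concreteness, both projectors can be built from the Pauli matrices via \(J_i=(\hbar/2)\sigma_i\). A quick numerical sanity check (my own illustration, with \(\hbar=1\)):

```python
import numpy as np

# P_{z,up} = 1/2 + J_z/hbar and P_{x,up} = 1/2 + J_x/hbar,
# with J_i = (hbar/2) sigma_i and hbar = 1.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

P_z_up = (np.eye(2) + sz) / 2
P_x_up = (np.eye(2) + sx) / 2

# both are honest projectors with eigenvalues {0, 1}
for P in (P_z_up, P_x_up):
    assert np.allclose(P @ P, P)
    assert np.allclose(P, P.conj().T)
    assert np.allclose(sorted(np.linalg.eigvalsh(P)), [0, 1])
```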

However, my point is that the truth values of "\(z\) up" and "\(x\) up" propositions can't be sharply well-defined at the same moment. Indeed, it's because the commutator of the two projection operators is nonzero:\[

[ P_{z,\,\rm up}, P_{x,\,\rm up}] = \frac{iJ_y}{\hbar} \in\{-\frac i2,+\frac i2\}.

\] The commutator of the two projection operators – which just represent the numbers \(0\) and \(1\) when the corresponding proposition about the spin is false or true, respectively – is equal to a multiple of \(J_y\), and the eigenvalues (i.e. possible values) of this multiple are \(\pm i/2\). Because zero isn't among the eigenvalues of the commutator in this case, no vector \(\ket\psi\) is annihilated by it:\[

\nexists \ket\psi:\quad (P_z P_x-P_x P_z)\ket \psi = 0.

\] Yes, I started to omit "up" in the subscripts. This non-existence means that it can't possibly happen that \(P_x,P_z\) would simultaneously have some values from the set \(\{0,1\}\). While we may rigorously prove the logical statement that the only possible values of these (and other) projection operators are zero or one, we may also rigorously prove the statement that a physical system can't have a well-defined sharp answer to both questions, "\(z\) up" and "\(x\) up".
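Both facts – that the commutator equals \(iJ_y/\hbar\) with eigenvalues \(\pm i/2\), and that consequently no nonzero vector is annihilated by it – are easy to verify numerically. A sketch (my own illustration, \(\hbar=1\) so \(J_i=\sigma_i/2\)):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
P_z = (np.eye(2) + sz) / 2
P_x = (np.eye(2) + sx) / 2

comm = P_z @ P_x - P_x @ P_z
assert np.allclose(comm, 1j * sy / 2)    # equals i J_y / hbar

# its eigenvalues are +-i/2, so 0 is not among them ...
assert np.allclose(sorted(np.linalg.eigvals(comm).imag), [-0.5, 0.5])
# ... hence the commutator is invertible: no nonzero |psi> is annihilated
assert abs(np.linalg.det(comm)) > 1e-12
```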

If we have an appropriate apparatus, we can immediately answer the question whether the spin was "up" relative to a given axis. So the questions associated with the projection operators \(P_x\) and \(P_z\) are totally physical and operationally meaningful. Also, by rotational symmetry, both of them are clearly equally meaningful. Nevertheless, they can't simultaneously have sharp truth values!

These projection operators represent potential truths that don't commute with each other. If you talk about the truth values of \(P_{x}\), your logic is incompatible with the logic of another person who assigns a classical truth value to \(P_z\). It's just not possible for both propositions to be "certainly true". It's not possible for both of them to be "certainly false", either. However, it's also impossible for one of them to be "true" and the other to be "false". ;-) They just can't have classical truth values at the same moment!

Because some people like to repeat Wheeler's notion that information is most fundamental in physics – and no, it doesn't really mean much, although I sometimes philosophically agree with such a priority – they usually think that such vacuous clichés may protect the classical world for them because information surely behaves classically. But it doesn't. Quantum mechanics says that information is counted in quantum bits or qubits (the electron's spin above is mathematically isomorphic to any qubit in quantum mechanics) and the Yes/No answers to most pairs of questions don't commute with one another, which means that they can't be simultaneously assigned truth (eigen)values for a given situation.

This was just a trivial introduction. We will use it by realizing that "consistent histories" that would mix "different logics", i.e. statements about \(J_z\) and \(J_x\) at the same moment, are clearly forbidden. We will see formulae that prohibit them, too.

Consistent Histories

Fine. We may finally start to talk about the Consistent Histories interpretation of quantum mechanics. Wikipedia and other sources start by screaming lots of nonsense that it's surely an attempt to debunk the Copenhagen interpretation. Such misconceptions have occurred because virtually all the people talking about "interpretations" are activists and imbeciles who have promoted the fight against quantum mechanics as defined by Bohr and Heisenberg to their life mission. Some of them say such silly things because they don't historically know what Bohr and Heisenberg were actually saying. Most of them are saying such silly things because they refuse to understand basic things about modern physics. Many people belong to both sets.

At any rate, once some of them start to understand what the Consistent Histories interpretation says, they realize that it's not the "weapon of mass destruction" used against the Danish capital that they were dreaming about. In some sense, the Consistent Histories interpretation is a homework exercise:
Apply the Copenhagen interpretation to a collection of arbitrary sequences of measurements at various times and discuss which collections are permissible as interpretations of alternative histories. With the help of decoherence, show that your formulae clarify all issues surrounding the so-called "measurement problem" i.e. that quantum mechanics in its Copenhagen interpretation is a complete theory that produces meaningful predictions for microscopic as well as macroscopic systems.
Of course, they feel utterly disappointed. The Consistent Histories approach is refusing to offer them the "classical mechanisms" and "classical information about the system" and "preferred, 'real' choices of bases and operators" and all other things they were expecting. Instead, the fathers of the Consistent Histories join Bohr and Heisenberg in announcing that quantum physics is different than any theory within the general classical framework. It doesn't assume any objective information about the reality. The probabilities are intrinsically incorporated to the foundations of the theory, just like they have always been. But there is no engine or mechanism that "produces" the probabilities in a way that could be fully described by a classical model. No surprise, the Consistent Histories interpretation was coined by mature physicists such as Murray Gell-Mann, James Hartle, Roland Omnès and Robert B. Griffiths.

I should get into some formulae. Readers are recommended to read e.g. this simple and pioneering 1992 text by Gell-Mann and Hartle. They use the Heisenberg picture and it makes the formulae simple. I agree with them it's more natural to use the Heisenberg picture, especially in such discussions (but also in other contexts), but because the people who tend to misunderstand the foundations of quantum mechanics almost universally prefer the Schrödinger picture, I will translate the Consistent Histories wisdom into the picture of the guy who didn't quite respect complementarity or uncertainty (either \(X\) or \(P\)) and who lived both with his wife and with his mistress. :-)

(Well, if you want to hear some defense, he's had children with three mistresses in total and he justified the relations by saying that he "sexually detested his wife Anny".)

It's not so hard to summarize the definition of "weakly consistent" and "medium consistent" histories. What is a history? In the picture we use, the operators are constant in time and the states evolve according to Schrödinger's equation. I will assume that the Hamiltonian is time-independent and the unitary evolution operators from time \(t_1\) through later time \(t_2\) will be denoted \[

U_{t_2,t_1} = \exp(H\frac{t_2-t_1}{i\hbar}).

\] Let's use the value \(t=0\) for the initial state and the time \(t=T\) with some \(T\gt 0\) for the "end of the history". From the beginning through the end, an initial pure state evolves as\[

\ket{\psi}_{\rm initial} \to \ket{\psi}_{\rm final} = U_{T,0} \ket{\psi}_{\rm initial}.

\] Because the density matrix is a combination of ket-bra products\[

\rho = \sum_i p_i \ket{\psi_i}\bra{\psi_i},

\] we may also immediately write down the evolution for the (initial) density matrix:\[

\rho\to U_{T,0}\rho U^\dagger_{T,0}.

\] The daggered evolution operator at the end appeared because of the bra-vectors in the density matrix: they also evolve.
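The evolution rule \(\rho\to U\rho U^\dagger\) is easy to illustrate numerically. A sketch (my own toy model, not from the text) with \(H=\omega\sigma_z/2\) and \(\hbar=1\), for which \(U\) can be written in closed form:

```python
import numpy as np

# H = omega * sigma_z / 2 (hbar = 1), so U = exp(H T / (i hbar))
# is diagonal with phases exp(-i omega T / 2) and exp(+i omega T / 2).
omega, T = 1.3, 0.7
U = np.diag(np.exp([-1j * omega * T / 2, 1j * omega * T / 2]))

# a mixed initial state: 70% "x up", 30% "x down"
x_up = np.array([1, 1]) / np.sqrt(2)
x_dn = np.array([1, -1]) / np.sqrt(2)
rho0 = 0.7 * np.outer(x_up, x_up.conj()) + 0.3 * np.outer(x_dn, x_dn.conj())

rho_T = U @ rho0 @ U.conj().T    # evolve bras and kets at once

# unitary evolution preserves the trace and the eigenvalues of rho
assert np.isclose(np.trace(rho_T).real, 1.0)
assert np.allclose(sorted(np.linalg.eigvalsh(rho_T)), [0.3, 0.7])
```

The conjugation by \(U^\dagger\) on the right is exactly the statement that the bra-vectors in \(\rho\) evolve, too.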

Now, the operator of a history will be the operator \(U_{T,0}\) with some extra, a priori arbitrary, projection operators inserted between the evolution operators over the intervals into which \((0,T)\) will be divided. We will search for "collections of coarse-grained histories". In the collection, individual elements i.e. histories will be labeled by Greek letters such as \(\alpha\). Mathematically, the value of \(\alpha\) will store all the information about the moments at which we inserted projection operators as well as which projection operators we inserted:\[

\alpha\leftrightarrow \{ n_\alpha, \{t_{\alpha,1},t_{\alpha,2},\dots, t_{\alpha,n_{\alpha}}\},
\{i_{\alpha,1},i_{\alpha,2},\dots, i_{\alpha,n_{\alpha}}\}\}

\] where \(i_\alpha\) are subscripts distinguishing all possible projection operators \(P_{i_\alpha}\) that we use in any history in the collection. Here, \(n_\alpha\) is the number of projection operators we are inserting in the \(\alpha\)-th alternative history, the labels \(t_j\) specify the value of time \(t\) where we are inserting the projection operators, and \(i_j\) say which projection operators we insert at the \(j\)-th insertion.

You may see that the history operator \(C_\alpha\) will be a generalization of \(U_{T,0}\) of this sort:\[

C_\alpha &= U_{T,t_{\alpha,n_\alpha}} P_{i_{\alpha,n_\alpha}}\cdot\\
&\cdot U_{t_{\alpha,n_\alpha},t_{\alpha,n_\alpha-1}} P_{i_{\alpha,n_\alpha-1}}\cdot\\
&\quad \cdots\\
&\cdot U_{t_{\alpha,2},t_{\alpha,1}} P_{i_{\alpha,1}}\cdot\\
&\cdot U_{t_{\alpha,1},0}.

\] I wrote it on many lines because of the nested subscripts. But the operator is exactly what you expect. You cut \(U_{T,0}\) into \(n_\alpha+1\) evolution operators over intervals and insert the appropriate projection operators at the \(n_\alpha\) places. The insertions and evolution operators at the "earlier times" appear on the right side of their friends linked to "later times"; the usual time ordering holds because the operators on the right are the first ones that act on the initial ket state.

It was a messy formula and I won't write it again. (My formula in Schrödinger's picture differs by the usual transformations by evolution operators, times some possible additional evolution operator, from the Heisenberg-picture formulae in the paper by Gell-Mann and Hartle.)
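The structure can still be made concrete in code. Here is a small sketch that assembles such a history operator for a spin-1/2 toy model; the Hamiltonian, the times, and the projectors below are my own illustrative choices, not anything from the text:

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)

def U(t2, t1, omega=1.3):
    """exp(H (t2 - t1) / (i hbar)) for H = omega sz / 2, hbar = 1."""
    phi = omega * (t2 - t1) / 2
    return np.diag([np.exp(-1j * phi), np.exp(1j * phi)])

P_z_up = (np.eye(2) + sz) / 2
P_x_up = (np.eye(2) + sx) / 2

def history_operator(T, times, projectors):
    """C_alpha = U_{T,t_n} P_{i_n} ... U_{t_2,t_1} P_{i_1} U_{t_1,0}."""
    C = np.eye(2, dtype=complex)
    t_prev = 0.0
    for t, P in zip(times, projectors):   # earlier insertions act first
        C = P @ U(t, t_prev) @ C
        t_prev = t
    return U(T, t_prev) @ C

C_alpha = history_operator(2.0, [0.5, 1.2], [P_z_up, P_x_up])

# sanity check: with trivial (identity) insertions, C reduces to U_{T,0}
C_trivial = history_operator(2.0, [0.5, 1.2], [np.eye(2), np.eye(2)])
assert np.allclose(C_trivial, U(2.0, 0.0))
```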

How can you interpret the history operator? Well, it's like the evolution with \(n_\alpha\) "collapses" in between. However, instead of a discontinuous step in the evolution at which Schrödinger's equation ceases to hold (this totally wrong description occurs at many places, including the newest book by Brian Greene), you should interpret the inserted projection operators differently. They're insertions that are needed to calculate the probability that the history \(\alpha\) will be realized.

This interpretation is needed because the separation into the histories from the particular set is surely not unique, and therefore can't be objective. You may always make the splitting to the histories "less finely grained" and the formalism will calculate the probabilities of these "less finely grained" histories, too. It is clearly up to you – within some limitations – how fine and accurate questions you ask about the evolution which is why you surely can't consider the insertions of the projection operators to be "objective collapses".

Now, how do we calculate the probability that the particular history will take place? It's simple if we assume a pure initial state \(\ket\psi\). What happens with the state? Well, it evolves by the evolution operators \(U_{t_{j+1},t_j}\) over the intervals and at the critical points, the pure state is projected by the projection operators. It is kept non-normalized so we pick the multiplicative factor of the complex probability amplitude associated with the projection operator. We do so for every projection operator in the history so that gives us the product of the complex probability amplitudes associated with all measurements. Finally, we must square the absolute value of this product to get the probability out of the total amplitude.

If you think about the action of \(C_\alpha\) on the initial state as well as the usual Born rule to calculate the probabilities of various measurements (plus the product formula for probabilities of composite statements), you will realize that the probability of the history \(\alpha\) which I will write as \(D(\alpha,\alpha)\) is given by\[

D(\alpha,\alpha) = \bra{\psi} C_\alpha^\dagger\cdot C_\alpha \ket\psi.

\] The first, bra-daggered part of the product, is needed because we calculate the probabilities from the squared absolute values of the complex probability amplitudes that we picked from the projection operators. By the cyclic property of the trace, that can be rewritten as\[

D(\alpha,\alpha) = {\rm Tr}\zav{ C_\alpha \ket\psi
\bra{\psi} C_\alpha^\dagger}.

\] We may easily generalize this formula to a mixed state which is just some combination of \(\ket\psi\bra\psi\) objects. By linearity, we get:\[

D(\alpha,\alpha) = {\rm Tr}\zav{ C_\alpha \rho C_\alpha^\dagger}.

\] Here, \(\rho\) is the initial state at \(t=0\). So the Consistent Histories interpretation allows us to pick a collection of histories and calculate the probability of each history in the collection by the formula above. Finally, I must say what it means for the histories to be "consistent".
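As a toy illustration of the probability formula (my own construction, not from the text): a single measurement of \(J_z\) at time \(t_1\), applied to an initial "\(x\) up" state, gives two exhaustive histories whose probabilities must sum to one:

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)

def U(t2, t1, omega=1.3):
    """exp(H (t2 - t1) / (i hbar)) for H = omega sz / 2, hbar = 1."""
    phi = omega * (t2 - t1) / 2
    return np.diag([np.exp(-1j * phi), np.exp(1j * phi)])

P_up = (np.eye(2) + sz) / 2
P_dn = (np.eye(2) - sz) / 2

T, t1 = 2.0, 0.8
x_up = np.array([1, 1]) / np.sqrt(2)
rho = np.outer(x_up, x_up.conj())        # pure "x up" initial state

def D(C_a, C_b):
    """Decoherence functional D(alpha, beta) = Tr(C_a rho C_b^dagger)."""
    return np.trace(C_a @ rho @ C_b.conj().T)

C_up = U(T, t1) @ P_up @ U(t1, 0)        # history "J_z up at t1"
C_dn = U(T, t1) @ P_dn @ U(t1, 0)        # history "J_z down at t1"

probs = [D(C_up, C_up).real, D(C_dn, C_dn).real]
assert np.isclose(sum(probs), 1.0)       # exhaustive, exclusive histories
assert np.allclose(probs, [0.5, 0.5])    # "x up" gives 50/50 for J_z
```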

Well, if we "merge" two nearby (or any two) histories \(\alpha\) and \(\beta\), we get a less fine history called "\(\alpha\) or \(\beta\)". I have assumed that all the histories in the set are mutually exclusive and the total probability is guaranteed to be one. The probability of "\(\alpha\) or \(\beta\)" must be equal to the sum of probabilities, \(D(\alpha,\alpha)+D(\beta,\beta)\), but even this "\(\alpha\) or \(\beta\)" thing is a history so its probability must be given by the same formula for \(D\), one involving the history operator\[

C_{\alpha\text{ or }\beta} = C_\alpha + C_\beta.

\] Because \(D(\gamma,\gamma)\) is bilinear in \(C_\gamma\) and/or its Hermitian conjugate, the addition formula needs the mixed \(\alpha\)-\(\beta\) terms to cancel. The additivity of the probabilities therefore requires\[

{\rm Re}\,D(\alpha,\beta) = 0.

The imaginary part doesn't have to be zero because it cancels against its complex conjugate term. The condition above, required for all pairs of distinct histories \(\alpha\neq\beta\) in the collection, is known as the "weak consistency" (originally "weak decoherence") condition.

Now, it's very unnatural to require that just the real part of the off-diagonal entries \(D(\alpha,\beta)\) for the histories' probability vanishes. The reason is that the phase of \(C_\alpha\) is really a matter of conventions and in realistic situations, the phases of \(C_\alpha\) and \(C_\beta\) may even change independently, almost immediately. So instead of the "weak consistency" condition, it is more sensible to demand the "medium consistency" condition\[

\forall \alpha\neq \beta:\quad D(\alpha,\beta) = 0.

\] The matrix of probabilities for the histories, \(D(\alpha,\beta)\), must simply be diagonal and the diagonal entries calculate the probability of each history for us. It's that simple.
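The medium consistency condition can be checked directly in a spin-1/2 toy model (my own construction, not from the text): two histories built from mutually orthogonal projectors inserted at the same time give an exactly vanishing off-diagonal \(D(\alpha,\beta)\):

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)

def U(t2, t1, omega=1.3):
    """exp(H (t2 - t1) / (i hbar)) for H = omega sz / 2, hbar = 1."""
    phi = omega * (t2 - t1) / 2
    return np.diag([np.exp(-1j * phi), np.exp(1j * phi)])

T, t1 = 2.0, 0.8
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # pure "x up"

C = {}
for name, sign in (("up", +1), ("down", -1)):
    P = (np.eye(2) + sign * sz) / 2
    C[name] = U(T, t1) @ P @ U(t1, 0)

D_offdiag = np.trace(C["up"] @ rho @ C["down"].conj().T)
# orthogonal projectors at the same time => exact medium consistency
assert np.isclose(D_offdiag, 0)
```

By cyclicity of the trace, the off-diagonal entry collapses to a term containing \(P_{\rm down}P_{\rm up}=0\), which is why the cancellation here is exact rather than approximate.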

Any collection of alternative histories satisfying the medium consistency condition may be "asked" and quantum mechanics gives us the "answers" while all the identities for the probabilities of composite propositions such as "\(\alpha\) or \(\beta\)" will hold as expected. So one will be able to use "classical reasoning" or "common sense" for the answers to all these questions.

It's important to realize that the job of quantum mechanics isn't to "calculate the right questions" or the "right collection of alternative histories" for us. There is no canonical choice. To say the least, there's clearly no preferred "degree of fine or coarse graining" we should adopt. Too coarse a graining tells us too little; too fine a graining leads us into conflict with the consistency condition – a conflict that really originates in the uncertainty principle. You simply can't expect too many things to be specified too sharply. If you tried to fine-grain the histories "absolutely finely", the histories would resemble the classical histories summed over in Feynman's approach to quantum mechanics. But they're clearly not consistent. In particular, we know that they can't be mutually exclusive because even in the classical limit, many histories in the vicinity of the classical solution contribute to the evolution, as Feynman taught us. This fact also manifests itself in nonzero off-diagonal entries of \(D\) between histories that are too close to each other (e.g. because the projection operators on states or "cells in the phase space" are clearly not mutually exclusive if the two cells overlap).

The right attitude is somewhere in between – collections of coarse-grained histories for which the consistency condition holds accurately enough, i.e. histories that obey the uncertainty principle etc. sufficiently satisfactorily, but also histories that are fine enough for us to be satisfied with the precision we need. The precise location of the "compromise" clearly cannot be objectively codified. To choose how accurately we want to distinguish histories is clearly a subjective choice. It's up to the observer.

It should be obvious to the reader that there can't exist any "only right degree of coarse-graining". So there can't exist any "only right set of consistent histories". The choice of the right questions, alternative answers, and the degree of accuracy is up to the observer who chooses the logic. It is inevitably subjective and non-unique. The projection operators don't represent any "objective collapse". Instead, the way they're inserted encodes the question that an observer asked – and I have written down the explicit formula for the answer, namely the probability of a given history, too.

All physically meaningful questions may be summarized as the questions about the probabilities of different alternative histories in a consistent collection, given a known initial state encoded in a density matrix. If you find several collections of consistent histories, good for you. You may perhaps succeed even if there won't be any "unifying finely grained collection" that would allow you to fully answer all the questions from the two collections. The collections may perhaps look at the physical system from a totally different angle. But if they're consistent, they're allowed.

This is clearly a complete and consistent interpretation of quantum mechanics. It tells you exactly what you may ask and what you're not allowed to ask, and for the things you may ask, it tells you how to calculate the answers. They agree with the experiments. All the criticism of this interpretation is clearly pure idiocy and bigotry.

Let me just mention two representative examples of histories that are not consistent.

Start with Schrödinger's cat described by the density matrix \(\rho\). Let the killing device evolve. At the end, try to define two histories that project the cat onto some random macroscopic superpositions of the "common sense" dead and alive states such as\[

0.6\ket{\rm dead}+0.8i\ket{\rm alive},\quad 0.8i\ket{\rm dead}+0.6\ket{\rm alive}.

\] The functional \(D(1,2)\) will be nonzero because the matrix of probabilities – the final density matrix after decoherence – is off-diagonal in this "uncommon sense" basis.

In principle, you could think that if the probability of "dead" and "alive" will be exactly equal, the matrix \(D\) will be a multiple of the identity matrix – and the identity matrix has the same form in all bases, including bases of unnatural superpositions. In principle, it's right and you have the freedom to rotate the bases arbitrarily.

In practice, you can't rotate them because the evolution of the cat will be producing and affecting lots of environmental degrees of freedom. If you choose a slightly more fine-grained history for the "dead portion" of the evolution than for the "alive portion", or vice versa, the relevant part of \(D(\alpha,\beta)\) will cease to be a multiple of the identity matrix: the entries on the diagonal of \(D\) will be divided into smaller pieces in the "dead cat branch" of the matrix. Because you want your calculation to be independent of the precise level of coarse-graining and of the number of degrees of freedom that you treat as the environment, even in the special case when some of the diagonal entries of \(D\) are exactly equal, you won't really be allowed to rotate the basis while preserving the consistency condition "robustly".

Conventional low-energy situations won't really allow you "qualitatively different choices of the collection of consistent histories" that wouldn't be just some "coarse-graining of quantum possibilities around some classical histories". However, the black hole complementarity actually represents a great example of non-uniqueness of the solution to the condition of consistency of histories. The infalling and outgoing observer are using qualitatively different consistent collections of history operators acting on the same (or overlapping) Hilbert space.

Finally, let me also mention that the consistency condition may seemingly allow you to choose "\(z\) up" and "\(x\) up" histories from the beginning of the article in the same collection. The Consistent Histories formalism simplifies dramatically if we only want to show this simple point. The evolution may be completely dropped, the history operators reduce to simple projection operators, and we essentially consider\[

D(x,z) = {\rm Tr} \zav{P_x \rho P_z}.

\] If you write \(\rho\) as a combination of the identity matrix and the three Pauli matrices (or, equivalently, multiples of \(P_x,P_y,P_z\)), you will find out that the imaginary part of the trace above vanishes as long as \(\rho\) contains no contribution from \(P_y\). So if the expectation value of \(J_y\) in the initial state vanishes, the antisymmetric part of the off-diagonal elements, \(D(x,z)-D(z,x)\), will be zero. (The latter claim may also be easily seen by calculating \(D(x,z)-D(z,x)\) from the commutator \([P_z,P_x]\).)
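One piece of this computation is easy to check numerically: \(D(x,z)-D(z,x)={\rm Tr}([P_z,P_x]\rho)\), which vanishes exactly when \(\rho\) has no \(\sigma_y\) component. A sketch (my own illustration, \(\hbar=1\)):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
P_x = (np.eye(2) + sx) / 2
P_z = (np.eye(2) + sz) / 2

def D(P_a, P_b, rho):
    """Evolution-free decoherence functional Tr(P_a rho P_b)."""
    return np.trace(P_a @ rho @ P_b)

# rho with no sigma_y component: the antisymmetric part vanishes ...
rho0 = (np.eye(2) + 0.3 * sx + 0.4 * sz) / 2
assert np.isclose(D(P_x, P_z, rho0) - D(P_z, P_x, rho0), 0)

# ... versus rho with a_y = 0.6: the difference equals i a_y / 2
rho1 = (np.eye(2) + 0.6 * sy) / 2
diff = D(P_x, P_z, rho1) - D(P_z, P_x, rho1)
assert np.isclose(diff, 0.3j)
```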

However, such a collection of histories will fail to obey the logical condition I haven't mentioned yet:\[

\sum_\alpha C_\alpha = U_{T,0}.

\] This should hold as an operator equation (in the Heisenberg picture used by Gell-Mann and Hartle, the right-hand side is simply the unit operator \({\bf 1}\)), so it's stronger than \(\sum_\alpha D(\alpha,\alpha)=1\). So it's not allowed to consider "alternatives" that aren't really exhaustive and mutually orthogonal.
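A two-line numerical check of this completeness condition in the simplified, evolution-free setting (my own sketch, \(\hbar=1\)): the pair \(\{P_{z,\rm up},P_{x,\rm up}\}\) fails it, while the honest pair of alternatives \(\{P_{z,\rm up},P_{z,\rm down}\}\) passes:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
P_z_up = (np.eye(2) + sz) / 2
P_z_dn = (np.eye(2) - sz) / 2
P_x_up = (np.eye(2) + sx) / 2

# "z up" together with "x up" do not exhaust the alternatives ...
assert not np.allclose(P_z_up + P_x_up, np.eye(2))
# ... while "z up" and "z down" do sum to the unit operator
assert np.allclose(P_z_up + P_z_dn, np.eye(2))
```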

In practice, the equation \(D(\alpha,\beta)=0\) is never "quite exact", so we always ask questions about alternative histories that are only approximately consistent, although the accuracy quickly becomes sufficient for all practical and most impractical purposes. That's a manifestation of the fact that classical physics – and classical reasoning in general – never kicks in quite exactly.

Let me mention that aside from the "weak decoherence" and "medium decoherence" conditions above (the medium one clearly implies the weak one), Gell-Mann and Hartle also discussed a "strong decoherence" condition which would imply both of the previous two but which is too strong and would kill almost all choices of "history collections" whenever the initial state is highly mixed. The condition said that one could express all products \(C_\alpha \rho\) as\[

C_\alpha \rho = R_\alpha \rho

\] where \(R_\alpha\) is a projection operator, a "record projection". So one wants to work with the "medium decoherent" histories.

Cosmic GDP drops 97% since peak star

Staunch Chicken Littles such as Alexander Ač love to talk about "peak oil", a hypothetical moment (and, in their opinion, a predictable and important moment) at which the global oil production reaches its global maximum.

But Phys.ORG has discussed an even more far-reaching peak of something, namely "peak star". The popular article is based on this paper:
A large H\(\alpha\) survey at \(z=2.23,\, 1.47,\, 0.84\, \&\, 0.40\): the \(11\,{\rm Gyr}\) evolution of star-forming galaxies from HiZELS (arXiv, published in Monthly Notices of the Royal Astronomical Society)
If we denote the number of stars produced per year as the "cosmic GDP", the year in which the cosmic GDP was maximized belongs to the distant past. Since that time, star production has slowed down considerably. In fact, the "cosmic GDP" has decreased by a whopping 97% since that moment!

You would surely think that the Universe must be a horrible place to live if the "cosmic GDP" is just 3% of its value in those "good old times". Well, you would surely be wrong. Most of us didn't even know that star production is so slow relative to the maximum.

The peak was reached about 3 billion years after the Big Bang and I assure you, the world is a much better place today. During the "peak star", there wasn't even any Sun and the Earth – a planet that biased environmentalists consider more important than billions of other planets in the Universe ;-) – didn't exist, either.

I find it rather likely that 3 billion years after the Big Bang, our visible Universe contained no intelligent life but I am sure that many folks who think that "intelligent life is an inevitable omnipresent trash that immediately erupts almost everywhere" will disagree. We don't really know the answer.

It's also estimated that despite the infinite future lifetime of our Universe, which will increasingly approach empty de Sitter space (its form of energy, the cosmological constant, already makes up over 70% of the energy density of the Universe, so we're already "pretty close" to the empty de Sitter Universe of the asymptotic future), the total number of stars in the cosmic history book will only increase by 5% from this moment on. (Well, more precisely, we are talking about the total mass of the stars rather than their number.)

So if you measure the total "amount of fun in life" as the integral of the number of stars over time, \(\int \dd t\,N_{\rm stars}\), then about 95% of the fun in the Universe has already taken place and almost nothing awaits us! We could also commit collective suicide and we would at most lose 5% of the fun events in our history.

The only problem with this pessimistic, nihilist conclusion is that we know very well that the "total amount of fun happening annually in the Universe" is proportional neither to the number of stars nor to their total mass. As I have mentioned, most of us believe that the life in the Universe is much more fun today than it was during the "peak star" 11 billion years ago.

The star production was needed for our Solar System to be born but many other events were needed for us to be here and to have some fun, too. The latter events depend on the existence of the stars and they're inevitably delayed by a certain period of time. And the things that decide about the GDP or the fun in life today – when the existence of the Solar System may be taken for granted – may proceed at a much faster rate so that those 7.5 billion years of the Sun's remaining life may be enough for a lot of fun – fun that is "almost" infinitely larger than what we have already seen (think about the speculations about the "technological singularity" which may be inaccurate but they're right about the point that the progress or GDP may continue to grow).

This objection is uncontroversial and kind of amusing in the case of "peak star". However, my point is clearly more far-reaching ideologically. My point is that the "amount of fun happening on the Earth every year" is clearly not proportional to the crude oil production, either. It isn't proportional to the electricity that is consumed, it isn't proportional to the number of SUVs or solar panels or soybeans that are sold, it is proportional to nothing particular that may be associated with the life in a given era.

The fun in life is a totally independent quantity that depends on many things, and its relationship to other quantities is indirect, indeterministic, and constantly changing. Moreover, the relevant quantities today, such as the economists' GDP, are changing (and mostly increasing) by several percent per year, while the 97% drop over 11 billion years corresponds to a modest 0.00000003% decrease per year, which is negligible.
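The per-year figure is a one-line calculation under a constant-rate assumption (my own arithmetic check, not from the paper):

```python
# Spread a 97% drop uniformly over 11 billion years.
remaining = 0.03            # 3% of the peak rate survives
years = 11e9
annual_factor = remaining ** (1.0 / years)          # constant-rate model
annual_drop_percent = (1.0 - annual_factor) * 100

# roughly 0.00000003% per year, as quoted in the text
assert 2e-8 < annual_drop_percent < 4e-8
```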

We have mentioned some of the reasons why "peak oil", much like "peak star" – even if we could determine or predict when it exactly occurs, which we can't – has no implications for the things we really care about in the world.

And that's the memo.

Tuesday, November 6, 2012

RSS AMSU: 2012 seems to be 11th warmest on record

For almost 15 years, the climate alarmist bigots have been dreaming about another warm year that would dethrone 1998 as the warmest year on the satellite temperature record. It's totally clear by now that year 2012 won't become this divine signal of their holy global warming they have been desperately waiting and praying for. In fact, after the first 10 months, it doesn't seem to make it into the top ten.

The UAH AMSU satellite temperature product recently experienced some problems and had to be upgraded to a new version. In these ranking texts, I usually refer to RSS AMSU, which turns out to be an advantage now.

The ranking of the years 1998-2011 according to RSS AMSU is:

{1998, 0.54871},
{2010, 0.47591},
{2005, 0.330003},
{2003, 0.32197},
{2002, 0.314142},
{2007, 0.258975},
{2001, 0.247164},
{2006, 0.229666},
{2009, 0.225882},
{2004, 0.202923},
{1995, 0.158027},
{2011, 0.147427},
{1999, 0.102792},
{1997, 0.102523},
{1987, 0.0982575},
{2000, 0.0915137},
{1991, 0.081},
{1990, 0.0751534},
{1988, 0.0669781},
{1983, 0.066137},
{2008, 0.0502459},
{1996, 0.0463962},
{1994, 0.0285479},
{1981, 0.0207808},
{1980, 0.0146995},
{1979, -0.0941425},
{1993, -0.117159},
{1989, -0.119378},
{1986, -0.138775},
{1982, -0.1734},
{1992, -0.179363},
{1984, -0.223995},
{1985, -0.260586}.

Sorry for the ludicrous excess precision; I didn't want to spend the time to round the numbers. The average temperature anomaly during the first 10 months of 2012 was +0.2018 °C. If this remained the score for the whole year, 2012 would be the 11th warmest year of the RSS AMSU record.
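The 11th-place claim is easy to verify against the list above; a quick sketch (the anomalies are copied verbatim from the ranking):

```python
# RSS AMSU annual anomalies (°C), 1979-2011, as listed above
anomalies = {
    1998: 0.54871, 2010: 0.47591, 2005: 0.330003, 2003: 0.32197,
    2002: 0.314142, 2007: 0.258975, 2001: 0.247164, 2006: 0.229666,
    2009: 0.225882, 2004: 0.202923, 1995: 0.158027, 2011: 0.147427,
    1999: 0.102792, 1997: 0.102523, 1987: 0.0982575, 2000: 0.0915137,
    1991: 0.081, 1990: 0.0751534, 1988: 0.0669781, 1983: 0.066137,
    2008: 0.0502459, 1996: 0.0463962, 1994: 0.0285479, 1981: 0.0207808,
    1980: 0.0146995, 1979: -0.0941425, 1993: -0.117159, 1989: -0.119378,
    1986: -0.138775, 1982: -0.1734, 1992: -0.179363, 1984: -0.223995,
    1985: -0.260586,
}

ytd_2012 = 0.2018  # mean anomaly of the first 10 months of 2012
rank = 1 + sum(v > ytd_2012 for v in anomalies.values())
print(rank)  # prints 11
```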

It's very likely that the November+December 2012 average anomaly will deviate from +0.2018 °C by less than 0.2 °C. Because these two months carry a 5 times smaller weight than the previous 10 months (2/12 vs 10/12 of the year), I expect the annual anomaly to change by less than 0.04 °C from the current +0.2018 °C.
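The 0.04 °C bound is just the monthly weighting at work; a sketch of the arithmetic:

```python
# Nov+Dec carry 2 of the 12 monthly weights, i.e. 5 times less than
# the first 10 months, so a 0.2 °C deviation moves the annual mean by at most:
ytd = 0.2018       # mean anomaly (°C) of the first 10 months of 2012
max_dev = 0.2      # assumed bound on the Nov+Dec deviation from ytd
max_change = max_dev * 2 / 12
print(round(max_change, 3))                                  # 0.033
print(round(ytd - max_change, 3), round(ytd + max_change, 3))  # 0.168 0.235
```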

So 2012 is "very unlikely" to drop to rank 12 or lower. However, if the last two months of the year are substantially warmer, 2012 can make it up to the 6th or 7th place, not higher.

BSP (CZ): I don't know. Frost destroys the sunshine...

As I mentioned a month ago, the widely expected 2012-2013 El Niño seems to be delayed, to say the least, and the ENSO neutral conditions continue even though in the latest weekly ENSO report, they're very close to the El Niño threshold again. Without a clear El Niño this winter season, it seems very likely to me that even 2013 will refuse to become the warmest year. That would extend the period without a new warmest year to a whopping 15 years.

Obama-Romney: TRF poll

Update: About 1/3 of U.S. TRF readers voted for Obama and 3/5 of them expected Obama to win. Only 1% of TRF readers voted for Obama but expected Romney to win. Among the non-TRF readers, Romney slightly won the popular vote but by the electoral votes, Obama safely defended his presidency. Dow Jones collapsed by more than two percent after the results.
This poll is very simple and unsurprising.

I want the U.S. readers – who make up 1/2 of the TRF visitors – to report whom they voted for and whom they expect to win. Well, I only mean the U.S. readers who are not yet crying because they're tired of Bronko Báma and Mitt Romney.

Who did you vote for, whom do you expect to win?

If you didn't actually physically vote but you have an opinion, please pretend that you did vote. Your vote counts here.

Let me ask Unamerican readers to stay silent because this choice isn't really our business, the business of Unamericans. ;-) I would like to know how politically symmetric or asymmetric the readership of TRF is.

Monday, November 5, 2012

Why subjective quantum mechanics allows objective science

Short answer: Because subjective knowledge (and ignorance) is and has always been compatible with objective science and quantum mechanics simply transmutes all of science to a novel treatment of fundamentally subjective knowledge.

I've had an exchange about the subjective/objective nature of the observation in quantum mechanics with Arnold Neumaier, a mathematician in Vienna.

In my answer, I clarified that what is sometimes called the "collapse of the wave function" is actually a subjective process – it's a change of someone's knowledge because he or she or it or they is/are learning about the value of an observable. This "collapse" is the change of the subjective probabilistic distributions which is also why it may occur "faster than light". The collapse "only occurs in your head".

This basic principle – which I consider absolutely essential for the right understanding of the basics of quantum mechanics – is too counterintuitive for most people and Arnold Neumaier isn't an exception. So he protested:
If this were really true, one would still have to explain why we get objective science out of our subjective measurements. Therefore, there may not be more subjectivity than is in the error bars.
But this widespread "argument" is a childish logical fallacy. The objective character of the reality – as assumed by any theory of classical physics and even the "pre-scientific classical reasoning" – is (or would be) a sufficient condition to enable objective science.

However, that doesn't mean that it is a necessary condition. Since the 1920s, physicists have known that it is neither a necessary condition nor the correct way to protect the world against contradictions that could result from a generic conglomerate of "subjective viewpoints". Many processes, especially the macroscopic ones, are predicted by quantum mechanics to proceed in a way that admits a classical description with an objective reality.

But what's at least equally important, many others don't. At the fundamental level, quantum mechanics authoritatively and indisputably states that there exists no objective reality that would explain all subjective viewpoints as its reflections. Arnold Neumaier asks what quantum mechanics' explanation for the absence of contradictions is, despite the non-existence of objective reality; but his question is phrased as a rhetorical one because he isn't really interested in the answer, even though the answer is arguably the most important finding of 20th century science.

Let me discuss a few manifestations of the subjective character of existence implied by the basic postulates of quantum mechanics and explain why it leads to no contradictions.

Wigner's friend: "collapse" isn't objective

Wigner's friend is a guy closed in a lab with Schrödinger's cat. The cat dies if and when some radioactive nucleus decays – its fate is decided by a quantum-style microscopic process that may only be predicted probabilistically. Wigner is outside the lab and may still describe the whole lab, including his friend, in terms of linear superpositions of all states – including macroscopically distinct ones – as follows from Schrödinger's equation.

The question is whether the fate of the cat was decided already when Wigner's friend looked at the cat, or only when Wigner himself looked at the whole lab including the cat and his friend.

Wigner's friend may be certain that he has already made a measurement so the fate of the cat was determined rather early. However, Wigner himself only learns about the fate when he does his own measurements, so the state of the cat is determined much later.

The answer to the question "When the fate of the cat became decided and ceased to be murky?" is therefore subjective. Note that with macroscopic processes that admit a classical description, you could claim that the state of the cat was decided "immediately" and this assumption won't drive you into contradictions. But if you considered smaller, more inherently quantum objects and processes, any assumption that the system already had some particular values of observables is enough for you to be driven to wrong predictions. It's very important that quantum mechanics only describes the state of the physical systems as a "murky probabilistic superposition" of different possibilities.

Let me repeat it differently: You are allowed to assume that the observed quantities have already been facts since the moment when the information carried by them got imprinted to the environment many times and "irreversibly" decohered. If you assume that the observation became a "fact" before it decohered, you will be driven to contradictions with experiments.

However, it's also important to notice that decoherence, while extremely fast as soon as it kicks in, is never perfect. The off-diagonal elements of the density matrix in a particular basis never go to zero exactly. (Well, they are zero in some basis because every Hermitian matrix may be diagonalized but in general, it will be a basis that mixes macroscopically distinct states to a comparable extent.)
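The never-exactly-zero off-diagonals can be illustrated with a toy model. A minimal sketch, assuming a single parameter eps standing in for the (exponentially small, but nonzero) overlap of macroscopic environment pointer states:

```python
import numpy as np

# Toy decoherence: a qubit in (|0> + |1>)/sqrt(2) entangles with an
# environment whose pointer states |E0>, |E1> overlap by eps = <E0|E1>.
eps = 1e-6  # hypothetical tiny overlap; shrinks as the environment grows
E0 = np.array([1.0, 0.0])
E1 = np.array([eps, np.sqrt(1 - eps**2)])  # normalized, <E0|E1> = eps

# joint state (|0>|E0> + |1>|E1>)/sqrt(2); index ordering (qubit, env)
psi = (np.kron([1, 0], E0) + np.kron([0, 1], E1)) / np.sqrt(2)
rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)   # (q, e, q', e')
rho_qubit = np.trace(rho, axis1=1, axis2=3)           # partial trace over env

print(rho_qubit[0, 1])  # eps/2: tiny, but never exactly zero
```

The off-diagonal element equals eps/2; decoherence suppresses it but cannot annihilate it exactly, just as the text says.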

Decoherence is one of the "irreversible" processes, much like the growth of entropy in thermodynamics. But the microscopic description of this irreversibility – whether we mean the statistical physics description of the increasing entropy, or the quantum mechanical description of the origin of decoherence – shows that the "impossibility to reverse things" is never absolute. In statistical physics, it's just unlikely that the entropy will go down; it is not impossible. Analogously, in quantum mechanics, the loss of the information about the relative phase of two complex probability amplitudes is a problem that may be "reverted" to some extent.

But once the increase of the entropy becomes macroscopic, the chances of returning the entropy to the original low value become exponentially tiny and negligible. We say that it's impossible. Similarly, the entanglement of the measured system with the environment quickly becomes so complex that we give up all the hopes to "disentangle" this entanglement.

There's no objective moment at which we may say that it has become impossible to reverse the processes. In practice, people will agree whether it's possible or not but in principle, one may imagine a more accurate extraterrestrial engineer who is capable of reversing processes we consider hopelessly irreversible.

Returning to Wigner's friend, there can't be any contradiction between Wigner and Wigner's friend because the question "When the fate of the cat became decided?" must be answered by an operational procedure and everyone understands that there's no "canonical" procedure to do so, so the result of any procedure will reflect the particular idiosyncrasies of the procedure. Wigner's friend may prepare records of the dead/alive cat taken well before Wigner returned to the lab. But Wigner may always disagree and say that these photographs have been in a linear superposition of macroscopically different states up to the moment when he returned to the lab.

There can't be any sharp contradiction because the question is an ill-defined question about philosophy, the kind of question you should avoid according to the "shut up and calculate" dictum. Moreover, no one really cares about "when the fate of the cat got decided". We may mathematically derive that if a nucleus decays, the engine will immediately kill the cat. But we don't know whether the nucleus was "objectively" in the decayed state or a linear superposition and we don't really care. What we really care about is what the fate is. Is the cat alive, or dead?

Concerning the latter question, there won't be any contradictions. The evolution in quantum mechanics \[

\ket{\text{dead cat}} \to \ket{\text{dead cat}}\otimes \ket{\text{sad Wigner's friend}} \otimes \ket{\text{sad Wigner}}

\] and similarly for "alive cat" and "happy men" guarantees correlations – using the most general quantum description, it guarantees entanglement – between the state of the cat, the state of Wigner's friend's brain, and the state of Wigner's own brain. We may show – by a simple calculation in quantum mechanics establishing the evolution above (not by classical dogmas about the objective reality!) – that if the cat dies, it will make the "same" impact on Wigner and Wigner's friend if both of their brains are measured. If the cat survives, the measurement of the two men's brains will yield compatible results, too.
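The perfect agreement encoded in this evolution can be made explicit numerically. A sketch with hypothetical amplitudes a, b (basis label 0 = dead/sad, 1 = alive/happy):

```python
import numpy as np

# Entangled cat-friend-Wigner state a|000> + b|111>: every joint outcome
# in which the three records disagree has probability exactly zero.
a, b = np.sqrt(0.3), np.sqrt(0.7)   # hypothetical amplitudes, |a|^2 + |b|^2 = 1
psi = np.zeros(8)
psi[0b000], psi[0b111] = a, b

probs = psi ** 2
for idx in range(8):
    bits = (idx >> 2 & 1, idx >> 1 & 1, idx & 1)  # (cat, friend, Wigner)
    if len(set(bits)) > 1:                        # mismatched records
        assert probs[idx] == 0.0
print(round(probs[0b000], 2), round(probs[0b111], 2))  # prints 0.3 0.7
```

Only the two "everyone agrees" outcomes carry nonzero probability; which one occurs is probabilistic, but the agreement itself is certain.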

So the two men will agree whether or not the cat is alive if both of them perform the measurement. But men – and other physical systems – don't have to agree about the question whether a measurement has taken place. A measurement is a process by which you are gaining the information and whether you are gaining the information – or you want to gain it – is a subjective matter. So people may disagree about the moment.

In classical physics, we were allowed to assume that there existed an objective world that someone could in principle know in the full entirety and accuracy. Individual people's knowledge reflected this objective reality and the ignorance (and statistical tools used to describe the imperfect knowledge) were just reflections of the individuals' imperfection that could have been avoided in principle.

Quantum mechanics shows that our world doesn't work in this way, however. The probabilistic character of the values of any observables is a fundamental property of the laws of physics in our Universe. It is inevitable that the value of most observables we can measure is uncertain and "probabilistically mixed" even a femtosecond before these observables are measured. There is no agent, not even God, who would know the state of the observables a moment before they're measured. The very assumption that such a perfect being exists mathematically contradicts the fact that the operators don't commute with each other; physically, such an assumption will either lead to predictions that disagree with the experimentally measured correlations, or with locality as demanded by the special theory of relativity.

So the question "whether some observation has already become a fact and when" doesn't have any objective, canonical answer – even though many people using the same conventions and models may usually agree. But this agreement only reflects their shared taste and social conventions (i.e. the same values of "tiny probabilities" that they're already willing to identify with zero when they discuss irreversibility of various processes). It doesn't reflect any objective reality that would exist in principle.

Purity of the "right state" is a subjective question, too

People often try to imagine that many other questions have "objective answers", too. One important example is the question:
Is a particular physical system described by a pure state, or a mixed state (density matrix)?
Different observers may have different answers to this question, too. And there are many reasons for that.

First of all, the subjective character of the answer directly follows from my previous point, namely the conclusion that "the moment when the measurement took place is a subjective matter". Imagine that Wigner and his friend study any physical system, for example the spin of an electron. Wigner and his friend agree that in the initial state, it is determined by a given density matrix \(\rho\). So it's mixed. But once Wigner's friend measures the spin with respect to the \(z\)-axis, he will find out it's either "up" or "down" and the state of the electron will inevitably become pure – for Wigner's friend. However, Wigner himself will continue to evolve the whole lab via Schrödinger's equation. That means that he will ignore any hypothetical "discontinuous change" associated with the spin measurement and his description will continue to build on a mixed state. The state will be mixed for Wigner but pure for Wigner's friend.

(Even if you appreciate the discontinuity of "purity" of a state, you won't be able to measure how much "mixed" it is because neither the state vector nor the density matrix are observables. Physically, they don't come with apparatuses that could spit out a particular eigenvalue after one measurement – and the state vector isn't even an operator in any sense. The density matrices and state vectors – up to the overall phase – may only be "measured" by many times repeated experiments with the same initial state but this can't be counted as a "measurement" of any property of a single repetition of the experiment.)

There is another reason why different observers will disagree about the purity of the state. This reason is simple: a basic justification of the "density matrix" formalism is to allow for people's individual ignorance. While a pure state describes a "maximally well-known state of a physical system" in quantum mechanics, a density matrix allows you to "add the same kind of ignorance that already existed in classical physics".

Take the electron's spin. The density matrix \(\rho\) is a Hermitian \(2\times 2\) matrix with non-negative eigenvalues \(p, 1-p\). Their sum equals \(1\) because of the "total probability" normalization for the trace. If \(p=0\) or \(p=1\), the density matrix describes a pure state and may be written as \(\ket\psi\bra\psi\) for some pure state \(\ket\psi\). Moreover, for the particular state of the electron's spin, each pure state \(\ket\psi\) may be identified with the "up-spin" state with respect to a particular axis in \(\RR^3\).

If you choose both eigenvalues to be \(p=1/2\), the density matrix is one-half times the identity matrix (which is why it has this form in any basis) and it describes the "maximum ignorance" about the spin. Such a density matrix is "maximally mixed". If you don't know anything about the spin of an electron, you should assume that its state is given by this particular density matrix – a highly mixed state. However, someone else may be aware of some previous measurement of the spin that was conserved etc. So he may actually know that the spin was "up", for example. He will use a pure state to describe the electron's spin.
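These purity statements are easy to check numerically: a density matrix with eigenvalues p and 1-p is pure iff p is 0 or 1, which is equivalent to \({\rm Tr}(\rho^2)=1\). A minimal sketch:

```python
import numpy as np

def rho_from_p(p):
    """Spin-1/2 density matrix with eigenvalues p and 1-p (z-basis)."""
    return np.diag([p, 1.0 - p])

for p in (1.0, 0.5):
    rho = rho_from_p(p)
    purity = np.trace(rho @ rho)
    print(p, purity)  # purity 1.0 for the pure state, 0.5 for maximally mixed
```

The maximally mixed state p = 1/2 minimizes the purity at 1/2 for a qubit.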

Again, this disagreement between the people when it comes to the state's purity can lead to no contradictions. The person who uses the mixed state will predict (twice) lower probabilities for the outcomes that depend on the spin's being up. But his predicted probabilities will be nonzero so they won't be incompatible with these events. Moreover, he will understand that the lower predicted probabilities – relatively to the guy who knew the spin was "up" – were just due to his ignorance about the "lucky" initial conditions "up". The mixed nature of the state may be said to be due to some "extra ignorance" and it's not too shocking that ignorance is subjective.

This discussion about the purity is quantum mechanics' complete counterpart of a similar discussion in classical physics. We may either describe a mechanical system by its coordinates and momenta \(x_i(t)\) and \(p_i(t)\), or we may specify a probabilistic distribution on the phase space, \(\rho(x_i,p_j;t)\). The latter may be interpreted as a tool to deal with the imperfect subjective knowledge by some people but it's possible to imagine that some "right" configuration of \(x_i(t)\) and \(p_i(t)\) exists at each moment. In quantum mechanics, density matrices play the role of the probabilistic distributions on the phase space.

However, there's still a fundamental difference between classical physics and quantum mechanics.

In classical physics, it was possible to know the positions and momenta, at least in principle, and if we knew them, everything was unambiguously determined. The state of "maximum knowledge" in classical physics implied unambiguous predictions for everything. In quantum mechanics, the state of a "maximum knowledge" is any pure state. And even if we know that the system is in a pure state, we are only able to make probabilistic predictions for most observables.

Imagine that you use the density matrices for everything and substitute \(\rho=\ket\psi\bra\psi\) if you had a pure state instead. Then the "pure density matrices" will only differ from others by their list of eigenvalues – all of them are zero except for one that equals one. My point is that these "special density matrices" may look qualitatively similar to all other density matrices (especially in a random basis) and we have universal formulae of the type \({\rm Tr}(\rho P)\) to calculate the probabilities of \(P\) (a proposition represented by a projection operator) out of any density matrix \(\rho\), whether it is pure or mixed.

This trace formula therefore unifies the treatment of the "probability-distribution-on-phase-space-like" aspect of the probability in quantum mechanics – which already existed in classical physics and you may think that it's avoidable – with the "unavoidable" probabilistic character of the predictions that follow even from the pure state. This unification really tells you that the "probabilistic nature of the pure states" is exactly as natural and obeying the same mathematical rules as the "probabilistic nature artificially incorporated via the density matrix formalism". But it's unavoidable, too.
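The unified trace rule is easy to exhibit for the electron spin, comparing the pure "up" state with the maximally mixed state discussed above; a sketch:

```python
import numpy as np

# The same rule Tr(rho P) yields probabilities for pure and mixed states alike.
up = np.array([1.0, 0.0])
P_up = np.outer(up, up)          # projector onto spin-up along z

rho_pure = np.outer(up, up)      # |up><up|, maximum knowledge
rho_mixed = np.eye(2) / 2        # maximally mixed, maximum ignorance

print(np.trace(rho_pure @ P_up))   # prints 1.0
print(np.trace(rho_mixed @ P_up))  # prints 0.5
```

This also reproduces the "(twice) lower probabilities" of the ignorant observer mentioned earlier: 0.5 instead of 1.0 for the spin-up outcome.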

Physically meaningful questions have to be associated with a linear operator on the Hilbert space

In this text, I argued that many philosophical questions such as "what may count as an observation", "when did an observation exactly take place", "is the state of the physical system pure or mixed" are questions that depend on the particular observer and his description of the reality, his standards of "how much of irreversible phenomena is really irreversible" and "how small probabilities may be practically identified with zero", among other things.

These questions are not really "practically relevant" for the working of the world. What is "practically relevant" are the questions associated with actual observables and all observables are, according to the basic postulates of quantum mechanics, represented by linear Hermitian (or, in some special cases, unitary or normal) operators. Such observables include positions and momenta and angular momenta and spins of particles, numbers of particles in a given state, intensities of fields, and so on.

For these observables, the evolution equations of quantum mechanics guarantee correlations or "entanglement" that is the ultimate reason why observers will never disagree about "hardcore practical questions" whenever they are known to agree from the experience. All these correlations are analogous to the evolution that I have already mentioned,\[

\ket{\text{alive cat}} \to \ket{\text{alive cat}}\otimes \ket{\text{happy Wigner's friend}} \otimes \ket{\text{happy Wigner}}.

\] This derived fact about the evolution of an initial state of the cat-friend-Wigner physical system allows you to conclusively prove that "if the cat stays alive, both Wigner and his friend will be happy".

On the other hand, there is no rule that the observers must agree about the philosophical questions mentioned three paragraphs above, beneath the title of the section. These are not true physical questions: they're unphysical gibberish you should avoid while you "shut up and calculate". There is no linear operator on the Hilbert space that would have eigenvalues \(0\) for pure states and \(1\) for mixed states, i.e. that would answer the question "Is the state pure?". An arbitrarily small mixed admixture qualitatively changes the answer to the question whether a density matrix is pure, so such an operator would have to be discontinuous on the space of density matrices – and discontinuous operators can't be linear. So indeed, quantum mechanics doesn't imply the objective character of the answers to these unphysical questions – they actually do depend on the observer and there's no empirical evidence that there's anything wrong with that fact. The only thing that contradicts the subjective nature of these answers is people's stubbornness, bigotry, and psychological obstacles preventing them from abandoning classical physics.

It's one of the basic principles of quantum mechanics – or "shut up and calculate" quantum mechanics – that all physically meaningful questions about the Universe may be expressed by a linear Hermitian operator, an observable. Quantum mechanics gives you the universal rule to predict the probability of different answers and nothing else can be predicted. If your question isn't talking about the value of any observable, then it is unphysical gibberish. Way too many people – including people considered to be physicists – still haven't learned to think in the quantum way. They keep on trying to "reduce" important questions about our world into a language of philosophers and other cranks, a language that implicitly makes many totally wrong assumptions such as the assumption that there fundamentally exists an objective reality in the classical sense.

Instead of specifying observables (linear operators on the Hilbert space) and calculating their eigenvalues and their probabilities of individual eigenvalues given some knowledge about the state, they keep on asking whether some "cloud here" affects another "cloud there" or whether it "collapses", assuming that the clouds objectively exist in the classical sense. That's not a good starting point to understand the essence of modern physics.

And that's the memo.

Sunday, November 4, 2012

Steven Weinberg defends linear collider, science

Last week, Steven Weinberg gave a talk in Arlington, Texas. This is the questions-and-answers part of the talk:

The video is 28 minutes long. At the end of the regular talk, he mentioned various indirect advantages of building a new linear collider etc. He makes a good joke when they present an award for his memorable talk: How did you know in advance that my talk would be memorable? I would be inclined to make exactly the same comment.

A person asked about new physics. Weinberg focused on the identity and detection of the dark matter particle. The same person and another one wanted to ask about string theory – that's what people are really excited about. Weinberg said it's extraordinarily mathematically powerful and it hasn't been possible to compare its characteristic predictions with experiments.

He says that the International Space Station was approved at the same time the Superconducting Supercollider was cancelled – even though it was 10 times more expensive and has produced no science. He strengthens the claim by saying that the astronauts have never produced any science. Of course, some people in the audience are stunned, others applaud. ;-)

The linear collider would measure the properties of the Higgs and all the other things much more accurately. I have some doubts whether this information is worth $10 billion, especially because it is somewhat likely (40%?) that the measurements would exactly agree with the Standard Model, within the ILC precision. And if there were a disagreement, it would still fail to clarify where the disagreement comes from – what the new particles and physical phenomena responsible for the deviation actually are. Imagine that the ILC finds out that the diphoton decays of the Higgs boson are indeed 70% more frequent than the Standard Model says. Would we be fully excited and satisfied? Nope. It would only be a justification to build a collider that may actually find the new beasts that are responsible. So why wouldn't we build this collider immediately?

For those reasons, I would tend to think it's better to save a little bit more money and build a new SSC-like collider that exceeds the LHC by its superior brute force, by the energy.

Weinberg said that he was attracted to theoretical physics by reading popular books when he was in high school. He also mentions that for a long time, he believed he had to know everything before he could start to do research, so he was reading lots of books. Then he learned better. Well, I still think that his previous "mistake" was very valuable because it gave us a Weinberg who really did and does know everything about the particle physics and cosmology of his era.

Most of the time, nothing comes out of the research so it may be frustrating. Sometimes, something comes out. If you love it, do it, if you don't, then don't do it...

Weinberg's Arlington talk was a part of a broader linear collider conference (its web).

Saturday, November 3, 2012

Quantum casino: less than zero chance

Guest blog by Johannes Koelman, the Hammock Physicist

Human thought has led to a variety of remarkable and profound insights. Many of these insights are well established and have been embraced by a significant portion of the global population.

The Earth being round, the atomistic nature of matter, our unremarkable place in the universe, and our being a product of evolution are all examples of such insights. Other insights, although unanimously embraced by experts, have a long way to go before the larger population accepts them. More than for any other subject, this holds for quantum physics. No other product of human thought is as profoundly mysterious as quantum theory.

Unfortunately, the subject is not easy to digest and, to make things worse, it is often misrepresented in the pop-science literature. Those eager to understand the quantum are often fed with confusing slogans and misleading analogies.

A year ago I dedicated two blogposts (here and here) to the mysteries of the quantum. Judging from the reactions, these posts must have been useful to at least a few of the readers. The simple thought experiment presented (dealing with Albert's socks) allows the reader to explore the weird world of quantum physics, an experience that will likely challenge the reader's view on reality – a view that, for all of us, is heavily biased by sensory perceptions limited to classical (non-quantum) physics.

Yet, I feel one question remained insufficiently discussed in my earlier posts: how do physicists deal with quantum reality? Let's see if I can fill this gap by further exploring Albert's quantum socks.


Albert's chest of quantum drawers, although most spooky in its behaviors, appears rather unremarkable and featureless from the outside. Three rows, each consisting of three drawers, make up this piece of furniture. Each night, when Albert is sound asleep, his housekeeper fills this cabinet with fresh socks. The next morning, whenever Albert pulls open a drawer, he is presented either with a single sock or an empty drawer. Unfortunately for Albert, he can not open just any number of drawers. Each morning, his search for fresh socks is limited to the opening of three drawers: any three drawers forming a horizontal row or any three drawers forming a vertical column.

Initially, out of habit, each morning Albert opens a horizontal row of drawers. He does this without giving much thought to which of the three rows to open. After a few days, he starts noticing a pattern. Whichever row he opens, either all three drawers are empty, or two drawers each contain a sock leaving one empty drawer. In other words, a row of drawers always contains a total even number of socks. Many days pass by, and Albert never encounters a row not containing an even number. That makes sense: each row containing an even number of socks tells Albert that each morning the chest must be filled with an even number of socks. An observation comfortably compatible with the notion that socks come in pairs.

One morning, Albert decides to deviate from his fixed ritual, and he opens a vertical column of drawers: the three leftmost drawers. Interestingly, this time he observes an odd number of socks distributed over the three drawers opened. The next day, he opens the same leftmost column of drawers, and again observes an odd number of socks.

Albert gets curious about the other columns. What number of socks will they reveal? The next morning he opens the middle column. Again an odd number of socks. The next day he once more checks the middle column. Once again he observes an odd number of socks.

Albert is a clever guy, and he now realizes he can predict with certainty that the rightmost column of drawers must behave differently from the two columns already inspected. The rightmost drawers must surely contain an even total number of socks. This is obvious: an odd number of socks in the rightmost column, added to the number of socks in the middle column (observed to be odd) and the leftmost column (also observed to be odd), would result in an odd total number of socks in the chest. A contradiction, as he had already observed, based on opening horizontal rows, that the total number of socks in the chest is always even.

The next morning he eagerly opens the three rightmost drawers. To his astonishment he observes an odd number of socks.

The next few mornings he randomly checks the various columns. Each attempt reveals an odd number of socks. Albert realizes something must have gone wrong. Maybe his housekeeper coincidentally changed from filling the chest with an even number of socks to filling it with an odd number, just at the time he switched over from opening horizontal rows to vertical columns? The next morning Albert again opens a horizontal row. An even number of socks stares him in the face. He starts randomly switching between horizontal rows and vertical columns. Horizontal rows always deliver even numbers, vertical columns odd numbers.

This drives Albert crazy. The results he is obtaining are logically impossible. "At any given morning if I would open three rows", Albert reasons, "I would end up with an even number of socks. However, would I open three columns, I would end up with an odd number of socks. Yet in both cases I would have opened the same nine drawers. How can this be?"


"Nice story," you might react, "but it's nothing more than a fantasy. Obviously, chests of drawers like these can't logically exist in our world."

Well, as a matter of fact, they do. I can put it even more strongly: all evidence points to our world consisting solely of devices like Albert's drawers. It just happens that these drawers tend to be very, very small, and that we generally observe the aggregate behavior of many drawers without being able to distinguish critical details such as the distinction between even and odd counts.

Many physicists deal with quantum reality day in, day out. How have they come to terms with a physical reality that presents us with devices like Albert's chest of drawers? Physicists don't seem to lose any sleep over the many counter-intuitive notions and apparent paradoxes surrounding quantum theory. Are they oblivious to the utterly strange world view that the quantum represents?

Not at all. The point is that the vast majority of physicists simply have stopped worrying and have embraced a practical approach described by the catch-phrase:
"Shut up and calculate!"
This phrase was coined by Cornell physicist and science communicator David Mermin in an attempt to mock the widely accepted Copenhagen interpretation of quantum physics. Mermin was not at ease with the Copenhagen position put forward by Niels Bohr in the famous Bohr-Einstein debates. Ironically, two things happened following the publication of Mermin's phrase. Firstly, Mermin's own ideas on how to interpret quantum physics started to converge towards the Copenhagen interpretation. Secondly, the "shut up and calculate" phrase achieved the status of a popular instrumentalist approach to quantum physics, equated with eschewing all interpretation.

Let me try to offer you the "shut up and calculate" approach as a choice for coming to terms with the quantum. To do that, I need to explain how to calculate sock counts for Albert's drawers.


The mathematical machinery behind quantum devices like Albert's chest is compelling and carries a great beauty. It is based on a generalized probability theory that combines likelihoods not as a linear sum, but rather as a Pythagorean sum. I would love to present this quantum math to you. But alas, the math is too involved to explain in a blog like this.

Fortunately, the complex quantum math built on Pythagorean sums can be mapped onto the much simpler classical math based on straightforward linear sums. Physicists apply such mappings all the time when studying what is known as "the classical limit" of quantum systems: the behavior of large quantum systems that can be accurately represented with classical (non-quantum) laws of physics.

Eugene Wigner, one of the quantum pioneers, discovered in the early 1930s that something weird happens when one approximates quantum physics with classical probabilities. He found that one can describe quantum systems with probabilities that add linearly, but these probabilities are no longer guaranteed to be non-negative.

Paul Dirac, Wigner's brother-in-law, later wrote a paper that discusses the use of concepts like negative energies and negative probabilities in quantum physics:
"Negative energies and probabilities should not be considered as nonsense. They are well-defined concepts mathematically, like a negative of money."
Years later the idea of negative probabilities received increased attention when Richard Feynman started to popularize the idea:
"Trying to think of negative probabilities gave me a cultural shock at first, but when I finally got easy with the concept I wrote myself a note so I wouldn’t forget my thoughts..."
In describing Albert's chest of drawers, how far do we get with negative probabilities? Before you read my negative probabilities description for Albert's chest, I urge you to try it yourself first. It's a straightforward and instructive exercise.

Ok, here is a probabilistic description of Albert's drawers. A total of 10 configurations are relevant:
  1. Nine configurations consist of two drawers in a single row being filled, and all other drawers being empty. These nine configurations each carry a 1/6 probability.
  2. One additional configuration features an empty chest of drawers. This configuration carries a minus 1/2 probability.
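The two bullet points above can be checked mechanically. Here is a sketch in Python (the enumeration and helper names are my own, not Johannes's); `fractions.Fraction` keeps the +1/6 and –1/2 weights exact, so no rounding can sneak in:

```python
from fractions import Fraction
from itertools import combinations

# The ten configurations described above: each is a 3x3 grid of sock
# counts plus a (possibly negative) probability weight.
configs = []
for row in range(3):
    for pair in combinations(range(3), 2):      # two filled drawers in one row
        grid = [[0] * 3 for _ in range(3)]
        for col in pair:
            grid[row][col] = 1
        configs.append((grid, Fraction(1, 6)))  # nine configurations at +1/6
configs.append(([[0] * 3 for _ in range(3)], Fraction(-1, 2)))  # all empty, -1/2

assert sum(p for _, p in configs) == 1          # weights sum to unity


def count_dist(cells):
    """Distribution of the total sock count over the given drawers."""
    dist = {}
    for grid, p in configs:
        n = sum(grid[r][c] for r, c in cells)
        dist[n] = dist.get(n, 0) + p
    return {n: p for n, p in dist.items() if p != 0}


for i in range(3):
    print("row", i, count_dist([(i, c) for c in range(3)]))  # even counts only
    print("col", i, count_dist([(r, i) for r in range(3)]))  # one sock, prob. 1
```

Every row comes out even (zero or two socks, each with probability 1/2), every column comes out odd (one sock with probability 1), and the negative weight never surfaces in any observable distribution.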

It is easy to see that all probabilities add up to unity, as they should. The negative probability for the 'all empty' state is a strange beast, but a beast that doesn't show up in the end result when calculating observable probabilities.

For instance, let's calculate the probabilities for observing the various numbers of socks in the bottom row of Albert's chest. From the list of configurations above you can read off that we have seven realizations with an empty bottom row: six with +1/6 probability, and one with –1/2 probability. The total probability for an empty bottom row is therefore +1/2. Furthermore, we have three realizations with a bottom row filled with two socks, each with a probability of +1/6. The total probability for finding two socks in the bottom row is therefore +1/2. So we have two equal likelihoods of finding zero or two socks, each with a positive (+1/2) value, and we are guaranteed to retrieve an even number of socks from the bottom row.

Repeating the same calculation for a column leads to the result that a column renders one sock with unit probability, hence a guarantee for an outcome of an odd number of socks.

What happened in both these cases is that the negative probability gets cancelled by positive probabilities describing the same outcome for the row or column under consideration. As a result, rows lead to even numbers and columns to odd numbers, and never does a negative probability rear its ugly head.

This obviously doesn't hold when determining the number of socks in the entire chest of drawers. A negative probability does show up prominently, and it does not get cancelled as there are no positive probabilities describing the same configuration. However, we don't need to worry about this, as the configuration with negative probability does not correspond to a possible observation. Only three drawers (one row or one column) can be opened, not the whole chest.

What this means is that one can extract only a limited amount of information out of Albert's chest. Getting more information out of it (opening more drawers) is impossible, as it would allow the negative probabilities to become observable. This is a manifestation of Heisenberg's uncertainty principle for Albert's chest.

Heisenberg uncertainty, destructive interference, spooky action at a distance, violation of Bell's inequalities: all of quantum physics can be mapped onto negative probability models. So why don't physicists use such models? The answer is simple: negative probability models are rather clumsy representations of what is better described using "Pythagorean probabilities".

But as a model to get some understanding of the fundamentals of quantum physics, negative probability models do carry a value largely unexplored by pop-science writers.

All of this places some of Einstein's quotes in a different light.

God does play dice. And malicious he is. Or how would you characterize anyone who throws dice loaded to render negative chances, and who manages to keep this so well-hidden from us?

Disclaimer by LM: The views and opinions expressed in this article are those of the author but they do reflect the position of TRF. By the way, if you want the relevant operators etc., the closest discussion on TRF is this treatment of the GHZM experiment. For those who want the realization of this particular "drawer", it's called the Mermin-Peres magic square (thanks, Johannes) and it acts on just two qubits (two spins). The \(3\times 3\) operators are\[

\begin{array}{ccc}
1\otimes \sigma_z & \sigma_z\otimes 1 & \sigma_z\otimes\sigma_z\\
\sigma_x\otimes 1 & 1\otimes \sigma_x & \sigma_x\otimes \sigma_x\\
-\sigma_x\otimes \sigma_z & -\sigma_z\otimes \sigma_x & \sigma_y\otimes \sigma_y
\end{array}

\] Each pair of operators in the same row commutes; each pair in the same column commutes. Check it. The third entry in each row is the product of the previous two, so the product of the three operators in each row is \(+1\) and the "even number of socks in each row" is guaranteed. I added minus signs on the third row so that the product in each column is \(-1\), too. Note that for the individual drawer measurements, \((-1)=\)"yes sock" and \((+1)=\)"no sock": a row product of \(+1\) means an even number of socks, a column product of \(-1\) an odd number. So the correlations are guaranteed regardless of the state vector.
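The "check it" above can be done numerically in a few lines. A sketch with NumPy (the matrix layout mirrors the square above; all variable names are mine):

```python
import numpy as np

# Pauli matrices and the 2x2 identity
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# The 3x3 Mermin-Peres square of two-qubit operators quoted above
M = [
    [np.kron(I2, sz),  np.kron(sz, I2),  np.kron(sz, sz)],
    [np.kron(sx, I2),  np.kron(I2, sx),  np.kron(sx, sx)],
    [-np.kron(sx, sz), -np.kron(sz, sx), np.kron(sy, sy)],
]

I4 = np.eye(4)
for i in range(3):
    for j in range(3):
        for k in range(j + 1, 3):
            # operators within one row, and within one column, commute
            assert np.allclose(M[i][j] @ M[i][k], M[i][k] @ M[i][j])
            assert np.allclose(M[j][i] @ M[k][i], M[k][i] @ M[j][i])
    # each row multiplies to +1, each column to -1
    assert np.allclose(M[i][0] @ M[i][1] @ M[i][2], I4)
    assert np.allclose(M[0][i] @ M[1][i] @ M[2][i], -I4)

print("all row and column identities hold")
```

Since the three outcomes in a row always multiply to \(+1\) and the three outcomes in a column always multiply to \(-1\), the opposite parities of the row and column sock counts are fixed for an arbitrary two-qubit state.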

It's left as an exercise to the reader to determine the state vector for which the probabilities are as indicated by Johannes; and to find out how the Wigner distribution which has 9 times +1/6 and 1 times –1/2 should be constructed.

This magic square has been used as a tool in quantum pseudo-telepathy and exaggerations of this sexy term are guaranteed. ;-)

Friday, November 2, 2012

Supersymmetric Lagrangians

Supersymmetry has been discussed many times on this blog but a particular question on the Physics Stack Exchange today,
Why is the lightest Higgs not a free parameter in SUSY? (SE)
convinced me to write a new, not too long text about the way to construct minimally supersymmetric Lagrangians for \(d=4\) supersymmetric quantum field theories.

Just to be sure, the poster above asked why the Higgs mass seems to be freely adjustable in the Standard Model while there are various constraints on the Higgs masses in the Minimal Supersymmetric Standard Model – for example, the lighter Higgs boson can't be much heavier than the Z-boson.

I answered that a heavy Higgs boson in the Standard Model also leads to some trouble such as instabilities but the bulk of my answer – which you can read if you click at the SE link above – was dedicated to an explanation why the Higgs masses can't be arbitrarily scaled in supersymmetric theories.

A short answer is that the quartic coupling \(\lambda\) in the \(\lambda h^4\) quartic (fourth-order) self-interaction of the Higgs field – which is an increasing function of the Higgs mass, assuming a fixed given vacuum expectation value (vev) – is no longer arbitrary in the MSSM. Instead, it is given by various combinations of \(g^2\) and \(g^{\prime 2}\) gauge couplings for the \(SU(2)\times U(1)\) electroweak gauge group. This follows from some insights from "101 SUSY model building". In this text, I would like to sketch how the \(\NNN=1\) supersymmetric Lagrangians in \(d=4\) may be constructed in some more detail.

This text may be viewed as a continuation of various previous SUSY texts on TRF, especially Supersymmetry: transformations of superspace. All the material I describe here may be found in SUSY textbooks such as the book by Michael Dine.

Fine. First of all, you must realize that supersymmetric quantum field theories are "just a subset" of quantum field theories. Unlike string theory, they're not "generalizing" quantum field theories in any sense. On the contrary, they are restricting quantum field theories, they are choosing a subset of them that exhibits the new nice symmetry, supersymmetry. So you may write all the superfields "in components" and you obtain an ordinary quantum field theory with \(j=0\) scalar fields, \(j=1/2\) spinor fermionic fields, and \(j=1\) gauge fields.

Now, I will be mostly focusing on renormalizable quantum field theories in \(d=4\). Roughly speaking, they're theories whose coupling constants are classically dimensionless or they have the units of a positive power of mass. Couplings with units of a positive power of length (i.e. negative power of mass) – those that you would be forced to place in front of very complicated high-mass-dimension terms such as \(h^{10}\) – would produce "nonrenormalizable theories", theories in which multi-loop Feynman diagrams would be increasingly more divergent because high powers of the loop energy, \(p^{k}\), would naturally arise to cancel the powers of these "naughty coupling constants" that have the units as negative powers of energy.

We also reduce our attention to fields with spin at most \(j=1\). The fields with \(j=3/2\) already require a local supersymmetry to get rid of some unphysical polarizations – so they're inevitably theories of supergravity if they're consistent at all. Fields with \(j=2\) must be linked to gravity in one way or another and fields with \(j\gt 2\) don't admit interactions consistent with the crucial new gauge symmetries at all.

And I will assume at most two-derivative terms in the Lagrangian. These conditions are fair but they reduce the room for possible quantum field theories dramatically.


First, which fields can we use to play the game? When it comes to their spin (or, more generally, representations under the Lorentz or super-Poincaré algebra), we may only use two possible types of fields:
  • Chiral superfields, unifying \(j=0\) and \(j=1/2\) excitations
  • Vector superfields, unifying \(j=1\) and \(j=1/2\) excitations
Note that both multiplets contain both bosonic and fermionic polarizations and their spins differ by \(\Delta j=1/2\). It has to be so because we're doing supersymmetry, stupid.

Using the superspace which is very helpful for \(\NNN=1\) theories in \(d=4\), we may rewrite a chiral superfield (a superfield is a field that depends on normal bosonic spacetime coordinates as well as the new, Grassmann or fermionic coordinates \(\theta^\alpha\) and/or \(\bar\theta^{\dot\alpha}\)) in terms of ordinary fields (that only depend on the bosonic coordinates \(x^\mu\)) as:\[

\Phi(x,\theta)= \phi(x)+\sqrt{2}\theta\psi(x)+ \theta^2 F(x).

\] That's very simple. This superfield is called "chiral" – the adjective is linked to "hand" in Greek and in physics, it always represents (left-right-asymmetric) things that distinguish the left from the right (hand but not only hand) – because the superfield only depends on \(\theta^\alpha\) which are left-handed spinors but it doesn't depend on the complex conjugate \(\bar\theta^{\dot \alpha}\) which are the right-handed spinors.

If I screwed up the usual conventions for which indices are left-handed and which are right-handed, then I apologize, but the error doesn't matter too much because the complex (or Hermitian) conjugate of a chiral superfield always has to exist as well; it depends on the opposite \(\theta\)'s and has the opposite handedness.

In the formula for \(\Phi\), the first term is a complex scalar field \(\phi(x)\). The second term, with the conventional \(\sqrt{2}\) normalization, contains a Weyl spinor field \(\psi_\alpha(x)\) which must be contracted with the coordinates \(\theta^\alpha\) to get rid of the spinor index \(\alpha=1,2\). The final term is proportional to \(\theta^2\), the product of the two components of the two-component spinor \(\theta^\alpha\) – I won't write it with the values of the indices because it would reveal that my notation is ambiguous, as the superscripts may denote indices as well as powers. Note that this \(\theta^2\) is Lorentz-invariant. The dynamical part of the term is \(F(x)\), which is again a bosonic field but an auxiliary one. It's the F-term of the superfield.

The mass dimensions of the component fields \(\phi,\psi,F\) are \({\rm mass}\), \({\rm mass}^{3/2}\), and \({\rm mass}^2\), respectively. This is clear from the fact that \(\theta^\alpha\) has the dimension of \({\rm length}^{1/2}\). This simple scaling already tells you that \(|F|^2\) is a pretty nice term that may appear in the ordinary Lagrangian. I wrote the absolute value because the action has to be real but \(F\) is inevitably complex, much like \(\phi\) and \(\psi_\alpha\): chiral things have to be complex because they may be used as variables in holomorphic functions such as the superpotential.

Now, the other type of player is the vector superfield, unifying a Yang-Mills gauge field with a Majorana \(j=1/2\) spinor. Indeed, the information in a \(d=4\) Majorana spinor is the same as in a \(d=4\) Weyl spinor, but here I choose the "Majorana" terminology because the gauge fields are naturally "real", not complex and holomorphic, so this should hold for the spinor term in the multiplet, too. And "real spinors" are called Majorana spinors.

A vector superfield isn't chiral so it depends both on \(\theta^\alpha\) and \(\bar\theta^{\dot \alpha}\):\[

\begin{aligned}
V &= i\chi -i\chi^\dagger-\theta\sigma^\mu \bar\theta A_\mu +\\
&+ i\theta^2 \bar\theta \bar\lambda -i\bar\theta^2\theta \lambda + \frac 12\theta^2\bar\theta^2 D.
\end{aligned}

\] I should have used \(\bar\theta\) before as well (update: fixed retroactively) but now I introduced the bars for the \(\theta\)'s with the dotted indices. Note that the normal "gauge field" term with the vector index is multiplying the \(\theta\bar\theta\) structure exactly because the product of the two spinors of opposite chirality transforms as a vector (we may have inserted the 4D Pauli matrices \(\sigma^\mu\) in between the two spinors). The field \(A_\mu\) is the normal gauge field, with the units of mass. Again, you see that the last term – multiplied by all the four theta components – namely the D-term has the units of \({\rm mass}^2\) again, so \(D^2\) may appear and will appear in the ordinary Lagrangian.

We have already noticed but let me repeat it again. Even though the main bosonic field in the vector superfield \(A_\mu\) has a Lorentz vector index, we could actually write this field as one of the terms in a superfield that is a scalar \(V(x,\theta,\bar\theta)\) without any vector indices! The different components of the vector are encoded in the dependence on the theta's.

Now, what about the gauge symmetry? For \(U(1)\) fields, we liked to write the ordinary gauge transformation as\[

A_\mu\to A_\mu + \partial_\mu \lambda

\] or so. But this is obsolete because \(A_\mu\) is just one component in the expansion of a superfield. We want to construct supersymmetric theories so we must use the whole superfield \(V\) with the component \(A_\mu\) and all the other components, too. Consequently, \(\lambda\) must be promoted to a superfield, too. If you study how it can work so that you reproduce pretty much the same dynamics, you will realize that \(\lambda\) must be promoted to a chiral superfield and the gauge transformation acting on a \(U(1)\) vector superfield is\[

V \to V+i\Lambda -i\Lambda^\dagger

where \(\Lambda\) generalizes \(\lambda\) and is a chiral superfield. We added the complex conjugate term as well because \(V\), while not a chiral field, is real (or Hermitian, as an operator). Note that if we chose \(\Lambda\) to be a non-chiral, vector-like superfield, it would make all the degrees of freedom in \(V\) unphysical. That would be too much of a good thing, too much of a gauge symmetry.

You must have observed that our supersymmetric version of the gauge transformation only transforms \(V\) by a multiple of \(\Lambda\) rather than its spacetime derivatives. But it's OK because the old gauge field \(A_\mu\) is the \(\theta\bar\theta\)-proportional term in \(V\) and if you rewrite the gauge transformations in components, you will see that \(A_\mu\) de facto picks the derivative of \(\Lambda\).
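In a bit more detail: expanding the chiral superfield \(\Lambda\) around the shifted coordinate \(y^\mu=x^\mu+i\theta\sigma^\mu\bar\theta\) gives (a sketch; my signs and factors may differ from the conventions used above):

```latex
\Lambda = \lambda(y) + \ldots
        = \lambda(x) + i\,\theta\sigma^\mu\bar\theta\,\partial_\mu\lambda(x) + \ldots
\quad\Rightarrow\quad
i\Lambda - i\Lambda^\dagger \supset -\,\theta\sigma^\mu\bar\theta\,\partial_\mu(\lambda+\lambda^*).
```

Comparing with the \(-\theta\sigma^\mu\bar\theta A_\mu\) term in \(V\), the ordinary gauge field shifts as \(A_\mu\to A_\mu+\partial_\mu(\lambda+\lambda^*)\), i.e. by the derivative of the real gauge parameter \(2\,{\rm Re}\,\lambda\).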

For vector superfields, it's also useful to define the "gauge-invariant field strength" superfield generalizing the non-supersymmetric \(F_{\mu\nu}\) as\[

W_\alpha = -\frac{1}{4} \bar D^2 D_\alpha V

where the \(D\) objects are the superderivatives (dimensionally, they're "square roots" of normal bosonic derivatives). This "gauge-invariant field strength" superfield has a spinor index, unlike all the previous superfields we have discussed (which were scalars so far); unlike \(V\), it is moreover chiral and "fermionic". So the leading component without \(\theta\)'s is \(-i\lambda_\alpha\), a multiple of the gaugino spinor field; \(\theta_\alpha\) multiplies \(D\), the D-term, plus some multiple of \(F_{\mu\nu}\); and the \(\theta^2\) monomial multiplies some first spacetime derivatives of \(\lambda^{*\alpha}\). I don't want to go into these things because much of the expansion of superfields into components involves messy algebra and we have certainly gotten to that point already. ;-)

However, the \(U(1)\) gauge transformation of a charged chiral superfield is simple:\[

\Phi\to e^{-iq\Lambda} \Phi.

\] It's almost the same as it has been in non-supersymmetric theories. Moreover, those exponential formulae may be rather easily generalized from \(U(1)\) to non-Abelian gauge groups.

Gauge-invariant kinetic terms

Great. We want some actions that are renormalizable, gauge-invariant, and healthy. How do we construct the kinetic terms that produce e.g. the familiar Klein-Gordon term \(\partial_\mu \phi\cdot \partial^\mu\phi\) when expanded into components? Well, the superspace formula is actually easier than in non-supersymmetric theories once again. Instead of the spatial derivatives with \(\mu\)-indices carefully contracted, the kinetic term for the chiral superfield is simply\[

\LL_{\rm kin} = \int \dd^4 \theta \sum_i \Phi_i^\dagger \Phi_i.

\] There ain't no explicit derivatives here. This term is integrated over the whole four-dimensional fermionic part of the superspace. And that's the reason why the two spatial derivatives are produced at the end, out of the auxiliary components. You need to work hard to understand why all these things work but at the end, it's just some hard algebra. However, the term above – which I already summed over all the chiral superfields labeled by the index \(i\) – isn't gauge-invariant assuming that these fields carry charges. How do we make it gauge-invariant?

In non-supersymmetric theories, we would have to replace the partial derivatives \(\partial_\mu\) in the kinetic terms by the covariant derivatives \(D_\mu=\partial_\mu -ieA_\mu\). What about the supersymmetric theories in the superfield formalism? Well, we just insert \(e^V\) in between \(\Phi_i^\dagger\) and \(\Phi_i\):\[

\LL_{\rm kin} = \int \dd^4 \theta \sum_i \Phi_i^\dagger e^V \Phi_i.

\] Because many things simplify in the superspace, this simple insertion does the job. The exponential of the vector superfield \(V\) simply does the right "quasi gauge transformation" of the chiral superfield that guarantees that the result is gauge-invariant. Well, you may check the gauge invariance of this term. We have said how \(V\) transformed under a gauge transformation – we added \(\Lambda\) and \(\Lambda^\dagger\) to it. But when exponentiated, \(\exp(i\Lambda)\) and \(\exp(-i\Lambda^\dagger)\) – sorry if I switched the signs – exactly undo the same exponential factors that, as we have said, appear in the gauge transformation rules for the chiral superfields \(\Phi_i\) and \(\Phi_i^\dagger\). So the result is gauge-invariant.
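For a unit-charge field, \(q=1\) (and up to the sign conventions I may have switched, as noted), the check is a one-liner, because the \(U(1)\) superfields in the exponents all commute:

```latex
\Phi^\dagger e^V \Phi \;\to\;
\left(\Phi^\dagger e^{i\Lambda^\dagger}\right)
e^{V+i\Lambda-i\Lambda^\dagger}
\left(e^{-i\Lambda}\Phi\right)
= \Phi^\dagger e^{V}\Phi,
```

since the exponents simply add and the \(\Lambda\)-dependent pieces cancel. For a general charge \(q\), the same cancellation works with \(e^{qV}\) inserted between \(\Phi^\dagger\) and \(\Phi\).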

We have said some things about the "non-chiral" part of the Lagrangian, one integrated both over \(\theta\) and \(\bar\theta\). But there's an interesting "chiral" part of the Lagrangian, the so-called superpotential \(W\). It only depends on the chiral superfields and it must depend holomorphically (or physicists could often say "analytically", ignoring the fact that for mathematicians, the latter adjective is somewhat more constraining). The superpotential term in the action is\[

\LL_W = \int \dd^2\theta\, W(\Phi_i) +\text{c.c.}.

\] The c.c. (complex conjugate) term to the first one depends on the opposite \(\theta\)'s and the fields \(\Phi^\dagger_i\), of course. The first interacting \(d=4\) supersymmetric theory that was studied was the so-called Wess-Zumino model. It had one chiral superfield and a quadratic-cubic superpotential, \(a\Phi^2+b\Phi^3\). That's the most general form that produces renormalizable interactions.

The funny thing is that if you carefully derive the equations of motion not only for the "old-fashioned crucial" components of the superfields but also for the auxiliary terms \(D_i,F_i\), you will realize that\[

F_i^* = -\pfrac {W}{\Phi_i}

\] is an equation of motion that "eliminates" the auxiliary term \(F_i\). However, if you study the superpotential part of the Lagrangian which matters – if you integrate over two theta's, what's left is the "highest-order" F-term-like component – you will realize that the superpotential Lagrangian may be rewritten into components – into the purely non-supersymmetric QFT language – as\[

\LL_W = \pfrac{W}{\Phi_i} F_i + \frac{\partial^2 W}{\partial \Phi_i\,\partial\Phi_j} \psi_i\psi_j

\] which is capable of producing some potential terms for the scalars and some mass-like terms for the fermions. If you insert our equation of motion for \(F_i\), you will realize that the superpotential actually induces – in the non-supersymmetric language – a normal potential (this is not a vector superfield, just a clash of notation)\[

V = \sum_i|F_i|^2 = \sum_i\abs{ \pfrac{W}{\Phi_i} }^2.

\] So if you want to determine the "normal" potential, you differentiate the superpotential with respect to individual chiral superfields and square the absolute value of the result. That's why at most cubic superpotentials produce quartic ordinary potentials: the derivative of a cubic function is a quadratic one and the square of the latter is a quartic function.
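As a worked example, take the Wess-Zumino superpotential mentioned above, with a (hypothetical) choice of normalization \(W=\frac{m}{2}\Phi^2+\frac{g}{3}\Phi^3\):

```latex
\frac{\partial W}{\partial\Phi} = m\Phi + g\Phi^2
\quad\Rightarrow\quad
V = \left|m\phi + g\phi^2\right|^2
  = m^2|\phi|^2 + mg\left(\phi^{*}\phi^2 + \phi^{*2}\phi\right) + g^2|\phi|^4
```

(for real \(m,g\)): a mass term, a cubic term, and a quartic term, exactly as promised – the derivative of the cubic superpotential is quadratic, and its squared absolute value is quartic.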

Well, similar algebra applies to the vector superfields as well. The normal potential will also have contributions from \(D^2\), the squared auxiliary terms in the vector superfields, much like it had the \(|F|^2\) terms from the chiral superfields:\[

V =\sum_i \abs{F_i}^2 +\sum_a \frac{1}{2 g_a^2} (D^a)^2.

\] We have used some convention about whether or not the gauge coupling \(g_a\) is included in the normalization of the vector superfields; you know similar choices that you do in non-supersymmetric gauge theories. The equation of motion for the D-terms of the vector superfields tells you\[

D^a = \sum_i g_a\, \phi_i^\dagger T^a \phi_i.

\] So these D-terms are bilinear in the bosons from the chiral superfields, with charges (or matrices of the generators) playing the role of the coefficients. And the \((D^a)^2\) term in the "normal potential" therefore produces quartic (fourth-order) terms in the scalars.

That's the way, and the only way, in which the quartic self-interactions of the Higgs fields are generated in the Minimal Supersymmetric Standard Model.

Indeed, there can't be any cubic term \(h^3\) in the superpotential \(W\) because \(W\) has to be gauge-invariant but you surely can't produce a gauge-invariant singlet as the third power of a charged doublet field. So all the quartic terms in the "normal potential" \(V\) have to arise from the D-terms, from the vector potentials. And that's why the quartic coupling constant(s) \(\lambda\) is (are) linked to various combinations of \(g^2\) for various factors in the gauge group. And that's why supersymmetry predicts inequalities for the Higgs masses, e.g. that at the tree level, the lightest Higgs boson has to be lighter than the Z-boson. (This inequality gets loosened already if you include one-loop corrections, especially from the top quark loops, and Higgs boson masses up to \(130\GeV\) would be compatible with the MSSM as a result.)
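For the record, the standard tree-level formulae behind this claim: the D-term potential for the neutral Higgs components in the MSSM, and the resulting bound (in the common textbook normalization), read

```latex
V_D = \frac{g^2+g^{\prime 2}}{8}\left(|H_u^0|^2 - |H_d^0|^2\right)^2 + \ldots,
\qquad
m_h^2 \,\le\, m_Z^2\cos^2 2\beta \,\le\, m_Z^2 \quad\text{(tree level)},
```

where \(\tan\beta=v_u/v_d\) is the ratio of the two vacuum expectation values. It is this tree-level inequality that the top-quark loop corrections relax up to roughly \(130\GeV\).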

The most general Lagrangian

Using the pieces we have already encountered, the most general Lagrangian with at most two derivatives may be written as\[

\begin{aligned}
\LL &= \int \dd^4\theta\, K(\Phi_i,\Phi^\dagger_i)+\\
&+ \int\dd^2 \theta\, W(\Phi_i) + \text{c.c.}+\\
&+ \int \dd^2\theta \,f_a(\Phi) (W_\alpha^{(a)})^2+\text{c.c.}
\end{aligned}

\] The first term is non-chiral and depends on the so-called Kähler potential which is a non-holomorphic function of the chiral superfields, i.e. a function of them and their complex conjugates. For the normal free scalars, you need \(K\) composed of \(\abs{\Phi_i}^2\) terms. For them to be gauge-invariant, you have to insert the \(e^V\) objects in between.

The functions determining the terms on the remaining two lines are holomorphic functions of the chiral superfields. That's also why we have to add the complex conjugate terms by hand; the action has to remain real. They are the superpotential \(W\) and the gauge coupling function \(f\) – separate for each factor of the gauge group – which has the form\[

f(g^2) = \frac{8\pi^2}{g^2}+ia+{\rm const.}

\] The imaginary part \(ia\) automatically and inevitably produces a term "counting the instantons" proportional to \(F\wedge F\). The real part is dominated by the \(1/g^2\)-like term but the function may be shifted by loop corrections once you start to calculate them.

At any rate, if you only want to define a classical theory that will produce a renormalizable quantum theory, the form of the supersymmetric theory is extremely constrained. All the potential-like interactions must be encoded in the at most cubic, gauge-invariant superpotential \(W\); it's the most adjustable information about the theory that knows about the Yukawa couplings and similar properties of "pure matter". The Kähler potential must be de facto quadratic and is uniquely determined. The gauge coupling functions reduce to the constant gauge couplings (but they also know about the "axionic" imaginary parts).

So the structure is pretty much determined once you choose your gauge group; invent your collection of chiral superfields and their charges or representations; and choose their holomorphic function \(W\).

It may be useful to mention that the functions \(K, W, f^a\) determine the dynamics even if you consider local i.e. gauged supersymmetry – i.e. if you study theories of supergravity. However, supergravity isn't renormalizable anyway, so it isn't justified to demand that you get a renormalizable theory. For this reason, these functions are pretty much unrestricted in supergravity theories. Of course, \(W\) and \(f^a\) must still be holomorphic functions. But the functions are often non-polynomial and the Kähler potential may actually define the metric of a curved Kähler manifold.

I guess that most readers who managed to penetrate up to this point agree that supersymmetric theories in \(d=4\) are elegant, determined just by a few choices, but they still give you all the dynamics you need to describe the real world: kinetic terms for scalars, fermions, and gauge fields; gauge couplings for charged fields and Yang-Mills gauge fields themselves; Yukawa couplings; cubic and quartic couplings for scalars.

When you expand the nice superspace formulae into components, you obtain messy equations. But you shouldn't consider that supersymmetry's fault. It's your problem that you need to rewrite the formulae in a messy way to find out what's really going on. Nature doesn't have to write any messy formulae: She knows how to calculate and control Herself directly by the elegant laws and principles. ;-) Moreover, the theories you obtain by rewriting all the superfields in terms of component fields are nothing of "unprecedented messiness"; they're nothing else than the non-supersymmetric quantum field theories you used to study before you learned about supersymmetry, with particular values of the couplings and other parameters (that are related to the parameters of the SUSY theories by various transformations and redefinitions, and that may be constrained by additional constraints implied by SUSY).

For decades, people would study \(\NNN=1\) supersymmetric theories in \(d=4\) as the only kind of supersymmetric field theories that may be relevant for the Universe around us. But as I have mentioned in several recent blog entries, the gauge fields in Nature around us could actually be parts of \(\NNN=2\) supermultiplets – manifestations of a larger supersymmetry that only holds for the gauge fields but not for the matter fields (the latter must remain in chiral superfields). That would be even more extraordinary, of course, because the gauge fields would preserve a greater fraction of the ultimate stringy beauty of Nature, beauty that has to be contaminated by various symmetry-breaking processes for Nature to get rid of Her sterility.