Martin Ueding (Posts about Physics), https://martin-ueding.de/en. Contents © 2020 <a href="mailto:mu@martin-ueding.de">Martin Ueding</a>
<p><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</p>
Fri, 30 Oct 2020 13:34:00 GMT, Nikola (getnikola.com)
- Writing a PhD Thesis, https://martin-ueding.de/posts/writing-a-phd-thesis/, Martin Ueding<div><p>In September 2017 I finished my Master's degree in physics. I was offered a PhD position by my supervisor and gratefully accepted the opportunity. During my master's thesis I had started to work with lattice simulations and supercomputer programming, and I was getting into it. Although I already knew that I did not want to pursue a full career in science, I still wanted to do a bit more research in physics before leaving for industry.</p>
<p>The thing is that in Germany usually only 25 % of the employees at institutes have permanent positions, and these are usually people who do administration part time: full-time professors, administrators of machine shops, computer clusters or something else. The majority of positions are temporary contracts. The rationale seems to be that research benefits from the exchange of ideas, and if people move around between institutions, knowledge is spread. This completely ignores the fact that these are people who eventually want to start a family and the like. One usually does not get a consecutive contract at the same institution and has to move, often somewhere within the EU. The realistic chance for a permanent position would be to have research mixed with something permanent; we have IT administrators that are part-time administrators and part-time researchers/teachers. But then it is not really a career in research, it is merely something in academia. All these things set my long-term route, but I did not want to leave research at that moment. So I took the opportunity to do research for another three years.</p>
<p>At the beginning of a PhD the topic is not clear cut. My advisor had a few ideas, and I mostly started working with my valued fellow PhD student Markus on his project. There I helped to refactor a C++ code which did tensor contractions. Over time I learned more of the code and had more and more ideas for improving it. Together we worked on it a lot; it was a really great time. I also helped to improve the analysis that he needed for his data. Over time it became our analysis; I wrote most of it in the early days. He explained some of the mathematical theory behind it, and I implemented a bunch of statistical transformations. On some days we sat there until very late to make some plots really readable, pretty and informative. As the project came to a conclusion, he finished up his dissertation and eventually handed it in. I was super happy for him, and also sad to see him go and move to a different city.</p>
<!-- END_TEASER -->
<h2 id="the-beginning">The beginning</h2>
<p>One thing that I saw with the three PhD students who started years before I did was the choice of programming language. The professor has his analysis library written in R, and it was rather feature complete. But the students did not like R and wanted to use Python instead. I can fully understand this feeling; Python is a nicer language than R. But there was already so much tested code in R that they effectively set themselves back 1 to 1.5 years each. The result was a partially functioning code which was only used by two or three of the PhD students. I had seen this pattern during my master's thesis and decided that learning R could not be that hard. In the end I learned R in a few weeks, and after a couple of months I started to mock other people's R code for not being idiomatic. Everyone who came after me is now asked to also learn R and use the code; it is now the official code of the whole group. In 2020 we even uploaded it to CRAN to make it easily usable for everyone!</p>
<p>Overlapping with Markus's analysis, my advisor sensed the opportunity to compute three-pion scattering on the lattice; there were no determinations with multiple reference frames available at the time. So I was tasked with writing a code which does the group-theoretical projection. Markus taught me how the mathematics works, so I <em>just</em> had to implement it. Looking for the right tools took a few days, then I settled on <em>Wolfram Mathematica</em> for the task. I had not properly used it before, so I took two days to learn the <em>Wolfram Language</em> and developed away. It worked really well, and I reproduced Markus's results much faster as I could make use of all his knowledge. Unfortunately I could not amend his code, as it was too tailored to his specific task. I set out to make my code do all the mathematical steps. I know that I sometimes make mistakes, and I did not want to redo all the steps by hand whenever I spotted one. This was great, as everything was pretty consistent. It took a bit more time to get everything done in Mathematica compared to simply doing the simple steps by hand. In the end I had something like 200,000 terms in my expression. I was glad that I had let the computer do it all the way.</p>
<p>The projection and contraction codes were tested on multiple levels. This way I was confident in my intermediate results. Some people have spent months trying to track down some obscure error <em>somewhere</em> in their software stack. I did not want to do that, and rather spent a little more time verifying each level while I was working on it. In retrospect this saved me a bunch of time.</p>
<h2 id="the-intermediate-period">The intermediate period</h2>
<p>One of the freedoms at the university was that I could work with my personal laptop. There were workstations available, but nobody was really taking care of them. They had some dated Debian installed, and I wanted to use my Fedora with a modern C++ compiler. So I bought a second docking station and set up my desk in the office the same as at home. I could just take the laptop out of one dock, change places and dock in again. This was very practical, but it had one crucial disadvantage: there is no separation between work and home. I could work and play both at home and in the office. Often I would continue to work at home until late in the evening. I would also do personal stuff in the office and get nothing done there.</p>
<p>I tend to err on the side of doing more work than needed. Some people procrastinate and then scramble to find time at the end. As time now and time later are worth the same, a week of full work at the beginning or at the end does not really matter. So I started off well and slowly eased off as I saw that progress was going well. At some point I noted down the time I put into work, such that I could better allow myself to take the evening off. This is a pattern that one needs to observe and act on accordingly. During the last year of my thesis I tracked my time and just stopped working after I reached around 40 hours a week. And then it got even better: I realized that I have about four very productive hours in the morning. I would use these with lots of concentration and then relax in the afternoon, letting my thoughts roam freely and sort themselves.</p>
<p>After almost exactly two years I hit a dip in my motivation. That seems to be a common theme among PhD students, as I learned later. I looked at the things that I still had to do, and also saw how the project with Markus had taken more time than we had anticipated. I thought that doing a second project would take another two years. Most PhD students that I have seen in our group took four years to finish, and they would transition into industry sometimes without even having handed in their thesis. I feared that I would be stuck at the university for another two years with unclear prospects. I thought about quitting, moving to industry, having a 9-to-5 job without having to constantly think about getting my thesis done before the contract ran out.</p>
<p>I talked to my family and to my friends. They encouraged me to do what felt right to me. They said that nobody requires a PhD to be worth something; I already had a master's degree, and that was plenty of qualification. Friends in other departments told me that they were now in the sixth year of their PhD studies, and I thought that I should quit before I ended up as an eternal and unpaid PhD student. And then I also talked to my advisor and the postdocs in our group. They told me that they could understand this phase. If I wanted to quit, it would of course be possible, but they would be sad to see me go. My advisor told me that I could do it in three years. This gave me more confidence, and we set out a plan for a minimal thesis. Looking at all the things that I thought I had to do and the things that he set out as a baseline, I realized that it was doable within a year, even with conservative estimates. So I did not quit, took this itinerary and continued to work. In the end I passed the baseline around six months before the end, and surpassed it significantly three months before the end. In retrospect I do not even know why I did not ask my advisor about this first; perhaps I needed to sort my emotions with the help of family and friends first.</p>
<h2 id="writing-up">Writing up</h2>
<p>Writing the thesis was surprisingly fast. My writing style is to let some topic float in my head and toss it around on occasion, mostly subconsciously. Then some day I have the urge to write it all down. I would create a new section in my thesis and write five pages about some particular thing. A few days later something else had settled in my head and I could write it down. This way I was usually excited about writing, and I just had plenty to write about. Within a few weeks I had produced some 50 pages of thesis, and a few weeks later it was already at 100 pages.</p>
<p>Due to the COVID pandemic, I worked from home for the last six months of my program. It was not that difficult from the technical side, as I have the exact same screen and docking station setup at home as at the office. The major difference was lunch. We usually went to the campus canteen at noon, had lunch, talked a lot and socialized. This was all gone, and I had to cook for myself every single day. The lack of a routine was hard at first. After a while I got used to it, and towards the end I had a really good routine of working the fresh morning hours and just quitting when I felt my concentration was exhausted.</p>
<p>If one has a job like working in a supermarket, on an assembly line or something like that, one cannot take any work home. I also imagine that there are opportunities for promotions, but one does not have to burn oneself out to advance. With a PhD program it is really bizarre. My work contract was just part-time; I was supposed to work on the thesis in the other half, my free time. But what exactly did I do in the work group? I worked on projects, and one of them became my main thesis project. So there is no way to really set the two apart. And the problem was that I was not guaranteed the PhD title at the end of the three years, yet I only got money for three years. Basically I either managed to finish in time, or I would fall off the employment cliff and have to finish the thesis either in parallel to a full-time job (unrealistic) or while living off savings (unpleasant). This made me work really hard and think about the thesis almost all the time. As it was my first PhD thesis, I did not know how much work was needed. People around me had issues finishing within three years; some were at it for something like seven years. Without a reference, I just worked a lot. Sometimes I had the feeling of self-exploitation, and that is not a good sign. I am not sure how to fix this without making the PhD a guaranteed outcome after the three years. Perhaps it should be like a master's thesis, where the time is limited and one either has to hand it in (and accept the grade) or quit altogether. But not this endless lingering and the potentially endless projects that delay and derail everything.</p>
<p>Although my work contract said that I did not have to teach, I still had to teach. I decided that although it was fun, it was time that I would not have for my thesis. So I took the classes which were much less work than the others. In the end I taught the computational physics classes something like three times each. Over time I just got more efficient at reviewing the students' homework. I guess I did not learn as much as other students who taught alongside different lectures every semester, but it was not really a priority for me. A little into the PhD I was asked to take care of the computers in the theory department instead of teaching. I accepted this job, and it turned out to be even less work than teaching. It took effort to set everything up, but once it was automated, it was really easy. The downside was that I needed to fix things at the worst times, and people would come to my office with their computer problems, asking for a fix. Perhaps I would do it again, perhaps I would choose teaching instead.</p>
<p>The PhD defense needs to be held with four lecturers. In my case it was surprisingly easy to find the other three. I just asked the second advisor from my master's thesis, and he quickly agreed. I pitched my thesis to two other professors and got a confirmation on the same day. I would have thought that this was some lengthy process, yet it was done very quickly.</p>
<p>In the end I managed to turn in my dissertation some six weeks before the contract ran out. It was rather relaxed, and at the time it did not feel like anything major had happened. I just did not know what to do right after, as “working on my thesis” was not applicable any more. It was a really strange feeling, and only over the following weeks did I finally get the sense of having it done.</p></div>EnglishPhysicshttps://martin-ueding.de/posts/writing-a-phd-thesis/Tue, 13 Oct 2020 22:00:00 GMT
- Driving-Noise Resonator Between Houses, https://martin-ueding.de/posts/fahrgerauschresonantor/, Martin Ueding<div><p>When a house stands with its front parallel to the street and another house stands parallel across from it, a wonderful resonator emerges. In the picture the two grey blocks are the houses; the red one is a car on the road between them.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fahrgerauschresonantor/strasse.svg"></p>
<p>The distance between the house walls is about 15 m. On the first floor the windows are at perhaps 5 m height. This gives an angle of 33° from the road directly up to the window, and the path the sound travels is then 9.0 m. From the other direction, with a reflection off the opposite house wall, the elevation angle is only 13°, and the total sound path is then 23.0 m.</p>
<p>So we have a path difference of 14 m. With a speed of sound of 330 m/s, the resonance frequencies are then multiples of 23.5 Hz. In a sound spectrum one should see interference fringes like those known from the <a href="https://de.wikipedia.org/wiki/Doppelspaltexperiment">double-slit experiment</a>.</p>
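<p>The geometry can be checked in a few lines of Python (a quick sketch; it assumes the car sits midway between the two walls and mirrors the source at the opposite wall for the reflected path):</p>

```python
import math

d = 15.0  # distance between the house fronts in metres
h = 5.0   # height of the window above the road

# direct path: from the middle of the road up to the window
direct = math.hypot(d / 2, h)         # about 9.0 m

# reflected path: mirror the source at the opposite wall,
# so the horizontal distance becomes d/2 + d
reflected = math.hypot(d / 2 + d, h)  # about 23.0 m

delta = reflected - direct            # path difference, about 14 m
f = 330.0 / delta                     # fundamental resonance, about 23.5 Hz
print(round(direct, 1), round(reflected, 1), round(f, 1))
```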
<p>With the Android app <em>Spectroid</em> I simply recorded the sound spectrum at the window while a car drove past. Time runs upwards: the bottom is old, the top is new. The frequencies are plotted horizontally, with low frequencies on the left and high ones on the right. The brighter a point, the stronger that frequency was at that moment.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fahrgerauschresonantor/spektrum.jpg"></p>
<p>Inside the ellipse one can see the sound getting louder and then quieter again: the car approaches and drives away. And then there is a significant contribution at 43 Hz, roughly twice the crudely estimated resonance frequency. It is also limited to the time during which the car was exactly between the houses.</p>
<p>So one can nicely observe the interference of waves in a resonance cavity between two parallel houses. The effect can also be perceived without a spectral analysis: there is an unpleasant booming whenever a car drives past.</p></div>GermanPhysicshttps://martin-ueding.de/posts/fahrgerauschresonantor/Tue, 22 Sep 2020 22:00:00 GMT
- Fit Range Determination with Machine Learning, https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/, Martin Ueding<div><p>One of the most tedious and error-prone things in my work in Lattice QCD is the manual choice of fit ranges. While reading up on Keras, deep neural networks and machine learning, and on how experimental the whole field is, I thought about just trying the fit range selection with deep learning.</p>
<p>We have correlation functions $C(t)$ which behave as $\sum_n A_n \exp(-E_n t)$ plus noise. The $E_n$ are the energies of the state $n$, the $A_n$ are the respective amplitudes. We are interested in extracting the smallest of the $E_n$, the ground state energy. We use that for sufficiently large times $t$ the term with the smallest energy dominates the expression. Without loss of generality we say $E_0 < E_1 < \ldots$ and formally write
$$ \lim_{t \to \infty} C(t) = A_0 \exp(-E_0 t) \,. $$</p>
<p>By taking the <em>effective mass</em> as defined by
$$ m_\text{eff}(t) = -\log\left(\frac{C(t+1)}{C(t)}\right) $$
we get $m_\text{eff}(t) \sim E_0$ in the region of large $t$. There are more subtleties involved (back-propagation, thermal states), which we will ignore here. The effective mass is expected to be constant in a region where $t$ is sufficiently large such that the higher states have decayed, yet the exponentially decaying signal-to-noise ratio is still sufficiently good. An example of such an effective mass is the following.</p>
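<p>To illustrate the plateau behaviour, here is a minimal synthetic sketch (not the actual analysis code; the energies and amplitudes are made up, and the sign of the log-ratio is chosen such that the plateau sits at $+E_0$):</p>

```python
import numpy as np

# synthetic two-state correlator: ground state E0 = 0.5, excited state E1 = 1.2
t = np.arange(32)
C = 1.0 * np.exp(-0.5 * t) + 0.3 * np.exp(-1.2 * t)

# effective mass from the log-ratio of neighbouring time slices
m_eff = np.log(C[:-1] / C[1:])

# at small t the excited state pulls m_eff above E0; at large t it plateaus
print(m_eff[0], m_eff[-1])
```

At large $t$ the values are indistinguishable from $E_0 = 0.5$, while the first few are pulled up by the excited state.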
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/effmass_example.png"></p>
<!-- TEASER_END -->
<p>Fitting a constant to the effective mass allows one to extract the energy of the ground state, $E_0$. In the above image one can see such a manually chosen fit range. It starts after the excited states that come in from above have decayed and stops before the noise takes over. Such a <em>plateau</em> must have all data points statistically compatible with the fitted value; fluctuations shall be $\chi^2$ distributed. This is a fancy way of saying that most points should lie within one error bar of the fitted line, some within two error bars and only very few within three error bars or more.</p>
<p>For my dissertation I have to determine around 500 of these ranges, and it gets boring rather quickly. Especially after every change in the data, this needs to be redone. So perhaps, after doing a few hundred of them, I could train a neural network to do this work for me? Already at this point I know that even if I should find such a solution, it would need a lot of vetting from my peers before it would be considered credible. Therefore I will still need to verify all the fit ranges by hand. Still, I find it an interesting side project.</p>
<p>For this project I again use a <a href="https://jupyter.org/">Jupyter Notebook</a>, which is just as great of a platform for Python as <a href="https://rmarkdown.rstudio.com/">R Markdown</a> is for R. I can recommend it over working with a script file in both languages.</p>
<h2 id="transferring-the-data">Transferring the data</h2>
<p>I have all my analysis data in R. Machine learning with Keras is done in Python. So I have used <a href="http://dirk.eddelbuettel.com/code/rcpp.cnpy.html">RcppCNPy</a> to export my data from R into the NumPy format. There is a limitation that only 1D and 2D data structures can be exported. One also needs to keep in mind that R uses the <a href="https://en.wikipedia.org/wiki/Row-_and_column-major_order">column-major layout</a> known from FORTRAN, whereas NumPy uses the row-major layout of C. I transpose the tensor in R using <code>aperm</code> before storing it with <code>npySave</code>.</p>
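<p>Since this layout mismatch bites silently, here is a small NumPy sketch of the axis reordering (with a hypothetical small number of measurements; in R the equivalent transpose is <code>aperm(a, c(3, 2, 1))</code>):</p>

```python
import numpy as np

# emulate the R-side array of shape (features, time slices, measurements)
a = np.arange(2 * 32 * 5).reshape(2, 32, 5)

# reversing the axes corresponds to aperm(a, c(3, 2, 1)) in R
b = np.transpose(a, (2, 1, 0))
print(b.shape)  # (5, 32, 2)

# each element keeps its identity, only the index order flips
assert b[3, 7, 1] == a[1, 7, 3]
```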
<p>From my analysis I have a lot of things available, but the neural network can likely only look at the effective mass. The actual correlator varies over large scales, and from what I read, neural networks like data that is somewhat normally distributed. I also need to export the uncertainties of each point: the central values may fluctuate around the plateau within their errors, so just looking at the central values is not enough. I need both as input, although I am not sure how values and errors should be fed into the neural network.</p>
<p>I export the data for a particular ensemble only. This might be a problem for generalization to other ensembles, but then the lattice spacing and pion mass would also need to be input to the neural network. I want to keep it simple. On the cA2.60.32 ensemble we always have a time extent of $T = 64$, such that half of it (the correlator is symmetric, so the other half is redundant) is 32 time slices. The resulting data tensor will be of shape $(N, 32, 2)$ for $N = 142$ measurements: 32 time slices and the two features (value, error).</p>
<p>In R I have the transposed structure with shape $(2, 32, N)$. To make sure that I have transferred the data correctly, I plot the 7th correlator in R:</p>
<pre class="code literal-block"><span></span><code><span class="n">hadron</span><span class="o">::</span><span class="nf">plotwitherror</span><span class="p">(</span>
<span class="n">x</span> <span class="o">=</span> <span class="m">1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">all_data</span><span class="p">[</span><span class="m">1</span><span class="p">,</span> <span class="p">,</span> <span class="m">7</span><span class="p">]),</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">all_data</span><span class="p">[</span><span class="m">1</span><span class="p">,</span> <span class="p">,</span> <span class="m">7</span><span class="p">],</span>
<span class="n">dy</span> <span class="o">=</span> <span class="n">all_data</span><span class="p">[</span><span class="m">2</span><span class="p">,</span> <span class="p">,</span> <span class="m">7</span><span class="p">])</span>
</code></pre>
<p>And then I do the same thing with my NumPy data structure. Keep in mind that R is 1-indexed and Python is 0-indexed.</p>
<pre class="code literal-block"><span></span><code><span class="n">ax</span><span class="o">.</span><span class="n">errorbar</span><span class="p">(</span>
<span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">33</span><span class="p">),</span>
<span class="n">data</span><span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="p">:,</span> <span class="mi">0</span><span class="p">],</span>
<span class="n">data</span><span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="p">:,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">marker</span><span class="o">=</span><span class="s1">'o'</span><span class="p">,</span>
<span class="n">linestyle</span><span class="o">=</span><span class="s1">'none'</span><span class="p">)</span>
</code></pre>
<p>This gives me the same looking plot and I am confident that I have the data just the way that I want it.</p>
<p>The fit ranges (target values) are easier: I have a tensor of shape $(N, 2)$ which contains the beginning and the end of the fit range as integers.</p>
<h2 id="choosing-the-network-model">Choosing the network model</h2>
<p>As the correlator data that I analyze is a time series, there are two options that I already saw covered in the book:</p>
<ol>
<li>
<p>A recurrent neural network (RNN), made with LSTM or GRU layers.</p>
</li>
<li>
<p>A convolutional neural network (convnet), made with convolutional and pooling layers.</p>
</li>
</ol>
<p>I think that we do not really need that much global information; we want to check locally for a plateau. So we will start with a convolutional layer. Perhaps later we will try the recurrent neural network as well. Luckily Keras is so easy to work with that one can just exchange the building blocks and train the network again.</p>
<h2 id="encoding-the-target-data">Encoding the target data</h2>
<p>Then we need to figure out a way to encode the target data. Just having two integers is likely not going to work very well. If we were to target only a single integer, we would use a one-hot encoding for the numbers, a softmax activation function and categorical crossentropy as the loss function. We have two integers, so perhaps we need a non-sequential network graph to generate two one-hot encoded outputs.</p>
<h3 id="marking-the-plateau">Marking the plateau</h3>
<p>An alternative would be to mark the plateau region by setting it to all 1's and everything around it to all 0's. The neural network would then basically give, for every single time slice, the probability of that point belonging to a plateau.</p>
<p>This is easily generated from the given data. One just has to be careful that the <a href="https://github.com/HISKP-LQCD/hadron">hadron</a> fit routine takes <code>tmin</code> and <code>tmax</code> as inclusive-inclusive, whereas Python slicing is inclusive-exclusive. Also Python array indices are 0-based, so only <code>tmin</code> needs an adjustment.</p>
<pre class="code literal-block"><span></span><code><span class="n">target</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n_meas</span><span class="p">,</span> <span class="mi">32</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_meas</span><span class="p">):</span>
<span class="n">tmin</span><span class="p">,</span> <span class="n">tmax</span> <span class="o">=</span> <span class="n">labels</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">target</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">tmin</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span><span class="n">tmax</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre>
<p>This encoding then looks like the following with <code>ax.imshow(target)</code>:</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/Bildschirmfoto_20200531-20:26:56-c5a-Auswahl.png"></p>
<p>The training process also needs a loss function and a metric to judge the success. Looking at the <a href="https://www.tensorflow.org/api_docs/python/tf/keras/losses">documentation for the losses</a> we can see that there are a bunch of them. The <em>categorical crossentropy</em> is not applicable here, so we just try the <em>mean absolute error</em>, which is defined as <code>mean(abs(y_true - y_pred))</code>. We therefore get a penalty for every point that is marked as plateau but should not be, and vice versa.</p>
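<p>As a tiny worked example of this loss (with made-up label vectors, not real data), two wrongly marked slices out of six give a mean absolute error of 1/3:</p>

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])  # true plateau marking
y_pred = np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # one false positive, one false negative

# mean absolute error: every wrongly marked slice adds 1/len to the loss
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 2 wrong slices out of 6
```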
<p>For the activation I am not sure what to use; we will just go with a sigmoid function to push each output towards either 0 or 1.</p>
<p>In order to measure success in the end I use the <em>false positive</em> and <em>false negatives</em> metric. This way we can see how many of the $142 \times 32 = 4544$ result elements were computed incorrectly and how it is biased.</p>
<p>One problem with this approach is certainly that the predicted fit range does not need to be consecutive. Holes in the fit range could be represented this way, but we do not want to allow them.</p>
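<p>One way to rule out holes after the fact would be a post-processing step (a sketch of my own, not part of the analysis): keep only the longest contiguous run of slices that the network marks as plateau.</p>

```python
import numpy as np

def longest_plateau(pred, threshold=0.5):
    """Reduce per-slice plateau probabilities to one contiguous
    (start, end) index pair, inclusive, by keeping the longest run
    above the threshold."""
    mask = np.append(pred > threshold, False)  # sentinel closes a final run
    best, start = (0, -1), None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best

pred = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.9, 0.1])
print(longest_plateau(pred))  # (2, 4)
```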
<h3 id="one-hot-encoding-start-and-end">One-hot encoding start and end</h3>
<p>An alternative approach would be to use one-hot encoding for the start and also for the end. The <a href="https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax">softmax</a> transformation also has an <code>axis=-1</code> default argument, which means that it is applied along that axis only, so we can have both start and end in the same result tensor.</p>
<p>The encoding is straightforward.</p>
<pre class="code literal-block"><span></span><code><span class="n">target</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n_meas</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">32</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_meas</span><span class="p">):</span>
<span class="n">tmin</span><span class="p">,</span> <span class="n">tmax</span> <span class="o">=</span> <span class="n">labels</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">target</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">tmin</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">target</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">tmax</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre>
<p>And the result is just as expected, here is just the first fit range shown:</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/one-hot.png"></p>
<p>For the loss we can use the <em>categorical crossentropy</em>, and the metric will be <em>accuracy</em>. This then tells us how many starts and ends have been determined correctly.</p>
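<p>Decoding this representation back into a fit range is then a single <code>argmax</code> along the time axis. Here is a sketch with a hand-made, hypothetical prediction; the <code>+ 1</code> converts back to the 1-based hadron convention:</p>

```python
import numpy as np

# hypothetical softmax output of shape (2, 32): row 0 for tmin, row 1 for tmax
probs = np.full((2, 32), 0.001)
probs[0, 11] = 0.9  # network is confident that tmin is slice 12 (1-based)
probs[1, 24] = 0.8  # and that tmax is slice 25

tmin, tmax = np.argmax(probs, axis=1) + 1
print(tmin, tmax)  # 12 25
```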
<h2 id="dense-approach">Dense approach</h2>
<p>Before trying anything more fancy, we can just go ahead with a simple dense model. Chollet writes that one should try the simplest model first and justify the expense of more complex models by the simple ones not performing well.</p>
<h3 id="using-marked-plateau">Using marked plateau</h3>
<p>The simple dense model that we will try first is defined as such:</p>
<pre class="code literal-block"><span></span><code><span class="n">network0</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">()</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'sigmoid'</span><span class="p">))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s1">'rmsprop'</span><span class="p">,</span>
<span class="n">loss</span><span class="o">=</span><span class="s1">'mean_absolute_error'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="n">keras</span><span class="o">.</span><span class="n">metrics</span><span class="o">.</span><span class="n">FalsePositives</span><span class="p">(),</span>
<span class="n">keras</span><span class="o">.</span><span class="n">metrics</span><span class="o">.</span><span class="n">FalseNegatives</span><span class="p">()])</span>
</code></pre>
<p>The model therefore looks like this after compilation:</p>
<pre class="code literal-block"><span></span><code>Layer (type) Output Shape Param #
=================================================================
dense_27 (Dense) (None, 32, 128) 384
_________________________________________________________________
flatten_16 (Flatten) (None, 4096) 0
_________________________________________________________________
dense_28 (Dense) (None, 32) 131104
=================================================================
Total params: 131,488
Trainable params: 131,488
Non-trainable params: 0
</code></pre>
<p>We then train the network with these options:</p>
<pre class="code literal-block"><span></span><code><span class="n">history</span> <span class="o">=</span> <span class="n">network0</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">data</span><span class="p">,</span> <span class="n">target</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">validation_split</span><span class="o">=</span><span class="mf">0.2</span><span class="p">)</span>
</code></pre>
<p>The loss and metric look like this over the epochs:</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-loss-1.png"></p>
<p>A mean absolute error of 0.12 means that 12 % of the time slices are classified incorrectly, since the absolute error per slice is either 0.0 or 1.0. Looking at the rates of false positives and false negatives, we see around 8 % false positives and 4 % false negatives.</p>
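<p>The correspondence between the mean absolute error and the misclassification rate can be checked on a toy example; the arrays here are made up, not taken from the actual data:</p>

```python
import numpy as np

# Hypothetical plateau encodings: 1 inside the fit range, 0 outside.
target = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
predicted = np.array([1, 1, 1, 0, 0, 1, 0, 0], dtype=float)  # two slices wrong

# Each per-slice absolute error is either 0.0 or 1.0, so the mean absolute
# error equals the fraction of incorrectly classified slices.
mae = np.mean(np.abs(predicted - target))
print(mae)  # 2 wrong out of 8 slices → 0.25
```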
<p>The encoding of the plateaus shows us that the network has not really learned much about the data but rather assumes pretty much the same range for most data sets.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-target-actual-1.png"></p>
<p>Taking the difference between actual and target shows that there are many mistakes and that this model is not that great.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-target-actual2-1.png"></p>
<p>We are not overfitting, so perhaps one should just give it more freedom? No, it does not seem to get any better than the 12 % error rate.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-loss-2.png"></p>
<p>That is the baseline that we would have to beat.</p>
<h3 id="using-one-hot-start-and-end">Using one-hot start and end</h3>
<p>We can also try this model with the other encoding of the target data. I am not quite sure how that works exactly with the activation, because the dense layers cannot have a shape but must be flat. So I reshape and then apply the softmax activation afterwards.</p>
<pre class="code literal-block"><span></span><code><span class="n">network0</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">()</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'relu'</span><span class="p">))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Reshape</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">32</span><span class="p">)))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Activation</span><span class="p">(</span><span class="s1">'softmax'</span><span class="p">))</span>
<span class="n">network0</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s1">'rmsprop'</span><span class="p">,</span>
<span class="n">loss</span><span class="o">=</span><span class="s1">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'accuracy'</span><span class="p">])</span>
</code></pre>
<p>The results are devastating. It starts to overfit pretty much right from the start:</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-loss-3.png"></p>
<p>When just looking at the start of the fit range, it does not look appealing either.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-target-actual-3.png"></p>
<p>In the difference plot one can see that the start of the fit range is off by a few elements.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network0-target-actual2-3.png"></p>
<p>Given that 80 % of the data has been used for training and that it is overfitting, this does not look too good. One could try to regularize this model to make the overfitting less pronounced, but I fear that this won't make it any better.</p>
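<p>A sketch of what such a regularization attempt could look like, assuming dropout between the dense layers and an L2 penalty on the kernels; this is speculation about what might tame the overfitting, not something from the original analysis:</p>

```python
import keras
from keras import layers, regularizers

# Same layer sizes as the one-hot model above, plus dropout and L2 penalties.
model = keras.models.Sequential()
model.add(keras.Input(shape=(32, 2)))
model.add(layers.Dense(128, activation='relu',
                       kernel_regularizer=regularizers.l2(1e-4)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(64, activation='relu',
                       kernel_regularizer=regularizers.l2(1e-4)))
model.add(layers.Reshape((2, 32)))
model.add(layers.Activation('softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Neither dropout nor the L2 penalty adds trainable parameters, so the parameter count stays the same as in the unregularized model.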
<h2 id="convolutional-approach">Convolutional approach</h2>
<p>The convolutional layer can combine information from the local neighborhood. This makes a lot of sense for finding a plateau, because it should identify parts where the central values have no trend (linear coefficient) and also no curvature (quadratic coefficient).</p>
<p>We also need to somehow make it use the uncertainty as well as the central values. The central values $m_\text{eff}(t)$ may vary around $\Delta m_\text{eff}(t)$, but not much more. Basically $m_\text{eff}(t) \pm \Delta m_\text{eff}(t)$ is the corridor where it may vary. With a 2D convolutional layer the neural network might be able to pick up this information somehow and massage it into features like “constant within errors” and “upwards/downwards trend within errors”.</p>
<p>The target encoding using 1's in the plateau region and 0's elsewhere seems to make sense here.</p>
<p>The network that I have chosen is the following:</p>
<pre class="code literal-block"><span></span><code><span class="n">network1</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">()</span>
<span class="n">network1</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)))</span>
<span class="c1">#network1.add(keras.layers.Reshape((30, 32)))</span>
<span class="c1">#network1.add(keras.layers.MaxPooling1D((2,)))</span>
<span class="c1">#network1.add(keras.layers.Conv1D(64, (3,), activation='relu'))</span>
<span class="c1">#network1.add(keras.layers.MaxPooling1D((2,)))</span>
<span class="c1">#network1.add(keras.layers.Conv1D(64, (3,), activation='relu'))</span>
<span class="c1">#network1.add(keras.layers.MaxPooling1D((2,)))</span>
<span class="n">network1</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">network1</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.3</span><span class="p">))</span>
<span class="n">network1</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'relu'</span><span class="p">))</span>
<span class="n">network1</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'sigmoid'</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="n">network1</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
<span class="n">network1</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s1">'rmsprop'</span><span class="p">,</span>
<span class="n">loss</span><span class="o">=</span><span class="s1">'mean_absolute_error'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="n">keras</span><span class="o">.</span><span class="n">metrics</span><span class="o">.</span><span class="n">FalsePositives</span><span class="p">(),</span>
<span class="n">keras</span><span class="o">.</span><span class="n">metrics</span><span class="o">.</span><span class="n">FalseNegatives</span><span class="p">()])</span>
</code></pre>
<p>It starts with a convolutional layer that uses a 3×2 stencil to pick up the error from the other feature dimension. This way it should be able to build stencils that resolve a trend within errors. As there are only linear transformations, it likely cannot do a $t$-test, so we might see limitations.</p>
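<p>The shape bookkeeping for this "valid" convolution can be checked by hand: each output dimension is the input size minus the kernel size plus one, so the 3×2 kernel over the 32×2 input leaves 30×1 positions, as in the Keras summary.</p>

```python
# Output size of a 'valid' convolution: out = in - kernel + 1 per dimension.
input_shape = (32, 2)  # time slices × (central value, error)
kernel = (3, 2)
out = tuple(i - k + 1 for i, k in zip(input_shape, kernel))
print(out)  # (30, 1)
```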
<p>I let it go directly to a dense classification network in the hope that this would become a somewhat diagonal thing and pick out the applicable stencils that the convolution has learned.</p>
<p>Keras provides the following summary of the model:</p>
<pre class="code literal-block"><span></span><code>Layer (type) Output Shape Param #
=================================================================
conv2d_39 (Conv2D) (None, 30, 1, 64) 448
_________________________________________________________________
flatten_39 (Flatten) (None, 1920) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 1920) 0
_________________________________________________________________
dense_70 (Dense) (None, 128) 245888
_________________________________________________________________
dense_71 (Dense) (None, 32) 4128
=================================================================
Total params: 250,464
Trainable params: 250,464
Non-trainable params: 0
</code></pre>
<p>The results are slightly worse than with the pure dense network.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network1-loss-1.png"></p>
<p>From the target-actual plot I would even say that it shows less individuality for each measurement and treats them mostly the same.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network1-target-actual-1.png"></p>
<p>And in the difference plot it also looks disheartening.</p>
<p><img alt="" src="https://martin-ueding.de/posts/fit-range-determination-with-machine-learning/network1-target-actual2-1.png"></p>
<p>Adding all the additional blocks made of convolutional and pooling layers does not improve anything. This does not surprise me, because this problem does not really need more complicated global features (as in image classification) but rather the spatial information.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I have now tried a few different models and parameterizations of the data. It feels as if either there is insufficient data to actually solve this problem in a satisfactory fashion or I am not experienced enough to find a good neural network for this problem. I haven't tried the recurrent layers yet, perhaps they also won't work that well.</p>
<p>Last year we had a discussion of this exact problem with machine learning specialists, and they deemed it a hard problem. If a simple dense or convolutional network had been the answer, they likely would have suggested it. Therefore I am happy to have played around with it, but I am also willing to just leave it at this for now.</p>
<p>Even if this reproduced exactly the fit ranges that I have chosen, it would be unclear how the systematic error from choosing the fit range would be treated. The neural network cannot really explain its reasoning the way a human could try to, so one would be stuck with a sort of black box in the analysis chain.</p></div>EnglishKerasMachine LearningPhysicshttps://martin-ueding.de/posts/fit-range-determination-with-machine-learning/Sun, 31 May 2020 22:00:00 GMT
- What I do in my Master Thesishttps://martin-ueding.de/posts/master-thesis-introduction/Martin Ueding<div><p>All the matter around us is made up of atoms. The atoms are really small,
about $10^{-8} \, \mathrm{cm}$. That is $0.000\,000\,01 \, \mathrm{cm}$. Each
atom consists of a nucleus and electrons. The movement of the electrons around
the nucleus is described by quantum mechanics. The image I drew below is not
very accurate; the accurate thing is really hard to draw. The atomic nucleus is
just a ten-thousandth of the size of the whole atom; the atom is mostly empty!
Inside the nucleus there are protons and neutrons, together called nucleons.
Each of the protons and neutrons consists of three quarks. The quarks are bound
together by the strong force which is mediated by gluons.</p>
<p><img alt="" src="https://martin-ueding.de/posts/master-thesis-introduction/hierarchy.svg"></p>
<p>The hierarchy of (1) matter, (2) atoms, (3) atomic nucleus, and (4) nucleons.</p>
<!-- END_TEASER -->
<p>The size of the quarks and gluons is ridiculously tiny: it is $10^{-13} \,
\mathrm{cm}$ as indicated in the drawing. This is the scale which is covered in
my thesis.</p>
<p>Quarks have a so-called "color charge", which is like the electric charge
except that it comes in three colors. There is no actual color; the behavior
can just be pictured well with red, green, and blue light. No color is black,
all three colors together are white. The opposite of a color is its
complementary color. The gluons interact only with colored objects; they do
not "see" objects which are black or white. One characteristic of the strong
force is that the quarks are never seen alone. They always come in bundles of
two or three quarks that are black or white together. You can see that the
proton contains one quark of each color, so as a whole it is a white object.</p>
<p>There are two major complications which set the strong force apart from the
other forces:</p>
<ul>
<li>
<p>The force is strong, so it has a major effect on everything. Compare this
to gravity: All the objects around us fall to the ground, but they do so in
one piece. It is not like gravity would tear everything apart. Also, things
do not attract each other measurably; they just fall down towards Earth.
The strong force is like sticky glue, everything strongly sticks together.
One cannot look at a quark or a gluon without having to look at all the
things that also stick to it.</p>
</li>
<li>
<p>The force carrier particles also interact with each other. This is very
hard to imagine. With light, we know from experience that there is no
noteworthy interaction between light particles. If one takes two
flashlights and crosses them, the beams of light pass through each other.
If one could make flashlights which have beams of strong force, it would
look very different: The beams would stick to each other and form a blob of
light, like a ball of wool.</p>
</li>
</ul>
<p>These two complications mean that one cannot do computations with the normal
tools that a theoretical physicist has. This is an example for a process that
holds two parts in a proton together:</p>
<p><img alt="" src="https://martin-ueding.de/posts/master-thesis-introduction/interaction-diagram.svg"></p>
<p>The force carriers create new particles from the vacuum (quantum fluctuations)
which get annihilated again quickly. The strange thing is that more complicated
processes are more likely to occur! This means that in order to compute what
actually happens, one needs to include arbitrarily complex processes. Of course
this is not feasible, and there is nowhere one can start working. One needs a
different approach.</p>
<p>One approach is to just "ask" nature what happens and make a particle physics
experiment. Experimentalists can "weigh" the proton (which can occur on its own
because it is "white") and obtain its mass. They cannot measure the quarks
because they have color and stick to all the gluons and quantum fluctuations.
The theory one has about the strong interaction in principle predicts what the
mass of the proton is if one knows the masses of the quarks. Besides the
complication that one cannot weigh the quarks directly, the theory is so hard
to evaluate that this prediction cannot be made with an analytic computation.</p>
<p>Another approach is to use a computer and simulate the quantum fluctuations of
the vacuum inside some box. One cannot simulate the whole universe but taking a
box which is large enough to contain a whole proton is the best approximation
one can do. The computer cannot simulate to infinite precision inside the box,
so one has to limit oneself to a couple of points inside that box. Effectively
one reduces time and space to a grid, just like on graph paper. There the
number of possibilities is limited to a finite number of integrals. This number
is still huge: for 64 lattice points we have $536\,870\,912$ integrals to
solve. Solving an integral takes at least 100 points, so we have to look at
$53\,687\,091\,200$ values. Luckily, most of these are almost zero, so we can
safely ignore them. Using a couple of tricks, one can just look at the most
important ones.</p>
<p>Unfortunately, it still takes a lot of computing power to simulate this. One
step that occurs often is the solution of a system of equations that in the
largest simulations has $40\,532\,396\,646\,334\,464$ unknowns. Here one needs
additional tricks, but it will stay horribly complex. In order to get some
results, one needs a supercomputer. Supercomputers do not make programs
magically faster; one has to work hard as a programmer to utilize the power
that the computer actually has. Actually, it is not one computer, but many
computers with a powerful network. Each of the computers has multiple CPUs
(processors) and GPUs
(graphics cards). Each CPU has multiple cores, on each core there can be
multiple concurrent threads. In each thread one can cram four similar
operations at a time. The GPU has lots of cores, each core can do 32 similar
operations at a time. Each unit has its own memory (RAM). When writing a
program, one has to think really hard about splitting the problem into small
but similar tasks that can be distributed among all those computing resources.</p>
<p>On a computer which has around $10\,000$ processor cores, it still takes days
or weeks to run a simulation. This means that one has to have a massive budget
of computing time. Running such a simulation on a laptop is just not feasible.</p>
<p>There are multiple projects out there for doing those kind of simulations. They
usually have between $300\,000$ and $700\,000$ lines of programming code.
Printing that would give up to $10\,000$ pages. That is more than two times the
whole "Harry Potter" series! Converting the printed code to soccer fields, it
would be around 0.03 to 0.09 soccer fields filled with paper.</p>
<p>Either way, this is not something that a single person can write in a year. In
my thesis work, I extend existing code with more physics and try to make things
faster than they already are.</p>
<hr>
<ul>
<li>
<p>When converting that into soccer fields, I noticed that the soccer field is
not even standardized, it can vary between $10\,800 \, \mathrm{m^2}$ and
$4050 \, \mathrm{m^2}$. The FIFA/UEFA suggestion is $7140 \, \mathrm{m^2}$.
I have used that.</p>
<p>I have used that A4 paper has an area of 1/16th square meter. On a soccer
field you can therefore fit $114\,240$ pages of A4 paper. On a page of A4
paper you might get 70 lines of code. That makes $7\,996\,800$ lines of
code on a soccer field.</p>
</li>
</ul></div>EnglishPhysicshttps://martin-ueding.de/posts/master-thesis-introduction/Sat, 05 Nov 2016 23:00:00 GMT
- Twin paradox resolvedhttps://martin-ueding.de/posts/twin-paradox/Martin Ueding<div><p>In special relativity, there is the twin paradox. I'll have to take a little
detour into special relativity to explain why it arises. If you know what that
is about and how it comes about, you can skip this section.</p>
<p>Due to the constant speed of light that special relativity is built around,
time will seem to pass more slowly for objects and people moving relative to you.
The faster they go, the slower <em>their</em> passing of time will seem to you. In the
extreme case of almost reaching the speed of light, time will almost halt.</p>
<p>This can be verified with muons from the atmosphere of the earth. Muons have a
short lifetime $\tau$ after which they decay into lighter particles. The
lifetime is so short that there is no way that they could get very far. Yet you
can measure the flux of muons at high altitude and again at the ground and see
that fewer of them decayed than you would expect given their speed $v$ and
lifetime $\tau$. The characteristic distance $d$ that they can move is $d =
\tau v$. So what causes that?</p>
<!-- END_TEASER -->
<p>The muons move very fast relative to us. That means that their time will be
slowed down by a factor of $\gamma$ where</p>
<p>$$\gamma := \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}$$</p>
<p>is the so called <em>Lorentz factor</em>. When $v \to c$, we have $\gamma \to
\infty$. Since the time slows down by $\gamma$, time will pass more and more
slowly.</p>
<p>The huge velocity of the muons then extends the distance they can travel
without decaying. The characteristic length now is $d = \gamma \tau v$ which
can be sufficiently long to get all the way down to the ground to measure them.
So this is an effect verified by experiment.</p>
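<p>The effect can be illustrated with ballpark numbers; the muon lifetime of roughly $2.2 \, \mathrm{\mu s}$ and the speed of $0.999c$ used here are textbook estimates, not values from a specific measurement:</p>

```python
import math

def lorentz_gamma(beta):
    """Lorentz factor for a velocity beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta**2)

c = 299_792_458.0  # speed of light in m/s
tau = 2.2e-6       # muon lifetime in s (ballpark value)
beta = 0.999       # assumed muon speed as a fraction of c

d_naive = beta * c * tau                   # d = tau * v, without dilation
d_dilated = lorentz_gamma(beta) * d_naive  # d = gamma * tau * v

# Without time dilation the muons travel well under a kilometre; with it they
# easily cover the roughly 10-15 km from the upper atmosphere to the ground.
print(round(d_naive), round(d_dilated))
```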
<p>This is all fine until you ask one question:</p>
<blockquote>
<p>The muons do move relative to us. So to <em>them</em> it will look like <em>our</em> time
is slowed down as well. How does one decide who will be the younger one?</p>
</blockquote>
<p>Since you cannot see the age of a muon directly, it is easier to think about
twins. One of them stays on the earth, the other takes a spaceship and travels
with sufficient speed somewhere. Then he turns around and travels back with a
sufficient speed again. The two twins meet again on the earth. One of them will
be older than the other one. But which one? The astronaut twin could just say
that he was not moving the whole time but the one who stayed on earth was
receding from him.</p>
<p>The important thing to keep in mind is that the twins have to meet in one place
to compare their age. Any attempt to compare their age at different positions
will introduce non-trivial effects since the speed of light also is the maximum
speed of information. Once one of the twins would send a message containing his
age to the other, it would take a long time for the message to arrive if the
other twin is receding still. More fundamentally even, the very notion of
simultaneity depends on the velocity as well.</p>
<h2 id="usual-solutions">Usual solutions</h2>
<p>The problem is not really symmetric since the astronaut twin has to turn
around. Often people say that this marks the astronaut as the one who ends up
younger (which is indeed the result). The astronaut left his <em>inertial frame</em>,
and the frame of the twin on earth is the only valid inertial frame that you can take.</p>
<p>It is correct that the frame of the astronaut twin is not an inertial frame the
whole time, there is no SO(1, 3) transformation (or proper orthochronous
Lorentz transformation) that will get you from one frame to the other. This is
because the astronaut twin has to accelerate to get back to earth -- that will
make his frame non-inertial. Due to this, some people say that <em>general</em>
relativity is needed to resolve this problem.</p>
<h2 id="solution-with-special-relativity">Solution with special relativity</h2>
<p>You do not really need general relativity here. No masses that cause gravity
are taken into account here, and the Einstein equation $G^{\mu\nu} = 8 \pi G
T^{\mu\nu}$ will just become $G^{\mu\nu} = 0$ which is solved by flat space
(plus gravitational waves, if you'd like). The case of flat space is already
covered by special relativity (mostly) so we do not really need it here.</p>
<p>The curve length $s$, appropriately defined by something like</p>
<p>$$\mathrm ds^2 := \mathrm dt^2 - \mathrm dx^2 - \mathrm dy^2 - \mathrm dz^2$$</p>
<p>is Lorentz invariant, it will be the same in each inertial system. That is what
defines the Lorentz transformation, actually.</p>
<p>Since the movement is only in the $x$ direction and the $y$ and $z$ directions
are not affected, I will only write the $t$ and $x$ components with $y = 0$ and
$z = 0$ everywhere.</p>
<p>This curve length is the self-perceived time (i.e. age) of an observer
traveling along the curve. In order to obtain the time that has passed for
either twin, one needs to compute the curve length of their respective world
line. So what are the world lines? One can take a simple one which contains an
arbitrarily short acceleration time. It looks like this:</p>
<p><img alt="image1" src="https://martin-ueding.de/posts/twin-paradox/kink.svg"></p>
<p>I will choose the affine parameter of the curve to be $\sigma$. It will not be
the time $t$ or the self-time $s$, it is just a parameterization of the curve.
Since it does not change anything, I will set $c = 1$ here. The velocity $\beta
= v/c$ is the speed of the astronaut twin.</p>
<p>The world line of the twin staying on earth (hence $E$) looks like this:</p>
<p>$$\begin{aligned}
E(\sigma) =
\begin{pmatrix}
1 \\ 0
\end{pmatrix}
\sigma
\end{aligned}$$</p>
<p>The astronaut twin will have a more complicated world line which consists of
two pieces:</p>
<p>$$\begin{aligned}
A(\sigma) =
\begin{cases}
\begin{pmatrix}
1 \\ \beta
\end{pmatrix} \sigma
& \sigma < 1 \\
\begin{pmatrix}
1 \\ \beta
\end{pmatrix} +
\begin{pmatrix}
1 \\ - \beta
\end{pmatrix} (\sigma - 1)
& \sigma > 1
\end{cases}
\end{aligned}$$</p>
<p>The next step is to integrate the curve length over $\sigma$ from 0 to 2. At
$\sigma = 1$ one has the reversal of velocity from the astronaut twin. The
curve is not differentiable at $\sigma = 1$ but that does not hurt for the
integration. One could even exclude that one point since it has zero measure.</p>
<p>The first curve length is given by:</p>
<p>$$\begin{aligned}
s_E
= \int_0^2 \mathrm d \sigma \, |\dot E(\sigma)|
= \int_0^2 \mathrm d \sigma \,
\left|
\begin{pmatrix}
1 \\ 0
\end{pmatrix}
\right|
= \int_0^2 \mathrm d \sigma \, \dot E^0(\sigma)
= \int_0^2 \mathrm d \sigma \, 1
= 2
\end{aligned}$$</p>
<p>The twin who stayed on earth will age by 2 units of time. What about the
astronaut twin?</p>
<p>The modulus squared of the derivative with respect to $\sigma$ is the same for
both parts of the curve. Therefore this can be written as <em>one</em> integral.</p>
<p>$$\begin{aligned}
s_A
= \int_0^2 \mathrm d \sigma \, |\dot A(\sigma)|
= \int_0^2 \mathrm d \sigma \,
\left|
\begin{pmatrix}
1 \\ \beta
\end{pmatrix}
\right|
= \int_0^2 \mathrm d \sigma \, \sqrt{1 - \beta^2}
= 2 \sqrt{1 - \beta^2}
= \frac{2}{\gamma}
\end{aligned}$$</p>
<p>The minus sign in $\sqrt{1 - \beta^2}$ comes from the minus sign in the metric
tensor. This is where $\mathrm ds^2 := \mathrm dt^2 - \mathrm dx^2$ has been
used.</p>
<p>The interesting thing now is the ratio of the two, which is just $\gamma$!
That means the astronaut twin will be younger than the other one by a factor
of $\gamma$; equivalently, the twin on earth will be older by a factor of
$\gamma$.</p>
<p>This will hold true in any <em>inertial</em> frame in the sense of special
relativity. We cannot assume such a frame exists for the astronaut: it only
exists for the first or the last part of his journey, but not for the whole one.</p>
<h2 id="the-parabola">The parabola</h2>
<p>In case you do not like the kink, one can choose a different curve where the
astronaut travels on a parabola that opens to the left. The
parameterization (with $\sigma \in [-1, 1]$ this time) of the world line then
looks like the following:</p>
<p>$$\begin{aligned}
A(\sigma) =
\begin{pmatrix}
\sigma \\ \frac 12 \beta \sigma^2
\end{pmatrix}.
\end{aligned}$$</p>
<p>In the space-time diagram the two world-lines can be visualized like so:</p>
<p><img alt="image2" src="https://martin-ueding.de/posts/twin-paradox/parabola.svg"></p>
<p>The corresponding tangent vector along the curve then looks like this:</p>
<p>$$\begin{aligned}
\dot A(\sigma) =
\begin{pmatrix}
1 \\ \beta\sigma
\end{pmatrix}.
\end{aligned}$$</p>
<p>The curve length can again be computed with the curve length integral. It will
be 2 again for the one who stayed on earth since that world-line did not
change. For the astronaut, the integral will involve $|\dot A(\sigma)|$ again.
Then I use the metric tensor to compute the scalar product; the integrand just
has an extra factor of $\sigma$ in it compared to before. From there I can
compute the integral.</p>
<p>$$\begin{aligned}
s_A
= \int_{-1}^1 \mathrm d \sigma \, |\dot A(\sigma)|
= \int_{-1}^1 \mathrm d \sigma \,
\left|
\begin{pmatrix}
1 \\ \beta\sigma
\end{pmatrix}
\right|
= \int_{-1}^1 \mathrm d \sigma \, \sqrt{1 - [\beta\sigma]^2}
\end{aligned}$$</p>
<p>The exact value of this integral is not important. It is important to see that
for zero velocity of the twin ($\beta = 0$) the value is also 2, just as for
the twin on the earth. For any $\beta > 0$ the integrand will be smaller than 1
(except at the single point $\sigma = 0$), such that the integral will be less
than 2. Therefore the astronaut will be younger as well.</p>
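<p>The claim that the parabolic world line also yields a proper time below 2 for any $\beta > 0$ can be checked numerically. A short sketch; the chosen $\beta$ is arbitrary:</p>

```python
import math

def astronaut_proper_time(beta, n=100_000):
    """Midpoint rule for ∫_{-1}^{1} dσ sqrt(1 - (β σ)²)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        sigma = -1.0 + (i + 0.5) * h
        total += math.sqrt(1.0 - (beta * sigma) ** 2) * h
    return total

print(astronaut_proper_time(0.0))  # 2.0, same as the earth twin
print(astronaut_proper_time(0.6))  # less than 2
```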
<p>You see that the kink was not important; it really is the astronaut who will be
younger. Since the proper time $s$ is Lorentz-invariant, this must also hold in
any other inertial frame.</p>
<h2 id="solution-with-general-relativity">Solution with general relativity</h2>
<p>In case you are still not convinced or just interested, there is also a way to
use the machinery of general relativity -- differential geometry -- to look at
this problem from the frame of the astronaut. The great power of the theory of
general relativity is its huge symmetry group: one can use any
diffeomorphism as a symmetry transformation. The curve that the astronaut takes
can be smoothly transformed to be at rest with such a diffeomorphism.</p>
<p>I will use the same parabola as before; transforming away the kink is not
possible with a diffeomorphism. And as I have shown, the qualitative result is
the same for both the kink and the parabolic world-line.</p>
<p>So the transformation that will bring the astronaut from rest to its curve
looks like this:</p>
<p>$$\begin{aligned}
\phi(t, x) =
\begin{pmatrix}
t \\ x + \frac12 \beta t^2
\end{pmatrix} \,.
\end{aligned}$$</p>
<p>That is the diffeomorphism that creates the coordinate transformation from the
system where the astronaut is at rest to the one we previously looked at. Now
the <em>pullback</em> operation will allow us to move from the tangent space of the
new system to the tangent space of the old system.</p>
<p>In the old system the metric tensor was given as:</p>
<p>$$\begin{aligned}
g_{\mu\nu} \simeq
\begin{pmatrix}
1 & 0 \\
0 & -1
\end{pmatrix}
\end{aligned}$$</p>
<p>I wrote "$\simeq$" instead of "$=$" since $g$ is a covariant tensor while the
matrix is a mixed tensor. The notation is just convenient when one remembers
that it denotes the elements of $g_{\mu\nu}$.</p>
<p>The partial derivatives of the diffeomorphism (the Jacobian matrix) are:</p>
<p>$$\phi^0_{,0} = 1
\quad
\phi^1_{,0} = \beta t
\quad
\phi^0_{,1} = 0
\quad
\phi^1_{,1} = 1 \,.$$</p>
<p>The notation $\phi^\mu_{,\alpha}$ is a shorthand notation for writing $\partial
\phi^{\mu} / \partial x^\alpha$ which is common in general relativity.</p>
<p>With that in mind I can compute the transformed metric tensor with the
pullback:</p>
<p>$$[\phi^* g]_{\mu\nu}
= g(\phi)_{\alpha\beta} \, \phi^\alpha_{,\mu} \phi^\beta_{,\nu} \,.$$</p>
<p>Then the components of the transformed metric tensor are:</p>
<p>$$\begin{aligned}
\tilde g_{\mu\nu} \simeq
\begin{pmatrix}
1 - [\beta t]^2 & - \beta t \\
- \beta t & -1
\end{pmatrix} \,.
\end{aligned}$$</p>
<p>The absolute magnitude of the velocity of both twins is now exchanged -- the
viewpoint is now with the astronaut.</p>
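<p>The components of $\tilde g$ can be verified numerically: build the Jacobian of $\phi$ and contract it with the flat metric, $\tilde g = J^{\mathrm T} \eta J$. A small sketch; the values of $\beta$ and $t$ are arbitrary test inputs:</p>

```python
# Verify the pulled-back metric g~ = Jᵀ η J for φ(t, x) = (t, x + β t²/2).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def pulled_back_metric(beta, t):
    eta = [[1.0, 0.0], [0.0, -1.0]]      # flat metric diag(1, -1)
    jac = [[1.0, 0.0], [beta * t, 1.0]]  # rows φ⁰, φ¹; columns ∂/∂t, ∂/∂x
    return matmul(transpose(jac), matmul(eta, jac))

beta, t = 0.5, 2.0
g = pulled_back_metric(beta, t)
print(g)  # [[1 - (βt)², -βt], [-βt, -1]]
```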
<p>For the twin staying on earth we now have the curve</p>
<p>$$\begin{aligned}
\begin{pmatrix}
\sigma \\ - \frac12 \beta \sigma^2
\end{pmatrix} \,.
\end{aligned}$$</p>
<p>Its tangent vector (velocity) is given by the derivative with respect to
$\sigma$:</p>
<p>$$\begin{aligned}
\begin{pmatrix}
1 \\ - \beta \sigma
\end{pmatrix} \,.
\end{aligned}$$</p>
<p>Then the same line integral can be used to compute the curve length. The
absolute magnitude must now use the new metric tensor. The magnitude of a
vector is always given as</p>
<p>$$|x| = \sqrt{x^\mu g_{\mu\nu} x^\nu} \,.$$</p>
<p>The square of the integrand (I do not want to write the square root everywhere)
now is:</p>
<p>$$\begin{aligned}
\begin{pmatrix}
1 \\ - \beta \sigma
\end{pmatrix}^{\mathrm T}
\begin{pmatrix}
1 - [\beta \sigma]^2 & - \beta \sigma \\ - \beta \sigma & -1
\end{pmatrix}
\begin{pmatrix}
1 \\ - \beta \sigma
\end{pmatrix}
=
\begin{pmatrix}
1 \\ - \beta \sigma
\end{pmatrix}^{\mathrm T}
\begin{pmatrix}
1 - [\beta \sigma]^2 + [\beta\sigma]^2 \\ - \beta\sigma + \beta\sigma
\end{pmatrix} \,.
\end{aligned}$$</p>
<p>And this can be simplified further:</p>
<p>$$\begin{aligned} =\begin{pmatrix}
1 \\ - \beta \sigma
\end{pmatrix}^{\mathrm T}
\begin{pmatrix}
1 \\ 0
\end{pmatrix}
=
1 \,.
\end{aligned}$$</p>
<p>Nothing has changed here although this is viewed from the astronaut's
perspective!</p>
<p>The squared integrand for the astronaut looks like the following:</p>
<p>$$\begin{aligned}
\begin{pmatrix}
1 \\ 0
\end{pmatrix}^{\mathrm T}
\begin{pmatrix}
1 - [\beta \sigma]^2 & - \beta \sigma \\ - \beta \sigma & -1
\end{pmatrix}
\begin{pmatrix}
1 \\ 0
\end{pmatrix}
=
1 - [\beta \sigma]^2 \,.
\end{aligned}$$</p>
<p>That is the same integrand as before after taking the square root.</p>
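<p>Both squared integrands can be checked directly with the transformed metric. A quick sketch; the values of $\beta$ and $\sigma$ are arbitrary:</p>

```python
def squared_norm(v, g):
    """vᵀ g v for a 2-vector and a 2×2 metric."""
    return sum(v[i] * g[i][j] * v[j] for i in range(2) for j in range(2))

def metric(beta, sigma):
    bs = beta * sigma
    return [[1 - bs**2, -bs], [-bs, -1.0]]

beta, sigma = 0.5, 0.7
g = metric(beta, sigma)
earth = squared_norm([1.0, -beta * sigma], g)  # tangent of the earth twin
astro = squared_norm([1.0, 0.0], g)            # astronaut at rest
print(earth, astro)  # 1.0 and 1 - (βσ)²
```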
<p>Therefore, nothing changes when one changes the reference to the astronaut.
This resolves the twin paradox completely and removes any ambiguity.</p></div>EnglishPhysicshttps://martin-ueding.de/posts/twin-paradox/Wed, 16 Sep 2015 22:00:00 GMT
- Symmetry Generatorshttps://martin-ueding.de/posts/symmetry-generators/Martin Ueding<div><p>Generators are a very important thing in group theory and therefore theoretical
physics. They are not hard to understand, though. You might have heard phrases
like the following:</p>
<blockquote>
<p>The quantum mechanical momentum generates translations.</p>
</blockquote>
<p>Or perhaps rather this:</p>
<blockquote>
<p>The quantum mechanical angular momentum generates rotations.</p>
</blockquote>
<p>What does that mean?</p>
<!-- END_TEASER -->
<p>I will start with a Taylor series. Say you have a simple function $f \colon
\mathbb R \to \mathbb R$. Its Taylor series around the point $x_0$ can then be
expanded like so:</p>
<p>$$f(x_0 + \alpha) = f(x_0) + f'(x_0) \alpha + \frac12 f''(x_0) \alpha^2 +
\mathrm O(\alpha^3) \,.$$</p>
<p>The whole series can be written with a summation sign. Then it looks like this:</p>
<p>$$f(x_0 + \alpha) = \sum_{n = 0}^\infty \frac{1}{n!}
\frac{\mathrm d^n f}{\mathrm d x^n} (x_0) \alpha^n \,.$$</p>
<p>I hope this is all fine up to this point. Now I will rewrite the same
expression by just slightly reordering the terms.</p>
<p>$$f(x_0 + \alpha) = \sum_{n = 0}^\infty \frac{\alpha^n}{n!}
\left[ \frac{\mathrm d^n}{\mathrm d x^n} f(x) \right]_{x = x_0}$$</p>
<p>One can even factor out the function from the sum. This means that the
derivative operator $\mathrm d/\mathrm dx$ has to act outside of the square
bracket. So be it.</p>
<p>$$f(x_0 + \alpha) = \left[ \sum_{n = 0}^\infty \frac{\alpha^n}{n!}
\left.\frac{\mathrm d^n}{\mathrm d x^n}\right|_{x = x_0} \right] f(x)$$</p>
<p>The sum is extracted from the function $f$, and the differential operator acts
outside of the square bracket already. It is not too far-fetched to write the
sum as an exponential function.</p>
<p>$$f(x_0 + \alpha) = \exp\left( \alpha
\left.\frac{\mathrm d}{\mathrm d x}\right|_{x = x_0} \right) f(x)$$</p>
<p>One now calls $\mathrm d / \mathrm d x$ the <em>generator</em>, sometimes denoted by
$T$ or so. The parameter $\alpha$ is the amount of the transformation desired.</p>
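<p>The exponential of the derivative operator can be tried out numerically by truncating the series. A short sketch with $f = \sin$, whose derivatives conveniently cycle through $\sin, \cos, -\sin, -\cos$:</p>

```python
import math

def translate(x0, alpha, terms=20):
    """Truncated exp(α d/dx) applied to sin at x0: Σ αⁿ/n! · sin⁽ⁿ⁾(x0)."""
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    return sum(alpha**n / math.factorial(n) * derivs[n % 4]
               for n in range(terms))

x0, alpha = 0.3, 0.5
print(translate(x0, alpha))   # ≈ sin(x0 + α)
print(math.sin(x0 + alpha))
```

With 20 terms the truncated series agrees with the shifted function to machine precision, which is exactly the statement that $\mathrm d/\mathrm dx$ generates translations.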
<p>All that was in the notation of mathematics. The notation used in physics has a
couple of imaginary units in it. One does the following:</p>
<p>$$\exp(\alpha T) \mapsto \exp\left(\mathrm i \alpha [- \mathrm i T] \right) \,.$$</p>
<p>Then physicists still call $\alpha$ the parameter but the generator now is $-
\mathrm i T$. When you now look at the momentum operator in quantum mechanics,
$\hat p_x = - \mathrm i \hbar \partial_x$, you will see that this generates
translation in the $x$-direction with the Taylor series in mind.</p></div>EnglishGroup TheoryMathematicsPhysicshttps://martin-ueding.de/posts/symmetry-generators/Sat, 25 Jul 2015 22:00:00 GMT
- Is "Cool down St. Louis" Counterproductive?https://martin-ueding.de/posts/cool-down-st-louis/Martin Ueding<div><p>Today it is very hot here in Bonn, and only occasionally does one come across
buildings with air conditioning. The lecture hall is not one of them. In any
case, I once stayed in the USA near St. Louis, where summers are considerably
hotter than here. There was a non-profit organization called "Cool down St.
Louis" that gave air conditioners to people who could not afford one, so that
they would not suffer circulatory problems in high temperatures.</p>
<p>However, air conditioners also heat up their surroundings. Do they perhaps make
the problem <em>significantly</em> worse than it was before?</p>
<p>Let me try a rough estimate. At the Earth's distance from the sun, about 1300
watts of solar radiation arrive per square meter. So I will assume that the
sunlight heats the ground by 1000 watts per square meter where it hits. Perhaps
a large part is reflected and it is considerably less. Per square kilometer,
that is one gigawatt of solar heat.</p>
<!-- END_TEASER -->
<p>At some point, the house that is warming up and the air conditioner that
carries the heat back outside reach an equilibrium. In the end, the power the
air conditioner consumes is released into the overall system; the energy has to
go somewhere.</p>
<p>I do not really know how much power an air conditioner draws. In Spain we had
one in our room that ran off a normal wall socket, so I consider 3 kW
reasonable. A car's air conditioner apparently also draws around 3 kW, and it
only has to cool the car's interior. Moreover, many websites state that air
conditioning can easily make up the bulk of the electricity bill.</p>
<p>Now the question is: assuming every person in St. Louis has an air conditioner,
how much do the air conditioners heat the surroundings relative to the solar
radiation?</p>
<p>With these assumptions one can simply compute how much solar radiation falls on
the larger metropolitan area, the urban area, and the city core, and compare
that with the power the air conditioners consume. All of this is summarized in
this table:</p>
<table>
<thead>
<tr>
<th>Region</th>
<th>Metro</th>
<th>Urban</th>
<th>City</th>
</tr>
</thead>
<tbody>
<tr>
<td>Area / km²</td>
<td>21910</td>
<td>2392</td>
<td>170</td>
</tr>
<tr>
<td>Inhabitants</td>
<td>2810000</td>
<td>2150000</td>
<td>319000</td>
</tr>
<tr>
<td>Solar radiation / GW</td>
<td>21910</td>
<td>2392</td>
<td>170</td>
</tr>
<tr>
<td>Air conditioners / GW</td>
<td>8430</td>
<td>6450</td>
<td>95.7</td>
</tr>
<tr>
<td>Ratio</td>
<td>0.38</td>
<td>2.7</td>
<td>5.6</td>
</tr>
</tbody>
</table>
<p>I computed all of this with Mathematica; here is the notebook
<a href="https://martin-ueding.de/posts/cool-down-st-louis/Cool_down_St_Louis.nb"><code>Cool_down_St_Louis.nb</code></a>.</p>
<p>As expected, the population density is higher in the city area. So by my
assumptions, the ratio of air-conditioner heat to solar radiation is higher
there as well. In the larger metropolitan area, the air conditioners produce
only a third as much waste heat as the sun. In the city core the picture is
quite different; there, most of the waste heat would come from the air
conditioners.</p>
<p>If one assumes considerably less cooling power per person, all the ratios
improve somewhat. However, in the inner city one would have to assume well
under 300 W per person to argue that it is harmless compared to the solar
radiation. Or that only one in ten people there has an air conditioner, which I
do not consider realistic either. With shopping malls and other commercial
spaces, it might even be more per person.</p>
<p>It is hard to say how the temperature changes due to the additional heat
output. However, I could imagine that passive cooling methods (white paint,
clever construction) would give a much more pleasant urban climate than every
building having its own air conditioner.</p></div>GermanPhysicshttps://martin-ueding.de/posts/cool-down-st-louis/Thu, 02 Jul 2015 22:00:00 GMT
- Cross Trainerhttps://martin-ueding.de/posts/stromverbrauch-crosstrainer/Martin Ueding<div><p>We recently got a cross trainer here. Although the machine's very purpose is to
burn off bodily energy, you have to plug it into a wall socket. And not with
one of those small plug-in power supplies, as I would understand for the little
computer on it. The machine has a rather thick cable and a Schuko plug that
looks as if maybe (several) 100 watts could pass through it. But what for?</p>
<p>As a physicist, I naturally argue via conservation of energy. The exerciser
feeds energy into the cross-trainer system, which then has to get rid of it
somehow, most simply by converting it into heat. It can also take part of that
energy and convert it into heat usefully inside the computer, that is, power
it.</p>
<p>A bicycle dynamo converts a little of the rider's power into electricity so
that the lights can run. The rest of the rider's power is lost to friction with
the air, the road, and in the bearings.</p>
<p>The cross trainer, however, has no large air resistance and must dispose of the
energy differently. I have several ideas how one could do that. In the end, I
can probably explain why the thing needs such a fat power cable.</p>
<h2 id="mechanische-bremse">Mechanical brake</h2>
<p>Of course, one could simply install a mechanical brake. Such a brake has the
disadvantage that it wears out quickly and, above all, scales unpleasantly. A
brake based on static and sliding friction always exerts the same torque on the
axle. When the axle is at rest, the torque is even larger (static friction),
but only large enough to compensate the torque applied by the athlete. So one
first has to apply enough torque to overcome the static friction before the
system starts moving. This has nothing to do with inertia, though.</p>
<p>Regulating the brake requires mechanics, which in mass production is probably
more expensive than controlling the whole thing with electronics somehow. The
brake pads wear down, so over time they have to be readjusted and replaced.</p>
<p>Finally, measuring the power output is only possible mechanically. One would
have to measure which torque the brake actually applies to the axle and
multiply that by the rotation frequency to obtain the power.</p>
<h2 id="wirbelstrombremse-mit-permanentmagnet">Eddy-current brake with permanent magnet</h2>
<p>Many of the previous disadvantages go away if the whole thing is built as an
eddy-current brake. This kind of brake is (apart from the bearings) free of
wear. The resistance grows roughly linearly with the rotation speed, so one can
start up very gently.</p>
<p>First I consider an eddy-current brake built from a permanent magnet and a
metal plate. That way, no electrics at all are needed in the system. The
resistance can be regulated via the distance between the magnet and the metal
plate.</p>
<p>There are a few disadvantages:</p>
<ul>
<li>The regulation still requires mechanics.</li>
<li>The housing has to be wide enough that the resistance can be regulated
over a large range.</li>
<li>Magnets are probably expensive, and a rather large one is needed here.</li>
<li>The power measurement has to be carried out and calibrated mechanically at
some point, so that the braking torque can be inferred from the distance
between magnet and plate.</li>
</ul>
<h2 id="elektrischer-generator">Electrical generator</h2>
<p>My next idea is to simply use a generator, one that could produce perhaps 300
watts, as the brake. The current coming from the generator is then sent through
a suitably large variable resistor.</p>
<p>The resistor can be adjusted either mechanically or, with suitable electronics,
even digitally. Of course, one needs switching transistors that can handle the
power. Such things should exist, though; they are also needed in power
amplifiers for sound systems, and for the next brake idea as well.</p>
<p>Cost-wise, this should be feasible. One needs either a permanent magnet and an
electromagnet, or two electromagnets. When starting up, the remanence in both
parts ensures that a little current is generated. This current then immediately
acts as magnetic excitation <em>H</em> and makes the magnetic flux density
<em>B</em> rise.</p>
<p>The regulation can happen entirely with an electronically controlled resistor.
The power measurement is very easy to do electronically, directly at the
designated dissipation resistor.</p>
<h2 id="wirbelstrombremse-mit-elektromagnet">Eddy-current brake with electromagnet</h2>
<p>I fear, however, that the whole thing is built differently. None of the options
so far explains why the machine needs such a thick power cable. That they skip
the dynamo and include a small plug-in power supply instead is something I can
still understand for a cheap machine.</p>
<p>In any case, one could also build the brake as an eddy-current brake with an
electromagnet and a metal plate, so that one simply supplies the electromagnet
with current and the athlete's energy is then dumped into eddy currents in the
metal plate.</p>
<p>Electromagnets that are not built from superconductors, however, have the
problem that they need a continuous current to maintain a constant magnetic
field. With this design, one therefore has to burn electricity so that the
athlete's energy can be burned. And such an electromagnet can really eat
current if it is to be big enough. In the
<a href="http://martin-ueding.de/de/studies/bsc_physics/physik212/index.html">electromagnetism lab course</a>
we had Helmholtz coils for the fine-beam tube, and several amperes of current
went through them so that we had a sufficient magnetic field.</p>
<p>The power measurement has become harder again here, though. One now has to know
which torque the eddy-current brake exerts on the axle at which current. One
can probably take a series of measurements, describe it phenomenologically with
a polynomial of sufficient order, and store the fitted function in the computer
program.</p>
<p>With that background, it is quite puzzling that the cross trainer needs
electricity to be operated. But maybe it is entirely different and the thing
feeds power into the grid?</p></div>GermanPhysicshttps://martin-ueding.de/posts/stromverbrauch-crosstrainer/Sun, 28 Jun 2015 22:00:00 GMT
- Lagrange Exampleshttps://martin-ueding.de/posts/lagrange-examples/Martin Ueding<div><p>With Piwik, I noticed that <a href="https://martin-ueding.de/posts/euler-lagrange-equation-derivation/">my article about the Euler Lagrange
equation</a> is one of the most
popular pages on my site. During the <a href="https://martin-ueding.de/pages/physik441/">Numerik</a> course, I
created an animation for the double pendulum. Using that source code, I went
through some other simple mechanical problems that were covered in my
<a href="https://martin-ueding.de/pages/physik221/">classical mechanics lecture</a> and made animations for those.
The derivations of the differential equations are on the pages as well.</p>
<p>You can find the source code for both the numerical integration of the
differential equations, as well as the animation itself, on the project page:
<a href="https://github.com/martin-ueding/lagrange-simulator">https://github.com/martin-ueding/lagrange-simulator</a>.</p>
<script type="text/javascript" src="https://code.jquery.com/jquery-2.0.0.min.js"></script>
<script type="text/javascript" src="https://martin-ueding.de/posts/lagrange-examples/raphael-min.js"></script>
<script type="text/javascript" src="https://martin-ueding.de/posts/lagrange-examples/animation.js"></script>
<p><button onclick="start(double_pendulum);">Double Pendulum</button>
<button onclick="start(spring_pendulum);">Spring Pendulum</button>
<button onclick="start(simple_pendulum);">Simple Pendulum</button>
<button onclick="start(ball_in_cone);">Ball in Cone</button></p>
<div id="holder"></div>
<script type="text/javascript">start(double_pendulum);</script>
<!-- END_TEASER -->
<h2 id="double-pendulum">Double pendulum</h2>
<p>See <a href="https://en.wikipedia.org/wiki/Double_pendulum">this Wikipedia article</a> for
the derivation.</p>
<h2 id="spring-pendulum">Spring pendulum</h2>
<p>A mass is attached to a spring. It can swing (angle $\phi$) and change the
distance to the anchor (radius $r$). The spring is assumed to have no mass and
can be neglected in terms of kinetic energy.</p>
<p><img alt="" src="https://martin-ueding.de/posts/lagrange-examples/fbd-spring.png"></p>
<h3 id="derivation">Derivation</h3>
<p>The spring has an equilibrium position when $r = l$.</p>
<p>Kinetic and potential energy:</p>
<p>$$T = \frac m2 \left[\dot r^2 + r^2 \dot\phi^2 \right]$$$$V = - mg r \cos(\phi) + \frac k2 [r - l]^2$$</p>
<p>Lagrangian:</p>
<p>$$L = \frac m2 \left[\dot r^2 + r^2 \dot\phi^2\right] + mg r \cos(\phi) - \frac k2 [r - l]^2$$</p>
<p>Euler Lagrange equation for $r$:</p>
<p>$$\frac{\partial L}{\partial r}
= \frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot r}$$$$\frac{\partial L}{\partial r}
= m r \dot\phi^2 + mg \cos(\phi) - k [r - l]$$$$\frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot r}
= \frac{\mathrm d}{\mathrm dt} m \dot r
= m \ddot r$$</p>
<p>Euler Lagrange equation for $\phi$:</p>
<p>$$\frac{\partial L}{\partial \phi}
= \frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot\phi}$$$$\frac{\partial L}{\partial \phi} = - mg r \sin(\phi)$$$$\frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot\phi}
= \frac{\mathrm d}{\mathrm dt} m r^2 \dot\phi
= m \left[2 r \dot r \dot \phi + r^2 \ddot \phi\right]$$</p>
<p>Bring it into the form of $y' = f(y)$:</p>
<p>$$\ddot r = r \dot \phi^2 + g \cos(\phi) - \frac km [r - l]$$$$\ddot \phi = - \frac gr \sin(\phi) - 2 \frac{\dot r}r \dot\phi$$</p>
<p>The input for the ODE solver is:</p>
<p>$$\begin{aligned}
\frac{\mathrm d}{\mathrm dt}
\begin{pmatrix}
r \\ \phi \\ \dot r \\ \dot \phi
\end{pmatrix}
=
\begin{pmatrix}
\dot r \\ \dot \phi \\
r \dot \phi^2 + g \cos(\phi) - \frac km [r - l] \\
- \frac gr \sin(\phi) - 2 \frac{\dot r}r \dot\phi
\end{pmatrix}
\end{aligned}$$</p>
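<p>This first-order system can be fed to any ODE solver. Below is a minimal sketch with a hand-written fourth-order Runge-Kutta step; the parameter values are arbitrary, and conservation of energy serves as a quick consistency check:</p>

```python
import math

G, K, M, L0 = 9.81, 50.0, 1.0, 1.0  # gravity, spring constant, mass, rest length

def rhs(y):
    """Right-hand side of the spring-pendulum system y' = f(y)."""
    r, phi, rdot, phidot = y
    return [rdot, phidot,
            r * phidot**2 + G * math.cos(phi) - (K / M) * (r - L0),
            -(G / r) * math.sin(phi) - 2.0 * (rdot / r) * phidot]

def rk4_step(y, h):
    k1 = rhs(y)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def energy(y):
    """Total energy T + V; it should stay constant along the motion."""
    r, phi, rdot, phidot = y
    return (0.5 * M * (rdot**2 + r**2 * phidot**2)
            - M * G * r * math.cos(phi) + 0.5 * K * (r - L0)**2)

y = [1.2, 0.3, 0.0, 0.0]  # start stretched and tilted, at rest
e0 = energy(y)
for _ in range(2000):     # integrate to t = 2 with h = 0.001
    y = rk4_step(y, 0.001)
print(energy(y) - e0)     # energy drift stays tiny
```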
<h2 id="ball-in-cone">Ball in cone</h2>
<p>A ball rolls within a circular cone. Friction and the moment of inertia of the
ball are neglected completely. You can think of the ball as a point mass.</p>
<p><img alt="Diagram of the ball in a cone" src="https://martin-ueding.de/posts/lagrange-examples/fbd-cone.png"></p>
<h3 id="derivation_1">Derivation</h3>
<p>Kinetic and potential energy:</p>
<p>$$T = \frac m2 \cdot (\dot r^2 + r^2 \dot\phi^2 + \dot z^2)$$$$V = mgz$$</p>
<p>Holonomic constraint:</p>
<p>$$r - z \tan(\alpha) = 0$$</p>
<p>To increase readability, I will use $\beta := \tan(\alpha)$.</p>
<p>I put that constraint into the energies and obtain a Lagrangian without $r$:</p>
<p>$$L = \frac m2 \cdot \left((\beta^2 + 1) \dot z^2 + \beta^2 z^2 \dot\phi^2 \right) - mgz$$</p>
<p>Euler Lagrange equation:</p>
<p>$$\frac{\partial L}{\partial z} = \frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot z}$$$$\frac{\partial L}{\partial z}
= m \beta^2 z \dot\phi^2 - mg$$$$\frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot z}
= \frac{\mathrm d}{\mathrm dt} m \cdot (\beta^2 + 1) \dot z
= m \cdot (\beta^2 + 1) \ddot z$$$$m \beta^2 z \dot\phi^2 - mg
= m \cdot (\beta^2 + 1) \ddot z$$$$\frac{\partial L}{\partial \phi} = \frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot\phi}$$$$\frac{\partial L}{\partial \phi} = 0$$$$\frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot\phi}
= \frac{\mathrm d}{\mathrm dt} m \beta^2 z^2 \dot\phi
= m \beta^2 \cdot (2 z \dot z \dot\phi + z^2 \ddot\phi)$$$$0 = m \beta^2 \cdot (2 z \dot z \dot\phi + z^2 \ddot\phi)$$</p>
<p>Bring it into the form of $y' = f(y)$:</p>
<p>$$\ddot z = \frac{\beta^2 z \dot\phi^2 - g}{\beta^2 + 1}$$$$\ddot \phi = - 2 \frac{\dot z}{z}$$</p>
<p>The input for the ODE solver is:</p>
<p>$$\begin{aligned}
\frac{\mathrm d}{\mathrm dt}
\begin{pmatrix}
z \\ \phi \\ \dot z \\ \dot \phi
\end{pmatrix}
=
\begin{pmatrix}
\dot z \\ \dot \phi \\
\frac{\beta^2 z \dot\phi^2 - g}{\beta^2 + 1} \\
- 2 \frac{\dot z}{z} \dot\phi
\end{pmatrix}
\end{aligned}$$</p>
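<p>Since $\phi$ does not appear in the Lagrangian, its conjugate momentum $p_\phi = m \beta^2 z^2 \dot\phi$ is conserved, which gives a handy check for the integration. A sketch with arbitrary parameters and a simple Runge-Kutta step:</p>

```python
import math

G, BETA = 9.81, math.tan(0.6)  # gravity; β = tan(α) for a cone angle α = 0.6 rad

def rhs(y):
    """Right-hand side of the ball-in-cone system y' = f(y)."""
    z, phi, zdot, phidot = y
    return [zdot, phidot,
            (BETA**2 * z * phidot**2 - G) / (BETA**2 + 1.0),
            -2.0 * (zdot / z) * phidot]

def rk4_step(y, h):
    k1 = rhs(y)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def p_phi(y):
    """Conserved angular momentum p_φ = m β² z² φ̇ (with m = 1)."""
    z, _, _, phidot = y
    return BETA**2 * z**2 * phidot

y = [1.0, 0.0, 0.0, 4.0]  # start at z = 1 with some angular velocity
p0 = p_phi(y)
for _ in range(2000):     # integrate to t = 2 with h = 0.001
    y = rk4_step(y, 0.001)
print(p_phi(y) - p0)      # conserved up to integration error
```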
<h2 id="simple-pendulum">Simple pendulum</h2>
<p>A mass is attached to a string, which is anchored in the ceiling. The string is
assumed to be massless. Also, there shall be no friction.</p>
<p><img alt="" src="https://martin-ueding.de/posts/lagrange-examples/fbd-simple.png"></p>
<h3 id="derivation_2">Derivation</h3>
<p>Kinetic and potential energy:</p>
<p>$$T = \frac m2 l^2 \dot\phi^2$$$$V = - mg l \cos(\phi)$$</p>
<p>Lagrangian:</p>
<p>$$L = \frac m2 l^2 \dot\phi^2 + mg l \cos(\phi)$$</p>
<p>Euler Lagrange equation:</p>
<p>$$\frac{\partial L}{\partial \phi} = \frac{\mathrm d}{\mathrm dt} \frac{\partial L}{\partial \dot\phi}$$$$- mg l \sin(\phi) = ml^2 \ddot \phi$$</p>
<p>Bring it into the form of $y' = f(y)$:</p>
<p>$$\ddot \phi = - \frac gl \sin(\phi)$$</p>
<p>The input for the ODE solver is:</p>
<p>$$\begin{aligned}
\frac{\mathrm d}{\mathrm dt}
\begin{pmatrix}
\phi \\ \dot \phi
\end{pmatrix}
=
\begin{pmatrix}
\dot \phi \\ - \frac gl \sin(\phi)
\end{pmatrix}
\end{aligned}$$</p></div>EnglishPhysicshttps://martin-ueding.de/posts/lagrange-examples/Fri, 13 Sep 2013 22:00:00 GMT
- Math Abusehttps://martin-ueding.de/posts/math-abuse/Martin Ueding<div><p>When I showed a mathematician some of the homework problems I had done, he was
a little shocked. So I went on and looked for other notations that physicists
use differently or even naively.</p>
<!-- END_TEASER -->
<h2 id="multiple-integrals">Multiple Integrals</h2>
<p>This is a multiple integral in the regular notation:</p>
<p>$$\int_0^R \int_0^\pi \int_0^{2\pi} f(r, \theta, \phi) \, \mathrm d\phi \,
\mathrm d\theta \, \mathrm dr$$</p>
<p>Theoretical physicists often use the following notation:</p>
<p>$$\int_0^R \mathrm d r \int_0^\pi \mathrm d \theta
\int_0^{2\pi} \mathrm d \phi \, f(r, \theta, \phi)$$</p>
<p>The integral is not $\int 1 \, \mathrm dx$; rather, you integrate everything
<em>after</em> the $\mathrm dx$! The advantage is that, just like with a summation
sign, you can see the bounds right away, and you can swap integrals more easily.</p>
<h2 id="summation-convention">Summation Convention</h2>
<p>Let me start with the regular inner product. A mathematician would write it
either $\langle v, w \rangle$ or $(v, w)$.</p>
<p>Physicists might use the mathematician's notation, or write $\vec v \cdot \vec
w$, since we write vector arrows (or bold vectors, then $\boldsymbol v \cdot
\boldsymbol w$). I prefer bold vectors by now, so I will continue to use them.</p>
<p>If you assume a particular basis for your vector space, then you can access the
components of your vectors with upper index, if they are regular
(contravariant) vectors: $v^i$. If you transpose the vector, it will become a
covector (covariant) and has a lower index: $v_i$. If your "transpose" is a
"complex conjugate transpose", the inequality $v^i \neq v_i$ holds in the
general case.</p>
<p>With that in mind, you can write the scalar product like so:</p>
<p>$$\langle \boldsymbol v, \boldsymbol w \rangle
= \boldsymbol v^{\mathrm T} \boldsymbol w
= \sum_i v_i w^i$$</p>
<p>And physicists like to omit the summation sign and just write $v_i w^i$ for the
scalar product.</p>
<p>The transpose is induced by a metric tensor $\eta$, so the transpose works like
this: $x_i = \eta_{ij} x^j$.</p>
<p>As a corollary, the physicist distinguishes between Latin and Greek indices in
the summation. A Latin index $i$ means $\sum_{i=1}^3$, whereas a Greek index
$\mu$ means $\sum_{\mu=0}^3$. This is important for the theory of special
relativity, where the zeroth coordinate is the time. That way, one can
construct the Laplace ($\triangle$) and d'Alembert ($\Box$) operators:</p>
<p>$$\begin{aligned}
\triangle &= \partial_i \partial^i
= \sum_{i=1}^3 \frac{\partial^2}{\partial x_i^2} \\
\Box &= \partial_\mu \partial^\mu
= \eta^{\mu\nu} \partial_\mu \partial_\nu
= \frac{\partial^2}{\partial t^2} - \triangle
\end{aligned}$$</p>
<p>Where I have used the metric tensor of special relativity:</p>
<p>$$\begin{aligned}
\eta = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}
\end{aligned}$$</p>
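<p>The index gymnastics can be made concrete in a few lines: lower one index with $\eta$, then contract. A sketch; the sample four-vector is made up:</p>

```python
ETA = [1.0, -1.0, -1.0, -1.0]  # diagonal of the metric tensor η

def lower(x):
    """x_i = η_ij x^j for the diagonal metric."""
    return [e * xi for e, xi in zip(ETA, x)]

def minkowski_product(v, w):
    """v_μ w^μ, summed over μ = 0..3."""
    return sum(vl * wi for vl, wi in zip(lower(v), w))

v = [2.0, 1.0, 0.0, 0.0]
print(minkowski_product(v, v))  # 2² - 1² = 3
```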
<h2 id="separation-of-variables">Separation of Variables</h2>
<p>Say a physicist is given the following ordinary differential equation: $f'(x) =
f(x)$. He might do the following:</p>
<p>$$\begin{aligned}
\frac{\mathrm df}{\mathrm dx} &= f \\
\mathrm d f &= \mathrm d x \, f \\
\frac{\mathrm d f}{f} &= \mathrm d x \\
\int \frac{\mathrm d f}{f} &= \int \mathrm d x \\
\ln(f) &= x + C \\
f(x) &= C \exp(x)
\end{aligned}$$</p>
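<p>Sloppy or not, the result can be checked numerically: $f(x) = C \exp(x)$ does satisfy $f' = f$, as a central finite difference confirms (a quick sketch; $C$, $x$, and $h$ are arbitrary):</p>

```python
import math

# Check that f(x) = C exp(x) solves f' = f, using a central finite difference.
C, x, h = 2.5, 0.7, 1e-6
f = lambda t: C * math.exp(t)
derivative = (f(x + h) - f(x - h)) / (2 * h)
print(derivative, f(x))  # the two agree
```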
<h2 id="symbol-overloading">Symbol overloading</h2>
<p>In some programming languages, it is possible to overload a function with
different arguments:</p>
<pre class="code literal-block"><span></span><code><span class="kt">int</span> <span class="nf">f</span><span class="p">();</span>
<span class="kt">int</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">);</span>
</code></pre>
<p>Those all have the same name, but the compiler will be able to distinguish that
from your arguments. <code>f(1)</code> and <code>f(1, 1)</code> will call two distinct functions.</p>
<p>The same applies for different types. So this is also possible:</p>
<pre class="code literal-block"><span></span><code><span class="kt">int</span> <span class="nf">h</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">h</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">x</span><span class="p">);</span>
</code></pre>
<p>When you call <code>h(1)</code> and <code>h("")</code>, it will pick the right function to call.
Unless the functions do roughly the same thing, this will lead to confusion
pretty easily.</p>
<p>Now physicists are doing it even worse! Say you have a function $f \colon
X \subseteq \mathbb R \to Y \subseteq \mathbb R$. Then you use $x \in X$
and write stuff like $f(x)$, which is fine. Now you create the Fourier transform,
which I will denote with a $\mathcal F$. With $\omega \in \Omega \subseteq
\mathbb R$ you could write:</p>
<p>$$g(\omega) = (\mathcal F f)(\omega) = \frac1{\sqrt{2\pi}} \int \mathrm dx \,
f(x) \exp(- \mathrm i x \omega)$$</p>
<p>This seems correct and unambiguous to me. Physicists like decorators more than
different letters, so they would write $\hat f$ instead of $g$, which should be
fine as well.</p>
<p>But even that seems to be too much writing. So they write $f(\omega)$ for the
Fourier transformed function and $f(x)$ for the original function. That means
that even if $\omega = x$, it does not follow that $f(\omega) = f(x)$, since
they are different functions! The function is overloaded, and the correct one
is chosen by the symbol you write the argument with. I think this is a gross
violation of scoping.</p>
<p>I have seen a case where somebody wrote $f(\omega = 0)$ to make sure the reader
understands that the transformed function is meant. Just write $\hat f(0)$!</p></div>EnglishMathNotationPhysicshttps://martin-ueding.de/posts/math-abuse/Wed, 31 Jul 2013 22:00:00 GMT
- Derivation of the Euler-Lagrange-Equationhttps://martin-ueding.de/posts/euler-lagrange-equation-derivation/Martin Ueding<div><p>We would like to find a condition on the Lagrange function $L$ such that
its integral, the action $S$, becomes extremal.</p>
<!-- TEASER_END -->
<p>For that, we change the coordinate $q(t)$ by a small variation
$\epsilon \eta(t)$, where $\epsilon$ is infinitesimal. Additionally,
$\eta(t_1) = \eta(t_2) = 0$ has to hold, so that the endpoints stay fixed. The
integral of the Lagrange function becomes:</p>
<p>$$S = \int_{t_1}^{t_2} L\left(q(t) + \epsilon \eta(t), \dot q(t) + \epsilon
\dot \eta(t), t \right) \, \mathrm dt$$</p>
<p>This should be extremal with respect to $\epsilon$. So we differentiate with
respect to $\epsilon$ and set the result equal to $0$:</p>
<p>$$\frac{\mathrm d}{\mathrm d \epsilon} \int_{t_1}^{t_2} L\left(q(t) +
\epsilon \eta(t), \dot q(t) + \epsilon \dot \eta(t), t \right) \, \mathrm
dt = 0$$</p>
<p>This total derivative is evaluated with the chain rule: we need the partial
derivatives of $L$ with respect to its first two arguments, multiplied by the
derivatives of $q(t) + \epsilon \eta(t)$ and $\dot q(t) + \epsilon \dot \eta(t)$
with respect to $\epsilon$, which are $\eta$ and $\dot \eta$. This gives:</p>
<p>$$\int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q} \eta + \frac{\partial
L}{\partial \dot q} \dot \eta \right) \, \mathrm dt = 0$$</p>
<p>For the second summand, we integrate by parts:</p>
<p>$$\int_{t_1}^{t_2} \frac{\partial L}{\partial \dot q} \dot \eta(t) \, \mathrm
dt = \underbrace{\left[ \frac{\partial L}{\partial \dot q}
\eta\right]_{t_1}^{t_2}}_{=0} - \int_{t_1}^{t_2} \frac{\mathrm d}{\mathrm
dt} \frac{\partial L}{\partial \dot q} \eta(t) \, \mathrm dt$$</p>
<p>The boundary term is equal to $0$ since $\eta$ vanishes at the boundary
points $t_1$ and $t_2$. Therefore, only the last term remains:</p>
<p>$$\int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q} \eta(t) -
\frac{\mathrm d}{\mathrm d t} \frac{\partial L}{\partial \dot q} \eta(t)
\right) \, \mathrm dt = 0$$</p>
<p>Now we can factor out $\eta(t)$. By the fundamental lemma of the calculus of
variations, the integral vanishes for all variations $\eta(t)$ iff the
expression in parentheses vanishes.</p>
<p>$$\int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q} - \frac{\mathrm
d}{\mathrm d t} \frac{\partial L}{\partial \dot q} \right) \eta(t) \,
\mathrm dt = 0$$</p>
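<p>As a quick sanity check of the resulting condition (a standard textbook
example, not part of the original derivation), take a particle in a potential,
$L = \frac{m}{2} \dot q^2 - V(q)$:</p>
<p>$$\frac{\partial L}{\partial q} = -V'(q) \,, \qquad \frac{\mathrm d}{\mathrm
d t} \frac{\partial L}{\partial \dot q} = m \ddot q \,, \qquad \text{so the
parentheses vanish iff} \quad m \ddot q = -V'(q) \,,$$</p>
<p>which is Newton's second law.</p>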
<p>We obtain the Euler-Lagrange equation:</p>
<p>$$\frac{\partial L}{\partial q} - \frac{\mathrm d}{\mathrm d t}
\frac{\partial L}{\partial \dot q} = 0$$</p></div>EnglishPhysicshttps://martin-ueding.de/posts/euler-lagrange-equation-derivation/Tue, 11 Jun 2013 22:00:00 GMT