it seemed profound at the time: Nonlinear Function
Created: August 28, 2021
Modified: August 28, 2021

it seemed profound at the time

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Notes copied from Google Docs (https://docs.google.com/document/d/1G7Gxo-A3gQrlUx3G4BYmH-GjUWbB7eTOJZcuAs2ii0A/edit). Most of these things will mean something only to me.

  • 8/2/2008
  • it seems that one problem of thinking about things analytically is that even brilliant, profound conclusions can blind you to other simple aspects of how things work, so that your unique insight (even if true) actually makes you less capable of dealing with the real world. for example, the ai/computational idea of mind, that all information can come through a random search process. this can lead to visions of some brute force generator cranking away inside people's minds, not contributing to their consciousness except for when it stumbles upon a promising thought. clearly this is not what happens. the process isn't random; ideas are formed in response to external events, like when you go to the therapist/doctor/grocery store/etc. and someone says something that activates certain connections, which end up leading to something that you recognize as having the potential for a good cartoon. this is, of course, obvious, except when you're too blinded by other ideas. when all you've got is a hammer, everything looks like a nail.
  • 11/27/2008 - roughly 3am
  • reading The Annotated Turing - I finally have a proof of the non-computability of the halting problem that I can understand and hold in my mind all at once. Turing points out that the computable numbers are enumerable because we can encode turing machines as natural numbers, which are of course enumerable. For any particular enumeration of Turing machines, given a particular natural number, it either (a) doesn't map to a valid turing machine, (b) maps to a machine which eventually halts, or (c) maps to a non-halting machine. Case (a) is easy to identify since the transformation between numbers and machine descriptions is purely typographical, and we could easily check to see if the results make sense.
  • Now suppose we had some method to distinguish between cases (b) and (c), i.e. machines which eventually halt and machines which don't halt. Then by simply iterating through the natural numbers and throwing away all invalid machines as well as all machines which eventually halt, we are left with a list of non-halting machines. For simplicity use turing's idea of "circle-free" instead of real non-halting-ness, thus restricting our machines to those which forever continue making forward progress in computing some real number (it remains to show that the ideas are equivalent)*. Since each non-halting (circle-free) machine generates a computable number, we could use this list of machines to generate a table of the computable numbers to any desired degree of precision. Given this table we could apply Cantor's diagonalization method to create a number B which was not in the table because it differs by one digit from every number in the table, and since the table lists all computable numbers and B is not in the table, B could not be computable - even though we just defined an algorithm to compute it! (i.e. the algorithm is to generate the table and then, for each digit of B, choose 0 if the corresponding diagonal is 1 and 1 if the diagonal is 0). Since this is a contradiction, we must have done something wrong somewhere. The process of diagonalization is clearly computable, since to generate the nth digit of B we just need to run the nth machine long enough to generate n digits (which we know it will eventually do since we've only chosen machines that print an infinite number of digits). So, it must be the process of generating the table which is not computable. To generate the table, we enumerated the natural numbers, chose those which represent valid turing machines, and then further chose only those which represent "circle-free" machines. Since enumerating the natural numbers is clearly computable, and discarding invalid machines is also clearly computable, we are left to conclude that identifying circle-free machines cannot be computable.
    • Now how to relate "circle-free" machines, which must adhere to Turing's strict conventions, to the general idea of non-halting machines? We reduce CIRCLE-FREE to HALT and thus show that if we could decide HALT, we could decide CIRCLE-FREE (this is my reduction, not turing's, since it seems he didn't really think about the halting problem in its current form). Given any (arbitrary) machine M, we define a machine M' which halts iff M is not circle-free. First we define a utility machine F:
  • Define F(M, c):
    1. Simulate M starting from configuration c, and halt if and when M ever writes a digit to a new F-square.
  • Then we define M' to simulate M, and at each configuration c, halt if M has just broken one of Turing's conventions, i.e. if it has printed a digit other than 0 or 1 to an F-square or changed an F-square which had been previously written to, OR if F(M,c) does not halt (which we can of course decide since we're assuming we can decide the halting problem).
  • Clearly if M' halts, it is because M has either broken a convention or entered a hopeless state in which it will never write any further F-values, so M' halting implies M circular. Also clearly, since M' will halt if M ever breaks a convention or enters a hopeless state, the fact that it does not halt on any particular M means that M must be circle-free, i.e. we know it never breaks conventions or enters a hopeless state. Thus by deciding HALT, we can decide CIRCLE-FREE. Since that's impossible as shown above, the halting problem is undecidable.
  • (btw, we can also reduce HALT to CIRCLE-FREE, which is much simpler - define a machine, essentially Turing's U, which simulates M and prints its successive configurations, but enters a circular state if M ever halts).
  • q: the set of non-halting machines is enumerable in terms of size (i.e. it has the same cardinality as the natural numbers) but not in terms of computability, i.e. we can never actually perform the enumeration. What other sets are like that?
  • after some further reading, Turing's other proof of the non-decidability of the CIRCLE-FREE problem is interesting too. Imagine that there exists some circle-free machine D which decides if an input machine M is circle-free by running for a finite number of steps and then outputting some marker. Then we can define a circle-free machine H which iterates over the natural numbers and for the nth number, calls D to decide whether it represents a circle-free machine (this is similar to what we required above), and if so, outputs the R(n)th digit to the next F-square based on the result (where R(n) is the number of circle-free machines found so far, so we are essentially computing the number B above, or more precisely its negative image). Eventually H will reach its own description number K and must decide whether it represents a circle-free machine. The answer can't be "no" since H is defined to be circle-free, but the answer can't be "yes", because if the answer were "yes", the next step in H's operation would be to simulate itself for the first R(K) digits so that it could output the R(K)th digit to its tape. This works fine for the first R(K-1) digits but on the R(K)th it reaches its own number again, and so must begin the simulation again, and so on - this is an infinite loop, which means that H is circular!
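  • A minimal Python sketch of the diagonalization step in this entry. The decider is_circle_free is the hypothetical ingredient the argument assumes and then refutes, and digit(n, i) stands for "the i-th binary digit printed by machine n", which is computable once n is known to be circle-free; both names are made up here for illustration.

```python
def compute_B(num_digits, is_circle_free, digit):
    """Diagonalize against an enumeration of circle-free machines.

    is_circle_free(n): the HYPOTHETICAL decider; the argument above shows
                       that no such procedure can exist.
    digit(n, i):       the i-th binary digit printed by machine n, which is
                       computable once n is known to be circle-free.
    """
    B = []
    n = 0                                   # candidate description number
    while len(B) < num_digits:
        n += 1
        if is_circle_free(n):               # the impossible step
            B.append(1 - digit(n, len(B)))  # differ on the diagonal digit
    return B
```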
  • 12/17/2008
  • a thought on AI and the distinction between search and production. I wonder if this sort of framing is "out there" already; maybe everyone considers it obvious.
  • pretty much anything can be framed as a "search" problem. for example, conversation: if I want to know what the best next word is to say at a given point, I could search through the space of all possible words and use some fitness function to rate each one for how well it would serve my goals in the conversation (designing such a function would of course be hard). but this isn't what people actually do, even subconsciously (is this true? could it be false? do people actually consider every word they know every time they speak, in some massively parallel fashion? even if so, there's no way they could search the "game tree" very deeply) - instead we have some system which produces sentences based on input. I think this is part of what makes us intelligent.
  • Current chess programs use only search-based strategies - i.e. at least initially they consider every possible move (is this true?). They have success with this because the search space and evaluation function for chess are much more tractable than for conversation, but it's still terribly expensive and no one suggests that this is how humans naturally play. I rather doubt that a human player even subconsciously considers every possible move in a chess game. Instead we have some sort of "production" system (presumably developed through a learning process) maybe similar to that of conversation, which comes up with our next move, or sometimes a list of candidate moves. Figuring out how to duplicate that is a major challenge of AI (a toy contrast of the two framings is sketched at the end of this entry).
  • I wrote this without looking at the 8/2 entry. funny how similar they are.
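  • A toy contrast of the two framings (the vocabulary, the fitness function, and the production rules below are all invented for illustration; this is just to make the distinction concrete):

```python
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def fitness(context, word):
    # stand-in for some expensive evaluation of how well `word` would serve
    # the speaker's goals after `context`
    return -abs(len(word) - len(context.split()[-1]))

def next_word_by_search(context):
    # "search" framing: score every word in the vocabulary and take the best
    return max(VOCAB, key=lambda w: fitness(context, w))

PRODUCTION = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def next_word_by_production(context):
    # "production" framing: a learned/compiled mapping proposes a word
    # directly, without ever scoring the alternatives
    return PRODUCTION.get(context.split()[-1], "the")

print(next_word_by_search("the cat"), next_word_by_production("the cat"))
```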
  • 12/18/2008
  • learning (at least in fuzzy subjects like philosophy) is in some ways like a lossy compression algorithm. a philosopher says lots of things, producing mountains of text using some small internal core of thought. since the "program" to produce their writings is smaller than the writings themselves (this is debatable, but assume that a philosopher can produce an arbitrary amount of text given enough time), there is information redundancy in the writings. when you read said writings, you create some small internal representation of what they say (i'm calling it "small" because it generally takes less space to store than the full text). you're judged in your understanding of the writings by how well the output of your representation - i.e. how well you can explain the author's views - matches what the original philosopher said (this is also only partly true - you're also judged by your ability to connect his ideas to other ideas you know about, argue against them, etc., which are other aspects of intelligence albeit maybe with some similarities. not sure how or if that fits in here). if your representation produces the exact same, or similar results*, then it seems plausible to assume that your internal representation is similar to his, and that you "understand" the other person. this is partly because in order to compress their large corpus of writings down to a small mental core which nonetheless keeps all the essentials, you must somehow be taking advantage of the fundamental structure of the information in the philosopher's writings. meanwhile, if you don't understand a philosopher, i.e. if everything they say seems like random gibberish to you, then presumably you don't see that fundamental structure and so you won't be able to effectively compress their writings.
    • you wouldn't expect word-for-word recreations because even the original philosopher doesn't repeat himself word-for-word, so even if you had perfectly reproduced his mechanism you wouldn't produce the same text, just text which cannot be differentiated from that which he would have produced. the question then becomes: how could someone judge your understanding, i.e. how similar your output is to the original, unless they themselves had a perfect understanding of whatever underlying regularities to look for?
  • for any given turing machine/input combination (really equivalent to just a given machine since I think we can limit ourselves to those which work on the empty string) you can construct a formal system which has the machine's initial configuration (the input, the start state, with the head at the left of the tape) as an axiom and the state transitions of the machine as transition rules. then every possible configuration of the machine is a "theorem" of the system. presumably some machines correspond to systems which can represent arithmetic, and some do not (e.g. the machine which immediately halts probably does not). what are the differences between these two categories of machines? can we decide which category a given machine is in? presumably a machine representing arithmetic must have an infinite number of theorems, so it cannot halt. a brief moment of doubt - how good is our paradigm for converting machines to formal systems? is it actually possible for any system corresponding to a tm as described here to really represent arithmetic? it seems as though it should be, for example the machine which begins with arithmetical axioms and then methodically proves theorems forever - I forget exactly what it means to "represent" arithmetic in the godel context, but it seems like such a machine would have to.
  • if a machine's corresponding system is as powerful as arithmetic, then godel says it contains true theorems which can never be proven. what does this tell us about turing machines? a provable theorem is a configuration of the machine which will eventually be reached, so Godel's G will never be reached. then what does it mean for G to be "true"? presumably at least one such G somehow corresponds to the statement "I will never be reached during the course of machine M's computation"  but how exactly does it do this? what is the nature of the correspondence? this isn't actually a profound question, just an instance of my limited understanding/recollection of godel-related matters.
  • turing and rice tell us we can never decide, in general, anything really interesting about turing machines. this includes, again in general, the issue of whether a given machine will ever reach a given configuration. if a system represents arithmetic, the corresponding machine cannot halt, whereas the machine for a system not representing arithmetic may or may not halt (is this true? maybe "represents arithmetic" is exactly the halting problem, but again I need to figure out exactly what "represents arithmetic" means formally). thus deciding "represents arithmetic" (hereafter RA) would decide at least some cases of the halting problem. (side question: can we do the reverse mapping, and for any given formal system create a machine whose computational states are exactly the theorems of the system? probably not unless we redefine our model of TM or limit our definition of formal system, since for example an arbitrary system might have rules involving changing multiple characters in a single step, which standard TMs cannot do.). i forget exactly where i was going with this, but it seemed interesting at the time.
  • again not a profound question, but something that i should investigate to clarify my understanding of godel. Can G be shown undecidable within the PM system? If no, in what system is the proof done? if yes, why couldn't we just add a new derivation rule that allows us to conclude, from proof that a proposition is undecidable, the proposition itself? the obvious answer is that an undecidable proposition could be either true or false, e.g. both G and ~G are undecidable but G is true and ~G is thus false. How do I reconcile this with the intuitive idea that any undecidable proposition must be true since being false implies the existence of a counterexample, which implies decidability? is the relevant distinction between universally quantified propositions like G, and existentially quantified props like ~G? but it can't just be that, since if it were we could decide an undecidable proposition just by looking at its quantifier. probably the answer is that being false doesn't imply a counterexample - only problems in coNP have practical counterexamples.
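  • A toy version of the machine-as-formal-system idea above: the start configuration is the axiom, the transition table is the derivation rule, and the "theorems" are the reachable configurations. The particular 2-state machine below is an arbitrary choice for illustration.

```python
# transition table: (state, symbol) -> (symbol to write, move, next state)
DELTA = {
    ("A", "0"): ("1", "R", "B"),
    ("A", "1"): ("1", "L", "B"),
    ("B", "0"): ("1", "L", "A"),
    ("B", "1"): ("1", "R", "H"),   # "H" = halt
}

def step(config):
    """One derivation step; config is (state, tape-as-dict, head position)."""
    state, tape, pos = config
    if state == "H":
        return None                # nothing further is derivable
    write, move, nxt = DELTA[(state, tape.get(pos, "0"))]
    tape = dict(tape)
    tape[pos] = write
    return (nxt, tape, pos + (1 if move == "R" else -1))

def theorems(axiom, limit=20):
    """Enumerate the machine's configurations, i.e. the system's theorems."""
    cfg, out = axiom, []
    while cfg is not None and len(out) < limit:
        out.append(cfg)
        cfg = step(cfg)
    return out

for t in theorems(("A", {}, 0)):   # the axiom: start state, blank tape, head at 0
    print(t)
```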
  • 12/20/2008
  • in psych we can identify different "fundamental" personality traits by giving questionnaires and then using principal component analysis (or related stat techniques) to find a few dimensions which explain most of the variance. can we similarly identify different fundamental types of intelligence? proposals for separate mathematical, musical, naturalistic, social, etc. intelligences seem plausible but the categories seem arbitrary and a little bit bullshit. something more rigorous is in order. the problem is that intelligence is a difficult trait to get the proper sort of measurements of, or even to define properly - is intelligence the ability to do something, the ability to do something given sufficient training, or something else? the first is easier to test but less satisfying as a definition. the second is hard to test, especially because to get the data required you'd have to test a lot of people in a lot of areas - i.e. give them music instruction and see how they improve, give them math instruction and see how well they improve, repeat for every area you think could possibly be involved in a definition of some subtype of intelligence, i.e. basically every possible aspect of human competence. that would take more than a lifetime (and of course the order probably matters, e.g. teaching math before music might give better results than vice versa)
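  • The questionnaire half of this, sketched with plain numpy on synthetic data (the two latent traits are baked in by construction, so this only shows the mechanics of finding a few dimensions that explain most of the variance, not anything about real intelligence):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy questionnaire: 200 respondents x 10 items, driven by 2 latent traits
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.3 * rng.normal(size=(200, 10))

Xc = X - X.mean(axis=0)            # center each item
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)    # fraction of variance per component
print(np.round(explained, 3))      # the first two components dominate
```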
  • 12/21/2008
  • given a formal system which produces an infinite set of theorems, can we come up with some formal definition approximating how "central" or "important" a particular theorem is, in terms of how easily it can be used to prove other theorems (or maybe just other "important" theorems?) e.g. the fundamental theorems of arithmetic, algebra, and calculus are important. the question is - is importance or centrality a fundamental attribute of a theorem within the system which humans just discover, or is it something arbitrary that humans attribute to theorems based only on our own cognitive biases? if the former, and we can define it, we could use such a definition to guide automated theorem-provers. if the latter, can we model it?
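  • One possible toy formalization: build a dependency graph ("this result's proof uses that result") and run something PageRank-like over it, so that results leaned on by many other well-used results score high. The graph below is invented for illustration.

```python
# uses[x] = results that the proof of x depends on (made-up toy graph)
uses = {
    "thmA":   ["fta", "lemma1"],
    "thmB":   ["fta"],
    "thmC":   ["fta"],
    "fta":    ["lemma1", "lemma2"],
    "lemma1": [],
    "lemma2": [],
}

def importance(graph, damping=0.85, iters=100):
    """PageRank-style score: a result is central if central results use it."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, deps in graph.items():
            if deps:
                for d in deps:
                    new[d] += damping * rank[n] / len(deps)
            else:                  # results with no listed dependencies
                for m in nodes:    # spread their weight evenly
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

for name, score in sorted(importance(uses).items(), key=lambda kv: -kv[1]):
    print(f"{name:7s} {score:.3f}")
```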
  • 12/24/08
  • similar to previous - are complexity classes (P, NP, IP, etc.) invented or discovered? Assuming that the definitions of complexity classes can be formalized, which seems safe, then by enumerating all possible strings we can get formalizations of every possible complexity class (and a bunch of things that aren't complexity classes - it seems as though differentiating these should be a simple matter of syntax?). assuming that we can indeed enumerate all possible complexity classes, then what we're really doing when we come up with something like P or IP is identifying a class from the pre-existing space and saying that it's somehow interesting - that certain desirable properties hold for it (e.g. P is very robust, much more so than TIME(n^2)), or that its definition has some kind of psychological salience (whatever that means) e.g. the concept of interactive proofs. so it's all very Platonic in the sense that the complexity classes exist before we identify them (I guess this is really true of any mathematical statement or structure following this logic, in the sense that we can construct a program which will eventually enumerate and thus "contain" them), but the question still remains of whether what we're identifying is some fundamental, human-independent property of what makes complexity classes interesting, or is it again just an arbitrary side effect of the biases of human cognition? given that a mechanistic worldview presumably allows us to formalize the mind, those two possibilities seem to differ really only in simplicity - the question becomes whether there is some formal way to identify "interesting" complexity classes (or mathematical theorems, etc.) which is substantially simpler than simulating the brain of a mathematician (or even the world of mathematicians).
  • 1/2/09
  • Hofstadter (in "I am a Strange Loop") says something along the lines of, "once a machine reaches a certain level of complexity it becomes universal". Is this literally true, or was he just speaking colloquially? How exactly do we define a "universal" machine? Is it sufficient to be able to produce any computable number? (it's clearly necessary!) Is there a more intuitive formal definition based around the ability to simulate arbitrary machines? However we formulate the definition, is it decidable whether an arbitrary machine is universal? (intuition says probably not) Whatever definition we use (presumably they're all equivalent but some might be easier to work with than others), what can we then say about patterns of occurrence of universal machines? If we accept description length (number of states) as an approximation of "complexity", does the probability that a machine is universal increase as length increases? It almost certainly does; it can't decrease since every "short" universal machine corresponds to many "long" universal machines with lots of useless states added. Does the probability of universality approach 1 as length goes to infinity? At any length there are a lot of machines which can't be universal, e.g. the machine in which every state loops back on itself. So it's not true that there is a certain length past which all machines become universal.
  • What if we use a real measure of complexity, like Kolmogorov-Chaitin (what are the other options?), instead of just length? Is it then true that all machines above a certain level of complexity must be universal? If not, how does the probability of a machine being universal increase as complexity increases?
  • 3/15/09
  • Our recordings of sound are essentially one-dimensional (time), while our recordings of video are essentially three-dimensional (x, y, time). what's up with that?
  • 4/14/09
  • One idea of intelligence - the shortest program which reproduces a given string is probably the one doing so in the most "intelligent" way - think writing out an encyclopedia letter by letter, vs. gzipping it, vs. having some smaller conceptual understanding.
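  • A crude concrete version of the scale: raw length vs. gzip-style compression, which is an upper bound on the length of the shortest program and nowhere near a conceptual understanding. The example texts are arbitrary.

```python
import random
import zlib

texts = {
    "letter by letter":  b"a" * 30000,
    "repeated sentence": b"the quick brown fox jumps over the lazy dog. " * 650,
    "random bytes":      bytes(random.randrange(256) for _ in range(30000)),
}
for name, data in texts.items():
    c = len(zlib.compress(data, 9))
    print(f"{name:18s} raw={len(data):6d}  compressed={c:6d}  ratio={c / len(data):.3f}")
```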
  • 4/21/09
  • What's the network structure of mathematics? By which I mean: Suppose we had a graph with nodes representing all (known) mathematical results (in ZFC, say), with edges connecting those that could be derived from each other via a single step (where a step is probably an application of an axiom). What does this network look like in terms of structure - clustering, connectedness, etc.? Could we redefine "step" as something bigger than an axiom - if so, what? (motivation for this being that sometimes there are two very nice results which are linked in a beautiful way by another result, but since these might be high-level results they could be separated by many axioms and so the axiom-level graph wouldn't necessarily show this relationship well).
  • 5/5/09
  • Train a supervised classifier to predict (heuristically) truth/falsity of mathematical theorems. People have some intuitions about what is true or false - this must come from some abstract, fuzzy mental model, which can be simulated by a machine classifier. Since we develop our intuitions from looking at high-level statements (rather than raw Principia Mathematica, ZFC, etc. expressions), you'd want to train on those, perhaps even expressed in english (of course we bring a lot of our intuitions in from elsewhere too, and developing good mathematical intuition is a hard problem, even for people). Problem is that grammar matters a whole lot - can flip the truth of a statement by changing the placement of a single word (or just adding a "not").
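  • A minimal sketch of both the idea and the negation problem, using a bag-of-words classifier; the tiny dataset is made up (a real attempt would need a large corpus of statements), and the point of the final line is exactly that word-order-blind features mostly can't see the "not".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

statements = [
    "there are infinitely many primes",
    "every differentiable function is continuous",
    "every even integer greater than two is the sum of two primes",
    "there are finitely many primes",
    "every continuous function is differentiable",
    "the square root of two is rational",
]
labels = [1, 1, 1, 0, 0, 0]    # 1 = true (or at least believed true), 0 = false

vec = CountVectorizer(ngram_range=(1, 2))          # bigrams help a little
clf = LogisticRegression().fit(vec.fit_transform(statements), labels)

test = ["there are not infinitely many primes"]
print(clf.predict(vec.transform(test)))            # likely 1: the "not" is nearly invisible
```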
  • 7/7/09
  • how closely is it possible to approximate the halting number? define the halting number as a binary real number in which the nth digit is 1 iff the nth turing machine halts.  Obviously we can get the first few digits, but just as obviously we can never have a closed form for the entire thing. is there any theoretical result on where the uncrossable line is? is it just that we can only know finitely many digits? or can we know countably many (say, all the even digits)? All the halting problem says is that we can never know all of them.
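  • A toy illustration of the direction we do get: the halting number is approximable from below. The "machines" here are stand-in functions rather than a real enumeration of Turing machines; digits flip from 0 to 1 as machines are observed to halt, but a remaining 0 can never be confirmed.

```python
# toy stand-ins for "the nth Turing machine": each reports whether it has
# halted within the given number of simulated steps
def m0(steps): return steps >= 1        # halts immediately
def m1(steps): return False             # never halts
def m2(steps): return steps >= 1000     # halts after 1000 steps
def m3(steps): return False             # never halts
MACHINES = [m0, m1, m2, m3]

def halting_digits(step_budget):
    """Lower approximation: digit n becomes 1 only once machine n has been
    observed to halt within the budget; a 0 might still flip later."""
    return [1 if m(step_budget) else 0 for m in MACHINES]

print(halting_digits(10))      # [1, 0, 0, 0]
print(halting_digits(10000))   # [1, 0, 1, 0] -- more digits revealed, never all
```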
  • 8/2/09
  • The process of finding a mathematical proof requires, as far as we know, a search through an exponentially large space (in terms of the length of the proof). If an absolute layman wanted to explore some conjecture in, say, group theory - a field they knew nothing about, but whose statements they could formulate in ZFC - they could set up a (futile) computation which would test every possible proof in ZFC until it found one which proved the conjecture or its negation. An expert in group theory would still need to search that same exponentially large space, but because they are familiar with the structure of the space (at least, in the areas which are relevant to them*), and common abstractions that can be made from that structure, they can in essence transform the search into a much smaller space where the fundamental elements are not applications of the ZFC axioms, but applications of various theorems of group theory. The space is still exponential (although there is still structure remaining even after having collapsed the relevant pieces of ZFC into the theorems of group theory - hence the effectiveness of mathematical intuition; the mathematician knows which sorts of theorems and concepts tend to work together), but it is much smaller, because a proof of a group-theory proposition in group-theory terms is many orders of magnitude shorter than a proof of the same proposition in ZFC. So the strategy which mathematicians have settled on for searching an exponential space is, in essence:
  •     1) Try to find structure in that space (how can this be automated - is it a network property? read http://www.lorilorigo.net/mkm04.pdf )
  •     2) Use abstractions which exploit that structure to reduce the size of the search space
  •     3) Search the reduced space
  • with all three of these steps being massively parallelized by the community of mathematicians.
    • The structure of the entire space is not necessarily homogeneous. Even a genius polymath cannot keep in their mind everything that is known about the structure of the ZFC search space; there are too many branches (every field of math essentially carves out its own branch, which is infinitely deep and exponentially large just like the whole set (could we think of ZFC as having fractal structure? whether it's self-similar on multiple scales depends on your metric for similarity - obviously the shape of the branch for analysis is different from that for algebra, etc., and there's a lot of overlap/interconnection between branches.)). So any abstractions we use necessarily limit us to a portion of the space, because they represent a limited subset of its structure.
  • This might be one explanation of why difficult problems can often require new machinery to solve: their proofs lie in a section of the space which is not properly mapped by current abstractions.
  • When Erdos talked about "Proofs from the Book", what did he mean? Simplicity is one element of a beautiful proof, but it's not the only one. Every provable-in-ZFC statement has a shortest proof (which is even computable), but not all of those necessarily belong in the book. In some cases there may be a longer proof which "collapses" more readily, so that even though its ZFC-expansion is relatively long, the intuitive chain required to grasp it is very short because it makes use of good abstractions. This could be one measure of beauty - the ratio of the length of a proof, expressed using the best abstractions we have for it, to the length of the same proof expanded into pure ZFC. A proof which collapses significantly must make good use of abstractions, tying together already-known ideas, and so might seem more beautiful than a proof which is shorter in ZFC but doesn't provide any new insight into our known abstractions (the best might be a proof which first defines a new abstraction, and then collapses deeply through that abstraction. however, the new abstraction has to be useful and relevant to other things for that to be interesting).
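  • A toy version of the "abstractions shrink the search" point: breadth-first search in a tiny rewrite world, first with only the "axioms" (+1, x2) and then with a derived "lemma" (x2+1) available as a single step. The puzzle is invented for illustration; the point is just that the same target has a much shorter derivation once the lemma exists.

```python
from collections import deque

def bfs(start, goal, moves):
    """Return (derivation length, states expanded) for the shortest derivation."""
    depth = {start: 0}
    frontier = deque([start])
    expanded = 0
    while frontier:
        x = frontier.popleft()
        expanded += 1
        if x == goal:
            return depth[x], expanded
        for move in moves:
            y = move(x)
            if 0 < y <= 10 * goal and y not in depth:
                depth[y] = depth[x] + 1
                frontier.append(y)
    return None, expanded

AXIOMS = [lambda x: x + 1, lambda x: 2 * x]
WITH_LEMMA = AXIOMS + [lambda x: 2 * x + 1]    # a derived "macro" step

print(bfs(1, 1000, AXIOMS))       # longer derivation from raw axioms
print(bfs(1, 1000, WITH_LEMMA))   # shorter derivation once the lemma is available
```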
  • 10/5/09

  • Technically the brain is a finite automaton, not a turing machine, and the same is true for any real computer. But we treat it like a turing machine for almost all purposes. We'd like to agree with statements like "if the mind had infinite memory, it would be able to perform universal computation". obviously there are automata for which this seems less true, like a two-state parity-checker. Can this be formalized? Are there "universal automata"? What would that mean? (possible defn: automata such that, given some string of bounded length specifying a computation and then some string of bounded length specifying an input, we can then perform any operation on the input that could have been performed by some specific automaton) (another possible defn: automata such that, if we extend them with states in a certain way, so that these states can only be used as "memory" and not for new functionality (how would you do this?), they will behave like a universal turing machine in the limit as we add arbitrarily many states)
  • 10/13/09

  • One interpretation of lower bound results such as (presumably) P!=NP is as upper bounds on intelligence. If it is proved that P!=NP, then no system can ever solve NP-complete problems in polynomial time. This includes humans and iPhones, but also any future mega-AI or the result of a technological singularity. PvNP tells us something fundamental about the nature of intelligence - if they're unequal, then there are many problems for which no clever solution is possible and the only solution is essentially guess-and-check; the only virtue of any approach over any other is a bias towards exploring a certain arbitrary part of the solution space. (of course this ignores approximation algorithms)
  • 11/2/09

  • It's hard to learn much about a field from an intro class, or from reading summaries of it or the most influential works in its literature. But, one thing you do learn very, very easily is the set of important things which are not understood. For example, in cognitive science, the question of whether physics is discrete seems somewhat important. I've never been told that no one knows the answer (Feynman implies that it's not, at least in the obvious lattice sense), but since this seems like an important issue, and no one ever gives a certain answer, then it's pretty easy to infer that it's not known (or at least agreed upon), without actually knowing many of the arguments on either side.
  • 11/8/09

  • How can we talk about the computational complexity of abstraction? Consider the hardware/software boundary. For any given assembly program we could create custom hardware to instantiate it most effectively, but instead we have generic hardware which exposes an instruction set, so that we can change the software without changing the hardware and when we optimize the hardware, it helps all software instead of just a single program. Suppose that the cost of designing hardware for a program of size k is given by H(k), while the cost of designing software is given by S(k). If I have to write n programs, the cost comparison is given by (sum{i=1 to n} H(size of program i)) vs. (sum{i=1 to n} S(size of program i)) + H(standard processor design). Is there any insight to be had here?
  • One idea (stolen from Domingos): if there are n different types of processors and n different pieces of functionality we want to achieve on each of them, then without abstraction we need to write n^2 different programs. With an abstraction layer (something that allows us to write hardware-independent programs) we need only write 2n programs  - n to implement the abstraction layer on each piece of hardware, and n to implement each piece of functionality in the abstraction layer.
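  • The arithmetic of the two cost models, spelled out with placeholder unit costs:

```python
def cost_without_abstraction(n_hw, n_features, per_program=1.0):
    # one bespoke implementation for every (hardware, feature) pair
    return n_hw * n_features * per_program

def cost_with_abstraction(n_hw, n_features, per_program=1.0, layer_port=1.0):
    # port the abstraction layer once per hardware target, then write each
    # piece of functionality once against the layer
    return n_hw * layer_port + n_features * per_program

for n in (2, 5, 20):
    print(n, cost_without_abstraction(n, n), cost_with_abstraction(n, n))
# n^2 vs 2n: break-even at n = 2, and the gap widens quickly after that
```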
  • 12/16/09

  • I doubt this is original, but I have a mathematical justification for Occam's razor, inspired by Hutter/Legg. Given an input sequence, why should you guess that the sequence was generated by the shortest program that would generate such a sequence? Suppose that program is of length n, and suppose that we have a way of padding the program with gibberish from a k-symbol alphabet (this is true in most definitions of Turing machines - just add an extra state with no incoming transitions - and most programming languages). Then there are k functionally identical programs of length n+1, k^2 functionally identical programs of length n+2, and so on. If the shortest program which would compute a different sequence at any time in the future (that is, the shortest program which is meaningfully different from the one we identified) has length n+10, then the set of length n+10 programs includes k^10 functionally identical copies of our original program and only one of the new program, and indeed at any length there will be k^10 times as many copies of the first program as of the second. So if we selected a machine at random from the (infinite) space of all machines computing the observed sequence, we are much more likely to select a version of the first machine than the second.
  • One consequence of this is that the relative likelihood of a more complex explanation decreases exponentially with the amount of added complexity. If explanation 2 is m symbols longer than explanation 1, then explanation 1 will be at least k^m times as probable.
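  • The counting step made concrete (k is the padding alphabet size; this is just the argument above restated in code):

```python
def padded_copies(extra_symbols, k=2):
    """Functionally identical variants of a program obtained by appending
    extra_symbols unreachable symbols from a k-symbol alphabet."""
    return k ** extra_symbols

m = 10   # the meaningfully different explanation is m symbols longer
# at any fixed length, copies of the short program outnumber copies of the
# longer one by roughly this factor, so a uniform draw over programs
# consistent with the data favors the simpler explanation
print(padded_copies(m))   # 1024 for k = 2
```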
  • 12/16/09

  • I'm sure this has been explored as well, but I'd like to know the answer. What is the distribution of Kolmogorov complexities of strings? If I took all strings of length 1000000 (or something long enough that the constant factors involved become negligible), clearly there are some with very low complexity (e.g. 1111….) and some with very high complexity (random strings). It seems like a good portion of them should be incompressible (and I would work out how many and why if I weren't feeling really lazy right now; it shouldn't be hard), but obviously many are not. What is the distribution? that is, if I plot a chart with compressed lengths on the x-axis and the number of strings having that length on the y-axis, what does it look like?
  • UPDATE FROM MUCH LATER: obviously the vast majority of strings must be relatively incompressible. similarly, the vast majority of real numbers are uncomputable.
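  • The standard counting bound behind the update, in code: there are fewer than 2^(n-c) descriptions shorter than n-c bits, so less than a 2^-c fraction of n-bit strings can be compressed by c or more bits, whatever n is.

```python
def max_fraction_compressible_by(c_bits):
    # fewer than 2**(n - c) programs are shorter than n - c bits, out of
    # 2**n strings of length n; the ratio is independent of n
    return 2.0 ** (-c_bits)

for c in (1, 10, 20, 50):
    print(f"compressible by >= {c:2d} bits: < {max_fraction_compressible_by(c):.2e} of all strings")
```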
  • 12/21/09

  • Human beings are finite automata, so clearly we can only recognize regular languages. But (according to chomsky and others) regular languages are not a good model of human language; it makes more sense to use context-free grammars and other more complex models. this is true even though context-free languages are not generally decidable by human beings. questions: are there other situations in which it makes sense to use a stronger computational model to reason about what a weaker system is doing? possible example: AIXI: using halting-problem-capable machines to model general intelligence.
  • 12/22/09

  • Possible modification to the AIXI framework: is the problem with incomputability of Kolmogorov complexity / the Solomonoff prior just that we could have a short program which takes arbitrarily long to run? If so, then what if we bound the execution time allowed for the program to produce a new time-step in the environment? Then if we have observed n time-steps, we know that the program producing them cannot have taken longer than a*n steps, if a is the bound on each time-step, so we can easily try every possible program for a*n steps to see what works. This is plausible since our idea of the universe is that things are computed "as they happen" - the framerate never drops (would we know if it did?)
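  • A toy of the time-bounded idea: enumerate "programs" (here, short token sequences over a made-up instruction set) in order of length, run each under a per-element step budget, keep the ones consistent with the observed prefix, and predict with the shortest survivor. Everything below is an invented stand-in for the real program space.

```python
import itertools

OPS = {"inc": lambda x: x + 1, "dbl": lambda x: x * 2, "sq": lambda x: x * x}

def run(program, prev, max_steps):
    """Apply the program's ops in order; each op costs one step.
    Returns (value, finished-within-budget)."""
    x, steps = prev, 0
    for op in program:
        steps += 1
        if steps > max_steps:
            return None, False
        x = OPS[op](x)
    return x, True

def consistent(program, obs, budget):
    return all(run(program, a, budget) == (b, True) for a, b in zip(obs, obs[1:]))

def predict(obs, max_len=3, budget=10):
    # enumerate programs shortest-first (a crude length prior) and predict
    # with the first one that reproduces the observations within budget
    for length in range(1, max_len + 1):
        for program in itertools.product(OPS, repeat=length):
            if consistent(program, obs, budget):
                return run(program, obs[-1], budget)[0], program
    return None, None

print(predict([1, 2, 4, 8]))     # (16, ('dbl',))
print(predict([1, 2, 5, 26]))    # (677, ('sq', 'inc'))
```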
  • 1/18/10

  • Computability and chaos both put limits on determinism. The halting problem says that even if we know the current state of the universe, we can't predict the future without actually running a simulation (i.e. letting time pass, since there's no real other way to simulate a large-scale system perfectly). And chaos says that unless we know the current state exactly (which is of course impossible), our predictions will be wrong even if we could somehow run a simulation.
  • 1/23/10

  • Some statements can be proved in very few steps from the axioms of ZFC. Some cannot. Consider the Riemann Hypothesis. Suppose there is no length-n proof of the RH or its negation. That says something about the difficulty or the complexity of the problem. What does it say?
  • 1/29/10

  • If artificial general intelligence is possible in principle (i.e. if the Church-Turing thesis is true), then it will be possible in principle to build computers which can perform any task a human can do, and more. But this doesn't mean we'll be able to build artificial human beings, in the sense of robots which behave convincingly like real people. Why? Because we still don't understand how people work, and possibly we never will. We have lots of high-level abstractions that are useful in AI and psychology - concepts like memory, learning, reasoning, emotion, etc. - but these are just simplifications which seem to capture some of the general regularities of human behavior. The actual human brain is a collection of hacks, and (in some sense) parameters which have been finely tuned over billions of years of evolution. We are unlikely to get similar behavior out of something with a radically different organizational structure, e.g. anything that's not a collection of hacks. But we will never be able to make exactly the same hacks that evolution has found, because a) many of them are arbitrary, particular to some special set of circumstances encountered in our evolutionary past, b) evolution is cleverer than we are, and c) a system composed of gazillions of hacks interacting in complex ways would never be understandable by a human scientist; it's not the sort of thing humans are capable of developing in any way that we can currently conceive of. So even though we may build other types of AI, which may be better or worse than humans at any given problem, our only feasible route to natural-seeming artificial minds will be in the direct simulation of human brains. That said, since our attempts to build AI will require us to explore the abstractions we use to think about the mind (learning, reasoning, etc.) they may be incredibly valuable in helping us understand how people work on the level that we can comprehend, even if the ultimate facts of the matter are inaccessible or incomprehensible to us.
  • added (2/28/10): why would we expect even an intelligent computer to pass the Turing test? Imagine a Turing dog-test, in which something is "as good as a dog" if and only if it cannot be distinguished from a real dog by a qualified observer (say, another dog). obviously the test couldn't be a textual conversation because dogs don't speak, and it couldn't be in barks because they aren't very expressive, but say we had some means for a human brain to control a dog body (so we are trying to see if we can distinguish the input-output function from percept sequences->actions of a human brain from that of a dog brain). Would a human be able to pass the Turing dog-test? probably not. Although we know the general sorts of things that dogs tend to do, our brains are not dog brains and so it might be impossible for us to imitate such a brain. similarly, a machine intelligence might not be able to pass a Turing test because it hadn't mastered our idiosyncrasies, even if it were as much more intelligent than us as we are than dogs.
  • In the human->dog case, we could imagine spending ages gathering enough information to approximate the dog-brain function. Even though it's possible that the structure of our brains wouldn't allow us to efficiently incorporate that information directly, we could always do a "chinese room" type simulation if we wanted to add a layer of indirection. Similarly for the machine->human case. But this approach (analogy: fitting a function to a set of points) requires an enormous amount of data about the function, assuming the function is as complex as we think it is (a set of hacks). Since we don't know what kind of function it might be, we don't know how to interpolate between the observed data points in a realistic way, so we need to collect enough data points that our function is essentially a lookup table. But a lookup table is not intelligence, so no good approach to building an intelligent machine would go that route.
  • 2/28/10

  • link between functional analysis and computation. Math spends a lot of time talking about functions, in which we say that, if we were given some input, we would get some particular output. Computer science concerns itself with how to make functions happen in the real world - how to actually get the proper output for a given input. Since we don't have magical function-evaluating objects for every function (though it would be cool if those occurred naturally; there would be cosine mines and x^2 mines and so on, although of course you would need an infinite number of different function-evaluators to be really useful, so that might be difficult), computer science figures out how to construct arbitrary functions by composing simpler functions, which we do know how to make happen in the real world (e.g. we can make an AND happen by hooking up some wires and transistors in the proper way). circuit complexity theory is the study of (given some set of elementary functions) how many uses of the simple functions it takes to evaluate a more interesting function (where "how many uses" is measured in width, depth, etc of the circuit specifying how the output of each function feeds into the input of another).
  • I don't know exactly what this means for functional analysis, since I don't know that much about FA yet. But I know about the idea that we can find an (infinite but countable) basis such that every continuous function on a compact interval can be represented as a countable sum of the basis functions (examples: polynomials, trig functions e.g. fourier series). Computability theory says that we can find a (finite and small) "basis" such that every computable function on a countable set can be represented as a (branching) composition of the "basis" functions. There are a lot of sets which serve as "basis" sets (e.g. AND,NOT; OR,NOT; NAND; etc.), and we can define an "independent" (analogous to linear independence) basis as one in which no elementary function can be expressed as a circuit made up of the other functions.
  • the problem with composition, as opposed to summing, is it doesn't commute (is there more to say here?). thus instead of large clean sums, we get complicated graphs representing circuits giving the precise order in which different sub-functions need to be computed and fed into each other.
  • machine learning does a more direct kind of function approximation, e.g. with nested linear combinations of sigmoid functions.
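  • The discrete analogue of a basis check, for the NAND case mentioned above: every other connective is a composition of NANDs, verified over all inputs.

```python
def NAND(a, b): return 1 - (a & b)

# every other Boolean connective as a composition of NANDs
def NOT(a):    return NAND(a, a)
def AND(a, b): return NOT(NAND(a, b))
def OR(a, b):  return NAND(NOT(a), NOT(b))

for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == 1 - a
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
print("NAND alone generates NOT, AND, and OR")
```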
  • 11/30/10

  • People often conflate artificial intelligence with artificial agency. If we think of intelligence as the ability to solve a particular class of problems, then of course we already have lots of programs which are intelligent in certain domains. Game-tree search for games like chess, ML/statistical techniques for classification/regression problems, SAT solvers and theorem provers, etc. perform their tasks in many cases as well or better than human beings. Clearly they're intelligent in some sense. But Deep Blue is never going to suddenly jump out of the machine and take over the world. This holds for "learning" algorithms as well. No matter how much data you feed to a neural net or SVM, it's never going to decide "okay, fuck this separating hyperplane shit, I'm gonna go create the Terminator", and no matter how much of the Web it sees, Pagerank is going to continue on just assigning numeric values to Web pages and is not, in fact, going to become Skynet. Research into new and better representations and algorithms for planning and learning will make computers better at solving difficult problems, which many humans think of as requiring intelligence, but will not by itself create a new class of "AI beings". In fact, problems solved by computers in this way will probably continue to be seen as not requiring "true" intelligence, since in the end it's all just calculations.
  • The most important thing about an optimizing process is the domain in which it works, and what you tell it to optimize within that domain. If we write a search algorithm, give it a simple domain in which it can press buttons to manufacture paperclips, and ask for it to maximize the number of paperclips manufactured, it will just give us a series of button presses. If we give it a more complicated domain which simulates the real world and ask it to do the same thing, well, it's a harder optimization problem, but if our system was good it would come up with a series of actions which would lead to the creation of paperclips. Depending on the sophistication of our model, these actions could be very basic or very sophisticated, but within the limits of the model they would be among the most effective possible. Now if we hook our optimizing process up to the real world, giving it real eyes and real ears and real arms and real legs, then the optimization problem it has to solve is overwhelmingly difficult (and can probably only be solved with a series of abstractions beginning in the visual system and going up to the higher levels at which humans have language), but a solution would result in this system acting in the world to create paperclips. It would be bound by human laws in the sense that it would need to take into account human reactions to its actions, but it would be fundamentally amoral. As Eliezer says, "the AI does not love you, nor does it hate you, but you are made of atoms which it can use for something else." This is where the worries about "Friendly AI" come up. We already unleash optimization processes in the real world - e.g. the Roomba which works blindly to make sure your entire floor is vacuumed - but these processes have limited ability to interact with the world (the Roomba can only move its wheels and flash its lights) and limited models of it (the Roomba has some internal representation of its spatial position and a basic dynamics model which predicts where it will be after certain wheel movements, but it certainly doesn't model people and no matter how long it roams around it never will), so it is unlikely that they will suddenly develop psychic powers with which to disintegrate human bodies and turn their atoms into paperclips. Roombas are seen as Agents, even if they are not very intelligent, and their simple model of world dynamics will never allow them to do anything more sophisticated than vacuum floors.
  • What would be required for an AI in a "box" to "escape the box"? No matter how brilliant its chess-playing, no matter how perfect its max-margin calculations, an agent working in a simulated domain will never reason about the real world; it doesn't even know that the real world exists. Even if you give it the text of the world's libraries, -- okay, I'm no longer inspired, and I need to go do real work now.
  • 12/27/10

  • It doesn't make a lot of sense to think of humans as decision-theoretic agents following a utility function. Well, okay, it makes a lot of sense in many situations, but it doesn't make enough sense to actually be true overall.
  • One reason: Pascal's mugging. If someone promises you that, in exchange for 10 utils now (in the form of money in your wallet), he will deliver you BB(N) utils tomorrow, then whatever your probability estimate of him actually following through, there is a value of N for which you should take the deal (unless your estimate shrinks with N more quickly than the busy beaver reciprocal, which doesn't seem right - my degree of belief seems to be about the same for N=100000 as N=1000000000). But it seems like no sane person would take the deal. One reason is that it seems like there's a finite upper bound on the amount of utility that can be meaningful to me - obviously there's some value in having money, in having friends and family, in eating good food and listening to good music, having good sex, etc., but in the end the brain has only so much capacity to experience pleasure, so it's hard to imagine what BB(1000000) utils would get me that BB(5) wouldn't. Of course you could imagine having your consciousness removed from the physical brain and moved to some sort of computer substrate which would give it different properties, but that runs into all sorts of issues with identity and it's not clear you would even want that if it were offered to you for free (people seem to value their humanity a lot). So the basic point here seems to be that there's some upper limit to how much utility we can meaningfully discuss; it's not unbounded (a toy expected-utility comparison is sketched at the end of this entry).
  • Along this theme. Utility can't be just a matter of pleasure, because if you offered people the opportunity to serve out their remaining natural lifespan in a heroin daze, most people wouldn't do it. We have evolved to mostly reject that sort of temptation, because those who take it end up not contributing to society or furthering their own genes or memes. The things I mentioned in the previous paragraph: money, family, music, sex, beautiful sunsets, etc. are great, but imagine that you were given all of those things. Maybe you'd be happy, but it's human nature to always want more: maybe now you want power, or scientific discovery, or religious/moral insight, etc. and these things are addictive in themselves: the scientist is never satisfied until he understands everything about how the world works, the conqueror is never happy until he rules the whole universe, etc. But if you understood every law of physics and ruled the whole universe, would you be ultimately happy? Power is lonely, and many scientists find they love the journey as much as the destination - it will be a boring day when there is nothing left to discover.
  • I guess one of the points here is that people don't necessarily know what they want - surely the highest-utility thing I could offer you is the opportunity to reshape the universe entirely as you like, to give yourself whatever knowledge and power you desire, surround yourself with the dearest of friends, make yourself into an emotionally well-adjusted genius, etc.
  • But if you did this and you had no room left to be happier, you would feel as though your life was meaningless. or would you?
  • What is the argument I'm making?
  • Part of it is that there's a finite upper bound on "how good you can have it". But the other part is that no matter how good you have it, you will always want to have it better. Happiness is about the journey, not the destination - you need to feel as if you're achieving things. How does this square with the idea of utils as a unit?
  • Apparently prospect theory deals with this. But prospect theory tries to be descriptive, rather than normative. Was prospect theory developed because they found that humans couldn't be modeled as decision-theoretic agents, for any utility function?
  • Properties of a rational utility function: (von Neumann-Morgenstern says that for any binary relation over lotteries which has these four properties, there exists a utility function assigning utilities to outcomes such that the preference relation will fall out of the expected utility of lotteries)
  •     - completeness
  •     - transitivity
  •     - independence
  •     - continuity (or more loosely the Archimedean property)
  • This has the same sort of flavor as Pascal's mugging. The mugger offers you something so amazingly good that any reasonable nonzero probability assignment makes it seem like a good decision. Of course it's not infinitely good, so it can still be represented by a utility function - but it's pretty close.
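  • A toy expected-utility comparison for the mugging (all numbers are placeholders; BB(N) is unimaginably larger than any float, which only strengthens the unbounded case and is irrelevant to the bounded one):

```python
HUGE = 1e300    # stand-in for BB(N) utils; the real value is vastly larger
COST = 10       # utils handed over now
P = 1e-20       # credence that the mugger will actually deliver

def eu_unbounded(p, payoff, cost):
    # linear, unbounded utility: any nonzero credence is swamped by the payoff
    return p * payoff - cost

def eu_bounded(p, payoff, cost, u_max=1e6):
    # utility saturates at u_max, as argued above
    return p * min(payoff, u_max) - cost

print(eu_unbounded(P, HUGE, COST))   # astronomically positive: "take the deal"
print(eu_bounded(P, HUGE, COST))     # about -10: "walk away"
```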
  • 5/11 (?)

  • Corporations are an attempt at AI / cognitive modeling. Their divisions reflect people trying to decide what functions need to be performed by a large scale intelligent entity (albeit special purposed).
  • 11/11

  • In the natural sciences, we have physics, and then everything else (chem, bio, etc.) is an attempt to come up with high-level theories to explain physical phenomena. In computer science we are studying artificial rather than natural systems, so we already know the "physics" of our world. But we don't know the chemistry or biology - the higher-level concepts that let us reason about things that happen in the system, the consequences of the physical laws. So in some sense, computer science can be described as the chemistry (or biology) of artificial worlds.
  • I guess this kind of applies to math also, except math doesn't have a "physics" as such because unlike computer programs, math has no moving parts.
  • Also, what does this say about the connections between CS and physics? Maybe that the "artificial systems" we study are not totally orthogonal to real physical systems, so abstractions created to understand one can apply to the other.
  • 2/13

  • We think that simple theories are better in science. By Occam’s Razor, the simplest theory that fits the data is likely to be correct. But this doesn’t mean that any given phenomenon in our world has a simple description! It’s just that we’ve only discovered the simple theories, because those are the ones we’ve gotten sufficient data to identify. For the phenomena which have no simple theories -- e.g. stuff that arises from complex systems, like all of the social sciences -- we’re still at the point where the complexity (meaning “number of parameters” or whatever) of the data is smaller than the complexity of the simplest theory. Of course over time this line moves -- given enough data, eventually we should be able to come up with better theories for this stuff.