While writing a computer program recently, I needed a short
routine to check whether two arrays of numbers were
rearrangements of each other. Of course you can just sort them
and compare individual terms, but this is very slow. It occurred
to me that I could just check that they had the same sum, sum of
squares, sum of cubes, etc.
This of course is very similar to Richard Mycroft's problem about
means and variances.
What I would like to know is, how many different powers do I need
to sum to be sure of not getting spurious matches? I suspect I
would need to sum up to the nth power if there were n numbers,
but I cannot prove this generally, only for n=2 and n=3. Also, how
does the situation change if the numbers are all known to be
positive integers? And what if I check their products are the
same too?
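To give an idea of what I mean, the check I had in mind looks roughly
like this in C (just a sketch with made-up names, and ignoring the
possibility of overflow):

#include <stddef.h>

/* Compare the first m power sums of two integer arrays of length n.
   Returns 1 if all the sums agree, 0 otherwise.  My question is how
   large m has to be before agreement really guarantees the arrays
   are rearrangements of each other. */
int power_sums_agree(const long long *a, const long long *b, size_t n, int m)
{
    for (int k = 1; k <= m; k++) {
        long long sa = 0, sb = 0;
        for (size_t i = 0; i < n; i++) {
            long long pa = 1, pb = 1;
            for (int j = 0; j < k; j++) {   /* naive a[i]^k and b[i]^k */
                pa *= a[i];
                pb *= b[i];
            }
            sa += pa;
            sb += pb;
        }
        if (sa != sb)
            return 0;   /* k-th power sums differ, so not rearrangements */
    }
    return 1;
}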
David Loeffler
P.S. Sorry, I forgot to say that both arrays are known to be
the same size.
David
\[
e_1 = \sum_{i=1}^{n} x_i, \qquad
e_2 = \sum_{i<j} x_i x_j, \qquad
e_3 = \sum_{i<j<k} x_i x_j x_k, \qquad
s_k = \sum_{i=1}^{n} x_i^k
\]
The problem with this method is that you will get overflows for
even quite small $n$. You're right that you would need to know all
of the sums up to the $n$th power, and I'm not even sure that this
method would avoid spurious matches even if you had all $n$ sums to
work with. You can see why you need the $n$ sums by thinking about
it like this: if $x_1 + x_2 + \cdots + x_n = S_1$, then
$x_1 = S_1 - (x_2 + x_3 + \cdots + x_n)$. In other words, you can
choose $x_2$ to $x_n$ arbitrarily and still get the same sum; you
have just lost one degree of freedom. Each additional constraint
you put on the problem removes one more degree of freedom, until
you have used the first $n$ sums. The $n$ sums will certainly fix
it for positive numbers, but they might not for negative numbers:
think of $S_2 = x_1^2 + \cdots + x_n^2$, which gives
$x_2 = \pm\sqrt{S_2 - x_1^2 - x_3^2 - \cdots - x_n^2}$, i.e. two
possible values.
My advice is to find another algorithm for comparing them,
because this one will not be practical if you have even
moderately large $n$ (10 or so) and moderately large data.
Anyway, the number of operations it takes to generate this data is
approximately $O(n^3)$ (assuming that computing $x^k$ takes $k$
elementary operations), which is very big. There are sorting
algorithms which can do better than $O(n \log n)$, so it would be
quicker to sort and compare than to use the sum-of-powers method.
I suggest looking up bubblesort, shellsort and quicksort on the
internet to see which would best sort your data.
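For what it's worth, in C you can lean on the standard library's
qsort for the sort-and-compare approach. A rough sketch, assuming
arrays of int (the function name is just a placeholder; adapt the
comparison for other types):

#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort on ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if a and b (both of length n) are rearrangements of each
   other, 0 if not, -1 on allocation failure.  Works on copies so the
   caller's arrays are left untouched. */
int same_multiset(const int *a, const int *b, size_t n)
{
    int *ca = malloc(n * sizeof *ca);
    int *cb = malloc(n * sizeof *cb);
    if (!ca || !cb) { free(ca); free(cb); return -1; }

    memcpy(ca, a, n * sizeof *ca);
    memcpy(cb, b, n * sizeof *cb);
    qsort(ca, n, sizeof *ca, cmp_int);
    qsort(cb, n, sizeof *cb, cmp_int);

    int equal = (memcmp(ca, cb, n * sizeof *ca) == 0);
    free(ca);
    free(cb);
    return equal;
}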
A few thoughts...
Firstly, regarding the time complexity of the proposed algorithm:
you can quite easily compute $x^k$ in $O(\log k)$ operations (I
think - ask if you want details), so this would mean that the
algorithm is actually 'only' $O(n^2 \log n)$.
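The trick I have in mind is exponentiation by squaring; a minimal
sketch in C, assuming a non-negative integer exponent and ignoring
overflow:

/* Computes x^k in O(log k) multiplications by repeated squaring. */
long long ipow(long long x, unsigned k)
{
    long long result = 1;
    while (k > 0) {
        if (k & 1)      /* lowest bit of k set: multiply this power of x in */
            result *= x;
        x *= x;         /* square the base */
        k >>= 1;        /* move on to the next bit of the exponent */
    }
    return result;
}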
Also, you could improve this method still further by doing all the
power sums simultaneously, using the fact that in generating
$x^k$ by the usual method you have already found
$x^2, \ldots, x^{k-1}$. This would bring the complexity down to
$O(n^2)$; something like the sketch below.
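A sketch of that simultaneous computation in C (integer data
assumed; sums[] needs room for indices 1..n, and overflow is of
course still a real problem):

#include <stdlib.h>

/* Fills sums[1..n] with the power sums s_k = x_1^k + ... + x_n^k.
   Each pass reuses x_i^(k-1) to get x_i^k, so the whole computation
   is O(n^2) multiplications. */
void power_sums(const long long *x, size_t n, long long *sums)
{
    long long *p = malloc(n * sizeof *p);   /* p[i] holds x_i^k for the current k */
    if (!p)
        return;
    for (size_t i = 0; i < n; i++)
        p[i] = 1;

    for (size_t k = 1; k <= n; k++) {
        sums[k] = 0;
        for (size_t i = 0; i < n; i++) {
            p[i] *= x[i];       /* x_i^(k-1) becomes x_i^k */
            sums[k] += p[i];
        }
    }
    free(p);
}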
However, this is still high, and is rather irrelevant anyway
since, as Dan and Amanda have already pointed out, the algorithm
won't necessarily detect that two different arrays really are
different, and the problem of overflow does rather mess things
up.
Secondly, I don't think it's true that there are sorting
algorithms which 'can do better than $O(n \log n)$'. There are
certainly algorithms which achieve $O(n \log n)$ in the worst
case, but unless you assume some knowledge of the input data
(e.g. you know that all the numbers come from a certain small
set, say 1, 2, 3, ..., 9), no comparison-based sorting algorithm
can do better than $O(n \log n)$ in the worst case; a sketch of
how to exploit such knowledge follows below.
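For the 'small known set' case, the standard trick is a counting
sort; a minimal sketch in C, assuming every value lies in 1..9:

#include <stddef.h>

/* Counting sort for values known to lie in 1..9.  Runs in O(n) time,
   which is how knowledge of the input lets you beat the O(n log n)
   bound for comparison-based sorts. */
void counting_sort_1to9(int *a, size_t n)
{
    size_t count[10] = {0};

    for (size_t i = 0; i < n; i++)      /* tally how often each value occurs */
        count[a[i]]++;

    size_t pos = 0;
    for (int v = 1; v <= 9; v++)        /* write the values back in order */
        while (count[v]-- > 0)
            a[pos++] = v;
}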
A brief summary of sorting algorithms:
- Insertion sort is a very simple algorithm, particularly
good on data which is not very muddled up to start with, but it
is $O(n^2)$ in the worst case (there is a sketch of it after this
list).
- Bubble sort should be avoided at all costs - although it
looks very simple it is slow. Insertion sort would almost
certainly be a better alternative.
- Merge sort and heap sort are two algorithms which
are O(n.log n) in the worst case, so are particularly good for
sorting large amounts of data.
- Quicksort is actually $O(n^2)$ in the worst
case, but on average it turns out to be $O(n \log n)$, and in
practice it usually beats merge sort or heap sort, so it's a very
good algorithm. There are sometimes (e.g. in C) standard library
implementations of it, so you don't even have to write the code
yourself.
- I don't know much about shell sort, but it may well be
worth a look.
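Here is the insertion sort sketch promised above (plain C, sorting
into ascending order):

#include <stddef.h>

/* Insertion sort: O(n^2) in the worst case, but close to O(n) when
   the array is already nearly in order. */
void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {   /* shift larger elements right */
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;                         /* drop the element into place */
    }
}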
Finally, I should add that when designing your algorithm it is
often useful to consider what behaviour is expected most often.
For example, if you know that 99% of the time your two arrays
will not be the same, then you could do a cheap check before
sorting the arrays and comparing them: take the sum of the
elements of each array and compare those. If the arrays are
rearrangements of each other then the sums will be equal and you
will have to go ahead and sort anyway, but if they are different
then the sums will almost certainly differ, so on average you
could save yourself a lot of time (although in the worst case it
takes slightly longer because of the extra check). However, if
most of the time you expect the arrays to be the same, then you
may as well just get on and sort them straight away.
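A sketch of that kind of pre-check in C (integer data assumed; the
function name is just a placeholder):

#include <stddef.h>

/* Cheap pre-check before sorting: if the sums differ, the arrays
   cannot be rearrangements of each other and the sort can be skipped.
   Returns 0 for "definitely different", 1 for "still need to sort
   to decide". */
int sums_match(const int *a, const int *b, size_t n)
{
    long long sa = 0, sb = 0;
    for (size_t i = 0; i < n; i++) {
        sa += a[i];
        sb += b[i];
    }
    return sa == sb;
}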
I hope that some of this has made sense/been of
interest/useful,
James.
PS. If the arrays you are comparing are of real numbers rather
than integers then you also have the additional problem that due
to rounding errors you could get different values by adding up
the same set of numbers in a different order!
Writing $E(t) = \prod_{i=1}^{n} (1 + x_i t) = \sum_{k=0}^{n} e_k t^k$, we have
\[
\frac{E'(t)}{E(t)} = \bigl(\log E(t)\bigr)' = \sum_{i=1}^{n} \frac{x_i}{1 + x_i t} = s_1 - s_2 t + s_3 t^2 - \cdots
\]
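Presumably the point of this identity is that, multiplying through
by $E(t)$ and comparing the coefficients of $t^{k-1}$ on both sides,
you recover Newton's identities relating the power sums to the
elementary symmetric functions:
\[
k\,e_k = s_1 e_{k-1} - s_2 e_{k-2} + \cdots + (-1)^{k-1} s_k, \qquad 1 \le k \le n,
\]
so (over the reals) the first $n$ power sums determine
$e_1, \ldots, e_n$, and hence the multiset $\{x_1, \ldots, x_n\}$ as
the roots of the corresponding polynomial.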
Dear all...
Thanks - it will probably take me a while to digest all that
though. If I have any problems I will get back to you.
David Loeffler