Given graph.Connections of size n and primaries of size m, there is an approach that runs in O(n log(n) + m log(m) + (n + m)) time. The approach is divided into two stages:

- data preparation: O(n log(n) + m log(m))
- a single pass through the data: O(n + m)

How does that magic happen? Let's start with the general idea and then go into the details.

If you have an unordered set of cardinality n, detecting repeated elements without sorting requires O(n^2 - n) operations. If the set is indexed, we can cut that number in half. That relief is only temporary, though, because it does not survive growth in the data: imagine the set now has twice as many elements (2n); the number of comparisons becomes O(((2n)^2 - 2n)/2) = O((4n^2 - 2n)/2) = O(2n^2 - n). Doubling the input was enough to wipe out the whole optimization.

Now, what if the set were ordered? Well, the cost of ordering it is O(n log n) (if it is not already ordered). Given an ordered, indexed set, how many operations does it take to detect any repetition? Only O(n). Let me explain.

When a set is indexed, I can access its members by index: C[i] takes the i-th element of the set C, and C[i + 1] takes the element right after the i-th. Taking an element of an indexed set is equivalent to evaluating a function f: N -> E that maps a natural number (N being the set of naturals) to an element of an arbitrary set E. If the elements of E can be ordered and C is sorted in increasing order, then f is a monotonically increasing function. What does that mean? Basically, it guarantees that f(i) <= f(i + 1) for any i in the function's domain. By induction, we can also show that f(i) <= f(i + k) for any k >= 0.

One consequence of f being monotonically increasing is that if f(i) < f(i + 1), then f(i) < f(i + k) for any k >= 1. So if two elements at consecutive indices (i and i + 1) are not identical, no element at index j <= i ever needs to be compared with an element at index k >= i + 1. In other words, the only necessary comparisons are between consecutive elements.

But we are talking about two sets, not just one. How does that apply? The principle is the same. Imagine we are working with two sets, A and B, both ordered and holding the same kind of data. Now pick i, a valid index into A, and j, a valid index into B, and take the elements A[i] and B[j]. There are three possible outcomes when comparing A[i] and B[j]:

- A[i] == B[j]: we found elements with the same value, which is exactly what we are after;
- A[i] > B[j]: since the sets are ordered, we know that B[k] <= B[j] for k < j, so there is no point in comparing A[i] with any element of B at an index lower than j;
- A[i] < B[j]: since the sets are ordered, we know that A[k] <= A[i] for k < i, so there is no point in comparing B[j] with any element of A at an index lower than i.

This gives us the basis for a search algorithm: in a certain sense, only consecutive elements are compared, just as when looking for repeated elements in a single ordered set. Right below is a quick sketch of that single-set check; after it comes a small pseudo-code for a function that detects equal elements in the sets A and B.
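To make the single-set case concrete, here is a minimal C# sketch (the array and its values are purely illustrative): with the data already ordered, repetitions are detected using only consecutive comparisons, i.e. O(n) work.

int[] sorted = { 1, 2, 2, 5, 7, 7, 9 }; // assumed to be already ordered
for (int i = 0; i + 1 < sorted.Length; i++)
{
    if (sorted[i] == sorted[i + 1])
    {
        // consecutive elements are equal => a repetition was found; do something with it
        Console.WriteLine($"repeated value: {sorted[i]}");
    }
}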
The arguments of this two-set function are:

- A, an ordered set of elements of type E
- B, an ordered set of elements of type E
- cmp : E, E -> S, a function which, given two elements of E, tells how they relate: it returns - if the first element is smaller than the second, + if the first element is greater than the second, or 0 if the elements are identical (S is the sign set {-, 0, +})

input:
    A, ordered set of elements of type E
    B, ordered set of elements of type E
    cmp, sign function that compares two elements of E
begin
    i <- 0 # index to iterate over A
    j <- 0 # index to iterate over B
    while i < A.size && j < B.size:
        s = cmp(A[i], B[j])
        if s == '0':
            # elements are equal, do something
            i <- i + 1
            j <- j + 1
        else, if s == '-':
            # A[i] < B[j], so the next comparison is between A[i + 1] and B[j]
            i <- i + 1
        else # trivial case where s == '+':
            # A[i] > B[j], so the next comparison is between A[i] and B[j + 1]
            j <- j + 1
    # once i or j goes past the size of A or B, respectively, there are no more comparisons to make
end
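Just to make the walk concrete, a tiny made-up trace: with A = {1, 3, 5, 7} and B = {2, 3, 7, 8}, the comparisons are 1 vs 2 (advance i), 3 vs 2 (advance j), 3 vs 3 (match; advance both), 5 vs 7 (advance i) and 7 vs 7 (match; advance both). At that point i goes past the end of A and the loop stops: both common values (3 and 7) were found with only 5 comparisons, well within the O(n + m) bound.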
With all that, the search is guaranteed to take O(n + m) operations. Linear time, which makes your problem tractable now.

Just out of curiosity: I once asked https://pt.stackoverflow.com/q/235616/64969 about the relative performance of mergesort versus insertionsort, since mergesort was taking roughly twice the insertionsort time. After tuning the sort to take full advantage of the allocated memory, https://pt.stackoverflow.com/a/235636/64969 the time became ridiculously smaller (something between 3,000 and 7,000 times smaller). So reducing the complexity of the problem at this level brings huge changes.

So far, everything has been discussed abstractly. How do we make this a bit more concrete?

The first step is to define the set E and the properties of its elements, so that an ordering can be defined over them. If E has only one ordering key, simply order by that key. For example, E could be the set of entries of a dictionary; an entry consists of a headword and its meaning, the key being the headword, and that key admits lexicographic/alphabetical ordering. For example, we know that https://pt.wiktionary.org/wiki/an%C3%A1lise comes lexicographically before https://pt.wiktionary.org/wiki/complexidade . So it is possible to order a dictionary using the lexicographic order.

What about the case of multiple keys? Well, we pick an arbitrary key to start with. It will be, so to speak, the dominant key. If we order only by the dominant key, we end up with partially ordered lists. In that case, we pick another key to dominate over the remaining ones; that will be the second dominant key. We repeat this step until every key has been used, establishing a dominance hierarchy among the keys. The most interesting part here is that, for the search algorithm above, it does not matter which dominance hierarchy you choose, as long as it is consistent. (The choice of hierarchy may, however, affect the amount of work done while sorting the data.)

In our case, the set to be compared is made of pairs of coordinates. Elements of this set carry four values: lat1, long1, lat2, long2. Since the key dominance hierarchy does not matter to the search algorithm, I will order by lat1, using lat2 as the first tie-breaking criterion, followed by long1 and then long2. Roughly, the comparison function would be something like this:

int cmp_coords(Connection c, Primary p) {
    double delta = c.Initial.Coordinate.CoordY - p.LatitudeOne;
    if (delta != 0) {
        // case where the dominant key decides
        return delta < 0 ? -1 : +1;
    }
    // case where the dominant key tied
    // first tie-breaking criterion: Lat2
    delta = c.End.Coordinate.CoordY - p.LatitudeTwo;
    if (delta != 0) {
        // tie broken!!
        return delta < 0 ? -1 : +1;
    }
    // second tie-breaking criterion: Long1
    delta = c.Initial.Coordinate.CoordX - p.LongitudeOne;
    if (delta != 0) {
        // tie broken!!
        return delta < 0 ? -1 : +1;
    }
    // last tie-breaking criterion/last key: Long2
    delta = c.End.Coordinate.CoordX - p.LongitudeTwo;
    if (delta != 0) {
        // tie broken!!
        return delta < 0 ? -1 : +1;
    }
    // ok, it really tied on everything!
    return 0;
}
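Before the merge step below, each collection needs to be sorted by this same key hierarchy. Here is a minimal sketch of how graphConnectionsOrdered and primariesOrdered might be produced with LINQ (it assumes graph.Connections and primaries are enumerable collections of Connection and Primary exposing the same members used in cmp_coords, and that using System.Linq; is in scope):

Connection[] graphConnectionsOrdered = graph.Connections
    .OrderBy(c => c.Initial.Coordinate.CoordY)  // Lat1, the dominant key
    .ThenBy(c => c.End.Coordinate.CoordY)       // Lat2, first tie-breaker
    .ThenBy(c => c.Initial.Coordinate.CoordX)   // Long1
    .ThenBy(c => c.End.Coordinate.CoordX)       // Long2
    .ToArray();

Primary[] primariesOrdered = primaries
    .OrderBy(p => p.LatitudeOne)
    .ThenBy(p => p.LatitudeTwo)
    .ThenBy(p => p.LongitudeOne)
    .ThenBy(p => p.LongitudeTwo)
    .ToArray();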
So, with graphConnectionsOrdered created from graph.Connections by applying a comparison function analogous to the one above, and likewise primariesOrdered created from primaries, we can do this:

int i = 0; // iteration index over graphConnectionsOrdered
int j = 0; // iteration index over primariesOrdered
while (i < graphConnectionsOrdered.Length && j < primariesOrdered.Length) {
    int s = cmp_coords(graphConnectionsOrdered[i], primariesOrdered[j]);
    if (s == 0) {
        // elements are equal, do something?
        i++;
        j++;
    } else if (s < 0) {
        // element in graphConnectionsOrdered is smaller than the one in primariesOrdered; advance graphConnectionsOrdered's index
        i++;
    } else {
        // element in graphConnectionsOrdered is greater than the one in primariesOrdered; advance primariesOrdered's index
        j++;
    }
}
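What happens in the s == 0 branch depends on what you need. If, for instance, the goal is just to collect the matching pairs, one possibility (a sketch only, assuming C# 7 value tuples and using System.Collections.Generic;) is:

var matches = new List<(Connection, Primary)>(); // declared before the while loop
// ... and then, inside the s == 0 branch:
matches.Add((graphConnectionsOrdered[i], primariesOrdered[j]));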
This runs in guaranteed O(n + m) time. The time needed to sort each set is O(n log(n)) and O(m log(m)), so the total time is O(n log(n) + m log(m) + (n + m)).

All right, but I kept talking about working with a single, homogeneous type, yet the comparison takes two distinct types. Why? And how is that possible?

Well, right, I skipped that part on purpose. Although both Connection and Primary represent a pair of coordinates, they are not the same type. On the other hand, one is trivially converted into the other: there is a bijection between Connection and Primary. Because of that bijection, as long as we respect the way it is defined (and we did respect it up there), the types are equivalent. And since they are equivalent, I used some implicit conversions between the two, saving temporary variables that would exist only to do this homogenization.
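Just to illustrate what such a conversion could look like (a minimal sketch only; it assumes Primary has settable properties with the names used in cmp_coords, and the real definition depends on how the types are actually declared):

// lives inside the Connection class; lets a Connection be used where a Primary is expected
public static implicit operator Primary(Connection c)
{
    return new Primary
    {
        LatitudeOne  = c.Initial.Coordinate.CoordY,
        LongitudeOne = c.Initial.Coordinate.CoordX,
        LatitudeTwo  = c.End.Coordinate.CoordY,
        LongitudeTwo = c.End.Coordinate.CoordX
    };
}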