IPv6-ify and rewrite ip_range_to_subnet_array() [resubmit of #1709 (was #974)]

Function cannot handle IPv6 ranges, and is horribly inefficient, because it uses splitting+function call recursion for each "half". Even if extended for IPv6, it is probably far too inefficient for IPv6 on low power hardware. As written it's simply unable to handle an IPv6 environment or IPv6 ranges. As a result, if used in an IPv6 context, it would fail. It has other problematic issues: Validates both IPv4 and IPv6 as valid args, but then tries to process any IPv6 subnet bitwise as x32 LONG without further checking, potentially causing very incorrect return values. Doesn't detect if the start/end IPs are of same type (eg when validating user input related to an IP range), so a range such as "1.1.1.1-fe00::" is not detected as invalid. I've rewritten this function, but have had to effectively produce a better algorithm not just a code rewrite. The updated algorithm is extremely fast and gives identical results as far as I can tell from extensive testing. It also handles IPv6 much faster than the old code handled IPv4, and appears very robust. The algorithm is explained below. Changes: 1) IPv4/IPv6 works correctly. 2) detects mismatched start-end IP types 3) the algorithm written seems very robust (tested on 1 million random IPv4/IPv6 ranges and a number of edge cases, gives same results as existing code) 4) execution time is linear or better to number of bits, rather than exponential (due to lack of split+recurse). So it runs in about 4 - 6% of the time as the existing code for IPv4 (1ms vs. 20ms). On 128-bit IPv6 this would be a much greater saving. 5) it uses simple string pattern matching of low bit(s) to test needed subnets, so it's very efficient. (ip2long, or BCMATH, etc, don't actually add much if anything) 6) 3 functions never used anywhere else are removed (find_smallest_cidr(), ip_before() and ip_after()). Checked using Github search, can't find any other place using these, so removed and left a comment. RESUBMITTED FROM PR #1709 (WAS #974) TO ALLOW MERGING - NO CODE CHANGE. ALGORITHM DETAILS BELOW CHANGES SINCE PR #1709: (1) PHP bug related to "long numeric string compare" has been fixed for over a year now, since October 2014, which means all the === and strcmp() can revert to normal == and <. See https://bugs.php.net/bug.php?id=54547 (2) haven't removed redundant function "find_smallest_cidr_v4()" which can be done separately ALGORITHM: Documented on pfsense dev list 19-20 May 2013. PD'd by Stilez - please use as you like! A quick consideration of what subnets have to be present to span the endpoints of a range, shows that this can be done much faster, can be made to handle IPv6 (which present code never will!), and avoid function call recursion. SUMMARY: Algorithm looks at patterns of 0's and 1's in the least significant bit (or bits). These are all that needs checking, to identify the (guaranteed) correct, minimal and optimal subnet array. As a result, string/binary, with pattern matching built-in, is very efficient. It uses just 2 pattern-matching rules to chop off subnets at both ends, until nothing's left. (a) If any range has low bit 1 (in startip) or 0 (in endip), these endpoints are _always_ optimally represented by their own 'single IP' CIDR; the remaining range then shrinks by one IP up or down. Only one edge case needs checking: if the range contained exactly 2 adjacent IPs then these CIDRs will now exactly span it, and we're done. (b) Otherwise, if any range has low bits 0 (in ip1) _and_ 1 (in ip2), these low bits can *always* be ignored for subnet spanning. So provided we remember the bits we've place-shifted, we can _always_ right-shift and chop off those bits, and loop to span the remaining (place shifted) range as above, until after a few loops, the remaining (place shifted) range has become just one IP (ip1==ip2).
author: stilez <stilez@users.noreply.github.com> 2015-12-03 13:22:18 +0000
committer: stilez <stilez@users.noreply.github.com> 2015-12-03 13:22:18 +0000
commit: ed516fa7a78d38301746a6442059800581ca6ce4 (patch)
tree: 90ffd7ddd1354248cf5c96b4fde492dd9d5f574b /src/etc/inc/util.inc
parent: ae81c23b3790734db2553d706e2a8925ffcfbfed (diff)
download: pfsense-ed516fa7a78d38301746a6442059800581ca6ce4.zip
pfsense-ed516fa7a78d38301746a6442059800581ca6ce4.tar.gz
1 files changed, 104 insertions, 55 deletions
diff --git a/src/etc/inc/util.inc b/src/etc/inc/util.inc
index 88d48fa..793fb53 100644
--- a/src/etc/inc/util.inc
+++ b/src/etc/inc/util.inc
@@ -528,73 +528,122 @@ function ip_range_to_address_array($startip, $endip, $max_size = 5000) {
 	return $rangeaddresses;
 }
 
-/* Convert a range of IPv4 addresses to an array of subnets which can contain the range. */
-/* Note: IPv6 ranges are not yet supported here. */
-function ip_range_to_subnet_array($startip, $endip) {
-	if (!is_ipaddrv4($startip) || !is_ipaddrv4($endip)) {
-		return array();
-	}
-
-	if (ip_greater_than($startip, $endip)) {
-		// Swap start and end so we can process sensibly.
-		$temp = $startip;
-		$startip = $endip;
-		$endip = $temp;
-	}
+/* 	Convert an IPv4 or IPv6 IP range to an array of subnets which can contain the range.
+	Algorithm and embodying code PD'ed by Stilez - enjoy as you like :-)
+
+	Documented on pfsense dev list 19-20 May 2013. Summary:
+
+	The algorithm looks at patterns of 0's and 1's in the least significant bit(s), whether IPv4 or IPv6.
+	These are all that needs checking to identify a _guaranteed_ correct, minimal and optimal subnet array.
+	
+	As a result, string/binary pattern matching of the binary IP is very efficient. It uses just 2 pattern-matching rules 
+	to chop off increasingly larger subnets at both ends that can't be part of larger subnets, until nothing's left.
+
+	(a) If any range has EITHER low bit 1 (in startip) or 0 (in endip), that end-point  is _always guaranteed_ to be optimally 
+	represented by its own 'single IP' CIDR; the remaining range then shrinks by one IP up or down, causing the new end-point's 
+	low bit to change from 1->0 (startip) or 0->1 (endip). Only one edge case needs checking: if a range contains exactly 2 
+	adjacent IPs of this format, then the two IPs themselves are required to span it, and we're done.
+	Once this rule is applied, the remaining range is _guaranteed_ to end in 0's and 1's so rule (b) can now be used, and its 
+	low bits can now be ignored.
+	
+	(b) If any range has BOTH startip and endip ending in some number of 0's and 1's respectively, these low bits can 
+	*always* be ignored and "bit-shifted" for subnet spanning. So provided we remember the bits we've place-shifted, we can 
+	_always_ right-shift and chop off those bits, leaving a smaller range that has EITHER startip ending in 1 or endip ending 
+	in 0 (ie can now apply (a) again) or the entire range has vanished and we're done. 
+	We then loop to redo (a) again on the remaining (place shifted) range until after a few loops, the remaining (place shifted) 
+	range 'vanishes' by meeting the exit criteria of (a) or (b), and we're done.
+*/
 
-	// Container for subnets within this range.
-	$rangesubnets = array();
+function ip_range_to_subnet_array($ip1, $ip2) {
 
-	// Figure out what the smallest subnet is that holds the number of IPs in the given range.
-	$cidr = find_smallest_cidr_v4(ip_range_size_v4($startip, $endip));
+	if (is_ipaddrv4($ip1) && is_ipaddrv4($ip2)) {
+		$proto = 'ipv4';  // for clarity
+		$bits = 32;
+		$ip1bin = decbin(ip2long32($ip1));
+		$ip2bin = decbin(ip2long32($ip2));
+	} elseif (is_ipaddrv6($ip1) && is_ipaddrv6($ip2)) {
+		$proto = 'ipv6';
+		$bits = 128;
+		$ip1bin = Net_IPv6::_ip2Bin($ip1);
+		$ip2bin = Net_IPv6::_ip2Bin($ip2);
+	} else
+		return array();
 
-	// Loop here to reduce subnet size and retest as needed. We need to make sure
-	//   that the target subnet is wholly contained between $startip and $endip.
-	for ($cidr; $cidr <= 32; $cidr++) {
-		// Find the network and broadcast addresses for the subnet being tested.
-		$targetsub_min = gen_subnet($startip, $cidr);
-		$targetsub_max = gen_subnet_max($startip, $cidr);
+	// it's *crucial* that binary strings are guaranteed the expected length;  do this for certainty even though for IPv6 it's redundant
+	$ip1bin = str_pad($ip1bin, $bits, '0', STR_PAD_LEFT);
+	$ip2bin = str_pad($ip2bin, $bits, '0', STR_PAD_LEFT);
 
-		// Check best case where the range is exactly one subnet.
-		if (($targetsub_min == $startip) && ($targetsub_max == $endip)) {
-			// Hooray, the range is exactly this subnet!
-			return array("{$startip}/{$cidr}");
-		}
+	if ($ip1bin == $ip2bin)
+		return array($ip1 . '/' . $bits); // exit if ip1=ip2 (trivial case)
+	
+	if ($ip1bin > $ip2bin))
+		list ($ip1bin, $ip2bin) = array($ip2bin, $ip1bin);  // swap if needed (ensures ip1 < ip2)
 
-		// These remaining scenarios will find a subnet that uses the largest
-		//  chunk possible of the range being tested, and leave the rest to be
-		//  tested recursively after the loop.
+	$rangesubnets = array();
+	$netsize = 0;
+
+	do {
+		// at loop start, $ip1 is guaranteed strictly less than $ip2 (important for edge case trapping and preventing accidental binary wrapround)
+		// which means the assignments $ip1 += 1 and $ip2 -= 1 will always be "binary-wrapround-safe"
+
+		// step #1 if start ip (as shifted) ends in any '1's, then it must have a single cidr to itself (any cidr would include the '0' below it)
+		
+		if (substr($ip1bin, -1, 1) == '1') {
+			// the start ip must be in a separate one-IP cidr range
+			$new_subnet_ip = substr($ip1bin, $netsize, $bits - $netsize) . str_repeat('0', $netsize);
+			$rangesubnets[$new_subnet_ip] = $bits - $netsize;
+			$n = strrpos($ip1bin, '0');  //can't be all 1's
+			$ip1bin = ($n == 0 ? '' : substr($ip1bin, 0, $n)) . '1' . str_repeat('0', $bits - $n - 1);  // BINARY VERSION OF $ip1 += 1
+		} 
+
+		// step #2, if end ip (as shifted) ends in any zeros then that must have a cidr to itself (as cidr cant span the 1->0 gap)
+		
+		if (substr($ip2bin, -1, 1) == '0') {
+			// the end ip must be in a separate one-IP cidr range
+			$new_subnet_ip = substr($ip2bin, $netsize, $bits - $netsize) . str_repeat('0', $netsize);
+			$rangesubnets[$new_subnet_ip] = $bits - $netsize;
+			$n = strrpos($ip2bin, '1');  //can't be all 0's
+			$ip2bin = ($n == 0 ? '' : substr($ip2bin, 0, $n)) . '0' . str_repeat('1', $bits - $n - 1);  // BINARY VERSION OF $ip2 -= 1
+			// already checked for the edge case where end = start+1 and start ends in 0x1, above, so it's safe
+		}
+
+		// this is the only edge case arising from increment/decrement. 
+		// it happens if the range at start of loop is exactly 2 adjacent ips, that spanned the 1->0 gap. (we will have enumerated both by now)
+		
+		if ($ip2bin < $ip1bin)
+			continue;
 
-		// Check if the subnet begins with $startip and ends before $endip
-		if (($targetsub_min == $startip) && ip_less_than($targetsub_max, $endip)) {
-			break;
+		// step #3 the start and end ip MUST now end in '0's and '1's respectively
+		// so we have a non-trivial range AND the last N bits are no longer important for CIDR purposes.
+
+		$shift = $bits - max(strrpos($ip1bin, '0'), strrpos($ip2bin, '1'));  // num of low bits which are '0' in ip1 and '1' in ip2
+		$ip1bin = str_repeat('0', $shift) . substr($ip1bin, 0, $bits - $shift);
+		$ip2bin = str_repeat('0', $shift) . substr($ip2bin, 0, $bits - $shift);
+		$netsize += $shift;
+		if ($ip1bin == $ip2bin) {
+			// we're done.
+			$new_subnet_ip = substr($ip1bin, $netsize, $bits - $netsize) . str_repeat('0', $netsize);
+			$rangesubnets[$new_subnet_ip] = $bits - $netsize;
+			continue;
 		}
+		
+		// at this point there's still a remaining range, and either startip ends with '1', or endip ends with '0'. So repeat cycle.
+	} while ($ip1bin < $ip2bin);
 
-		// Check if the subnet ends at $endip and starts after $startip
-		if (ip_greater_than($targetsub_min, $startip) && ($targetsub_max == $endip)) {
-			break;
-		}
+	// subnets are ordered by bit size. Re sort by IP ("naturally") and convert back to IPv4/IPv6
 
-		// Check if the subnet is between $startip and $endip
-		if (ip_greater_than($targetsub_min, $startip) && ip_less_than($targetsub_max, $endip)) {
-			break;
-		}
-	}
+	ksort($rangesubnets, SORT_STRING);
+	$out = array();
 
-	// Some logic that will recursively search from $startip to the first IP before the start of the subnet we just found.
-	// NOTE: This may never be hit, the way the above algo turned out, but is left for completeness.
-	if ($startip != $targetsub_min) {
-		$rangesubnets = array_merge($rangesubnets, ip_range_to_subnet_array($startip, ip_before($targetsub_min)));
+	foreach ($rangesubnets as $ip => $netmask) {
+		if ($proto == 'ipv4') {
+			$i = str_split($ip, 8);
+			$out[] = implode('.', array( bindec($i[0]),bindec($i[1]),bindec($i[2]),bindec($i[3]))) . '/' . $netmask;
+		} else
+			$out[] = Net_IPv6::compress(Net_IPv6::_bin2Ip($ip)) . '/' . $netmask;
 	}
 
-	// Add in the subnet we found before, to preserve ordering
-	$rangesubnets[] = "{$targetsub_min}/{$cidr}";
-
-	// And some more logic that will search after the subnet we found to fill in to the end of the range.
-	if ($endip != $targetsub_max) {
-		$rangesubnets = array_merge($rangesubnets, ip_range_to_subnet_array(ip_after($targetsub_max), $endip));
-	}
-	return $rangesubnets;
+	return $out;
 }
 
 /* returns true if $range is a valid pair of IPv4 or IPv6 addresses separated by a "-"
author	stilez <stilez@users.noreply.github.com>	2015-12-03 13:22:18 +0000
committer	stilez <stilez@users.noreply.github.com>	2015-12-03 13:22:18 +0000
commit	ed516fa7a78d38301746a6442059800581ca6ce4 (patch)
tree	90ffd7ddd1354248cf5c96b4fde492dd9d5f574b /src/etc/inc/util.inc
parent	ae81c23b3790734db2553d706e2a8925ffcfbfed (diff)
download	pfsense-ed516fa7a78d38301746a6442059800581ca6ce4.zip pfsense-ed516fa7a78d38301746a6442059800581ca6ce4.tar.gz