Inside the PHP

Introduction

  1. Ever wondered what is inside PHP, and how each language construct and function actually works?
  2. How are optional parameters handled? Why do certain functions behave strangely, and what is going on inside them?
  3. This tutorial shows how to navigate quickly to a function's definition in the PHP source and understand it.
  4. PHP is open source, so all of the source code is available on GitHub (a clone of the official git repository). Quickly finding a particular function there is cumbersome, though, so this tutorial uses an alternative.
  5. The PHP interpreter is written mainly in C. It is built with a lexer generator (re2c), a Yacc-style parser generator (Bison; Yacc stands for "Yet Another Compiler Compiler") and a configure script that detects the hardware and operating system. Our code is finally run by a virtual machine, the Zend virtual machine (ZVM for short).

Let's Explore the Directory Structure and Patterns

  1. Most of the PHP source is written in C, which looks quite similar to PHP, so the code is not hard to follow.
  2. When the official documentation is too abstract, this procedure is a handy way to understand a function thoroughly.
  3. To explore the PHP source we will use Adam Harvey's PHP source browser (LXR): https://php-lxr.adamharvey.name/source/
  4. The LXR home page has a search form on the left and a source selection box on the right. To keep things simple, we will work with php-7.3.
  5. Double-click php-7.3 in the selection box to go to its source listing.
  6. The directory listing contains some important directories: ext holds the extensions that provide PHP's functions, with one sub-directory per extension; main contains memory allocation code, directory scanning code, and so on; Zend is the Zend engine, which contains the compiler, the language features and the VM (virtual machine).
  7. Exploring ext and Zend makes it clear that the items under "Language Reference" in the official documentation correspond to the Zend directory, and those under "Function Reference" correspond to the ext directory.
  8. Open the file zend_builtin_functions.c in the Zend directory to see the list of functions defined in that C file.
  9. Looking through the code, you will see very familiar functions such as zend_version, func_num_args, strlen, property_exists and class_exists.
  10. To dive into the internals, we have chosen the string function explode.
  11. The official documentation for explode() is at https://www.php.net/manual/en/function.explode.php

Explode() overview

  1. explode is available in PHP 4, 5 and 7. (Newbie hint: there is no PHP 6. The core team planned PHP 6 long ago with Unicode support, but it was never released. To avoid confusion, and for marketing reasons, the release after PHP 5 was named PHP 7.)
  2. explode takes three parameters: two required and one optional.
  3. It returns either boolean false or an array.
  4. Since version 5.1.0, a negative limit is also supported.
  5. For more details, see the official documentation mentioned in the last point of the previous section.
  6. Let's move on to the internal working of this function.

Search using LXR utility

  1. Go to the LXR home page: https://php-lxr.adamharvey.name/source/
  2. In the first field of the form (labelled "Full Search"), type explode and click Search.
  3. This returns many matches. To narrow them down, change the search term to "PHP_FUNCTION explode". Note: the search term must be enclosed in double quotes.
  4. The result now lists two files: php_string.h and string.c.
  5. php_string.h plays a role similar to an interface or abstract class for a concrete class. The .h extension marks it as a header file, which declares what the main C file defines.
  6. Our target is string.c. Clicking the line number in the search result takes us to the definition of explode at line 1155.
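The header/source split described above can be sketched in a few lines of C. This is only an illustration: demo_count_char and the file names in the comments are hypothetical and do not exist in the PHP source. In a real project the first part would sit in a .h file and the second in the matching .c file.

```c
#include <stddef.h>

/* --- would live in a header such as demo_string.h (hypothetical name) --- */
/* Declaration only: tells callers the name, parameters and return type. */
size_t demo_count_char(const char *s, char c);

/* --- would live in the matching demo_string.c (hypothetical name) --- */
/* Definition: the actual body behind the declaration above. */
size_t demo_count_char(const char *s, char c)
{
    size_t n = 0;

    /* Walk the string until the terminating NUL, counting matches. */
    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}
```

Any file that includes the header can call demo_count_char without seeing its body, which is the same relationship php_string.h has to string.c.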

Explode() internal function call chain

  1. The definition of explode starts with PHP_FUNCTION(explode) on line 1155.
  2. The definition is enclosed between the markers {{{ and }}}. Note: scroll up a little to see the opening marker.
  3. These markers make it easy to see where a function definition starts and ends.
  4. PHP_FUNCTION is not built-in C syntax; it is a C macro.
  5. Macros are defined with the #define preprocessor directive.
  6. So search from the LXR home page with the term "#define PHP_FUNCTION".
  7. The result lists the file php.h; click line number 409 in it.
  8. On that page you will find that PHP_FUNCTION is defined as another macro, ZEND_FUNCTION.
  9. The hyperlink on ZEND_FUNCTION leads to a search listing, but it shows more records than we need.
  10. A more specific search term filters the results more accurately.
  11. Searching for "#define ZEND_FUNCTION" lists the file zend_API.h; click line number 64 in that result.
  12. The same search can also be opened directly in the LXR utility with the search term already filled in.
  13. Line 64 of zend_API.h contains the following code:
    #define ZEND_FUNCTION(name) ZEND_NAMED_FUNCTION(ZEND_FN(name))
  14. Line 64 uses two more macros: ZEND_NAMED_FUNCTION and ZEND_FN.
  15. Let's look at the inner macro, ZEND_FN, first.
  16. In the LXR search form, enter "#define ZEND_FN" in the Full Search field and click Search.
  17. The result lists zend_API.h at line 61; click the line number.
  18. That line contains #define ZEND_FN(name) zif_##name. In C, ## is the token-pasting (token concatenation) operator, so ZEND_FN(explode) produces the single identifier zif_explode.
  19. The result of ZEND_FN is passed as the argument to the function-like macro ZEND_NAMED_FUNCTION.
  20. In the LXR search form, enter "#define ZEND_NAMED_FUNCTION" in the Full Search field and click Search.
  21. Its definition is #define ZEND_NAMED_FUNCTION(name) void ZEND_FASTCALL name(INTERNAL_FUNCTION_PARAMETERS)
  22. So PHP_FUNCTION(explode) finally expands to void ZEND_FASTCALL zif_explode(INTERNAL_FUNCTION_PARAMETERS).
  23. Searching for "#define ZEND_FASTCALL" returns a list of results; click the line number of the first one.
  24. On that page, scroll up a little to see the whole C preprocessor conditional block.
  25. This block selects the definition of ZEND_FASTCALL according to the compiler in use, since Windows, Linux and the other supported operating systems use different compilers.
  26. To keep a mental map, let's arrange the macro chain as a sequence:
    PHP_FUNCTION(explode) -> ZEND_FUNCTION(explode) -> ZEND_NAMED_FUNCTION(ZEND_FN(explode)) -> ZEND_NAMED_FUNCTION(zif_explode)
  27. Thank you for your patience in following the flow.
  28. We walked through these steps before the eagerly awaited internals of explode (or any other core function) to show that PHP invokes core functions through a more direct, optimised path than ordinary userland functions.
  29. So our explode function travels from PHP_FUNCTION(explode) to the optimised C function zif_explode.
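The macro chain can be reproduced outside PHP with a few stand-in macros. The sketch below is a simplified imitation, not the Zend headers: every DEMO_ name is hypothetical, the fastcall macro is left empty, and the parameter list is reduced to one pointer (the real INTERNAL_FUNCTION_PARAMETERS carries the execute data and the return_value zval).

```c
/* Stand-in for INTERNAL_FUNCTION_PARAMETERS (assumption: one parameter only). */
#define DEMO_INTERNAL_FUNCTION_PARAMETERS const char **return_value

/* Stand-in for ZEND_FASTCALL; left empty so any compiler accepts it. */
#define DEMO_FASTCALL

/* Same shape as ZEND_FN: ## pastes two tokens into one identifier. */
#define DEMO_FN(name) zif_##name

/* Same shape as ZEND_NAMED_FUNCTION: builds the function signature. */
#define DEMO_NAMED_FUNCTION(name) \
    void DEMO_FASTCALL name(DEMO_INTERNAL_FUNCTION_PARAMETERS)

/* Same shape as ZEND_FUNCTION and PHP_FUNCTION. */
#define DEMO_FUNCTION(name) DEMO_NAMED_FUNCTION(DEMO_FN(name))
#define DEMO_PHP_FUNCTION DEMO_FUNCTION

/* After preprocessing this line reads:
 *   void zif_explode(const char **return_value)
 */
DEMO_PHP_FUNCTION(explode)
{
    /* __func__ (C99) holds the real name of the enclosing function. */
    *return_value = __func__;
}
```

Calling zif_explode(&name) stores the function's own name in name, which confirms that the macros generated a plain C function called zif_explode. Running the file through the preprocessor alone (for example with gcc -E) shows the expanded signature directly.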

Explode() function definition in C language

  1. As seen in the previous section, the explode code starts at line 1153 with the marker {{{ and ends at line 1192 with the marker }}}.
  2. Line 1155 is the macro call, which was covered in the previous section.
  3. Lines 1157-1159 declare the C local variables, some of which receive the arguments passed to explode.
  4. Lines 1161-1166 copy the runtime input parameters into those local variables:
    1. Line 1161: ZEND_PARSE_PARAMETERS_START(2, 3) is a macro that begins copying the runtime values into the local variables declared just above it.
    2. It takes two arguments: the number of required parameters, then the maximum number of parameters.
    3. Line 1162: Z_PARAM_STR(delim) is a macro that accepts a string value.
    4. It copies the delimiter string, passed as the first argument to explode, into the local variable delim.
    5. Line 1163: Z_PARAM_STR(str) copies the string passed as the second argument to explode into the local variable str.
    6. Line 1164: Z_PARAM_OPTIONAL marks every parameter from here to the end of the parsing block as optional.
    7. Line 1165: Z_PARAM_LONG(limit) is a macro that copies a PHP int value into a C long-style integer variable.
    8. This is explode's optional third parameter, $limit, whose value is copied into the C variable limit.
    9. If it is not passed, limit holds the constant ZEND_LONG_MAX, whose value depends on whether the system is 32-bit or 64-bit.
    10. Line 1166: ZEND_PARSE_PARAMETERS_END(); ends the parameter parsing block. Note that it is the only line in this block that ends with a semicolon.
  5. Lines 1168-1171 check whether delim is an empty string by testing its length, and raise an error if so:
    1. Line 1168: if (ZSTR_LEN(delim) == 0) uses the macro ZSTR_LEN, which returns the length of the string, and compares it with the integer 0.
    2. If the comparison is true, execution enters the block; otherwise the block is skipped entirely.
    3. Line 1169: php_error_docref(NULL, E_WARNING, "Empty delimiter"); raises a warning.
    4. The first parameter of php_error_docref is a char pointer, passed here as NULL; the remaining two parameters are self-explanatory.
    5. Line 1170: RETURN_FALSE is a macro that makes the function return false. To see how, follow the hyperlink from RETURN_FALSE to ZVAL_FALSE and onward: the chain ends at the type constant IS_FALSE, whose value is 2.
  6. Line 1173: array_init(return_value) initializes a hash table for the result array. Additional info: internally, a PHP array is itself a hash map. (If possible, a future tutorial will cover the array internals!)
  7. Lines 1175-1181 handle the case where the string to be exploded is empty, then return from the function:
    1. Line 1175: if (ZSTR_LEN(str) == 0) checks whether the length of the string to explode is zero.
    2. Line 1176: if (limit >= 0) checks whether the optional third parameter (the C local variable limit) is greater than or equal to zero.
    3. If that comparison is true, the inner block runs.
    4. Line 1177: ZVAL_EMPTY_STRING(&tmp); sets the zval tmp to an empty string.
    5. Line 1178: zend_hash_index_add_new(Z_ARRVAL_P(return_value), 0, &tmp); first uses Z_ARRVAL_P to fetch the hash table of the array in return_value, then passes it to zend_hash_index_add_new, a hash table API wrapper around _zend_hash_index_add_or_update_i (see the Doxygen documentation for reference), to store tmp at index 0.
    6. Line 1180: a plain return. The result is an array holding one empty string when limit >= 0, and an empty array otherwise.
  8. Lines 1183-1190 decide how to split based on the value of limit. A limit greater than 1 splits the string from left to right up to the limit, with the last element holding the rest of the string. A negative limit returns all the pieces except the last -limit ones. The final else handles a limit of 0 or 1 and returns an array with the whole string at the first index.
    1. Line 1183: if (limit > 1) checks whether the local variable limit is greater than 1.
    2. Line 1184: php_explode(delim, str, return_value, limit); calls a function declared with PHPAPI. All arguments are easy to understand except the third, return_value, which is the zval* provided by the PHP_FUNCTION macro.
    3. Line 1185: else if (limit < 0) checks whether limit is less than 0, i.e. negative.
    4. Line 1186: php_explode_negative_limit(delim, str, return_value, limit); takes the same arguments as php_explode but handles a negative limit.
    5. Line 1187: the final else branch of this if-else ladder.
    6. Line 1188: ZVAL_STR_COPY(&tmp, str); is a macro that copies the string in the local variable str into the local zval tmp.
    7. Line 1189: similar to point 7.5 of this section; it adds tmp to the result array.

Conclusion

  1. You are now ready to explore any PHP function by applying the same procedure.
  2. You learned the directory structure of the PHP source.
  3. You learned how to use the LXR utility.
  4. You learned how the PHP_FUNCTION macro works.