
cURL and PHP: what it is and how to use it

cURL is a tool for transferring files and data using URL syntax. It supports many protocols, such as HTTP, FTP, and TELNET. cURL was originally designed as a command-line tool; luckily for us, the cURL library is also supported by the PHP programming language. In this article we will look at some of the advanced features of cURL and touch on their practical application using PHP.

Why cURL?

In fact, there are many alternative ways of fetching the content of a web page. In many cases, mainly out of laziness, I have used simple PHP functions instead of cURL:

$content = file_get_contents("http://www.nettuts.com");
// or
$lines = file("http://www.nettuts.com");
// or
readfile("http://www.nettuts.com");

However, these functions have virtually no flexibility and poor error handling. There are also tasks you simply cannot solve with them at all: working with cookies, authentication, form submission, file uploads, and so on.

cURL is a powerful library that supports many different protocols, options, and provides detailed information about URL requests.

Basic structure

  • Initialization
  • Assigning parameters
  • Executing and fetching the result
  • Freeing up memory

// 1. initialization
$ch = curl_init();
// 2. set the options, including the URL
curl_setopt($ch, CURLOPT_URL, "http://www.nettuts.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
// 3. execute the request and fetch the resulting HTML
$output = curl_exec($ch);
// 4. free the handle
curl_close($ch);

Step 2 (that is, the curl_setopt() calls) will get far more attention in this article than the other steps, because that is where all the most interesting and useful things happen. cURL offers many different options that let you configure a URL request very precisely. We will not cover the entire list, only the options I consider necessary and useful for this lesson. You can study the rest yourself if the topic interests you.

Checking Errors

You can use a conditional statement to check whether the request succeeded:

// ...
$output = curl_exec($ch);
if ($output === FALSE) {
    echo "cURL Error: " . curl_error($ch);
}
// ...

Please note one very important point here: we must use "=== FALSE" for the comparison instead of "== FALSE". For those not in the know, the strict comparison helps us distinguish an empty result from the boolean FALSE that signals an error.
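The difference is easy to see with a quick check that is not tied to any request (a minimal sketch; the $output value here is just a stand-in for a successful response with an empty body):

```php
<?php
// A response can be an empty string and still be a success:
$output = "";                 // e.g. a 200 response with an empty body
var_dump($output == false);   // loose comparison treats "" as false
var_dump($output === false);  // strict comparison correctly says "not an error"
```

Only the strict comparison lets you tell "the request failed" apart from "the request succeeded but returned nothing".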

Receiving the information

An optional extra step is to retrieve information about the cURL request after it has been executed.

// ...
curl_exec($ch);
$info = curl_getinfo($ch);
echo "Took " . $info["total_time"] . " seconds for url " . $info["url"];
// ...

The returned array contains the following information:

  • "Url"
  • "Content_type"
  • "Http_code"
  • "Header_size"
  • “Request_size”
  • "Filetime"
  • Ssl_verify_result
  • "Redirect_count"
  • "Total_time"
  • "Namelookup_time"
  • “Connect_time”
  • "Pretransfer_time"
  • "Size_upload"
  • Size_download
  • “Speed_download”
  • “Speed_upload”
  • "Download_content_length"
  • "Upload_content_length"
  • "Starttransfer_time"
  • "Redirect_time"

Redirect detection based on browser

In this first example, we will write code that can detect URL redirects that depend on the browser. For example, some websites redirect the browsers of cell phones or other devices to a different page.

We're going to use the CURLOPT_HTTPHEADER option to determine our outgoing HTTP headers, including the user's browser name and available languages. Ultimately we will be able to determine which sites are redirecting us to different URLs.

// URLs to test
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);
// browsers to test
$browsers = array(
    "standard" => array(
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array(
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array(
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);

foreach ($urls as $url) {
    echo "URL: $url\n";
    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();
        // set the URL
        curl_setopt($ch, CURLOPT_URL, $url);
        // set the browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: {$browser['user_agent']}",
            "Accept-Language: {$browser['language']}"
        ));
        // we don't need the page content
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // we do need the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

        $output = curl_exec($ch);
        curl_close($ch);

        // was there an HTTP redirect?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            echo "$test_name: redirects to $matches[1]\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n\n";
}

First, we define the list of site URLs to check. Then we define the browser profiles with which we will test each of these URLs. Finally, we use a loop to go over all the results.

The trick in this example is setting the cURL options so that we get back not the page content but only the HTTP headers (stored in $output). Then, with a simple regular expression, we can check whether a "Location:" header was present in the response.

When you run this code, you should see, for each URL, which browser profiles are redirected and where.

Making a POST request to a specific URL

When forming a GET request, the transmitted data can be transferred to the URL through the "query string". For example, when you do a Google search, the search term appears in the address bar of the new URL:

http://www.google.com/search?q=ruseller

To simulate such a request, you don't need cURL at all. If laziness gets the better of you, file_get_contents() will do the job.

The point, however, is that some HTML forms send POST requests. The form data is then carried in the body of the HTTP request rather than in the URL. For example, if you fill out a search form on a forum and click the search button, a POST request will most likely be made to something like:

http://codeigniter.com/forums/do_search/

We can write a PHP script that mimics this kind of request. First, let's create a simple file that accepts and displays POST data. Let's call it post_output.php:

print_r($_POST);

Then we create a PHP script to execute the cURL request:

$ url = "http: //localhost/post_output.php"; $ post_data = array ("foo" => "bar", "query" => "Nettuts", "action" => "Submit"); $ ch = curl_init (); curl_setopt ($ ch, CURLOPT_URL, $ url); curl_setopt ($ ch, CURLOPT_RETURNTRANSFER, 1); // indicate that we have a POST request curl_setopt ($ ch, CURLOPT_POST, 1); // add variables curl_setopt ($ ch, CURLOPT_POSTFIELDS, $ post_data); $ output = curl_exec ($ ch); curl_close ($ ch); echo $ output;

When you run this script, you should see the contents of the $_POST array printed back to you.

Thus, the POST request was sent to the post_output.php script, which dumped the $_POST superglobal, and we retrieved its contents using cURL.

File upload

First, let's create a script that receives the upload and displays it, and call it upload_output.php:

print_r($_FILES);

And here is the script code that performs the above functionality:

$ url = "http: //localhost/upload_output.php"; $ post_data = array ("foo" => "bar", // file to be uploaded "upload" => "@C: /wamp/www/test.zip"); $ ch = curl_init (); curl_setopt ($ ch, CURLOPT_URL, $ url); curl_setopt ($ ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt ($ ch, CURLOPT_POST, 1); curl_setopt ($ ch, CURLOPT_POSTFIELDS, $ post_data); $ output = curl_exec ($ ch); curl_close ($ ch); echo $ output;

To upload a file, all you have to do is pass its path as a regular POST variable, prefixed with the @ symbol. (Note that in PHP 5.5+ the @ prefix is deprecated in favor of the CURLFile class.) When you run this script, you will get output describing the uploaded file.

Multiple cURL

One of the greatest strengths of cURL is the ability to create a "multi" cURL handle. This lets you open connections to multiple URLs simultaneously and asynchronously.

With a classic cURL request, script execution pauses and waits for the URL request to complete before the script can continue. If you intend to interact with a whole bunch of URLs, this becomes quite time-consuming, since you can only work with one URL at a time. We can remedy this situation by using the multi handle.

Let's take a look at some example code I took from php.net:

// create some cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set the URL and other options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
// create the multiple cURL handle
$mh = curl_multi_init();
// add the handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
$active = null;
// execute the requests
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// clean up
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);

The idea is that you can use multiple cURL handlers. Using a simple loop, you can keep track of which requests have not yet completed.

There are two main loops in this example. The first do-while loop calls curl_multi_exec(). This function is non-blocking: it does as much work as it can immediately and returns a status value. As long as the returned value is the constant CURLM_CALL_MULTI_PERFORM, there is still work to do right away (for example, sending HTTP headers), so we keep calling it until we get a different result.

The outer loop then runs while the $active variable is "true". This variable is the second parameter of curl_multi_exec(), and it remains "true" as long as any of the connections is still active. Inside the loop we call curl_multi_select(), which blocks as long as there is at least one active connection with nothing to report. When something happens, we return to the inner loop to continue processing the requests.

Now, let's apply the knowledge gained through an example that will be really useful for a large number of people.

Checking links in WordPress

Imagine a blog with a huge number of posts, each containing links to external websites. Some of those links may already be "dead" for various reasons: the page may have been removed, or the whole site may be down.

We're going to create a script that will analyze all links and find non-loading websites and 404 pages, and then provide us with a detailed report.

I must say right away that this is not an example of creating a plugin for WordPress. This is just a good testing ground for our tests.

Let's get started at last. First, we have to fetch all links from the database:

// configuration
$db_host = "localhost";
$db_user = "root";
$db_pass = "";
$db_name = "wordpress";
$excluded_domains = array("localhost", "www.mydomain.com");
$max_connections = 10;

// initialize variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;

// connect to MySQL
if (!mysql_connect($db_host, $db_user, $db_pass)) {
    die("Could not connect: " . mysql_error());
}
if (!mysql_select_db($db_name)) {
    die("Could not select db: " . mysql_error());
}

// select all published posts that contain links
$q = "SELECT post_content FROM wp_posts
      WHERE post_content LIKE '%href=%'
      AND post_status = 'publish'
      AND post_type = 'post'";
$r = mysql_query($q) or die(mysql_error());

while ($d = mysql_fetch_assoc($r)) {
    // extract the links with a regular expression
    if (preg_match_all('!href="(.*?)"!', $d["post_content"], $matches)) {
        foreach ($matches[1] as $url) {
            $tmp = parse_url($url);
            if (isset($tmp["host"]) && in_array($tmp["host"], $excluded_domains)) {
                continue;
            }
            $url_list[] = $url;
        }
    }
}

// remove duplicates
$url_list = array_values(array_unique($url_list));

if (!$url_list) {
    die("No URL to check");
}

First, we set up the configuration for connecting to the database, then we list the domains that will not take part in the check ($excluded_domains), and we set the maximum number of simultaneous connections our script will use ($max_connections). We then connect to the database, select all published posts that contain links, and collect those links into an array ($url_list).

The following code is a little tricky, so work through it from start to finish:

// 1. create the multi handle
$mh = curl_multi_init();

// 2. add the initial batch of URLs
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}

// 3. start execution
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// 4. main loop
while ($active && $mrc == CURLM_OK) {

    // 5. if anything is ready
    if (curl_multi_select($mh) != -1) {

        // 6. do the work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // 7. is there info about a completed request?
        if ($mhinfo = curl_multi_info_read($mh)) {
            // this means a request has finished

            // 8. extract the info
            $chinfo = curl_getinfo($mhinfo["handle"]);

            // 9. dead link?
            if (!$chinfo["http_code"]) {
                $dead_urls[] = $chinfo["url"];
            // 10. 404?
            } else if ($chinfo["http_code"] == 404) {
                $not_found_urls[] = $chinfo["url"];
            // 11. working
            } else {
                $working_urls[] = $chinfo["url"];
            }

            // 12. clean up after ourselves
            curl_multi_remove_handle($mh, $mhinfo["handle"]);
            // if this loops forever, comment out this call
            curl_close($mhinfo["handle"]);

            // 13. add a new URL and keep working
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}

// 14. finished
curl_multi_close($mh);

echo "==Dead URLs==\n";
echo implode("\n", $dead_urls) . "\n\n";
echo "==404 URLs==\n";
echo implode("\n", $not_found_urls) . "\n\n";
echo "==Working URLs==\n";
echo implode("\n", $working_urls);

// 15. adds a URL to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;

    // if there are still URLs left to fetch
    if (isset($url_list[$index])) {
        // new cURL handle
        $ch = curl_init();
        // set the URL
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        curl_multi_add_handle($mh, $ch);
        // move on to the next URL
        $index++;
        return true;
    } else {
        // no more URLs to add
        return false;
    }
}

Here I will break everything down point by point. The numbers in the list correspond to the numbered comments in the code.

  1. Create the multi handle;
  2. The add_url_to_multi_handle() function, which we will write a little later, adds a new URL to process each time it is called. Initially, we add $max_connections (10) URLs;
  3. To get things started, we must run curl_multi_exec(). As long as it returns CURLM_CALL_MULTI_PERFORM, there is work to do immediately. We need this mainly to open the connections;
  4. Next comes the main loop, which runs as long as we have at least one active connection;
  5. curl_multi_select() waits until some request completes;
  6. Again, we let cURL do some work, namely fetch the data of the returned response;
  7. Here we check for information about a completed request. An array is returned for each finished request;
  8. The returned array contains a cURL handle. We use it to fetch information about that individual request;
  9. If the link was dead or the request timed out, there will be no HTTP code;
  10. If the link returned a 404 page, the HTTP code will contain the value 404;
  11. Otherwise, we have a working link in front of us. (You could add additional checks for 500 errors, etc.);
  12. Next, we remove the handle because we no longer need it;
  13. Now we can add another URL and run everything we talked about before;
  14. At this step, the script finishes its work. We can remove everything we don't need and generate the report;
  15. Finally, we write the function that adds a URL to the multi handle. The static variable $index is incremented each time this function is called.

I used this script on my blog (with some broken links that I added on purpose in order to test it) and got the following result:

In my case, it took the script a little less than 2 seconds to go through 40 URLs. The performance gain is significant when dealing with even more URLs. If you open ten connections at the same time, the script can run ten times faster.

A few words about other useful cURL options

HTTP Authentication

If the URL requires HTTP authentication, you can use the following script:

$ url = "http://www.somesite.com/members/"; $ ch = curl_init (); curl_setopt ($ ch, CURLOPT_URL, $ url); curl_setopt ($ ch, CURLOPT_RETURNTRANSFER, 1); // specify the name and password curl_setopt ($ ch, CURLOPT_USERPWD, "myusername: mypassword"); // if redirection is allowed curl_setopt ($ ch, CURLOPT_FOLLOWLOCATION, 1); // then save our data to cURL curl_setopt ($ ch, CURLOPT_UNRESTRICTED_AUTH, 1); $ output = curl_exec ($ ch); curl_close ($ ch);

FTP upload

PHP also has a library for working with FTP, but nothing prevents you from using cURL tools:

// open the file
$fp = fopen("/path/to/file", "r");
// the url should look like this
$url = "ftp://username:password@mydomain.com:21/path/to/new/file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));
// upload in ASCII mode
curl_setopt($ch, CURLOPT_FTPASCII, 1);
$output = curl_exec($ch);
curl_close($ch);

Using Proxies

You can execute your URL request through a proxy:

$ ch = curl_init (); curl_setopt ($ ch, CURLOPT_URL, "http://www.example.com"); curl_setopt ($ ch, CURLOPT_RETURNTRANSFER, 1); // specify the address curl_setopt ($ ch, CURLOPT_PROXY, "11.11.11.11:8080"); // if you need to provide a username and password curl_setopt ($ ch, CURLOPT_PROXYUSERPWD, "user: pass"); $ output = curl_exec ($ ch); curl_close ($ ch);

Callback functions

It is also possible to specify a function that will be triggered even before the cURL request completes. For example, while the content of the response is loading, you can start using the data without waiting for it to be fully loaded.

$ ch = curl_init (); curl_setopt ($ ch, CURLOPT_URL, "http://net.tutsplus.com"); curl_setopt ($ ch, CURLOPT_WRITEFUNCTION, "progress_function"); curl_exec ($ ch); curl_close ($ ch); function progress_function ($ ch, $ str) (echo $ str; return strlen ($ str);)

A callback like this must return the length of the string it received; returning anything else will make cURL abort the transfer.

Conclusion

Today we got acquainted with how the cURL library can be used for your own purposes. I hope you enjoyed this article.

Thanks! Have a good day!

This article assumes that you are familiar with networking basics and HTML.

Scripting is essential in building a good computer system. The extensibility of Unix systems with shell scripts and various programs that execute automated commands is one of the reasons why they are so successful.

The increasing number of applications that are moving to the web has made the topic of HTTP scripting more and more in demand. Important tasks in this area are automatic retrieval of information from the Internet, sending or downloading data to web servers, etc.

Curl is a command line tool that allows you to do all kinds of URL manipulations and transfers. This article focuses on making simple HTTP requests. It is assumed that you already know how to run

# curl --help

# curl --manual

for information on curl.

Curl is not a tool that will do everything for you. It creates requests, receives data, and sends data. You may need some kind of glue to put everything together, perhaps some scripting language (like bash) or a few manual calls.

1. HTTP protocol

HTTP is the protocol used when receiving data from web servers. It is a very simple protocol built on top of TCP/IP. The protocol also allows a client to send information to the server using several methods, as will be shown below.

An HTTP request consists of lines of ASCII text sent from the client to the server to request an action. When a request is received, the server answers with several lines of header text, followed by the actual content.
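For example, a minimal exchange looks roughly like this (a sketch; the exact headers vary by client and server):

```text
GET /index.html HTTP/1.1
Host: curl.haxx.se
User-Agent: curl/7.19.7

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234

<html> ... the actual content ...
```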

By using the curl -v switch, you can see what commands curl is sending to the server, as well as other informational text. The -v switch is perhaps the only way to debug or even understand the interaction between curl and the web server.

2. URL

The Uniform Resource Locator (URL) format specifies the address of a specific resource on the Internet. You probably know this, for example URLs: http://curl.haxx.se or https://yourbank.com.

3. Get (GET) page

The simplest and most common HTTP request is to get the content of the URL. The URL can link to a web page, image, or file. The client sends a GET request to the server and receives the requested document. If you run the command

# curl http://curl.haxx.se

you will get the web page displayed in your terminal window: the complete HTML document located at that URL.

All HTTP responses contain a set of headers that are usually hidden. To see them along with the document itself, use the curl -i switch. You can also request only headers with the -I switch (which will make curl make a HEAD request).

4. Forms

Forms are the main way a website presents an HTML page with fields in which the user enters data and then clicks an "OK" or "Submit" button, after which the data is sent to the server. The server then uses the received data and decides how to proceed: look up information in a database, show the entered address on a map, display an error message, or use the information to authenticate the user. Naturally, there is some program on the server side that accepts your data.

4.1 GET

A GET form uses the GET method, for example like this:
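Using the field names that appear below (birthyear and press, with a hypothetical junk.cgi receiving script), such a form might look like this:

```html
<form method="GET" action="junk.cgi">
  <input type="text" name="birthyear">
  <input type="submit" name="press" value="OK">
</form>
```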

If you open this code in your browser, you will see a form with a text box and a button labeled "OK". If you enter "1905" and click OK, the browser constructs a new URL to follow, consisting of the path of the previous URL plus a string like "junk.cgi?birthyear=1905&press=OK".

For example, if the form was located at "www.hotmail.com/when/birth.html", clicking OK takes you to the URL "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK".

Most search engines work this way.

To make curl form a GET request, simply enter what is expected from the form:

# curl "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"

4.2 POST

The GET method causes all entered information to be displayed in the address bar of your browser. This may be fine when you need to bookmark a page, but it is an obvious disadvantage when you enter sensitive information into form fields, or when the amount of information entered into the fields is too large (resulting in an unreadable URL).

The HTTP protocol provides a POST method. With it, the client sends data separately from the URL and therefore you will not see it in the address bar.

The form that generates the POST request is similar to the previous one:
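A sketch of such a form, reusing the same field names and the hypothetical junk.cgi script; only the method changes:

```html
<form method="POST" action="junk.cgi">
  <input type="text" name="birthyear">
  <input type="submit" name="press" value="OK">
</form>
```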

Curl can form a POST request with the same data like this:

# curl -d "birthyear = 1905 & press =% 20OK% 20" www.hotmail.com/when/junk.cgi

This POST request uses the "Content-Type: application/x-www-form-urlencoded" header, which is the most widely used method.

The data you send to the server must be encoded correctly; curl will not do it for you with -d. For example, if you want your data to contain a space, you need to replace that space with %20, and so on. Lack of attention to this is a common mistake that corrupts the transferred data.
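As a sketch, here is one way to handle the most common case (spaces) in a POSIX shell before handing the data to curl. The helper name encode_spaces is made up for this example, and it only encodes spaces; real data may need full percent-encoding, which newer versions of curl can do for you with the --data-urlencode option:

```shell
# Hypothetical helper: replace every space with %20.
encode_spaces() {
  printf '%s' "$1" | sed 's/ /%20/g'
}

data="birthyear=1905&press= OK "
encoded=$(encode_spaces "$data")
echo "$encoded"
# then: curl -d "$encoded" www.hotmail.com/when/junk.cgi
```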

4.3 File upload (POST)

Back in 1995, an additional way to transfer data over HTTP was defined. It is documented in RFC 1867, which is why it is sometimes referred to as RFC1867-posting.

This method is mainly designed to better support file uploads. The form that allows the user to upload a file looks like this in HTML:
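A sketch of such a form (upload.cgi is a hypothetical receiving script):

```html
<form method="POST" enctype="multipart/form-data" action="upload.cgi">
  <input type="file" name="upload">
  <input type="submit" name="press" value="OK">
</form>
```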

Note that the Content-Type is set to multipart / form-data.

To send data to such a form using curl, enter the command:

# curl -F upload=@localfilename -F press=OK [URL]

4.4 Hidden fields

A common way to communicate state information in HTML applications is by using hidden fields in forms. Hidden fields are not filled in by the user; they are invisible, but they are submitted just like regular fields.

A simple example of a form with one visible field, one hidden and an OK button:
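A sketch of such a form, using the field names from the command below (foobar.cgi is a hypothetical receiving script):

```html
<form method="POST" action="foobar.cgi">
  <input type="text" name="birthyear">
  <input type="hidden" name="person" value="daniel">
  <input type="submit" name="press" value="OK">
</form>
```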

To send a POST request using curl, you don't have to think about whether the field is hidden or not. For curl, they are all the same:

# curl -d "birthyear = 1905 & press = OK & person = daniel"

4.5 Find out what a POST request looks like

When you want to fill out a form and send data to the server using curl, you probably want the POST request to look exactly the same as the one executed using the browser.

An easy way to see your POST request is to save the HTML page with the form to disk, change the method to GET, and click the Submit button (you can also change the URL to which the data will be sent).

You will see the data appended to the URL after a "?" character, as expected for GET forms.

5. PUT

Perhaps the best way to upload data to an HTTP server is to use PUT. Again, this requires a program (script) on the back end that knows how to receive an HTTP PUT stream.

Send the file to the server using curl:

# curl -T uploadfile www.uploadhttp.com/receive.cgi

6. Authentication

Authentication means passing a username and password to the server so it can check whether you have the right to make the requested request. Basic authentication (which curl uses by default) is plain text: the username and password are not encrypted, only lightly obscured with Base64, which leaves them readable to anyone listening on the way between you and the HTTP server.

Telling curl to use a username and password:

# curl -u name:password www.secrets.com

The site may require a different authentication method (see what the server reports in its headers); in those cases you can use the --ntlm, --digest, --negotiate, or even --anyauth switches. Sometimes access to external HTTP servers goes through a proxy, as is common in companies. An HTTP proxy may require its own username and password for Internet access. The corresponding curl switch is:

# curl -U proxyuser:proxypassword curl.haxx.se

If the proxy requires NTLM authentication, specify --proxy-ntlm, if the Digest method, then --proxy-digest.

If you do not specify a password in the -u and -U switches, then curl will ask you for it interactively.

Note that when curl is running, the startup line (along with keys and passwords) may be visible to other users on your system in the task list. There are ways to prevent this. More on this below.

7. Referer

An HTTP request may include a "referer" field that indicates from which URL the user came to this resource. Some programs / scripts check the "referer" field and do not execute the request if the user comes from an unknown page. Although this is a stupid way of checking, many scripts use it. With curl, you can put anything in the "referer" field and make it do what you want it to do.

This is done as follows:

# curl -e http://curl.haxx.se daniel.haxx.se

8. User Agent

All HTTP requests can include a "User-Agent" field, which identifies the user's client application. Many web applications use this information to render the page one way or another. Web programmers often create multiple versions of a page for users of different browsers to improve appearance, using JavaScript, VBScript, and so on.

Sometimes you may find that curl is returning a different page from what you saw in your browser. In this case, it is just appropriate to use the "User Agent" field in order to deceive the server once again.

Disguise curl as Internet Explorer on a Windows 2000 machine:

# curl -A "Mozilla / 4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

Why not become Netscape 4.73 on a Linux box (PIII):

# curl -A "Mozilla / 4.73 (X11; U; Linux 2.2.15 i686)"

9. Redirects

Responding to your request, the server, instead of the page itself, can return an indication of where the browser should go next to get to the desired page. The header that tells the browser this redirect is "Location:".

By default, curl does not go to the location specified in "Location:", but simply displays the page as usual. But you can direct it as follows:

# curl -L www.sitethatredirects.com

If you are using curl for POST requests to a site that immediately redirects to another page, you can safely combine -L with -d/-F. Curl will send a POST request to the first page and then a GET request to the next.

10. Cookies

With cookies, web browsers keep track of client-side state. A cookie is a name with attached content. When the server sends cookies, it tells the client the path and hostname the cookie should be sent back to next time, the cookie's lifetime, and some other parameters.

When a client connects to the server at the address specified in the accepted cookie, the client sends that cookie to the server (unless the lifetime has expired).

Many applications and servers use this technique to combine multiple requests into one logical session. For curl to perform this function as well, we need to be able to save and send cookies, just like browsers do.

The simplest way to send a cookie to the server when a page is received using curl is to add the appropriate switch on the command line:

# curl -b "name = Daniel" www.cookiesite.com

Cookies are sent as regular HTTP headers, so curl can save them simply by saving the headers. Saving cookies this way is done with the command:

# curl -D headers_and_cookies www.cookiesite.com

(by the way, to save cookies it is better to use the -c switch, more on that below).

Curl has a fully featured cookie handler, which is useful when you want to connect to the server again and reuse the cookies saved last time (or crafted by hand). To use the cookies stored in a file, call curl like this:

# curl -b stored_cookies_in_file www.cookiesite.com

The curl cookie engine is enabled when you specify the -b switch. If you want curl to only accept cookies, use -b with a file that does not exist. For example, if you want curl to accept cookies from a page and then follow the redirection (perhaps by returning the cookie you just accepted), you can call curl like this:

# curl -b nada -L www.cookiesite.com

Curl can read and write Netscape and Mozilla cookie files. This is a convenient way to exchange cookies between browsers and automated scripts. The -b switch automatically detects whether the given file is a cookie file from one of these browsers and handles it appropriately, and with the -c/--cookie-jar switch you can make curl write the updated cookies to a new file when the operation completes:

# curl -b cookies.txt -c newcookies.txt www.cookiesite.com
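The same cookie workflow is available from PHP, which this article turns to later. The following is only a minimal sketch, assuming the PHP cURL extension is loaded; the URL and file names are placeholders taken from the examples above:

```php
<?php
// Sketch: PHP equivalents of the -b (read) and -c (write) switches.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.cookiesite.com");
// Like -b cookies.txt: read cookies from a Netscape/Mozilla-format file
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
// Like -c newcookies.txt: write the updated cookies here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEJAR, "newcookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $data = curl_exec($ch); // left commented out: would perform a real network request
echo "cookie options set\n";
curl_close($ch);
?>
```

Setting CURLOPT_COOKIEFILE to an empty string enables the cookie engine without loading any stored cookies, much like passing -b with a non-existent file above.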

11. HTTPS

There are several ways to secure your HTTP transfers. The best-known protocol for this task is HTTPS, or HTTP over SSL. SSL encrypts all data sent and received over the network, which greatly improves the chances that your information remains private.

Curl supports requests to HTTPS servers thanks to the free OpenSSL library. Requests happen the usual way:

# curl https://that.secure.server.com

11.1 Certificates

In the HTTPS world, you use certificates in addition to your username and password for authentication. Curl supports client-side certificates. Certificates are protected with a passphrase that you need to enter before curl can work with them. The passphrase can be specified on the command line or entered interactively. Curl uses certificates like this:

# curl -E mycert.pem https://that.secure.server.com

Curl also verifies the identity of the server by checking the server's certificate against a locally stored CA bundle. If a mismatch is found, curl refuses to connect. To skip this verification, use the -k switch.

More detailed information about certificates can be found at http://curl.haxx.se/docs/sslcerts.html.
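The corresponding options exist in the PHP binding as well. This is only a sketch with the placeholder file and host names from above, assuming the cURL extension is available:

```php
<?php
// Sketch: PHP equivalents of the -E and -k switches.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://that.secure.server.com");
// Like -E mycert.pem: present a client certificate
// (the passphrase can be supplied via CURLOPT_SSLCERTPASSWD)
curl_setopt($ch, CURLOPT_SSLCERT, "mycert.pem");
// Setting this to false corresponds to -k (skip server certificate
// verification); leave it true in production
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
// $result = curl_exec($ch); // left commented out: would perform a real network request
echo "ssl options set\n";
curl_close($ch);
?>
```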

12. Arbitrary request headers

You may sometimes need to modify or add individual elements of a curl request.

For example, you can change a POST request to PROPFIND and send the data as "Content-Type: text/xml" (instead of the usual Content-Type):

# curl -d " " -H "Content-Type: text/xml" -X PROPFIND url.com

You can remove any header by specifying it without content. For example, you can remove the "Host:" header, thereby sending the request without it:

# curl -H "Host:" http://mysite.com

You can also add your own headers. For example, your server may need a "Destination:" header:

# curl -H "Destination: http://moo.com/nowhere" http://url.com

13. Debugging

It often happens that a site responds to curl requests differently than to browser requests. In this case, you need to make curl resemble the browser as closely as possible:

  • Use the --trace-ascii switch to save a detailed report of the requests so that you can later examine them in detail and understand the problem.
  • Make sure you check for cookies and use them if necessary (the -b switch reads them, -c saves them)
  • Set the "user-agent" field to that of one of the recent popular browsers
  • Fill in the "referer" field as the browser does
  • If you are using POST requests, make sure that all fields are passed in the same order as the browser (see above, point 4.5)

A good helper in this difficult task is the LiveHTTPHeaders plugin for Mozilla/Firefox, which lets you view all the headers that the browser sends and receives (even over HTTPS).

A lower-level approach is to capture the HTTP traffic on the network with programs like Ethereal (now Wireshark) or tcpdump and then analyze which headers the browser received and sent (HTTPS makes this approach ineffective).

RFC 2616 is a must-read for anyone wanting to understand the HTTP protocol.

RFC 2396 explains the syntax for URLs.

RFC 2109 defines how cookies work.

RFC 1867 defines the File Upload Post format.

http://openssl.planetmirror.com - OpenSSL project home page

http://curl.haxx.se - cURL project home page

The CURL (Client URLs) library allows you to transfer files to a remote computer using a variety of Internet protocols. It has a very flexible configuration and allows you to perform almost any remote request.

CURL supports the HTTP, HTTPS, FTP, FTPS, DICT, TELNET, LDAP, FILE, and GOPHER protocols, as well as HTTP POST, HTTP PUT, cookies, FTP uploads, resuming interrupted file transfers, passwords, port numbers, SSL certificates, Kerberos, and proxies.

Using CURL, a web server can act as a full client for any HTTP-based service, such as XML-RPC, SOAP, or WebDAV.

In general, using the library comes down to four steps:

  1. Creating a CURL resource using the curl_init function.
  2. Setting parameters using curl_setopt function.
  3. Executing a request using the curl_exec function.
  4. Freeing a CURL resource using the curl_close function.

A simple example of using CURL

<?php
// Initialize the curl library
if ($ch = @curl_init())
{
    // Set the request URL
    @curl_setopt($ch, CURLOPT_URL, "http://server.com/");
    // If true, CURL includes headers in the output
    @curl_setopt($ch, CURLOPT_HEADER, false);
    // Where to put the query result:
    // false - to the standard output stream,
    // true - as the return value of the curl_exec function.
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Maximum waiting time in seconds
    @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    // Set the value of the User-Agent field
    @curl_setopt($ch, CURLOPT_USERAGENT, "PHP Bot (http://blog.yousoft.ru)");
    // Execute the request
    $data = @curl_exec($ch);
    // Display the received data
    echo $data;
    // Free the resource
    @curl_close($ch);
}
?>

An example of using a GET request

<?php
$ch = curl_init();
// The GET parameters are specified in the URL string
curl_setopt($ch, CURLOPT_URL, "http://server.com/?s=CURL");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);

$data = curl_exec($ch);
curl_close($ch);
?>

Sending a GET request is no different from receiving a page. It is important to note that the query string is formed as follows:

http://server.com/index.php?name1=value1&name2=value2&name3=value3

where http://server.com/index.php is the page address, nameX is the name of the variable, valueX is the value of the variable.
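Instead of concatenating such a string by hand, PHP's built-in http_build_query() function can assemble and URL-encode it. A small self-contained illustration (server.com is the placeholder host from above):

```php
<?php
// Build the query string name1=value1&name2=value2&name3=value3 from an array
$params = array(
    "name1" => "value1",
    "name2" => "value2",
    "name3" => "value3",
);
$query = http_build_query($params);
echo "http://server.com/index.php?" . $query . "\n";
// prints http://server.com/index.php?name1=value1&name2=value2&name3=value3
?>
```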

An example of using a POST request

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://server.com/index.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// You need to state explicitly that this is a POST request
curl_setopt($ch, CURLOPT_POST, true);
// The variable values are passed here
curl_setopt($ch, CURLOPT_POSTFIELDS, "s=CURL");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_USERAGENT, "PHP Bot (http://mysite.ru)");
$data = curl_exec($ch);
curl_close($ch);
?>

Sending a POST request is not much different from sending a GET request. All the basic steps remain the same; the variables are again set in pairs: name1=value1&name2=value2.
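One caveat when building the pairs by hand: values containing spaces or special characters must be URL-encoded. http_build_query() takes care of that automatically; a quick sketch with hypothetical field names:

```php
<?php
// Spaces are encoded automatically ("John Smith" becomes "John+Smith")
$postData = http_build_query(array("name" => "John Smith", "s" => "CURL"));
echo $postData . "\n"; // prints name=John+Smith&s=CURL
// The resulting string can be passed directly to CURLOPT_POSTFIELDS
?>
```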

Example of HTTP Authorization

<?php
// HTTP authorization
$url = "http://server.com/protected/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>

Sample FTP session

<?php
$fp = fopen(__FILE__, "r");
$url = "ftp://username:[email protected]:21/path/to/newfile.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_FTPASCII, 1);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize(__FILE__));
$result = curl_exec($ch);
curl_close($ch);
?>

If you have problems using cURL, you need to add the following lines before calling curl_close to get a report on the last executed request:

print_r(curl_getinfo($ch));
echo "cURL error number: " . curl_errno($ch) . "\n";
echo "cURL error: " . curl_error($ch) . "\n";
curl_close($ch);

Why we need PHP CURL?
To send HTTP GET requests, we can simply use the file_get_contents() function.

file_get_contents("http://site");

But sending POST requests and handling errors are not easy with file_get_contents().

Sending HTTP requests is very simple with PHP CURL. You need to follow four steps to send a request.

Step 1. Initialize a CURL session

$ch = curl_init();

Step 2. Provide options for the CURL session

curl_setopt($ch, CURLOPT_URL, "http://site");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_HEADER, true); // if you want headers

CURLOPT_URL -> the URL to fetch
CURLOPT_HEADER -> whether to include the header in the output
CURLOPT_RETURNTRANSFER -> if set to true, the data is returned as a string instead of being output directly.

Step 3. Execute the CURL session

$output = curl_exec($ch);

Step 4. Close the session

curl_close($ch);

Note: You can check whether CURL is enabled with the following code.

if (is_callable("curl_init")) {
    echo "Enabled";
} else {
    echo "Not enabled";
}

1.PHP CURL GET Example

You can use the code below to send a GET request.

function httpGet($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // curl_setopt($ch, CURLOPT_HEADER, false);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
echo httpGet("http://site");

2.PHP CURL POST Example


You can use the code below to submit a form using PHP CURL.

function httpPost($url, $params)
{
    $postData = "";
    // create name=value pairs separated by &
    foreach ($params as $k => $v) {
        $postData .= $k . "=" . $v . "&";
    }
    $postData = rtrim($postData, "&");
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

How to use the function:

$params = array("name" => "Ravishanker Kusuma", "age" => "32", "location" => "India");
echo httpPost("http://site/examples/php/curl-examples/post.php", $params);

3.Send Random User-Agent in the Requests

You can use the function below to get a random User-Agent string.

function getRandomUserAgent()
{
    $userAgents = array(
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)",
        "Opera/9.20 (Windows NT 6.0; U; en)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50",
        "Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02",
        "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; fr; rv:1.7) Gecko/20040624 Firefox/0.9",
        "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/48 (like Gecko) Safari/48"
    );
    $random = rand(0, count($userAgents) - 1);
    return $userAgents[$random];
}

Using CURLOPT_USERAGENT, you can set User-Agent string.

curl_setopt($ch, CURLOPT_USERAGENT, getRandomUserAgent());

4.Handle redirects (HTTP 301,302)

To handle URL redirects, set CURLOPT_FOLLOWLOCATION to TRUE. Maximum number of redirects can be controlled using CURLOPT_MAXREDIRS.

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 2); // allow at most 2 redirects

5.How to handle CURL errors

We can use the curl_errno() and curl_error() functions to get the last error for the current session.
curl_error($ch) -> returns the error as a string
curl_errno($ch) -> returns the error number
You can use the code below to handle errors.

function httpGetWithErrors($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    if ($output === false) {
        echo "Error Number: " . curl_errno($ch) . "\n";
        echo "Error String: " . curl_error($ch);
    }
    curl_close($ch);
    return $output;
}