How do I move to the next page with jsoup and keep adding content to a ListView?



  • There's a website whose main content is a list of text posts. Each post is wrapped in the following HTML:

    <div class="col-xs-12" style="margin:0.5em 0;line-height:1.785em">Some text</div>
    

    To parse the posts and display them in a ListView, I implemented this AsyncTask:

    class NewPostsAsyncTask extends AsyncTask<String, Void, String> {
    
    @Override
    protected void onPreExecute() {
        super.onPreExecute();
    
        progressDialog = new ProgressDialog(MainActivity.this);
        progressDialog.setTitle("New");
        progressDialog.setMessage("Loading...");
        progressDialog.setIndeterminate(false);
        progressDialog.show();
    }
    
    @Override
    protected String doInBackground(String... params) {
        Document doc;
    
        try {
            doc = Jsoup.connect(URL).get(); 
    
            content = doc.select("[style=margin:0.5em 0;line-height:1.785em]");
            titleList.clear();
    
            for (Element contents : content) {
                if (!contents.text().contains("18+")) {
                    titleList.add(contents.text());
                }
            }
    
        } catch (IOException e) {
            e.printStackTrace(); 
        }
    
        return null;
    }
    
    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        listView.setAdapter(adapter);
        progressDialog.dismiss();
    }
    

    }

    But I have one problem: not all posts are on the same page. At the bottom of the page, after all the posts, there is a block of page numbers with links.
    [Image: example of the link block]

    And here is the HTML for this block:

    <div class="row"><div class="col-xs-12">
    <div class="paginator">
        <span class="pagina">1683</span> " | "
        <span class="pagina"><a href="/page/1682">1682</a></span> " | "
        <span class="pagina"><a href="/page/1681">1681</a></span> " | "
        <span class="pagina"><a href="/page/1680">1680</a></span> " | "
        <span class="pagina"><a href="/page/1679">1679</a></span> " | "
        <span class="pagina"><a href="/page/3">3</a></span> " | "
        <span class="pagina"><a href="/page/2">2</a></span> " | "
        <span class="pagina"><a href="/page/1">1</a></span>
    </div>
    </div>
    </div>

    How do I go to the next page, extract its posts, and append them to the ListView after the previous ones? In the end, I want all the posts from this website in one ListView. Can you show me how to do this?



  • If URL points to the site's main page (i.e. not to a specific numbered page):

    Write a method that takes a jsoup Document and adds the corresponding strings to the list behind the ListView (this is what you have already done, just without the titleList.clear() call). Let's call it parseDocument(Document doc).

    1. Get the Document for the first page.
    2. Process that page by calling parseDocument.
    3. Get the page count with a selector such as div.paginator > span:first-child and convert it to a number (int pageCount).
    4. In a loop from pageCount - 1 down to 1, fetch the Document for each page and call parseDocument.
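
    The steps above can be sketched roughly as follows. This is only an illustration under assumptions: PagedPostLoader, Page and PageFetcher are hypothetical names, not part of jsoup or of the question's code. In the app, the fetcher would wrap Jsoup.connect(url).get() together with parseDocument, and the "/page/N" pattern is taken from the paginator links shown in the question. The sketch assumes baseUrl has no trailing slash.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class PagedPostLoader {

    /** Hypothetical result of loading one page: its post texts and the number shown first in the paginator. */
    static final class Page {
        final List<String> posts;
        final int currentPageNumber;

        Page(List<String> posts, int currentPageNumber) {
            this.posts = posts;
            this.currentPageNumber = currentPageNumber;
        }
    }

    /** Hypothetical abstraction over the network call; in the app it would use jsoup. */
    interface PageFetcher {
        Page fetch(String url) throws IOException;
    }

    /** Loads the main page first, then walks the remaining pages from pageCount - 1 down to 1. */
    static List<String> loadAll(String baseUrl, PageFetcher fetcher) throws IOException {
        List<String> titles = new ArrayList<>();

        Page first = fetcher.fetch(baseUrl);       // step 1: first (main) page
        titles.addAll(first.posts);                // step 2: collect its posts
        int pageCount = first.currentPageNumber;   // step 3: number of the newest page

        // step 4: older pages, newest to oldest
        for (int page = pageCount - 1; page >= 1; page--) {
            titles.addAll(fetcher.fetch(baseUrl + "/page/" + page).posts);
        }
        return titles;
    }
}
```

    Since doInBackground runs off the UI thread, a call like loadAll(URL, fetcher) would belong there, with the adapter updated in onPostExecute as in the question.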

    One possible problem: if new posts are added to the site while you are loading, posts from pages you have already processed may shift onto the following pages and show up again. It may be worth checking the tail of the list for duplicates.
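
    One way to check the tail of the list might look like the sketch below. It assumes that two posts with identical text are the same post, which may not hold for every site. TailDeduplicator and tailSize are hypothetical names introduced only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TailDeduplicator {

    /**
     * Appends posts from newPosts to titles, skipping any post whose text
     * already appears among the last tailSize entries of titles.
     */
    static void appendSkippingTailDuplicates(List<String> titles, List<String> newPosts, int tailSize) {
        // Clamp so that a negative or oversized tailSize cannot produce an invalid range.
        int from = Math.max(0, titles.size() - Math.max(0, tailSize));
        // Snapshot of the tail taken before anything new is appended.
        Set<String> tail = new HashSet<>(titles.subList(from, titles.size()));

        for (String post : newPosts) {
            if (!tail.contains(post)) {
                titles.add(post);
            }
        }
    }
}
```

    A tailSize roughly equal to the number of posts per page should be enough if only a few new posts arrive during loading.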

    Addendum on getting the page number:

    element.toString() returns the element together with its tags, while element.text() returns only the text inside the tags. To get the page number, do this:

    // select every first child of the div element with class paginator
    Elements pageSpan = doc.select( "div.paginator > span:first-child" );
    // take the text of the first element found
    int pageCount = Integer.parseInt( pageSpan.first().text() );
    

    It is also worth checking that any elements were actually found by calling !pageSpan.isEmpty(), because first() returns null for an empty list.
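
    A defensive version of the conversion could be sketched as below. PageCountParser and the fallback parameter are hypothetical. The sketch assumes the caller passes pageSpan.isEmpty() ? null : pageSpan.first().text(), and it also covers the case where the text is not a number, since Integer.parseInt throws NumberFormatException then.

```java
final class PageCountParser {

    /**
     * Converts the paginator text to a page count.
     * Returns fallback when no element was found (spanText is null)
     * or when the text is not a valid integer.
     */
    static int parsePageCount(String spanText, int fallback) {
        if (spanText == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(spanText.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

    With a fallback of 1, a page without a paginator is treated as a single-page site, so the loop over older pages simply does not run.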



