
Project Planning and Design

Executing a web crawling project successfully requires systematic planning and design.

In this process, you'll need to clarify the project's purpose, determine the type and amount of data required, and consider legal and ethical implications.


Clarifying the Purpose of Data Collection

  • Purpose: Define the main goal of the project and the necessity of data collection.

  • Expected Outcome: Describe the specific outcomes you aim to achieve with the collected data.

Criteria for Selecting Target Websites

  • Target Selection: Choose websites relevant to the data you intend to collect.

  • Criteria Setting: Specify the criteria to consider when selecting websites (e.g., richness of data, accessibility, legal constraints); see the sketch after this list.
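
As a rough illustration, the selection criteria above can also be recorded and applied in code. The Python sketch below uses hypothetical site names, scores, and a passing threshold; it is not part of the practice code.

# A minimal sketch of scoring candidate sites against the selection criteria.
# All site names, scores, and the threshold are hypothetical placeholders.
candidate_sites = [
    {"name": "example-news.com", "data_richness": 4, "accessibility": 5, "legal_ok": True},
    {"name": "example-blog.net", "data_richness": 3, "accessibility": 2, "legal_ok": True},
    {"name": "example-forum.org", "data_richness": 5, "accessibility": 4, "legal_ok": False},
]

def is_suitable(site, min_score=6):
    # Keep sites that pass the legal check and score well on the remaining criteria.
    return site["legal_ok"] and (site["data_richness"] + site["accessibility"]) >= min_score

selected = [site["name"] for site in candidate_sites if is_suitable(site)]
print(selected)  # ['example-news.com']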


Data Collection Plan

Type and Amount of Data Needed

  • Data Type: Clearly define the kind and format of data to be collected.

  • Data Quantity: Estimate the amount of data needed to achieve the project's goals (see the schema sketch after this list).
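
One way to make the data type and quantity concrete is to write the record schema and the target count down as code before crawling begins. The field names and the target of 1,000 records below are assumptions for illustration, not values from the lesson.

# A sketch of a record schema and a collection target.
# The Article fields and TARGET_RECORD_COUNT are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    url: str
    published_at: str  # ISO 8601 date, e.g. "2024-01-15"
    body: str

TARGET_RECORD_COUNT = 1_000  # estimated number of records needed for the project

sample = Article(
    title="Sample headline",
    url="https://example.com/articles/1",
    published_at="2024-01-15",
    body="Article text goes here.",
)
print(sample.title, TARGET_RECORD_COUNT)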

Setting Crawling Schedule and Frequency

  • Schedule Planning: Plan a schedule and frequency for data collection (a simple scheduling sketch follows this list).

  • Consider Flexibility: Leave room in the schedule for unexpected situations, such as site changes or temporary outages.
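
A schedule can also be expressed directly in code. The sketch below runs a placeholder crawl once a day with a short pause between requests; both interval values are assumptions, and a real project might instead rely on cron or a task scheduler.

# A minimal scheduling sketch: run the crawl at a fixed interval and pause
# between requests. Both interval values are assumptions, not recommendations.
import time

CRAWL_INTERVAL_SECONDS = 24 * 60 * 60  # run once a day
REQUEST_DELAY_SECONDS = 2              # polite pause between individual requests

def crawl_once():
    # Placeholder for the actual crawling logic.
    print("Crawling started...")
    time.sleep(REQUEST_DELAY_SECONDS)
    print("Crawling finished.")

while True:
    crawl_once()
    time.sleep(CRAWL_INTERVAL_SECONDS)  # wait until the next scheduled run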


Legal and Ethical Considerations

  • Terms of Use: Thoroughly review the terms of use of the target websites.

  • Legal Restrictions: Check whether the websites' terms of use or applicable laws restrict data collection (see the robots.txt sketch after this list).

  • Copyright and Usage Rights: Understand the copyright and usage rights of the collected data.

  • Ethical Considerations: Establish ethical standards related to data collection and usage.
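
Reading the terms of use is a manual step, but a site's robots.txt can be checked programmatically. The sketch below uses Python's standard urllib.robotparser module; the URL and user agent name are placeholders.

# Check robots.txt before crawling, using Python's standard library.
# The target URL and user agent name are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

if rp.can_fetch("MyCrawlerBot", "https://example.com/articles/"):
    print("robots.txt allows crawling this path.")
else:
    print("robots.txt disallows crawling this path; respect the site's policy.")

Note that robots.txt only expresses a site's crawler access rules; the terms of use, copyright, and ethical questions above still require human review.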


Practice

Click the Run Code button on the right side of the screen to check the crawling results or modify the code!
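
The practice code itself is not shown on this page, but a minimal crawl of the kind the exercise refers to might look like the sketch below. It assumes the requests and beautifulsoup4 packages are available and uses a placeholder URL.

# A minimal crawling sketch (not the actual practice code): fetch one page
# and print its title. Assumes requests and beautifulsoup4 are installed.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com", timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string if soup.title else "No title found")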
